
TheDataWeb  

A collaboration between the U.S. Census Bureau and the Centers for Disease Control     

 TheDataWeb HelpDesk:
 Toll Free: 866-437-0171

 DataFerrettTeam Email:
 dsd_ferrett@census.gov


MIF File Format

The following describes the layout of the DataFerrett Metadata Interface File (MIF). (View a sample MIF.) This file is used to populate the DataFerrett database metadata repository and to create the internal description files, which can then be used to create a complete or customized data dictionary (through DataFerrett). The MIF also drives the information presented to users through the DataFerrett front end. View some illustrations of how the MIF information is used by DataFerrett.

Note that each delimiter (token) must start in column one and be followed immediately by at least one space. Delimiters surrounded by colons (for example :L:) allow an entry to span multiple lines without repeating the delimiter at the beginning of each line.
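The column-one rule can be illustrated with a small tokenizer. This is a minimal Python sketch, not DataFerrett's own loader: `tokenize_mif` and `KNOWN_TOKENS` are hypothetical names, and it assumes that a line whose first word is not a known token continues the previous entry (such as a wrapped V description), that :L: lines open and close a long description, and that an :A: line is followed by its URL on the next line.

```python
# Minimal MIF tokenizer sketch (hypothetical, not DataFerrett's loader).
# Assumption: a line whose first word is not a known token continues the
# previous entry, e.g. a V description that wraps onto the next line.
KNOWN_TOKENS = {
    "VER", "SC", "SL", "SS", "ST", "SD", "SZ", "SA", "SX", "SI", "SB", "SU",
    "GC", "GT", "GW", "GX", "GY", "GZ", "GI", "GN", "GO",
    "M", "S", "C", "T", "E", "W", "X", "Y", "Z", "N", "V", "P", "U", "B", "I",
}


def tokenize_mif(lines):
    """Return (token, value) pairs for the given MIF lines, in file order."""
    entries = []
    i = 0
    while i < len(lines):
        line = lines[i].rstrip("\n")
        word = line.split(" ", 1)[0]
        if line.strip() == ":L:":
            # Paired delimiter: gather every line up to the closing :L:.
            body = []
            i += 1
            while i < len(lines) and lines[i].strip() != ":L:":
                body.append(lines[i].strip())
                i += 1
            entries.append(("L", " ".join(body)))
        elif word == ":A:":
            # Attachment type on this line, URL on the next; no closing :A:.
            url = lines[i + 1].strip() if i + 1 < len(lines) else ""
            entries.append(("A", (line[len(":A:"):].strip(), url)))
            i += 1
        elif word in KNOWN_TOKENS and line.startswith(word + " "):
            # Token in column one followed by at least one space.
            entries.append((word, line[len(word) + 1:].strip()))
        elif line.strip() and entries and isinstance(entries[-1][1], str):
            # Continuation line: append it to the previous entry's value.
            token, value = entries[-1]
            entries[-1] = (token, f"{value} {line.strip()}")
        i += 1
    return entries
```

Matching against a fixed token set, rather than treating any column-one word as a token, keeps a wrapped description that happens to start in column one from being misread as a new entry.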

A MIF file is split into segments, each containing a different type of metadata. The first segment is the Dataset Level Metadata segment, which is the only segment that must precede all others. Apart from VER, each Dataset Level token consists of two letters beginning with S. The following are valid Dataset Level tokens; the tag [Optional] at the end of a token description marks an optional token.

VER 1.0
(This denotes the version of the metadata loading system. This token MUST be the first token in the MIF; the current version is 1.0.)

SC  Component Description 
    (limit 60 characters)
SL  Dataset long name 
    (limit 255 characters)
SS  Dataset short name 
    (limit 12 characters)
ST  Dataset time frame or version date 
    (must have start date and stop date, 
    e.g. Jan 2000:Jan 2000 or 2001:2001)
SD  Dataset data category 
    (1=microdata, 2=aggregate data)
SZ  Display type 
    (1=normal, 
    2=inverted component and instance)
SA  Tabulation machine name/IP address and  
    port (e.g. www.ferret.census.gov:4506 
    or 148.129.129.2 - if no port is given, 
    the default is 4505)
SX  Extraction machine name/IP address 
    and port 
SI  Inherited Component [Optional] 
    (Must use an existing component's 
    description)
SB  Subsurvey Name [Optional] 
    (limit 255 characters)
SU  GIF URL for the dataset [Optional] 
    (fully qualified URL, e.g. 
    http://www.name.com/image/example.gif)
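The Dataset Level rules above (VER first, the character limits, the start:stop form of ST, the 1/2 codes for SD and SZ, and the default port 4505 for SA) can be checked mechanically. The sketch below is hypothetical: `check_dataset_segment` and `parse_host_port` are illustrative names, and treating every token not marked [Optional] as required is an assumption.

```python
# Hypothetical Dataset Level checker. The limits, the VER-first rule, the
# start:stop form of ST, the SD/SZ codes, and the default SA port come from
# the token list; treating every non-[Optional] token as required is assumed.
LIMITS = {"SC": 60, "SL": 255, "SS": 12, "SB": 255}
REQUIRED = ["SC", "SL", "SS", "ST", "SD", "SZ", "SA", "SX"]
DEFAULT_PORT = 4505


def parse_host_port(value):
    """Split 'host:port' or a bare host/IP address into (host, port)."""
    host, sep, port = value.rpartition(":")
    if not sep:
        return value, DEFAULT_PORT  # no port given: use the default
    return host, int(port)


def check_dataset_segment(entries):
    """entries: (token, value) pairs in file order. Returns a list of errors."""
    errors = []
    if not entries or entries[0][0] != "VER":
        errors.append("VER must be the first token")
    seen = dict(entries)
    for token in REQUIRED:
        if token not in seen:
            errors.append(f"missing required token {token}")
    for token, limit in LIMITS.items():
        if token in seen and len(seen[token]) > limit:
            errors.append(f"{token} exceeds {limit} characters")
    if "ST" in seen and ":" not in seen["ST"]:
        errors.append("ST needs a start and stop date, e.g. 2001:2001")
    for token in ("SD", "SZ"):
        if token in seen and seen[token] not in ("1", "2"):
            errors.append(f"{token} must be 1 or 2")
    return errors
```

Collecting every error instead of stopping at the first one lets a MIF author fix all Dataset Level problems in a single pass.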

The remaining segments can appear in any order throughout the rest of the MIF file. The possible segments are Global, New, Update, Timeframe, and Stop.

The Global segment contains item-level metadata tokens and should appear near the top of the MIF file, after the Dataset Level segment, although this is not required. A global value can be changed within the file by entering a new global value at the point where the new value should take effect. Global values are also overridden by an individual value for any specific item. The following tokens are valid Global tokens; their definitions can be found in the item-level metadata section below (GC corresponds to the item-level C token, GT to T, and so on):

GC
GT
GW
GX
GY
GZ
GI
GN
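The override rules for globals can be shown with a short resolver. In this hypothetical sketch, `apply_globals` receives a segment already split into global tokens and item dictionaries; each global (GC, GT, and so on) sets the default for its item-level token from that point forward, and a value given on the item itself always wins.

```python
# Hypothetical resolver for Global tokens. Each global (GC, GT, ...) sets the
# default for the matching item-level token (C, T, ...) from that point on;
# an item's own value always overrides the global.
GLOBAL_TO_ITEM = {g: g[1] for g in ("GC", "GT", "GW", "GX", "GY", "GZ", "GI", "GN")}


def apply_globals(stream):
    """stream: ordered mix of (global_token, value) tuples and item dicts."""
    current = {}
    resolved = []
    for entry in stream:
        if isinstance(entry, tuple):
            token, value = entry
            current[GLOBAL_TO_ITEM[token]] = value  # takes effect from here on
        else:
            resolved.append({**current, **entry})  # item values win
    return resolved
```

For example, a GX Public global followed later by GX Sponsor makes items before the change Public and items after it Sponsor, unless an item supplies its own X.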

The remaining four segments describe the operation to be performed on the item-level metadata they contain. Each segment is preceded by the GO global token, which indicates the operation the segment performs. The following are valid values for the GO token:

  • GO NEW
    • Inserts the variable into the repository. An error occurs if a variable with the same mnemonic and timeframe already exists.
  • GO UPDATE
    • Updates everything except the timeframe of an existing variable. If the timeframe (T) token is specified, the UPDATE operation ignores it. An error occurs if the variable does not exist.
  • GO TIMEFRAME
    • Performs the same function as UPDATE but also modifies the timeframe. Do not use this operation unless you need to modify a variable's timeframe (T). The end timeframe (E) token is required in addition to the item-level metadata specified below.
  • GO STOP
    • Stops a variable. Only the mnemonic (M) and end timeframe (E) tokens are specified.
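The four operations can be modeled against a simple in-memory repository. This is an illustrative sketch only, not the DataFerrett repository: `apply_operation` and `MifError` are hypothetical names, the repository is assumed to map each mnemonic to a list of records (one per timeframe), and UPDATE, TIMEFRAME, and STOP are assumed to act on the most recently inserted record.

```python
# Illustrative in-memory model of the four GO operations (not DataFerrett's
# repository). Assumptions: repo maps each mnemonic to a list of records, one
# per timeframe, and UPDATE/TIMEFRAME/STOP act on the most recent record.
class MifError(Exception):
    """Raised when an operation violates the rules described above."""


def apply_operation(repo, operation, item):
    mnemonic = item["M"]
    records = repo.get(mnemonic, [])
    if operation == "NEW":
        if any(r.get("T") == item.get("T") for r in records):
            raise MifError(f"{mnemonic} already exists for {item.get('T')}")
        repo[mnemonic] = records + [dict(item)]
        return
    if operation not in ("UPDATE", "TIMEFRAME", "STOP"):
        raise MifError(f"unknown operation {operation}")
    if not records:
        raise MifError(f"{mnemonic} does not exist")
    if operation in ("TIMEFRAME", "STOP") and "E" not in item:
        raise MifError(f"{operation} requires the E token")
    target = records[-1]
    if operation == "UPDATE":
        # UPDATE ignores T if it is present.
        target.update({k: v for k, v in item.items() if k != "T"})
    elif operation == "TIMEFRAME":
        target.update(item)  # same as UPDATE, but T may change too
    else:
        target["E"] = item["E"]  # STOP: only M and E are supplied
```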

    Item-level metadata appears after an operation token (GO) and must begin with the M token, which indicates that a new item-level variable is beginning. The definition of a variable is complete when the ingestion system finds another M token, any global token, or the end of the file, whichever comes first.
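The boundary rule can be sketched as a single pass over (token, value) pairs. In this hypothetical `split_items`, an item opens at M and closes at the next M, any global token (including GO), or the end of input; each item is tagged with the most recent GO operation. Keeping V and B as lists is an assumption, since those tokens may repeat within one item.

```python
# Sketch of the item-boundary rule: an item starts at M and ends at the next M,
# the next global token (any G* token, including GO), or the end of input.
def split_items(entries):
    """entries: (token, value) pairs. Returns a list of (operation, item) pairs."""
    items = []
    operation = None
    current = None
    for token, value in entries:
        if token == "M":
            current = {"M": value}
            items.append((operation, current))
        elif token.startswith("G"):
            current = None  # a global token closes the open item
            if token == "GO":
                operation = value
        elif current is not None:
            if token in ("V", "B"):
                current.setdefault(token, []).append(value)  # repeatable tokens
            else:
                current[token] = value
    return items
```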

    M  Item (variable) name or mnemonic. 
    S  Short description or English label 
       (limit 60 characters, cannot contain 
       quotation marks)
    C  Concept or topic label
    T  Timeframe of the item: when it began,
       e.g. Jan 1994 for January 1994, and,
       if it has ended, when it ended,
       separated by a colon, e.g.
       Jan 1994:Jun 1994 (if the item
       continues into the future, there is
       no stop date)
    W  Suggested weight variable name (see note 1)
       (e.g. BASEWGT), OR
       Yes (if item IS a weight), OR
       NONE (if there is no suggested weight)
    X  Security Level, use entire word 
       as follows:
       Public
       Sponsor
    Y  Variable type abbreviation as follows:
       E = Edited
       U = Unedited
       W = Weighting
       R = Recode
       X = Allocation flag
       T = Topcoded
       S = Sample Control
       G = Geography
       P = Replicate Weights
       N = Public Use
    Z  Data type abbreviation as follows:
       B = Binary (numeric)
       Cx = Character (user defines x to be 
            the length of field from 1 to 255)
       T = Military time (HH:MM)
       Ix.y = Implied decimal 
              (user defines x to be the total 
              length of the value including the 
              decimal, and y to be the number of 
              digits to the right of the decimal)
       For example:
       I10.4 = Implied decimal 
               (5 digits to the left and 4 
               digits to the right of the decimal)
       I5.2 = Implied decimal 
              (2 digits to the left and 
              2 to the right)
       Note: the value line should then 
             contain the minimum and maximum values 
             with a decimal, e.g., for Z I5.2, 
             the value line should be 
             V 0.00:99.99
       Fx.y = Floating point with precision 
              as described for implied decimals
    N  Unit type abbreviation as follows: 1
       ABS = Absolute number
       AVG = Average
       DOL = Dollars
       MIN = Minutes
       PCT = Percent
       SQM = Square miles
       TH$ = Thousands of dollars
       RTE = Rate
    V  Value with description. 
       (limit 100 characters) Each value  
       line should have a V at the beginning, 
       but DO NOT put a V at the beginning 
       of a line if the description wraps to 
       the next line.
       V  1 Male
       V  2 Female
       or for a continuous range variable,  
       the minimum and maximum values MUST be 
       separated by a colon:
       V  -1 Blank
       V  0:99 Years
       or for a continuous range with  
       decimals (e.g. Z I10.4):
       V  0.0000:99999.9999
       IF the data contains a blank, 
       the value should be defined exactly as:
       V  Blank
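The Z and V rules fit together: for an implied-decimal type, x counts the decimal point, so the digits to the left are x - y - 1, which is why I10.4 has 5 digits to the left and its widest range is 0.0000:99999.9999. The helpers below are a hypothetical sketch (`parse_z`, `widest_range`, and `parse_v` are illustrative names) that reproduce these examples and classify V lines as a blank, a range, or a coded value.

```python
import re

# Hypothetical helpers for the Z (data type) and V (value) tokens.
Z_PATTERN = re.compile(r"^(?:B|T|C(\d+)|([IF])(\d+)\.(\d+))$")


def parse_z(spec):
    """Describe a Z data type such as 'B', 'C12', 'I5.2' or 'F8.3'."""
    match = Z_PATTERN.match(spec)
    if not match:
        raise ValueError(f"unrecognized Z spec {spec!r}")
    if match.group(1):
        length = int(match.group(1))
        if not 1 <= length <= 255:
            raise ValueError("character length must be 1 to 255")
        return {"kind": "C", "length": length}
    if match.group(2):
        total, right = int(match.group(3)), int(match.group(4))
        # x includes the decimal point, so digits left = x - y - 1.
        return {"kind": match.group(2), "left": total - right - 1, "right": right}
    return {"kind": spec}  # B (binary) or T (military time)


def widest_range(spec):
    """Widest non-negative V range an Ix.y or Fx.y field can hold."""
    z = parse_z(spec)
    if z["kind"] not in ("I", "F") or z["right"] == 0:
        raise ValueError("expected Ix.y or Fx.y with y > 0")
    zeros, nines = "0" * z["right"], "9" * z["right"]
    return f"0.{zeros}:{'9' * z['left']}.{nines}"


def parse_v(value):
    """Classify the text after a V token as blank, a range, or a coded value."""
    if len(value) > 100:
        raise ValueError("V entries are limited to 100 characters")
    if value == "Blank":
        return {"kind": "blank"}
    first, _, label = value.partition(" ")
    if ":" in first:
        low, high = first.split(":", 1)
        return {"kind": "range", "low": low, "high": high, "label": label}
    return {"kind": "code", "code": first, "label": label}
```

Note that "-1 Blank" is a coded value whose label happens to be Blank, while a bare "Blank" is the special form described above.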
    

    The following item is optional, but strongly recommended for items that are not allocation flags or topcoded items:

    :L:
    Long description. The description may span 
    multiple lines. A :L: line must appear 
    immediately before and after the description.
    :L:
    

    The following items are optional:

    P  CD-ROM or ASCII file data start 
       and end positions, e.g. P 15 16
    U  Universe description 
       (Universe descriptions MUST follow 
       Long description for an item.)
    :A:  Attachment type 
         (e.g. Edit Specs, Recode Specs, 
         Instrument Specs, Sampling, User 
         Note, etc.) followed by the URL of 
         the text, beginning on the next 
         line, e.g. 
         http://www.census.gov/mydir/myfile.htm 
         (Please note: there is no :A: line 
         after the URL line.)
    B  Synonyms (multiple words should either 
       be listed separately or be comma 
       delimited).
       (e.g. B men
             B boy
             B gender
             B women
             B girl
       or
             B men, boy, gender, women, girl)
    I  Iteration group size for longitudinal 
       data (e.g., if a variable repeats 12 
       times, the group size is 12).
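Two of the optional tokens lend themselves to small helpers. This hypothetical sketch assumes that P positions are 1-based and inclusive, so P 15 16 names a two-character field, and merges B synonyms whether they were listed one per line or comma delimited; `extract_field` and `collect_synonyms` are illustrative names.

```python
# Hypothetical helpers for two optional tokens. Assumption: P positions are
# 1-based and inclusive, so "P 15 16" names a two-character field.
def extract_field(record, p_value):
    """Slice a fixed-width ASCII record using a P value such as '15 16'."""
    start, end = (int(n) for n in p_value.split())
    return record[start - 1:end]


def collect_synonyms(b_values):
    """Merge B lines given one word per line or comma delimited."""
    synonyms = []
    for value in b_values:
        for word in value.split(","):
            word = word.strip()
            if word and word not in synonyms:
                synonyms.append(word)
    return synonyms
```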
       
    _____________________________
      1 When entering new variables, please place the variables used as suggested weight variables, with their corresponding information, at the top of the file.
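The ordering advice in this note can be checked before loading. In this hypothetical sketch, `check_weight_order` takes item dictionaries in file order and reports any item whose W token names a weight variable that has not yet appeared; the values Yes and NONE are skipped because they are not variable names.

```python
def check_weight_order(items):
    """Flag W references to weight variables not defined earlier in the file.

    items: item dicts in file order (each with M, optionally W). Sketch only.
    """
    defined = set()
    problems = []
    for item in items:
        weight = item.get("W")
        if weight and weight not in ("Yes", "NONE") and weight not in defined:
            problems.append(f"{item['M']} uses weight {weight} before it is defined")
        defined.add(item["M"])
    return problems
```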

    Contact: (whazard@census.gov) Bill Hazard-Census/DSD/SMPB

    Last modified: March 29, 2005