TheDataWeb: A collaboration between the U.S. Census Bureau and the Centers for Disease Control
MIF File Format

The following describes the layout of the DataFerrett Metadata Interface File (MIF). This file is used to populate the DataFerrett database metadata repository and to create the internal description files. These can then be used to create a data dictionary, either complete or customized (through DataFerrett). The MIF is also used to drive the information passed to users through the DataFerrett front end.

Note that the delimiters (tokens) should start in column one, followed immediately by at least one space. The delimiters surrounded by colons allow for multiple-line entries without having the delimiter at the beginning of each line.

A MIF file is split into segments. Each segment contains a different type of metadata. The first segment is the Dataset Level Metadata segment, and it is the only segment that must precede all others. Each Dataset Level Metadata token consists of two letters beginning with an S. The following are valid Dataset Level tokens. The tag [Optional] at the end of a token description indicates an optional token.

VER 1.0
SC Component Description
(limit 60 characters)
SL Dataset long name
(limit 255 characters)
SS Dataset short name
(limit 12 characters)
ST Dataset time frame or version date
(must have start date and stop date,
e.g. Jan 2000:Jan 2000 or 2001:2001)
SD Dataset data category
(1=microdata, 2=aggregate data)
SZ Display type
(1=normal,
2=inverted component and instance)
SA Tabulation machine name/IP address and
port (e.g. www.ferret.census.gov:4506
or 148.129.129.2 - if no port is given,
the default is 4505)
SX Extraction machine name/IP address
and port
SI Inherited Component [Optional]
(Must use an existing component's
description)
SB Subsurvey Name [Optional]
(limit 255 characters)
SU GIF URL for the dataset [Optional]
(fully qualified URL, e.g.
http://www.name.com/image/example.gif)
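As a rough illustration of the rules above, the following Python sketch reads Dataset Level tokens from column one and interprets the ST and SA values. The function names and the sample values (such as the short name DEMO) are made up for this example, and the sketch assumes each token fits on a single line; it is not the DataFerrett ingestion code.

```python
# Hypothetical helper names; only the token names and the 4505 default
# port come from the format description above.
DATASET_TOKENS = {"SC", "SL", "SS", "ST", "SD", "SZ", "SA", "SX", "SI", "SB", "SU"}


def parse_dataset_tokens(lines):
    """Collect Dataset Level tokens that start in column one.

    Assumption: one token per line; lines with other tokens are ignored.
    """
    meta = {}
    for line in lines:
        token, sep, rest = line.partition(" ")
        if sep and token in DATASET_TOKENS:
            meta[token] = rest.strip()
    return meta


def parse_host_port(value, default_port=4505):
    """Split an SA value into (host, port), using 4505 when no port is given."""
    host, sep, port = value.strip().partition(":")
    return host, (int(port) if sep else default_port)


def parse_time_frame(value):
    """Split an ST value such as 'Jan 2000:Jan 2000' into (start, stop)."""
    start, sep, stop = value.partition(":")
    if not sep:
        raise ValueError("ST must have both a start date and a stop date")
    return start.strip(), stop.strip()


# Sample header lines; the values are invented for illustration.
sample = [
    "VER 1.0",
    "SS DEMO",
    "ST Jan 2000:Jan 2000",
    "SD 1",
    "SA 148.129.129.2",
]
meta = parse_dataset_tokens(sample)
```

Calling `parse_host_port(meta["SA"])` on the sample falls back to the default port because the SA value carries no port of its own.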
The remaining segments can come in any order throughout the rest of the MIF file. The possible segments are: Global, New, Update, Timeframe, or Stop.

The Global segment contains item-level metadata tokens and should appear at the top of the MIF file after the Dataset Level segment, although this is not required. A global value can be changed within the file by entering a new global value at the point where the new value should take effect. Global values are also overridden by an individual value for any specific item. The following tokens are valid Global tokens; their definitions can be found in the item-level metadata section below (GC is the C token, etc.):

GC GT GW GX GY GZ GI GN

The remaining four segments describe the operation to be performed on the item-level metadata contained within the segment. Each segment is preceded by the GO global token, which indicates the operation the segment should perform. The following operations are valid values for the GO token:
Item-level metadata appears between operation tokens (GO) and must begin with the M token to indicate that a new item-level variable is beginning. The definition of a variable is complete when the ingestion system finds another M token, any global token, or the end-of-file, whichever comes first.
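The termination rule for an item definition can be sketched in Python as follows. The function name `split_items` and the sample lines are hypothetical, and the sketch only groups raw lines; it does not interpret the tokens or apply global values.

```python
# Global tokens named in this document, including the GO operation token.
GLOBAL_TOKENS = {"GC", "GT", "GW", "GX", "GY", "GZ", "GI", "GN", "GO"}


def split_items(lines):
    """Group lines into item definitions.

    An item starts at an M token and ends at the next M token, at any
    global token, or at end-of-file, whichever comes first.
    """
    items = []
    current = None
    for line in lines:
        token = line.split(" ", 1)[0]
        if token == "M":
            current = [line]          # a new item begins
            items.append(current)
        elif token in GLOBAL_TOKENS:
            current = None            # a global token closes the open item
        elif current is not None:
            current.append(line)      # token or continuation line of the item
    return items


# Invented sample lines for illustration.
sample_lines = [
    "M AGE",
    "S Age in years",
    "V 0:99 Years",
    "M SEX",
    "S Sex",
    "GW NONE",
]
items = split_items(sample_lines)
```

Here the first item ends when `M SEX` is read, and the second ends at the `GW` global token.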
M Item (variable) name or mnemonic.
S Short description or English label
(limit 60 characters, cannot contain
quotation marks)
C Concept or topic label
T Time of item (when it began),
e.g. Jan 1994 for January 1994,
and when it ended (if it has,
if it continues into the future,
there is no stopdate),
e.g. Jan 1994:Jun 1994
W Suggested weight variable name
(e.g. BASEWGT), OR
Yes (if item IS a weight), OR
NONE (if there is no suggested weight)
X Security Level, use entire word
as follows:
Public
Sponsor
Y Variable type abbreviation as follows:
E = Edited
U = Unedited
W = Weighting
R = Recode
X = Allocation flag
T = Topcoded
S = Sample Control
G = Geography
P = Replicate Weights
N = Public Use
Z Data type abbreviation as follows:
B = Binary (numeric)
Cx = Character (user defines x to be
the length of field from 1 to 255)
T = Military time (HH:MM)
Ix.y = Implied decimal
(user defines the x to be the
total length of value including
the decimal and y is the number
of digits to the right of decimal)
For example:
I10.4 = Implied decimal
(5 digits to the left and 4
digits to right of decimal)
I5.2 = Implied decimal
(2 digits to left and
2 to right)
Note: the value line should then
contain minimum and maximum value
with a decimal, e.g., for Z I5.2,
the value line should be
V 0.00:99.99
Fx.y = Floating point with precision
as described for Implied decimals
N Unit type abbreviation as follows:
ABS = Absolute number
AVG = Average
DOL = Dollars
MIN = Minutes
PCT = Percent
SQM = Square miles
TH$ = Thousands of dollars
RTE = Rate
V Value with description.
(limit 100 characters) Each value
line should have a V at the beginning,
but DO NOT put a V at the beginning
of a line if the description wraps to
the next line.
V 1 Male
V 2 Female
or for a continuous range variable,
the minimum and maximum values MUST be
separated by a colon:
V -1 Blank
V 0:99 Years
or for a continuous range with
decimals (e.g. Z I10.4):
V 0.0000:99999.9999
IF the data contains a blank,
the value should be defined exactly as:
V Blank
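The relationship between an implied-decimal Z type and its V range, and the three shapes a V line can take, can be illustrated with a short Python sketch. The function names are hypothetical, and the range helper assumes a non-negative value whose minimum is zero, as in the examples above.

```python
import re


def implied_decimal_range(ztype):
    """Return (min, max) strings for an 'Ix.y' type.

    x is the total length including the decimal point and y is the number
    of digits to the right of it. Assumption: the minimum is zero.
    """
    match = re.fullmatch(r"I(\d+)\.(\d+)", ztype)
    if not match:
        raise ValueError("expected a type of the form Ix.y")
    total, frac = int(match.group(1)), int(match.group(2))
    whole = total - frac - 1  # digits to the left of the decimal point
    if whole < 1 or frac < 1:
        raise ValueError("x must leave at least one digit on each side")
    return "0." + "0" * frac, "9" * whole + "." + "9" * frac


def parse_value_line(line):
    """Parse a 'V ...' line into (low, high, label).

    A single code has low == high; 'V Blank' yields (None, None, 'Blank').
    """
    body = line[1:].strip()
    if body == "Blank":
        return None, None, "Blank"
    code, _, label = body.partition(" ")
    if ":" in code:
        low, high = code.split(":", 1)  # continuous range, min:max
    else:
        low = high = code
    return low, high, label.strip()
```

For instance, `implied_decimal_range("I5.2")` yields the bounds used in the `V 0.00:99.99` line shown for `Z I5.2`.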
The following item is optional, but strongly recommended for items that are not allocation flags or topcoded items:
:L:
Long description. There may be a multiple line description.
There must be a :L: on the lines before and after the description.
:L:
The following items are optional:
P CD-ROM or ascii file data start
and end positions, e.g. P 15 16
U Universe description
(Universe descriptions MUST follow
Long description for an item.)
:A: Attachment type
(e.g. Edit Specs, Recode Specs,
Instrument Specs, Sampling, User
Note, etc.) followed by the URL of
the text, beginning on the next
line, e.g. http://www.census.gov/
mydir/myfile.htm (Please note:
there is no :A: line after
the URL line.)
B Synonyms (multiple words should either
be listed separately or be comma
delimited).
(e.g. B men
B boy
B gender
B women
B girl
or
B men, boy, gender, women, girl)
I Iteration group size for longitudinal
data (i.e., variable repeats 12 times,
then 12 would be the group size).
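The :L: and B conventions can be read with a few lines of Python. This is a minimal sketch with hypothetical function names and invented sample text; it assumes each :L: marker sits alone on its own line, as described above.

```python
def read_long_description(lines):
    """Return the text found between the first pair of ':L:' marker lines."""
    inside = False
    parts = []
    for line in lines:
        if line.strip() == ":L:":
            if inside:
                break             # closing marker reached
            inside = True         # opening marker
        elif inside:
            parts.append(line.strip())
    return " ".join(parts)


def read_synonyms(lines):
    """Collect synonyms from B lines, listed separately or comma delimited."""
    words = []
    for line in lines:
        if line.startswith("B "):
            words.extend(w.strip() for w in line[2:].split(",") if w.strip())
    return words


# Invented item lines for illustration.
item_lines = [
    "M SEX",
    ":L:",
    "Sex of the",
    "respondent.",
    ":L:",
    "B men, boy",
    "B gender",
]
description = read_long_description(item_lines)
synonyms = read_synonyms(item_lines)
```

Both synonym styles end up in the same flat list, so a caller does not need to know which style the MIF author chose.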
_____________________________
Contact: (whazard@census.gov) Bill Hazard-Census/DSD/SMPB
Last modified: March 29, 2005