MIF File Format
The following describes the layout of the DataFerrett Metadata Interface
File (MIF). (View
a sample MIF.) This file is used to populate the DataFerrett database metadata
repository, and to create the internal description files. These can then
be used to create a data dictionary, either complete or customized (through
DataFerrett). It is also used to drive the information passed to the users
through the DataFerrett front end. View
some illustrations of how the MIF information is used by DataFerrett.
Note that the delimiters (tokens) should start in column
one followed immediately by at least one space. The delimiters surrounded
by colons allow for multiple line entries without having the delimiter
at the beginning of each line.
A MIF file is split up into segments. Each segment contains a different
type of metadata. The first segment is the Dataset Level Metadata segment
and is the only segment that must precede all others. Each token for Dataset
Level Metadata contains two letters beginning with an S. The following are
valid Dataset Level tokens. The tag [Optional] at the end of the token description
indicates an optional token.
VER 1.0
(This denotes the version of the metadata loading system.
This token MUST be the first token in the MIF and the version is
currently 1.0)
SC Component Description
(limit 60 characters)
SL Dataset long name
(limit 255 characters)
SS Dataset short name
(limit 12 characters)
ST Dataset time frame or version date
(must have start date and stop date,
e.g. Jan 2000:Jan 2000 or 2001:2001)
SD Dataset data category
(1=microdata, 2=aggregate data)
SZ Display type
(1=normal,
2=inverted component and instance)
SA Tabulation machine name/IP address and
port (e.g. www.ferret.census.gov:4506
or 148.129.129.2 - if no port is given,
the default is 4505)
SX Extraction machine name/IP address
and port
SI Inherited Component [Optional]
(Must use an existing component's
description)
SB Subsurvey Name [Optional]
(limit 255 characters)
SU GIF URL for the dataset [Optional]
(fully qualified URL, e.g.
http://www.name.com/image/example.gif)
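The dataset-level rules above can be checked mechanically. The following Python sketch is illustrative only and is not part of DataFerrett: the helper names (split_token, parse_host_port, check_dataset_segment) are hypothetical, and it assumes each token line has already been joined with any wrapped continuation text. It applies three rules stated above: VER 1.0 comes first, the listed character limits hold, and a missing port defaults to 4505.

```python
# Hypothetical helpers for illustration; not part of DataFerrett.
DEFAULT_PORT = 4505  # default port stated in the SA description

# Character limits taken from the token descriptions above.
LIMITS = {"SC": 60, "SL": 255, "SS": 12, "SB": 255}


def split_token(line):
    """Split 'TOKEN value' into (token, value); the token starts in column one."""
    token, _, value = line.partition(" ")
    return token, value.strip()


def parse_host_port(value):
    """Return (host, port), falling back to the default port if none is given."""
    host, sep, port = value.partition(":")
    return host, int(port) if sep else DEFAULT_PORT


def check_dataset_segment(lines):
    """Return a list of problems found in the dataset-level token lines."""
    problems = []
    pairs = [split_token(line) for line in lines if line.strip()]
    if not pairs or pairs[0] != ("VER", "1.0"):
        problems.append("VER 1.0 must be the first token")
    for token, value in pairs:
        limit = LIMITS.get(token)
        if limit is not None and len(value) > limit:
            problems.append(f"{token} exceeds {limit} characters")
    return problems
```

For example, parse_host_port("148.129.129.2") yields port 4505, while parse_host_port("www.ferret.census.gov:4506") yields port 4506.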
The remaining segments can come in any order throughout the rest of the
MIF file. The possible segments are: Global, New, Update, Timeframe, or
Stop.
The Global segment contains item level metadata tokens and should appear
at the top of the MIF file after the Dataset Level segment, although this
is not required. A global value can be changed within the file by entering
a new global value at the point where the new value should take effect. Also,
a global value is overridden by an individual value given for a specific item.
The following tokens are valid Global tokens and the definitions can be
found in the item-level metadata section below (GC is the C token, etc.):
GC
GT
GW
GX
GY
GZ
GI
GN
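The override rule for global values can be sketched in Python as follows. This is an illustration under stated assumptions, not DataFerrett code: the function name effective_metadata is hypothetical, and both arguments are assumed to be plain dictionaries of token to value, with globals_in_effect holding the global values current at the point where the item appears.

```python
# Each global token maps to the item-level token with the same second letter
# (GC is the C token, and so on).
GLOBAL_TO_ITEM = {"G" + letter: letter for letter in "CTWXYZIN"}


def effective_metadata(globals_in_effect, item_tokens):
    """Merge global values with an item's own values; the item's values win."""
    merged = {GLOBAL_TO_ITEM[g]: v for g, v in globals_in_effect.items()}
    merged.update(item_tokens)  # an individual value overrides the global one
    return merged
```

With globals GX Public and GY E in effect, an item that specifies Y R keeps its own Y value and inherits X from the global.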
The remaining four segments describe the operation to be performed on item-level
metadata contained within the segment. Each segment is preceded by the
GO global token, which indicates the operation the segment should perform. The
following operations are valid values for the GO token:
GO NEW
The variable is inserted into the repository. An error occurs if a
variable with the same mnemonic and timeframe exists.
GO UPDATE
Updates everything except the timeframe for an existing variable. If
the timeframe (T) token is specified, it is ignored by the UPDATE operation.
An error occurs if the variable does not exist.
GO TIMEFRAME
Performs the same function as UPDATE but modifies the timeframe as
well. Do not use this operation unless you need to modify a variable's
timeframe (T). The end timeframe (E) token is required in addition to item-level
metadata as specified below.
GO STOP
Stops a variable. Only the mnemonic (M) and end timeframe (E) tokens are
specified.
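The token requirements of the four operations can be summarized in a small Python sketch. It is illustrative only: prepare_item is a hypothetical name, an item is assumed to be a dictionary of token to value, and the repository checks (whether the variable already exists) are left out. Dropping extra tokens on STOP is an assumption; the description above says only that M and E are specified.

```python
def prepare_item(operation, item):
    """Return the tokens a GO operation would act on, per the rules above."""
    if "M" not in item:
        raise ValueError("every item needs the M (mnemonic) token")
    if operation == "NEW":
        return dict(item)
    if operation == "UPDATE":
        # The timeframe (T) token is ignored by UPDATE.
        return {k: v for k, v in item.items() if k != "T"}
    if operation == "TIMEFRAME":
        if "E" not in item:
            raise ValueError("TIMEFRAME requires the E (end timeframe) token")
        return dict(item)
    if operation == "STOP":
        if "E" not in item:
            raise ValueError("STOP requires the E (end timeframe) token")
        # Assumption: any tokens other than M and E are dropped.
        return {"M": item["M"], "E": item["E"]}
    raise ValueError(f"unknown operation {operation!r}")
```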
Item-level metadata appears between operation tokens (GO) and must begin
with the M token to indicate a new item-level variable is beginning. The
definition of a variable is complete when the ingestion system finds either
another M token, any global token, or the end-of-file, whichever comes
first.
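The boundary rule for item definitions can be sketched in Python as follows. This is not the actual ingestion system: split_items is a hypothetical name, and the sketch assumes that wrapped continuation lines do not begin with a token followed by a space.

```python
# Global tokens listed above, plus GO, which starts a new operation segment.
GLOBAL_TOKENS = {"GC", "GT", "GW", "GX", "GY", "GZ", "GI", "GN", "GO"}


def split_items(lines):
    """Group lines into item definitions, each starting with an M token."""
    items, current = [], None
    for line in lines:
        token = line.split(" ", 1)[0]
        if token == "M":
            # A new M token closes the previous item and opens a new one.
            if current is not None:
                items.append(current)
            current = [line]
        elif token in GLOBAL_TOKENS:
            # Any global token also closes the item being read.
            if current is not None:
                items.append(current)
            current = None
        elif current is not None:
            current.append(line)
    if current is not None:  # end-of-file closes the last item
        items.append(current)
    return items
```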
M Item (variable) name or mnemonic.
S Short description or English label
(limit 60 characters, cannot contain
quotation marks)
C Concept or topic label
T Time of item (when it began),
e.g. Jan 1994 for January 1994,
and when it ended (if it has,
if it continues into the future,
there is no stopdate),
e.g. Jan 1994:Jun 1994
W Suggested weight variable name [1]
(e.g. BASEWGT), OR
Yes (if item IS a weight), OR
NONE (if there is no suggested weight)
X Security Level, use entire word
as follows:
Public
Sponsor
Y Variable type abbreviation as follows:
E = Edited
U = Unedited
W = Weighting
R = Recode
X = Allocation flag
T = Topcoded
S = Sample Control
G = Geography
P = Replicate Weights
N = Public Use
Z Data type abbreviation as follows:
B = Binary (numeric)
Cx = Character (user defines x to be
the length of field from 1 to 255)
T = Military time (HH:MM)
Ix.y = Implied decimal
(user defines the x to be the
total length of value including
the decimal and y is the number
of digits to the right of decimal)
For example:
I10.4 = Implied decimal
(5 digits to the left and 4
digits to right of decimal)
I5.2 = Implied decimal
(2 digits to left and
2 to right)
Note: the value line should then
contain minimum and maximum value
with a decimal, e.g., for Z I5.2,
the value line should be
V 0.00:99.99
Fx.y = Floating point with precision
as described for Implied decimals
N Unit type abbreviation as follows: [1]
ABS = Absolute number
AVG = Average
DOL = Dollars
MIN = Minutes
PCT = Percent
SQM = Square miles
TH$ = Thousands of dollars
RTE = Rate
V Value with description.
(limit 100 characters) Each value
line should have a V at the beginning,
but DO NOT put a V at the beginning
of a line if the description wraps to
the next line.
V 1 Male
V 2 Female
or for a continuous range variable,
the minimum and maximum values MUST be
separated by a colon:
V -1 Blank
V 0:99 Years
or for a continuous range with
decimals (e.g. Z I10.4):
V 0.0000:99999.9999
IF the data contains a blank,
the value should be defined exactly as:
V Blank
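The relationship between an implied decimal type (Z Ix.y) and its value line can be worked out in code. The Python sketch below is illustrative only; the function names are hypothetical. It assumes a non-negative range and reads "total length including the decimal" as counting the decimal point, which agrees with the I5.2 and I10.4 examples above. A second helper splits a V line into its code (or range) and label.

```python
import re


def implied_decimal_range(z_value):
    """For 'Ix.y', return the (minimum, maximum) strings for the V line."""
    match = re.fullmatch(r"I(\d+)\.(\d+)", z_value)
    if match is None:
        raise ValueError(f"not an implied decimal type: {z_value!r}")
    total, right = int(match.group(1)), int(match.group(2))
    left = total - right - 1  # the total length includes the decimal point
    if left < 1 or right < 1:
        raise ValueError(f"inconsistent lengths in {z_value!r}")
    return f"{0:.{right}f}", "9" * left + "." + "9" * right


def parse_value_line(line):
    """Parse 'V code label' or 'V min:max label' into a dictionary."""
    token, _, rest = line.partition(" ")
    if token != "V":
        raise ValueError("a value line must begin with V")
    code, _, label = rest.strip().partition(" ")
    if ":" in code:  # a continuous range separates minimum and maximum by a colon
        low, high = code.split(":", 1)
        return {"min": low, "max": high, "label": label.strip()}
    return {"code": code, "label": label.strip()}
```

For Z I5.2 the first helper returns the pair 0.00 and 99.99, matching the value line V 0.00:99.99 shown above.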
The following item is optional, but strongly recommended for items that
are not allocation flags or topcoded items:
:L:
Long description. There may be a multiple
line description. There must be a :L: on
the lines before and after the description.
:L:
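Reading a colon-delimited block such as the long description can be sketched in Python as follows. The function name read_long_description is hypothetical, the sample text in the usage note is made up for illustration, and joining the lines with single spaces is an assumption about how the description should be reassembled.

```python
def read_long_description(lines):
    """Return the text between the first pair of ':L:' lines, or None."""
    inside, collected = False, []
    for line in lines:
        if line.strip() == ":L:":
            if inside:  # the second :L: closes the description
                return " ".join(collected)
            inside = True
        elif inside:
            collected.append(line.strip())
    return None  # no complete :L: ... :L: block was found
```

Given the lines ":L:", "Age of the person", "at last birthday.", ":L:", the function returns the two description lines joined into one string.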
The following items are optional:
P CD-ROM or ascii file data start
and end positions, e.g. P 15 16
U Universe description
(Universe descriptions MUST follow
Long description for an item.)
:A: Attachment type
(e.g. Edit Specs, Recode Specs,
Instrument Specs, Sampling, User
Note, etc.) followed by the URL of
the text, beginning on the next
line, e.g. http://www.census.gov/
mydir/myfile.htm (Please note:
there is no :A: line after
the URL line.)
B Synonyms (multiple words should either
be listed separately or be comma
delimited).
(e.g. B men
B boy
B gender
B women
B girl
or
B men, boy, gender, women, girl)
I Iteration group size for longitudinal
data (i.e., variable repeats 12 times,
then 12 would be the group size).
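Two of the optional tokens are simple to read programmatically. The Python sketch below is illustrative only, with hypothetical function names: it accepts synonyms in either of the two B layouts shown above and reads the start and end positions from a P line.

```python
def collect_synonyms(lines):
    """Gather synonyms from B lines, whether listed separately or comma delimited."""
    synonyms = []
    for line in lines:
        token, _, value = line.partition(" ")
        if token == "B":
            synonyms.extend(w.strip() for w in value.split(",") if w.strip())
    return synonyms


def parse_positions(line):
    """Parse 'P start end' into a (start, end) pair of integers."""
    token, start, end = line.split()
    if token != "P":
        raise ValueError("a position line must begin with P")
    return int(start), int(end)
```

Both B layouts produce the same list, so downstream code does not need to know which one the file used.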
_____________________________
[1] When entering new variables, please place variables used
as Suggested Weight variables with their corresponding information at the
top of the file.
Contact: Bill Hazard, Census/DSD/SMPB (whazard@census.gov)
Last modified: March 29, 2005