TheDataWeb Publisher – Overview
TheDataWeb Publisher is a tool that helps data providers put their data
on TheDataWeb (and, automatically, in DataFerrett). You can use it in
several ways. You can describe a data file (define its metadata), publish
that metadata to TheDataWeb, and load the data file into a MySQL database.
You can define and publish the metadata for a data file that is already
loaded in another database. Or you can simply publish a metadata file that
you created manually. You will also be able to use TheDataWeb Publisher
to interactively modify metadata that you have already loaded.
Metadata is the information that defines a dataset and the variables, or
items, found within that dataset. This includes the name of the Data
Collection, the name of each dataset within that collection, the time
period for the dataset, and the name, description, and values of each item
in the dataset, as well as other information.
Describing and Loading a Data File
If you have an ASCII data file that is either tab- or comma-delimited, you
can use that file as the basis for defining the variables it contains and
then load it into a MySQL database. TheDataWeb Publisher will lead you
through the process of defining all the metadata needed to publish the
dataset to TheDataWeb. It will also step you through loading the data
into the MySQL database.
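Before starting, it can help to check how your file will be read. The
following Python sketch is not part of TheDataWeb Publisher; it is an
illustration that assumes the first line of the file holds the column
names. It detects whether the file is tab- or comma-delimited and guesses
a type for each column. The function names (guess_type, draft_variables)
and the type labels are hypothetical.

```python
import csv
import io


def guess_type(values):
    """Return 'integer', 'decimal', or 'text' for a column of strings."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "text"
    # Try the strictest numeric type first, then fall back.
    for label, cast in (("integer", int), ("decimal", float)):
        try:
            for v in non_empty:
                cast(v)
            return label
        except ValueError:
            continue
    return "text"


def draft_variables(text):
    """List each column of a tab- or comma-delimited file with a guessed type.

    Assumption: the first line is a header row of column names.
    """
    first_line = text.splitlines()[0]
    delimiter = "\t" if "\t" in first_line else ","
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    return [
        {"name": name, "type": guess_type([r[i] for r in data if i < len(r)])}
        for i, name in enumerate(header)
    ]
```

Calling draft_variables on the contents of a small sample of your file
gives a rough list of variables to compare against what you define in the
wizard.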
Describing a Data File
You may have data that you want to keep in its existing format. For
example, TheDataWeb can tabulate data that is in Sybase, MySQL, Oracle,
SAS, and other formats. If you keep the data in its current format, you
still need to describe the dataset and publish that metadata to TheDataWeb
so that the dataset can be used in DataFerrett and other
DataWeb components. TheDataWeb Publisher will lead you through
the process of defining all the metadata needed to publish the dataset to
TheDataWeb.
Publishing a Manually Created Metadata File
You can create a Metadata Interface File (MIF) manually if you prefer. The
MIF is an ASCII file that is used to populate the DataFerrett metadata database.
Each piece of metadata is denoted by a two- or three-character delimiter
(or token). If you have a dataset that contains a very large number of
variables, it may be easier to write a parser or use some other automated
process to transform existing documentation into the MIF format. Review
the detailed information about the MIF to decide whether this is how you
want to define your metadata.
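As a rough illustration of such an automated process, the Python sketch
below turns a codebook kept as a CSV file into token-prefixed lines. It
rests on assumptions: the tokens T1, T2, and T3 are placeholders and not
real MIF tokens, the one-token-per-line layout is assumed, and the
codebook column names (name, description, values) are hypothetical. Take
the actual tokens and layout from the MIF documentation.

```python
import csv
import io

# PLACEHOLDER tokens only: these are NOT real MIF tokens.
# Replace them with the tokens given in the MIF documentation.
PLACEHOLDER_TOKENS = {"name": "T1", "description": "T2", "values": "T3"}


def codebook_to_tagged_lines(codebook_csv, tokens=PLACEHOLDER_TOKENS):
    """Turn a CSV codebook into token-prefixed lines, one field per line.

    Assumptions: the codebook has a header row whose column names match
    the keys of `tokens`, and each output line is "<token> <value>".
    """
    lines = []
    for row in csv.DictReader(io.StringIO(codebook_csv)):
        for field, token in tokens.items():
            value = (row.get(field) or "").strip()
            if value:  # skip fields the codebook leaves blank
                lines.append(f"{token} {value}")
    return lines
```

Passing the token mapping as a parameter keeps the conversion logic
separate from the token names, so only the mapping needs to change once
you have the real tokens.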
Using an Existing Project File
If you have already defined a dataset or begun defining a dataset, you can
open that project and make changes or finish what you started.
Skipping the Wizards
If you do not want to go through the wizards, you can skip directly to the
part of the system where you define metadata.
Email:
dsd_ferrett@census.gov