TheDataWeb Publisher – Overview


TheDataWeb Publisher is a tool that helps data providers put their data on TheDataWeb (and, automatically, in DataFerrett).  You can use it in several ways.  You can describe a data file (define its metadata), publish that metadata to TheDataWeb, and load the data file into a MySQL database.  You can define and publish the metadata for a data file that you already have loaded in another database.  Or you can simply publish a metadata file that you created manually.  You will also be able to use TheDataWeb Publisher to interactively modify metadata that you’ve already loaded.

Metadata is the information that defines a dataset and the variables, or items, found within that dataset.  This includes the name of the Data Collection, the name of each dataset within that collection, the time period for the dataset, and the name, description, and values of each item in the dataset, as well as other information.

Describing and Loading a Data File

If you have an ASCII data file, either tab- or comma-delimited, you can use that file as the basis for defining the variables it contains, and then load it into a MySQL database.  TheDataWeb Publisher will lead you through the process of defining all the metadata needed to publish the dataset to TheDataWeb.  It will also step you through loading the data into the MySQL database.
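
To get a feel for what a delimited file offers as a starting point for variable definitions, you can preview it yourself before opening the Publisher.  The following Python sketch is only an illustration and is not how TheDataWeb Publisher itself works.  It assumes that the first row of the file holds the variable names (the Publisher may not require this), and the function names and the coarse type labels ("integer", "decimal", "text") are made up for this example.

```python
import csv
import io


def detect_delimiter(header_line):
    """Return tab if the header has more tabs than commas, otherwise comma."""
    return "\t" if header_line.count("\t") > header_line.count(",") else ","


def guess_type(values):
    """Guess a coarse type label from a column's non-empty string values."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "text"
    try:
        for v in non_empty:
            int(v)
        return "integer"
    except ValueError:
        pass
    try:
        for v in non_empty:
            float(v)
        return "decimal"
    except ValueError:
        return "text"


def summarize_columns(text):
    """Return a list of (name, guessed_type) pairs, one per column.

    Assumption: the first row contains the variable names.
    """
    lines = text.splitlines()
    delimiter = detect_delimiter(lines[0])
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    summary = []
    for i, name in enumerate(header):
        column = [row[i] for row in data if i < len(row)]
        summary.append((name.strip(), guess_type(column)))
    return summary


# Small made-up, tab-delimited sample.
sample = "AGE\tSTATE\tINCOME\n34\tMD\t52000.50\n41\tVA\t61000\n"
for name, kind in summarize_columns(sample):
    print(name, kind)
```

A preview like this gives you only names and rough types.  Descriptions, value labels, and the other metadata still have to be supplied when you define the dataset.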

Describing a Data File

You may have data that you want to keep in its existing format.  For example, TheDataWeb can tabulate data that is in Sybase, MySQL, Oracle, SAS, and other formats.  If you are keeping the data in its current format, you still need to describe the dataset and publish that metadata to TheDataWeb before the dataset can be used in DataFerrett and other DataWeb components.  TheDataWeb Publisher will lead you through the process of defining all the metadata needed to publish the dataset to TheDataWeb.

Publishing a Manually Created Metadata File

You can create a Metadata Interface File (MIF) manually if you prefer.  The MIF is an ASCII file that is used to populate the DataFerrett metadata database.  Each piece of metadata is denoted by a two- or three-character delimiter (or token).  If you have a dataset that contains a very large number of variables, it may be easier to write a parser or use some other automated process to transform existing documentation into the MIF format.  Detailed information about the MIF is available to help you decide whether this is how you want to define your metadata.
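
If you go the automated route, the general shape of such a converter might look like the Python sketch below.  It reads a small codebook in CSV form and writes one token-prefixed line per piece of metadata.  Treat everything specific in it as an assumption: the tokens "T1" and "T2" are placeholders and are not real MIF tokens, the "token, space, value" line layout is a guess, and the codebook column names are invented for the example.  Take the actual tokens and layout from the MIF documentation.

```python
import csv
import io

# HYPOTHETICAL tokens: placeholders only, not real MIF tokens.
# Replace them with the delimiters given in the MIF documentation.
PLACEHOLDER_TOKENS = {"name": "T1", "description": "T2"}


def codebook_to_token_lines(codebook_csv, tokens):
    """Turn codebook rows into token-prefixed lines.

    Assumed output layout (not taken from the MIF specification):
    the token, a single space, then the value.
    """
    reader = csv.DictReader(io.StringIO(codebook_csv))
    lines = []
    for row in reader:
        for field, token in tokens.items():
            value = (row.get(field) or "").strip()
            if value:  # skip metadata fields the codebook leaves empty
                lines.append(f"{token} {value}")
    return lines


# Made-up codebook with one row per variable.
codebook = "name,description\nAGE,Age in years\nSTATE,State of residence\n"
for line in codebook_to_token_lines(codebook, PLACEHOLDER_TOKENS):
    print(line)
```

Keeping the tokens in a single mapping means that only that mapping has to change once you have the real MIF token list.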

Using an Existing Project File

If you have already defined a dataset or begun defining a dataset, you can open that project and make changes or finish what you started.

Skipping the Wizards

If you do not want to go through the wizards, you can skip directly to the part of the system where metadata is defined.


Email: dsd_ferrett@census.gov