Searching based on variable names, variable labels, value labels, and dataset names.
Searching across themes.
Keyword searches can be canceled. Variable definitions from different datasets can be compared.
Overviews of surveys are included so that users can compare the strengths and weaknesses of particular datasets at the point they are ready to look at data.
Support for extended variable documentation in the codebook, including question text, interviewer instructions, universe definitions, and links to extended definitions.
Information about comparability of files and variables across time is available.
Expert guidance informs users about which data file merges across time or topic are appropriate.
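As a rough illustration of searching several metadata fields at once, the sketch below runs a case-insensitive keyword match over dataset names, variable names, variable labels, and value labels. The `Variable` record layout and the sample catalog entries are hypothetical; DataFerrett's actual metadata schema and search behavior may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """Hypothetical, simplified metadata entry for one variable."""
    dataset: str
    name: str
    label: str
    value_labels: dict = field(default_factory=dict)


def search(catalog, keyword):
    """Return variables whose dataset name, variable name, variable label,
    or any value label contains the keyword (case-insensitive)."""
    kw = keyword.casefold()
    return [
        var for var in catalog
        if any(kw in text.casefold()
               for text in (var.dataset, var.name, var.label,
                            *var.value_labels.values()))
    ]


# Hypothetical catalog entries, for illustration only.
catalog = [
    Variable("Survey A, March", "HH_INCOME", "Total household income"),
    Variable("Survey A, March", "SEX", "Sex", {1: "Male", 2: "Female"}),
    Variable("Survey B, annual", "INC12M", "Household income, past 12 months"),
]
print([v.name for v in search(catalog, "income")])  # ['HH_INCOME', 'INC12M']
print([v.name for v in search(catalog, "female")])  # ['SEX']
```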
File Manipulation controlled by rules in DataFerrett Metadata.
Appropriate automatic merging and matching of the various hierarchical files (household, family, geography, person, and diary files) that make up a single dataset, based on rules in the DataFerrett metadata.
Appropriate automatic merging and matching of datasets across time, based on rules in the DataFerrett metadata.
Ability to merge data from different datasets in the spreadsheet based on shared aggregate values.
Automatic merging and matching of data from different topical modules, supplements, and basic and core data, based on rules in the DataFerrett metadata.
Metadata is separated into private and public metadata systems, making it easier for organizations to work with both confidential and public data.
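At its core, the hierarchical matching described above is a many-to-one join: each lower-level record (for example, a person) picks up the fields of the higher-level record (its household) that shares its identifier. A minimal sketch, assuming a hypothetical shared key named `hh_id`; in DataFerrett the match rules come from the metadata rather than being hard-coded like this.

```python
def merge_hierarchical(parents, children, key):
    """Many-to-one match: copy each parent record's fields onto its children.

    Children with no matching parent are kept without the parent fields
    (a left join from the lower level), so match failures stay visible.
    On a field-name clash, the child's value wins.
    """
    index = {}
    for parent in parents:
        if parent[key] in index:
            raise ValueError(f"duplicate parent key: {parent[key]!r}")
        index[parent[key]] = parent
    return [{**index.get(child[key], {}), **child} for child in children]


# Hypothetical household and person records sharing the key "hh_id".
households = [
    {"hh_id": 1, "state": "MD", "tenure": "Owned"},
    {"hh_id": 2, "state": "VA", "tenure": "Rented"},
]
persons = [
    {"hh_id": 1, "person": 1, "age": 41},
    {"hh_id": 1, "person": 2, "age": 39},
    {"hh_id": 2, "person": 1, "age": 27},
    {"hh_id": 3, "person": 1, "age": 55},  # no matching household
]
for row in merge_hierarchical(households, persons, "hh_id"):
    print(row)
```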
Data Manipulation
The ability to subset a dataset directly from the codebook.
Data manipulation is supported for cross-sectional microdata, longitudinal microdata, aggregate data, geographic point data, and time series.
Geocoding and mapping supported.
Forced selection of required variables for aggregate datasets was added.
Recoding:
Recoding continuous variables into custom ranges, e.g., income into income groups.
Recoding categories into other groups.
Recodes and formulas can be saved and shared among users as virtual variables.
Variable selections and recodes in the “data basket”, which reflect all the work done in a session, can be saved to ASCII files that can be emailed and used to define new sessions.
Recodes can be made of recodes.
A SAS-like “Data Step” was added, allowing recodes of multiple variables, if-then-else logic for creating dummy variables, and other dynamic variables.
Formulas and recodes created via Data Step or recoder can be stored in the metadata as “virtual variables”.
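The recoding features above can be sketched in a few lines: a range recode that maps a continuous value into labeled groups, a recode of that recode, and a Data Step style if-then-else dummy built from two variables, all registered as named formulas in the spirit of virtual variables. The income cutpoints, the variable names `income` and `persons`, and the `VIRTUAL` registry are hypothetical; this is not DataFerrett's Data Step syntax.

```python
from bisect import bisect_right

# Hypothetical income groups: each cutpoint is the lower bound of the next group.
CUTPOINTS = [25_000, 50_000, 100_000]
LABELS = ["Under $25,000", "$25,000 to $49,999",
          "$50,000 to $99,999", "$100,000 and over"]

# A recode of the recode above: collapse four income groups into three tiers.
TIERS = {
    "Under $25,000": "Lower",
    "$25,000 to $49,999": "Lower",
    "$50,000 to $99,999": "Middle",
    "$100,000 and over": "Upper",
}


def range_recode(value, cutpoints, labels):
    """Map a continuous value to the label of the range it falls in."""
    if len(labels) != len(cutpoints) + 1:
        raise ValueError("need exactly one more label than cutpoints")
    return labels[bisect_right(cutpoints, value)]


def low_income_large_hh(rec):
    """If-then-else dummy built from two variables (Data Step style)."""
    if rec["income"] < 25_000 and rec["persons"] >= 3:
        return 1
    else:
        return 0


# Hypothetical "virtual variables": named formulas evaluated on demand.
VIRTUAL = {
    "income_group": lambda r: range_recode(r["income"], CUTPOINTS, LABELS),
    "income_tier": lambda r: TIERS[range_recode(r["income"], CUTPOINTS, LABELS)],
    "low_income_large_hh": low_income_large_hh,
}


def derive(rec):
    """Return a copy of the record with every virtual variable added."""
    return {**rec, **{name: f(rec) for name, f in VIRTUAL.items()}}


print(derive({"income": 18_500, "persons": 4}))
```

Keeping recodes as named formulas rather than stored columns means a recode of a recode simply calls the earlier formula, which mirrors how the saved virtual variables are described above.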
Table Layout and Manipulations (Spreadsheet)
Drag and drop variables into complex tabulations.
Drag-and-drop manipulations can be “undone” in the spreadsheet.
Multiple variables can be highlighted and dragged into the spreadsheet at one time.
Variable nesting up to eight variables deep is supported.
Asymmetrical table layouts are supported, including the ability to hide rows and columns.
Users can define column spanners in the spreadsheet.
Spanners wrap appropriately in the spreadsheet.
Complex “Excel-like” spreadsheet formulas act on rows and columns, enabling ad hoc creation of aggregate variables. Shorthand summation of rows and columns in the spreadsheet closely emulates Excel.
Interpolated medians are calculated correctly, using either linear or Pareto interpolation.
Percentage totals can be created in the spreadsheet.
Complex table layouts (table shells) from previous months can be reused with the current month simply by changing the dataset month.
Table shells that include all the information from the data basket, plus all formulas, titles, fonts, and footnotes, can be saved, emailed, and applied to another appropriate dataset, facilitated by the rules stored in the DataFerrett metadata.
Table shells can be stored and run as a set in batch for intensive overnight or daytime background processing. Because table shells are laid out via the metadata, possibly before the data is collected, they can then be run in batch as the data comes in, as part of a data review process.
Table properties can be added to the spreadsheet, including titles, author, version, and miscellaneous notes.
Standard business graphing is supported.
For individual counts (not subtotals, at this time), users can view the underlying records that make up that cell.
Time series graphs for individual spreadsheet cells can be accessed with a single keystroke.
Time series data with more than two dimensions can be displayed.
Row/column definitions can be cleared from the spreadsheet.
Users can sort tables based on specific columns or rows.
Users can create a rank column/row based on values in another column/row.
Values in special tabulations can be rounded for disclosure avoidance, while AUTHORIZED users can view the actual numbers.
DataFerrett tables have special formatting capabilities, including displaying numbers in thousands (for the CPS).
The spreadsheet appropriately manipulates data and creates footnotes based on data suppression flags.
Simple if-then logic is available for row/column arithmetic, which enables users to create their own calculations based on tabulated values or to create flags for comparison cells that fail some threshold test.
Tabulated cells from datasets for different months can be displayed across time in a chart.
Appropriate weights are automatically used in tabulations if defined in the metadata.
Users can get counts using different weights, as well as unweighted counts.
Standard mapping supported for all Census geographies and geographic points.
Data from multiple datasets on different servers can be merged and displayed as multiple layers on one map.
From the spreadsheet, users can create a list of all records meeting user-specified criteria.
PDF creation is available from the spreadsheet, including support for footnotes, suppression flags, multiple fonts, alignment, sizing tables to be printed across different pages, etc.
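For the interpolated medians, the two standard grouped-data techniques are sketched below, assuming closed-ended intervals (and a positive lower bound for the Pareto case). Linear interpolation assumes cases are spread evenly within the interval containing the median: m = L + (N/2 - F) / f * (U - L), where L and U are the interval's bounds, F is the count below L, f is the count in the interval, and N is the total. Pareto interpolation instead fits a Pareto tail through P_L and P_U, the shares of cases at or above L and U: alpha = (ln P_L - ln P_U) / (ln U - ln L) and m = L * exp((ln P_L - ln 0.5) / alpha), which suits skewed distributions such as income. The `tabulate` helper shows how the same grouped counts can be weighted or unweighted. All names are hypothetical, and DataFerrett's own implementation may differ in details such as open-ended intervals.

```python
import math


def tabulate(records, var, bounds, weight=None):
    """Sum counts per interval [bounds[i], bounds[i + 1]).

    With weight=None every record counts as 1 (unweighted); otherwise the
    named weight variable is summed. Values outside the bounds are skipped.
    """
    counts = [0.0] * (len(bounds) - 1)
    for rec in records:
        for i in range(len(counts)):
            if bounds[i] <= rec[var] < bounds[i + 1]:
                counts[i] += rec[weight] if weight else 1
                break
    return counts


def interpolated_median(bounds, counts, method="linear"):
    """Median of grouped counts by linear or Pareto interpolation."""
    if len(bounds) != len(counts) + 1:
        raise ValueError("bounds must have exactly one more entry than counts")
    total = sum(counts)
    if total <= 0:
        raise ValueError("no cases to take a median of")
    half = total / 2
    below = 0.0  # cumulative count below the current interval
    for i, n in enumerate(counts):
        if n > 0 and below + n >= half:
            lo, hi = bounds[i], bounds[i + 1]
            if method == "linear":
                # Cases assumed evenly spread across [lo, hi).
                return lo + (half - below) / n * (hi - lo)
            if method == "pareto":
                p_lo = 1 - below / total        # share at or above lo
                p_hi = 1 - (below + n) / total  # share at or above hi
                if lo <= 0 or p_hi <= 0:
                    raise ValueError("Pareto needs lo > 0 and cases above hi")
                # Pareto tail through both points: share(x) ~ x ** -alpha.
                alpha = ((math.log(p_lo) - math.log(p_hi))
                         / (math.log(hi) - math.log(lo)))
                return lo * math.exp((math.log(p_lo) - math.log(0.5)) / alpha)
            raise ValueError(f"unknown method: {method!r}")
        below += n
    raise ValueError("median not found")  # not reached for non-negative counts


bounds = [0, 25_000, 50_000, 75_000, 100_000]
counts = [30, 25, 20, 25]
print(interpolated_median(bounds, counts))            # 45000.0
print(interpolated_median(bounds, counts, "pareto"))  # roughly 42,400, below linear

people = [{"income": 30_000, "wgt": 1_200.0}, {"income": 80_000, "wgt": 800.0}]
print(tabulate(people, "income", bounds))         # [0.0, 1.0, 0.0, 1.0]
print(tabulate(people, "income", bounds, "wgt"))  # [0.0, 1200.0, 0.0, 800.0]
```

The Pareto estimate comes out below the linear one because the fitted tail places more of the interval's cases near its lower bound, which is the usual shape of income distributions.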
Extractions and Cut & Paste
Data can be output into SAS, SPSS, and STATA formats.
Data can be output as tab-delimited files for Excel or Access, or as comma-delimited files for reading into other packages.
Extracted files can be compressed to minimize disk storage and download times.
Record counts for extract files are provided to the user.
Longitudinal files can be dynamically output as “person month” or “longitudinal records” on demand, no matter how the underlying data is stored.
Custom codebooks can be created for extracted variables.
Cutting or copying from the spreadsheet directly into Excel and Word is supported.
Printouts of a simple table or selected cells from a table are supported.
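Switching on demand between “person month” and “longitudinal records” output amounts to reshaping between long and wide form. A minimal sketch, assuming hypothetical field names (`pid`, `month`, `employed`) and a single time-varying variable; real longitudinal files carry many such variables.

```python
from collections import defaultdict


def to_longitudinal(person_months, id_key, month_key, value_key):
    """One record per person: {person id: {month: value}}.

    Months in which a person was not observed are simply absent.
    """
    wide = defaultdict(dict)
    for rec in person_months:
        wide[rec[id_key]][rec[month_key]] = rec[value_key]
    return dict(wide)


def to_person_month(longitudinal, id_key, month_key, value_key):
    """Expand longitudinal records back to one record per person-month."""
    return [
        {id_key: pid, month_key: month, value_key: value}
        for pid, months in longitudinal.items()
        for month, value in sorted(months.items())
    ]


# Hypothetical person-month records.
person_months = [
    {"pid": 101, "month": 1, "employed": 1},
    {"pid": 101, "month": 2, "employed": 0},
    {"pid": 102, "month": 1, "employed": 1},
]
longitudinal = to_longitudinal(person_months, "pid", "month", "employed")
print(longitudinal)  # {101: {1: 1, 2: 0}, 102: {1: 1}}

# The round trip reproduces the original person-month records.
assert to_person_month(longitudinal, "pid", "month", "employed") == person_months
```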
Processing Options
Individual requests can be run in batch.
Parallel processing is supported for large files (currently for the decennial census and ACS).
The DataFerrett front end automatically emails any errors, along with the session history, to a central bug reporting and recording system.
Work sessions in DataFerrett can be recorded and played back to facilitate training and bug fixing.
Databases and File Systems Supported
MySQL, Oracle, Sybase, Microsoft SQL Server and Access, DB2, PostgreSQL
SAS Internet
Microsoft Excel
ESRI shape files
Flat files
The DataFerrett Library interoperates with SPSS files from the Harvard Digital Library (VDC):
Metadata is shared via real time updates.
Metadata is translated between Harvard XML and DataFerrett metadata format.
Remote connections are enabled based on security rights.
Software updates are performed automatically in the background.