Example 1 - Data Integration: Creating a Single Table using Mutiple Datasets

Other features illustrated: Aggregate data multi-variable recode

Background: Different datasets may have variables whose value sets are defined alike(comparable). Examples include demographic variables (e.g. gender, race, marital status) and geographic variables (e.g. FIPS state, FIPS county). Co-tabulation allows you to tabulate information from the different datsets side by side using the comparable variable as a hinge (one of the dimensions).

Goals: 1) Estimate the number of people per gas station in each county in Maryland. 2) Add the number of households without vehicles to the table.

Steps:
Goal 1

  1. Select first dataset - County Business Patterns, County Level, 2003.
  2. Select variables - Selectable Geographies (all counties in MD), Establishments (est), and industry (naics=447///).
  3. Select second dataset - Decennial Census, Summary File 3, 2000.
  4. Select the variable from Summary File 3 - P1. Total Popluation (P001001).
  5. Go to Step 2 tab and press the "Make a Table" button. This brings up the DataFerrett spreadsheet window.
  6. Click and drag the first "MERGED-200 Three digit FIPS County Number" variable into C1,R1 to define the counties in the rows. This is the comparable dimension and all other variables will be dropped in the columns.
  7. Click and drag the "est" variable into C2,R1.
  8. Click and drag the "P001001" variable into C3,R1.
  9. Create a formula calculating people per gas station in column 4 (C4) Screenshot of co-tabulation table and a formula:
    merge_table_and_formula.JPG
  10. Hit the green "Go Get Data" button in the toolbar.

Goal 2
At this point, we can explore how many households have no vehicle within each county.

  1. DO NOT CLOSE the spreadsheet window, but go to your main >DataFerrett window using your Windows taskbar.
  2. Go to the Step 1 tab so that you can add an additional variable from SF3.
  3. Scroll down the dataset list under SF3 until you find table "H44. Tenure by vehicles available", and view all of the variables in this table.
  4. Select the "D_H044_1 - Total Housing Units: No Vehicle Available" variable. This is a pre-defined recode that sums together the H044003 and H044010 variables.
  5. Now go back to the tabulation window from your taskbar.
  6. The added variable is there, now drag it into column 5 (C5,R1) and hit the "Go Get Data" button.
  7. If this summed variable had not already been defined, we could have created it ourselves.
  8. Now go back to the tabulation window from your taskbar.
  9. The newly created variable is there, now drag it into column 6 (C6,R1) and hit the "Go Get Data" button.
    Note: The numbers in columns 6 and 5 should be identical.