r3 - 09 Oct 2007 - 14:49:45 - BillHazardYou are here: Top >  DataFerrett Web  >  OnlineTraining > AdvancedTopics > AdvExample4

Example 4 - Spreadsheet functions - Ranking, weighted ranking, order of formula processing.

Other features illustrated: Summing calculation, hiding columns, creating column spanners, viewing hidden columns, IF conditions

Background: This example shows how you can use ranking and apply weights to those ranks in order to answer questions like "What is the best county for ...?"

Goals: 1) Create a table that will rank the best county in the US in which to live, based on our 3 criteria - commute time, median household income, and median housing value. 2) Limit our rankings to counties that meet our population size criteria - between 50,000 and 150,000 people.

Steps:
Goal 1

  1. Open the Ferrett Session File (or saved data basket) named sf3_bestcounty_tolive.fsf .
  2. Go to the Step 2 tab to see the variables in the data basket - number of workers not working at home, aggregate travel time to work, median household income, median value of owner-occupied housing units, and county (all counties in US).
  3. "Make a Table"
  4. Turn off the "Automatic Totals" (in the "Options" menu) and drag the county variable to column 1 to define the rows.
  5. Drag the aggregate travel time variable, P033001, to column 2 - C2, R1.
  6. Drag the workers variable, P031002, to column 3 - C3, R1.
  7. Skip 2 columns and drag the median household income variable, P053001, to column 6 - C6, R1.
  8. Skip 2 columns and drag the median housing unit value variable, P085001, to column 9 - C9, R1.
  9. "Go Get Data"
Create calculations and rankings.
  1. Calculate the average commute time in column 4.
    • Click in the C4 column header to select the column in which to create the calculation.
    • In the formula bar, enter =comp(c2/c3) == aggregate travel time divided by workers not working at home equals average travel time.
  2. Create the county ranking for commute time to work. The lower the commute time, the higher the rank number.
    • Click the C5 column header to select the column in which to create the ranking.
    • In the formula bar, enter =rank(c4)
  3. Create the county ranking for median household income. In this case we want the higher value to have the higher rank number. Therefore we will need to create an inverse income column on which to create the ranking.
    • Click the C7 column header to select the column in which to create the calculation for inverting the income.
    • In the formula bar, enter =comp(0-c6)
    • Click the C8 column header to select the column in which to create the ranking.
    • In the formula bar, enter =rank(c7)
  4. Create the county ranking for median housing unit value. The lower the median value, the higher the rank number. If for some reason you were to want to rank the counties higher based on HIGH housing values, then you would need to create an inverted housing value column like we did for income.
    • Click the C10 column header to select the column in which to create the ranking.
    • In the formula bar, enter =rank(c9)
Hide Columns
Since several of our columns were used to create our desired columns, we can hide them. In our example, we do not need to see the columns used to calculate the commute time or the inverted income column used for ranking income.
  1. Click the C2 column header to select the column you wish to hide, then select the "Edit" menu, then select "Hide".
  2. Repeat for columns 3 and 7. Currently you can only highlight one column at a time.
  3. If you ever want to see the hidden columns, select the "View" menu, then select "Hidden Rows/Columns".
  4. If you ever want to change a hidden column back to not being hidden (unhide), first view the hidden columns as in the previous step, then highlight the column, then select the "Edit" menu and select "Unhide". The "Unhide" selection allows you to unhide only the selected column, or unhide all hidden columns.
Create Custom Column Spanners
You can create your own custom column spanners to group columns together. In our example, we can create spanners to group the three different measures and their rankings together.
  1. Select the "Edit" menu, then select "Column Spanners". This opens the "Column Spanner Editing Dialog" box. There are three fields for you to supply - 1) the text for the spanner, 2) the beginning column, and 3) the ending column. See the image.
    Screenshot of column spanner creation:
    column_spanner_creation.JPG
  2. Hit the "Insert" button at the bottom to enter the first spanner.
  3. Click in the empty "Column Spanner" field and type in your first spanner - Commuting - then hit the "Enter" key.
  4. Click in the empty "From" column and enter the column number the spanner should start with - 2 - then hit the "Enter" key.
  5. Click in the empty "To" column and enter the column number where the spanner should end - 5 - then hit the "Enter" key.
  6. Hit the "Insert" button at the bottom to enter another spanner.
  7. Enter the information as show in the image above.
  8. Once the three spanners have been entered, hit the "OK" button.
Create the Combined Ranking
Now we want to sum all three rankings together to find which counties rank the best based on the three criteria. The summed rankings will show us which counties have the shortest commute times, the highest median income, and the lowest median housing values.
  1. Click in the C11 column header to select the column in which to create the summed ranking.
  2. In the formula bar, enter =sum(c5,c8,c10) == this sums the rankings together.
  3. A small dialog box will appear showing the computations being done and the last one shows a conflict. See the image.
    Screenshot of calculation sequence conflict dialog:
    computation_sequence_conflict.JPG
    The computations must happen in order, or they cannot be completed properly. In this case, all previous ranking functions must be completed prior to our summing the ranks together, therefore we want to change the conflicting computation's sequence.
  4. Click in the "Sequence" cell of the "Conflict" row and change the 0 to a 2 and hit the "Enter" key. This will make this computation occur last.
  5. Hit the "OK" button.
    To see which county ranks the highest or lowest, highlight column 11 and use the sort buttons in the toolbar.
Adding Weights to the Ranking Criteria
We can customize our final ranking by weighting the criteria that are of more importance to us. For example, if the commute time is the most important criterion to us, we can weight it higher than the others. The weights must add up to 1. We will weight commute time as 0.5, and the other two as 0.25 each.
  1. Unsort the table by hitting the "Return to original order" button next to the sort buttons on the toolbar.
  2. Click in the C12 column header to select the column in which to create the weighted ranking.
  3. In the formula bar, enter =comp((c5*0.5)+(c8*0.25)+(c10*0.25)) == this weights the ranks and sums them together.
  4. We notice that we get all 0s in our table. We have a sequence conflict again, but we were not notified the this time. We can change the sequence manually.
  5. Go to the "Options" menu and select the "Specify Sequence" option. This will bring up the dialog we saw before. Now, we want to change the last two calculations to have a sequence of 2, making sure to hit the "Enter" key after each change. See the image.
    Screenshot of calculation sequence specification dialog:
    computation_sequence_conflict_fix2.JPG
Once the sequence is changed, the weighted rankings appear. You can easily change these weighted rankings by changing the weights in the formula for column 12. To edit a formula, simply highlight the column and the formula should appear in the formula bar. Edit the formula and hit the "Enter" key and the changes will be applied.

You can sort the table to find the best and worst counties based on your weighted ranks.

Goal 2
If we only want to see the best counties of a certain size, we can include the population variable and then use an IF condition in a formula to remove counties from the top of the list.

  1. DO NOT CLOSE your spreadsheet window.
  2. Go back to the Step 1 tab and find the P1. Total Population table in Summary File 3.
  3. Get the variable, and add the P001001 variable to your data basket.
  4. Now go back to the spreadsheet window and you will see the P001001 variable in your variable list.
  5. Drag P001001 into column 13. Hit the "Go Get Data" button.
  6. Click in the C14 column header to select the column in which to create the weighted ranking for counties of a certain size.
  7. In the formula bar, enter =IF(((C13>50000)AND(C13<250001)),C12,3500) == this condition will use the weighted rank (C12) if the county population is between 50,001 and 250,000 and if the population is smaller or greater than that, it will set the rank to 3500 (an arbitrary number higher than the highest county rank).
  8. We notice that we get all 0s and 3500s in our table. We have a sequence conflict again, and again we were not notified. We must change the sequence manually.
  9. Go to the "Options" menu and select the "Specify Sequence" option. This will bring up the dialog we saw before. Now, we want to change the last three calculations - two to have a sequence of 2, and the last one to a sequence of 3, making sure to hit the "Enter" key after each change. See the image.
    Screenshot of calculation sequence specification dialog:
    computation_sequence_conflict_fix3.JPG
    Once the sequence is changed, we now can sort the list and see the best counties based on the weighted criteria we used and limiting the counties to the size we specified.

-- BillHazard - 04 Oct 2007

toggleopenShow attachmentstogglecloseHide attachments
Topic attachments
I Attachment Action Size Date Who Comment
jpgJPG column_spanner_creation.JPG manage 14.6 K 04 Oct 2007 - 19:34 BillHazard Screenshot of column spanner creation
jpgJPG computation_sequence_conflict.JPG manage 28.2 K 04 Oct 2007 - 19:35 BillHazard Screenshot of calculation sequence conflict dialog
jpgJPG computation_sequence_conflict_fix2.JPG manage 29.5 K 04 Oct 2007 - 19:35 BillHazard Screenshot of calculation sequence specification dialog
jpgJPG computation_sequence_conflict_fix3.JPG manage 30.7 K 04 Oct 2007 - 19:36 BillHazard Screenshot of calculation sequence specification dialog
elsefsf sf3_bestcounty_tolive.fsf manage 568.0 K 04 Oct 2007 - 19:39 BillHazard Ferrett data basket for creating the rankings of best county to live
elseftf sf3_bestcounty_tolive_with_if.ftf manage 1139.8 K 09 Oct 2007 - 14:49 BillHazard Ferrett table definition for the final table created. Use this for comparison to what you create.
Edit | Attach | Printable | Raw View | Backlinks: Web, All Webs | History: r3 < r2 < r1 | More topic actions
 
TheDataWeb Main Site
This site is powered by the Top collaboration platformCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Top? Send feedback