This page explains a method for evaluating the accuracy of a proposed Pinewood Derby race procedure, i.e., how closely the ranking produced by a competition procedure matches the objective ranking.

I am a Cub Scout leader. A few years ago, I was helping plan our Pinewood Derby. For the prior few years, that competition had been conducted using a process called Double Elimination, i.e., organized racing in which Scouts continue racing until they have accumulated 2 losses.

We needed to select the 4 fastest scouts from each grade to represent the pack at the District Pinewood Derby. Double Elimination only claims to do a pretty good job of picking the 2 fastest cars.

How bad was it going to be? And what effect should I expect if the lanes that the cars are assigned to race in are not identical? Since trophies were being awarded for 1st through 4th place, how likely was it that the trophies would be awarded correctly?

Select criteria for evaluation. For my purposes, I selected 4 cars and 4 trophies as important measures:

- 4-Trophy accuracy: What percentage of the place assignments (of the top 4 places) were correct? In other words, how accurately were the 4 trophies assigned?
- Top-4 accuracy: What percentage of the top 4 cars were assigned among the top 4 places? In other words, how accurately were the 4 fastest cars selected?

The evaluation method involves running computer simulations of a full chart many times. For each trial, the cars (of known relative speed) are randomly assigned to the grid. For simplicity, I assumed that car #1 was the fastest, car #2 was second fastest, etc. Then the car numbers were randomly mapped to the grid numbers of the racing procedure. The chart is run according to its rules, and the results are tallied into a matrix: Rows are "cars"; Columns are "places assigned by the method."

After a predetermined number of trials, the resulting matrix shows how well the competition procedure ranks the cars.
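The tallying loop described above can be sketched roughly as follows. This is a minimal illustration, not my original BASIC program: the names `run_trials` and `single_elimination` are hypothetical, and the stand-in procedure is a plain single-elimination bracket with equal lanes (NOT Double Elimination), included only so that the harness has something to run. It assumes the number of cars is a power of two.

```python
import random

def run_trials(procedure, n_cars, trials, seed=0):
    """Tally how often each car (row) is assigned each place (column).

    Cars are numbered 1..n_cars, car 1 being the fastest. `procedure` is a
    hypothetical callable: it takes the cars in grid order and returns the
    same cars in finishing order (1st place first).
    """
    rng = random.Random(seed)
    # tally[car - 1][place - 1] counts the trials in which `car` got `place`.
    tally = [[0] * n_cars for _ in range(n_cars)]
    for _ in range(trials):
        grid = list(range(1, n_cars + 1))
        rng.shuffle(grid)  # random mapping of car numbers to grid numbers
        finish = procedure(grid)
        for place, car in enumerate(finish, start=1):
            tally[car - 1][place - 1] += 1
    return tally

def single_elimination(grid):
    """Stand-in procedure (an assumption, not Double Elimination).

    Adjacent cars race in pairs on equal lanes, winners advance, and losers
    of later rounds place ahead of losers of earlier rounds.
    """
    alive = list(grid)
    placed = []
    while len(alive) > 1:
        winners, losers = [], []
        for a, b in zip(alive[0::2], alive[1::2]):
            winners.append(min(a, b))  # lower number = faster car
            losers.append(max(a, b))
        placed = losers + placed  # this round's losers outrank earlier losers
        alive = winners
    return alive + placed

tally = run_trials(single_elimination, n_cars=8, trials=200)
```

Each row and each column of `tally` sums to the number of trials, which is a handy sanity check on any procedure plugged into the harness.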

Let *Trials* be the number of trials, and

*Tally (car, place)* be the matrix in which the trial results are tallied.

Then

4-trophy accuracy = 100% * (Tally(1,1) + Tally(2,2) + Tally(3,3) + Tally(4,4)) / (4 * Trials)

Top-4 accuracy = 100% * (Tally(1,1) + Tally(1,2) + Tally(1,3) + Tally(1,4) + Tally(2,1) + Tally(2,2) + Tally(2,3) + Tally(2,4) + Tally(3,1) + Tally(3,2) + Tally(3,3) + Tally(3,4) + Tally(4,1) + Tally(4,2) + Tally(4,3) + Tally(4,4)) / (4 * Trials)

N-Trophy accuracy considers the tallies along the diagonal from (1,1) through (N,N), where N is the number of places of interest.

Top-N accuracy considers all of the tallies in the NxN sub-matrix in the upper left hand corner of the Tally matrix.
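The two measures can be computed from the Tally matrix along these lines. This is a small sketch; the function names are mine, and the matrix is assumed to be a list of rows indexed from 0, so that `tally[0][0]` corresponds to Tally(1,1).

```python
def n_trophy_accuracy(tally, n, trials):
    """Percent of the top-n places that went to exactly the right car."""
    diagonal = sum(tally[i][i] for i in range(n))
    return 100.0 * diagonal / (n * trials)

def top_n_accuracy(tally, n, trials):
    """Percent of the n fastest cars that were placed somewhere in the top n."""
    block = sum(tally[car][place] for car in range(n) for place in range(n))
    return 100.0 * block / (n * trials)

# Tiny made-up example: 3 cars, 2 trials.
# Trial 1 finished 1-2-3; trial 2 finished 2-1-3.
example = [
    [1, 1, 0],  # car 1: once 1st, once 2nd
    [1, 1, 0],  # car 2: once 1st, once 2nd
    [0, 0, 2],  # car 3: always 3rd
]
two_trophy = n_trophy_accuracy(example, n=2, trials=2)
top_two = top_n_accuracy(example, n=2, trials=2)
```

In the made-up example the two fastest cars always finish in the top two, but only half of the 1st and 2nd place trophies go to the right car, which is exactly the difference between the two criteria.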

Simplifying assumption: To incorporate lane inequity into a set of trials, the cars' speeds are assumed to be uniformly distributed. A lane inequity of K means that any car running on the unequal lane will be beaten by any of the faster cars and by any of the next K slower cars. K = 0 indicates "equal lanes."
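For a single heat, the lane inequity rule might be coded as below. This is a sketch under an added assumption: a two-lane track on which exactly one lane is the unequal (slow) one. The function name `heat_winner` is hypothetical.

```python
def heat_winner(car_in_slow_lane, car_in_good_lane, k):
    """Return the winning car number for a two-lane heat.

    Car numbers double as speed ranks (1 = fastest). The car in the slow
    lane loses to every faster car and to the next k slower cars, so it
    wins only when its opponent is more than k ranks slower.
    """
    if car_in_good_lane > car_in_slow_lane + k:
        return car_in_slow_lane
    return car_in_good_lane
```

With K = 2, car 3 in the slow lane loses to car 5 but still beats car 6. With K = 0 the faster car always wins, whichever lane it runs in.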

For example, when evaluating a 16-car double elimination chart, I chose 1000 "Years" of den racing as the number of trials and lane inequity values of 0, 1, 2, 3, and 4. (5 total conditions resulting in 5000 total trials!) Each lane inequity "condition" produced a 4-trophy accuracy percentage and a top-4 accuracy percentage.

Charting a "method profile"... one chart for each criterion: Y-axis = % accuracy; X-axis = K lane inequity index. Plot and connect the points... label the line!

The method uses "simulation." Because of this, there are possibilities for error. Some of the possible errors include

- incorrect modeling of the competition
- insufficient duration of the simulation
- computational error
- bias in the randomizing techniques
- over-simplification in the model

Therefore the results of these simulations should not be interpreted as "facts", but, rather, as "best available information." Independently obtained results are needed before I could expect general acceptance.

constructed a very nice piece of software for chart simulation. It is available in the "Software" section of The results are more or less consistent with what I produced, but his program is much more sophisticated than what I built for my C-64. The results are not directly comparable because, with his mathematics and programming prowess, his more powerful computer and his better programming languages, he was able to bridge some of my simplifying assumptions.

In consultation with Cory, I have performed some analysis of Stearns Method, alone and in conjunction with other methods, that shows how to preserve the benefits of Stearns and to gain accuracy that single methods alone cannot achieve in a reasonable time span. See CASE STUDY.

1. Computers are very handy when doing stuff like this!

2. See, that PC is really good for something after all!

3. You don't need a 200 MHz Pentium... I evaluated a 128-car D.E. chart in 4 conditions for 500 trials each using a program written in BASIC and running on a (0.8 MHz) Commodore C-64. It required about 1 day of "number crunching" per condition, if I recall correctly.

Most of this work was done a number of years ago, about 1989. Now that I have a "real computer", a "real programming language", and a few more methods to evaluate, I will probably revisit the process.

However, it would be helpful if others could independently arrive at evaluations which, when taken together, could lend credence to the results. If you undertake such evaluations, I would be pleased to catalog the results, along with the specification of the method(s) being evaluated, on or under this page.

Latest update: 12/30/97

Copyright 1997 © by Stan Pope. All rights reserved.