January 23, 2017

The Pitfalls of Averages of Averages

As soon as I started exploring the data for Makeover Monday week 4, I had a suspicion that people wouldn’t pick up on a few things:

  1. There was a region named “Total (all TLAs)” that represented the total but was mixed in with all of the other regions.
  2. The data is an index, calculated as a weighted number for each region, month, and visitor type.
  3. When working with the indexes in this dataset, an average aggregation is appropriate only at the individual region, month, and visitor type level. You can’t take an average of those averages to represent the total.

I saw a few people make the mistake of using an average of the average very early on (I won’t shame them publicly), so I thought it would be appropriate to explain averages of averages, why they don’t calculate “accurately”, and how people should be using them.
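
To make the problem concrete, here is a minimal sketch in Python. The two regions and their dollar figures are made up for illustration and are not taken from the Makeover Monday dataset. I'm also assuming, for simplicity, that each region's index is 100 × (current spend / base-period spend); the real index may be weighted differently.

```python
# Hypothetical figures, invented purely for illustration.
regions = {
    "Region A": {"base_spend": 1000.0, "current_spend": 1100.0},
    "Region B": {"base_spend": 100.0, "current_spend": 200.0},
}


def index(current, base):
    """Assumed index definition: 100 * (current spend / base-period spend)."""
    return 100.0 * current / base


# Index for each individual region (fine to use at this level).
region_indexes = {
    name: index(r["current_spend"], r["base_spend"])
    for name, r in regions.items()
}

# The mistake: a plain average of the regional indexes treats a small
# region exactly like a large one.
naive_total = sum(region_indexes.values()) / len(region_indexes)

# The total index: recompute from the summed dollars, which weights
# each region by its actual size.
total_base = sum(r["base_spend"] for r in regions.values())
total_current = sum(r["current_spend"] for r in regions.values())
true_total = index(total_current, total_base)

print(region_indexes)
print(f"Average of the regional indexes: {naive_total:.1f}")
print(f"Index computed from totals:      {true_total:.1f}")
```

With these made-up numbers the small region doubles its spending and drags the simple average up to 155, while the index computed from the combined dollars is only about 118. When the underlying dollars aren't available, the safer choice is a total row that the data source has already calculated, rather than averaging the regional indexes yourself.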

1 comment:

  1. It would be more accurate to talk about the average of a ratio instead of the average of an average. The index is calculated as a ratio, not an average. It's simply 100 * (dollar spending in year X / dollar spending in 2008). It's not correct to take an average of a ratio, but it's a bit misleading to talk about averages of averages in this case.