Data Viz Done Right

August 10, 2012

Displaying time-series data: Stacked bars, area charts or lines…you decide!


Matt Stiles of The Daily Viz presented this chart to “see the trend in this quick column chart” in one of his recent blog posts.

First, let me say that this is a tremendous improvement over those produced by the U.S. Bureau of Alcohol, Tobacco, Firearms and Explosives (a.k.a. the ATF).  Don’t bother reading the ATF report, unless you love 3D bar charts and 3D pie charts created in Excel.

A stacked bar chart is basically a pie chart unrolled to make a stick.  And more often than not, when plotted as a time series, it does a poor job of showing the overall trends.  Stacked bars are good up to three bars, no more.  Why? Because it’s difficult to compare the heights of any of the segments except for the bottom one, rifles in this case.

Let’s go through several alternative displays.  If you’re interested in playing with the data, Matt published it here for me.  Thank you Matt!

All of the charts below were built with Tableau.  You can view an interactive version of all of these charts here and download the workbook here.

Let’s start with a redesigned stacked bar chart that uses Tableau’s built-in color blind palette.

image

Can you see the trends for each of the weapons?  Maybe an area chart would be better.

image

Well, ok.  Now the trends are easier to see, right?  Area charts certainly improve the ability to see trends over time, but there are only two trends that give an accurate reading:

  1. The line at the top of the bottom area, i.e., rifles.
  2. The top of the top area, which represents the total.

We still don’t have the ability to see the trends for any weapon except for rifles. 

Before you read on, take out a piece of paper and sketch what you think the trend is for shotguns (light blue) based on the area chart above.

Ok.  Now let’s compare the area chart above with the area chart for shotguns.

image

Did you come close?  I doubt you did.  Why?  Because the top of each color is pushed up or down by the sizes of the colors below it, which makes gauging the true size of each individual color extremely difficult.
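To make the arithmetic behind that concrete, here is a minimal Python sketch. The charts in this post were built in Tableau, so this is not how they were produced, and the numbers below are made up purely for illustration (they are not the ATF data). It shows that the top edge of a stacked layer is a sum of the layers beneath it, so it can rise even while the layer itself is shrinking.

```python
# Hypothetical production counts in thousands (illustrative only, not the ATF data).
years = [2001, 2002, 2003, 2004]
rifles = [100, 150, 220, 300]   # bottom layer, rising
shotguns = [90, 80, 70, 60]     # second layer, falling

# In a stacked area chart the shotgun band is drawn between `rifles`
# and `rifles + shotguns`, so the edge your eye follows is a sum.
shotgun_top_edge = [r + s for r, s in zip(rifles, shotguns)]


def direction(series):
    """Return 'up' or 'down' by comparing the last value with the first."""
    return "up" if series[-1] > series[0] else "down"


print("shotguns alone:  ", direction(shotguns))
print("shotgun top edge:", direction(shotgun_top_edge))
```

With these made-up values the shotgun series falls while the edge drawn for it in the stack climbs, which is exactly why sketching the shotgun trend from the stacked chart is so hard.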

Here’s another way to prove it.  I know this isn’t a good way to represent the data, but bear with me, I’m trying to prove a point.  If I overlay lines for each weapon over the area chart, look how different the shapes of the lines become.

image

As with most time-series data, the best way to represent this data is nearly always going to be a line chart.

image

Using a line chart we can quickly make some observations:

  1. Pistol production spiked for three years in the early 90s, and there’s been a similar, but longer, surge since 2006.  What was the cause of the big decline in 1995?  Was there a change in handgun laws in 2005 or 2006?
  2. Revolvers were on a steady 20-year decline until 2005-2006.  Is the timing merely a coincidence with the pistol surge?  Possibly so, possibly not.
  3. Rifles have increased recently, but shotguns have decreased.  The gap between the two has widened steadily since 1994.  Are people buying rifles instead of shotguns?

Using a line chart, you’re immediately asking questions of your data.  Rapid-fire analysis!

When analyzing time-series data across several categories, consider not only looking at the raw numbers like above, but also review how each category contributes to the total.  Let’s go through the same series of charts.

image

We’re off to a good start with the stacked bar chart.  It looks like measuring the contribution of each weapon to the total may tell us something.  Let’s try it as an area chart.

image

Not much better, other than it looks smoother.  How about a line chart?

image

Ok, now we’re onto something.  You might think that this is the same as the line chart for the raw numbers, and I can see how you might draw that conclusion at a quick glance.  But let’s look at them side-by-side.

image

The charts look very similar up until 1997, but then look at how many more rifles started to be made compared to the rest.  And look at the drop off in percentage of shotguns produced since 2004.
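The difference between the two views comes down to one division per data point. As a rough sketch (in Python rather than Tableau, and with placeholder numbers that are not the ATF data), each category's contribution is its count divided by that year's total across all categories:

```python
# Hypothetical counts per year in thousands (placeholders, not the ATF data).
counts = {
    "pistols":   [100, 100, 100],
    "rifles":    [100, 200, 300],
    "shotguns":  [100, 100, 100],
    "revolvers": [100, 100, 100],
}

# Total across all categories for each year.
n_years = len(next(iter(counts.values())))
totals = [sum(series[i] for series in counts.values()) for i in range(n_years)]

# Each category's share of the yearly total, as a percentage.
shares = {
    name: [round(100 * value / total, 1) for value, total in zip(series, totals)]
    for name, series in counts.items()
}

for name, series in shares.items():
    print(f"{name:10s} {series}")
```

In this toy data the pistol count never changes, yet its share falls every year because rifles grow. A category can look flat in the raw numbers and still be losing ground in the mix, which is why it's worth looking at both.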

Hopefully you’ve learned two main lessons:

  1. Don’t display time-series data as stacked bars (or pies unrolled onto a stick if you prefer).  The best medium for time-series data is a line chart.
  2. Consider looking at both the raw numbers and their contribution to the total.  It’s always a good idea to look at your data in more than one way.  You may get some additional and/or different insights.

Let me wrap with two charts that disturbed me a bit as I was playing with the data for this blog post.  I’m not disturbed by their visual display, but by what they reveal.

image

The chart on the left is the running total of guns made by gun type since 1986.  The chart on the right summarizes the chart on the left.
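For anyone reproducing a running total outside Tableau, a minimal Python sketch might look like the following. The yearly values are invented placeholders, and treating the summary as the last point of each running total is an assumption on my part about what the right-hand chart shows.

```python
from itertools import accumulate

# Hypothetical yearly counts in millions (placeholders, not the ATF data).
yearly = {
    "pistols": [1.0, 1.5, 2.0],
    "rifles":  [1.0, 1.0, 1.5],
}

# Running total per gun type: each point is the sum of all years so far.
running = {name: list(accumulate(series)) for name, series in yearly.items()}

# Assumed summary: the final running total per type, plus the grand total.
final_by_type = {name: series[-1] for name, series in running.items()}
grand_total = sum(final_by_type.values())

print(running)
print(final_by_type, grand_total)
```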

These charts tell us that the US has manufactured over 99 million guns since 1986.  Seriously!  99 million!  According to the US Census Bureau, there were ~238M Americans over 18.  That works out to roughly two guns made for every five Americans 18 or older.

That terrifies me!

Perhaps political interests (and lobbyists) have played a part??

image

UPDATE – Source CNN: This certainly explains the drop that started in 1994 and the subsequent increase in 2005.

The Clinton administration imposed a ban on several types of military-style semi-automatic rifles and high-capacity magazines in 1994, but that ban was allowed to lapse in 2004. Obama has proposed restoring the ban, requiring background checks for buyers at gun shows, and other "common-sense measures."

4 comments :

  1. Is it too soon to say that stacked bar charts are the new pie charts? I think they are easily overused and misused.

    I've never liked stacked area charts the way they are traditionally used because I can never read or interpret them!

    I've been through the same issues with data trying to figure out when to use stacked bar vs. stacked area vs. lines vs. % lines. I've come to the conclusion that I only use stacked area when working with percent of total (so that the Y axis is 0-100%). I only use stacked bar when there are just a few categories and you can significantly view each of the bars.

    tl;dr I encounter the same problems day-to-day and completely agree with your approach!

  2. I only use area fill charts when I have a single dimension on the axis and typically only in Sparklines. I typically only use stacked bar when displaying mix % over multiple hierarchies.

  3. I added the following update to the end of the blog post that certainly explains the drop that started in 1994 and the subsequent increase in 2005.

    There's no way you could have seen this unless you looked at the top of the area chart to get the total.

    UPDATE – Source CNN

    The Clinton administration imposed a ban on several types of military-style semi-automatic rifles and high-capacity magazines in 1994, but that ban was allowed to lapse in 2004. Obama has proposed restoring the ban, requiring background checks for buyers at gun shows, and other "common-sense measures."

  4. Has anyone run into 3D stacked pie charts such as the one presented using the cake metaphor in this article, http://www.deltamatrix.com/2012-04-17-04-37-50/horizontal-and-vertical-user-stories-slicing-the-cake?
