
September 19, 2016

Makeover Monday: Data breaches are getting bigger and more frequent


Several people have recommended Makeover Monday for the Project of the Year in the Kantar Information is Beautiful Awards, which I must admit is quite stunning and flattering at the same time. The suggestion for this week’s makeover came from Andy Cotgreave. We intentionally picked something from Information is Beautiful with the hope that it gets a bit more exposure. Shameless perhaps, but what can it hurt? This viz from David McCandless certainly deserves a makeover.

What works well?
  • The viz is eye-catching and definitely draws you in. There’s something to say for that.
  • The interactivity is fantastic.
  • The filtering, colouring and sizing options are good.

What doesn’t work well?
  • The bubbles move all around for no apparent reason.
  • There’s way too much overlapping, making it hard to identify any insights.
  • Whether something is interesting is extremely subjective. I wouldn’t make these same choices.
  • The viz doesn’t fit in a single view, requiring too much scrolling.
  • Not all records are included. I guess this was done for artistic purposes as David is known to do, but it distorts the message.

I decided to work on my makeover during my flight to Prague, which imposed a time limit on myself. I started by creating a view that simply shows the number of data breaches by year using circles, one circle per breach. This basically flattens out the original.
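
For anyone who wants to try this first step outside Tableau, here is a minimal Python sketch of counting breaches by year. The rows are invented for illustration and are not taken from the Information is Beautiful data set; the names `breaches` and `breaches_per_year` are hypothetical too.

```python
from collections import Counter

# Toy rows, invented for illustration only: (year, records_stolen).
# These are NOT values from the Information is Beautiful data set.
breaches = [
    (2013, 1_000_000),
    (2013, 50_000),
    (2014, 2_500_000),
    (2014, 300_000),
    (2014, 75_000),
    (2015, 10_000_000),
]


def breaches_per_year(rows):
    """Count how many breaches fall in each year."""
    return Counter(year for year, _ in rows)


counts = breaches_per_year(breaches)
for year in sorted(counts):
    # One mark per breach, so the length of each row mirrors the count.
    print(year, "o" * counts[year])
```

Printing one mark per breach is a rough text stand-in for the row of circles: the size of each breach is ignored and only the count per year remains.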

While this shows the distribution nicely, I don’t love it. Next, I converted the circles to squares, hoping the result would be more visually impactful as the squares take up more space.

This is definitely better, but it doesn’t incorporate the records stolen in each data breach well enough for my liking. So I decided to add a dot for every breach in the data set and position each dot by the number of records stolen.

Getting there…iterating is really helpful. This shows some of the outliers really well, but I feel like I’ve lost the distribution a bit. I decided to quickly open the data in Vizable, and when I switched the view to records stolen by year, Vizable presented me with this interesting view that shows the median and the distribution.

I really liked this so I decided to build upon it in Tableau. My final viz incorporates the view from Vizable, shows where each data breach sits in the distribution, and allows me to focus the story on data breaches that were hacks versus those that were not.
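
The numbers behind a view like this can be sketched in a few lines of Python: group the breaches by year and by hack versus not hack, then take the median and range of records stolen in each group. This is only an illustration under stated assumptions, not how Vizable or Tableau compute it. The rows, the method labels ("hacked", "lost device", "inside job") and the function name `summarise` are all invented.

```python
from collections import defaultdict
from statistics import median

# Toy rows, invented for illustration: (year, records_stolen, method).
# The method labels are hypothetical and not taken from the real data set.
breaches = [
    (2013, 1_000_000, "hacked"),
    (2013, 50_000, "lost device"),
    (2013, 200_000, "hacked"),
    (2014, 2_500_000, "hacked"),
    (2014, 300_000, "inside job"),
    (2014, 75_000, "hacked"),
    (2014, 40_000, "lost device"),
]


def summarise(rows):
    """Group by (year, is_hack); return count, min, median and max of records."""
    groups = defaultdict(list)
    for year, records, method in rows:
        groups[(year, method == "hacked")].append(records)
    return {
        key: {
            "n": len(values),
            "min": min(values),
            "median": median(values),
            "max": max(values),
        }
        for key, values in groups.items()
    }


summary = summarise(breaches)
for (year, is_hack), stats in sorted(summary.items()):
    label = "hack" if is_hack else "not hack"
    print(year, label, stats)
```

The median is used rather than the mean because a handful of very large breaches would otherwise dominate each year.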

Click to view interactive version

I find this final view much, much easier to look at than the original, and it also provides much better context. For me, context is key. Every visualisation you create should include context somehow. Why? Context makes it much easier for your audience to understand the story.


  1. Info is Beautiful (and, hence, this makeover) makes a pretty large assumption about the quality & consistency of breach reporting since 2004 (i.e. some treatment of uncertainty should be in the vis as it should be in the Info is Beautiful bubbles and is not). This makeover is also making a huge assumption about the statistics underlying the data and hasn't actually communicated how accurate the "bigger and faster" claim is. Also, by including a records view and lumping all of said record types together, it's making a further huge assumption that all record types and breaches are equal (at least the record types are a toggle on the Info is Beautiful vis). It may help to take a look at how the Verizon Data Breach Investigations Report talks about and visualizes the topic of data breaches.

    Now, one may dismiss the above comments, but I doubt folks would lump all kinetic (murder, robbery, jaywalking, speeding, etc) crime together like this in a makeover of that type of data/vis (at least, I would hope not). I'm only pointing this out since a makeover really should not be done w/o an attempt to understand the topic (at least, in my opinion) and to also faithfully communicate the data/message (warts and all).

    1. Fair point. I made the (inaccurate) assumption that the data was complete. It obviously isn't, so my analysis is flawed. 2010, for example, is clearly not right. However, I do believe that mine is better than the original given it was done on the same data set.

  2. Is it just me, or does "engaging" more often than not just mean that you *must* engage with it in order to actually learn anything from it?

    1. Jamie, in this case, to me, engaging meant that they designed it so that I would want to interact with it.

  3. Hey Andy!
    I love how you decompose (and expose!) the data exploration process in #MakeoverMonday. Creating a dashboard usually involves *dozens* of Tableau worksheets to determine... a) the shape and size of the data, b) outliers & curiosities, c) where the story is, d) what's visually appealing, e) finishing touches -- just to name a few.

    In short, THANKS for the walkthrough of your design thought process, including the "what works" and "what doesn't" list.

    On another note... As the enigmatic "Unknown" user implies, this data set is particularly interesting/troubling because it relies on *self-reported* data breaches. Undoubtedly, private information falls into the wrong hands far more often than we realize.

    Keep vizzin!

  4. Hi Andy,
    I really like the way you display the data in your charts and dashboards. I am trying to learn from some of your visuals. Could you kindly share how you reduced the space between the vertical axis in the chart, so that there is less space above and below the circles in the 1st chart? Thank you.

    1. I have no idea what you mean. Can you clarify?