Data Viz Done Right

February 19, 2017

Makeover Monday: Who's Winning Europe's Battle for Potato Supremacy?

Well, this certainly was a data set I never thought I'd see. Leave it to Eva to surprise us again. I'm really enjoying how she's mixing things up and allowing me to participate like everyone else. I also need to thank her for sending me her great color palettes again.

This week, we looked at the EU potato sector. Seriously! We're creating vizzes about potato production. The original website has things kind of all over the place. First there's this table:

Then there are a few donut charts, most of which look more or less like this one:

They also included a few bar charts and a line chart. All in all, it's quite colorful.

What works well?
  • Donut charts are sorted
  • Tables are good for looking up specific values
  • Line chart provides context by comparing to an index of 100 to make yearly change easier to understand

What doesn't work?
  • Inconsistent colors
  • Hard to identify the "story" in the data; the story is buried in the article.
  • Pretty busy overall; too much going on
  • Tables are terrible for finding insight in the data

For my version, I first read through the entire article to get a feel for their conclusions. I then focused in on the information about harvesting and decided to basically take their paragraph and turn it into a visual story. I used Eva's color palette to help highlight the important data points and I used Matt Chambers' shade slope charts blog post to create the second chart. 

I wanted to create a beginning, middle and end to the story, and I feel like I did that. I used a question in the infographic title to help the reader understand what the viz is about. I used dividers to split the viz into "parts" of the story and I used the chart titles as legends. Lastly, I used Roboto Condensed font to match the font used in the article.

February 15, 2017

Workout Wednesday: Dynamic Trellis Chart

For Workout Wednesday week 7, I challenge you to build this dynamic trellis chart. Special thanks to Chris Love for explaining to me why the trellis works the way it does. He has a great blog post series over on The Information Lab blog.

In this example, I'm using the same Superstore Sales I've used in previous Workout Wednesdays. Here are the guidelines:

  1. Match my colors.
  2. The user should be able to choose the level of detail they want for the date.
  3. The date axis format should not change, irrespective of the date level chosen.
  4. The user should be able to pick from the list of dimensions shown.
  5. The dimensions should be sorted from upper left to lower right based on the sales in the most recent time period.
  6. Match my tooltips. Note that they change based on the options the user selects. Pay attention to the date formats in particular.
  7. The title should update dynamically based on the date level and dimension selected.
  8. The end of each line should be labeled to the right of the last point. 
  9. There should be a little circle on the end of each line.
  10. Each section of the trellis should include a label for the value of the dimension for that section. E.g., California should be on the upper left when you select quarters by state.
  11. The dimension labels should be centered in each section.
  12. There should be no gridlines, but the zero line should be included.
  13. The rows should have a light divider between them.
  14. My final view is 900x700.
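
For anyone who prefers to reason about guideline 5 outside Tableau first, the idea can be sketched in a few lines of Python: rank the dimension members by sales in the most recent period, then turn each rank into a grid cell. This is only an illustration, not the calculation used in the workbook. The function name `trellis_positions`, the sample figures, and the choice of three columns are all assumptions.

```python
def trellis_positions(latest_sales, columns):
    """Map each member to a (row, column) cell, largest sales first.

    latest_sales: dict of member -> sales in the most recent period.
    columns: number of cells per trellis row (an assumed layout choice).
    """
    ranked = sorted(latest_sales, key=latest_sales.get, reverse=True)
    # Integer division of the rank gives the row, the remainder the column.
    return {member: divmod(i, columns) for i, member in enumerate(ranked)}


# Hypothetical figures, for illustration only.
sales = {"California": 500, "New York": 400, "Texas": 300, "Washington": 200}
positions = trellis_positions(sales, columns=3)
```

With three columns, the top seller lands in cell (0, 0) and the fourth wraps to the start of the second row at (1, 0).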

February 14, 2017

Tableau Tip Tuesday: How to Add Space for Labels on the End of Lines and How to Create a Year/Quarter/Month Selector

This week is a double tip. First, I take you through adding a buffer to the end of sparklines so that you can have your labels next to the end of the lines. Second, I show you how to create a dynamic date selector, which I then use to create a dynamic buffer for the labels.
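
The logic behind a date-level selector and a matching buffer can be sketched in Python as well. This is a rough illustration of the concept rather than the Tableau calculations from the video. The names `truncate`, `add_periods`, and `MONTHS_PER_PERIOD` are hypothetical, and the buffer is assumed to be a whole number of periods at the chosen level.

```python
from datetime import date

# Assumed mapping from the selector value to a period length in months.
MONTHS_PER_PERIOD = {"Year": 12, "Quarter": 3, "Month": 1}


def truncate(d, level):
    """Roll a date back to the first day of its year, quarter, or month."""
    if level == "Year":
        return date(d.year, 1, 1)
    if level == "Quarter":
        first_month = 3 * ((d.month - 1) // 3) + 1
        return date(d.year, first_month, 1)
    if level == "Month":
        return date(d.year, d.month, 1)
    raise ValueError(f"unknown date level: {level}")


def add_periods(d, level, n):
    """Shift a truncated date forward by n periods of the chosen level."""
    total = d.year * 12 + (d.month - 1) + n * MONTHS_PER_PERIOD[level]
    return date(total // 12, total % 12 + 1, 1)
```

Extending the axis to `add_periods(last_date, level, n)` leaves room for labels that scales with the selected level, so a quarterly view gets a wider buffer than a monthly one.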

February 13, 2017

Makeover Monday: How Much Do Americans Spend on Valentine’s Day?

Because I tend to forget Valentine's Day (I consider it a Hallmark Holiday), it almost passed me by that this week included Valentine's Day, and I had intended to use another data set for Makeover Monday. Fortunately we have people on Twitter to keep us straight and this tweet changed the theme for this week.

This meant spending my Sunday morning finding a new viz and data set. A quick Google search turned up this infographic from KarBel Multimedia:

What I like:
  • Color choices that match the theme
  • Simple title that tells me what I'm about to see
  • Proper sourcing
  • Nice description that includes a question that explains what the viz is about
  • Donut chart works well here as it's only 2 slices
  • Clear labeling

What could be improved:
  • Why use bubbles to compare the sizes of the spending? A bar chart would be way easier to read.
  • There's very little context. Is this spending increasing or decreasing?
  • While the color choices work for the theme, this sure is A LOT of pink.

For my viz, I wanted to create a mobile version that looks at the historical spending trends in two groups: significant others and everyone else. I don't love my effort this week (pardon the pun), but there's only so much time in a day. Lastly, special thanks to Eva for the color palette.

February 9, 2017

Workout Wednesday: How Will the UK Population Change by 2039?

Another fun Workout Wednesday challenge from Emma this week. She asked us to build a butterfly chart, something I'd never done before.

Conceptually, I knew exactly what I needed to do:

  1. Create separate measures for males and females
  2. Create additional measures to make them percentages by dividing by the population estimate
  3. Create an LOD for the national average for males and females
  4. Create additional measures to make them percentages
  5. Make the male axis reversed
  6. Set all of the axes scales to be equivalent (Emma didn't do this, but I think the axes should be the same on each side of the center)
  7. Create the calcs for the rows and columns of the trellis
  8. Add a filter for local authority and add that to the title
  9. Throw it all together in a dashboard sized 1000x800
I tend to jot these things down in the Notes app as I think of them. It helps me remember what I need to do for weeks like this when I don't get it done on Wednesday.
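
The first few steps of that list can be sketched in Python to show the shape of the data a butterfly chart needs. This is a simplified illustration, not Emma's or my Tableau calculations. The function name `butterfly_rows` and the sample counts are hypothetical, and the percentages are assumed to be shares of the total across all age bands.

```python
def butterfly_rows(age_bands):
    """Turn counts by sex into signed percentages of the total population.

    age_bands: list of (label, males, females) tuples.
    Male shares are negated so they plot to the left of the centre line,
    which mimics reversing the male axis.
    """
    total = sum(m + f for _, m, f in age_bands)
    return [(label, -100 * m / total, 100 * f / total)
            for label, m, f in age_bands]


# Hypothetical counts, for illustration only.
rows = butterfly_rows([("0-15", 100, 100), ("16-64", 300, 300), ("65+", 80, 120)])

# One shared limit keeps both sides of the centre on the same scale.
limit = max(max(abs(m), f) for _, m, f in rows)
```

Using `-limit` and `limit` as the axis range is one way to get the equivalent scales described in step 6.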

I'm posting an image here, but if you want to see the interactive version, tap on the image.

February 7, 2017

Tableau Tip Tuesday: How to Create an Aggregated Extract

Aggregated extracts are an undervalued and underused feature in Tableau. 

Week 6 of #MakeoverMonday allowed us to work with 105M Chicago taxi trips from an Exasol data source. This is fantastic, until you need to publish the viz to Tableau Public which has a 15M record limit. The way to work around the 15M limit is by creating an aggregated extract, for which Tableau has created a great quick start guide here. In the video below, I show you how I created an aggregated extract with the Chicago taxi data.

Keep in mind, though, that your data has to be aggregatable. For example, if you do cohort analysis, you likely won't be able to aggregate the data and maintain the cohorts.

Why aggregated extracts?

  1. Smaller extracts
  2. Better performance
  3. Only contains the necessary dimensions
  4. Makes extract refreshes faster
  5. Reduces resource burden on your Tableau Server

Steps for Creating an Aggregated Extract

  1. Hide unused dimensions
  2. Add extract filters (optional)
  3. Aggregate the data for the visible dimensions
  4. Create the extract
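
Conceptually, an aggregated extract is a group-by over the dimensions you keep. The Python sketch below illustrates that idea only; it is not how Tableau builds extracts internally. The function name `aggregate`, the column names, and the sample rows are hypothetical, and dates are assumed to roll up to the month.

```python
from collections import defaultdict
from datetime import date


def aggregate(trips, keep):
    """Roll trip rows up to one row per month and kept dimension values."""
    totals = defaultdict(lambda: {"trips": 0, "fare": 0.0})
    for trip in trips:
        month = trip["date"].replace(day=1)            # roll dates up to month
        key = (month,) + tuple(trip[d] for d in keep)  # hidden columns drop out
        totals[key]["trips"] += 1
        totals[key]["fare"] += trip["fare"]
    return dict(totals)


# Hypothetical rows; the column names are not from the Chicago data set.
trips = [
    {"date": date(2016, 1, 3), "company": "A", "payment": "Cash", "fare": 10.0},
    {"date": date(2016, 1, 9), "company": "A", "payment": "Card", "fare": 12.5},
    {"date": date(2016, 2, 1), "company": "B", "payment": "Cash", "fare": 8.0},
]
extract = aggregate(trips, keep=["company"])
```

Because `payment` is not kept, the two January trips for company A collapse into a single row. That is also why the data has to be aggregatable: any detail below the kept dimensions is gone after the roll-up.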

February 6, 2017

Makeover Monday: Are We Nearing the Death of Chicago Taxis?

Now THIS was a fun week! Eva and Exasol set us up with 105M records to play with. Oftentimes, this can feel like an incredibly daunting exercise, but I took a very methodical approach. More about that in a minute. First, let's take a look at the viz we want to make over:

What works well?

  • Using a line chart to portray a time series rarely is a bad choice
  • Nice simple labeling of the axes
  • Title tells me what I'm looking at
  • Red line on the grey background works well
  • The y-axis is truncated, but the chart maintains a nice 3x2 ratio so that the line trends aren't too distorted.

What could be improved?
  • Scale of the y-axis is a bit odd; I like nice rounded numbers that make the math easy to do in my head
  • There is SO much more data to work with; why limit to only trips?
  • Could use a more impactful title
  • Could use more context

This week I decided to apply some of the training we received from Rhiannon Fox and do a bit of mood boarding, color choosing and seeking overall inspiration. I knew I wanted to create a summary dashboard of sorts that included lots of context, so after a bunch of Google image searching and pinning, I ended up with this mood board.

I started by connecting to all 105M records live because I wasn't sure which dimensions I would end up using. When I finally finished (this took well over an hour), I created an aggregated extract. The trick to this is to hide the dimensions you aren't using before you create the extract and to roll up to the lowest date level in the view (month for my viz). This took the viz from 105M to 299K records. Incredibly, the extract was ready in less than 10 seconds. Exasol is crazy fast!

Overall, another fun week. Tonight I get to introduce this to a bunch of new Tableau users at the #MakeoverMonday Live session at Tableau HQ in London. Can't wait to see what they come up with!

February 1, 2017

Workout Wednesday: The Distribution and Median of NFL Quarterbacks

Last week on my Data Viz Done Right site, I wrote about a distribution visualisation created by Harry Enten that shows the range of dates for snowfall at select U.S. cities. It's Super Bowl week, so I decided to recreate the style of Harry's viz in Tableau with the same NFL data that Emma used last week. Your challenge this week is to re-create my viz.

Below is the visualisation that I created. If you're reading this on a phone, tap on the image for the interactive version. Some requirements to keep in mind that are intentionally designed to make this tougher and to make you learn:

  1. All of the elements must be floating on a dashboard sized 650x650.
  2. You cannot use the Player dimension anywhere in the view.
  3. Match my colors including the background
  4. Create the legend (HINT: It's not an image)
  5. Match the tooltip (Note the stats that are displayed in the tooltip. This will be a bit tricky. Essentially you need to count the number of players that are contained within each band.)
  6. The viz should update based on the stat selected. The user should be able to choose between: Attempts, Completions, Interceptions, Touchdowns, and Yards
  7. The title should update dynamically based on the stat the user selects.
  8. Optional: Use Montserrat font (you can download it from Google fonts)

If you have any questions or get stuck, either leave a comment on this post or tweet me. Good luck!

January 30, 2017

Makeover Monday: Employment Growth in G-7 Countries

After the complexity of last week's data set, it's kind of nice to work on something simpler with the objective of telling a clearer story. Business Insider posted this visualisation about the growth of the economies in G-7 countries:

What works well?

  • Good to add a note about rounding
  • Citing the data source

What could be improved?

  • A clearer title that makes the message more evident
  • Countries should be sorted in descending order
  • Labels should include the %
  • Pie chart is a bad choice for this many categories
  • Too many colors
  • Chart titles are confusing (to me at least)

I took the time to read through the article and to read the section of the original report that is referenced. After reading them, the intent of the pie charts was clear. The articles helped me organize my visualisation in a manner that makes the message more clear. I decided to highlight the U.S. with the official red of the U.S. flag. For me, the bar charts are much more clear than the pie charts and they help me break the story into two parts.

January 26, 2017

Makeover Monday: Regional Tourism Spending in New Zealand (Take 3)

Inspired by the visualisation by Harry Enten that I highlighted today on my sister site DataVizDoneRight, I decided to look at the New Zealand tourism data again and see if I could build a similar view. After all, no data visualisation is ever “complete”. I really like how this turned out (and thank you to Eva Murray for feedback).

I incorporated a legend on the upper right to make the bars easier to interpret. Basically the grey bar shows the 25th to 75th percentile of all of the regions and the red dot indicates the median of all regions. I’ve removed the total region from the view.
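
The numbers behind a bar like this are just three summary statistics per row. Here is a minimal Python sketch, assuming a plain list of index values across regions. The function name `band_and_median` and the sample values are hypothetical, and `statistics.quantiles` uses its own default interpolation method, which may not match Tableau's percentile calculation exactly.

```python
import statistics


def band_and_median(values):
    """Return (25th percentile, median, 75th percentile) for a list of values."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q1, statistics.median(values), q3


# Hypothetical index values for one month across regions.
low, mid, high = band_and_median([92, 98, 101, 104, 110, 115, 130])
```

The grey bar would span `low` to `high`, and the red dot would sit at `mid`.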

January 25, 2017

Workout Wednesday: Cumulative Passing Yards for NFL QBs

Nice challenge from Emma this week! She’s a massive NFL fan and since the Super Bowl is upon us, she decided to challenge us to create a common baseline chart that shows the passing yards for QBs in the NFL over the course of their careers. Go to her blog for the full challenge details.

First requirement was to filter to QBs that had played at least 3 seasons and had at least 2000 total passing yards. I did this by adding a data source filter. The benefit of doing this is that my Player list will now only include those that meet the criteria and I won’t need filters elsewhere.

Next, I created an LOD calc to get the first year for each QB.

I built upon that calculation with this calculation that gives me the number of seasons played per QB. This goes onto the Columns shelf.

The cumulative passing yards is merely a running total table calc set at the Year level. This goes on the Rows shelf.
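
The three calculations above (first year per QB, seasons played, and the running total) can be sketched together in Python. This is an illustration of the logic, not the Tableau calcs themselves. The function name `career_lines` is hypothetical, and the season number is assumed to be the year minus the first year, plus one.

```python
from itertools import accumulate


def career_lines(rows):
    """Build (season number, cumulative yards) points for each player.

    rows: iterable of (player, year, yards) tuples, one per season.
    """
    by_player = {}
    for player, year, yards in rows:
        by_player.setdefault(player, []).append((year, yards))

    lines = {}
    for player, seasons in by_player.items():
        seasons.sort()
        first_year = seasons[0][0]  # earliest year for this player
        season_no = [year - first_year + 1 for year, _ in seasons]
        running = accumulate(yards for _, yards in seasons)
        lines[player] = list(zip(season_no, running))
    return lines


# Hypothetical seasons, for illustration only.
lines = career_lines([("QB A", 2001, 1000), ("QB A", 2000, 500), ("QB B", 2010, 3000)])
```

Putting every player on the same season-number axis is what creates the common baseline, regardless of the calendar year each career started.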

I put Player on the detail shelf to get a line per QB. I also put Year and Yds on the Detail shelf since I need those for the tooltip.

Next was a parameter to pick a QB and use that to highlight the QB chosen. I then created a simple calculation that checks the Player against the parameter and put that on the color shelf.

Last was the dot on the end of each line. To do that, I created a calculation that checks if it’s the end of the line and the player selected and, if so, returns the cumulative passing yards. Since this is a nested table calc, it’s important to set both table calcs to compute using Year.
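
The end-of-line check boils down to "return a value for exactly one point, nothing for the rest". A small Python sketch of that idea follows. The function name `end_dot` is hypothetical, and the input is assumed to be one player's list of (season number, cumulative yards) points.

```python
def end_dot(points, player, selected):
    """Return a value only for the last point of the selected player's line.

    points: list of (season number, cumulative yards) for one player.
    Every other point maps to None, so only one mark would be drawn.
    """
    last_index = len(points) - 1
    return [yards if player == selected and i == last_index else None
            for i, (_, yards) in enumerate(points)]


# Hypothetical line for one player.
dots = end_dot([(1, 500), (2, 1500)], player="QB A", selected="QB A")
```

For any player other than the selected one, every entry is `None`, so no dot appears on those lines.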

Some tidying up, adding the footnotes and I was done. I decided to float all elements on the dashboard to ensure they would render exactly as I wanted them to. Another fun week of learning something new! Thanks Emma!!

January 24, 2017

Makeover Monday: The Regional Disparity of Tourism Spending in New Zealand

This week’s Makeover Monday data was too much fun to not continue to play with. I’ve kind of had a crush on barcode charts/strip plots since Makeover Monday week 44 last year when I created a barcode chart of Scotland deprivation.

Given how this week’s data needed to be viewed at the month, region, visitor type level in order to represent the index properly, I thought I’d give a barcode chart a try. Why do I think it works well in this case?

  1. By color-coding the overall index red, I can clearly see how it compares to all other regions.
  2. A barcode chart is great for showing distributions.
  3. The reference line at zero lets me see how many regions are above and below the 2008 average.
  4. Having a row for each month allows me to see which months were more above the average than others, particularly within a year.
  5. This helps me see the big picture (spending has increased over time) while still giving me all of the details of all regions.

What do you think? Does it communicate well for you? Click on the image for the interactive version.

Tableau Tip Tuesday: Two Use Cases for the New DateParse Feature

Tableau 10.2 is bringing us a new date parse feature. I had two perfect use cases to test it with. The first is a file of WhatsApp messages that requires both splitting and a dateparse. Previously this had to be done either in Alteryx or via a split, then a complicated calculation.
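
To show what "split, then parse" means in practice, here is the same idea in Python. This is not the Tableau feature itself; Tableau's date format strings use a different syntax from Python's `strptime` directives. The line layout shown is an assumption about one possible chat export, and the function name `parse_message` is hypothetical.

```python
from datetime import datetime


def parse_message(line, fmt="%d/%m/%Y, %H:%M"):
    """Split one exported chat line into (timestamp, sender, text)."""
    stamp, rest = line.split(" - ", 1)   # timestamp comes before the first " - "
    sender, text = rest.split(": ", 1)   # sender comes before the first ": "
    return datetime.strptime(stamp, fmt), sender, text


# Hypothetical line; real exports vary by locale and app version.
when, sender, text = parse_message("24/01/2017, 09:15 - Andy: Nice viz!")
```

The key point is the same in either tool: once the timestamp is isolated as a string, a single format pattern turns it into a real date.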

The second example is the data set used in Makeover Monday week 3. If I had the dateparse feature when I created the Tableau extract, I wouldn’t have messed up the data for everyone.

No workbook to accompany the video this week. This short video itself should give you a good idea for how dateparse works.


January 23, 2017

The Pitfalls of Averages of Averages

As soon as I started exploring the data for Makeover Monday week 4, I had a suspicion that people wouldn’t pick up on a few things:

  1. There was a region named “Total (all TLAs)” that represented the total but was mixed in with all of the other regions.
  2. The data was an index, which is calculated as a weighted number for each region, month, and visitor type. 
  3. When using indexes in the dataset, using an average aggregation is appropriate as long as you only use it at the individual region, month, and visitor type level. You can’t use an average of the average to represent the total.

I saw a few people make the mistake of using an average of the average very early on (I won’t shame them publicly), so I thought it would be appropriate to explain averages of averages, why they don’t calculate “accurately”, and how people should be using them.
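
A tiny worked example makes the problem concrete. The Python sketch below uses made-up numbers (two regions of very different size) rather than the New Zealand data, and the names are hypothetical.

```python
def mean(values):
    """Plain arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Hypothetical spend per visitor in two regions of very different size.
regions = {
    "Big region": [100] * 8,
    "Small region": [200] * 2,
}

# Averaging the two regional averages treats both regions as equal in size.
average_of_averages = mean([mean(v) for v in regions.values()])

# Averaging every underlying value weights each region by its visitors.
overall_average = mean([x for v in regions.values() for x in v])
```

Here `average_of_averages` is 150.0 while `overall_average` is 120.0. The small region has only a fifth of the visitors, yet it pulls the average of averages as hard as the big one does. With an index you usually cannot even rebuild the weights, which is why the published total row is the only safe number for the total.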

Have Increases in Healthcare Spending Led to an Increase In Life Expectancy in America?

I was looking at an article on Visual Capitalist last night and saw another story that caught my eye. It was a connected scatterplot of life expectancy vs. health care expenditure for OECD countries. Everything on it is well documented including the data source. So off I went to get the data for Life Expectancy and Health Care Expenditure from the World Bank. I wanted to rebuild this chart in Tableau.

One interesting note is that the original visualisation had data back to 1970, yet that data is nowhere to be found on the World Bank website nor is it documented where it came from. That’s not very good. I limited my re-creation to the data I knew was accurate, 1995-2013.

This didn’t take too awfully long to create in Tableau, with most of the time being spent fiddling with the country labels. I know they overlap the lines in a lot of places, but I think it still works. When you click on a line, you’ll see drop lines that reveal the values for that point and also highlight the whole line for that country.

As always, rebuilding a chart I like has been a good learning experience. This was the first time I used drop lines and it was fun figuring out how to get only certain points labeled and colouring everything just right.

Click on the image for the interactive version.

Makeover Monday: Domestic & International Tourism Spend in New Zealand


Already four weeks into another fun year of Makeover Monday. As it’s an even numbered week, it’s Eva’s week to pick the topic. I’m really enjoying her helping out with picking topics because it allows me to participate like everyone else. I don’t see anything ahead of time and have to read the article, interpret the data and build a viz all at once. When I pick the topics myself, I tend to think weeks ahead about what I want to create.

This week, Eva chose two charts that are basically the same. There’s one for international tourism spend and one for domestic. Here’s the international spend chart:

What works well?

  • Title and subtitle make it easy to know what I’m viewing
  • Source is clearly documented
  • Colors are distinct enough that it’s clear we’re supposed to look at them individually
  • Legend is out of the way
  • Axes are clearly labeled
  • Minimal gridlines that aren’t distracting
  • Removing the vertical axis line
  • Nice crisp font (Founders Grotesk)

What could be improved?

  • Packed bars make it harder than necessary to follow a year across its months
  • 100 is the baseline (see the subtitle), so why isn’t the chart the difference from the baseline?
  • Overall too cluttered
  • Are there more years that could be used for comparison?
  • What was spending like before 2008? How do 2013-15 compare?
  • Are we trying to understand the change within each month or the change across the years?

I must admit that I got sucked in by all of the detail Eva provided in the data set. I started by creating a view of the change in spending by year compared to the 2008 average. After all, this is the baseline, so we should be comparing to that. Anything above 100 is an increase compared to 2008; anything below 100 is a decline versus 2008. I wonder how many people are going to catch that subtlety in the data. I understood it after reading the notes that accompanied the chart.
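
Reading an index against its baseline is simple arithmetic, sketched here in Python. The function name `change_vs_baseline` and the sample values are hypothetical; the baseline of 100 for the 2008 average comes from the chart notes.

```python
def change_vs_baseline(index_value, baseline=100):
    """Express an index value as a percentage change from the baseline."""
    return (index_value - baseline) * 100 / baseline


# Hypothetical index values.
increase = change_vs_baseline(120)  # 20% above the 2008 average
decline = change_vs_baseline(85)    # 15% below the 2008 average
```

Plotting this difference puts the baseline at zero, so increases and declines versus 2008 fall on opposite sides of the axis.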

This is what I came up with first. It shows the change versus 2008 for each region. Click on the image for the interactive version.

But was I really doing a makeover of the original? No, I wasn’t. The original was looking at the country total, so back to the drawing board I went. I really liked the look of the lines I had created so it wasn’t like I was completely starting over. What I wanted to show was:

  1. How did the overall spending change compared to 2008 for each visitor type?
  2. What was the change by year within each season? This would help me understand any seasons where tourism spending has decreased.

Overall, a really fun data set to play with and a lesson learned to not get sucked into the data too quickly. I need to think more holistically, think about the story I want to tell, search for ideas, do some sketching, and move away from Tableau when I start. Basically, practice what I preach.

January 20, 2017

London Crimes: Exploring the Trends, Locations and Crime Types

Sasha Pasulka has been a good friend of mine for many years and is moving to London from the US soon in her new role with Tableau. Like anyone else that’s moving to a new country, she finds the entire process ridiculously overwhelming. Trying to find apartments from thousands of miles away, not knowing anything about neighbourhoods, can be an incredibly daunting task. To help her, I decided to build a viz in Tableau.

I started by using the London Crimes web data connector from Tableau Junkie, only to realise that it’s somehow not returning all of the data. It looks like it returns the data via a radius versus for an entire postcode.

No worries though. Instead, I downloaded all crimes from the Metropolitan Police Service. This returned a separate CSV for each month and it also wasn’t limited to just the London area. On the London Datastore I was able to find a list of all LSOAs in the London area. Great! All I needed to do was union all of the CSVs then join them to the London LSOAs. I love that I can do all of this straight inside Tableau now.
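
For comparison, the same union-then-join shape looks like this in Python. It is a sketch of the concept only, not what Tableau does under the hood. The function name `union_and_join`, the column names `lsoa_code`, `crime_type`, and `borough`, and the in-memory sample files are all hypothetical.

```python
import csv
import io


def union_and_join(monthly_files, lsoa_file):
    """Stack monthly crime files, keeping rows whose LSOA is in the lookup."""
    london = {row["lsoa_code"]: row["borough"]
              for row in csv.DictReader(lsoa_file)}
    joined = []
    for f in monthly_files:                  # union: read every file in turn
        for row in csv.DictReader(f):
            borough = london.get(row["lsoa_code"])
            if borough is not None:          # inner join drops non-London rows
                joined.append({**row, "borough": borough})
    return joined


# Hypothetical in-memory files; real column names will differ.
jan = io.StringIO("lsoa_code,crime_type\nE01000001,Burglary\nE09999999,Robbery\n")
feb = io.StringIO("lsoa_code,crime_type\nE01000001,Vehicle crime\n")
lsoas = io.StringIO("lsoa_code,borough\nE01000001,City of London\n")
crimes = union_and_join([jan, feb], lsoas)
```

The inner join does double duty: it attaches the borough and filters out every crime outside the London LSOA list.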

From there, it was a matter of building a simple visualisation that allows Sasha to pick boroughs and see the crimes in those areas. Note that I set the map to only display when there are 3 or fewer boroughs selected. I did this because the map was simply too slow to draw the dots. Hopefully she likes it and it makes it easier for her to settle in.

January 18, 2017

Workout Wednesday: The State of U.S. Jobs


Last week on my Data Viz Done Right site, I wrote about a great small multiples visualisation created by Matt Stiles that shows U.S. unemployment compared to the national average for every State. I recreated Matt’s work in Tableau, and your challenge for this week is to do the same. Personally, I find trying to rebuild visualisations a great way to learn.

Below you’ll see an image of what I created. Click on it for the interactive version. I’ve prepared the data for you here (it comes from the Bureau of Labor Statistics). Some things to keep in mind:

  1. My viz is 875x2150. Yours doesn’t have to be this size, but I thought I’d provide it for guidance.
  2. I’m using the Source Sans Pro font, which you can download from Google Fonts. This is the font that Matt used in his version.
  3. The years should be displayed every 10 years.
  4. The axis line for the year should be more distinct than the gridlines.
  5. The ends of each line should be colour-coded by whether it was an increase or decrease compared to the national average.
  6. You will need to calculate the national average. The national average needs to include the District of Columbia, but D.C. should not have its own chart.
  7. Values above the national average should start with a + and values below should start with a -
  8. The national average is the line you see at 0%.
  9. This is NOT a trellis chart, but if you think you can make it work as a trellis chart, go for it!
  10. The area above zero should be shaded. The hex code to use is #F7E6E2 and it should be shaded from 0% to +10%.
  11. Pay attention to the gaps between each State. I like how this gives it some breathing room.
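
Guidelines 6 and 7 can be sketched in Python to make the intended arithmetic concrete. This is one possible reading of the requirements, not the calculation from my workbook. The function name `versus_national` and the sample rates are hypothetical, and the national average is assumed here to be a simple mean of the area rates for a single period.

```python
def versus_national(rates):
    """Label each state's rate relative to the average of all rates.

    rates: dict of area -> unemployment rate (%), including D.C.
    D.C. counts towards the average but gets no entry of its own.
    """
    national = sum(rates.values()) / len(rates)
    return {area: f"{rate - national:+.1f}%"
            for area, rate in rates.items()
            if area != "District of Columbia"}


# Hypothetical rates, for illustration only.
labels = versus_national({"Alabama": 4.0, "Alaska": 9.0, "District of Columbia": 5.0})
```

The `+` in the format specifier forces an explicit sign, so values above the average get a leading + and values below get a leading -.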

This will be a very tedious exercise. To provide some context, this took me 2-3 hours to create. Don’t get discouraged and don’t feel like you have to do it all in one sitting. Basically, try to make yours look identical to mine.

Post an image on Twitter when you’re done and hashtag it with #WorkoutWednesday and please tag me so I see it. You can also comment below with a link to your workbook or with any questions you have. Good luck!