Data Viz Done Right

June 9, 2019

#WorkoutWednesday Week 20: Top and Bottom States for Total Orders

As I continue catching up on WW, I went to week 20, which was a live event at the SFTUG, so I knew it had to be solvable quickly and wouldn't require any crazy calcs. The goal was to create a view that:

  1. Groups together the top 10 States, bottom 5 States and all other States based on their total orders.
  2. Shows all other States as an average number of orders across the States
  3. Displays the 2018 value plus the change vs. 2017 on the end of the bar with arrow indicators for the change
  4. Shows different colored bars for 2018 depending on the groups created in step 1 above

I thought for sure this was solvable with table calcs and it mostly is, until you get to all other States. You would need State in detail and this draws multiple bars and the calcs for 2018 and 2017 orders would need quite a bit of thinking to get them just right. I decided on LODs instead.

If you haven't solved this one yet, I highly suggest you consider sets for the Top, Bottom and Blank groupings.
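For readers who think better in code than in sets and LODs, here is a minimal Python sketch of the grouping logic in the requirements. It is not the Tableau solution; the function name `group_states` and the row labels are hypothetical, and it assumes there are more than 15 states so the top and bottom groups don't overlap.

```python
from statistics import mean


def group_states(orders_by_state, top_n=10, bottom_n=5):
    """Split states into Top, Other, and Bottom groups by total orders.

    Hypothetical helper: assumes len(orders_by_state) > top_n + bottom_n.
    """
    # Rank states from most to fewest orders.
    ranked = sorted(orders_by_state.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[:top_n]
    bottom = ranked[-bottom_n:] if bottom_n else []
    others = ranked[top_n:len(ranked) - bottom_n]

    rows = [(state, "Top", orders) for state, orders in top]
    if others:
        # All remaining states collapse into one bar: their average orders.
        rows.append(("All other States", "Other", mean(o for _, o in others)))
    rows += [(state, "Bottom", orders) for state, orders in bottom]
    return rows
```

Running this once for 2018 and once for 2017 would give the two values needed for the label and the change arrow.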

Here's my solution:

#MakeoverMonday: Is it wrong for same-sex adults to have sexual relations?

It's #PrideMonth, which makes for perfect timing for this week's makeover. Eva has chosen a simple line chart that allows the user to answer a simple question: "Is it wrong for same-sex adults to have sexual relations?"

What works well?
  • The title, question response, and breakdown are all organized together. They also look quite crisp because they aren't part of the Tableau rendering.
  • The scale goes to 100%.
  • The horizontal gridlines help guide the eye.
  • Showing the year header every five years
  • Using a line chart to show trends over time
  • Allowing the user to highlight a line by clicking on the legend
  • Overall, it's an easy chart to understand.

What could be improved?
  • The share, print, export, and table buttons are in the way. Move them to the footer.
  • The colors are too similar.
  • Remove the dots on the lines for each year.
  • Why is the title green?
  • Legend text is cut off for some selections

What I did
I like the overall design of the original, so I decided to clean it up and change the main metric to answer the question differently.

  1. I removed all of the dots from the lines.
  2. I allow the user to highlight an age via a parameter action.
  3. I changed the metric on the chart from % of population to the change vs. 1973. For me, this is a more meaningful way to show the change in opinions.
  4. I moved the title closer to the chart.
  5. I used Benton Sans as the font (Tableau fonts get boring sometimes).
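The change vs. 1973 metric can be sketched outside Tableau in a few lines of Python. This is only an illustration: the function name `change_vs_baseline` is hypothetical, and it assumes the change is a simple difference in percentage points rather than a relative change.

```python
def change_vs_baseline(series, baseline_year=1973):
    """Return each year's value minus the baseline year's value.

    `series` maps year -> % of respondents for one age group.
    Assumes a percentage-point difference, not a relative change.
    """
    base = series[baseline_year]
    return {year: value - base for year, value in sorted(series.items())}
```

Every line then starts at zero in 1973, which makes it easy to compare how far each age group has moved.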

This ended up being pretty much what I built last week, and I'm ok with that. If a chart works, stick with it.

June 8, 2019

#WorkoutWednesday Week 22: X% of Sales make up Y% of Orders

This was a tough one. For Workout Wednesday week 22, Lorna set out the challenge of reproducing a trellis chart with each pane being a separate Pareto chart of sales vs. number of orders.
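If you haven't built a Pareto before, the idea behind each pane can be sketched in Python. This is a rough illustration rather than the Tableau table calcs; the function name `pareto_points` is hypothetical, and it assumes orders are ranked from highest to lowest sales.

```python
def pareto_points(order_sales):
    """Return (pct_of_orders, pct_of_sales) points for one pane.

    `order_sales` is a list of sales amounts, one per order.
    Orders are ranked largest first, then both axes are running percents.
    """
    ranked = sorted(order_sales, reverse=True)
    total = sum(ranked)
    points = []
    running = 0
    for position, sales in enumerate(ranked, start=1):
        running += sales
        points.append((position / len(ranked), running / total))
    return points
```

For a trellis, the same function would be called once per pane with only that pane's orders.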

Great! That part was pretty straightforward. I've done a Pareto tons of times. I've done a trellis tons of times.

Then there was the coloring of the line. That is, the line needed to be colored in each pane up to the point where two reference lines meet. I got that right, I thought.

Lastly, there was the little detail (i.e., not so little) of creating dynamic reference lines that update based on both the parameter selected and the point which you hover over. The fact that these are two conditions should have been my first clue.

I spent quite a while on the reference lines trying to make one reference line act upon two calculations. Well...that's not how reference lines work. Then it hit me that I actually needed four reference lines. Two of them will always hide based on the option selected in the parameter.


The calcs were fiddly. I got lost in the logic a couple times because what you have to do is a bit counterintuitive, kind of like double negatives. I wrote down the four scenarios so that I could approach them one by one.

My calculation names are a mess, which drives me insane and actually made getting to the solution much harder. I'm normally very good about making sure I have clear names, but in this case I had several with almost the same name, so I kept mixing them up, hence why writing things down helped.

Once I got that figured out, I was done. A bit of tidying, then I downloaded Lorna's solution to compare mine to. We had taken a nearly identical approach (it must be that teacher she had at The Data School that set her on her way).

Thanks for the challenge Lorna! I enjoyed the struggle and always love the eureka moments. This was another viz that I could easily see used in a business context.

June 7, 2019

#WorkoutWednesday Week 23: Which sub-categories are ordered most?

I haven't done a Workout Wednesday in quite some time; in fact, this is the first one I have done this year. However, when I saw Curtis Harris' challenge this week (check it out here), I knew I wanted to give it a try. Why? Because it's a view that is very usable in a business context. I could certainly see this being used on a mobile phone in a real business.

  • Create a bar chart which displays the sub-category label on top of its bar
  • Label all bars to the right of the maximum value in view
  • Create a parameter that changes the display from a percent of total view to a raw order count view
  • Create a progress shadow for every bar
  • Show progress to 100% or progress to the maximum value depending on the parameter selection
  • Only use one sheet
  • Match formatting and colors
  • Match tooltips
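The parameter and progress shadow requirements boil down to a small piece of logic, sketched here in Python. This is an assumption about how the pieces fit together, not Curtis' or my actual calculations; the function name `bar_and_shadow` and the mode names are hypothetical.

```python
def bar_and_shadow(counts, mode="percent"):
    """Return {sub_category: (bar_value, shadow_value)}.

    mode="percent": bar is the share of total orders, shadow runs to 100%.
    mode="count":   bar is the raw order count, shadow runs to the max count.
    """
    total = sum(counts.values())
    largest = max(counts.values())
    result = {}
    for name, count in counts.items():
        if mode == "percent":
            result[name] = (count / total, 1.0)
        else:
            result[name] = (count, largest)
    return result
```

The shadow value is the same for every bar in a given mode, which is what lets one sheet handle both displays.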

The part that tripped me up the most was labeling the sub-category above the bar. I had done this before, but couldn't remember how, and I didn't want to look back; I wanted to figure it out again. One other thing I wanted to do was make sure that the text didn't overlap the bar as it shows in the viz to rebuild (sorry Curtis, I know it's not 100% correct now).

To do this:
  1. Add Sub-Category on the rows twice.
  2. Hide the headers
  3. Turn on subtotals
  4. Move the column subtotals to the top
  5. Add a dummy measure as a secondary axis with the value 0
  6. Make sure the mark type is a circle on this shelf
  7. Add Sub-Category to the text shelf on the shelf with the circle and add an extra line below it in the text box with a space. This forces an extra line.

Tada! Fun one Curtis!

June 3, 2019

#MakeoverMonday: Are Americans sleeping more or less than they did in 2003?

A couple weeks ago, I finished Matthew Walker's amazing book Why We Sleep. I cannot recommend this book enough. It's both fascinating and terrifying. And if it doesn't change how you look at sleep, I'd be extremely surprised.

Given how this book impacted me, I wanted to find a topic about sleep. I found the American Time Use Survey, the best data I could find on the topic. As it turns out, Matthew Walker is hosting an event in London Thursday night that I've bought tickets for. What amazing timing!

Here's the original viz from the ATUS:

What works well?
  • It's a simple side-by-side bar chart that makes comparing men to women easy within each age group.
  • Clear axis title
  • Labeling the bars can help with interpretation
  • Ordering the ages chronologically
  • The title is simple.
  • I like the line that divides the title from the chart.

What could be improved?
  • The axis is cut off. You should never ever truncate an axis for a bar chart as the length of the bar is what you are measuring, not a portion of the length.
  • The 3-D shading of the bars is unnecessary.
  • It would be good to see how sleep has changed over time. Are people sleeping more or less?
  • The labels are misleading. It's obvious some bars are longer than others, yet they are both showing the same values, e.g., 15-19 years.
  • What's the takeaway from the chart?
  • The data includes naps and spells of sleeplessness. What proportion does this make up?
  • People typically overestimate how much they sleep, so how accurate is the data in the end?

What I did
  • Incorporated all of the years
  • Compared genders vs. ages vs. overall
  • Used parameter actions to allow the user to highlight an age group
  • Looked at the change since 2013 to see if groups are sleeping more or less

I did find that older Americans (65+) are sleeping less than in 2013 and men tend to sleep less than women. Overall, the data turned out to not be as interesting as I had hoped, but that's ok. It's still always fun to explore and understand data and to practice new features.

May 28, 2019

#TableauTipTuesday: Create an Interactive Quadrant Chart with Parameter Actions

Parameter actions are now out in the wild with Tableau 2019.2. These have me going through A LOT of charts I've made with parameters before, evaluating which ones would benefit from Parameter Actions.

The first chart I wanted to try was a quadrant chart. A quadrant chart colors each quadrant based on thresholds set for each axis in a scatter plot. Previously, I created two parameters and the user had to type in numbers to adjust the view. However, with parameter actions, I can now enable the user to update the quadrants by simply hovering over a dot.
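As a rough illustration of the logic (not the Tableau calculations from the video), here is a minimal Python sketch. The function names `quadrant` and `recolor_on_hover` are hypothetical, and it assumes points that sit exactly on a threshold count as upper/right.

```python
def quadrant(x, y, x_threshold, y_threshold):
    """Name the quadrant a point falls in, given one threshold per axis."""
    horizontal = "right" if x >= x_threshold else "left"
    vertical = "upper" if y >= y_threshold else "lower"
    return f"{vertical} {horizontal}"


def recolor_on_hover(points, hovered):
    """Mimic the parameter action: the hovered point's coordinates
    become the new thresholds, and every point is reclassified."""
    x_threshold, y_threshold = hovered
    return [quadrant(x, y, x_threshold, y_threshold) for x, y in points]
```

In Tableau terms, the two thresholds are the parameters, and hovering is what writes new values into them.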

And here's the video...enjoy!

May 27, 2019

#MakeoverMonday: What has happened since people started paying attention to climate change?

For week 22, Eva chose a topic that she and I are both very passionate about...climate change.

What works well?

  • Using a line chart over time helps show the trends
  • Including a slider filter for the user to zoom in on a specific period

What could be improved?
  • Using dotted lines indicates there are breaks in the timeline, but there aren't. Therefore, a solid line should be used.
  • The labels on the ends of the lines hide the data.
  • It could use an impactful title and subtitle. Though I suppose this is just a report, not analysis.

What I did

I created a map to ensure that the country names were correct. When I did this, I then saw that there were lots of aggregations of countries.  For some reason, the income level categories captured my attention so I filtered down to just those items.

The years 2015-2018 were included and didn't have any values. I filtered those out. There were years when no data was captured for some countries. I filtered those out.

I plotted the data as a line chart and created a calculation to show the change vs. the first year for each country. I noticed that there was a spike in CO₂ per capita in 1973 for high income countries. This reminded me of the oil crisis of 1973, but that wouldn't have anything to do with carbon emissions I wouldn't think.

That got me thinking about climate change in general. I entered "when did people start paying attention to climate change" into Google and the first search result was an article from National Geographic titled "Climate Change First Became News 30 Years Ago. Why Haven’t We Fixed It?"

This particular line was what I was looking for: "The Intergovernmental Panel on Climate Change was established in late 1988..."

So, back to the data I went and I filtered the data to 1988-2014 and compared every subsequent year to 1988 in order to see how much things have changed since climate change started garnering some attention. I expected high income countries to have ever increasing CO₂ per capita. I was wrong.

It turns out that the middle income countries have had the largest change in CO₂ per capita. So that became the focus of this analysis.

May 21, 2019

#TableauTipTuesday: Drill Down With Set Actions

In this week's tip, I take you through how to set up a basic drill-down view using set actions. In the workbook below, I've included two additional views: sparklines and a region to state map.


May 20, 2019

#MakeoverMonday: Bear Attacks in North America

Continuing the animal theme from Eva last week, this week I provided the Makeover Monday Community a data set about bear attacks in North America since 1900.

Vox has an interesting article and has the visualization that we'll makeover this week.

What works well?

  • The title summarizes the findings.
  • The sub-title provides context.
  • Including the source and author's name
  • Using a bar chart
  • Including the numbers in the bars
  • Including the gridlines to make the bars easier to compare
  • Good use of color

What could be improved?
  • The bear icons should be removed.

What I did
I really liked Hesham Eissa's viz this week, so I used that as inspiration for mine. 
  • His unit chart points downward, but I wanted mine to point upward.
  • I like his BANs, so I included some of my own, but different numbers.
  • I included the total for each month as he did.
  • Hesham's dots are colored by the location (US vs. other), while I colored mine by the type of bear.
  • I included a line chart to show the cumulative attacks by type of bear since 1900.

Thanks for the inspiration Hesham!!

May 16, 2019

The History of English Football Champions: 1888-2018

Last week I saw this really cool viz from Squawka Football on Twitter and wanted to see if I could rebuild it.

Given that this requires animation, I knew I needed a tool that supported this and I turned to Flourish. The data has to be structured in a very specific way, so I downloaded the data from Wikipedia, imported it into Alteryx for a bit of a massage, and spit it back out in the format Flourish required.

And voila! An animated viz of the history of English football champions from 1888-2018. Very little effort required + great animation = win!

May 13, 2019

#MakeoverMonday: Rhino poaching in South Africa - Is the decrease a real reversal?

Back from a successful 100 mile cycling event in Birmingham, I'm taking on Makeover Monday week 20. This week, Eva picked a data set about rhino poaching in South Africa. I had no idea this was a "thing" and I'm quite horrified that it happens. It reminds me of the show Whale Wars on Animal Planet where a team of activists work to stop Japanese whaling.

This week's viz is a count of rhinos poached from 2006-2016.

What works well?
  • The title and subtitle tell me what the viz is about.
  • Using a single color
  • The design clearly shows that there was a steady increase in rhinos poached over a 10 year period until the decline in 2016.
  • Including the labels at the top of the unit chart for context.

What could be improved?
  • What value does each rhino image represent? Whatever it is, it's not accurate as the same number of units represent different values.
  • Is the rhino image necessary? I would try it without it.

What I did
  • I read the source article for context and to give me ideas for my analysis.
  • I changed the chart to a bar chart, which makes it easier to understand for me, and it makes the viz more accurate.
  • I named the title and added text based on what I read in the article.

With that, here's my Makeover Monday for week 20. Click on the image for the interactive version (though no interaction is necessary).

May 7, 2019

#MakeoverMonday: Top 10 Major League Baseball Home Run Hitters

Earlier today I saw this really cool viz created by Will Sutton that's an animation of the top home run hitters from 1985-2016.

As I posted last week, Sophie Sparkes introduced The Data School and me to Flourish. Flourish makes it super simple to create animated visualizations with tons of customization options. Given that Tableau doesn't support animations in the browser, this is a great alternative. Flourish provides an example, you import your data, do a bit of customization and voila! You have an animated viz.

The data needed to be structured with a column for each season, so I prepped the data in Alteryx and I included all seasons from 1912-2018. I then filtered down to players with 250+ career home runs (to make the list manageable).
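I did this reshaping in Alteryx, but the same pivot can be sketched in Python for anyone without it. This is only an illustration: the function name `pivot_seasons` and the sample column layout are hypothetical, and filling missing seasons with 0 is my assumption, so check what your Flourish template expects.

```python
def pivot_seasons(rows):
    """Pivot long (player, season, home_runs) rows into one row per player
    with a column for each season.

    `rows` must be a list. Missing seasons are filled with 0 (an assumption).
    """
    seasons = sorted({season for _, season, _ in rows})
    by_player = {}
    for player, season, home_runs in rows:
        by_player.setdefault(player, {})[season] = home_runs

    header = ["Player"] + [str(season) for season in seasons]
    table = [
        [player] + [values.get(season, 0) for season in seasons]
        for player, values in sorted(by_player.items())
    ]
    return header, table
```

The header and table can then be written out with the `csv` module and uploaded.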

And here's my animated viz of the top 10 home run hitters of all-time.

May 6, 2019

#MakeoverMonday: Major League Baseball's Most Cost Effective Players

Since the Makeover Monday Community seemed to enjoy sports data two weeks ago, I thought I would provide some data about Major League Baseball this week. First, here's the original visualization to makeover:

What works well?

  • The title and subtitle explain what the viz is about.
  • Dividing the viz into two sections by using different background colors on the scatter plots
  • Consistent scales for the salaries across the charts for each section
  • Using gridlines to help the audience understand the approximate values of each point
  • Only labeling the type of stat once by putting the label between the players and teams charts

What could be improved?

  • There's no data source listed.
  • I have no idea why these players or teams are highlighted; an explanation is needed. At first, I thought it was highlighting the most effective player/team, but it's not (at least that's what I see).
  • The logos are meaningless for people that aren't familiar with the teams.
  • What does the big logo on the upper right represent? Is that the author?
  • The data should be filtered to players that meet certain criteria, like at bats in a season. This would then filter out many players near zero.


I liked the idea of using a scatter plot like the original, but I wanted to focus on a metric that better measures "effective". There are so many sophisticated metrics now in baseball. I didn't want to use any of those because they're hard for people to understand. I decided to use on base percentage, which is the number of times a player reached base (H + BB + HBP) divided by at bats plus walks plus hit by pitch plus sacrifice flies (AB + BB + HBP + SF).
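To make the formula concrete, here it is as a small Python function. The function name and the choice to return `None` for a player with no plate appearances are my own assumptions for illustration.

```python
def on_base_percentage(h, bb, hbp, ab, sf):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF).

    Returns None when the denominator is zero (an assumption; the source
    formula doesn't say how to treat players with no plate appearances).
    """
    denominator = ab + bb + hbp + sf
    if denominator == 0:
        return None
    return (h + bb + hbp) / denominator
```

For example, a made-up player with 150 hits, 50 walks, and 450 at bats (no HBP or SF) reaches base 200 times in 500 chances.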

Why did I choose OBP? Ryan Kelley sums it up best in a post on Quora:

Outs are an extremely scarce resource in the economy of a baseball game, each team has 27 to use (in a 9-inning game) while trying to score as many runs as possible. Every time a batter makes an out therefore, the expected number of runs his team will produce will decrease (assume runs are also a limited resource for now).

A batter's job is to get on base--not make an out in other words. A batter fails to do his job when he makes an out, this failure percentage  is 1 - OBP. The success percentage is OBP. If every batter had a perfect 100% OBP, their team would score an infinite amount of runs before every making an out. 

Now, because you're talking about value specifically. OBP alone isn't effective in measuring value. You can make it a better indicator of value by giving it context. That context depends on what kind of value you're talking about. 

You could tie OBP  to a player's salary. This would give you an indicator of how value that batter was to his team in the context of a labor market. After all, baseball players are just employees of franchises in the end. Their jobs are to produce wins. A hitter's job is to produce wins via producing runs. Franchises make money by selling those wins to fans as entertainment.

Each team has a fixed amount of payroll to spend on wins, so the more payroll a batter's salary takes up, the less valuable he is to his team. A good way of illustrating a player's value would be OBP/$ of Salary.

Based on Ryan's explanation, I decided to use OBP as my proxy for batter effectiveness (y-axis). For the x-axis, I wanted to use salary for comparison. However, the data does not adjust salaries for inflation, so a salary in 1985 is not listed in 2016 value. Instead, I came up with a way to normalize the data across all of the seasons.

I created a calculation that compares a player's salary to the average salary of the entire league for each season. I made this a percent difference so that the data would then be normalized. Therefore, a player that was 10% above the 1985 average salary would be comparable to a player that was 10% above the 2016 average salary.

Here are my calculations:

  1. Season avg Salary: { FIXED [Season] : AVG([Salary]) }
  2. △ to Season Average Salary: (AVG([Salary]) - SUM([Season avg Salary])) / SUM([Season avg Salary])
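The same normalization can be sketched in Python to show what the two calculations are doing. This is an illustration under stated assumptions, not a translation of the workbook: the function name `salary_vs_season_average` is hypothetical, and it assumes one row per player per season.

```python
from collections import defaultdict
from statistics import mean


def salary_vs_season_average(rows):
    """Return {(player, season): percent difference from the season average}.

    `rows` is a list of (player, season, salary), one row per player-season.
    """
    # Step 1: league average salary per season (the FIXED [Season] LOD).
    salaries_by_season = defaultdict(list)
    for _, season, salary in rows:
        salaries_by_season[season].append(salary)
    season_avg = {s: mean(v) for s, v in salaries_by_season.items()}

    # Step 2: each player's percent difference from that season's average.
    return {
        (player, season): (salary - season_avg[season]) / season_avg[season]
        for player, season, salary in rows
    }
```

With made-up numbers, a player earning 110 when the 1985 average is 100 and a player earning 1,100 when the 2016 average is 1,000 both come out at +10%, which is the comparison I was after.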


  1. First, I applied some filters to only include what I deemed "eligible" players. These are noted at the bottom of the viz.
  2. Now that I have the x-axis (salary variance from season average) and the y-axis (OBP), I created a scatter plot and added a point for each player for each season.
  3. I added reference lines for the average of each axis.
  4. The players on the upper left are the most cost efficient players. That led me to a quadrant chart, but I only wanted to highlight the most cost effective. I created a calculation to determine the points in that quadrant and place it on the color shelf.
  5. The problem now was that it was basically impossible to find a player in the viz. I thought about using a set action to drill in to a player, but that loses all of the context of the other players. Therefore, I created a parameter to allow the user to highlight a player and I show that player as a connected scatter plot.


  1. Players tend to be more cost effective earlier in their careers. That makes sense since they are on rookie contracts for the first few years of the career. 
  2. Once players sign their first big contract, they tend to either move to the upper right (high OBP, high relative salary) or the bottom right (low OBP, high relative salary). 
  3. Some players can sustain that for the rest of their careers, but that's rare. Typically it's the superstars that follow this pattern (like Barry Bonds or Chipper Jones).
  4. For many of the other players, as they approach the end of their career, they tend to move either to the lower right (high relative salary, low OBP) or the lower left (low relative salary, low OBP). Neither of these is particularly good for the team.

And here's my final product. I had never thought of combining a scatter plot and a connected scatter plot before. I'm quite pleased with how this turned out.

May 1, 2019

The UK's Most Popular Baby Names

Today DS13 was supposed to have most of the day to work on their client project. However, after a training session where I showed them how I approach a new data set and then design a dashboard, we brought Sophie Sparkes in to throw a surprise dashboard week challenge at the team. After all, they only had three days of dashboard week anyway.

Sophie's Challenge

While Tableau is an amazing tool, when you use it all the time you can fall into data-viz-auto-pilot mode. You build the same kinds of charts; you construct similar kinds of dashboards; you fall back on the same formatting styles. While familiarity with tool, and a workflow, is a good thing, it also narrows your view of what’s possible.

For today’s Dashboard Week challenge, I want you to step outside your data viz comfort zones and try building a viz using Flourish. Flourish is a free tool that lets your build interactive, responsive, and embeddable vizzes and data stories, all within the browser using your own data. Flourish is focused at the communication side of data viz (more than the data exploration side), and I’d like DS13 to really think about communication in today’s challenge.

Why Flourish? I really like their wide (and ever expanding) range of templates and interactivity (transitions, stories and ‘Talkies’ to name a few); also they are based in London – so why not viz-local?

Using any part (years, geographic locations, genders) of the England and Wales baby names data sets, I want DS13 to find and communicate one specific story from this data set.

Here are the rules for today, and what I’d like to see as output:

  1. They must work independently.
  2. Everything must be finished by 5pm.
  3. They must use Tableau and Alteryx for the data prep and exploration.
  4. The final viz must be made in Flourish.

My Approach

First, I had to get some data. I decided to download the data from the ONS for 1996-2016 because it was in a relatively decent format.

Next, I opened the "Plotting Competitors" example because I loved the animation. The great thing about Flourish is you can immediately use the template. All you need to do is upload your own data, assign the columns, and you're done!

This meant I had to do some data prep in Alteryx to get it in the correct shape. I needed the years across the view and the number of births for each name and gender. Then I filtered both the boys and girls to the 25 most common names, giving me 50 names in total.

I absolutely LOVED playing with Flourish and will definitely use it in an upcoming Makeover Monday.

Check it out! The animations are so so good! There were straight line and curved line options. I went for the curves. Enjoy!

April 30, 2019

#TableauTipTuesday: How to Make the Font Bigger than the Max Tableau Allows

In this video, I show you two methods for making the size of the font in a text field larger than the maximum 72pt font Tableau allows.

April 28, 2019

#MakeoverMonday: Space Station Spacewalks

For week 18, Eva chose this viz from NASA about spacewalks by Americans and Russians since December 1998.

Credit: NASA

What works well?

  1. The title provides a nice summary of the data.
  2. The stacked bar chart makes it easy to compare the US and Russia within a single year.
  3. The colors are easy to distinguish from each other.
  4. Since there is no axis, labeling the bars makes sense.

What could be improved?

  • The background image doesn't add any value and takes attention away from the chart.
  • It took me a minute to figure out which color went with which country. That should be more obvious.
  • Straighten the diagonal text for the years.

My Ideas

  • Consider other chart types: area chart, stacked area chart, barbell chart, line chart.
  • Consider other metrics like cumulative spacewalks or variance to some year.
  • Check the two country flags for their official colors.

In the end I went with a cumulative line chart with shading between the lines. I did this with a polygon. See this blog post from Rody Zakovich to learn how to create it.