VizWiz

Data Viz Done Right

May 20, 2019

#MakeoverMonday: Bear Attacks in North America

Continuing the animal theme from Eva last week, this week I provided the Makeover Monday Community a data set about bear attacks in North America since 1900.

Vox has an interesting article that includes the visualization we'll make over this week.


What works well?

  • The title summarizes the findings.
  • The sub-title provides context.
  • Including the source and author's name
  • Using a bar chart
  • Including the numbers in the bars
  • Including the gridlines to make the bars easier to compare
  • Good use of color

What could be improved?
  • The bear icons should be removed.

What I did
I really liked Hesham Eissa's viz this week, so I used that as inspiration for mine. 
  • His unit chart points downward, but I wanted mine to point upward.
  • I like his BANs, so I included some of my own, but different numbers.
  • I included the total for each month as he did.
  • Hesham's dots are colored by the location (US vs. other), while I colored mine by the type of bear.
  • I included a line chart to show the cumulative attacks by type of bear since 1900.

Thanks for the inspiration Hesham!!

May 16, 2019

The History of English Football Champions: 1888-2018

Last week I saw this really cool viz from Squawka Football on Twitter and wanted to see if I could rebuild it.

Given that this requires animation, I knew I needed a tool that supported this and I turned to Flourish. The data has to be structured in a very specific way, so I downloaded the data from Wikipedia, imported it into Alteryx for a bit of a massage, and spit it back out in the format Flourish required.

And voila! An animated viz of the history of English football champions from 1888-2018. Very little effort required + great animation = win!

May 13, 2019

#MakeoverMonday: Rhino poaching in South Africa - Is the decrease a real reversal?

Back from a successful 100-mile cycling event in Birmingham, I'm taking on Makeover Monday week 20. This week, Eva picked a data set about rhino poaching in South Africa. I had no idea this was a "thing" and I'm quite horrified that it happens. It reminds me of the show Whale Wars on Animal Planet, where a team of activists works to stop Japanese whaling.

This week's viz is a count of rhinos poached from 2006-2016.


WHAT WORKS WELL?
  • The title and subtitle tell me what the viz is about.
  • Using a single color
  • The design clearly shows that there was a steady increase in rhinos poached over a 10-year period until the decline in 2016.
  • Including the labels at the top of the unit chart for context.


WHAT COULD BE IMPROVED?
  • What value does each rhino image represent? Whatever it is, it's not accurate as the same number of units represent different values.
  • Is the rhino image necessary? I would try it without it.


WHAT I DID
  • I read the source article for context and to give me ideas for my analysis.
  • I changed the chart to a bar chart, which makes it easier to understand for me, and it makes the viz more accurate.
  • I named the title and added text based on what I read in the article.

With that, here's my Makeover Monday for week 20. Click on the image for the interactive version (though no interaction is necessary).

May 7, 2019

#MakeoverMonday: Top 10 Major League Baseball Home Run Hitters

Earlier today I saw this really cool viz created by Will Sutton that's an animation of the top home run hitters from 1985-2016.


As I posted last week, Sophie Sparkes introduced The Data School and me to Flourish. Flourish makes it super simple to create animated visualizations with tons of customization options. Given that Tableau doesn't support animations in the browser, this is a great alternative. Flourish provides an example, you import your data, do a bit of customization and voila! You have an animated viz.

The data needed to be structured with a column for each season, so I prepped the data in Alteryx and I included all seasons from 1912-2018. I then filtered down to players with 250+ career home runs (to make the list manageable).
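The reshaping described above was done in Alteryx. As a rough sketch of the same idea in Python, the snippet below pivots long rows into one row per player with a column for each season, after dropping players below a career total. The rows, the player names, and the lowered threshold are made up for illustration and are not from the actual data set.

```python
from collections import defaultdict

# Hypothetical long-format rows: (player, season, home_runs)
long_rows = [
    ("Player A", 2016, 40),
    ("Player A", 2017, 35),
    ("Player B", 2016, 10),
    ("Player B", 2018, 12),
]
MIN_CAREER_HR = 50  # the post uses 250; lowered here to suit the toy data

# Every season becomes a column, in chronological order
seasons = sorted({season for _, season, _ in long_rows})

hr_by_player = defaultdict(dict)
for player, season, hr in long_rows:
    hr_by_player[player][season] = hr

wide_rows = []
for player, by_season in hr_by_player.items():
    if sum(by_season.values()) < MIN_CAREER_HR:
        continue  # keep the list manageable
    row = {"player": player}
    for season in seasons:
        row[str(season)] = by_season.get(season, 0)  # 0 when a season is missing
    wide_rows.append(row)

print(wide_rows)
```

Whether each season column should hold that season's count or a running career total depends on the template; this sketch only shows the pivot.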

And here's my animated viz of the top 10 home run hitters of all-time.

May 6, 2019

#MakeoverMonday: Major League Baseball's Most Cost Effective Players

Since the Makeover Monday Community seemed to enjoy sports data two weeks ago, I thought I would provide some data about Major League Baseball this week. First, here's the original visualization to makeover:


What works well?

  • The title and subtitle explain what the viz is about.
  • Dividing the viz into two sections by using different background colors on the scatter plots
  • Consistent scales for the salaries across the charts for each section
  • Using gridlines to help the audience understand the approximate values of each point
  • Only labeling the type of stat once by putting the label between the players and teams charts

What could be improved?

  • There's no data source listed.
  • I have no idea why these players or teams are highlighted; an explanation is needed. At first, I thought it was highlighting the most effective player/team, but it's not (at least that's what I see).
  • The logos are meaningless for people that aren't familiar with the teams.
  • What does the big logo on the upper right represent? Is that the author?
  • The data should be filtered to players that meet certain criteria, like at bats in a season. This would then filter out many players near zero.

MY APPROACH

I liked the idea of using a scatter plot like the original, but I wanted to focus on a metric that better measures "effective". There are so many sophisticated metrics now in baseball. I didn't want to use any of those because they're hard for people to understand. I decided to use on-base percentage (OBP), which is the number of times a player reached base (H + BB + HBP) divided by at bats plus walks plus hit by pitch plus sacrifice flies (AB + BB + HBP + SF).
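To make the formula concrete, here is a minimal Python sketch of the OBP calculation as defined above. The function name and the sample season line are hypothetical and not taken from the data set.

```python
def on_base_percentage(h, bb, hbp, ab, sf):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    denominator = ab + bb + hbp + sf
    if denominator == 0:
        return None  # OBP is undefined when the denominator is zero
    return (h + bb + hbp) / denominator

# Hypothetical season line, for illustration only
obp = on_base_percentage(h=150, bb=60, hbp=5, ab=500, sf=5)
print(round(obp, 3))  # 215 / 570
```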

Why did I choose OBP? Ryan Kelley sums it up best in a post on Quora:

Outs are an extremely scarce resource in the economy of a baseball game, each team has 27 to use (in a 9-inning game) while trying to score as many runs as possible. Every time a batter makes an out therefore, the expected number of runs his team will produce will decrease (assume runs are also a limited resource for now).

A batter's job is to get on base--not make an out in other words. A batter fails to do his job when he makes an out; this failure percentage is 1 - OBP. The success percentage is OBP. If every batter had a perfect 100% OBP, their team would score an infinite amount of runs before ever making an out.

Now, because you're talking about value specifically. OBP alone isn't effective in measuring value. You can make it a better indicator of value by giving it context. That context depends on what kind of value you're talking about. 

You could tie OBP to a player's salary. This would give you an indicator of how valuable that batter was to his team in the context of a labor market. After all, baseball players are just employees of franchises in the end. Their jobs are to produce wins. A hitter's job is to produce wins via producing runs. Franchises make money by selling those wins to fans as entertainment.

Each team has a fixed amount of payroll to spend on wins, so the more payroll a batter's salary takes up, the less valuable he is to his team. A good way of illustrating a player's value would be OBP/$ of Salary.

Based on Ryan's explanation, I decided to use OBP as my proxy for batter effectiveness (y-axis). For the x-axis, I wanted to use salary for comparison. However, the data does not adjust salaries for inflation, so a 1985 salary is not expressed in 2016 dollars. Instead, I came up with a way to normalize the data across all of the seasons.

I created a calculation that compares a player's salary to the average salary of the entire league for each season. I made this a percent difference so that the data would then be normalized. Therefore, a player who was 10% above the 1985 average salary would be comparable to a player who was 10% above the 2016 average salary.

Here are my calculations:

  1. Season Average Salary: { FIXED [Season] : AVG([Salary]) }
  2. △ to Season Average Salary: (AVG([Salary]) - SUM([Season Average Salary])) / SUM([Season Average Salary])
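The same normalization can be sketched outside Tableau. The Python below is an illustration only: the rows and player names are invented, and it assumes one salary row per player per season.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (player, season, salary)
rows = [
    ("Player A", 1985, 300_000),
    ("Player B", 1985, 500_000),
    ("Player A", 2016, 3_000_000),
    ("Player C", 2016, 5_000_000),
]

# Step 1: league-wide average salary per season (the FIXED [Season] calculation)
salaries_by_season = defaultdict(list)
for _, season, salary in rows:
    salaries_by_season[season].append(salary)
season_avg = {season: mean(vals) for season, vals in salaries_by_season.items()}

# Step 2: percent difference between a salary and its season average
def pct_diff_from_season_avg(season, salary):
    avg = season_avg[season]
    return (salary - avg) / avg

normalized = {(p, s): pct_diff_from_season_avg(s, sal) for p, s, sal in rows}
print(normalized)
```

In this toy data, Player A sits 25% below the league average in both 1985 and 2016, so the two seasons land at the same x-axis position even though the raw salaries differ tenfold.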

BUILDING THE VIZ

  1. First, I applied some filters to only include what I deemed "eligible" players. These are noted at the bottom of the viz.
  2. Now that I have the x-axis (salary variance from season average) and the y-axis (OBP), I created a scatter plot and added a point for each player for each season.
  3. I added reference lines for the average of each axis.
  4. The players on the upper left are the most cost-effective players. That led me to a quadrant chart, but I only wanted to highlight the most cost effective. I created a calculation to determine the points in that quadrant and placed it on the color shelf.
  5. The problem now was that it was basically impossible to find a player in the viz. I thought about using a set action to drill in to a player, but that loses all of the context of the other players. Therefore, I created a parameter to allow the user to highlight a player, and I show that player as a connected scatter plot.
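The quadrant test in step 4 can be sketched as follows. This is a Python illustration with invented points, not the Tableau calculation used in the workbook; it assumes the reference lines are simple averages of each axis, as in step 3.

```python
from statistics import mean

# Hypothetical player-seasons: (label, salary % difference from season average, OBP)
points = [
    ("A", -0.5, 0.400),
    ("B", 0.8, 0.390),
    ("C", -0.4, 0.300),
    ("D", 0.9, 0.310),
]

avg_x = mean(x for _, x, _ in points)  # reference line on the salary axis
avg_y = mean(y for _, _, y in points)  # reference line on the OBP axis

def is_most_cost_effective(x, y):
    # Upper-left quadrant: below-average relative salary, above-average OBP
    return x < avg_x and y > avg_y

highlighted = [label for label, x, y in points if is_most_cost_effective(x, y)]
print(highlighted)
```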

SOME THINGS I FOUND

  1. Players tend to be more cost effective earlier in their careers. That makes sense since they are on rookie contracts for the first few years of their careers.
  2. Once players sign their first big contract, they tend to either move to the upper right (high OBP, high relative salary) or the bottom right (low OBP, high relative salary). 
  3. Some players can sustain that for the rest of their careers, but that's rare. Typically it's the superstars that follow this pattern (like Barry Bonds or Chipper Jones).
  4. For many of the other players, as they approach the end of their career, they tend to move either to the lower right (high relative salary, low OBP) or the lower left (low relative salary, low OBP). Neither of these is particularly good for the team.

And here's my final product. I had never thought of combining a scatter plot and a connected scatter plot before. I'm quite pleased with how this turned out.

May 1, 2019

The UK's Most Popular Baby Names

Today DS13 was supposed to have most of the day to work on their client project. However, after a training session where I showed them how I approach a new data set and then design a dashboard, we brought Sophie Sparkes in to throw a surprise dashboard week challenge at the team. After all, they only had three days of dashboard week anyway.

Sophie's Challenge

While Tableau is an amazing tool, when you use it all the time you can fall into data-viz-auto-pilot mode. You build the same kinds of charts; you construct similar kinds of dashboards; you fall back on the same formatting styles. While familiarity with a tool, and a workflow, is a good thing, it also narrows your view of what’s possible.

For today’s Dashboard Week challenge, I want you to step outside your data viz comfort zones and try building a viz using Flourish. Flourish is a free tool that lets you build interactive, responsive, and embeddable vizzes and data stories, all within the browser using your own data. Flourish is focused on the communication side of data viz (more than the data exploration side), and I’d like DS13 to really think about communication in today’s challenge.

Why Flourish? I really like their wide (and ever expanding) range of templates and interactivity (transitions, stories and ‘Talkies’ to name a few); also they are based in London – so why not viz-local?

Using any part (years, geographic locations, genders) of the England and Wales baby names data sets, I want DS13 to find and communicate one specific story from this data set.

Here are the rules for today, and what I’d like to see as output:

  1. They must work independently.
  2. Everything must be finished by 5pm.
  3. They must use Tableau and Alteryx for the data prep and exploration.
  4. The final viz must be made in Flourish.


My Approach

First, I had to get some data. I decided to download the data from the ONS for 1996-2016 because it was in a relatively decent format.

Next, I opened the "Plotting Competitors" example because I loved the animation. The great thing about Flourish is you can immediately use the template. All you need to do is upload your own data, assign the columns, and you're done!

This meant I had to do some data prep in Alteryx to get it in the correct shape. I needed the years across the view and the number of births for each name and gender. Then I filtered both the boys and girls to the 25 most common names, giving me 50 names in total.
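The filtering step was done in Alteryx. A small Python sketch of the same idea is below, keeping the top N names for each gender. The rows are invented, N is reduced to 2 to fit the toy data, and ranking by total births across all years is an assumption, since the post does not say how "most common" was measured.

```python
from collections import defaultdict

# Hypothetical rows: (name, gender, year, births)
rows = [
    ("Ann", "F", 1996, 50), ("Ann", "F", 1997, 40),
    ("Bea", "F", 1996, 30), ("Cat", "F", 1996, 95),
    ("Dan", "M", 1996, 70), ("Eli", "M", 1996, 20), ("Fox", "M", 1997, 60),
]
TOP_N = 2  # the post keeps 25 per gender

# Total births per (gender, name) across all years (assumed ranking basis)
totals = defaultdict(int)
for name, gender, _, births in rows:
    totals[(gender, name)] += births

top_names = {}
for gender in sorted({g for g, _ in totals}):
    names = [n for g, n in totals if g == gender]
    names.sort(key=lambda n: totals[(gender, n)], reverse=True)
    top_names[gender] = names[:TOP_N]

print(top_names)
```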

I absolutely LOVED playing with Flourish and will definitely use it in an upcoming Makeover Monday.

Check it out! The animations are so so good! There were straight line and curved line options. I went for the curves. Enjoy!

April 30, 2019

#TableauTipTuesday: How to Make the Font Bigger than the Max Tableau Allows

In this video, I show you methods for making the font in a text field larger than the 72pt maximum Tableau allows.

April 28, 2019

#MakeoverMonday: Space Station Spacewalks

For week 18, Eva chose this viz from NASA about spacewalks by Americans and Russians since December 1998.

Credit: NASA

What works well?

  1. The title provides a nice summary of the data.
  2. The stacked bar chart makes it easy to compare the US and Russia within a single year.
  3. The colors are easy to distinguish from each other.
  4. Since there is no axis, labeling the bars makes sense.

What could be improved?

  • The background image doesn't add any value and takes attention away from the chart.
  • It took me a minute to figure out which color went with which country. That should be more obvious.
  • Straighten the diagonal text for the years.

My Ideas

  • Consider other chart types: area chart, stacked area chart, barbell chart, line chart.
  • Consider other metrics like cumulative spacewalks or variance to some year.
  • Check the two country flags for their official colors.

In the end I went with a cumulative line chart with shading between the lines. I did this with a polygon. See this blog post from Rody Zakovich to learn how to create it. 
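For anyone unfamiliar with the metric, a cumulative line is just a running total per country, and the shaded area corresponds to the gap between the two running totals. Here is a minimal Python sketch with invented yearly counts (not the NASA figures):

```python
from itertools import accumulate

years = [1998, 1999, 2000]
# Hypothetical spacewalks per year, for illustration only
per_year = {"US": [2, 3, 4], "Russia": [0, 1, 2]}

# Running total per country
cumulative = {country: list(accumulate(counts)) for country, counts in per_year.items()}

# The gap between the two lines is what the shading shows
gap = [us - ru for us, ru in zip(cumulative["US"], cumulative["Russia"])]

for year, us, ru, g in zip(years, cumulative["US"], cumulative["Russia"], gap):
    print(year, us, ru, g)
```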

April 23, 2019

#TableauTipTuesday - How to create multiple lines in a single field with a line break

I first learned this week's tip from Jeffrey Shaffer during one of our tips battles at a Tableau Conference (I don't recall which one). I used this tip in my Makeover Monday week 17 viz and wanted to share this trick with you.

In this tip, I show you how to:

  1. Take two fields and combine them into a calculated field.
  2. Take the combined field and force the fields to split into two separate lines within the same field.
  3. Build a quick heatmap.

The workbook can be downloaded here.
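The idea behind steps 1 and 2 is simply to join the two values with a newline character. A minimal Python sketch is below; the function name and sample values are hypothetical. In a Tableau calculated field the same effect is usually achieved by concatenating a line-feed character between the two fields (CHAR(10) is the assumption here), but the video and workbook remain the reference for the exact steps.

```python
def two_line_label(first, second):
    # Join two field values so they display on separate lines in one cell
    return f"{first}\n{second}"

# Sample values for illustration
label = two_line_label("Golden State Warriors", "Oracle Arena")
print(label)
```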

April 22, 2019

#MakeoverMonday: Which NBA arena makes Stephen Curry's favorite popcorn?

Wow! What an incredible data set this week! A professional athlete tracking quantified-self data! And in this case, that athlete is NBA star Stephen Curry and the data is popcorn. Yes, that's right, Stephen Curry tracked data about popcorn from every NBA arena. According to this article from the New York Times, having some popcorn is part of his pre-game routine. Whatever works!

In the NYT article, they included a simple heatmap:


What works well?

  • A heatmap is a good chart choice. Because the highest rating is darkest, those pop out more.
  • Sorting the teams/arenas by the total score.
  • Including borders around each cell helps separate them.
  • Having both the team and arena together in a single cell but in multiple rows.
  • Including the rating for each field.
  • I love the data source!!

What could be improved?

  • Needs a better title
  • The diagonal rotation of the text makes the categories harder to read than necessary.

What I did

  • I really like the original, so I also created a heatmap.
  • I changed the colors to use the blue from the Golden State Warriors brand colors.
  • I made the category headers horizontal.
  • I created a calculation to include the team and arena in the same field, but on multiple rows.
  • I used viz in tooltip to show the rating across all categories for each team.
  • I included an option to allow the reader to sort by their most important factor.

April 15, 2019

#MakeoverMonday: Info We Trust - Word by Word Analysis


UPDATE (19-Apr)

Based on feedback from Eva and Jeffrey Shaffer during Viz Review, I've made the following changes:
  1. Removed the sorting from the table on the right. When you click on a word, it now stays in its position rather than moving to the top.
  2. Made the titles of the bar charts on the right more succinct.

Thank you Jeff & Eva for your feedback!!



Week 16 is here and in collaboration with RJ Andrews, author of Info We Trust, we are making over a word cloud he created based on the frequency of the 270 most popular words in the book.


What works well?

  • The most frequent words stand out because of their size.
  • The word cloud looks interesting, meaning it captures your attention.

What could be improved?

  • There are too many colors.
  • The words are rotated in different directions.
  • Sizing the words makes it difficult to compare them and rank them.
  • It's a book about data and the biggest word is data...go figure!

What I did

  • We had been practicing set actions last week at The Data School, so I thought I'd replicate this dashboard by Lindsey Poulter.
  • I wanted to rank the words and also rank the words within each section of the book.
  • I created a mobile version based on the template Tableau builds for you automatically.

With that, here's my Makeover Monday week 16. 

April 9, 2019

#TableauTipTuesday: How to Create a Hub & Spoke Diagram with a Union

In this week's tip, I show you how to use the Union feature in the connection pane to union a data set to itself in order to create paths between origins and destinations. This example uses airline routes, but it could have many other use cases, e.g., where bikes are picked up and dropped off in London.
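Conceptually, the self-union gives every route two rows: one for the origin and one for the destination, tied together by a shared path identifier. The Python sketch below illustrates that shape with made-up routes. The field names (path_id, point_order, airport) are hypothetical and are not the names used in the video.

```python
# Hypothetical routes as (origin, destination) pairs
routes = [("LHR", "JFK"), ("LHR", "DXB")]

def union_to_paths(routes):
    # First copy of the data: each row contributes its origin as point 1
    first_copy = [
        {"path_id": f"{o}-{d}", "point_order": 1, "airport": o} for o, d in routes
    ]
    # Second copy of the data: each row contributes its destination as point 2
    second_copy = [
        {"path_id": f"{o}-{d}", "point_order": 2, "airport": d} for o, d in routes
    ]
    return first_copy + second_copy

path_rows = union_to_paths(routes)
for row in path_rows:
    print(row)
```

With two rows per path, a line mark can connect point 1 to point 2 for each path_id, which is what draws the spokes.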

April 8, 2019

#MakeoverMonday: Cash Solvency of US States

For Makeover Monday week 15, we are looking at data about the fiscal conditions of US States. According to Mercatus:
States face many fiscal problems, but these problems are not insurmountable. Studying how each state is performing with regard to a variety of fiscal indicators can help state policymakers address persistent issues and anticipate potential problems. 

Mercatus produces this simple map to visualize the results:


What works well?

  • By using a map, people instantly know this is about geographical information.
  • Using a clear legend with distinct colors to indicate good vs. bad
  • Including the top 5 and bottom 5 as a summary/key finding
  • Overall, a very nice layout with the map on the left and the additional context on the right.
  • Including the numbers on the States for context.

What could be improved?

  • Do the colors work for the color-blind? I'd recommend running it through a color blind checker.
  • The States should be given equal size so that they are all equally visible. This would also remove the need for labels with lines pointing to the smaller States.
  • There's no definition for fiscal ranking.

What I did

  • I wanted to look at the data over time, but also look at all of the States at the same time. For this I used a tile map. I based it on a similar viz I created.
  • I wanted to give the user an option to compare years to a year they select so that they can see the change compared to a point in time.
  • I used color to indicate the positive or negative change vs. the year selected.
  • I created the calculations for each of the rankings and found cash solvency to be the most interesting, so I focused on that.
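The comparison against a selected year can be sketched like this in Python. The values are invented, the function names are hypothetical, and the actual viz uses a Tableau parameter rather than code like this.

```python
def change_vs_selected_year(values_by_year, selected_year):
    # Difference between each year's value and the value in the selected year
    baseline = values_by_year[selected_year]
    return {year: value - baseline for year, value in values_by_year.items()}

def color_for(change):
    # Color encodes only the direction of the change
    if change > 0:
        return "positive"
    if change < 0:
        return "negative"
    return "no change"

# Hypothetical cash solvency values for one State
cash_solvency = {2015: 1.0, 2016: 1.5, 2017: 0.5}
changes = change_vs_selected_year(cash_solvency, selected_year=2015)
colors = {year: color_for(change) for year, change in changes.items()}
print(changes, colors)
```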

And here's my viz for Makeover Monday week 15.

April 1, 2019

#MakeoverMonday: How much plastic waste has been found on UK beaches?

I've been particularly aware of the amount of plastic used and wasted. Keep in mind that plastic cannot biodegrade, therefore any plastic EVER created is still on Earth. Think about that for a minute. The plastic is washing up on the most remote islands.

Don't believe me? Watch Drowning in Plastic on the BBC. If this documentary doesn't change your mind about the amount of plastic you waste and the impact it's having, then you need to have a deeper look into your soul.

This week, Eva chose a data set about the waste found on UK beaches.

SOURCE: BBC

WHAT WORKS WELL?

  • Including the raw numbers, and how big they are, provides great impact.
  • They sort going down the page.
  • The title is clear, concise, and tells you what you are about to see.

WHAT COULD BE IMPROVED?

  • The infographic makes it appear as though this is ALL of the waste found on the beaches. However, it's only the top 10. You can see that if you read the original article Eva linked to.
  • The icons are cute, but are they necessary?
  • A simpler visualization, like a bar chart, would make the impact of the plastic more apparent.

WHAT I DID

As I did last week, I wanted to try out another tool. This week, I played around with Infogram. Here's what I found:
  1. Infogram is great for building simple infographics very quickly. 
  2. The customization options help you create a good looking visual.
  3. The interactions on the charts are super responsive.
  4. You can change the theme or chart type with one or two mouse clicks.
  5. There's no "publishing" required. It's already live to everyone once you create your graphic.
  6. The chart types are limited, but I suspect 90% or more of what you need is available. 
  7. If you want a chart to display the graphic a slightly different way, you may need to edit the data and either crosstab or transpose the data.

Overall, using Infogram was a pretty fun experience. I haven't used it for a while and it seems to have come a long way since then. With that, here's my Makeover Monday for week 14.

March 30, 2019

Groundwater Contamination and Cow Poo: A Major Contributor to Global Warming

This is a project I've been working on for a while now, mostly because time has not permitted me to finish it and I've had other "issues" to deal with. I've been doing lots of research about global warming, water contamination and whether or not the two are related.

A documentary I watched mentioned how methane from cows (i.e., cow farts) is a major contributor to greenhouse gases and how cow manure is a major source of nitrate released into groundwater used for drinking. Fortunately, there is tons of data available, the primary source being the Environmental Protection Agency (EPA).

I wanted to understand the geographical distribution of three factors:

  1. The percentage of each State with high groundwater nitrate concentrations.
  2. The total area (square miles) of each State with high groundwater nitrate concentrations.
  3. Where the cow crap comes from that pollutes groundwater used for drinking.

I decided to create a map for each of these topics, as a scrolling story, with three actions you can take to help reduce the impact of cow manure pollution. We all want safe drinking water after all.

March 26, 2019

#TableauTipTuesday: Create a Region to State Drill Down Map with Set Actions

In this tip, I show you how to use set actions to create a map that allows the user to click on a region and show the states for that region, while keeping all other areas at the region level.
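The logic behind the drill down boils down to one rule: if a row's region is in the selected set, show its state; otherwise show its region. The Python below is a sketch of that rule with made-up rows and a hypothetical function name. In Tableau, the set action is what updates the set membership when the user clicks.

```python
def map_level(region, state, selected_regions):
    # Rows in a selected region drill down to the state; the rest stay at region level
    return state if region in selected_regions else region

# Hypothetical rows: (region, state)
rows = [("West", "California"), ("West", "Oregon"), ("East", "New York")]
selected_regions = {"West"}  # what a click on the West region would put in the set

levels = [map_level(region, state, selected_regions) for region, state in rows]
print(levels)
```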

March 25, 2019

#MakeoverMonday: Consumer Spending by Generation

For week 13, we're making over this viz from Business Insider:


What works well?

  • The generations are sorted from youngest to oldest.
  • The title is clear.
  • The gridlines help guide the eye across the viz.
  • It's easy to compare the general/misc category and the restaurants across generations.
  • A stacked bar chart is easy to understand.

What could be improved?

  • The story in the data, from the article, is about how millennials are spending more on restaurants. It would be good to make that a more obvious focus of the viz. 
  • There are too many colors.
  • While the title is clear, if you don't read the article, you could miss the purpose for the chart.

What I did

I really enjoyed using Google Data Studio last week, so I thought I'd give it another try to continue my learning. Since this was a simple stacked bar chart, I wanted to create a "set" for restaurants vs. all others. I needed to create a calculated field using a case statement that checks the category field. That's it!
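The grouping works like a two-way case statement: one bucket for restaurants and one for everything else. Here is a rough Python equivalent of that logic. The category names and amounts are invented, and the exact field and category labels in the data set are assumptions.

```python
from collections import defaultdict

def spending_group(category):
    # Two buckets: restaurants vs. everything else
    if category.strip().lower() == "restaurants":
        return "Restaurants"
    return "All others"

# Hypothetical rows: (category, amount)
rows = [("Restaurants", 120), ("Groceries", 200), ("Travel", 80)]

totals = defaultdict(int)
for category, amount in rows:
    totals[spending_group(category)] += amount

print(dict(totals))
```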

From there, it was formatting, which is pretty intuitive as well. I'd highly recommend you give Data Studio a try, especially if you know exactly what you want to build; it's not a data exploration tool.