Data Viz Done Right

November 18, 2013

VizCup Champ–Eric Rynerson: Seeing Red: Home Crowds Bay for Blood; Referees Bend

The first ever VizCup Champ, Eric Rynerson, describes his viz and his VizCup experience in his own words below.  Eric admits that this post is long, so he provides the viz early on in case you don’t want to read about his experiences.  Also note that this is a static image of the dashboard.  There’s a bug in Tableau Public that is causing it not to display the referee images.  To play with the workbook, which I would encourage you to do, download it here or click the image.

First I’d like to say that participating in the competition was a great experience; the other entries were impressive, and I was honored to have won.

I had a few reflections on the ride home; I was going to write these down to share with my colleagues back at InsideTrack (where we use Tableau and other tools to explore the impact of our coaching program on student engagement and persistence) so when Andy offered me a guest post here on VizWiz I thought it might be helpful to share them more widely. Below is my viz, and following it are “Five Things I Learned from the Facebook Viz Cup”.

The dashboard answers three questions:
  1. Do referees punish the visiting team more than the home team? (Definitely. Hover over the regression line to see the coefficient: Way higher than 1.0, indicating more punishment for the visiting team. As I note, the yellow card bias is weaker, suggesting it’s not just gameplay factors)
  2. Are some referees a lot more or less biased? (Yes, though only a few really look like outliers)
  3. Do they gain the confidence and maturity to ignore the home crowd’s pressure as they gain experience? (Here’s where I made it interactive- click around in the bubble chart and look at each ref’s personal history on the right; I think you’ll agree that generally they do not get any better)
  • Bonus: What does each one look like? Click on a ref in the bubble chart to see images of them from the web in the lower section of the dashboard
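To make the first question concrete, here is a minimal sketch of how a coefficient like the one on the regression line could be computed. Everything in it is an assumption for illustration: the referee names and card counts are made up, and the fit is a simple least-squares line through the origin, which may not match how the dashboard's trend line was actually fit (for example, it may include an intercept).

```python
# Hypothetical per-referee totals: (red cards to home teams, red cards to away teams).
# These numbers are invented for illustration; they are not from the Premier League data.
cards_by_ref = {
    "Ref A": (4, 7),
    "Ref B": (6, 9),
    "Ref C": (2, 5),
    "Ref D": (8, 11),
}


def slope_through_origin(pairs):
    """Least-squares slope b for the model away = b * home (no intercept)."""
    sum_xy = sum(home * away for home, away in pairs)
    sum_xx = sum(home * home for home, _ in pairs)
    return sum_xy / sum_xx


slope = slope_through_origin(list(cards_by_ref.values()))

# A slope above 1.0 means away teams collect more red cards than home teams
# across these (hypothetical) referees.
print(f"away-vs-home slope: {slope:.2f}")
```

Running the same calculation on yellow cards and comparing the two slopes is one way to check the point about gameplay factors: if the yellow card slope is closer to 1.0, the gap is less likely to come from how visiting teams play.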

Five Things I Learned from the Facebook Viz Cup

1) Failure is okay- just do it fast, and don’t give up
Sometime after midnight on Tuesday, the first of just two evenings I would have to explore the data before the competition, it occurred to me that I was failing.
My plan was to investigate whether local rivalries in soccer really do produce unpredictable results (as the cliché goes, “in the derby the form book goes out the window”), and I really wanted to find the answer. However, whatever I did to get the final output I’d have to repeat on-site in under an hour and then explain it all in 90 seconds to three judges and a room full of people…including a required interactive component.

The problems were multiplying: To answer the question directly I needed to transform the data or compensate with a lot of calculated fields (which would eat up precious minutes Thursday evening), there just weren’t enough crosstown rivalries for consistent patterns to emerge, I kept thinking of variables I needed to take into account that were hard to get or complicated to explain, and so on.
I honestly started to wonder what gave me the idea I had any business driving down to Facebook Headquarters to compete in a data visualization contest. I started to imagine computer science PhDs blowing away the audience with 3D network graphs that respond to a clicked tweetID with a dazzling cascade of retweets to the tune of spacey New Age music. Upon later reflection I suspect it takes longer than an hour to code up something like that, but it was after midnight and my confidence was wavering.

I decided that as long as I avoided embarrassing myself it would be worth the trip just to learn what everyone else did with the same datasets, how they did it, what tools they used and how they learned them. So I vowed that I would revisit the question briefly at lunchtime and then most likely start over with a clean slate on Wednesday evening, perhaps looking at red card distribution. Thank goodness I did! I’m not sure I really “failed fast”, or fast enough anyway, but at least I didn’t give up when my wheels were spinning.

2) Play to your strengths
I was quite excited to see the Premier League dataset among the seven options. I’ve been following Barcelona since 2006-7, Arsenal since roughly 2009-10, and the Seattle Sounders since they joined the MLS in 2009, so coming up with interesting questions to explore was easy. For example, if you only have 45 minutes to spare for a game should you watch the first or the second half? Does more offensive pressure (shots off target, shots on target, corners, etc.) lead to more goals? Do all the player trades teams make in January actually change their trajectories in the second half of the season?

I’d like to believe I would have found compelling stories in the other datasets provided, but I think having a hunch as to where to look for interesting findings helped a lot given the time constraint. I imagine that was also a factor for the participants who work routinely with social media or location data (and built from those datasets)- in fact some of them made comments to that end.

3) Focus on the story
There were some great entries that didn’t get much recognition despite solid functionality and depth, and I think it’s because their presenters emphasized what the user can do (click here to go there, select here to filter that) rather than why they should do it- or missed the opportunity to pick one storyline and follow it to its conclusion using the dashboard.
It’s tempting to leave it to the audience to draw their own conclusions, especially for analytical people (who tend to make claims cautiously, as they fear being wrong even more than they love being right). The problem is that people don’t engage with neutral information the way they engage with an argument- so make one! You can use interactivity to give them a chance to undermine your thesis, or figure out when it works and when it doesn’t, or whatever. But give them something to react to.

By the way, I think this extends beyond static analyses- the best live reports are the ones designed by people who begin by thinking about what questions the report is supposed to answer. You might be interested enough in a report promising “measure X by dimensions A and B”, but you’d be more interested in seeing “last week’s performance leaders and laggards” or “current pipeline bottlenecks”.

4) Keep it simple
Ninety seconds is a very short time. Having to present so quickly was actually a bigger constraint than having an hour to put the viz together: I kept having to remind myself this was not an analysis competition, and that I would only have time to convey two or three points to the audience anyway. So I needed to keep the question simple and the answer simple as well. This is hard for analytical people- we like to think of all the potential holes someone could poke in our argument and fill them with supporting analyses.

Fortunately I had taken some inspiration from watching Ryan Sleeper win the Iron Viz competition at this year’s Tableau conference. Instead of going for analytical depth (I should mention here he had only twenty minutes to build something), his viz answered a simple question and it was very clear how the user was supposed to interact with it. I pulled it up several times as I was iterating toward my final version and the example helped me decide to leave several things out of the main dashboard.
For example, I almost worked in another graph comparing yellow card bias- instead I just included a note and a verbal mention during the presentation. I also wanted to add a section with the list of games in which the selected ref had handed out red cards, as well as another section driven by that list which provided Google News results for the game you clicked on so you could investigate whether the cards were controversial. I was pretty certain this would be awesome, but in the end I worried that all the extra content would make the dashboard look like someone’s personal website from the 1990s (too many frames and scrollbars). I suspect the clutter would have cost me.

5) People love pictures
I usually focus on findings and next steps in my day-to-day communication and don’t often take formatting beyond “professional-looking”, so the other thing that struck me at this year’s Iron Viz competition was how much a couple relevant images from the web or some thematic formatting could add to the viewer experience.

That was part of my inspiration for spending about a third of my prep time Wednesday evening just on the portion of the dashboard that pulls each referee’s images from the web: Trying different sites, reading about the variables in Google’s URLs and experimenting with them, seeing which search terms were most reliable, making sure the user doesn’t have to scroll, etc. It was totally worth the time.

A gif here and there made a lot of the dashboards more fun- like Mike Evans’ alien with the peace sign- though my favorite was the cartoon steer being abducted via tractor beam (bunchball’s dashboard).

Lastly, the verbal picture can be just as impactful. Since most of the audience had likely never attended a game in a soccer-mad country (England, Argentina, the Pacific Northwest…) I related the referee’s plight for them when introducing the questions I was answering. It was something along the lines of: “Imagine having to make a hugely consequential decision in a split second… now imagine 70,000 people screaming at you to make one particular choice. And knowing that they might follow you to your car if they don’t like your decision. You can imagine that it might be hard to stay completely objective!”

So that’s what I learned. Drop me a line if it was helpful; I’d love to hear from you.

November 15, 2013

VizCup roundup–2nd place: Mike Evans, UFO Sightings on Your Birthday

This is a guest post from Mike Evans, the second place finisher at the Facebook VizCup.  I love the clean design and the way that it draws you in to play with it.  Many of us would not have been surprised if this was the winner.  And the fact that Mike flew up from LA just for this event…wow!  Truly inspirational!

Panic.  It was late. The night before the first-ever VizCup at Facebook Headquarters.  I needed a phenomenal data visualization that I could reproduce from scratch on site within a 60 minute time limit.  The data set on UFO sightings I’d been poring over revealed no compelling story.  I’d come up all the way from Orange County to compete and if inspiration didn’t strike soon, I was going to be embarrassed.  I switched to a soccer data set.  I know very little about soccer, but I’m generally a sports fan.  I quickly had a semi-interesting question.  “Can your team pull off a comeback in the Premier League when losing at halftime?”  The visual looked ok.  

Comeback Premier League

Something wasn’t right though.  Do soccer fans call it “halftime”?  When I’m presenting this should I actually call it soccer or football?  The other thing I didn’t like was that what I really wanted to answer was how likely a team was to win when coming from behind at different points in the game.  That couldn’t be answered from the data set.  Plus I’d only have an hour during the competition to build a great viz and I already had to do a bunch of reshaping of the soccer data to answer this question.  Reshaping was time consuming and increased my chances of making a mistake or encountering a technical issue.

So I switched back to the UFO data.  Stupid UFO data.  Maybe there was a relationship between “The X-Files” episode airing dates in the 1990s and UFO sightings?  Did the higher rated episodes correlate to higher sightings that night?  That was a dead end.  Stupid, stupid UFO data.  A little while later I’d plotted out the average delay in reporting UFOs by region.  Turns out that some UFO sightings were reported many decades after they were seen, and different states had different averages.  Yawn.  I grew bored and took a break.  I started wondering if there might have been a UFO sighting on my birthday, Jan 16, 1977.  Turns out there wasn’t.  Hmmm, well what about any sightings on Jan 16 of any year?  Cool, there were a bunch.  Is that a high number compared to other days?  I wonder how many were on my anniversary?  Wait!  This is a great idea for a viz!!!
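The birthday question boils down to tallying sightings by calendar day while ignoring the year, then seeing how one day compares with the rest. Here is a minimal sketch of that idea. The dates are made up and the function names are hypothetical; the viz itself was built in Tableau, so this is only an illustration of the logic, not how it was actually done.

```python
from collections import Counter
from datetime import date

# Hypothetical sighting dates; the real UFO data set is not reproduced here.
sightings = [
    date(1995, 1, 16),
    date(2004, 1, 16),
    date(2004, 7, 4),
    date(2010, 7, 4),
    date(2012, 7, 4),
    date(1999, 10, 31),
]

# Tally sightings by (month, day), ignoring the year.
by_day = Counter((d.month, d.day) for d in sightings)


def sightings_on(month, day):
    """Number of sightings on this calendar day across all years."""
    return by_day[(month, day)]


def rank_of(month, day):
    """Rank of this day by sighting count (1 = most sightings; ties share a rank)."""
    count = by_day[(month, day)]
    return 1 + sum(1 for other in by_day.values() if other > count)


print(sightings_on(1, 16), rank_of(1, 16))
```

A day with no sightings simply counts as zero, because a `Counter` returns 0 for missing keys.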

That was really the inspiration for my viz and I spent my remaining prep time coming up with the best design I could think of that would still be easily reproduced in less than an hour in an unfamiliar environment competing against some of the most talented people in the industry.  And so the “UFOs on Your Birthday?” Viz was born and earned the “First Runner Up” status at the Inaugural Facebook Viz Cup.

November 14, 2013

VizCup Roundup–3rd place: Micah Rice, an analysis of natural disasters

This is a guest post by Micah Rice, the third place finisher at the Facebook VizCup.  In his day job, Micah is a Strategy Consultant for Wells Fargo.

I was interested in the natural disasters data set for two reasons: 1) because I work in a location analysis group and I wanted to use maps and geo-spatial analysis, and 2) because I am interested in using data to solve big problems that impact people’s lives.  The first question I wanted to answer was where in the world disasters happen: most occurrences and most people affected (Dashboard 1). A quick map revealed that China and India have the highest number of events and people impacted. The U.S. has a high number of events, but low population affected. 

The second question I had was what type of disasters cause most of the damage and how deadly are they (Dashboard 2). As it turns out, drought affects the most people by far, but extreme temperature kills a much higher proportion of those affected. I plotted this change over time and while the number of events recorded has increased tenfold, the proportion killed has actually decreased over time.  That got me thinking about what might be causing the disparities between number of events, people affected, and proportion killed.

I thought a country’s economic ability to deal with a disaster was a good place to start, so I did a quick search and found a GDP by country data set online and created a little interactive toggle to bring in Per Capita GDP cohorts into the analysis. This new data revealed that not only are poorer countries more likely to experience natural disasters, but they see much higher rates of mortality in several disaster categories.

This made me question what could be done about this from here in the Bay Area (Dashboard 3). I plotted the number of disasters and the distance from San Francisco in a few different ways and found that if an aid organization wanted to be well-located in order to respond quickly to these disasters, they would not be located here, but rather somewhere near the Middle East or South Asia.
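For readers curious how a distance-from-San-Francisco comparison might be computed outside of Tableau, here is a minimal sketch using the haversine great-circle formula. The event locations and counts are hypothetical (two city coordinates standing in for disaster sites), and `mean_distance_km` is an illustrative helper, not something taken from the original dashboard.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
SAN_FRANCISCO = (37.7749, -122.4194)  # (latitude, longitude) in degrees

# Hypothetical events: ((latitude, longitude), number of disasters).
# The coordinates are New Delhi and Beijing; the counts are invented.
events = [
    ((28.6139, 77.2090), 3),
    ((39.9042, 116.4074), 2),
]


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def mean_distance_km(base, weighted_events):
    """Average distance from a base to the events, weighted by event count."""
    total = sum(count for _, count in weighted_events)
    return sum(haversine_km(base, loc) * count for loc, count in weighted_events) / total


print(round(mean_distance_km(SAN_FRANCISCO, events)))
print(round(mean_distance_km((28.6139, 77.2090), events)))
```

Comparing the weighted average for several candidate bases is one simple way to see which location would be closest, on average, to where disasters occur.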

While this analysis does not solve any real problems, I was very happy to think that good visual analytics might serve to improve the logistics around disaster response, and possibly even inform policy around what factors contribute to the unevenly-concentrated human impact we saw in the data.

November 13, 2013

VizCup roundup–data viz hackers unite!


Last Thursday evening, Facebook hosted 80 people for the VizCup, a data visualization competition.  We provided eight data sets two days ahead of time for the participants to choose from.  The data ranged from UFO sightings, to natural disasters, to the Premier League, to Foursquare check-ins, to stats about blogs, pages and portals.  They also had the option to supplement the data with data of their own, like corn yields to see if there’s a relationship to UFO sightings.

We allowed the participants to self-organize into teams or they could work on their own. In the end, we had 25 individuals or teams present. 

Ken Rudin, who leads all of analytics at Facebook, kicked off the evening with a few words about what it means to be an analyst and the future of analytics. Ken was followed by our three awesome judges, who came out to loud cheers and a bit of Crazy Train for intro music.  Anya A’hearn of datablick, Drew Skau of and Cole Nussbaumer of storytelling with data served as our judges.  Check out Cole’s review of the VizCup.  When the hacking began, the judges roamed the room to get a feel for what people were building.

One interesting note was that every participant, except one, chose to use Tableau to build their viz, despite being given the freedom to use whatever tool they wanted.  Personally, I think this is for two primary reasons:

  1. Tableau’s ease of use and the ability for a user to build something meaningful in an hour
  2. The enthusiasm and passion of the Tableau community.  What I mean by that is Tableau’s users look for any excuse to use Tableau in their free time.  Mike Evans, our 2nd place finisher, even flew up from LA!!  Now that’s passion!

There will be summaries of the top 3 finishers coming soon, written in their own words.  But first, here are a few pictures to give you a feel for the atmosphere (thanks to Peter Bickford of Slalom Consulting for many of the pictures).  Farther down in this post, you can see some of the entries submitted.  We’re super excited to host the event again soon!

November 4, 2013

Data Viz, Facebook, and you. Join our awesome team!

On our Data Warehousing & Reporting team, we're looking to bring on full-time employees, contract-to-hire, and/or full-time contractors.

An overview of the role can be found on Facebook Careers.

It's been quite challenging to find strong candidates.

Why don’t people meet the bar?

  • Overstating Tableau skills
  • Overstating SQL skills
  • Too much dependency on tools to do the thinking
  • Lack of product sense
  • Too much dependency on data engineers

Having been through them, I can honestly tell you that interviews at Facebook are tough.  We will test your limits.  We don't expect everyone to know everything, but we do expect you to do your best to work through problems and ask questions when you don't know the answers.

It's clear that we want you to know SQL, yet we have candidates come in who haven't done anything to come up to speed on SQL.  All that does is show us that you didn't take the initiative to learn beforehand when you knew it was a weakness.

What makes a good candidate?
  • Strong ability to tell stories with data
  • Strong in data visualization
  • Competent in SQL
  • Excellent product sense (i.e., can you pick up product goals, concepts and requirements quickly?)
  • Tool-agnostic critical thinkers
If you think you'd be a good fit or if you want to chat in more detail, email the Facebook Data Viz team.