Launch, grow, and unlock your career in data

May 29, 2015

Dear Data Two | Week 6: Physical Contact

What an amazing week for me for Dear Data Two! The topic for week 6 was physical contact, and I've been learning Alteryx. The first thing I thought of was physical activity, not physical contact, so I emailed Jeffrey and asked him if he was ok with me taking such liberty with the topic. Fortunately Jeffrey was ok with my idea, but then I decided I could stick to the original topic by extending my thinking a bit.

I'm a huge quantified self data collector, which you'll likely see throughout my Dear Data Two work. I wanted to see how I could use Alteryx to help me get the data into Tableau for analysis before creating my analogue version because I feel like the best way to learn a new tool is to find a practical application. This is the first workflow I built on my own in Alteryx. It might not be the most elegant or most efficient, but I sure did learn a lot along the way. You can download this workflow here.

One of the things I have started to like the most about Alteryx is that I can push all of the complicated row-level calculations that I used to do in Tableau to Alteryx, which in the end makes Tableau much faster. For example, I used the multi-row tool to calculate the distance between two geographic points recorded by my watch.
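For anyone curious what that kind of row-to-row calculation looks like outside of Alteryx, here is a minimal Python sketch. It assumes the haversine (great-circle) formula and a list of (latitude, longitude) track points; the function names are hypothetical, and this is not necessarily the formula used in the workflow.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def segment_distances(points):
    """Distance from each point to the previous one, like a multi-row calculation.

    `points` is a list of (lat, lon) tuples in recording order.
    The first point has no previous row, so its distance is 0.0.
    """
    if not points:
        return []
    distances = [0.0]
    for prev, curr in zip(points, points[1:]):
        distances.append(haversine_km(prev[0], prev[1], curr[0], curr[1]))
    return distances
```

Summing the list returned by `segment_distances` gives the total distance of a run.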

From there, I created the dashboard below to explore the data. In particular I wanted to view the maps and see the summary stats.  One thing I learned is that I need to figure out a way to account for times that I paused my watch; that data doesn't appear in the GPX files.

Exploring the data in Tableau helped me quantify my runs for the week, but that didn't account for all of my physical activity. To capture ALL of my activity:

  1. I noted my total daily steps from Fitbit.
  2. I calculated the number of steps for my runs by taking my stride rate of 184 strides per minute from TomTom and multiplying by the minutes I ran in Strava.
  3. I subtracted my running steps from the total steps to get my walking steps.
  4. I used the time of day that I ran and roughly calculated the proportion of walking steps before and after each run each day.
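The four steps above can be sketched in a few lines of Python. This is only an illustration: the function name and the `share_before_run` parameter are hypothetical, the share itself is a rough estimate you supply, and the sketch treats the stride rate as steps per minute, as the steps above do.

```python
STRIDES_PER_MINUTE = 184  # stride rate from TomTom, treated here as steps per minute


def split_steps(total_steps, run_minutes, share_before_run):
    """Split a day's total steps into running and walking steps.

    total_steps      -- daily total from Fitbit
    run_minutes      -- minutes run that day, from Strava
    share_before_run -- rough fraction (0 to 1) of walking steps taken before the run
    """
    running = round(STRIDES_PER_MINUTE * run_minutes)
    walking = max(total_steps - running, 0)  # guard against a negative remainder
    walking_before = round(walking * share_before_run)
    walking_after = walking - walking_before
    return {
        "running": running,
        "walking_before": walking_before,
        "walking_after": walking_after,
    }
```

For example, `split_steps(15000, 30, 0.4)` (made-up inputs) gives 5,520 running steps, 3,792 walking steps before the run, and 5,688 after it.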

That resulted in this draft, which is sort of like a dot matrix:

For the final version, I colored the dots: each blue dot represents 200 steps walking and each red dot represents 200 steps running. I rounded the numbers for drawing purposes.
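Converting step counts into dots is simple arithmetic. Here is a small sketch that assumes rounding to the nearest whole dot (halves round up); the exact rounding used for the drawing may have differed, and the function name is hypothetical.

```python
STEPS_PER_DOT = 200  # one dot per 200 steps, as in the drawing


def dots_for(steps):
    """Number of dots for a step count, rounded to the nearest dot (halves round up)."""
    return (steps + STEPS_PER_DOT // 2) // STEPS_PER_DOT
```

With made-up inputs, `dots_for(5520)` returns 28 and `dots_for(9480)` returns 47.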

You can view the images in the Tableau dashboard above as well, but note that as you're exploring the dashboard, when you click on the tabs that contain images, they will take several seconds to load. I've reported this bug to Tableau.

I really learned a ton this week thanks to Dear Data Two because I found a great use case for Alteryx. Not only did I learn a bunch of Alteryx tools I hadn't learned in the training I took at Inspire15, but I also learned how to do row-level calculations in Alteryx and how those can help Tableau performance.

May 25, 2015

My First Alteryx Inspire Conference

Do you remember that feeling you had when you first started using Tableau? It’s the eureka moment that gets you. I’m going through this exact same set of emotions right now…with Alteryx. And it started at Inspire15 in Boston. A conference that felt very, very similar to a Tableau conference, in terms of its content, energy and audience.

I first got a demo of Alteryx from George Mathew back in San Diego at TCC12. I was working for Facebook at the time, and Mike Roberts from InterWorks set up the meeting, but I didn’t see a particularly good use case for Facebook right away. Why? Facebook Data Engineers have always coded (and probably always will code) their own pipelines.

Fast forward to Boston. Inspire15. I’m now working at The Information Lab and Alteryx is key to our business. I knew I needed to learn what all the fuss was about. Chris Love, retired Grand Prix Champ, helped me sort out which classes to take.

The day before heading to Inspire, I was sitting with Robin Kennedy and told him that I wanted to get a head start on my training. Lo and behold, he showed me all of the fabulous training modules that are built right into Alteryx. I had no idea! I completed about 15 of these on the flight to Boston.

After watching Arsenal draw 1-1 in a drab affair Sunday morning, I headed to the first of three training sessions: Predictive Analytics for Beginners. In this class I learned how to apply different data investigation techniques to help me understand how predictive a data source may be. The instructor showed us how to use the Association Analysis, Violin Plot, and Field Summary tools.

The workflow that we created...

…resulted in this series of violin plots (apologies for the blurry image).

From here, I attended Predictive Analytics for Intermediate Users, which was basically a continuation of the first class. In this session, I learned how to use regression analysis to help predict potential response rates to targeted marketing campaigns. Tools used in the session included Logistic Regression, Decision Tree, and Forest Model.

The regression analysis workflow we created...

…resulted in a series of tables and this chart (which shows that charting is not in the Alteryx sweet spot).

The third and final class I attended on Sunday was Intermediate Macro Development. This was a pretty simple class in which we built a workflow + macro to strip the headings from a messy Excel spreadsheet.

Monday afternoon our team went on an amazing Segway tour of Boston.

Monday evening, Alteryx hosted a really nice welcome reception, including the Grand Prix, which our very own Craig Bloodworth qualified for. 

Dean Stoecker kicked off Tuesday with a great keynote about Analytics Independence. Alteryx does this kind of strange thing during their keynotes where they bring up customers for on-stage interviews. I’m not sure why they do it. Personally, I found that the interviews disrupted the flow of the keynotes and didn’t really add any value, but that’s my opinion.

I wanted to be sure to attend a few customer stories to get a better feel for how people are using Alteryx with the hope that it would give me some ideas on how I can use it. The first customer session I went to was by Sprint, and it was a stinker! The content was mediocre at best and the speakers were not very good. The second speaker stood at the front, faced his presentation with his back to the audience, and simply read the slides. It was THAT bad. Here are a couple of screenshots of their slides if you don’t believe me.

In the afternoon, I attended a great session by Ramnath Vaidyanathan, a Data Scientist at Alteryx, on the interactive visualizations for predictive analytics that were introduced in Alteryx 9.5.

Tuesday night was capped by an incredible party at The House of Blues with an Aerosmith tribute band, “Draw the Line”. The band was incredible. It was clear the lead singer really wanted to be Steven Tyler, all the way down to the cosmetic surgery.

The Information Lab team ALWAYS has fun!

The final day of Inspire15 kicked off with a sensational keynote by George Mathew, in which he talked about the future of Alteryx and brought a developer on stage to demo the features coming in Alteryx 10.

Nice photobomb by the TIL team!

The quantified self work of Tim Ngwena of TIL was a keynote highlight!

I took one more training class in the late morning, Predictive Analytics for Advanced Users. My Inspire15 concluded with one of the best talks I’ve ever seen at any conference. The Information Lab’s Chris Love, Tim Ngwena and Craig Bloodworth gave an amazing talk they titled “From Data Hobbyist to the Boardroom”. It was chock full of great work that is reusable and applicable to everyone. Well done lads! You can watch the video of their talk below.

In the end, Inspire15 was a fantastic experience for me, a new Alteryx user. I’ve already started applying what I’ve learned and am working on two blog posts. My only regret is that I didn’t start using Alteryx sooner.

Stay tuned!

May 23, 2015

Dear Data Two | Week 5: Things We Buy

Week 5 of Dear Data Two was pretty straightforward, but took a bit of data blending to make it work. I had three goals for this week:

  1. Track every purchase that I make
  2. Categorize each purchase by the type of goods
  3. Record the location of each place where I made a purchase

Precise times were taken from purchase receipts, along with the categorisations. I then recorded the locations of each place by Swarm check-in, which were uploaded to a Google Sheet via IFTTT. I downloaded both sets of data into Excel and manually joined them (there were only 19 records so it wasn’t much effort to do manually).
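With more than a handful of records, the join could be scripted instead of done by hand. The sketch below is one possible approach, not what I actually did: it assumes each purchase and each check-in carries a timestamp and matches every purchase to the check-in closest in time. The field names (`time`, `venue`) and function names are hypothetical.

```python
from datetime import datetime


def nearest_checkin(purchase_time, checkins):
    """Return the check-in whose timestamp is closest to the purchase time."""
    return min(checkins, key=lambda c: abs(c["time"] - purchase_time))


def join_purchases(purchases, checkins):
    """Attach the venue of the nearest-in-time check-in to each purchase.

    Both inputs are lists of dicts with a `time` key holding a datetime;
    check-ins also have a `venue` key. Input dicts are not modified.
    """
    joined = []
    for purchase in purchases:
        checkin = nearest_checkin(purchase["time"], checkins)
        joined.append({**purchase, "venue": checkin["venue"]})
    return joined
```

A nearest-time match can pick the wrong venue when two check-ins happen close together, so it is still worth eyeballing the result on a small data set.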

I then explored the data in Tableau, to see what stories I could find, if any. This week took me longer than I was expecting, mostly because I was having trouble finding anything interesting in the data. The one point that stuck out the most is that I spent more on ice cream than Mother’s Day. Oops! Please don’t tell my mom.

May 19, 2015

May 18, 2015

Makeover Monday: How Much Water Is Used to Produce Your Food?

Quick makeover this week (we have a Segway tour of Boston at #Inspire15 in 30 minutes). I saw this graphic in the LA Times about the amount of water it takes to produce a single ounce of food.

It’s cute and it’s interactive, but it’s not very good for making comparisons or ranking. Bubble plots are notoriously difficult this way. For example, tell me quickly which food uses the 3rd most water? Tough to tell, right? I also don’t understand why they grouped fruits and vegetables together.

I manually recreated the data in Excel, which you can download here. Hopefully I recorded everything correctly; if not, please let me know. I then quickly built a chart in Tableau. I’ve addressed the issues that bubbles present, ranking and comparison, by using a bar chart instead.

Going back to the previous question, using my viz, which food uses the 3rd most water? Simple, right? How about the vegetable that uses the 10th most water? That’s simple too; all you need to do is click the color on the right.

May 13, 2015

Dear Data Two | Week 4 - Mirrors

This week’s theme for Dear Data Two was “Mirrors” (You can follow Dear Data Two here). I explored the data in Tableau and created this story about my week of mirrors and reflections. This was quite a tough week from a data collection perspective. I’m not totally satisfied with my analog version, but done is better than perfect.

Tableau Tip Tuesday: How to Create Waterfall Charts

I've missed the last few weeks of #TableauTipTuesday, and technically it's Wednesday in London, but pretend I'm in California and it's Tuesday. This week, I show you how to create waterfall charts in Tableau.

The first example is very basic; I did this intentionally so that the steps would be super easy to follow. The second example is only moderately more complex; it looks at Tableau's SEC financial filings from 2011-2014.

May 11, 2015

Makeover Monday: Why Are There so Many More Muslims in U.S. Prisons?

A recently published article looked at the difference between the religious beliefs of prisoners in the U.S. and those of the general U.S. population. In this article, the authors provide this table:

They go on to do an analysis, but never really address the story the data is telling in this table. Clearly what this table is screaming out for is to show the difference between the two populations. I’ve been on a bit of a slope graph kick lately, so that’s what I’m using again this week. Why? Because I find slope graphs to be an excellent way to show variances between two data points.

The slope graph clearly makes the differences stand out. One can easily see that there are fewer Protestants and Catholics in prison, and at the same time see that there are way more Muslims in prison. I then like to supplement the slope graph with a bar chart that shows only the differences.

There’s no clear evidence available as to why this is, but representing the data this way leads to more questions and more discussion. Any time you design a viz and it continues the conversation, you’ve probably done something right.

May 4, 2015

Makeover Monday: Which NFL Teams Were the Biggest Overachievers and Underachievers in 2014?

The NFL Draft was this past weekend, which for many people is the biggest day of their year. It’s the day that all teams have renewed hope for the upcoming season. This got me thinking back to a viz I saw from Cork Gaines back in January that I had tagged for a makeover.

My biggest problem with this viz is that I have to turn my head sideways to read it. In addition:
  • The length of the bars isn’t accurate.  How can +4.5 be longer than -5.0?
  • The bars are in reverse order - the biggest overachievers (Dallas) should be first.
  • I have to do the math in my head to get to their predicted wins.

My first thought was to see what this viz looked like if I rotated it counterclockwise.

That definitely makes it more readable, but the story still doesn’t stand out. What the data is screaming for is to show the change and emphasize the winners and losers. To this end, along with accounting for the observations above, I created this interactive version in Tableau.

May 2, 2015

Dear Data Two | Week 3 - A Story of Thanks

This week’s theme for Dear Data Two was “Thanks” (You can follow Dear Data Two here). I explored the data in Tableau and created this story about my week of email. Click on the image below to explore the story.