October 24, 2014
Tableau Tip: Blending data sets that have different max dates
Fun little problem at work today. Imagine you have four data sources that all have a Date field in common, but they all land in your database at different times. You want a view of the data for the latest date that they all have in common.
Here's a sample of data the represents the situation:
Inside Tableau, these are four different data sources. I blend them all together onto a single worksheet like this:
What I desire is to only show the data for the latest date that all four data sources have in common, 10/17/2014 in this example. While there might be a more efficient way to perform these calculations, I want to share the approach that I took to solve the problem.
First, I created a calculated field in each data source to get the max date. Note that I've set the table calculation to calculate along Date.
I then added each of these calculations to the view to make sure I was getting the results I wanted.
Sweet! This is exactly what I was hoping for. Next I need to determine which one of these four dates is the minimum and then return the date from that data source. This is the calc I created and it's kinda messy, but it works. Note this I set the calc to compute along Date.
This calc returns 10/17/2014, which is exactly what I was expecting. One more step. I need to create a calculated field to use as a filter in my view so that I only see one row.
I add this to the Filters shelf, set the Compute using to Date and choose True. That's it!
If anyone knows a more efficient way to do this, I'm all ears! Download the workbook here.
Here's a sample of data the represents the situation:
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgyINApJcsoRQhd6vG9tOIU8R3UTIxGu6NdmVzRapfW1Lc7Lo7Rxi1luSgd78IY4FY4bSqOI-J4SPDgvrk_AS4JkShQn6jfd32qNC1rZEfGovQf4jwcFXhSC6wToj2qQGQ2JiLYZWbDuug/s1600/1-Engagement.png)
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj82py7yzRMVG0aJXDhq5Z-HLxLxr25T_idlWqeCk9uqhO-Awmmw6b4uzC8Gvssk7JRdqV9JpiZkLwfLqzKC6CI54g2mVAWEuRTPFhN-qO1SE8XSaUQCHEZCxmVObgyGcdllJGJOongqvo/s1600/1-Likes.png)
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhKPjmjadsuHM2c7SYhGDR9lxCp0Z8x03qL_mRUWUFBpKGUPk35oz83YiUwbb9vSQE_-OsqSSLykCalMRX2aGfKPySCI1VW3IJQcTytfxRkb9veUpZCnYNUJinTTx1XqyEieKmaZvPdzmw/s1600/1-Reach.png)
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgAh0uz1ktegWEirZC_wHNGiX1PjaTqiuI6Vx8oTGLoIxWuplOzThoiivy8cfF9xyvERJdEgQZQpizgi5sTBvTOpHxA0v7a7yCyhZM0Jx8jwvXR02tZgti3WydpRDGwbm7e8EsrFjaMMN8/s1600/1-Views.png)
Inside Tableau, these are four different data sources. I blend them all together onto a single worksheet like this:
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5hlxDBg375woysQosbwubzPmT8kGgom9u2zGtZeyU0CD_At0rC9pz7C09-EuJPUFpUL5Upniy-ipxEJVGqkHNrc34u3pp54z7Y74Q_3fTnA_d8LYNOb5TIKTaA98qs7zZpJUgI3TsgW8/s1600/1.png)
First, I created a calculated field in each data source to get the max date. Note that I've set the table calculation to calculate along Date.
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhtcUXq7ZdRv1vhR6l11ATIQLYqIR39Phy4I4towrv4MMi775vGpvrKZDo3kjxFxG3nTJ87s8LBgUnQP1rNhi_20juapU_Yb0SMcy_djda9KtyJxSZPfVpP8aNp52IZynUsP7e_WUvAIpI/s1600/1a.png)
I then added each of these calculations to the view to make sure I was getting the results I wanted.
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhb0k1oCInqqlAzFysHfB5bq27JJr2185QDw0zQlqudzHkZAiWyULln7oFbRWoQ6WGo7QjqSX_M88RHF9izo1uun4rudL37wSaihIurtbRtNiY_vfz_IjdxAU4E_jaMGrNd4OmwyNGlL_w/s1600/1b.png)
Sweet! This is exactly what I was hoping for. Next I need to determine which one of these four dates is the minimum and then return the date from that data source. This is the calc I created and it's kinda messy, but it works. Note this I set the calc to compute along Date.
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjcs60vmeZGRFSNLAjLP4kIEyLLfg-oWrjRNGwLl4vdL4OcNKc_w1mkpZV7T_bElcE8QyYlCLTy4KTncpSvba78lVytxZiqS2pvYlq5pbqmpITkY0fX3YUozNV_5awinoimu7UPN92UsG0/s1600/1c.png)
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgWfhoxH-ULJ7h76oUloo5WzHVRKtBoTJVgEJMMvJQTxb6XY4jYv7kGiiNbb2SRoKCnqM4nV_B37w-0NNOse_EKe6e-70THdNXJEs9WCFMIIjW2s2rxDLlkXIsdao9_JWplhi5VTRUjmBs/s1600/1d.png)
I add this to the Filters shelf, set the Compute using to Date and choose True. That's it!
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgU5hnJ0LUK8s6Ml0e6X0DOCo4Ao636IM1l1ASBIibYB_aCO7wJokZFXNic6U3lVP1vF3BZegO6G3ul_E08OvsI6jv-1DsPKjqBIm8DXwISPLiL_f4iy1FXaEJAwHdKNEern3d1jU4F9XU/s1600/1e.png)
If anyone knows a more efficient way to do this, I'm all ears! Download the workbook here.
Subscribe to:
Post Comments
(
Atom
)
Link to workbook: https://www.dropbox.com/s/de2ku3bkxvoqkp4/FourBlendsSample-MMMedit.twbx?dl=0
ReplyDeleteIf I'm understanding the issue correctly, the key is that "NULL if any value in calc is NULL":
ReplyDeleteMAX([Date]) =
WINDOW_MAX(DATEADD("second",
DATEPART("second", ATTR([Date]))+
DATEPART("second", ATTR([Engagement].[Date]))+
DATEPART("second", ATTR([Likes].[Date]))+
DATEPART("second", ATTR([Views].[Date]))
, MAX([Date])))
Thanks Matthew! I knew there was likely a more efficient way. Part of what I wanted to show was how I thought through the problem.
DeleteI tend to go back and clean up and simplify afterwards.