Automation – How to make some workbook cache shorter than others

You may wish that a live-connection workbook's cache could be much shorter than an extract workbook's, or that the cache duration could be controlled per data source or per workbook. Unfortunately, Tableau Server does not have those features yet: one Tableau Server has only one cache policy, so all workbooks get the same cache duration.

This blog shows how you can achieve the following server add-on behaviors:

  1. Workbooks with a 'no-cache' tag are effectively not cached (technically, a few minutes of cache only)
  2. Workbooks with live connections are cached very briefly, e.g. one hour
  3. Workbooks with extracts keep much longer caches, e.g. 20 hours

How do you achieve different cache settings for specific workbooks, for live connections, and for extracts?

The answer is to find a way to flush live-connection workbooks' cache much more often than the server-wide cache policy would hold it.

How? You may have noticed that whenever you publish a newer version of a workbook, Tableau Server no longer serves the old version's cache, which makes perfect sense!

What this also means is that Tableau Server's cache is keyed to the workbook's timestamp! This is a very useful insight, confirmed with the Tableau Professional Services team: when a workbook's timestamp changes, its cache on Tableau Server is invalidated automatically. The idea, then, is simply to update the workbook's timestamp with a script running outside Tableau Server, so the cache is forced out automatically under whatever conditions we define.

There is no API to update the workbook timestamp, but you can update the Postgres repository database directly. Only the 'last_published_at' column of the workbooks table needs to change. The pseudocode is as follows:

UPDATE workbooks
SET last_published_at = last_published_at + INTERVAL '1 millisecond'
WHERE condition;

  • What this does is add one millisecond to the workbook's timestamp
  • The condition is whatever you define. For example, workbooks that carry specific tags, or workbooks with true live connections to data sources (note: it is not easy to identify true live-connection workbooks; I will cover that in a separate blog)
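For the tag-based condition, the WHERE clause has to join through the repository's tag tables. Below is a minimal sketch in Python that just builds the UPDATE statement for a given tag. The taggings/tags table and column names (taggable_id, taggable_type) are assumptions about the Tableau repository schema, so verify them against your server version before running anything:

```python
# Sketch: build the UPDATE that bumps last_published_at for workbooks
# carrying a given tag, so Tableau Server drops their cache.
# ASSUMPTION: the repository's tag tables are named "tags"/"taggings"
# with taggable_type = 'Workbook' -- check your schema first.

def build_bump_sql(tag_name: str) -> str:
    # In real code, pass tag_name as a bound query parameter rather
    # than inlining it; it is inlined here only for readability.
    return (
        "UPDATE workbooks\n"
        "SET last_published_at = last_published_at + INTERVAL '1 millisecond'\n"
        "WHERE id IN (\n"
        "    SELECT taggings.taggable_id\n"
        "    FROM taggings\n"
        "    JOIN tags ON tags.id = taggings.tag_id\n"
        "    WHERE taggings.taggable_type = 'Workbook'\n"
        f"      AND tags.name = '{tag_name}'\n"
        ");"
    )

print(build_bump_sql("no-cache"))
```

Feeding the resulting statement to the repository through any Postgres client (with a user that has UPDATE permission) is all that is needed.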

Before I implemented the above cache mechanism, I struggled to set a cache policy that met the needs of both the live-connection and the extract use cases. I had to set the server-wide cache to about one hour, which was OK for live connections but not effective enough for workbooks with extracts (slow performance). After we figured out the new cache approach, the server now has a 24-hour cache policy, which improved view render time by 30%:

  1. Workbooks with a 'no-cache' tag are cached for only 5 minutes
  2. Workbooks with live connections are cached for 1 hour
  3. Workbooks with extracts have a 24-hour cache. Of course, we also turned on the 'Pre-compute workbooks viewed recently' flag for each site, to leverage the cache warm-up feature after extract refreshes.
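To keep the live-connection workbooks on their one-hour cycle, the timestamp bump has to run on a schedule outside Tableau Server. Below is a sketch of a crontab entry, assuming psql can reach the repository on its default port 8060 and that you have a database user with UPDATE permission (the documented readonly user cannot write). The host name, user name, and script path are placeholders:

```shell
# Hourly: bump last_published_at for live-connection workbooks so their
# cache is invalidated. bump_live_workbooks.sql holds the UPDATE statement.
0 * * * *  psql -h tableau-server -p 8060 -U repo_writer -d workgroup -f /opt/tableau-scripts/bump_live_workbooks.sql
```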

Read additional notes @

10 thoughts on “Automation – How to make some workbook cache shorter than others”

  1. Two full thumbs up for this magnificent article of yours. I've really enjoyed reading it today, and I think it might be one of the best articles I've read yet. Please keep this work going at the same quality, spinning around the most interesting and evergreen concepts of science and technology. Especially talking about Tableau: it is the best-suited data visualization tool for any business across the world.

    1. Why the last_published_at column and not updated_date? I think updated_date may change when there is a new extract refresh for a workbook with an embedded data source, while last_published_at reflects when the workbook itself (the XML) was last published.

  2. Hi, this is an awesome read and a nice solution. I was wondering how you managed to write an update query against the Tableau Postgres database. I am able to connect to the database and read data from tables and views, but the readonly user does not have update permission. Can you please advise on this?

    Thanks

  3. I have a similar issue: how does updating the last-published timestamp re-create the cache file?

    I have a hyper extract as the data source, and the dashboards are not creating a cache at all. I am facing challenges getting the cache file created.

  4. Since Mark covered this in the Virtual Server Admin group yesterday I thought I would post my experience using this method. It has been running for about a year now with great success.

    Our environment was set up in 2012 for weekly and monthly financial data but as adoption increased we ended up with some more use cases. I had a user who was running 15 minute long extracts every hour so that he could have near real-time data. Working with them we were able to figure out why they weren’t just doing live queries and implemented this solution.

    The users who want live and real-time data get it and we can still cache for everything else.

    If someone needs a code example there is one here: http://missingrows.blogspot.com/2019/11/deeper-into-tableau-repository-part-6.html

    Also, I am incrementing the last_published_at column and it works fine for me.

    1. Tom, Really glad to know that it works for you and thank you so much for sharing your code.
