Automation – How to Make Some Workbooks' Cache Shorter Than Others

You may wish that a live-connection workbook's cache could be much shorter than that of a workbook with extracts, or that the cache level could be controlled per data source or workbook. Unfortunately, Tableau Server does not have those features yet. One Tableau Server has only one cache policy – all workbooks get the same cache length.

This blog shows how you can add the following features on top of the server:

  1. Workbooks with a ‘no-cache’ tag are not cached (technically, only a few minutes of cache)
  2. Workbooks with live connections are cached for a short period, such as one hour
  3. Workbooks with extracts are cached much longer, such as 20 hours

How do you achieve different cache settings for specific workbooks, live connections, and extracts?

The answer is to find a way to flush a live-connection workbook's cache much more often than the overall server cache policy would.

How? You may have noticed that whenever you publish a newer version of a workbook, Tableau Server no longer serves the old version's cache – which makes perfect sense!

What this also means is that Tableau Server's cache is tied to the workbook's timestamp! This is a very useful insight, confirmed with the Tableau Professional Services team: when a workbook's timestamp changes, its cache is automatically invalidated on Tableau Server. The idea is simply to update the workbook's timestamp with a script running outside Tableau Server, so the cache is forced out automatically under whatever conditions we define!

There is no API to update the workbook timestamp, but you can update the Postgres repository database directly. Only the ‘last published at’ column of the workbooks table needs to change. The pseudocode is as follows:

UPDATE workbooks
SET last_published_at = last_published_at + INTERVAL '1 millisecond'
WHERE <condition>;

  • This adds one millisecond to the workbook’s timestamp
  • The condition is whatever you define – for example, workbooks that carry a specific tag, or workbooks with true live connections to data sources (note: identifying true live-connection workbooks is not easy; I will cover it in a separate blog). A minimal sketch for the tag case follows this list.
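
Below is a minimal sketch of such a script in Python, assuming it runs where it can reach the repository’s workgroup database with write access (direct writes require the privileged repository user, so treat this as illustrative) and that workbooks are flagged through the standard tags/taggings tables. All connection settings are placeholders.

import psycopg2

# Hypothetical connection settings; the repository listens on port 8060 by default.
conn = psycopg2.connect(host="your-tableau-server", port=8060,
                        dbname="workgroup", user="tblwgadmin",
                        password="<password>")

# Bump the timestamp by 1 ms for every workbook carrying the 'no-cache' tag,
# which forces Tableau Server to drop its cached results for those workbooks.
BUMP_TAGGED = """
UPDATE workbooks
SET last_published_at = last_published_at + INTERVAL '1 millisecond'
WHERE id IN (
    SELECT t.taggable_id
    FROM taggings t
    JOIN tags ON tags.id = t.tag_id
    WHERE t.taggable_type = 'Workbook'
      AND tags.name = 'no-cache')
"""

with conn, conn.cursor() as cur:   # commits on success
    cur.execute(BUMP_TAGGED)
conn.close()

Schedule the script as a cron job – say, every 5 minutes for the ‘no-cache’ tag and hourly for live-connection workbooks – to get the tiered cache behavior described above.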

Before I implemented the cache mechanism above, I struggled to set a cache policy that met the needs of both the live-connection and extract use cases. I had to set the cache to about 1 hour for the whole server, which was OK for live connections but not effective enough for workbooks with extracts (slow performance). After we figured out the new cache approach, the server now has a 24-hour cache policy, which improved view render time by 30%:

  1. Workbooks with a ‘no-cache’ tag are cached for only 5 minutes
  2. Workbooks with live connections are cached for 1 hour
  3. Workbooks with extracts are cached for 24 hours. Of course, we also turned on the ‘Pre-compute workbooks viewed recently’ flag for each site to leverage the cache warm-up feature after extract refreshes.

Read additional notes @

Automation – Data Source Archiving

If you followed my previous blog, Automation – Advanced Archiving, to archive workbooks, over time you may also need to archive data sources.

Why delete data sources?

If a workbook has embedded data sources, the embedded data is deleted when the workbook is deleted. However, if the workbook uses separately published data sources, deleting the workbook does not delete the published data sources.

First of all, when a workbook is deleted, you do not want to delete its published data source right away. Why?

  • The published data source could be used by other workbooks
  • The published data source can still be used by new workbooks later on

On the other hand, your server may accumulate a lot of orphan published data sources that are not connected to any workbook – those are the candidates for additional deletion, which is what this blog is about.

How to delete data sources?

The good news is that there is a Delete Data Source API: DELETE /api/api-version/sites/site-id/datasources/datasource-id

  • api-version: The version of the API to use, such as 3.4. For more information, see REST API Versions.
  • site-id: The ID of the site that contains the data source.
  • datasource-id: The ID of the data source to delete.
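
For illustration, the call from Python might look like the sketch below. The ids and the auth token are placeholders you obtain from the REST sign-in call; nothing here is specific to any setup.

import requests

server = "https://your-tableau-server"
api_version = "3.4"
site_id = "<site-id>"              # returned by the sign-in call
datasource_id = "<datasource-id>"  # the data source to delete
token = "<auth-token>"             # credentials token from sign-in

url = f"{server}/api/{api_version}/sites/{site_id}/datasources/{datasource_id}"
resp = requests.delete(url, headers={"X-Tableau-Auth": token})
resp.raise_for_status()            # expect 204 No Content on success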

How to decide what data sources to delete?

That is the hard part. The high-level selection criteria should be as follows:

  1. Not connected to any workbooks: Ref https://community.tableau.com/thread/230904
  2. Created more than a few weeks ago: do not delete newly published data sources
  3. No usage for a period of time (such as 3 months): the data source may exist only for Ask Data or for direct access via Desktop. Join historical_events with historical_event_types and look for events of type ‘Access Data Source’ for the specific hist data source id. A query sketch follows this list.
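
Here is a sketch of criterion 3, run read-only against the repository. The historical_events, historical_event_types, and hist_datasources tables are part of the documented workgroup schema, but verify the column names on your Tableau Server version before relying on this.

import psycopg2

# Data sources older than a few weeks with no 'Access Data Source' event
# in the last 90 days.
ORPHAN_CANDIDATES = """
SELECT d.id, d.name
FROM datasources d
WHERE d.created_at < now() - INTERVAL '3 weeks'
  AND NOT EXISTS (
      SELECT 1
      FROM historical_events he
      JOIN historical_event_types het
        ON het.type_id = he.historical_event_type_id
      JOIN hist_datasources hd ON hd.id = he.hist_datasource_id
      WHERE hd.datasource_id = d.id
        AND het.name = 'Access Data Source'
        AND he.created_at > now() - INTERVAL '90 days')
"""

conn = psycopg2.connect(host="your-tableau-server", port=8060,
                        dbname="workgroup", user="readonly",
                        password="<password>")
with conn, conn.cursor() as cur:
    cur.execute(ORPHAN_CANDIDATES)
    candidates = cur.fetchall()    # feed these ids to the Delete Data Source API
conn.close()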

Another way to identify data sources that have not been used for a long time is the following query:

select datasource_id, (now()::date - max(last_view_time)::date) as days_since_last_use
from _datasources_stats
group by datasource_id
having (now()::date - max(last_view_time)::date) > 90;

(The filter belongs in HAVING rather than WHERE because it depends on an aggregate, and Postgres does not allow referencing a select-list alias in the filter.)

Download Tableau Data Source Archiving Recommendation.twb

Conclusions: It is a good idea to delete not only old workbooks but also old data sources. This is especially important when a workbook is deleted but its published data sources still have scheduled refreshes.

The idea is to delete orphan data sources that were published some time ago but no longer have any usage at all.

Automation – Remove Permissions for Slow Render Workbooks

My previous blog talked about sending automated alerts (VizAlerts) for slow-render workbooks; this blog will show you how to enforce timeouts and other governance rules for slow-render workbooks.

Problem statement: slow-workbook alerts are important and absolutely necessary. However, sometimes the alerts and email renders are not enough – what if some workbook owners do not take any action? What can the server governance team do about it?

Solution #1: Reduce vizqlserver.querylimit

  • vizqlserver.querylimit: covers the queries executed to return views or populate extracts. The default value is 1800 seconds (30 minutes). It can be reduced to, say, 180-300 seconds.
  • What does it mean? Any query that runs longer than this limit receives a query-time-limit error.
  • How to change this setting?
      • tsm configuration set -k vizqlserver.querylimit -v 300
      • tsm pending-changes apply

 

Solution #2: Remove workbook permission

Solution #1 (reducing vizqlserver.querylimit) should be good enough for most governance processes that enforce render time. It is very simple and effective. Its limitations are:

  • It applies to the whole server and is not site-specific
  • It allows no exceptions, no matter which workbook

If you want more flexibility, you can build a custom solution. How? After a lot of research and discussion with many others, the best custom solution I found is to define your own selection criteria for slow-render workbooks, then remove those workbooks' permissions. What I did is as follows:

  1. Find the render-time pattern of the 1% slowest workbooks (based on the data on my server, those 1% slowest workbooks consumed about 20% of the server’s VizQL CPU and memory)
  2. Find the weekly average render time for those slowest 1% workbooks (about 30 seconds on my server, averaged down by caches, etc.)
  3. Now you have the selection criteria for identifying the slowest workbooks you want to act on
  4. Use my previous blog to send automated alerts (VizAlerts) for slow-render workbooks
  5. I recommend sending 3 warnings before taking enforcement action – the workbook must stay slow and meet the selection criteria for 4 weeks in a row before enforcement
  6. In the 4th week, remove the workbook’s permissions and send an automated email telling the owner that the permissions were removed; the email should list every permission that was removed. The email can be produced by another VizAlert, and the permission query and deletion can be done via the REST API (see the sketch after this list):
    • Query Workbook Permissions API: GET /api/api-version/sites/site-id/workbooks/workbook-id/permissions
    • Delete Workbook Permissions API:

      DELETE /api/api-version/sites/site-id/workbooks/workbook-id/permissions/groups/group-id/capability-name/capability-mode
      DELETE /api/api-version/sites/site-id/workbooks/workbook-id/permissions/users/user-id/capability-name/capability-mode
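
Below is a rough sketch of step 6 in Python, assuming you already hold a valid auth token. It queries the workbook’s current permissions, then deletes each capability; in practice you would save the parsed grantee list first so it can go into the notification email.

import requests
import xml.etree.ElementTree as ET

server, ver = "https://your-tableau-server", "3.4"
site_id, workbook_id, token = "<site-id>", "<workbook-id>", "<auth-token>"
headers = {"X-Tableau-Auth": token}
base = f"{server}/api/{ver}/sites/{site_id}/workbooks/{workbook_id}/permissions"
NS = "{http://tableau.com/api}"

# Query the current permissions (Query Workbook Permissions API).
root = ET.fromstring(requests.get(base, headers=headers).content)

# Walk each grantee (group or user) and delete every capability it holds.
for grantee in root.iter(NS + "granteeCapabilities"):
    group = grantee.find(NS + "group")
    path = (f"groups/{group.get('id')}" if group is not None
            else f"users/{grantee.find(NS + 'user').get('id')}")
    for cap in grantee.iter(NS + "capability"):
        url = f"{base}/{path}/{cap.get('name')}/{cap.get('mode')}"
        requests.delete(url, headers=headers).raise_for_status()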

Conclusions:

  1. Reduce vizqlserver.querylimit from the default 1800 seconds to somewhere in the 180-300 second range
  2. Next, implement the slow-workbook VizAlert warning
  3. Depending on your organization’s culture, if VizAlerts alone do not change publishers’ behavior, you can take enforcement action and remove workbook permissions automatically after a few warnings. Why removing permissions is the best approach:
  • It stops end users from accessing the workbook, so there are no more slow renders impacting everyone else on the server
  • It still lets the workbook owner work on performance tuning, which is what you want
  • It gives the workbook owner the flexibility to re-grant access right away for business continuity, whatever the reason
  • Any update to the workbook resets the 4-week clock, which buys the owner another 4 weeks to improve it