Advanced Deployment: Make VIP extract priority stick

I already have a very popular blog about extract priority: SCALING TABLEAU (2/10) – SET EXTRACT PRIORITY BASED ON DURATION. It increased efficiency on my server by 70–80%. Today's blog covers a different use case – VIP extracts.

Have you received a VIP extract priority ask? How do you handle a business requirement to keep a few mission-critical extracts at higher priority?

There are at least two options to make it happen:

  • Option 1: Create a high-priority schedule – the problem with this approach is that you can't hide the schedule, so other publishers can still choose it for their regular extracts – not easy for an admin to control. Some people name the schedule 'ask admin before use'…
  • Option 2: As an admin, you can change the identified extract's priority (whether embedded in a workbook or a published data source) to a small number (like 5) as the VIP high priority – the problem with this approach is that every time the workbook or published data source gets a new revision, the priority automatically reverts to the default of 50.

We have been using Option 2 for a long time, but requesters complain from time to time, since it is a lot of work on both sides to communicate each new revision and manually change the extract priority.

Recently one idea came to life and I am extremely happy about it. Here is what we are able to achieve now: making the identified VIP extract priority stick – meaning that even if the owner changes the workbook or data source, the extract keeps its VIP high priority, so the admin does not have to change it manually every time. Wow!

How do we do it? The answer, again, is a Python program:

  • Create a simple CSV file with site_name, extract_type (value is 'workbook' or 'data source' – this makes the next step much easier), extract_name, and VIP_priority (for example 5, 10, etc.)
  • Create a simple Python script that reads the CSV line by line and matches each record against the PostgreSQL repository to find the exact extract task.
  • Change the extract priority to the priority in the CSV by using the following command:

UPDATE tasks
SET priority = xxx
WHERE condition;

  • Schedule the Python script (maybe a few times a day)
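The steps above can be sketched in Python. Treat this as a hedged sketch, not my production script: the CSV columns come from the list above, but the repository schema details (`tasks.obj_id` pointing at the `workbooks`/`datasources`/`sites` tables) are assumptions you must verify against your Tableau version, and the function name `build_priority_updates` is made up for illustration.

```python
import csv
import io


def build_priority_updates(csv_text):
    """Turn the master CSV (site_name, extract_type, extract_name,
    VIP_priority) into (sql, params) pairs suitable for a psycopg2
    cursor.execute() call against the Tableau repository.

    NOTE: the WHERE clause is an assumption about the repository schema
    (tasks.obj_id referencing workbooks / datasources) - check it
    against your server version before running anything for real.
    """
    updates = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # extract_type in the CSV decides which repository table to match
        table = "workbooks" if row["extract_type"] == "workbook" else "datasources"
        sql = (
            "UPDATE tasks SET priority = %s "
            f"WHERE obj_id IN (SELECT id FROM {table} WHERE name = %s "
            "AND site_id = (SELECT id FROM sites WHERE name = %s));"
        )
        updates.append((sql, (int(row["VIP_priority"]),
                              row["extract_name"], row["site_name"])))
    return updates


master_csv = ("site_name,extract_type,extract_name,VIP_priority\n"
              "Default,workbook,Sales KPI,5\n")
for sql, params in build_priority_updates(master_csv):
    print(params)  # an actual run would call cursor.execute(sql, params)
```

An actual run would open a psycopg2 connection to the repository, execute each pair, commit, and be scheduled via cron a few times a day. Remember that writing to the repository is not officially supported.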

What happens is that when a new revision is created by the owner, the VIP extract priority is indeed changed back to 50 – this is something Tableau controls automatically (we do not want to customize Tableau features at all, so upgrades stay simple). However, a few hours later the priority is reset back to what it is supposed to be, based on the CSV file, which is the master control file.

I agree that it is not a perfect solution, but I am pretty happy with it – it is a lot better than manually updating the priority each time publishers release a new revision of their VIP extracts, and it saves tons of communication.

I'd love to hear your alternative approaches.


Automation – Increase Extract Efficiency

Recently I was talking with a Tableau Center of Excellence leader at a well-known hi-tech company. I was told that the reason his company could not go bigger with Tableau is the extract challenge – too many extract delays that they do not know how to handle.

I shared with him how I resolved it, and I'd love to share it with everyone. Here are a few things that we did to solve the famous extract challenge:

  1. Reduced subscription timeout from the default 30 mins to 5 mins (or 3 mins, depending on your situation). Pls read Automation – Timeout long subscriptions and auto send email to workbook owner.
  2. Turn on 'sorting extract jobs based on last run time' (updated on Oct 18, 2019: thank you Yuri Fal, who pointed out that this setting is automatically enabled since v2019.3, so there is no need to set this config anymore per the kb).
    • tsm configuration set -k backgrounder.sort_jobs_by_run_time_history_observable_hours -v 168
    • What does it do? Tableau Server can sort full extract refresh jobs so they are executed based on the duration of their "last run," executing the fastest full extract refresh jobs first.
    • The "last run" duration of a particular job is determined from a random sample of a single instance of the full extract refresh job in the last <n> hours. Full extract jobs are then prioritized to run in order from shortest to longest based on their "last run" duration.
    • Default is off. Tableau documentation recommends 36 hrs, but 168 hrs is a good approach since it covers the weekly jobs as well.
  3. Automatically re-schedule extract jobs based on usage
    • Tableau gives complete flexibility to publishers, who can choose any schedule. This ends up with a lot of unnecessary extract jobs that the admin doesn't know how to handle.
    • Tableau does not give the server admin much control here at all. My approach is to create an add-on feature by writing scripts that re-schedule jobs automatically based on usage; pls read my blog Automation – Set Usage Based Extract Schedule.
    • This is a game changer for scaling extracts on Tableau Server. An hourly schedule can become daily if the workbook is not used for 2 days; daily becomes weekly, weekly becomes monthly.
    • It is not an officially supported approach – you do this at your own risk.
  4. Change extract job’s priority based on avg execution duration
    • This is an add-on feature that you will have to build. Try this only if steps 1–3 above do not give you the extract waiting time you are looking for.
    • The intent is to change extracts whose duration is below the median average to a higher priority (like 30). You can start by changing extract priorities manually to see how it goes. Just be aware that any re-publishing of an extract will reset its priority back to the default 50.
    • Unfortunately I have not seen an API for this. You will need to update the PostgreSQL table directly, so it is NOT an officially supported approach.
    • All you have to do is create a program that updates tasks.priority directly (where tasks is the table name and priority is the column name).
    • Pls read Scaling Tableau (2/10) – Set Extract Priority Based on Duration.
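To make step 4 concrete, here is a small sketch of the priority-picking logic. The function name and the in-memory shape of the job history are my own inventions; in practice you would derive (task_id, duration) pairs from the repository's job history (e.g. the background_jobs table) and then apply the result with `UPDATE tasks SET priority = … WHERE id = …`, which, as noted above, is not officially supported.

```python
from collections import defaultdict
from statistics import median


def duration_based_priorities(job_history, fast_priority=30, default_priority=50):
    """job_history: iterable of (task_id, duration_secs) rows, e.g. computed
    from completed_at - started_at in the background_jobs table (assumption).

    Tasks whose average run time falls below the median of all task averages
    get the numerically smaller value, which is the higher priority in
    Tableau; everyone else keeps the default of 50.
    """
    runs = defaultdict(list)
    for task_id, secs in job_history:
        runs[task_id].append(secs)
    averages = {t: sum(v) / len(v) for t, v in runs.items()}
    cutoff = median(averages.values())
    return {t: (fast_priority if avg < cutoff else default_priority)
            for t, avg in averages.items()}


history = [(101, 10), (101, 20), (202, 600), (303, 100)]
print(duration_based_priorities(history))
# task 101 averages 15s, below the 100s median, so it gets priority 30
```

The median cutoff means roughly half of your extract tasks get the fast lane, which keeps quick jobs from queueing behind long ones without starving the slow jobs entirely.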

Conclusion: Extract delay is a common challenge for large Tableau deployments. Part of the reason is that publishers can choose any schedule without restriction – a lot of publishers choose hourly schedules when daily would be good enough, or daily schedules when weekly would be good enough. They also schedule it and forget it – when a workbook is no longer used, its extract refresh keeps running. This blog summarizes two lesser-known but out-of-the-box Tableau Server configurations, plus two additional pieces of script work that can, with some effort, get things done better.