Recently I was talking with a Tableau Center of Excellent leader of well-known hi-tech company. I was told that the reason why his company could not go bigger with Tableau is the extract challenge or too many extract delays that they do not know how to handle……
I shared with him how did I resolve it. I’d love to share with everyone. Here is a few things that we did to solve the famous extract challenge:
- Reduced subscription timeout from default 30 mins to 5 mins (or 3 mins depends on your situation) . Pls read Automation – Timeout long subscriptions and auto send email to workbook owner.
Turn on ‘sorting extract jobs based on last run time’(updated on Oct 18, 2019: Thank you Yuri Fal who pointed out that this setting is automatically enabled since V2019.3 so there is no need to run this user config anymore per kb).- tsm configuration set -k backgrounder.sort_jobs_by_run_time_history_observable_hours -d 168
- What it does? Tableau Server can sort full extract refresh jobs so they are executed based on the duration of their “last run,” executing the fastest full extract refresh jobs first.
- The “last run” duration of a particular job is determined from a random sample of a single instance of the full extract refresh job in last <n> hours. Full extract jobs are then prioritized to run in order from shortest to longest based on their “last” run duration.
- Default is off. Tableau documentation recommends 36 hrs but 168 hrs to cover the weekly jobs as well is a good approach
- Automatically re-schedule extract jobs based on usage
- Tableau gives complete flexibility to publishers who can choose any schedules. It ends up with a lot of unnecessary extracts jobs that admin doesn’t know how to handle.
- Tableau does not give server admin much control at all for this. My approach is to create an add-on feature by writing scripts to re-schedule jobs automatically based on duration, pls read my blog Automation – Set Usage Based Extract Schedule
- This is a game changer to scale extracts on Tableau server. Hourly schedule can become daily if the workbook not used for 2 days. Daily becomes weekly, weekly becomes monthly.
- It is not an officially supported approach – you are on your own risk to do this.
- Change extract job’s priority based on avg execution duration
- This is an add-on feature that you will have to build. Try this only if the above step 1-3 does not give you the extract waiting time you are looking for.
- The intent is to change extract priority to higher priority (like 30 ) for any extracts with duration below median average. You can start to change the extract priorities manually to see how it goes. Just be aware that any re-publishing of extracts will change priority back to default 50.
- Unfortunately I have not seen API for it. You will need to update PostGre table directly. So it is NOT officially a supported approach.
- All you have to do is create a program to update tasks.priority directly (where tasks is the table name and priority is column name)
- Pls read Next PostScaling Tableau (2/10) – Set Extract Priority Based on Duration
Conclusion : Extract delay is a common challenge for last Tableau deployment. Part of the reason is that publishers can choose any schedules w/o any restrictions- A lot of publishers choose hourly schedules while daily is good enough, or daily schedules while weekly is good enough. Also they schedule it and forgot it – when the workbook is not used anymore, the extract refresh for the workbook is still running….. This blog summarize two less known but out of box Tableau server configurations that can be used and two additional script work that can get things done better with some efforts.
Hi Mark,
According to this Tableau KB info, the backgrounder.sort_jobs_by_run_time_history_observable_hours key is deprecated: https://kb.tableau.com/articles/issue/unable-to-use-configuration-key-backgrounder-sort-jobs-by-run-time-history-observable-hours-in-tableau-server
Yours,
Yuri