Advanced Deployment (9/10): OPTIMIZE BACKGROUNDER (extract AND SUBSCRIPTION) efficiency

Are you facing the situation that your Tableau server backgrounder jobs having long delay? You always have limited backgrounders. How to cut average extract/subscription delay without adding backgrounders? This webinar covers the following 4 things:

  1. Suspend extract for inactive content
  2. Reduce extract frequency per usage
  3. Dynamic swap vizQL and backgrounder
  4. Incremental or smaller extracts run first
  5. VIP extract priority
  • Download slides here

  • Watch recording here

  1. Suspend extract for inactive content

I used to suspend extract for inactive content by using Python. But thanks for Tableau v2020.3 that made this as  out-of-box feature. This feature should be the first thing every server admin does. The good thing is that this feature is ON as default with 30 days.

suspend extract for inactive content

2.  Reduce extract frequency per usage

Challenge:There are many unnecessary extract refreshes due to the fact that all schedules are available to every publisher who have complete freedom to choose whatever schedules they wanted. Although workbooks not used for weeks at all would be suspended as part of new v2020.3 feature. But what if the workbook with hourly or daily extract but only use once a month? … Maybe initially usage is high but overtime usage went down but publisher never bothers to reduce refresh frequency…. They have no incentive to do so at all.

Solution: Set Usage Based Extract Schedule – reschedule the extract frequency based on usage

For example:

  • Hourly refresh changes to daily if workbook not used for 3 days
  • Daily changes to weekly if workbook not used for 3 weeks
  • Weekly changes to monthly if workbook not used for 2 months

A few implementation notes:

too much workbook refresh

  • Here is how it works:
    • Find out the last_used (by days)
      select views_workbook_id, ((now())::date – max(last_view_time)::date) as last_used from _views_stats
      group by views_workbook_id
    • Find out refresh schedule by joining tasks table with schedules table
    • Do the calculation and comparison.  For example
      extract change

How to change schedule frequency?

  • Manual approach : Change the workbook or datasource refresh schedule based on the attached schedule change recommendation workbooks
  • Automation:  No API to change schedules.  The schedule change can be done to update tasks.schedule id (tasks is the table name and schedule id is column name):

UPDATE tasks
SET schedule id = xxx
WHERE condition;

A few additional notes:

  • How to figure out which schedule id to change to? Let’s say you have 10 daily schedules, when you change from hourly to daily, the best way is to randomly choose one of the 10 daily schedules to avoid the situation that overtime too many jobs are on one specific schedule.
  • What if publisher changes back from daily to hourly? They do have freedom to change their extract schedules at any time. This is the world of self-service. However they will not beat your automatic scripts over time. On the other side, this feature will help you to get buy-in from business.
  • How much improvement can you expect with this automation? Depends on your situation. I have seen 50%+ delay reductions.
  • Is the automation approach supposed by Tableau? NO NO. You are on your own risk but the risk is low for me, return is high.

3.  Dynamically swap backgrounders and VizQL

Tableau’s Backgrounder handles extract refresh or subscriptions and VizQL handles viz render. Often VizQL has more idle time during night while Backgrounder has more idle time during day. Is it possible to automatically configure more cores as Backgrounders during the night and more VizQL during the day? The dream becomes true with Tableau TSM from V2018.2.screenshot_3651

  • How to identify the right time to swap?
    • Get extract delays by hour from backgrounder_jobs table
    • Get VizQL usage by hour from http_requests table
    • The use pattern will show the best time to swap
  • Can I have the scripts to swap? Click Backgrounder swap with VizQL scripts.txt.zip to download working version of the scripts
  • What happens with in-flying tasks when the backgrounder is gone? The tasks will fail and get re-started automatically from V2019.1

4.  Incremental or smaller extracts run first

screenshot_3652

  • Make sure to educate your publishers since this feature is a great incentive for them
  • How to config incremental goes first? There is nothing to config for incremental goes first. It is out-of-box Tableau feature
  • How to config Smaller Full Extract goes first?
    • V2019.3 : No Config Required
    • V2019.1 & V2019.2: backgrounder.enable_task_run_time_and_job_rank  & backgrounder.enable_sort_jobs_by_job_rank
    • V2018.3or older backgrounder.sort_jobs_by_run_time_history_observable_hours -v 180 (recommend 180 hrs to cover the weekly jobs)

5  VIP Extract Priority

Challenge : If you will have to give higher priority to some extracts, the challenge is that new revision of the workbook or datasource will set extract priority back to default 50.

Solution : You can automate it by (no API available)

UPDATE tasks

SET tasks.priority = xx

WHERE condition;

Read more @ https://enterprisetableau.com/extract/

Re-cap: Although Tableau did not give server admins enough control on the extract refresh schedule selections for given workbook or datasource, there are still ways to govern your Tableau server backgrounder jobs :

  • Reduce extract frequency per usage will reduce all the unnecessary refresh. This can increase 50% your backgrounder efficiency.
  • Dynamic swap vizQL and backgrounder will give you more machine power. This can get 50% more backgrounders depends on your use pattern.
  • Incremental or smaller extracts run first is out-of-box feature. Make sure to let publishers know as this is a incentive for them to design effective extracts.
  • VIP extract priority may not help a lot for the backgrounder efficiency comparing with other 3 items but this is one of things that you may have to do per business need

11 thoughts on “Advanced Deployment (9/10): OPTIMIZE BACKGROUNDER (extract AND SUBSCRIPTION) efficiency”

  1. For the out-of-box feature of smaller extracts running first in 2019.3+ server, do you know if there’s a certain time frame it uses to figure that order out? Is it just based on the last run of the extract, or is it an average of a specific amount of time? (e.g. avg. duration from the last 30 days)

  2. Thanks, Mark for the valuable post. It looks like both .tbw files have been auto-deleted from the portal. Is it still available in the tableau community?

  3. How do you manage the “race to the bottom” I have seen in some larger Tableau deployments, where everyone starts to schedule their refreshes at priority 0 to try and get it to run at the scheduled time?

    1. Priority 0 can be triggered by clicking the ‘run now’ manually or someone using tabcmd. Now there is control at site level to turn off ‘run now’ if that is the problem. For tabcmd that needs local server user password, I am not giving any local server user password anymore so no new user can use tabcmd anymore, instead, I ask everyone to use REST API that can trigger extracts as well but all extracts triggered using API have priority 50.

      1. Understood. Along the same lines then, what is there to prevent everyone setting their refresh to priority 1? Site admins are able to do this. I’ve even engaged in it a little myself…

        I’m geniunely interested, this is something I have grappled with but, as a consultant, I have never stayed in an org long enough for this to become my problem.

        1. Andy, Only site and server admins can change extract priority. My server has many sites but I do not have the problem that a lot site admins change many of their extracts to priority 0 – it is mainly due to the fact that my extract peak delay is about 10+ minutes only so extract delay is not an issue anymore. If you have done all mentioned in this blog and extract delay is still very big so many site admins change their site’s extracts to priority 1, you can set policy (for example per site, 5% extract priority 1-20, 10% extract priority 21-49) and build automatic scripts to enforce this.

Leave a Reply