Automation – Set Usage Based Extract Schedule

Are you facing the situation that your Tableau server backgrounder jobs have longer delay?

You always have limited backgrounders. How to cut average extract delay without adding extract backgrounders ?

I got a lot positive feedback about my blog  SCALING TABLEAU (2/10) – SET EXTRACT PRIORITY BASED ON DURATION  that sets higher extract priority for smaller extract. Grant Eaton recommended to use 180 hrs (7.5 days) vs default 36 hrs to catch weekly jobs when turn on  ‘run faster extract refresh jobs first’ feature. The commands are as followings:

(2018.2 or newer)
tsm configuration set -k backgrounder.sort_jobs_by_run_time_history_observable_hours  -v 180

(10.0 – 2018.1)
tabadmin set backgrounder.sort_jobs_by_run_time_history_observable_hours 180

This blog talks about one more big technique to further improve extract efficiency :  Set Usage Based Extract Schedule

Challenge:  There are many unnecessary extract refresh due to the fact that all schedules are available to every publisher who have complete freedom to choose whatever schedules they wanted.

For example, a workbook is not used for one week but still refresh daily or hourly… Maybe initially usage is high but overtime usage went down but publisher never bothers to reduce refresh frequency…. They have no incentive to do so at all.

 Solution: Set Usage Based Extract Schedule – automatically reschedule the extract frequency based on usage (updated on 12/17/2019 with workbooks attached)

For example:

  • Hourly refresh changes to daily if workbook not used for  2 days
  • Daily changes to weekly if workbook not used for 2 weeks
  •  Weekly changes to monthly if workbook not used for 2 months

A few implementation notes:

  1. Make sure to get agreement with business leaders before implementation
  2. Send automatic email to impacted workbook/data source owner when schedule changed
  3. How to identify unnecessary extract?  Feel free to download the attached workbooks at the end of this blogtoo much workbook refresh. Here is how it works:
    A. Find out the last_used
    select views_workbook_id, ((now())::date – max(last_view_time)::date) as last_used from _views_stats
    group by views_workbook_id
    B. Find out refresh schedule by joining tasks table with schedules table
    C. Do the calculation and comparison.  For example
    extract change
  4. How to change schedule frequency?  There is API for Get Extract Refresh Tasks, and Add Workbook to ScheduleAdd Data Source to Schedule but I have not seen API to remove workbook or data source from schedule, no API to change schedules. Similar as SET EXTRACT PRIORITY BASED ON DURATION where had to use a program to update tasks.priority directly (tasks is the table name and priority is column name), this schedule change can be done to update tasks.schedule id (tasks is the table name and schedule id is column name):
    UPDATE tasks
    SET schedule id = xxx
    WHERE condition;
  5. How to figure out which schedule id to change to? Let’s say you have 10 daily schedules, when you change from hourly to daily, the best way is to randomly choose one of the 10 daily schedules to avoid the situation that overtime too many jobs are on one specific schedule.
  6. What if publisher changes back from daily to hourly? They do have freedom to change their extract schedules at any time. However they will not beat your automatic scripts over time.
  7. How much improvements can you expect with this automation? Depends on your situation. I have seen 50%+ delay reductions.
  8. Is this supposed by Tableau? NO NO. You are on your own risk but the risk is low for me, return is high.

Here is the workbook to show which workbook or datasource refresh should be re-scheduled:

5 thoughts on “Automation – Set Usage Based Extract Schedule”

  1. Hello, Could you please give more insight on this task ?

    I just want to list out workboks which refreshes hourly and no of hits. Please help me out.

    Regards,
    Param

Leave a Reply