Automation – Workbook/Disk size governance

My previous post (Automation – Advanced Archiving) covers automatic deletion of unused workbooks. Over time, 2/3 of workbooks have been removed, and both business and IT are happy about it: business users see only active, useful assets in their Tableau portal, and IT’s backup/restore is faster.

However, I still found some active workbooks that are too big; with extracts, a workbook can reach 10G+. Tableau Server has no control that stops large data sources or workbooks from being published, and most of those large workbooks perform badly anyway.

How can we discourage the bad behavior of publishing very large workbooks? How can we govern disk size?

The answer is again deletion.

1. Delete large unused workbooks more aggressively

The best way to encourage smaller workbooks is to delete large workbooks more aggressively. For example, if your regular policy is to delete workbooks not used for 3 months, you can introduce a size factor:

  • Delete workbooks not used for 2 months if the workbook size is between 2G and 5G
  • Delete workbooks not used for 1 month if the workbook size is between 5G and 10G
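A minimal Python sketch of this size-tiered rule, assuming sizes are measured in GB and months are approximated as 30 days. The boundary handling (exactly 5G falls in the 5G-10G tier; only workbooks strictly larger than 10G hit the hard limit) and the function names are my own assumptions, not anything Tableau Server provides.

```python
from typing import Optional

# Sizes in GB. Assumption: exactly 5G belongs to the 5G-10G tier, and only
# workbooks strictly larger than 10G fall under the hard size limit.
HARD_LIMIT_GB = 10.0

def retention_days(size_gb: float) -> Optional[int]:
    """Allowed days without usage before deletion, based on workbook size.

    Returns None for workbooks over the hard limit: those are deleted
    regardless of usage.
    """
    if size_gb > HARD_LIMIT_GB:
        return None
    if size_gb >= 5.0:
        return 30  # 5G-10G: 1 month (approximated as 30 days)
    if size_gb >= 2.0:
        return 60  # 2G-5G: 2 months
    return 90      # regular policy: 3 months

def should_delete(size_gb: float, days_unused: int) -> bool:
    """True if the workbook has been idle at least as long as its tier allows."""
    limit = retention_days(size_gb)
    return limit is None or days_unused >= limit
```

For example, a 3.2G workbook idle for 45 days is kept, while the same workbook idle for 60 days is deleted.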

2. Delete very large active workbooks

Can you have a policy to delete very large but actively used workbooks? It really depends on your corporate policy and the business-IT relationship. I have a policy to delete any workbook larger than 10G, even if it is actively used. How does it work?

  • Business and IT agree on the policy: no workbook on the server can be larger than 10G. Unfortunately, Tableau Server does not have this feature, so we run our own automation program hourly (daily also works) to delete any workbook larger than 10G.
  • Of course, a deletion notification is sent to the workbook owner, with the policy stated in the message.
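The hourly sweep can be sketched as two small steps: pick out every workbook over the hard limit, then build the owner notification that states the policy. This is an illustrative sketch only; the `Workbook` record, how you populate it (repository query or REST API), and the policy URL are hypothetical placeholders, and the actual deletion and scheduling (cron or similar) are left out.

```python
from dataclasses import dataclass
from typing import List

HARD_LIMIT_GB = 10.0
POLICY_URL = "https://example.com/tableau-size-policy"  # hypothetical placeholder

@dataclass
class Workbook:
    """Hypothetical record; fill it from your repository or REST API query."""
    workbook_id: str
    name: str
    owner_email: str
    size_gb: float

def over_limit(workbooks: List[Workbook], limit_gb: float = HARD_LIMIT_GB) -> List[Workbook]:
    """Workbooks that violate the hard size limit, regardless of usage."""
    return [wb for wb in workbooks if wb.size_gb > limit_gb]

def deletion_notice(wb: Workbook, limit_gb: float = HARD_LIMIT_GB) -> str:
    """Body of the email sent to the owner after deletion, stating the policy."""
    return (
        f"Your workbook '{wb.name}' ({wb.size_gb:.1f}G) was deleted because it "
        f"exceeded the {limit_gb:.0f}G size limit agreed by business and IT.\n"
        f"Policy: {POLICY_URL}"
    )
```

Keeping the limit as a parameter makes it easy to adjust the threshold if the business-IT agreement changes.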

3. How do you handle a workbook whose size gradually increases toward the enforced deletion threshold?

  • A separate size alert is necessary to let the data source or workbook owner know that his or her workbook is inches away from being deleted, so the owner can take action.
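One way to implement the alert is a warning band below the hard limit. The sketch below assumes a band that starts at 80% of the 10G limit; that fraction, the function names, and the alert wording are hypothetical choices for illustration.

```python
HARD_LIMIT_GB = 10.0
WARN_FRACTION = 0.8  # hypothetical: warn once a workbook reaches 80% of the limit

def size_status(size_gb: float, limit_gb: float = HARD_LIMIT_GB,
                warn_fraction: float = WARN_FRACTION) -> str:
    """Classify a workbook as 'ok', 'warn' (alert the owner) or 'delete'."""
    if size_gb > limit_gb:
        return "delete"
    if size_gb >= limit_gb * warn_fraction:
        return "warn"
    return "ok"

def alert_text(name: str, size_gb: float, limit_gb: float = HARD_LIMIT_GB) -> str:
    """Message telling the owner how close the workbook is to the limit."""
    headroom = limit_gb - size_gb
    return (
        f"Workbook '{name}' is {size_gb:.1f}G, only {headroom:.1f}G below the "
        f"{limit_gb:.0f}G limit. Please reduce its size to avoid automatic deletion."
    )
```

Running `size_status` in the same job as the deletion sweep means owners get the warning before their workbook crosses the line.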

Feel free to add your comments.

Automation – Advanced Archiving

My previous post (Automation – Set Usage Based Extract Schedule) provides a practical server governance approach that automatically re-schedules self-service publishers’ extracts based on workbook usage.

This post covers old workbooks that nobody has used for a period of time. The keyword is archiving, and many server admins already do it. The tips and tricks here should sharpen your thinking on the topic, which is why I call it advanced archiving.

1. Do not archive but delete

The common IT approach is to copy old workbooks and data sources somewhere else so that business owners can download them when needed. This is an outdated approach because it creates more support work for the technical team (for example, workbook owners cannot find the archive URL or the workbook). A much better way is not to archive at all: just delete, then send the deleted workbooks to their owners.

2. Send old workbooks to owners automatically

For each workbook that meets the deletion criteria, call the server API (GET /api/api-version/sites/site-id/workbooks/workbook-id/content) to download the workbook. If the workbook is a .twb, perfect. If it is a .twbx, rename it to .zip, unzip it, ignore the .hyper (or .tde), and keep only the .twb. Then email the .twb as an attachment to the workbook owner (and project leaders if needed). The key benefits are as follows:

  • Workbook owners can always search their email inbox for a deleted workbook if they need to republish it later.
  • Do not email the .hyper (or .tde) file because of its size and data security concerns.
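The download-and-strip step can be sketched with Python's standard library. The request builder only constructs the GET call from the post (it does not send it), assuming you already hold a sign-in token for the `X-Tableau-Auth` header; the server URL, API version, and IDs are placeholders. Because a .twbx is a zip archive, `zipfile` can read it directly, so the rename to .zip is not needed in code.

```python
import io
import zipfile
from typing import Tuple
from urllib.request import Request

def download_request(server: str, api_version: str, site_id: str,
                     workbook_id: str, auth_token: str) -> Request:
    """Build (but do not send) the REST call that downloads a workbook."""
    url = (f"{server}/api/{api_version}/sites/{site_id}"
           f"/workbooks/{workbook_id}/content")
    return Request(url, headers={"X-Tableau-Auth": auth_token}, method="GET")

def extract_twb(content: bytes, filename: str) -> Tuple[str, bytes]:
    """Return (name, bytes) of the .twb to email; extracts are never returned."""
    if not zipfile.is_zipfile(io.BytesIO(content)):
        # Plain .twb: it is already just the XML workbook, send as-is.
        return filename, content
    # .twbx: read the archive in memory and keep only the .twb,
    # skipping .hyper/.tde extracts and any other packaged files.
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        for member in archive.namelist():
            if member.lower().endswith(".twb"):
                return member, archive.read(member)
    raise ValueError(f"No .twb found inside {filename}")
```

Sending the request with `urllib.request.urlopen` and passing the response body to `extract_twb` completes the step; error handling for expired tokens is left out of this sketch.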

3. Delete first, then send notifications

A common mistake is for the server admin to send owners a list of workbooks to confirm before archiving, which creates unnecessary clicks on the server. Instead, delete the old workbooks from the server first, then send the notification with the .twb attached and a policy link in the email body.
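The ordering matters in one subtle way: the .twb has to be downloaded before the workbook is deleted, because the content is gone afterwards. A sketch of the loop, with the three steps passed in as hypothetical callables (in practice `fetch_twb` would wrap the /content download and .twb extraction, `delete_workbook` the REST delete call on the same workbook URI without /content, and `send_notice` the owner email):

```python
import logging
from typing import Callable, Dict, Iterable, List

def purge_then_notify(
    candidates: Iterable[Dict],
    fetch_twb: Callable[[str], bytes],        # hypothetical hook: download + extract .twb
    delete_workbook: Callable[[str], None],   # hypothetical hook: REST delete call
    send_notice: Callable[[Dict, bytes], None],  # hypothetical hook: owner email
) -> List[str]:
    """Download, delete, then notify; no confirmation email is sent beforehand."""
    deleted = []
    for wb in candidates:
        twb = fetch_twb(wb["id"])        # 1. keep a copy while it still exists
        try:
            delete_workbook(wb["id"])    # 2. delete first, no confirmation step
        except Exception:
            logging.exception("Delete failed for workbook %s", wb["id"])
            continue                     # do not tell the owner it was deleted
        send_notice(wb, twb)             # 3. notify with the .twb attached
        deleted.append(wb["id"])
    return deleted
```

Notifying only after a successful delete keeps the email truthful: an owner never receives a deletion notice for a workbook that is still on the server.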

4. Delete more aggressively for larger workbooks

How do you define old? Some admins use 180 days, but I use 15 to 90 days depending on workbook size:

  • Regular workbooks are deleted after 90 days with no usage
  • Workbooks of 2G+ are deleted after 30 days with no usage
  • Workbooks of 5G+ are deleted after 15 days with no usage
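These tiers can be expressed as a small table checked from largest to smallest, then used to select stale workbooks. A sketch assuming each workbook is a dict with `id`, `size_gb`, and a `last_used` date (a hypothetical shape; never-used workbooks would need their own rule). The returned idle and limit days are handy for the notification email.

```python
from datetime import date
from typing import Dict, List

# (minimum size in GB, allowed idle days), checked largest first,
# so a 6G workbook gets 15 days rather than 30.
IDLE_TIERS = [(5.0, 15), (2.0, 30), (0.0, 90)]

def allowed_idle_days(size_gb: float) -> int:
    """Idle days allowed before deletion for a workbook of this size."""
    for min_gb, days in IDLE_TIERS:
        if size_gb >= min_gb:
            return days
    return IDLE_TIERS[-1][1]  # defensive fallback for unexpected sizes

def stale_workbooks(workbooks: List[Dict], today: date) -> List[Dict]:
    """Workbooks idle for at least their tier's limit, annotated for the email."""
    stale = []
    for wb in workbooks:
        idle = (today - wb["last_used"]).days
        limit = allowed_idle_days(wb["size_gb"])
        if idle >= limit:
            stale.append({**wb, "idle_days": idle, "limit_days": limit})
    return stale
```

Keeping the tiers in one table means changing the policy is a one-line edit.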

5. Delete published data sources as well

As you delete workbooks, over time some published data sources no longer have any connected workbook:

  • Delete a standalone data source if it was created more than 2 (or 4) weeks ago; you do not want to delete recently published data sources.
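The data source rule reduces to a filter: no connected workbook, and older than the grace period. In this sketch, `connected_ids` is assumed to be the set of published data source IDs used by at least one workbook; how you build it from the repository is left open, and the dict shape is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Set

def orphaned_datasources(datasources: List[Dict], connected_ids: Set[str],
                         now: datetime,
                         min_age: timedelta = timedelta(weeks=2)) -> List[Dict]:
    """Published data sources with no connected workbook, older than min_age.

    The age check protects recently published data sources whose
    workbooks may not have been published yet.
    """
    return [
        ds for ds in datasources
        if ds["id"] not in connected_ids and now - ds["created_at"] > min_age
    ]
```

Pass `timedelta(weeks=4)` for the more conservative 4-week variant.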

6. Technical implementation details

Use the historical_events table to drive usage calculations. Put the number of days without usage, alongside the policy threshold, in the email body so the workbook owner does not have to guess why the workbook was deleted. If you also use size criteria, include the workbook size in the email body too.
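The notification email can be assembled with Python's `email.message.EmailMessage`, putting idle days versus the policy limit, the optional size, and the policy link in the body, and attaching only the .twb. The sender address and policy URL here are placeholders; sending would typically go through `smtplib.SMTP(...).send_message(msg)` against your own mail server.

```python
from email.message import EmailMessage
from typing import Optional

def build_deletion_email(owner: str, workbook_name: str, idle_days: int,
                         policy_days: int, policy_url: str, twb_name: str,
                         twb_bytes: bytes, size_gb: Optional[float] = None,
                         sender: str = "tableau-admin@example.com") -> EmailMessage:
    """Deletion notice stating why the workbook was removed, with the .twb attached."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = owner
    msg["Subject"] = f"Tableau workbook deleted: {workbook_name}"
    lines = [
        f"Your workbook '{workbook_name}' was deleted from Tableau Server.",
        f"It had no usage for {idle_days} days; the policy limit is {policy_days} days.",
    ]
    if size_gb is not None:
        lines.append(f"Workbook size: {size_gb:.1f}G.")
    lines += [
        "The .twb is attached so you can republish it if needed.",
        f"Policy: {policy_url}",
    ]
    msg.set_content("\n".join(lines))
    # Attach only the .twb (XML); extracts are never emailed.
    msg.add_attachment(twb_bytes, maintype="application", subtype="xml",
                       filename=twb_name)
    return msg
```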

7. Get buy-in from business management for these policies

Get buy-in from business leaders for these policies, document them, and always include a link to the policy in the email notification. Getting buy-in is a lot easier than most people think. Why? Business loves the fact that server deletion makes it much easier for interactors to find active content. The higher the level at which you seek buy-in, the easier it is to get.

8. How do you identify workbooks that have not been used for a long time?

One way is to use the following query:

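-- Workbooks whose views have not been accessed in more than 90 days.
-- date - date returns whole days in PostgreSQL; an aggregate filter
-- belongs in HAVING, and the alias cannot be reused there.
select views_workbook_id,
       now()::date - max(last_access_time)::date as days_since_last_used
from _views_stats
group by views_workbook_id
having now()::date - max(last_access_time)::date > 90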

 

Download the Tableau Workbook Archiving Recommendation.twb

Updated on June 8, 2019: Please read Automation – Data Source Archiving.