Scaling Tableau (1/10) – version control and revision history

Tableau released one of the most wanted server features – version control and revision history in V9.3. Then this feature is  much more enhanced in V10 with previewing old workbook,  one click restoring, and maximum revisions setting. I love all of those new V10 features:

  • The workbook previewing and restoring features are so convenience for publishers.
  • The maximum revision setting is so cool for server admin who can actually control the server space usage so you do not have to run out of storage while enabling revision history. It also shows Tableau’s thought process for built-in governance process while enabling a new feature, which is important to scale Tableau to enterprise.   I will explain those features in details here:
  1. Turn it on. By default, Revision History is not turned on. It can be turned on site by site. To turn it on, go to site Setting, General and select  “Save a history of revisions“.  If you are on V10, you have two choices of Unlimited and # of revisions. Unlimited means that there is no limit on the max version history, which you probably do not want to have. As a server admin, you always want to make sure that your server will not run out of space. You will find # of revision is a very handy feature so admins can have some peace of mind about server storage.Screen Shot 2016-11-27 at 3.27.57 PM

2. How to decide the max. number of revisions?

I asked this question but I did not find any guidances anywhere. I spent days of research and I wanted to share my findings here. First  of all,  my philosophy is to give the max flexibility to publishers by providing as many revisions as possible. On the other side, I also want to be able to project extra storage that the revision history will create for planning purpose.

How many revision you should set? It depends on how much space you can allocate to revision history w/o dramatically impacting your backup/restore timing and how many workbooks the server have. Let’s say that you are Ok to give about 50G to all revision history. Then figure out how many workbooks you have now, and what is the total space for all the xml portion of workbooks (revision history only keeps xml piece), then you can calculate max number of revisions. Here is how:

  • Open Desktop, connect to PostgreSQL, give your server name, port, workgroup as database, give readonly user and password. Select  Workbooks table, look for Size, Data Engine Extracts, and number of records.  The Data Engine Extracts of  Workbooks table tells you if the workbook is embedded workbook or not.
  • If you have total 500 workbooks with 200 of them have Data Engine Extracts as false and total size as 200M for all workbooks with Data Engine Extracts as false.  It means that the avg twb is about 1M per workbook – this is what revision history will keep once it is turned on. Then the total xml size of workbook is about 500M.
  • When you turn on revision history and if you set max revision as 50, overtime, the server storage for revision history would be about 50 x 500 x 1M = 50G overtime.  Two other factors to consider: One is new workbook creation rate, two is that not every workbook would max out revision.
  • Once you set the revision number, you can monitor the storage usage for all revision history by looking at  Workbook_versions table which keeps all the revision history.  You can find the overall size, number of versions, and more insights about use pattens. You can also do the following joins to find out workbook name and use name, etc.

Screen Shot 2016-11-27 at 10.10.39 PM

3.  Can interactors see the previous version as well? No. The end users of interactors can only see the current version.

4. Does publish have to do anything to keep revision history of his or her workbooks? No. Once ‘Save a history for revision’ is turned on for site, every time the  workbook is web edited or modified via Desktop, a new revision w be created automatically – there is no further action for publisher. When the max number of revision is reached out, the oldest version will be deleted automatically. There is no notification to publishers either. All you need to communicate to publisher is that max number of revisions that any publisher can have.  For example, if you keep  50 revisions and one workbook has 50 revision already. When this workbook is changed again, Tableau server will keep the most recent  50 revisions only by deleting the oldest revision automatically.

5. Can you change the max revisions? Yes. Let’s say you have max revision as 50 and you want to reduce it to 25. Tableau server will delete the old revisions (if there are any) and keep the most recent 25 revisions only. What happens if you change back from 25 to 50? All the older revisions are gone and will not show up anymore.

6. What is workflow for publisher to restore an old workbook? Publishers or admin can see revision history for their workbooks by click details, revision history. With one simple click to preview any old workbook or restore. Once it is restored, a new revision will be created automatically again.

7. How to restore data source revision? V10 came with review and restore features for workbooks only. You can view all revisions for data sources as well but you will have to download the data source and upload it gain if you want to restore older version of data source. I am sure Tableau’s scrum team has been working on one click restoring of data source as well.

Leave a Reply