Advanced Deployment – Reduced upgrade time from 50 to 5 hrs

Tableau server backup can take a very long time. It takes 20+ hrs for my server even after implementing FileStore on the initial node and very aggressive content archiving. Of course, part of the reason is that Tableau does not have incremental backup.

There are two problems with a long backup: one is that your Disaster Recovery (DR) data is far behind Prod data; the other is that the upgrade process takes a long time, likely the whole weekend.

This blog talks about how to reduce upgrade time from about 50 hrs to 5 hrs. 

  • A large Tableau server upgrade used to take 50 hrs pre-TSM
  • The same Tableau server upgrade takes about 30 hrs with TSM (v2018.2 or newer)
  • The same Tableau server upgrade can be done within 5 hrs with TSM

1. It used to take 50 hrs for the upgrade

[Figure: upgrade1 – pre-TSM upgrade timeline (~50 hrs)]

2. Thanks to TSM (v2018.2), which allows the new server version to be installed while the current version is still running, the post-TSM upgrade time is cut almost in half (this applies to upgrades from v2018.2 or above to a newer version). Here is how it looks:

[Figure: upgrade6 – post-TSM upgrade timeline (~30 hrs)]

3. When I looked at the above timeline, I asked myself: is it possible to skip the cold backup process so the upgrade can be done within 5 hrs? It is possible, with two options:

  • Option 1: Assume that you have a hot backup done. If the cold backup is skipped and the upgrade goes south, you have to restore from that hot backup. What this means is that you will lose about 20 hrs of server changes – anything after the hot backup started is gone and you have no way to know what those changes were. Are IT and the business willing to take this risk? If yes, you are lucky and you can just skip the cold backup.
  • Option 2: Most likely many others can't take that risk. The upgrade can fail, but IT has to be able to get the data/workbooks/permissions back for the business if it does. At a minimum, IT has to know what the changes are. Here is what we did – I call it the BIG idea – track all the changes for a period of about 24 hrs:

[Figure: Upgrade4 – change-tracking window of about 24 hrs]

4. How to skip the cold backup but still track the changes?

  • How it works is that you have two hot backups. Hot backup 1 is restored to DR, while hot backup 2 is saved but not restored (there is no time to restore it before the upgrade).
  • Skip the cold backup and then complete the upgrade within 5 hrs.
  • If the upgrade fails in such a way that a restore has to be done to get the server back, you can restore from hot backup 2, which misses about 20 hrs of data (from point 1 to point 2). Then you will need to let the impacted publishers know about those changes, so they can manually re-do the missing items based on the change list provided by the Server team:
        • Workbooks
        • Data Sources
        • Projects
        • Tasks
        • Subscriptions
        • Data-driven alerts
        • Permissions

5. The net result is that the server upgrade is done within 5 hrs. Wow! That is huge! If things go south, the IT server team has all changes tracked – it is more like an incremental backup. The difference is that most likely the business publishers have to re-do those changes themselves.

[Figure: Upgrade5 – 5 hr upgrade timeline with the cold backup skipped]

6. How to track the changes for the following objects?

  • Workbooks
  • Data Sources
  • Projects
  • Tasks
  • Subscriptions
  • Data-driven alerts
  • Permissions

Just query your Tableau Postgres repository database directly; you can easily get all of them from one point in time to another, since all of those objects have timestamps – except Permissions, which is very tricky.
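Here is a minimal sketch of that query, assuming you have enabled the read-only repository user (the standard readonly account on the workgroup database, port 8060 by default) and that the Rails-style updated_at columns exist on these tables in your version of the workgroup schema. The table and column names below are assumptions to verify against your server before relying on them:

```python
# A sketch only: host, password, table and column names are assumptions/placeholders.
import psycopg2

conn = psycopg2.connect(host="tableau-server", port=8060, dbname="workgroup",
                        user="readonly", password="<readonly-password>")

START = "2019-09-20 18:00:00"   # point 1: when hot backup 2 started
END = "2019-09-21 18:00:00"     # point 2: when the upgrade finished

# Objects that carry timestamps in the workgroup schema (verify per version).
TABLES = ["workbooks", "datasources", "projects", "tasks", "subscriptions", "data_alerts"]

with conn, conn.cursor() as cur:
    for table in TABLES:
        cur.execute(
            f"SELECT id, updated_at FROM {table} "
            "WHERE updated_at BETWEEN %s AND %s ORDER BY updated_at",
            (START, END),
        )
        rows = cur.fetchall()
        print(f"{table}: {len(rows)} objects changed between point 1 and point 2")
        for obj_id, updated_at in rows:
            print(f"  id={obj_id}  updated_at={updated_at}")
```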

Like it or not, Tableau's permission tables do not have timestamps! I personally gave this feedback to the Tableau Dev team already, but it is what it is.

You can find the Tableau permission workbook at https://community.tableau.com/message/940284, and one option is to run it twice and diff the two results.
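For the diff itself, here is a minimal sketch, assuming you export the permission workbook's underlying data to CSV before and after the window (the file names permissions_before.csv and permissions_after.csv are hypothetical):

```python
# Compare two CSV exports of the permissions data and report what changed.
import csv

def load(path):
    """Load every row of a CSV export as a set of tuples."""
    with open(path, newline="") as f:
        return {tuple(row) for row in csv.reader(f)}

before = load("permissions_before.csv")   # export taken at point 1
after = load("permissions_after.csv")     # export taken at point 2

print("Permission rows added during the window:")
for row in sorted(after - before):
    print("  +", row)

print("Permission rows removed during the window:")
for row in sorted(before - after):
    print("  -", row)
```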

Re-cap: Both business and IT are extremely happy with the 5 hr upgrade process, which used to take 50 hrs, or at least 30 hrs with TSM.

Advanced Deployment – Run FileStore on Network Storage

What? You can run FileStore on network storage? Yes, it is doable with fairly good performance. And the benefit is having DR data only 2 hrs behind Prod, where it used to be 50 hrs behind for a large, extract-heavy deployment.

Before you continue: the intent of this blog is NOT to teach you how to configure your Tableau server to run on network storage, because it is not supported by Tableau yet. Instead, the intent is to share the possibility of an awesome new feature coming in the future…

The Problem Statement: When your server's FileStore gets close to 1TB (it happens on a large enterprise server with an extract-heavy deployment, even if you are doing aggressive archiving), the backup or restore can take 20 hrs each. It means that DR data is at least 50 hrs behind, considering file transfer time.

  • The server upgrade can take the whole weekend
  • Server users will see 2-day-old data whenever user traffic is routed to DR (like during weekly maintenance)

The Solution: Configure FileStore on network storage so that all extract files can be snapshotted to DR at much faster speed, leveraging the network storage's built-in snapshot technology.

Impact: The DR data can be about 2 hrs behind prod vs 50 hrs.

[Figure: screenshot_564 – FileStore on network storage, Prod to DR]

How does it work?

  • After it is configured, the server works the same way as with FileStore on local disk. No user should notice any difference as long as you use Tier 0 (most expensive) SSD network storage (NetApp or EMC, for example).
  • Server admins should see no difference either when using TSM or the Tableau server admin views.
  • Does it work for Windows or Linux? I am running it on Linux after working with Tableau Dev for months. Tableau Dev may have an alpha config for Windows, but I don't know.
  • Can we run the repository on network storage as well? That is what we had initially, but it also means a single repository for the whole cluster, which poses additional risk. I am running the repository on local disk and have two repositories.
  • Does it mean that you can't have a 2nd FileStore in the cluster? You are right – a single FileStore only, on network storage. Is it risky? It has risk, but it is common enterprise practice for many other large apps.

The new backup and restore process:

[Figure: screenshot_567 – new backup and restore process]

  • The regular tsm maintenance backup handles both the repository and the FileStore extracts nicely together. Now we do not want to back up the FileStore anymore, so use tsm maintenance backup --file <backup_file> --pg-only.
  • Unfortunately, when you use a pg-only backup, the restore will fail if the repository and FileStore are not in a 'stable' state.
  • What happens is that the repository and FileStore constantly sync internally within Tableau server. For example, when a new extract is added to the FileStore, the handle of the extract has to be added to the repository. When an extract is deleted from the FileStore (an old extract, a user workbook deletion, etc.), the handle has to be deleted from the repository, otherwise Postgres will fail to start during integrity checks after the restore.
  • One critical step is to stop the sync job between the repository and the FileStore before the backup happens, to ensure both are in a 'stable' state and can be separately sent to DR.
  • Of course, after the backup is done, restart the repository and FileStore sync jobs so they catch up. A sketch of this sequence follows the list.
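Here is a minimal sketch of that sequence, driving tsm from Python. The pause_sync()/resume_sync() and snapshot steps are placeholders: pausing the repository-to-FileStore sync is not a public tsm command (our mechanism came out of working with Tableau Dev), and the snapshot call is specific to your storage vendor.

```python
# A sketch only: the sync pause/resume and snapshot steps are site-specific placeholders.
import subprocess

def run(cmd):
    """Run a command and fail loudly if it returns a non-zero exit code."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def pause_sync():
    pass  # placeholder: stop the repository<->FileStore sync job (not a public tsm command)

def resume_sync():
    pass  # placeholder: restart the sync job so it catches up after the backup

def snapshot_filestore_to_dr():
    pass  # placeholder: trigger the network storage snapshot/replication of FileStore to DR

pause_sync()
try:
    # Repository-only backup; the FileStore is covered by the storage snapshot instead.
    run(["tsm", "maintenance", "backup", "--file", "pg_only_backup", "--pg-only"])
    snapshot_filestore_to_dr()
finally:
    # Restart the sync even if the backup or snapshot fails.
    resume_sync()
```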

What does this mean with Tableau's new 2019.3 Amazon RDS External Repository?

When Tableau's repository can run on an Amazon RDS external database, it potentially means a further reduction of the repository's lag on DR. Hopefully the ~2 hr backup/send/restore of the repository can be reduced to minutes. I have not tried this config yet.

Re-cap: You can run FileStore on network storage for much better DR – potentially going from 2 days behind to 2 hrs behind.

Advanced Deployment – Content Migration

One of Tableau's big success factors is its self-service – business dashboard creators can publish their workbooks in a self-serve manner without IT's involvement, although IT may be involved for some data readiness, ETL, and data preparation. By and large, most Tableau implementations empower business users to create and release their dashboards directly.

As you drive more and more Tableau adoption, you will soon realize that you also need good governance to ensure a single source of truth, consistent data access policies (across multiple self-service publishers), consistent workbook style, no duplication of content, etc.

How do you control or govern publishing? There are multiple ways to go, depending on the nature of the dashboards (data sensitivity, audience, purpose) and how much control you want to have:

  1. Stage vs Official project: Only approved publishers can publish to the Team Official project, while a lot more people can publish to the Team Stage project.
  2. Landing page within Tableau: The landing page is actually another workbook that your end users go to. The workbook acts like a 'table of contents': it uses URL actions to go to each separate workbook, and only dashboards listed in this landing page workbook are official ones.
  3. Portal embedding Tableau views outside Tableau: Most audiences do not go to the Tableau server directly for dashboards; they go to a portal that has all the 'approved' dashboards embedded. The governance/control process happens at the portal, since only approved content is available to end users via the embedded portal access.
  4. Test vs Prod server: You don't allow publishers to publish to the Prod server directly. Instead, you find a way for them to publish to a Test server; then, with proper approvals, the dashboards are 'pushed' to the Prod server.

The control level and difficulty level rank as follows: Test vs Prod server > Portal embedding > Landing page > Stage.

There are many blogs about staging, landing pages, and portal embedding, so this blog focuses on Test vs Prod.

How do you automate Test vs Prod server migration? It is common to have a test environment, but it is also common that a publisher has publish access directly to both Test and Prod so self-service publishing can be done. However, for some special use cases (like external dashboards) where you absolutely do not want anyone to publish directly to the Prod server without approval at each workbook level, and you want to automate the process, here is how:

  1. The best way is to use Tableau's new Deployment Tool, which is part of the 2019.3 beta (currently available for Windows). The Deployment Tool enables governance of Tableau server workbook and data source migrations. Admins and technology groups can finally automate and scale the distribution of workbooks/data sources between development, staging, and production environments. It requires the additional Server Management license, a new add-on.
  2. Custom scripts using the APIs for workbook and data source migrations. The high-level approach is to download the workbooks and data sources from the source server (using the API) and then publish them to the target server (using the API); see the sketch after this list. Sounds easy, but the most difficult part is handling embedded passwords. There are a few scenarios: embedded passwords for live connections, for embedded data sources, and for separately published data sources. The good news is that it is all doable. The bad news is that it is very, very difficult and it is more like a 'password hack'. I would not recommend this approach unless you work with Tableau's Professional Services team. My organization had this done by working with Tableau's Professional Services team, and it works great for us.
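For the basic download-then-publish flow (without the embedded password handling, which is the hard part mentioned above), here is a minimal sketch using the tableauserverclient Python library; the server URLs, site, user, workbook name and target project below are hypothetical placeholders:

```python
# A sketch only: URLs, credentials, workbook and project names are placeholders.
import tableauserverclient as TSC

source = TSC.Server("https://tableau-test.example.com", use_server_version=True)
target = TSC.Server("https://tableau-prod.example.com", use_server_version=True)

src_auth = TSC.TableauAuth("migration_user", "<password>", site_id="mysite")
tgt_auth = TSC.TableauAuth("migration_user", "<password>", site_id="mysite")

# 1. Download the approved workbook (with its extract) from the Test server.
with source.auth.sign_in(src_auth):
    opts = TSC.RequestOptions()
    opts.filter.add(TSC.Filter(TSC.RequestOptions.Field.Name,
                               TSC.RequestOptions.Operator.Equals,
                               "Quarterly Sales"))          # workbook to migrate
    workbook = source.workbooks.get(opts)[0][0]
    twbx_path = source.workbooks.download(workbook.id)

# 2. Publish it to the approved project on the Prod server.
with target.auth.sign_in(tgt_auth):
    project = next(p for p in TSC.Pager(target.projects) if p.name == "Official")
    new_item = TSC.WorkbookItem(project_id=project.id, name=workbook.name)
    target.workbooks.publish(new_item, twbx_path, mode=TSC.Server.PublishMode.Overwrite)
    # Note: embedded credentials are NOT carried over by a plain download/publish;
    # they have to be re-supplied (e.g. via connection credentials) at publish time.
```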

I am still testing Tableau's new Deployment Tool to understand what it offers. My quick sense is that the Deployment Tool should work for most organizations for content migration purposes. However, I am not sure about its scalability – for very large enterprise customers that have a lot of workbooks/data sources to migrate from one server to another continuously, custom scripts with multi-threading will give you better scalability.