I’d like to complete my scaling Tableau 10 blog series with the architecture and automation topics. If you follow the tips and approaches in this scaling Tableau 10 series and the governed self-service series, you should have no problem deploying Tableau at the enterprise level with thousands of Desktop publishers on a few-hundred-core server cluster that supports ten thousand extract refreshes a day, ten thousand unique active users a day, and a few million clicks a month.
Architecture
- Prod, DR and Test: It is advisable to have three different environments for any large Tableau deployment: Prod, DR and Test.
- DR: During regular maintenance when Prod is down, server user traffic will be routed automatically to the DR cluster. The best practice is to restore Prod to DR once a day so DR has relatively fresh content. If you use extracts, whether to refresh them on DR is a trade-off: if you do, you double the load on your data sources, but DR will have the latest data; if you do not, DR will serve day-old data during the weekend Prod maintenance window. If you create extracts outside Tableau and use the Tableau SDK to push them to the server, you can easily push each extract to both Prod and DR to keep DR fresh, as in the sketch below.
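To make the SDK-push approach concrete, here is a minimal sketch that publishes the same externally built .tde to both clusters through the Tableau SDK’s Server API (Tableau 10.x). The host names, credentials, project and data source names are placeholders for illustration.

```python
# Push one externally built .tde to both Prod and DR so DR stays fresh.
# Hosts, credentials and names below are placeholders.
from tableausdk.Server import ServerAPI, ServerConnection

SERVERS = [
    ('http://tableau-prod.example.com', 'extract_svc', 'secret', ''),  # '' = default site
    ('http://tableau-dr.example.com',   'extract_svc', 'secret', ''),
]

def publish_everywhere(tde_path, project, datasource_name):
    ServerAPI.initialize()
    try:
        for host, user, password, site_id in SERVERS:
            conn = ServerConnection()
            conn.connect(host, user, password, site_id)
            # Last argument True = overwrite, so the nightly push replaces old data.
            conn.publishExtract(tde_path, project, datasource_name, True)
            conn.disconnect()
            conn.close()
    finally:
        ServerAPI.cleanup()

if __name__ == '__main__':
    publish_everywhere('sales.tde', 'default', 'sales')
```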
- Test: It is not advisable to publish all workbooks to a Test instance before Prod, although that is a common traditional SDLC approach. If you do so, you create a lot of extra work for your publishers and for the server admin team. That does not mean you can ignore controls and governance on the Prod versions of workbooks; the best practice is to control and govern workbooks in different projects within the Prod instance. Then you may ask, what is the Test instance for? Tableau upgrades, OS upgrades, new drivers, new configuration files, new TDC files, performance tests, load tests, etc. Of course, Test can still be used to validate workbooks, permissions and so on.
- Server location: For best performance, Tableau Server should be installed in the same zone of the same data center as your data sources. However, your data sources are likely spread across data centers, and the current Tableau Server cluster does not support WAN-separated nodes, so you will have to choose one location for your cluster. Many factors affect workbook performance; if your West Coast server has to connect live to a large data source on the East Coast, the workbook will not perform well. The options are to use extracts or to split into two clusters, one on the East Coast mainly for East Coast data sources and one on the West Coast. It is always a trade-off.
- Bare metal vs. VM: Tableau Server performs better on bare-metal Windows servers, although VMs give you other flexibility. For benchmarking purposes you can assume a VM is 10-20% less efficient than bare metal, but many other factors will affect your decision between the two.
- Server Configurations: There is no universal standard configuration for your backgrounders, VizQL Server, Cache Server, Data Engine, etc. The best approach is to optimize your configuration based on TabMon feedback; a configuration sketch follows this list. Here are a few common tips:
- Get more RAM on each node, especially the Cache Server node.
- Make sure the Primary and File Engine nodes have enough disk for backup/restore purposes. As a benchmark, your Tableau database size should stay below 25% of the disk capacity.
- It is OK to keep the backgrounder node's CPU at about 80% average to fully leverage your core licenses.
- It is OK to keep the VizQL node's CPU at about 50% average.
- Installing the File Engine on the Primary reduces backup/restore time by about 75%, although the Primary's cores will then count toward your core license.
- Keep the number of cores on a single node below 24.
- Continuously optimize the configuration based on feedback from TabMon and other monitoring tools.
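As an illustration of config-by-feedback, here is a sketch that applies process-count changes through tabadmin, the pre-TSM command line. The key names and values are examples only; take the real targets from what TabMon tells you, and verify the keys against your server version before running anything.

```python
# A minimal sketch of applying process-count changes with tabadmin.
# Key names/values are illustrative; verify them for your version.
# Run from the Tableau Server bin folder on the Primary, elevated.
import subprocess

def tabadmin(*args):
    subprocess.check_call(['tabadmin'] + list(args))

# Example: add a backgrounder process on worker0 and trim VizQL to two.
tabadmin('stop')
tabadmin('set', 'worker0.backgrounder.procs', '4')
tabadmin('set', 'worker0.vizqlserver.procs', '2')
tabadmin('configure')   # commit the new settings
tabadmin('start')
```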
Automation
- Fundamental things to automate:
- Backup: Set up backups with automatic file rotation so you do not have to worry about the backup disk running out of space. You should back up data, server config and logs daily; a backup sketch follows this list. Please find my working code here.
- User provisioning: Automatically sync Tableau Server groups and group members from the company directory.
- Extract failure alerts: Send email alerts whenever an extract fails; an alert sketch also follows this list. See details here.
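Here is a minimal sketch of the backup-with-rotation idea from the Backup bullet; my linked working code is more complete. The backup directory and retention period are placeholders, and tabadmin must run on the Primary node. The -d flag appends a datestamp to the backup file name.

```python
# Daily backup with rotation so the backup disk never fills up.
# Paths are placeholders; schedule this nightly on the Primary.
import glob, os, subprocess

BACKUP_DIR = r'D:\tableau_backups'   # placeholder path
KEEP_COPIES = 7

def nightly_backup():
    target = os.path.join(BACKUP_DIR, 'ts_backup')
    # "tabadmin backup <file> -d" appends the date to the file name.
    subprocess.check_call(['tabadmin', 'backup', target, '-d'])
    # Rotate: keep only the newest KEEP_COPIES .tsbak files.
    backups = sorted(glob.glob(os.path.join(BACKUP_DIR, '*.tsbak')),
                     key=os.path.getmtime, reverse=True)
    for old in backups[KEEP_COPIES:]:
        os.remove(old)

if __name__ == '__main__':
    nightly_backup()
```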
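And here is a sketch of the extract failure alert: query the repository's background_jobs table with the built-in readonly user (enable it first with tabadmin dbpass), then email whatever failed recently. Hosts, credentials and SMTP details are placeholders; see the linked post for the production version.

```python
# Email an alert for extract jobs that failed in the last hour.
# Repository host, password and SMTP details are placeholders.
import psycopg2, smtplib
from email.mime.text import MIMEText

REPO = dict(host='tableau-prod.example.com', port=8060,
            dbname='workgroup', user='readonly', password='secret')

FAILED_SQL = """
    SELECT title, job_name, completed_at, notes
    FROM background_jobs
    WHERE finish_code = 1                                   -- 1 = failed
      AND job_name IN ('Refresh Extracts', 'Increment Extracts')
      AND completed_at > now() - interval '1 hour'
    ORDER BY completed_at DESC
"""

def alert_failures():
    with psycopg2.connect(**REPO) as conn, conn.cursor() as cur:
        cur.execute(FAILED_SQL)
        rows = cur.fetchall()
    if not rows:
        return                       # nothing failed, no email
    body = '\n'.join('%s | %s | %s\n%s' % r for r in rows)
    msg = MIMEText(body)
    msg['Subject'] = 'Tableau extract failures (last hour)'
    msg['From'] = 'tableau-admin@example.com'
    msg['To'] = 'bi-team@example.com'
    with smtplib.SMTP('smtp.example.com') as smtp:
        smtp.send_message(msg)

if __name__ == '__main__':
    alert_failures()
```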
- Advanced automation (Tableau has no API for these, so there is more risk but great value; I have done all of the below):
- Duration-based extract priority: If you face extract delays, adjusting extract priorities can improve extract throughput by 40-70% without adding new backgrounders. The best practice is to set priority 10 for business-critical extracts, priority 20 for incremental refreshes, priority 30 for extracts whose duration is below the median (by definition, half of all extract jobs), and priority 50 for all the rest. How do you update priority? I have not seen an API for it, but I have a program that updates tasks.priority directly; Tableau does not support this officially, but it works well (see the first sketch after this list). Read my blog about extracts.
- Re-schedule extracts based on usage: One common problem in the self-service world is that people do not bother to re-schedule existing extracts when usage drops. The server admin can re-schedule extracts based on usage: for example, daily extracts can be moved to weekly if the workbook has had no usage in the past two weeks, and weekly extracts can be moved to monthly if the workbook has had no usage in the past two weeks. All of this can be automated by updating the task's schedule assignment in the repository directly (tasks.schedule_id rather than tasks.priority), although again this is not an officially supported approach.
- Delete old workbooks: I have deleted 50% of the workbooks on our Tableau Server within a few quarters. Any workbook with no usage in the past 90 days is deleted automatically. This policy is well received because it helps users clean up old content, saves IT disk space, and avoids wasting attention on junk content. The best practice is to agree on this policy between business and IT via the governance process, and then not provide a list of old workbooks to publishers before the deletion (to avoid unnecessary clicks); only communicate to publishers after the workbooks are deleted. The best way to communicate is to email each publisher the specific .twb files that were deleted automatically, while discarding the .tde files. Publishers can always publish the workbooks again as self-service. Use the HISTORICAL_EVENTS table to identify old workbooks (see the cleanup sketch after this list). I do not recommend archiving old workbooks, since it is extra work with little value. Please refer to Matt Coles’ blog as a starting point.
- Workbook performance alerts: If workbook render time is one of your challenges on the server, you can send alerts to workbook owners based on render time. It is good practice to create multiple warning levels, such as yellow and red, with different thresholds: yellow alerts are warnings, while red alerts require action. If an owner does not take corrective action within the agreed period after a red warning, a meeting should be arranged to discuss the situation. If the site admin refuses to take action, the governance body has to decide on the agreed-upon penalty, which can go as far as site suspension. A render-time sketch follows this list. Please read my performance management blog for more details.
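Below is a sketch of the duration-based priority update from the first bullet. Remember this writes to the workgroup database directly, which Tableau does not officially support: it needs a write-capable login (the readonly user cannot UPDATE), the connection details are placeholders, and the correlation_id-to-tasks join should be verified on your version. The re-scheduling idea from the second bullet is the same pattern applied to tasks.schedule_id.

```python
# UNSUPPORTED: direct UPDATE of tasks.priority in the workgroup database.
# Test on your Test instance first. Connection details are placeholders.
import statistics
import psycopg2

REPO = dict(host='tableau-prod.example.com', port=8060,
            dbname='workgroup', user='tblwgadmin', password='secret')

# Average successful refresh duration per task over the last two weeks.
DURATION_SQL = """
    SELECT correlation_id,
           avg(extract(epoch FROM completed_at - started_at)) AS avg_secs
    FROM background_jobs
    WHERE job_name = 'Refresh Extracts'
      AND finish_code = 0
      AND correlation_id IS NOT NULL
      AND completed_at > now() - interval '14 days'
    GROUP BY correlation_id
"""

with psycopg2.connect(**REPO) as conn:
    with conn.cursor() as cur:
        cur.execute(DURATION_SQL)
        rows = cur.fetchall()
        if rows:
            median = statistics.median(secs for _, secs in rows)
            fast = [tid for tid, secs in rows if secs < median]
            slow = [tid for tid, secs in rows if secs >= median]
            # Leave hand-set business-critical (10) and incremental (20)
            # priorities alone; everything else gets 30 or 50.
            for ids, prio in ((fast, 30), (slow, 50)):
                if ids:
                    cur.execute(
                        "UPDATE tasks SET priority = %s "
                        "WHERE id = ANY(%s) AND priority NOT IN (10, 20)",
                        (prio, ids))
```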
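This sketch shows the 90-day cleanup: identify stale workbooks via HISTORICAL_EVENTS, then delete them through the REST API. The API version (2.4, roughly the Tableau 10 line), hosts and credentials are placeholders; try it on Test first, and email owners their .twb files afterwards per the policy above.

```python
# Delete workbooks with no access events in the past 90 days.
# Hosts, credentials and API version are placeholders.
import psycopg2, requests
import xml.etree.ElementTree as ET

REPO = dict(host='tableau-prod.example.com', port=8060,
            dbname='workgroup', user='readonly', password='secret')
API = 'http://tableau-prod.example.com/api/2.4'
NS = {'t': 'http://tableau.com/api'}

# Workbooks with no historical events touching them in 90 days.
STALE_SQL = """
    SELECT w.luid, w.name FROM workbooks w
    WHERE w.id NOT IN (
        SELECT hw.workbook_id
        FROM historical_events he
        JOIN hist_workbooks hw ON hw.id = he.hist_workbook_id
        WHERE hw.workbook_id IS NOT NULL
          AND he.created_at > now() - interval '90 days')
"""

def sign_in(user, password, site=''):
    body = ('<tsRequest><credentials name="%s" password="%s">'
            '<site contentUrl="%s"/></credentials></tsRequest>'
            % (user, password, site))
    r = requests.post(API + '/auth/signin', data=body)
    r.raise_for_status()
    root = ET.fromstring(r.content)
    token = root.find('.//t:credentials', NS).get('token')
    site_id = root.find('.//t:site', NS).get('id')
    return token, site_id

def delete_stale():
    with psycopg2.connect(**REPO) as conn, conn.cursor() as cur:
        cur.execute(STALE_SQL)
        stale = cur.fetchall()
    token, site_id = sign_in('admin', 'secret')
    for luid, name in stale:
        requests.delete('%s/sites/%s/workbooks/%s' % (API, site_id, luid),
                        headers={'X-Tableau-Auth': token})
        print('deleted', name)

if __name__ == '__main__':
    delete_stale()
```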
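Finally, a sketch for finding the render-time offenders that feed the yellow/red alerts. The action = 'show' filter on http_requests is a commonly used heuristic for view renders on pre-TSM versions, and the thresholds and connection details are examples to adapt.

```python
# Flag views whose average render time crosses yellow/red thresholds.
# Thresholds and connection details are examples.
import psycopg2

REPO = dict(host='tableau-prod.example.com', port=8060,
            dbname='workgroup', user='readonly', password='secret')

YELLOW_SECS, RED_SECS = 10, 30   # example thresholds

SQL = """
    SELECT currentsheet,
           avg(extract(epoch FROM completed_at - created_at)) AS avg_secs,
           count(*) AS renders
    FROM http_requests
    WHERE action = 'show'            -- common heuristic for view renders
      AND currentsheet IS NOT NULL
      AND created_at > now() - interval '7 days'
    GROUP BY currentsheet
    HAVING avg(extract(epoch FROM completed_at - created_at)) > %s
    ORDER BY avg_secs DESC
"""

with psycopg2.connect(**REPO) as conn, conn.cursor() as cur:
    cur.execute(SQL, (YELLOW_SECS,))
    for sheet, avg_secs, renders in cur.fetchall():
        level = 'RED' if avg_secs > RED_SECS else 'YELLOW'
        # Route RED rows to owner alerts, YELLOW to a weekly digest.
        print('%-6s %-50s %.1fs over %d renders'
              % (level, sheet, avg_secs, renders))
```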
- Things that should not be automated: Certain things you should not automate. For example, you may not want to automate site or project creation, since sites and projects should be carefully evaluated and discussed before they are created. You may not want to automate granting of the Publisher site role either, since it should also be controlled: proper training should be required before a new Publisher is granted.
As a recap of scaling Tableau to the enterprise, there are five main areas to drive: Community, Learning, Data Security, Governance and the Enterprise Approach. This series focuses mostly on the Enterprise Approach. I hope it helps. I’d love to hear your tips and tricks too.