Tableau helps us see and understand our data, which is great. Great things happen every day when creative analysts have the powerful Tableau Desktop, unlocked enterprise source data, and the Tableau Server collaboration environment.
As Tableau adoption grows from teams to business units (BUs) to the whole enterprise, you quickly run into scalability challenges: extracts are delayed, the enterprise data warehouse (EDW) struggles to handle ad hoc workloads, and so on.
My last blog covered setting extract priority on the server to improve extract efficiency by 50%. This blog focuses on a best practice for data source connections that helps scale both the EDW and the server: use published data sources.
1. What is a Tableau published data source?
It is nothing but Tableau’s semantic layer. If you have been in the BI space for a while, you may be familiar with Oracle BI’s Repository or Business Objects’ Universe. The problem with the Repository and the Universe is that they are too complex and are designed for specially trained IT professionals only. Tableau is a newer tool designed for business analysts, who do not have to know SQL, and its semantic layer is much simpler. The Tableau community did not pay much attention to published data sources until recently, when people started to realize that leveraging published data sources is not only a best practice but almost a must-have for scaling Tableau to the enterprise.
2. What makes up a Tableau published data source?
- Information about how to access or refresh the data: server name & credentials, Excel path, etc.
- The data connection information: table joins, friendly field names, etc.
- Customization and cleanup: calculations, sets, groups, bins, and parameters; custom field formatting; hidden unused fields; and so on.
3. Why use Tableau published data sources?
- Reusable: Published data sources are reusable connections to data. When you prep your data, add calculations, and make other changes to your fields, these changes are all captured in your data source. Then when you publish the data source, other people can use it to conduct their own analysis.
- Single source of truth (SSoT): A data steward can define the data model, while workbook publishers consume the published data source to create vizzes and analyses. Here is an example of how to set up permissions to achieve SSoT.
- Less workload on the EDW: When you use extracts, one refresh of the published data source refreshes the data for all of its connected workbooks, which greatly reduces the workload on your EDW. For example, if 20 workbooks share one published extract, a scheduled refresh hits the EDW once instead of once per workbook.
4. How many data sources are embedded vs. published? You can find out from the Data_Connections table. Look at the DBCLASS column: a value of ‘sqlproxy’ means the connection points to a published data source. Work with your server admin if you do not have access to the workgroup database in Tableau Server’s PostgreSQL repository.
If fewer than 20% of your data sources are published data sources, published data sources are not yet well leveraged in your org or BU.
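The check above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the SQL string uses the table and column names mentioned in this post and should be verified against your own Tableau Server version, the helper names `published_share` and `is_well_leveraged` are hypothetical, and the sample counts are made up. The functions take the query result as plain (dbclass, count) pairs, so they work with whatever PostgreSQL client you use.

```python
# Assumed repository query. Table and column names follow the text above;
# verify them against your Tableau Server version before running it.
DBCLASS_COUNTS_SQL = """
SELECT dbclass, COUNT(*) AS connections
FROM data_connections
GROUP BY dbclass;
"""

PUBLISHED_DBCLASS = "sqlproxy"  # per this post: marks a published data source
THRESHOLD = 0.20                # the 20% rule of thumb from this post


def published_share(rows):
    """Return the fraction of connections that use a published data source.

    rows: iterable of (dbclass, count) pairs, such as the query result above.
    """
    rows = list(rows)  # allow generators; we iterate twice
    total = sum(count for _, count in rows)
    published = sum(count for dbclass, count in rows
                    if dbclass == PUBLISHED_DBCLASS)
    return published / total if total else 0.0


def is_well_leveraged(rows):
    """True when the published share reaches the 20% rule of thumb."""
    return published_share(rows) >= THRESHOLD


if __name__ == "__main__":
    # Made-up counts for illustration only, not real server data.
    sample = [("sqlproxy", 3), ("sqlserver", 9), ("excel", 8)]
    print(f"published share: {published_share(sample):.0%}")
    print("well leveraged:", is_well_leveraged(sample))
```

Keeping the calculation separate from the database call makes it easy to rerun per site or per project if your admin can only hand you an exported result set.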
5. How to encourage people to use published data sources?
- Control who can access the EDW: Say you have a team of 10 Desktop users. You may want to give EDW access to only 2 of them, the data stewards, and have the other 8 use the published data sources the stewards create. That way you do not have to train all 10 people on table structure details.
- If extracts are used, you can give all published data sources a higher extract priority as an incentive for people to use them. See my previous blog for details.
- Make sure people know that the version control feature works for data sources as well.
- As a data steward, add comments to columns. Here is what a comment looks like when you mouse over a field in the Desktop Data pane:
Here is how to add comments: in Desktop, right-click the field in the Data pane and choose Default Properties > Comment.
Conclusions: Published data sources are not a new Tableau feature, but they are not widely used even though they are reusable, provide a SSoT, scale well, and reduce the workload on your DB server. Since 9.3, Tableau has been improving its publishing workflow, making data source publishing much easier than before. Tableau v10 even gives you a new option to choose whether to publish your data sources separately during the workbook publishing workflow. Data source revision history is a great feature for controlling data source versions. Tableau announced a big data governance roadmap at TC16. However, self-service practitioners do not have to wait for any new Tableau features in order to leverage published data sources.