
TABLEAU SERVER AND CLOUD SECURITY (10/10): Trade Secrets

My previous post TABLEAU SERVER AND CLOUD SECURITY (9/10): PII and DETECT AND DELETE PII DATA ON TABLEAU SERVER talked about PII detection and deletion scripts: why and how.

There is another type of concern: compartmented secret information on Tableau server getting out of hand, such as future products or trade secrets. How do you protect that type of data?

Here is what I did on my Tableau server to protect trade secrets:

Scripts compare Tableau permissions with a trade secrets reference database and send automatic email alerts to content owners if a discrepancy is found.

For example: if the trade secrets reference database says that only 3 people are disclosed for a dataset, while the Tableau workbook grants permissions to 2 additional users, an alert email will be sent out.

Let me explain a few key components involved:

1. Trade secrets reference database

This is the single source of truth about who is allowed to access what secret data. It lives outside Tableau; our Tableau scripts just read this dataset.

2. Linkage between Tableau workbook (or datasource) to the above Trade secrets reference database

Unfortunately there is no such linkage, as there is no way to scan the data to tell. This is why we ask content owners to tag the workbook, data source or project with a specific Trade Secrets code (like ABC-1234). So it is more of a self-declaration process.

3. How to find out who has permission to which dashboard?

We use the REST API and SQL against Tableau's lineage and permission tables to figure out who has effective permissions on the tagged Trade Secret content.

4. Compare permission differences and send out an alert (a minimal sketch follows below)
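Below is a minimal Python sketch of this comparison step, assuming the effective users per tagged content item (from step 3) and the disclosure list from the reference database have already been loaded. The helper names, SMTP host and data shapes are illustrative, not the actual production script.

```python
import smtplib
from email.message import EmailMessage

SMTP_HOST = "smtp.example.com"  # assumption: replace with your mail relay

def alert_owner(owner_email, content_name, extra_users):
    """Notify the content owner about users outside the disclosure list."""
    msg = EmailMessage()
    msg["Subject"] = f"Trade secret permission alert: {content_name}"
    msg["From"] = "tableau-admin@example.com"
    msg["To"] = owner_email
    msg.set_content(
        f"The following users have access to '{content_name}' but are not "
        f"in the trade secrets disclosure list: {', '.join(sorted(extra_users))}"
    )
    with smtplib.SMTP(SMTP_HOST) as smtp:
        smtp.send_message(msg)

def check_content(tagged_content, disclosures):
    """tagged_content: list of dicts like
         {"name": ..., "trade_secret_code": ..., "owner_email": ..., "effective_users": set(...)}
       disclosures: dict mapping trade_secret_code -> set of disclosed usernames."""
    for item in tagged_content:
        allowed = disclosures.get(item["trade_secret_code"], set())
        extra = item["effective_users"] - allowed
        if extra:
            alert_owner(item["owner_email"], item["name"], extra)
```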

This concludes this series of TABLEAU SERVER AND CLOUD SECURITY:

  1. TABLEAU SERVER AND CLOUD SECURITY (1/10): Overview
  2. TABLEAU SERVER AND CLOUD SECURITY (2/10): External Site
  3. TABLEAU SERVER AND CLOUD SECURITY (3/10): External Server
  4. TABLEAU SERVER AND CLOUD SECURITY (4/10): Extension
  5. TABLEAU SERVER AND CLOUD SECURITY (5/10): Explain Data & Data Story
  6. TABLEAU SERVER AND CLOUD SECURITY (6/10): Content Owner Left Company
  7. TABLEAU SERVER AND CLOUD SECURITY (7/10): All Users Group
  8. TABLEAU SERVER AND CLOUD SECURITY (8/10): Large Group
  9. TABLEAU SERVER AND CLOUD SECURITY (9/10): PII
  10. TABLEAU SERVER AND CLOUD SECURITY (10/10): Trade Secrets

TABLEAU SERVER AND CLOUD SECURITY (9/10): PII

Is Personally Identifiable Information (PII) data OK on Tableau server? Yes and no; it depends on your organization's policy:

  • Does your org have PII policy and classifications?
  • Do you allow PII data on your server or Cloud site? 
  • How to identify Tableau workbooks using PII?
  • How to govern PII on Tableau server?

In my organization, we classify PII into High Sensitive PII (like SSN, payment card #) and Sensitive PII (like driver's license, email, etc.). High Sensitive PII can't be on Tableau server, as this type of data either should not be stored at all or has to have field-level encryption.

Tableau server has Encryption at Rest as well as encryption in transit (HTTPS). However, Tableau server does not support field-level encryption, which is why High Sensitive PII data can't be on Tableau server.

How to ensure High Sensitive PII data not on Tableau server?

My answer again is 'after the fact' governance – scripts that detect any High Sensitive PII data and delete it if found.

Example of a PII deletion notification to the content owner

The high-level workflow has a few parts:

1. Detect PII

  • If the data source's database tables and columns have a PII classification, the best way is to read that classification, then use Tableau lineage data to find all Tableau content associated with PII.
  • If the data source has no PII classification at all, we use a PII Taxonomy: if a column name matches the PII Taxonomy, it is likely PII (see the sketch below).
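Here is a minimal sketch of that taxonomy-matching idea in Python, assuming the lineage data has already been pulled into a list of column records; the field names and regex patterns are illustrative only.

```python
import re

# Assumption: a simplified taxonomy of column-name patterns that suggest PII.
PII_TAXONOMY = {
    "ssn": re.compile(r"ssn|social_security", re.IGNORECASE),
    "payment_card": re.compile(r"credit_card|card_number|\bpan\b", re.IGNORECASE),
    "date_of_birth": re.compile(r"\bdob\b|birth_date|date_of_birth", re.IGNORECASE),
}

def find_likely_pii(columns):
    """columns: list of dicts like
         {"workbook": ..., "owner": ..., "column_name": ...}
       pulled from the Tableau lineage tables.
       Returns matches that should go to the owner for confirmation."""
    matches = []
    for col in columns:
        for pii_type, pattern in PII_TAXONOMY.items():
            if pattern.search(col["column_name"]):
                matches.append({**col, "pii_type": pii_type})
                break  # one match per column is enough to flag it
    return matches
```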

2. Owner confirmation

For a PII Taxonomy match, since it is an educated guess, we ask for content owner confirmation before content deletion – if the content owner confirms it as High Sensitive PII, the scripts will delete the marked content.

3. Content Deletion

The workbook or data source deletion happens only if the content owner confirms that the detected content is High Sensitive PII.

4. PII Tag, No download control etc

If it is not High Sensitive PII, content is Ok to be published on Tableau server.

  • Optionally, we have additional logic to remove any download permissions so that no server users can download the data (except the owner, since there is no way to remove the content owner's download capability – and there is no need to control the content owner anyway).
  • We also tag workbooks with PII. However Tableau server tags are loosely controlled as tags have no permission controls.
  • For a period of time, we also leveraged the out-of-the-box sensitive data high alert to show a pop-up for all PII content. However, user feedback was that it added too many extra clicks, so the feature was removed later on.

A few technical implementation details or tips:

  • We use Tableau Prep flows for most of the detection and deletion logic; see details below.
  • Run 'tsm maintenance metadata-services enable' – the catalog (lineage) data then becomes available to the readonly Postgres user without the Data Management Add-on.

  • Be aware: Tableau lineage does not capture 100% of custom SQL lineage.

Conclusion: PII detection and deletion is possible on Tableau server as advanced server security governance. So far we have deleted more than one hundred workbooks from the server.

TABLEAU SERVER AND CLOUD SECURITY (8/10): Large Group

My previous post TABLEAU SERVER AND CLOUD SECURITY (7/10): ALL USERS GROUP talks about the detection-and-deletion watchdog scripts that remove any permissions using the All Users group. The scripts run hourly, remove the permissions and then send an email alert to content owners. It is an enforced Tableau server policy, although it does have a few exceptions (some server-admin-owned dashboards are exempted).

The scripts helped clean up a lot of permission mistakes. However, from time to time the server admin team still got emails from the user community:

User feedback: "Why do I see this dashboard? I have nothing to do with it…"

This is not about the All Users group anymore but about workbook permissions granted to a very large group.

Now the issue is more about user education and departmental access management policy. However, I wanted to do something about it as a server admin. What we came up with is a Large Group Permission Management alert – another watchdog program that sends an automatic email alert to content owners if they used a group with more than 1,000 members (or whatever threshold makes sense for your org):

The logic is very similar to the All Users group detection. The key difference is that an All Users group permission is removed once detected, while the large group alert is a reminder only and does not remove the large group from the permissions at all.
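A minimal sketch of the large-group reminder in Python, using the tableauserverclient (TSC) library for the group inventory; the threshold, the permission-scanning helper and the reminder helper are placeholders for whatever your own watchdog uses.

```python
import tableauserverclient as TSC

LARGE_GROUP_THRESHOLD = 1000  # tune to whatever makes sense for your org

def find_large_groups(server):
    """Return {group_id: group_name} for groups above the member threshold."""
    large = {}
    for group in TSC.Pager(server.groups):
        server.groups.populate_users(group)
        if len(list(group.users)) > LARGE_GROUP_THRESHOLD:
            large[group.id] = group.name
    return large

def remind_owners(server, large_groups, find_rules_for_group, send_reminder):
    """find_rules_for_group / send_reminder are hypothetical helpers:
       the first returns (content_item, owner_email) pairs whose permissions
       grant access to the group; the second emails the owner a reminder only."""
    for group_id, group_name in large_groups.items():
        for content, owner_email in find_rules_for_group(server, group_id):
            send_reminder(owner_email, content.name, group_name)
```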

Re-cap:

TABLEAU SERVER AND CLOUD SECURITY (7/10): All Users group

Tableau Cloud and Tableau Server have one built-in user group: All Users. This group is available on every site.

All Users can be very convenient for content that should be shared with every Cloud or server user.

However, in a large organization it likely ends up creating a lot of excess permissions. I used to receive emails asking why someone had access to a dashboard they had nothing to do with. The user can't tell whether a dashboard is shared to All Users; only the content owner or project leaders can tell. This is when I realized that the All Users group causes more problems than benefits for a large org.

Something I found: when a publisher was not sure which group to use, he or she just used the All Users group, causing excess permissions.

Over the last few years, I tried many different ways to get rid of this All Users group. Unfortunately I had to give up, as I found that Tableau uses the built-in All Users group for its own permission processes.

Another thing I do not like about this group is that it always shows at the top of the group list during the permission process. I tried renaming the All Users group to something like 'ZZ All Users (do not use)' with the intent of pushing it to the bottom of the list, but that failed as well.

Then I had to come up with what I call the 'after the fact' governance approach:

Detect and delete All Users group used in any permission.

The Python scripts delete any permission that uses the All Users group, and then send an email alert to the content owner. The scripts are scheduled to run hourly to minimize potential excess permission issues. Below is how the alert looks.

What content do we check? The scripts check all types of content: workbook, data source, flow, project, virtual connection, metric, data role, lens, etc.

Here is how it works:
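The sketch below outlines the core of that hourly watchdog for workbooks only, using the tableauserverclient (TSC) library. The group lookup and permission population are standard TSC calls; the exact name of the permission-deletion method may differ between TSC versions, and the owner-notification helper is a placeholder, so treat this as an outline rather than the production script.

```python
import tableauserverclient as TSC

def find_all_users_group(server):
    """Look up the built-in All Users group by name."""
    opts = TSC.RequestOptions()
    opts.filter.add(TSC.Filter(TSC.RequestOptions.Field.Name,
                               TSC.RequestOptions.Operator.Equals,
                               "All Users"))
    groups, _ = server.groups.get(opts)
    return groups[0]

def remove_all_users_permissions(server, notify_owner):
    """notify_owner is a hypothetical helper that emails the content owner."""
    all_users = find_all_users_group(server)
    for workbook in TSC.Pager(server.workbooks):
        server.workbooks.populate_permissions(workbook)
        for rule in workbook.permissions:
            if rule.grantee.tag_name == "group" and rule.grantee.id == all_users.id:
                # Method name per current TSC docs; verify against your TSC version.
                server.workbooks.delete_permission(workbook, rule)
                notify_owner(workbook.owner_id, workbook.name)
```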

How do you allow exceptions to the above process?

You can add exceptions for certain projects or content owners if you do have use cases where content needs to be shared with everyone on the server or site.

Conclusion: This All Users group permission detection and deletion is a good safety net for governing Tableau server security. These scripts have helped me out in many security review processes.

TABLEAU SERVER AND CLOUD SECURITY (6/10): Content Owner Left Company

What should be done when Tableau Server or Cloud users leave the company?

The right process is to unlicense AND then delete the users from Tableau Server or Tableau Cloud when they leave the company.

  • Admins can unlicense content owners, but admins can't delete users who own content, even after those users have left the company.
  • You will have to change the content owner first before deleting the user.

Who can change content owner?

  1. Owner. The best practice is for the owner to change the content owner to someone else on the team.
  2. Project Admin (aka project leader) with a publisher site role. If the content owner has left the company, the project admin can also change the owner to someone else (a two-step process: change it to yourself, then change it to someone else).
  3. Project Owner (the same two-step process: change it to yourself, then change it to someone else).
  4. Admins

Content owner change tips

  • An Explorer site role user can be given content ownership but can't grant permissions to others and can't change the ownership to anyone else.
  • Once the content owner is changed, the embedded password becomes invalid. The new content owner has to re-embed the password.
  • A project admin (aka Project Leader) or project owner can't change another user's content directly to a third person, although they can change someone else's content to themselves. It is a two-step process for a project admin to change someone's content owner (see the sketch below):
    1. Change the owner to the project leader himself or herself
    2. Then, as the content owner, change the content ownership to someone else
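For reference, each step of the ownership change is a one-line update through the REST API. A minimal tableauserverclient (TSC) sketch of the two-step reassignment, with user IDs assumed to be looked up already, might look like this.

```python
import tableauserverclient as TSC

def reassign_workbook(server, workbook_id, project_leader_id, new_owner_id):
    """Two-step ownership change: first to the project leader, then to the final owner."""
    workbook = server.workbooks.get_by_id(workbook_id)

    # Step 1: the project leader takes ownership
    workbook.owner_id = project_leader_id
    workbook = server.workbooks.update(workbook)

    # Step 2: as the (new) owner, hand the content to the final owner
    workbook.owner_id = new_owner_id
    server.workbooks.update(workbook)
    # Remember: any embedded credentials are invalidated and must be re-embedded.
```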

What happens when the content owner has left the company?

  • If content ownership is not changed before the owner leaves the company, the workbook can still be accessed and extracts can still run (if the embedded credentials are not tied to a personal database account)
  • But Tableau Server or Cloud email and Slack notifications (extract failure/suspension, flow failure/suspension) go nowhere…
  • No action can be taken by the owner anymore

So the content owner should be changed. However…

The problem: often project admins may not be aware of all the content owned by the person who left the company.

The scalable solution is to build an Invalid Content Owner alert.

What is the Invalid Content Owner alert?
It is a Python script that uses the REST API to find content owned by users who are no longer active employees. The content list alert is sent to the project admin (aka project leader) and the project owner. It can also be sent to the previous manager if possible (see the alert below):

The alert can also include a link to the content to make the project admin's life easier. The action is for the project admin (aka project leader) to click the link in the alert, change the content owner to himself or herself, and then change it to someone else if necessary.

Here is the logic used for the alert:
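A minimal sketch of that logic with the tableauserverclient (TSC) library: enumerate workbooks, map each owner ID to a username, and flag anything whose owner is not in the active-employee list. The sign-in details, the active-employee feed and the alert helper are assumptions for illustration.

```python
import tableauserverclient as TSC

def find_invalid_owner_content(server, active_usernames, send_alert):
    """active_usernames: set of usernames still employed (from your HR feed).
       send_alert: hypothetical helper that emails project leaders/owners."""
    users_by_id = {user.id: user.name for user in TSC.Pager(server.users)}

    for workbook in TSC.Pager(server.workbooks):
        owner_name = users_by_id.get(workbook.owner_id)
        if owner_name and owner_name not in active_usernames:
            send_alert(
                project_id=workbook.project_id,
                content_name=workbook.name,
                content_url=workbook.webpage_url,
                former_owner=owner_name,
            )

# Usage sketch (credentials and site are placeholders):
# auth = TSC.PersonalAccessTokenAuth("token-name", "token-secret", site_id="mysite")
# server = TSC.Server("https://tableau.example.com", use_server_version=True)
# with server.auth.sign_in(auth):
#     find_invalid_owner_content(server, active_usernames, send_alert)
```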

Conclusion: We found it extremely useful to send the invalid owner alert on a daily basis for a large Tableau deployment, since project leaders don't have visibility into this data. You do need the Tableau server workgroup database readonly user/password, the REST API, Python and a scheduling tool to implement it.

TABLEAU SERVER AND CLOUD SECURITY (5/10): Explain Data & Data Story

My last post talked about Dashboard Extension security; in short, Sandboxed extensions are safe to use while Network extensions are not. This blog focuses on Explain Data and Data Story security concerns.

Is Explain Data safe to turn on for the server or site?

Yes

Explain Data concerns: it may expose data that is in the data sources used but not shown in the dashboards. However, Explain Data does not send data outside your Tableau server, which is why it is safe for admins to turn on.

On the other side, Explain Data can be controlled at each workbook level (on or off, with ON as the default). Even if Explain Data is turned on at the server and site level, the workbook owner has the option to turn Explain Data off for a specific workbook.

One nice thing about Explain Data is that it is available for all workbooks on the server/site, including workbooks created before 2021.2, even though Explain Data was only released as part of 2021.2.

Next, let’s look at Data Story.

Is Data Story safe to turn on for the server or site?

Yes

What is Data Story's risk from a data and security perspective? The only risk is that hidden worksheet data can be used in a Data Story.

The good thing about Data Story is that it does not send data outside your Tableau server, as Data Stories don't use generative AI, large language models (LLMs), or machine learning to write insights and stories.

Summary: Both Explain Data and Data Story are safe to use. Default is ON for both features and the default is good.

TABLEAU SERVER AND CLOUD SECURITY (4/10): Extension

At high level, there are two types of Tableau extensions:

  1. Dashboard Extensions: do things you wish Tableau did easily but does not; these features are developed by 3rd parties:
    • Sandboxed: Tableau-hosted, runs in a protected environment without access to any other resources or services on the web (safe to use)
    • Network: anyone can host; dashboard data has to be sent to the hosting server (not safe to use)
    • Data Story (the implementation is the same as an extension)
  2. Analytics Extension
    • TabPy
    • RServer
    • Einstein Discovery (not very useful yet)
    • Analytics Extension API

From a data security perspective, make sure you are fully aware of the following:

I do recommend making all Sandboxed extensions available on your server and site, as they are safe to use.

Why are Sandboxed extensions safe? By design, a Sandboxed extension never sends data out of your Cloud site or server.

Are Sandboxed extensions free? Pretty much all of them are.

This is my recommendation for Extension Config for your site.

Since extensions are also available in Desktop, the default Tableau Desktop setting is very good: it does not turn off Network extensions completely but does give a pop-up warning before any Network extension can be used.

TABLEAU SERVER AND CLOUD SECURITY (3/10): External Server

My previous post shared a recommended setup that segments all external users into one Limited Visibility site, which is a great balance between security and ongoing maintenance.

For organizations that do not allow mixing internal and external users on one Tableau server at all, there is the option to set up a dedicated External Server:

Two separated Tableau servers: Internal and External

You can have both External Site and External Server solutions if your org has different type of external users.

My setup has the External Server sitting outside the company firewall in the DMZ. For additional security, we did not even open any network connectivity from the External Server to any internal database. The External Server is more like an island: extracts and workbooks can only be pushed to the External Server via API on behalf of publishers.

Here is how it works:

  1. The workbook is published to a specific project on your internal Tableau server
  2. Extract refreshes happen on your internal Tableau server only
  3. The updated workbook and/or extracts are published to the External Server via API only (see the sketch after this list)
  4. There is no Creator or Explorer (can publish) site role on the External Server site
  5. There are no extract refresh schedules on the External Server either
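A minimal sketch of that API push using the tableauserverclient (TSC) library, assuming both servers are already signed in and the target project on the External Server is known; parameter names may vary slightly by TSC version.

```python
import tableauserverclient as TSC

def push_workbook(internal, external, workbook_id, external_project_id, tmp_dir="/tmp"):
    """Download a refreshed workbook (with extract) from the internal server
       and republish it to the external server."""
    # Download the packaged workbook, including the freshly refreshed extract
    file_path = internal.workbooks.download(workbook_id,
                                            filepath=tmp_dir,
                                            include_extract=True)

    # Republish to the external server, overwriting the previous copy
    new_item = TSC.WorkbookItem(project_id=external_project_id)
    external.workbooks.publish(new_item, file_path,
                               mode=TSC.Server.PublishMode.Overwrite)
```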

Notes:

  • This setup has maximum security, and it comes with ongoing extract work for content owners.
  • Since the API is not good enough for users/groups and permissions, there is some admin work to set permissions correctly on the External Server.

TABLEAU SERVER AND CLOUD SECURITY (2/10): External Site

It is not uncommon to share your Tableau dashboards with your vendor or partner users – vendor performance KPI data, for example. To avoid surprises, it is better for vendors to know exactly how your company evaluates their specific business process metrics, and Tableau can be a perfect tool for it. All it takes is granting the external user permissions to your Tableau server workbooks.

There are many more data security questions when your Tableau platform has external users. Do you need a peer review process when new data is shared with external users? How do you prevent vendor A from seeing vendor B's data? How do you avoid silly mistakes that share internal data with external users? Some of those are business process controls. The big question we are trying to answer here is: HOW TO SEGMENT INTERNAL VS EXTERNAL AT THE PLATFORM LEVEL?

This setup is what I have in production:

  • One External site for all vendors.
  • All external users can only be provisioned to this external site.
  • The site-specific configuration User Visibility is set to Limited.

Key benefits are :

  1. Avoids the mistake of sharing internal data externally, since external users are NOT provisioned anywhere other than the External site, which has only a limited set of publishers.
  2. User Visibility = Limited prevents vendor A users from seeing vendor B user names. This is a great Tableau feature, and it automatically disables all of the following for Explorers and Viewers:
    • Sharing
    • Who has seen this view?
    • Ask Data usage analytics
    • Data-Driven Alerts
    • Comments
    • Public Custom Views
    • Request Access
  3. Avoids a lot of potential ongoing maintenance compared with the one-site-per-vendor approach.
  4. This setup works for both Tableau Server and Tableau Cloud.

Check out next blog for alternative solution if your org can’t have mixed internal and external users on one server at all.

Tableau Server and Cloud Security (1/10): Overview

This series is about Tableau Server and Tableau Cloud security. Tableau has a Platform Security white paper that covers Authentication, Authorization, Data Security and Network Security. It is good documentation; however, I find it hard to explain those security components to non-tech audiences. Instead, I created the following security model and found that regular audiences got it very easily:

Let me double-click on each of the areas; it looks something like the below:

  1. Infrastructure: covers network, SSO, InfoSec, server OS, etc.
  2. Tableau App Configuration: This is the application level – in other words, the Tableau server, site or Cloud level. Some of these things are configurable. The rest of the blogs will talk about each of those areas with the intent of maximizing security.
    • External user site/server
    • Site Segmentation
    • User Visibilities
    • User provisioning
    • Encryption
    • Extension
    • Explain Data
    • Sensitive Lineage Data
    • ConnectedApp
    • Mobile Security
    • Token
    • Guest Account
  3. Tableau Governance layer: A possible thin layer of governance processes and/or scripts to further enhance Tableau server security. These are more advanced and need the Tableau server Postgres readonly user. I am not sure how to apply them to Cloud yet.
    • User deletion
    • Project setup
    • ‘All User’ permissions
    • Delete inactive content
    • Re-subscription 
    • PII data deletion or flag
    • Sensitive data protection
  4. Publish and Permission: These are the content owner's responsibilities. No matter how well the Tableau server or Cloud is configured, content owners can still mess up data and content security. Business self-service content owners have to follow departmental data access guidance and grant access permissions accordingly. These are covered in my other blogs and I do not plan to explain more here:
    • Workbook Permission
    • Project Permission Locking
    • Row Level Security
    • Sensitive Data Tagging
    • PII

Check out next blog TABLEAU SERVER AND CLOUD SECURITY (2/10): EXTERNAL SITE

What is view acceleration?

Updated Nov 2023: One year after the initial View Acceleration release, Tableau released the new View Acceleration Recommendation feature, which makes View Acceleration much more user friendly.

What happened is that there are too many scenarios where View Acceleration is not supported (like live connections, views with user filters or row-level security, or views that render in a few seconds or less). In the past, often only 2-3 out of the 10 views I selected could actually be accelerated, which discouraged me from trying it more.

This new feature tells me which views can be accelerated, which is very user friendly.

Tableau v2022.1 released a feature called View Acceleration. What is it? It is essentially a pre-computed query cache after extract refresh (for the data queries that the view uses).

Pre-computing the query cache, aka cache warm-up, has been a feature since 10.4. What are the differences between the new View Acceleration and the pre-computed query cache? Here is the summary:

  • Watch my presentation about server cache configuration and recommendations @ https://www.youtube.com/watch?v=u-Ms_YwRJm0
  • Download slide used for the presentation below

Govern classified data on Tableau server

My previous blog talked about how to detect and delete PII data on Tableau server. In addition to Personally Identifiable Information (PII) data (like SSN, DoB, payment card, etc.), some organizations may have other types of classified data like R&D data, attorney-client privileged information, Controlled Unclassified Information, export-controlled information, student loan application information, etc.

Organizations likely have existing policies to govern such classified data. The fundamental of the governance process is to control who can access what data. The situation is that collaboration software like Dropbox, Box, Quip, Slack and Tableau has created a lot of new challenges for existing data protection / data governance processes. Those collaboration tools make it just too easy for one user to share data with another.

For example, John's Tableau workbook, with classified information, is shared to a server group. John reviewed the group members and confirmed all members had disclosure to access the classified information – all good when John grants the permission. The problem happens later on: many Tableau server deployments sync Tableau groups from Active Directory (AD). The AD group may get more members without John's knowledge at all. The new members may not have disclosure, so John's classified information is now out of control…

How to resolve this Tableau permission ‘cascading’ issue?

The above process will send a notification to the content owner (the Project Leader can be added as well) automatically when the following conditions are met:

  • a user has access to a Tableau classified object
  • and this user is NOT in the disclosure list

Four steps to implement this process:

  1. Create the organization's classified data disclosure repository: who is disclosed to what classified data
  2. Content owners tag data sources or workbooks on Tableau server: the tag is the classification code
  3. Enable the Tableau lineage tables (if not done yet): run 'tsm maintenance metadata-services enable'. The Tableau lineage tables will be populated with lineage data without the Data Management Add-on. The tables can be accessed by Postgres 'Readonly' users, although they are not available to any Tableau server users without the Data Management Add-on.
  4. Create a workbook to compare Tableau permissions with the classified data disclosure repository: identify discrepancies and take action (alert or deletion)

RE-CAP: Protecting classified data on Tableau server needs both the content owner's and the Tableau server platform team's involvement. It provides a peace-of-mind solution for those who are committed to data security and data protection.

It is still strongly encouraged to use the following design patterns when dealing with classified or sensitive datasets:

  • Row-level security design – you still need to ensure the right group is used and group membership is controlled if ISMEMBEROF() is used
  • Live connection only (no extract) and use Prompt User when publishing the workbook – control data access in the data source outside Tableau server

Detect and delete PII data on Tableau server

Tableau's agile and self-service nature does come with some data management concerns. How do you make sure data is not getting out of hand? How do you ensure the Tableau process does not break the existing Personally Identifiable Information (PII) control process?

In the old days, dashboards were only developed by a small group of developers, so the education/control process was much easier. With Tableau self-service, source data is accessed by hundreds of business analysts who can develop dashboards, publish to the server and decide the access policy (permissions) by themselves. Business analysts love the flexibility, but it can be a data privacy or data security nightmare if Tableau data security is not managed closely.

Detect and delete Personal Identifiable Information (PII) on Tableau server

Tableau's Encryption at Rest feature encrypts extracts sitting in the FileStore but does not cover data in memory or in network transit. Most organizations have policies that require certain types of PII data (like SSN, payment cards, DoB) to be encrypted at rest and in transit. Unless you use a special AWS configuration, your on-premises Tableau server will not meet these requirements. In other words, regular Tableau servers can't have those types of PII data. The question is: what if some publishers bring such PII data to Tableau server? Is there a way for the Tableau platform to detect such data and even delete it?

The answer is Yes. Here is one example of PII detection/deletion notification. It can be fully automated.

The how-to involves 4 steps.


  1. PII Reference Repository, or PII Taxonomy as a minimum: Most likely this comes from the organization's privacy team. It is the PII definition: a list of database names, schema names, table names and column names that contain which kind of PII. This is the starting point.
  2. Enable the Tableau lineage tables (if not done yet) – run 'tsm maintenance metadata-services enable'. The Tableau lineage tables will be populated with lineage data without the Data Management Add-on. The tables can be accessed by Postgres 'Readonly' users, although they are not available to any Tableau server users without the Data Management Add-on.
  3. Create a workbook to identify all connected workbooks using any columns in the PII Reference Repository or PII Taxonomy list.
  4. Act on the identified workbooks. Depending on your org policy, you can send an alert to content owners, remove the content directly, or both (see the sketch after this list).
    • If your org only has a PII Taxonomy, you don't really know for sure that it is PII data; alerting content owners and asking them to confirm is the recommended process
    • If your org has a clearly defined PII Reference Repository with database name, schema name, table name, column name and PII classification, you can delete certain workbooks directly and then send an email to the data owner as an enforced governance approach
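A minimal sketch of the action step using the tableauserverclient (TSC) library: delete the workbook when the policy (or the owner's confirmation) says enforce, otherwise just notify the owner. The notification helper and the policy flag are placeholders.

```python
import tableauserverclient as TSC

def act_on_pii_workbook(server, workbook_id, enforce_delete, notify_owner):
    """enforce_delete: True when the PII classification (or owner confirmation)
       allows direct deletion; notify_owner is a hypothetical email helper."""
    workbook = server.workbooks.get_by_id(workbook_id)

    if enforce_delete:
        server.workbooks.delete(workbook.id)          # removes the workbook from the server
        notify_owner(workbook.owner_id, workbook.name,
                     action="deleted (High Sensitive PII)")
    else:
        notify_owner(workbook.owner_id, workbook.name,
                     action="please confirm whether this is High Sensitive PII")
```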

RE-CAP: It is important to implement a PII detection and deletion process while ramping up Tableau self-service to hundreds or thousands of citizen publishers, to ensure PII data is not getting out of hand.

The process mainly involves a PII Reference Repository or PII Taxonomy (built together with your data privacy team) and Tableau's lineage tables, which can be enabled.

How does backgrounder.timeout.single_subscription_notify work?

Tableau released the new backgrounder.timeout.single_subscription_notify config key as part of v2021.2. It works far better than the existing subscriptions.timeout. However, Tableau's documentation is not clear and not accurate; I had to talk with Tableau Dev to get things clarified. Here is how it works.

What is the difference?

Other Configurations Impacting Subscription Timeout

Two more configs also impact subscription timeout. Here is how they all work together:

Min(backgrounder.querylimit, subscriptions.timeout) + backgrounder.extra_timeout_in_seconds is the total timeout per view of the subscription job. It never applies to PDF subscriptions.

or

Min(backgrounder.querylimit, backgrounder.timeout.single_subscription_notify) + backgrounder.extra_timeout_in_seconds is the total timeout per subscription job – no matter how many views (per user). It works for PDF subscriptions as well.
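As a quick worked example with purely hypothetical values (not the server defaults), the two formulas play out like this:

```python
# Hypothetical values in seconds, purely for illustration
backgrounder_querylimit = 7200
subscriptions_timeout = 1800
single_subscription_notify = 5400
extra_timeout_in_seconds = 0   # the re-cap below recommends setting this to 0

# Old behavior: a per-view budget
per_view = min(backgrounder_querylimit, subscriptions_timeout) + extra_timeout_in_seconds
# -> 1800s for EACH view of the subscribed workbook (never applies to PDF subscriptions)

# New behavior: a per-subscription-job budget
per_job = min(backgrounder_querylimit, single_subscription_notify) + extra_timeout_in_seconds
# -> 5400s for the WHOLE subscription job, regardless of view count (PDF included)
```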


Recommended settings

Re-cap:

  1. backgrounder.timeout.single_subscription_notify is a per-subscription timeout (all views of the workbook), while the old config subscriptions.timeout is a per-view timeout for workbook subscriptions.
  2. backgrounder.timeout.single_subscription_notify works for PDF subscriptions as well, while subscriptions.timeout never works for PDF subscriptions.
  3. Two other configs impact subscription timeout as well. Please see the above chart for the recommended settings – note that backgrounder.extra_timeout_in_seconds is strongly recommended to be set to 0.

Upgrade Tableau Server

We often hear that enterprise Tableau customers have all kinds of challenges upgrading their Tableau servers. This webinar invited super experienced Tableau server admins to talk about our own experiences upgrading very large Tableau servers.

  • Recording available here
  • Download slide used here

For those who use Tableau Online, you do not have to worry about server upgrades. But for those who have on-premises servers, the server has to be upgraded to leverage Tableau's new features.

1. How often to upgrade Tableau server?

We did a quick poll during the webinar with 50+ responses. The data shows 50% of Tableau servers are upgraded once a year, 30% twice a year and 20% three or more times a year.

We have two super large Tableau servers. One gets upgraded 2 times a year and the other 3-4 times a year.

We strongly recommend that most Tableau server admins upgrade their servers 2+ times a year, since server admins sit between Tableau's new features and their users' adoption of those features.

2. What to do pre-upgrade

  • Strategic Upgrade Approach
    • Planning – minimum disruption to users
    • Upgrade test:
      • Test in Prod Like env and Data
      • Run all extracts once to compare failure rate and avg execution time
      • Make publishers accountable for their own workbook tests/validation
      • No need to over test
  • Platform Health Check
    • Server State
    • Memory related issues
    • Are services crashing
    • Do we have adequate storage
    • Can natives/firewall cause an issue

Tips: Very often Tableau server upgrades have issues due to things outside the Tableau app – meaning platform-level issues. Upgrading more often actually helps to resolve them.

3. What to do During Upgrade

4. What if Upgrade Failed

Re-Cap:

  • Upgrade Tableau server twice a year
  • Choose the latest version when upgrading
  • Pre-upgrade tasks include system health, OS health and testing
  • Test the new version in a prod-like server environment with a prod-like dataset
  • Know what to monitor and how to monitor during the upgrade
  • Ensure a good backup before the upgrade and be prepared to reinstall the server from scratch and restore from backup as plan B


PROJECT (4/10) : Sub-project or nested project

Tableau's nested project feature has been there since v10.5; however, I only started to see more people use it after v2020.1's new lock permission feature, which allows permissions to be locked independently at any nested project. This blog focuses on nested project features, and the next blog will talk about how locking works with nested projects.

This blog extends the previous blog Project (3/10): Project Leader/owner and their site role to nested projects.

Who can create nested project?

  • Only admins can create top-level projects (I am going to write a new blog to show you how to break this rule and allow all publishers to self-serve new top-level project creation). Of course, admins can also create nested projects.
  • In addition to the admins, the main reason to have nested projects is to provide more self-service, so each project owner or project leader can create nested projects. Of course, a project leader/owner can only create nested projects within the project he/she is owner/leader of.
  • Whoever creates the project becomes the owner of the project – this rule still holds when creating a nested project.

Example of how nested projects work

Let's say a top-level project Finance has one owner (John) and two leaders (Sherry and Mark). All of them can create as many nested projects within Finance as they want. Sherry created nested project Finance – GL, and Mark created nested project Finance – AP. Who is the owner/leader for Finance – GL and Finance – AP?

  • Finance – GL: Owner is Sherry. Inherited project leaders are John and Mark
  • Finance – AP: Owner is Mark. Inherited project leaders are John and Sherry

How do the nested project permissions work if the project is moved to another top-level project?

It is a little bit hard to understand how this works, but I think it is well designed to avoid more confusion. Here is how it works, still using the above example.

For Finance – GL: the owner is Sherry, and the inherited project leaders are John and Mark. Let's say Ivy was also added as a project leader since Ivy is responsible for the GL area. Now this Finance – GL nested project is moved from Finance to the General Ledger top-level project.

Finance – GL under General Ledger:

  • The owner of Finance – GL remains unchanged: still Sherry.
  • Project leader Ivy is still a project leader (no change).
  • However, the initially inherited project leaders (John and Mark) are no longer project leaders for Finance – GL.
  • The owner and project leaders of General Ledger now become the new inherited project leaders for Finance – GL.
  • All existing content permissions within Finance – GL remain unchanged, whether Finance – GL is customizable or locked, unless General Ledger is locked with the flag to apply the locking to all of its sub-project permissions.

What is the setup needed to publish to a nested project?

Re-Cap:

  1. Use nested projects to organize your content
  2. Project owners and leaders can create nested projects
  3. There is no limit on how many nested projects one project can contain
  4. There is no limit on how many nesting levels you can have, although more than 3 levels can be hard to manage
  5. Be aware of the additional setup to publish to a nested project – the publisher has to have 'View' permission on all of its parent-level projects

Tableau Row Level Security

Data security has been the top concern for Tableau deployments. This is a summary of the Row Level Security webinar by Zen Master Mark Wu: What is Tableau Row Level Security (RLS)? What are the implementation options? How do you decide which option to use? How do you test RLS? How do you improve RLS workbook performance?

  • Download presentation slides here
  • Watch Webinar recording here

Tableau handles data security by permission and row level security.

  • Permission controls what workbooks/views that users can access and can do.
  • Row level security controls what data sets the users can see. For example APAC users see APAC sales, EMEA users see EMEA sales only while both APAC and EMEA users have the same permission to the same workbook.

There are mainly 3 options to implement row level security. Sometimes all of them are used in the same workbook/data source:

  1. ISMEMBEROF(): The most popular option – use server groups to control row level security
  2. USERNAME(): Use a separate entitlement table to control access. Using a multiple-table extract will make performance a lot better in most cases
  3. CONTAINS(): Concatenate all allowed usernames into one comma-delimited field in your data.

ISMEMBEROF implementation steps:

  • Create or Sync server groups
  • Create calculated field : ISMEMBEROF(‘Group-AMER’) AND [Order_Region] = ‘AMER’ ….
  • Add the calculated field as a data source filter (strongly recommended) or workbook filter and select 'True'
  • Publish the data source but do not give end users 'Connect' permission to the published data source at all. Only give 'View' and 'Connect' to the content creators on your team
  • Publish the workbook and embed the password for Authentication (embedding the password is absolutely necessary)
  • Set workbook permissions for all the groups used
    • Make sure Web Editing is No and Download is No if a workbook filter is used

USERNAME() implementation steps:

  • Create a calculated field: USERNAME() = [Username]  (note: [Username] is the server logon username, not the display name. Ask your server admin to confirm what the username is on your server. Some implementations use email address, others may use employee_id, etc.)
  • Add the calculated field as a data source filter (recommended) or workbook filter and select 'True'
  • You can create extracts after joining
    • Join Option 1: Cross-DB traditional left join if you can
    • Join Option 2: Relationship Join
    • Join Option 3: Blend (be aware of limitations)
  • Publish the data source but do not give end users 'Connect' permission at all
  • Publish the workbook and embed the password for Authentication
    • Set workbook permissions for all the groups above
    • Make sure Web Editing is No and Download is No if a workbook filter is used
  • Please use a multiple-table extract for better performance!

Multiple Table extracts will have better performance for most use cases.

CONTAINS([User strings], USERNAME())

  • Concatenate all allowed usernames into one comma-delimited field in your data, and then use CONTAINS([User Strings], USERNAME())
  • Although it is a string comparison, Tableau somehow handles string comparisons much faster than most databases.

Which option to go with? The driving factors are performance, data preparation effort and ongoing maintenance of the data/entitlement table. Often the options are combined in the same data source or workbook.

Recap:

  1. Get https://github.com/tableau/community-tableau-server-insights and make it available to all your publishers using Row Level Security
  2. Teach your publishers to use Row Level Security design
  3. Use multi-table extracts for better performance

How to create a governed self-service model

I manage a large Tableau platform with a few hundred cores. In the last few years, while we scaled the server, we also put together some good governance processes to ensure scalability. When you ramp up a server to 20K, 50K or 100K users, there are a lot of things to think about. This is where some companies' self-service BI went wrong. People think that it is self-service and start to throw everything on the server. Soon they realize that it does not work, then they blame Tableau and start to look for other technology. There is nothing wrong with looking for new technology, but some practitioners may not know that technology alone does not provide more self-service; it is a combination of tech, data and governance. In short, both business and IT have to win together. IT has to find a way to throttle things or control the server. The answer is governance. The objective is governed self-service.

To show an example of why governance is necessary, let me start with one common Tableau server challenge: extract or subscription delays at peak hours.

Although there are many reasons for the delay, the most common one is wasteful extracts on the server – for example, an hourly extract while the workbook has not been used for a couple of days, or a daily extract while the workbook has not been used for 2-3 weeks. The reason this is so common is that Tableau gives too much freedom to publishers, who can choose any schedule available.

We used to have a huge extract delay problem a few years ago, but I did one thing that completely changed the picture – governance, specifically a Usage Based Extract Schedule in this case.

You can find details @ Automation (1/10) – Set Usage Based Extract Schedule for how it was done, and you can even download the workbook used. The impact is that our server extracts have only a few minutes of delay at peak hours, even with 10X extract volume as part of usage growth. This blog focuses on how to get business buy-in for this policy.

How to get buy-in from business teams?

The answer is Governance.

How to govern the enterprise self-service analytics? Who makes the decisions for the process and policies? Who enforces the decisions?

In the traditional model, governance is done centrally by IT, since IT handles all the data access, ETL and dashboard development activities. In the new self-service model, a lot of business super users are involved in data access, data preparation and development activities. The traditional top-down governance model will not work anymore; however, no governance creates chaos. How to govern? My answer is to create a new bottom-up governance body that consists of cross-functional Tableau experts who sit together on a weekly basis to define processes and best practices. Then we go out to the Desktop community to enforce the rules.

Some people may call this a CoE. It is a forum or bridge where IT and business key stakeholders get together to make quick decisions about Tableau and around Tableau. This is my secret sauce for success.

For self-service analytics to scale, governance needs to be collaborative. Both IT and business stakeholders on the project team are responsible for defining data and content governance. I implemented the Tableau Governance Council for two large enterprises. The Governance Council membership is, most of the time, two IT people plus about ten business people.

Who should the governance members be?

Governance Council members do not have to have big titles, but they have to be Tableau Jedi and the 'go to' person when a business director has questions about Tableau in their org.

The Governance Council has to be authorized to make all Tableau related technical decisions to speed things up.

How often should the Governance Council meet? It depends on your situation, but I have been doing it weekly.

The goals of a self-service analytics platform are to be fast, cost effective, self-service, easy to use, compliant with all company security policies and yet scalable. The governance framework ensures all of those.

How to start governance process?

Other examples of governance policy?

Here are a few examples of policies that I was able to implement with the Governance Council's buy-in:

  • Kill slow-rendering workbooks to ensure that one badly designed workbook does not impact others. The default value for vizqlserver.querylimit is 1,800 seconds, which is too long; my server is set to 180 seconds: vizqlserver.querylimit -v 180
  • Kill Hyper queries taking too much memory. This saves my server: hyper.session_memory_limit -v 10g -frc
  • Old workbook deletion: our server deletes any workbook not used for 90 days, and the deletion gets more aggressive based on workbook size. Read more in Automation – Advanced Archiving and Automation (8/10) – Workbook/Disk size governance.

Re-cap:

  • Governance isn't about restricting access or locking down data; it is the guidelines and structure that enable self-service
  • Governance makes self-service analytics possible
  • The Tableau bottom-up governance model works well – a governance body consisting of Tableau Jedi from business teams and IT making tactical decisions around the Tableau platform
  • Tableau governance members gather input from their business areas and also communicate decisions back to Tableau Creators.
  • It is much more effective to enforce governance processes/decisions automatically – either using server config when available, or using API scripts.

Additional resources

Please read my 10-part blog series about governed self-service analytics:

  1. Governed Self-Service Analytics (1/10)
  2. Governed Self-Service Analytics : Governance (2/10)
  3. Governed Self-Service Analytics: Roles & Responsibilities (3/10)
  4. Governed Self-Service Analytics: Community (4/10)
  5. Governed Self-Service Analytics: Multi-tenancy (5/10)
  6. Governed Self-Service Analytics: Publishing (6/10)
  7. Governed Self-Service Analytics: Performance Management (7/10)
  8. Governed Self-Service Analytics: Data Governance (8/10)
  9. Governed Self-Service Analytics: Content Management (9/10)
  10. Governed Self-Service: Maturity Model (10/10)

Project (3/10) : Project Leader/owner and their site role

My previous blog PROJECT (2/10): Project Leader's permission details all the activities that can be done by a project leader or owner, assuming the project leader/owner has the Creator or Explorer (can publish) site role. Site role is another very confusing but important Tableau concept.

Project leader/owner vs site role

  • Site role defines the maximum level of access a user can have on the site.
  • Project leader/owner is not site role but permission.
  • Site role, along with content permissions, determines who can publish, interact with, or only view published content, or who can manage the site’s users and administer the site itself.
  • Read more site role https://help.tableau.com/current/server/en-us/users_site_roles.htm

What is minimum site role project owner must have?

  • The short answer is Creator or Explorer (can publish).
  • You can't assign a user as project owner if the user's site role is Explorer or Viewer. A user has to have the Creator, Explorer (can publish) or admin site role in order to become a project owner.
  • What happens if a project owner's site role is changed from Creator to Viewer?
    • First of all, when a user's site role is downgraded, Tableau server never checks what permissions this user had before.
    • Although you can't assign a user with the Viewer site role as a project owner, if an existing owner's site role is changed from Creator to Viewer, the user still remains the project owner – that does not change.
    • The implication is that this non-publishing project owner can't handle any of the project owner activities anymore, except content access permissions:
      • Can’t manage permissions for others anymore
      • Can’t create new subproject
      • Can’t move content
      • Can’t change owner
      • Can’t add or change project leader
      • Can’t delete the project
      • Can’t rename the project
      • ….

What is minimum site role project leader must have?

  • The short answer is Viewer. Yes, you can assign Viewer or Explorer users as project leaders. However, Viewer or Explorer users can't do any of the following even though they are project leaders:
    • Can’t manage permissions for others anymore
    • Can’t create new subproject
    • Can’t move content
    • Can’t add or change project leader
    • ….

Read next blog PROJECT (4/10) : Sub-project or nested project

PROJECT (2/10) : Project Leader’s permission

My previous blog Project (1/10): Differences between Project Leader and owner talked about the differences between project leader and project owner. This blog details the permissions that a project leader has.

Before diving into the details of what project leaders are allowed to do, remember this golden rule – the project leader and owner get all the permissions.

Project leaders and the project owner have the following permissions for all content (workbooks, data sources, flows, metrics) published within the project:

  1. Add or change extract refresh frequency
    • Project leader/owner can't add or change overall server schedules (for example, can't add an hourly refresh schedule if it does not exist) but can change the refresh frequency (for example, from a daily refresh at 6:00am to 7:00am, or to a weekly frequency)
  2. Change workbook (same for data source, flow, metrics) owner
    • When the workbook owner changes team or leaves the company, the project leader/owner is supposed to change the workbook owner to someone else on the team
    • If the workbook has an embedded credential, the embedded credential will be deleted after the owner change, which will cause extract failures, etc. You will need to re-embed your database credential.
    • Sometimes the workbook owner change can be a two-step process: change it to yourself as project leader, then change it to the other user.
  3.   Change embedded data source user/password
  4.   Modify any workbooks (web edit or using Desktop)
    • This can be super useful when the workbook owner is on vacation or not available but you have an urgent need to update the workbook
    • It is a two-step process when you overwrite someone else's workbook, even though you have the permission to do so:
      • The 'Save' button will be greyed out, so you will have to click 'Save As'
      • Click 'Save As', then type exactly the same workbook name as it was
      • Pop up warning if you want to overwrite, click Yes
      • Then ‘Save’ button will show, click ‘Save’ to overwrite
  5.   Delete workbooks
    • Yes. Project leader/owner can delete any workbook in the project. This is very handy when there is a need to clean things up.
    • Self-service comes with accountability. A mistake can be made here by deleting actively used workbooks.
    • Tableau server will not warn you during deletion even if the workbook is actively used.
  6.  Change workbook or data source or flow permissions
  7.  Move workbook
    • When moving a workbook from project A to B, you need to be a project leader for both project A and B
  8.  Restore old revision 
  9.  Lock or unlock project permission
    • There is no Undo for this action
    • When you lock project permissions from 'Customizable', all existing permissions will be lost and all workbook permissions will be changed to the project-level permissions. I strongly recommend taking screenshots before this action, as there is no undo.
    • Leverage the option to lock (or not lock) nested projects – that is the biggest small feature; I will have a separate blog about it.
  10.  Certify or uncertified data sources
    • Only a project leader/owner can certify or un-certify a data source
    • Tableau has no feature associated with certified data sources other than filtering during search
  11.  Create or delete sub project folder
    • A project owner can delete the project that he/she owns, while a project leader can't delete the project
    • However, a project leader can delete sub-projects (aka nested projects)
  12. Add or remove project leaders
    • It is possible that you add John as a project leader and then John removes you from the project leader role. There is a lot of trust among project leaders. As a best practice, I recommend keeping a maximum of 3 project leaders for any project
  13. Change project owner – only project owner has this permission

Please continue reading the next blog Project (3/10): Project Leader/owner and their site role

Project (1/10) : Differences between Project Leader and owner

Tableau's project leaders and owners have super powerful built-in privileges. The intent of this project series is to help project leaders/owners/admins understand and leverage Tableau features better. Everything discussed here applies to both Tableau Server and Tableau Online unless specified.

  • Project Leaders vs owner
  • What project leader can do
  • Project leader/owner permissions in nested projects
  • Locked or customizable project
  • Common use cases of locked project
  • Publishing to nested project
  • How to create ‘Tableau Public’ project within your firewall
  • How to create ‘Private’ project for every publisher
  • How to automate top level project creation
  • How to plan your project structure

Differences between Project Leaders vs Owner

The project owner is an individual user who owns anything and everything about the project.

The Project Leader provides a way to allow multiple users administrative access to a project, its child projects, and all workbooks and data sources in those projects.

High Level Difference : project owner vs leader vs admins

  • Only admins can create top-level projects.
  • Whoever creates the project becomes the owner by default after the project is created.
  • It is strongly recommended that the admin change the owner to the project requestor after the project is created.
  • The project owner can add project leaders.

Key differences between leader and owner:

  1. The owner is an individual user, while a leader can be a user or a group
  2. The owner can delete a top-level project, while a leader can't
  3. The owner can change the owner, while a leader can't
  4. The owner will receive access request notifications for a locked project, while a leader will not

Read next blog: PROJECT (2/10) : Project Leader’s permission

GOVERNED SELF-SERVICE ANALYTICS: PUBLISHING (6/10)

The publishing process & policy covers the following areas: Engagement Process, Publisher Roles, Publishing Process and Dashboard Permissions.

The first step is to get a space on the shared enterprise self-service server for your group's data and self-service dashboards; this is called the Engagement Process. The main questions are:

  • From the requester's perspective, how do I request a space on the shared enterprise self-service server for my group?
  • From the governance perspective, who decides, and how, whether self-service is the right fit?

Once a business group has a space on the shared enterprise self-service server, the business group has to ask the following questions:

  • Who can publish dashboard from your group?
  • Who oversees or manages all publishers in my group?

After you have given publishing permission to some super users from your business group, those publishers need to know the rules, guidance, constraints on the server, and best practices for effective dashboard publishing. Later on you may also want to make sure that your publishers are not creating islands of information or multiple versions of KPIs.

  • What are publishing rules?
  • How to avoid duplications?

The purpose of publishing is to share your reports, dashboards, stories and insights with others who can make data-driven decisions. The audiences are normally defined before you publish the dashboards, although from a workflow perspective dashboard permissions are assigned after publishing. The questions are:

  • Who can access the published dashboards?
  • What is the approval process?

Engagement Process

Self-service analytics does not replace traditional BI tools but co-exists with them. It is very rare that you will find a self-service analytics platform as the only reporting platform in your corporation. Very likely you will have at least one IT-controlled enterprise reporting platform designed for standard reporting, answering known questions using data populated from the enterprise data warehouse. In addition to this traditional BI reporting platform, your organization has decided to implement a new self-service analytics platform to answer unknown questions and do ad-hoc analysis using all the available data sources.

This realization that traditional BI and self-service BI co-exist is important for understanding the engagement process, because guidance has to be defined on which platform does what kind of reporting. After this guidance is defined and agreed, continuous communication and education have to be done to make sure all self-service super users are on the same page with this strategic guidance.

Whenever there is a request for a new self-service analytics application, a fitness assessment has to be done before proceeding. The following checklist serves this purpose:

  • Does your bigger team have an existing site already on self-service analytics server? If yes, you can use the existing site.
  • Who is the primary business / application contact?
  • What business process / group does this application represent? (like sales, finance, etc)?
  • Briefly describe the purpose and value of the application?
  • Do you have an IT contact for your group for this application? Who is the contact?
  • What are the data sources?
  • Are there any sensitive data to be reported on (like federal data, customer or client data)? If yes, describe the source data in detail.
  • Are there any private data as part of source data? (like HR data, sensitive finance data)
  • Who are the audiences of the reports? How many audiences do you anticipate? Are there any partners who will access the data?
  • Does the source data include more than one enterprise data source? If yes, what is the plan for data-level security?
  • What are the primary data elements / measures to be reported on (e.g. bookings, revenue, customer cases, expenses, etc.)?
  • What will be the dimensions by which the measure will be shown (e.g. product, period, region, etc)
  • How often does the source data need to be refreshed?
  • What is the anticipated volume of source data? How many quarters of data? Roughly how many rows? Roughly how many columns?
  • Is the data available in enterprise data warehouse?
  • How many self-service report developers for this application?
  • Do you agree with organization’s Self-Service Analytics Server Governance policy (URL ….)?
  • Do you agree with organization’s Self-Service Analytics Data Governance policy (URL ….)?

The above questionnaire also includes your organization's high-level policies on data governance, data privacy, service level agreements, etc., since most of the existing self-service tools have some constraints in those areas. On one hand, we want to encourage business teams to leverage the enterprise investment in the self-service analytics platform. On the other hand, we want to make sure that every new application is set up for success and does not create chaos that can be very expensive to fix later on.

Publisher Roles

I have heard a lot of exciting stories about how easily people get new insights with visualization tools (like Tableau), and I have experienced a few of those insightful moments myself. However, I also heard a story about a new Tableau Desktop user who had just come out of fundamentals training; he quickly published something and shared it with the team, but it caused a lot of confusion about the KPIs being published. What went wrong? It is not about the tool, and it is not about the training; it is about publisher roles and the related process.

The questions are as follows:

  • Who can publish dashboard from my group?
  • Who oversees or manages all publishers in my group?

Sometimes you may have easy answers to those questions, but in many other cases you may not. One common approach is to use projects or folders to separate boundaries between publishers. Each project has a project leader role that oversees all publishers within the project.

You can also define a common innovation zone where many publishers can share their new insights with others. Just be aware that dashboards in the innovation zone are in the early discovery phase and are not officially agreed KPIs. Most dashboards will go through multiple iterations of feedback and improvement before they become useful insights. We do encourage people to share their innovations as early as possible for feedback and improvement. It is better to distinguish official KPIs from innovation content by using different color templates to avoid confusing the end audiences.

Publishing Process

To protect the shared self-service environment, you need a clearly defined publishing process:

  • Does IT have to be involved before a dashboard is published to the server?
  • Do you have to go from a non-production instance or non-production folder to a production folder?
  • What is the performance guidance?
  • Should you use live connection or extracts?
  • How often should you schedule your extracts? Can you use full refreshes?
  • What are the data security requirements?
  • Do you introduce any new business glossary terms in your dashboards? If yes, did you spell out their definitions?
  • Do the new glossary definitions need approval from data stewardship? Did you get the approval?
  • Who supports the new dashboards?
  • Does this new dashboard create potential duplication with existing ones?

Each organization or business group will have different answers to those questions. The answers form the basic publishing process that is essential for scaling and avoiding chaos.

Here is a summary of what most companies do – the so-called common best practices:

  1. IT is normally not involved in the release or publishing process for dashboards designed by business groups – this is the concept of self-service.
  2. IT and business agree on the performance and extract guidance in advance. IT will enforce some of the guidance in server policy settings (like extract timeout thresholds). For the many other parameters that can't be systematically enforced, business and IT agree on an alert process to detect exceptions. For example, a performance alert is sent to the dashboard owner and project owner (or site admin) if dashboard render time exceeds 10 seconds.
  3. Business terms and glossary definitions are an important part of the dashboards.
  4. A business support process is defined so end information consumers know how to get help when they have questions about a dashboard or its data.
  5. Dashboards are classified as certified or non-certified. Non-certified dashboards are for feedback purposes, while certified ones are officially approved and supported.

Dashboard Permissions

When you design a dashboard, you most likely have the audiences defined already. The audiences have business questions; your dashboards are there to answer those questions. The audiences should be organized into groups, and your dashboards can be assigned to one or multiple groups.

If your dashboards have row-level security requirements, the complexity of the dashboards increases significantly. It is advised that business works with IT on the row-level security design. Many self-service tools have limitations for row-level security, although they all claim row-level security capability.

The best practice is to let the database handle row-level security, which ensures consistent data access when you have multiple reporting tools against the same database. There are two challenges to figure out (a minimal sketch follows this list):

  • The self-service visualization tool has to be able to pass the session user dynamically to the database. Tableau supports this for some databases (for example, query banding for Teradata or initial SQL).
  • The database has to have user/group role tables implemented.
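To make the database-side pattern concrete, here is a minimal Python sketch, assuming a PostgreSQL source, a hypothetical sales fact table and user_region mapping table, and that Tableau passes the signed-in user through initial SQL (for example SET app.current_user = '[TableauServerUser]'). Adapt the names and the session-user mechanism to your own database.

import psycopg2  # PostgreSQL driver for the assumed source database

ddl = """
-- Hypothetical mapping table: which user may see which region
CREATE TABLE IF NOT EXISTS user_region (
    username text NOT NULL,
    region   text NOT NULL
);

-- Secured view: filters the fact table by the session user that
-- Tableau's initial SQL sets, e.g. SET app.current_user = '[TableauServerUser]'
CREATE OR REPLACE VIEW sales_secured AS
SELECT s.*
FROM   sales s
JOIN   user_region ur ON ur.region = s.region
WHERE  ur.username = current_setting('app.current_user', true);
"""

with psycopg2.connect("dbname=analytics user=dba") as conn:
    with conn.cursor() as cur:
        cur.execute(ddl)

Workbooks then connect to sales_secured instead of the base table, so every reporting tool that passes the session user gets the same row-level filter.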

In summary, publishing involves a set of controls, processes, policies, and best practices. While supporting self-service and self-publishing, rules and processes have to be defined to avoid potentially expensive mistakes later on.

Please read the next blog for performance management.

Version COMPATIBILITY – Tableau Prep Builder and Tableau Server

Recently I found an interesting Tableau Server behavior about compatibility between Prep Conductor (part of the Data Management Add-on) and Tableau Prep Builder:

  • V2020.4 Server is not fully compatible with V2020.4 Prep Builder.
  • V2020.3 Server is not fully compatible with V2020.3 Prep Builder.

You did not read it wrong. This is actually in Tableau's official documentation, which is hard to understand. Here is my simple translation:

The key takeaway is that Prep Builder is NOT fully compatible with the same version of Tableau Server:
  1. If you use Prep Builder v2020.3 or lower to publish a flow to a v2020.4 Tableau Server, the flow always works.
  2. If you use Prep Builder v2020.4 to publish a flow to Tableau Server v2020.4, the flow may not run due to possibly incompatible features.
    • You should get a warning about the incompatible feature before publishing.
    • You have the option to exclude the incompatible feature before publishing.
    • If you ignore the warning and continue, publishing can complete successfully but the flow can't run – after the server is upgraded, the flow is supposed to run.
  3. If you use Prep Builder v2020.4 to publish extracts to Tableau Server v2020.4, it works, as there is no compatibility issue there. What does not work is server-side Prep Conductor.

Here are a few screenshots to help you understand the compatibility better:

If you are connected to Tableau Server while building the flow, you will get the following warning when you use an incompatible feature.

You still have the option to publish to the server if you really want to. The good thing is that it shows exactly which feature is not compatible.

After publishing the incompatible feature to the server, the flow unfortunately can't run.

Then you will need to find the incompatible feature and remove it before publishing.

Conclusions:

It is a surprise that Prep Builder v2020.4.x is not fully compatible with Prep Conductor v2020.4.x but is fully compatible with Prep Conductor v2021.1.

Advanced Deployment (9/10): OPTIMIZE BACKGROUNDER (extract AND SUBSCRIPTION) efficiency

Are you facing a situation where your Tableau Server backgrounder jobs have long delays? You always have a limited number of backgrounders. How do you cut the average extract/subscription delay without adding backgrounders? This webinar covers the following 5 things:

  1. Suspend extract for inactive content
  2. Reduce extract frequency per usage
  3. Dynamic swap vizQL and backgrounder
  4. Incremental or smaller extracts run first
  5. VIP extract priority
  • Download slides here

  • Watch recording here

  1. Suspend extract for inactive content

I used to suspend extracts for inactive content using Python, but thanks to Tableau v2020.3 this is now an out-of-the-box feature. It should be the first thing every server admin enables. The good thing is that this feature is ON by default with a 30-day threshold.

suspend extract for inactive content

2.  Reduce extract frequency per usage

Challenge: There are many unnecessary extract refreshes because all schedules are available to every publisher, who has complete freedom to choose whatever schedule they want. Workbooks not used for weeks will be suspended as part of the new v2020.3 feature, but what about a workbook with an hourly or daily extract that is only used once a month? Maybe usage was high initially but went down over time, and the publisher never bothered to reduce the refresh frequency. They have no incentive to do so at all.

Solution: Set Usage Based Extract Schedule – reschedule the extract frequency based on usage

For example:

  • Hourly refresh changes to daily if workbook not used for 3 days
  • Daily changes to weekly if workbook not used for 3 weeks
  • Weekly changes to monthly if workbook not used for 2 months

A few implementation notes:

too much workbook refresh

  • Here is how it works:
    • Find out the last_used (in days):
      select views_workbook_id, ((now())::date - max(last_view_time)::date) as last_used
      from _views_stats
      group by views_workbook_id
    • Find out the refresh schedule by joining the tasks table with the schedules table.
    • Do the calculation and comparison. For example:
      extract change

How to change schedule frequency?

  • Manual approach: change the workbook or data source refresh schedule based on the attached schedule-change recommendation workbooks.
  • Automation: there is no API to change schedules. The schedule change can be done by updating tasks.schedule_id (tasks is the table name and schedule_id is the column name); a Python sketch of the full automation follows the SQL below:

UPDATE tasks
SET schedule_id = xxx
WHERE condition;
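Here is a minimal Python sketch of the automation referenced above. It assumes direct access to the Tableau repository (the workgroup database on port 8060) with an account that can write to the tasks table (not the standard readonly user), and a hypothetical daily_schedule_ids list; the column names (obj_id, obj_type, schedule_id) reflect my reading of the repository schema and should be verified on your version. As noted below, direct repository updates are not supported by Tableau.

import random
import psycopg2  # connects to the Tableau repository (workgroup database)

# Hypothetical ids of your existing daily schedules; pick randomly so jobs
# spread out instead of piling onto one schedule (see the notes below)
daily_schedule_ids = [11, 12, 13]

# Extract tasks whose workbook has not been viewed for more than 3 days
find_candidates = """
SELECT t.id
FROM   tasks t
JOIN   _views_stats s ON s.views_workbook_id = t.obj_id
WHERE  t.obj_type = 'Workbook'
GROUP  BY t.id
HAVING (now()::date - max(s.last_view_time)::date) > 3;
"""

with psycopg2.connect(host="tableau-server", port=8060,
                      dbname="workgroup", user="repo_writer") as conn:
    with conn.cursor() as cur:
        cur.execute(find_candidates)
        for (task_id,) in cur.fetchall():
            cur.execute("UPDATE tasks SET schedule_id = %s WHERE id = %s",
                        (random.choice(daily_schedule_ids), task_id))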

A few additional notes:

  • How do you figure out which schedule id to change to? Let's say you have 10 daily schedules. When you change from hourly to daily, the best way is to randomly choose one of the 10 daily schedules to avoid the situation where, over time, too many jobs land on one specific schedule.
  • What if the publisher changes back from daily to hourly? They do have the freedom to change their extract schedules at any time – this is the world of self-service. However, they will not beat your automated scripts over time. On the other hand, this feature will help you get buy-in from the business.
  • How much improvement can you expect from this automation? It depends on your situation. I have seen 50%+ delay reductions.
  • Is the automation approach supported by Tableau? No. You proceed at your own risk, but for me the risk is low and the return is high.

3.  Dynamically swap backgrounders and VizQL

Tableau's Backgrounder handles extract refreshes and subscriptions, and VizQL handles viz rendering. Often VizQL has more idle time during the night while the Backgrounder has more idle time during the day. Is it possible to automatically configure more cores as Backgrounders during the night and more as VizQL during the day? The dream comes true with Tableau TSM from v2018.2.

  • How do you identify the right time to swap?
    • Get extract delays by hour from the backgrounder_jobs table.
    • Get VizQL usage by hour from the http_requests table.
    • The usage pattern will show the best time to swap.
  • Can I have the scripts to swap? Click Backgrounder swap with VizQL scripts.txt.zip to download a working version of the scripts (a simplified sketch also follows this list).
  • What happens to in-flight tasks when the backgrounder is gone? The tasks will fail and get restarted automatically from v2019.1.
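For reference, here is a simplified sketch of the swap (not the downloadable scripts above). It assumes a single node named node1 and example process counts; run it from a scheduler on the Tableau Server machine with an account authorized for TSM.

import subprocess
import sys

def set_topology(backgrounders: int, vizqls: int, node: str = "node1") -> None:
    """Reshape one node: more backgrounders at night, more VizQL during the day."""
    cmds = [
        ["tsm", "topology", "set-process", "--node", node,
         "--process", "backgrounder", "--count", str(backgrounders)],
        ["tsm", "topology", "set-process", "--node", node,
         "--process", "vizqlserver", "--count", str(vizqls)],
        ["tsm", "pending-changes", "apply", "--ignore-prompt"],  # restarts affected services
    ]
    for cmd in cmds:
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Schedule e.g. "python swap.py night" at 8pm and "python swap.py day" at 6am
    mode = sys.argv[1] if len(sys.argv) > 1 else "day"
    if mode == "night":
        set_topology(backgrounders=6, vizqls=2)
    else:
        set_topology(backgrounders=2, vizqls=6)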

4.  Incremental or smaller extracts run first

screenshot_3652

  • Make sure to educate your publishers, since this feature is a great incentive for them.
  • How do you configure 'incremental goes first'? There is nothing to configure – it is an out-of-the-box Tableau feature.
  • How do you configure 'smaller full extracts go first'?
    • V2019.3: no configuration required.
    • V2019.1 & V2019.2: backgrounder.enable_task_run_time_and_job_rank & backgrounder.enable_sort_jobs_by_job_rank
    • V2018.3 or older: backgrounder.sort_jobs_by_run_time_history_observable_hours -v 180 (recommend 180 hours to cover the weekly jobs)

5  VIP Extract Priority

Challenge: if you have to give higher priority to some extracts, the challenge is that each new revision of the workbook or data source sets the extract priority back to the default of 50.

Solution: you can automate it as follows (no API available):

UPDATE tasks
SET priority = xx
WHERE condition;

Read more @ https://enterprisetableau.com/extract/

Re-cap: Although Tableau does not give server admins enough control over the extract refresh schedule selection for a given workbook or data source, there are still ways to govern your Tableau Server backgrounder jobs:

  • Reducing extract frequency per usage removes unnecessary refreshes. This can increase your backgrounder efficiency by 50%.
  • Dynamically swapping VizQL and backgrounders gives you more machine power. This can yield 50% more backgrounder capacity, depending on your usage pattern.
  • 'Incremental or smaller extracts run first' is an out-of-the-box feature. Make sure to let publishers know, as this is an incentive for them to design effective extracts.
  • VIP extract priority may not help backgrounder efficiency as much as the other three items, but it is one of the things you may have to do per business need.

Governed Self-Service Analytics: Data Governance (8/10)

I was on a panel discussion with a group of executives at Tableau Conference 2015 about self-service analytics. Guess what the most frequently asked question was – data governance. How do you make sure that data does not get out of hand? How do you make sure that self-service analytics does not break the organization's existing processes and policies around data protection and data governance?

Data governance is a big topic. Tableau's Data Management Add-on (Data Catalog, Tableau Prep Conductor) is making great progress toward data management. This blog focuses on the following three things:

  • Data governance for self-service analytics
  • How to enforce data governance in self-service environment
  • How to audit the self-service environment

1. Data governance for self-service analytics

First of all, what is data governance?

Data governance is a business discipline that brings together data quality, data management, data policies, business process management, and risk management surrounding the handling of data.

The intent is to put people in charge of fixing and preventing issues with data so that the enterprise can become more efficient.

The value of enterprise data governance is as follows:

  • Visibility & effective decisions: Consistent and accurate data visibility enables more accurate and timely business decisions
  • Compliance, security and privacy: Enable business to efficiently and accurately meet growing global compliance requirements

What data should be governed?

Data is any information in any of our systems. Data is a valuable corporate asset that indirectly contributes to the organization's performance. Data in a self-service analytics platform (like Tableau) is definitely part of the data governance scope. All of the following data should be governed:

  • Master Data: data that is shared commonly across the company in multiple systems, applications and/or processes. Master data should be controlled, cleansed and standardized at one single source. Examples: customer master, product item master. Master data enables information optimization across systems, enables data enrichment and data cleansing, and increases accuracy in reporting.
  • Reference Data: structured data used in an application, system, or process. Often these are common lists set once a fiscal year or with periodic updates. Examples include currency codes, country codes, chart of accounts, sales regions, etc.
  • Transactional Data: the information recorded from transactions. Examples include user clicks, user registrations, sales transactions, shipments, etc. The majority of enterprise data is transactional data. It can be financial, logistical or work-related, involving everything from a purchase order to shipping status to employee hours worked to insurance costs and claims. As part of transactional records, transactional data is grouped with associated master data and reference data. Transactional data records a time and the relevant reference data needed for a particular transaction record.

What are data governance activities?

  • Data ownership and definition: the data owner decides and approves the use of data, such as data sharing/usage requests from other functions. Typically data owners are the executives of the business areas. One data owner is supported by many data stewards, who are the operational points of accountability for data, data relationships and process definitions. The steward represents the executive owners and stakeholders. Data definition is the data steward's responsibility, although many people can contribute to the definitions. In a self-service environment, where data is in many analysts' hands, it is a business advantage to leverage those analysts' knowledge about the data by allowing each self-service analyst to comment on and tag the data, and then finding a way to aggregate those comments/tags. This is again the community concept.
  • Monitor and corrective actions: this is an ongoing process to define process flows, data flows, quality requirements, business rules, etc. In a self-service environment where more and more self-service developers can change metadata and create calculated fields to transform the data, this can be an advantage, but it can also become chaos if data sources and processes are not defined within a business group.
  • Data process and policy: this is about exception handling.
  • Data accuracy and consistency: commonly known as data quality. This is where most of the time and effort is spent.
  • Data privacy and protection: there are too many examples where data leakage damages the brand and costs organizations millions. Some fundamental rules have to be defined and enforced for a self-service enterprise to have peace of mind.

2. How to enforce privacy and protection in self-service environment?

The concept here is to apply thought leadership about top-sensitive data before making data available for self-service consumption. To avoid potential chaos and costly mistakes, here is the high-level approach I use (a hedged sketch of steps 2 and 3 follows this list):

  • Define the top-sensitive datasets for your organization (for example, Personally Identifiable Information (PII) tier classifications).
  • Then use Tableau's data lineage feature to find any workbooks using that PII data.
  • Send alerts to workbook owners with the list of workbooks using PII.
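Here is the hedged sketch mentioned above for steps 2 and 3, using the REST API sign-in plus the Metadata API (GraphQL) to list workbooks downstream of a table that holds PII. The server URL, token name/secret and table name are placeholders, and the GraphQL field names (databaseTables, downstreamWorkbooks) reflect my reading of the Metadata API; verify them in your server's GraphiQL explorer before relying on this.

import requests

SERVER = "https://tableau.example.com"   # placeholder

# Sign in with a personal access token to get an auth token
signin = requests.post(
    f"{SERVER}/api/3.10/auth/signin",
    json={"credentials": {"personalAccessTokenName": "pii-audit",
                          "personalAccessTokenSecret": "REPLACE_ME",
                          "site": {"contentUrl": ""}}},
    headers={"Accept": "application/json"})
token = signin.json()["credentials"]["token"]

# Metadata API query: workbooks downstream of a (placeholder) PII table
query = """
{
  databaseTables(filter: {name: "customer_pii"}) {
    name
    downstreamWorkbooks {
      name
      owner { username }
    }
  }
}
"""
resp = requests.post(f"{SERVER}/api/metadata/graphql",
                     json={"query": query},
                     headers={"X-Tableau-Auth": token})
for table in resp.json()["data"]["databaseTables"]:
    for wb in table["downstreamWorkbooks"]:
        # Feed this list into your alert email to the workbook owners
        print(table["name"], wb["name"], wb["owner"]["username"])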

3. What are the additional considerations for data privacy governance?  

  • No private data is allowed on the self-service server – SSN, federal customer data, credit cards, etc. Most self-service platforms (like Tableau) are designed for ease of use and do not have sophisticated data encryption technologies such as field-level encryption.
  • Remove the sensitive data fields (like addresses and contacts) at the database level before making the data available for self-service consumption. The reason is that it is really hard to control those data attributes once you open them to business analytics super users.
  • Use sites as partitions to separate data, users, and content for better data security. For example, finance is a separate site that has finance users only. Sales people have no visibility into the finance site.
  • Create a separate server instance for external users if possible. Put the external server instance in a DMZ zone so a different level of network security is applied as an additional layer of security.
  • Create a site for each partner / vendor to avoid potential problems. When you have multiple partners or vendors accessing your Tableau Server, never put two vendors into the same site. Create one site for each vendor to avoid potential surprises.

4. How to audit self-service environment?

You can't enforce everything, and you do not want to enforce everything either. Enforcement comes with disadvantages too, like inflexibility. You want to choose the most critical things to enforce and leave the rest as best practices for people to follow. Knowing that the self-service analytics community always tries to find the boundary, you should have auditing in your toolbox. And most importantly, let the community know that you have an auditing process.

  • What to audit:
    • All the enforced content should be part of the audit scope, to make sure your enforcement works in the intended way.
    • All the policies that your BU or organization agreed upon.
    • Any other ad-hoc items as needed.
  • Who should review the audit results:
    • The self-service governance body should review the results.
    • BU data executive owners are the main audiences of the auditing reports. It is possible that executives gave special approvals in advance for self-service analysts to work on datasets that they normally would not have access to. When there are too many exceptions, it is an indication of a potential problem.
  • Roles and responsibilities of the audit: normally IT provides audit results while business evaluates risks and makes decisions about process changes.
  • How to audit: unfortunately Tableau does not have many server audit features, which is where a lot of creativity comes into play. VizAlert can be used. Often, creating workbooks directly from the Tableau repository database is the only way to audit (a minimal sketch of one such audit query follows this list).
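As the minimal sketch referenced in the last bullet, here is a read-only repository query that lists workbooks with their sites and owners, a common starting point for audit workbooks. The join path (workbooks, users, system_users, sites) is based on the commonly documented workgroup schema and should be verified on your server version.

import psycopg2  # read-only connection to the Tableau repository

audit_sql = """
SELECT s.name  AS site,
       w.name  AS workbook,
       su.name AS owner,
       su.email
FROM   workbooks w
JOIN   sites s         ON s.id = w.site_id
JOIN   users u         ON u.id = w.owner_id
JOIN   system_users su ON su.id = u.system_user_id
ORDER  BY s.name, w.name;
"""

with psycopg2.connect(host="tableau-server", port=8060,
                      dbname="workgroup", user="readonly") as conn:
    with conn.cursor() as cur:
        cur.execute(audit_sql)
        for site, workbook, owner, email in cur.fetchall():
            print(site, workbook, owner, email)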

Please read the next blog about content management.

Governed Self-Service Analytics: Performance Management (7/10)

Performance management is everyone's concern when it comes to a shared self-service environment, since nobody wants to be impacted by others. This is especially true when each business unit decides its own publishing criteria and the central IT team does not gate the publishing process.

How do you protect the shared self-service environment? How do you prevent one badly designed query from bringing all servers to their knees?

  • First, set server parameters to enforce policy.
  • Second, create daily alerts for any slow dashboards.
  • Third, make performance metrics public to your internal community, so everyone has visibility of the worst-performing dashboards, which creates some peer pressure with good intent.
  • Fourth, hold site admins or business leads accountable for self-service dashboard performance.

You will be in good shape if you do those four things. Let me explain each of them in detail.

performance

  1. Server policy enforcement

The server policy settings are for enforced policies. For anything that can be enforced, it is better to enforce it so everyone can have peace of mind. The enforced parameters should be agreed upon by business and IT, ideally in the governance council. The parameters can always be reviewed and revised when the situation changes.

Some super useful enforced parameters are (please refer to my presentation Zen Master Guide to Optimize Server Performance for details):

  • Set the VizQL session timeout to 3 minutes vs. the default 30 minutes.
  • Set the Hyper session memory timeout to 10 GB vs. the default of no limit.
  • Set the process memory limit to 60% of system memory vs. about 95%, when it is too late.

  2. Exception alerts

There are only a few parameters that you can control through enforcement. Everything else has to be governed by process. Alerts are the most common approach for exception management:

  • Performance alerts: create alerts when dashboard render time exceeds the agreed threshold (a minimal sketch of the underlying query follows this list).
  • Extract size alerts: create alerts when extract size exceeds defined thresholds (extract timeout can be enforced on the server, but size cannot).
  • Extract failure alerts: create alerts for failed extracts. Very often stakeholders will not know an extract failed. It is essential to let owners know their extracts failed so actions can be taken in a timely manner.
  • You can create many more alerts: CPU usage, overall storage, memory, etc.
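Here is the minimal sketch referenced in the performance-alert bullet: a read-only repository query that finds views rendering slower than 10 seconds over the last day, based on the http_requests table. The 'bootstrapSession' action name and the column names reflect common usage of that table and should be verified on your version; the result set can then be fed into VizAlert or your own email script.

import psycopg2  # read-only connection to the Tableau repository

slow_views_sql = """
SELECT currentsheet,
       max(extract(epoch FROM (completed_at - created_at))) AS seconds
FROM   http_requests
WHERE  action = 'bootstrapSession'                  -- initial viz render requests
  AND  created_at > now() - interval '1 day'
GROUP  BY currentsheet
HAVING max(extract(epoch FROM (completed_at - created_at))) > 10
ORDER  BY seconds DESC;
"""

with psycopg2.connect(host="tableau-server", port=8060,
                      dbname="workgroup", user="readonly") as conn:
    with conn.cursor() as cur:
        cur.execute(slow_views_sql)
        for sheet, seconds in cur.fetchall():
            print(f"{sheet}: {seconds:.1f}s")   # candidates for a yellow/red alert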

How do you create the alerts? There are multiple choices. My favorite is VizAlert for Tableau: https://community.tableau.com/groups/tableau-server-email-alert-testing-feedbac

Who should receive the alerts? It depends. A lot of alerts are for the server admin team only, like CPU usage, memory, and storage. However, most of the extract and performance alerts are for the content owners. One best practice for content alerts is to always include site admins and/or project owners in the alerts. Why? Workbook owners may change jobs, so the original owner may no longer be responsible for the workbooks. I was talking with a well-known Silicon Valley company recently; they told me that a lot of workbook owners changed in the last two years, and they had a hard time figuring out whom to go after for issues related to those workbooks. The site admin should be able to help identify the new owners. If the site admin is not close enough to the workbook level in your implementation, you can choose project leaders instead.

What should the threshold be? There is no universal answer, but nobody wants to wait more than 10 seconds. The rule of thumb is that anything under 5 seconds is good, while anything over 10 seconds is not. I got a question when I presented this at a local Tableau event: what if one specific query used to take 30 minutes, and the team made great progress reducing it to 3 minutes – do we allow this query to be published and run on the server? The answer is: it depends. If the view is critical for the business, it is of course worth waiting 3 minutes for the results to render. Everything has exceptions. However, if the 3-minute query chokes everything else on the server and users click the view often, you may want to re-think the architecture. Maybe the right answer is to spin off another server for this mission-critical 3-minute application only, so the rest of the users are not impacted.

Yellow and red warnings: it is a good practice to create multiple warning levels, like yellow and red, with different thresholds. Yellow alerts are warnings, while red alerts are for actions.

You may say: Hi Mark, this all sounds great, but what if people do not take the actions?

This is exactly where some self-service deployments go wrong, and where governance comes into play. In short, you need strong and agreed-upon process enforcement:

  • Some organizations use a charge-back process to motivate good behavior. Charge-back will influence people's behavior but will not enforce anything by itself.
  • The key process enforcement is a penalty system for when red-alert actions are not taken in time.

If the owner does not take corrective actions within the agreed period for a red warning, a meeting should be arranged to discuss the situation. If the site admin refuses to take action, the governance body has to decide on the agreed-upon penalty actions. The penalty can lead to site suspension. Once a site is suspended, nobody can access any of its content anymore except server admins. The site owners have to work on the improvement actions and show compliance before the site can be re-activated. The good news is that all the content is still there when a site is suspended, and it takes less than 10 seconds for a server admin to suspend or re-activate a site.

I had this policy agreed with the governance body, and I communicated it to as many self-service developers as I could. I never got pushback about this policy. It is clear to me that the self-service community likes to have a strong and clearly defined governance process to ensure everyone's success. I suspended a site once for other reasons, but never had to suspend a site due to performance alerts. Why that works is explained in my third point below about making the worst-performing dashboards visible.

  3. Make performance metrics public

It takes some effort to make your server dashboard performance metrics public to your entire internal community, but it turns out to be one of the best things a server team can do. It has a few benefits:

  • It serves as a benchmark for the community to understand what is good and good enough, since the metric shows your site's overall performance compared with others on the server.
  • It shows all the long-rendering dashboards, which provides peer pressure.
  • It shows patterns that help people focus on the problematic areas.
  • It creates a great opportunity for the community to help each other. This is the most important success factor. It turns out that the problematic areas are often the teams newly on-boarded to the server, and the community always has many ideas to make dashboards perform a lot better. This is why we never had to suspend any sites: when a site generates a lot of red alerts that the community is aware of, it is the whole community that makes things happen, which is awesome.
  4. Hold site admins accountable

I used to manage a Hewlett Packard product assembly line early in my career. Hewlett Packard has some well-known quality control processes. One thing I learned was that each assembler is responsible for his or her own quality. Although there is QA at the end of the line, each workstation has a checklist before passing work to the next station. This simple philosophy applies to today's software development and self-service analytics environments. The site admin is responsible for the performance of the workbooks in their sites. The site admin can further hold workbook owners accountable for the shared workbooks. Flexibility comes with accountability.

I believe in Theory Y (people have good intent and want to perform better) and I have been practicing it for years. The whole intent of server dashboard performance management is to provide performance visibility to the community and content owners, so owners know where the issues are and can take action.

What I see often is that a well-performing dashboard may become bad over time due to data changes and many other factors. The alerts will catch all of those exceptions no matter whether your dashboards were released yesterday, last week, last month or last year – this approach is a lot better than a gated release process, which is a common IT practice.

During a recent Run-IT-as-a-business meet-up, the audience was skeptical when I said that IT did not gate any workbook publishing process and that it is completely self-service. Then they started to realize that it made sense when I talked about the performance alerts that catch it all. What business likes most about this approach is the freedom to push some urgent workbooks to the server even when they are not performing great – they can always come back later to tune them, both for a better user experience and for being good citizens.

Please continue to the next blog about data governance.

Governed Self-Service Analytics: Multi-tenancy (5/10)

Tableau has a multi-tenancy mechanism called sites. I have heard many people asking whether and when they should use sites. For some large Tableau deployments, people also ask whether to create separate Tableau instances. All of those are Tableau architecture questions, or multi-tenancy strategy.

 

How do you approach this? I use the following Goal – Strategy – Tactics framework to guide the decision-making process.

It starts with goals. The self-service analytics system has to meet the following expectations, which are the ultimate goals: fast, easy, cost-effective, data security, self-service, and support for structured and unstructured data.

Now keep those goals in mind while scaling Tableau out from individual teams to departments, and then from departments to the enterprise.

 

How do you maintain self-service, fast, and easy with solid data security and cost effectiveness while dealing with thousands of users? This is where you need well-defined strategies to avoid chaos.

First of all, each organization has its own culture, operating principles, and business environment. Some strategies that work very well in one company may not work for others. You just have to figure out the best approach that matches your business requirements. Here is some food for thought:

  1. Do you have to maintain only one Tableau instance in your organization? The answer is no. For an SMB the answer may be yes, but I have seen many large organizations run multiple Tableau instances for better data security and better agility. I am not saying that Tableau Server can't scale out or scale up – I have read the Tableau architecture white paper on how many cores one server can scale to. However, there are many other considerations, and you just do not want to put every application in one instance.
  2. What are the common use cases where you may want to create a separate instance? Here are some examples:
    • You have both internal employees and external partners accessing your Tableau Server. Tableau allows both internal and external people to access the same instance. However, if you would have to create a lot of data security constraints in order to allow external partners to access your Tableau Server, the same constraints will apply to all internal users, which may cause extra complexity. Depending on the constraints, if the fast and easy goals are compromised, you may want to create a separate instance to completely separate internal users from external users – this way you have complete peace of mind.
    • Network separation. It is getting common for corporations to separate the engineering network from the rest of the corporate network for better IP protection. When this is the case, creating a separate Tableau instance within the engineering network is an easy and simple strategy.
    • Network latency. If your data source is in APAC while your Tableau Server is in the US, you will likely have challenges with dashboard performance. You should either sync your database to the US or have a separate Tableau Server instance that sits in APAC to achieve your 'fast' goal.
    • Enterprise mission-critical applications. Although Tableau started as ad-hoc exploration for many users, some Tableau dashboards become mission-critical business applications. If you have any of those, congratulations – you have a good problem to deal with. Once some apps become mission critical, you will have no choice but to tighten up change control and the related processes, which unfortunately are killers for self-service and exploration. The best way to resolve this conflict is to spin off a separate instance with more rigor for the mission-critical apps while keeping the rest of Tableau as fast, easy self-service.

What about Tableau Server licenses? Tableau Server has a seat-based license model and a core-based license model. The seat-based model goes by users, so separating instances should not have much impact on the total number of licenses.

Now let's say that you have 8 core-based licenses for existing internal users and you plan to add some external users. If you would have to add 8 more cores for the external users anyway, a separate instance will not have any impact on licenses. What if you only want a handful of external users? Then you will have to make a trade-off decision. Alternatively, you can keep your 8 cores for internal users while getting a handful of seat-based licenses for external users only.

How about the platform cost and additional maintenance cost when you add a separate instance? VMs and hardware are relatively cheap today. I agree that there is some additional work initially to set up a separate instance, but server admin work is not doubled just because you have another instance. On the other hand, when your server is too big, there is a lot more coordination with all business functions for maintenance, upgrades and everything else. I have seen some large corporations that are happy with multiple instances vs. one huge instance.

How about sites? I have a blog about how to use sites. In summary, sites are useful for better data security, easier governance, empowering self-service and distributing administrative work. Here are some cases when sites should not be used:

  • Do not create a new site if the requested site will use the same datasets as one of the existing sites; instead, create a project within the existing site to avoid duplicate extracts (or live connections) running against the same source database. Since 2020.1, projects have a feature to lock or unlock sub-projects, so many content segmentation needs can be met by using projects/sub-projects instead of sites.
  • Do not create a new site if the requested site's end users overlap a lot with an existing site; instead, create a project within the existing site to avoid duplicating user maintenance work.

In summary, while you plan to scale Tableau from department to enterprise, you do not have to put all of your enterprise users on one huge Tableau instance. Keep the goals in mind while deciding the best strategy for your business. The goals are easy, fast, simple, self-service, data security, and cost effectiveness. The strategies are separate instances and sites.

 

Please read the next blog about the release process.

Tableau Metrics Deep Dive

Tableau 2020.2 released the long-awaited Metrics feature. Metrics make it easy to monitor key performance indicators from the web or mobile. They are super easy to create. When you use Tableau Mobile to check Metrics, you can update the trend line date range and even compare measures across time frames.

Marc Reid (Zen Master) has a nice summary on his dataviz.blog about how to use Metrics. This blog talks about how Metrics works on Tableau Server, answering the following questions:

  • Who can create Metrics
  • What is entry point to create Metrics
  • What is the relationship between Metrics and its connected view
  • How Metrics permission works
  • How metrics is refreshed
  • What are the differences between Metrics and subscription/Data-driven alert
  • How to find out Metrics usage
  • Can I turn off Metrics for site, project or workbook
  • Is there Metrics revision

metrics_gif

Who can create Metrics

Only Publishers can create Metrics.

Metrics authoring is actually a publishing process, similar to saving Ask Data results to the server or publishing a workbook. The Metrics author has to have ALL of the following 5 things to create a Metric:

  1. Site role of Creator or Explorer (can publish)
  2. Publisher permission to a project on the server/site
  3. v2020.2-2021.2: Download Full Data permission on the view; v2021.3 onwards: Create/Refresh Metrics permission
  4. The workbook has an embedded password
  5. The workbook has no row-level security or user filters

What is the entry point to create Metrics?

screenshot_2233

A view or custom view is the only entry point to create Metrics, unlike Ask Data, which uses a data source as the entry point.

What is the relationship between a Metric and its connected view?

A Metric is created from a view. However, as soon as the Metric is created, it becomes more like an 'independent' object that has its own owner and its own permissions. The Metric will still stay on the server when the connected view is deleted, although it can't be refreshed anymore. This is very similar to the relationship between a published data source and its connected workbooks.

How do Metrics permissions work?

For those who know me, I spent hours and hours testing and validating Tableau permissions. Here is how Metrics permissions work:

  • Metrics have their own independent permissions.
  • Metrics permissions are controlled by the Metric owner and the project leader of the project where the Metric is published.
  • The Metric owner decides who can access or overwrite the Metric.
  • The Metric owner can grant permission to any Explorer who does not have permission to the original connected view at all. This is an important behavior to be aware of, similar to the permission model between a published data source and its connected workbooks (like it or not).
  • For example, John's dashboard granted Allan permission. If Allan is a publisher on the server, Allan can create a Metric and grant 1,000 other users access to it without John's approval, or even without John's knowledge.
  • Another important behavior: John, as workbook owner, has a new Connected Metrics tab for visibility into Metrics. However, if Allan does not grant John access to the Metric, John has no idea such a Metric is connected to his view (yes, that is how the permission works, similar to connected workbooks with published data sources).

screenshot_2235

Can I turn off Metrics for a site, project or workbook?

If there is a data security concern and you want to turn off Metrics, here is how:

  • Metrics can be turned OFF at the site level, although the default is ON – the admin goes to the site settings for the flag.
  • There is no feature to turn off Metrics at the project level. My tip is to set project permissions so that Download Full Data is unchecked for workbooks, and LOCK the project permissions. This way none of the workbooks can be used to create Metrics.
  • There is no feature to turn off Metrics at the view or workbook level. However, as long as you do not give Download Full Data permission, Metrics can't be created.

How are Metrics refreshed?

screenshot_2237

  • Live connection: refreshed hourly.
  • Extract: refreshed after each extract run. This is handled by the Tableau Server backgrounder's new 'Update all metrics in the view' process.
  • Server admins can change the live-connection refresh interval (metricsservices.checkIntervalInMinutes) – the default value is 60 minutes (a minimal sketch follows this list).
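A minimal sketch of changing that interval, run on the Tableau Server node with TSM rights (the 30-minute value is just an example):

import subprocess

# Lower the live-connection Metrics refresh interval from the 60-minute default
subprocess.run(["tsm", "configuration", "set",
                "-k", "metricsservices.checkIntervalInMinutes", "-v", "30"],
               check=True)
subprocess.run(["tsm", "pending-changes", "apply", "--ignore-prompt"],
               check=True)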

How do Metrics refreshes handle warnings and errors?

  • If a Metric refresh fails 10 times in a row, Tableau Server will send a notification email to the Metric owner. This count of 10 can be configured.
  • If a Metric refresh fails 175 times in a row, Tableau Server will stop refreshing it. The Metric owner has to manually resume the refresh after the problem is solved.

How to find out Metrics usage?

  • There is a built-in feature, similar to finding view usage.
  • Admins can also find usage details from admin views.

Are there Metrics revisions?

No, unlike workbooks or data sources. If you need a new version of a published Metric, you can replace the existing one if you are the owner, a project leader, or a user with overwrite permission. Metric permissions remain unchanged after the Metric is replaced with a new revision. If you save the new Metric under a different name, it becomes a completely new Metric. You can create many Metrics from the same view. You can't combine two views' measures into the same Metric; as a matter of fact, a Metric handles only a single measure.

Re-cap:

  • Metrics is a great new Tableau innovation to track KPIs instantly and on mobile.
  • A Metric is created from an existing view on the server and refreshes hourly for live connections, or whenever the extract refreshes for views using extracts.
  • You can't create Metrics from Desktop.
  • Metrics can create new data access and data governance challenges, because someone can create a Metric and grant anyone else access without approval from, or the knowledge of, the original workbook/view owner. This is very different from data-driven alerts or subscriptions, which follow the view permissions exactly.
  • Today it is already hard enough to audit "who has what access" for a team/project with many workbooks. The Metrics feature makes this problem worse, even though Metrics is a great feature.
  • Workbooks have a new Connected Metrics tab, but if the Metric owner does not give the workbook owner access to the Metric, the workbook owner will not know the Metric exists, even though it may be shared with many other server users.
  • Do not get me wrong – my intent is not to discourage you from using Metrics. I actually strongly encourage everyone to use it. My point is to make sure admins, project leaders and publishers are fully aware of the permission and data security behavior so you can put the necessary controls in place to avoid potential data security chaos.
  • What are the potential controls? Some ideas from Mark:
    • For very sensitive data, put the workbooks into a separate project, lock the project permissions, and do not give Download Full Data permission on any of the workbooks.
    • For one or a few sensitive workbooks, if you do not use the locked-project approach, you can control workbook-level permissions as well – do not give Download Full Data.
  • Again, repeating: for very sensitive workbooks, if you use v2020.2-2021.2, I strongly recommend locking the project permissions and not giving Download Full Data permission so Metrics can't be created.
  • If your server is v2021.3 or newer, you no longer have the potential Metrics permission cascading issue. Read https://enterprisetableau.com/metrics2/
  • Data Catalog data lineage does not include Metrics yet as of the v2020.2 release, but as far as I know, Tableau development is working on it.
  • Enjoy cool Metrics!

Tableau Server New Stale Content Feature

Tableau v2020.3 released a super useful 'Stale Content' tag feature. There are two big use cases: content archiving and stopping extract refresh schedules for stale content.

How does it work? How do you automate the workflow?

  • ‘Stale Content’ for server/site admins
  • Tag ‘Stale Content’ 
  • Automate archiving workflow
  • Move or delete ‘Stale Content’
  • Automate ‘Stale Content’ notification 

1. ‘Stale Content’ for server/site admins

The 'Stale Content' view is actually available since v2020.2. However, you can only tag the selected 'Stale Content' since v2020.3. The tag is the game changer.

screenshot_3260

2. Tag ‘Stale Content’ 

Although you can tag 'Stale Content' since v2020.3, the server does not actually do anything with the tagged content by itself. So what can we do with the tagged 'Stale Content'?

screenshot_3262

3. How to use ‘Stale Content’ tag to automate your workflow?

After 'Stale Content' is tagged, a lesser-known feature is that you can select all the content by the 'Stale Content' tag, move all of it to an archiving project, and later delete it. This is an easy archiving process. Here is how to select all tagged content.

screenshot_3263

4. How to move or delete 'Stale Content'?

See the flow below for the process. It is recommended to lock the permissions of the 'Archiving' project (a minimal sketch of automating the move with the Python tableauserverclient library follows the list below):

screenshot_3264

  • After content is moved to the 'Archiving' project, the owner remains the same, so the content owner can still access his/her workbook/datasource.
  • The content owner can even download the stale content after it is moved to the 'Archiving' project.
  • The content owner can also re-publish the stale content to the original project if necessary – that is a good feature for a self-service platform.
  • End consumers will not have permission to access the content in the 'Archiving' project, which should be fine as it has not been used for many days anyway.
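Here is the minimal sketch mentioned above, using the tableauserverclient Python library to move everything tagged 'Stale Content' into the 'Archiving' project. The server URL, token, tag value and project name are assumptions to adapt to your site.

import tableauserverclient as TSC

auth = TSC.PersonalAccessTokenAuth("archiver", "REPLACE_ME", site_id="")
server = TSC.Server("https://tableau.example.com", use_server_version=True)

with server.auth.sign_in(auth):
    # The locked 'Archiving' project is assumed to exist already
    archive = next(p for p in TSC.Pager(server.projects) if p.name == "Archiving")

    # Select all workbooks carrying the 'Stale Content' tag
    opts = TSC.RequestOptions()
    opts.filter.add(TSC.Filter(TSC.RequestOptions.Field.Tags,
                               TSC.RequestOptions.Operator.Equals,
                               "Stale Content"))

    for wb in TSC.Pager(server.workbooks, opts):
        wb.project_id = archive.id        # owner stays the same after the move
        server.workbooks.update(wb)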

5. How to notify ‘Stale Content’ owners?

When you tag, move or delete 'Stale Content', Tableau Server will not send any notifications automatically. How do you automate the notification process?

  • Create a VizAlert for the 'moved to archiving project' notification.
  • Use the Webhooks API for deletion notifications from the archiving project (the 'WorkbookDeleted' and 'DatasourceDeleted' events); a hedged sketch follows this list.
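Here is a hedged sketch of the webhook piece using the tableauserverclient library; the attribute names and the 'workbook-deleted' event string are my assumptions about that library's webhook support (the REST API documents the event as 'WorkbookDeleted'), so check them against the client's documentation and replace the URLs and token with your own.

import tableauserverclient as TSC

auth = TSC.PersonalAccessTokenAuth("notifier", "REPLACE_ME", site_id="")
server = TSC.Server("https://tableau.example.com", use_server_version=True)

with server.auth.sign_in(auth):
    hook = TSC.WebhookItem()
    hook.name = "notify-on-workbook-delete"
    hook.url = "https://hooks.example.com/tableau-deletions"  # your listener endpoint
    hook.event = "workbook-deleted"   # assumed event string; may differ by version
    server.webhooks.create(hook)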

Re-cap: this blog provided a simple and effective content archiving approach by leveraging the new 'Stale Content' feature released in v2020.3. This 'Stale Content' process is also the most effective approach to deal with server backgrounder delays – it can reduce backgrounder delays by about 40% if this is the first time you stop extract jobs for stale content.

How Does 'Automatically Suspend Extract Refresh Tasks' Work?

Tableau v2020.3 released a super useful feature for Tableau Server admins – automatically suspending extract refresh tasks for inactive workbooks. This blog explains how it works.

suspend extracts

Why suspend extract refreshes for inactive content?

You may be surprised how many extracts are running daily or hourly while nobody has actually used the dashboards in the past month (I have heard multiple server admins mention about 40% before their archiving activities). That is a huge waste of your valuable Tableau backgrounders. Your active workbooks' extract delays can be significantly reduced if those inactive contents' extract tasks are stopped. I am really glad to see this feature available since v2020.3.

How does it work?

Since v2020.3, for each site you can enable the Automatically Suspend Extract Refresh Tasks feature and specify the number of days, from 7 through 100, that a workbook should be inactive before its extract refresh tasks are suspended. The default is 30 days.

How does the inactive counter work?

Any of the following are considered active usage:

      • Setting up a new subscription 
      • Setting up a new data-driven alert
      • Viewing the workbook sheets
      • Download
      • Owner change
      • Move to new location

IMPORTANT NOTE: an active ongoing subscription or data-driven alert will reset the counter.

For example, if a workbook has not been viewed for 30 days but there is an active daily subscription, the extract will not be suspended.

Which kinds of extracts are in scope?

Are there any auto notifications?

  • An email notification is sent three days before the extract refresh schedule is suspended.
  • Another email notification is sent when the extract refresh schedule is suspended.

Does the refresh resume automatically when there is active usage again?

No. Active use will not resume the refresh automatically. The content owner can always add new extract refresh schedules at any time after the notification is received, if the owner knows active usage is coming.

Re-cap: now I can retire my Python scripts that stopped extract refreshes for inactive content, thanks to this awesome Tableau Server feature!

Governed Self-Service Analytics: Community (4/10)

A self-service analytics community is a group of people who share a common interest in self-service analytics and a common value: a data-driven decision-making culture.

Why are people motivated to join an internal self-service community?

The self-service community motivations are as follows:

  • Empowerment: self-service stems from – and affects – a wider macro trend of DIY on one hand and collaboration on the other: content builders take the lead on the services they require, and often collaborate with others. The key is to offer members empowerment and control over the process, so they can choose the level of services they would like to engage in, thus affecting the overall experience.
  • Convenience: the benefit of community self-service is obvious – members get fast access to the information they need without having to email or call IT or a contact center. According to Forrester, 78% of people prefer to get answers via a company's website versus telephone or email.
  • Engagement: it is their shared ideas, interests and professions that bring people together to form a community. Members join because they wish to share, contribute and learn from one another. Some members contribute, while others benefit from the collective knowledge shared within the community. This engagement is amplified when members induce discussion and debate about the tools, features, processes and services provided and any new products being introduced. The discussions within the community inform and familiarize people with new and better ways of getting things done – the best practices.

How to start creating an internal user community?

When you start creating an internal user community, keep in mind that a lot of community activities are completely dependent on the intranet, so you need to ensure that the community can be easily accessed by the maximum number of people. Below is a checklist:

  • Determine a purpose or goal for it. One example: the place you find anything and everything about self-service analytics. The community is the place for sharing, learning and collaborating.
  • Decide who your target audience will be. Most likely the audience should be content developers and future content developers, not the server end users.
  • Design the site keeping in mind the tools for interaction and the structure of your community.
  • Decide upon the manner in which you will host the community.
  • Create the community using tools available within your organization.
  • Create interesting content for the community.
  • Invite or attract members to join your community. Try to find out who has developer licenses and send an invitation to all of them.
  • Administer it properly so that the community flourishes and expands. It is a good practice to have at least two volunteer moderators who make sure to answer users' questions in a timely manner and close out all open questions if possible.

Who are the community members?

The audiences are all the content builders or content developers from business and IT across the organization. Of course, the governing body or council members are the core of the community. It is a good practice for council members to lead most, if not all, of the community activities. The community audience also includes future potential content builders, and the council should put some focus on reaching out to them. The end information consumers – those who receive dashboards or reports – are normally not part of the community, as they really do not care much about the tools, technology or processes associated with self-service. All end information consumers care about is the data, insights and actions.

What are the community activities?

A quick summary is in the picture below. More details will be discussed later.

  • Intranet: your community home. It is the place for anything and everything about your self-service analytics: the tool, processes, policies, best practices, system configuration, usage, data governance policies, server policies, publishing process, license purchasing process, tips, FAQs, etc.
  • Training: the knowledge base on the community intranet is good, but not good enough. Although most new self-service tools are designed for ease of use, they do have a learning curve. Training has to be organized to better leverage the investment.
  • User Meetings: a user summit or regular best-practice sharing is a must-have community activity.
  • License Model: when a lot of business super users have dashboard development tools, what is the most cost-effective license model for those tools? Do you want to charge back for server usage?
  • Support Process: who supports the dashboards developed by business super users? What is IT's role vs. the business' role in supporting end users?
  • External Community: most self-service software vendors have very active local, virtual or industry communities. How do you leverage the external community? How do you learn the best practices?

Key takeaway: building a strong community is a critical piece of a successful self-service analytics deployment in the enterprise.

Please read the next blog for multi-tenancy strategy.

Governed Self-Service Analytics: Roles & Responsibilities (3/10)

When business super users are empowered to do discovery, data exploration, analysis, dashboard building and sharing dashboards with business teams for feedback, the business is taking on a lot more responsibility than it used to in a traditional BI & analytics environment. One of the critical self-service analytics governance components is to create a clear roles-and-responsibilities framework between business and IT. This is one reason why the governing body must have stakeholders from both business and IT departments. The governing body should think holistically about analytics capabilities throughout the organization. For example, it could use business analysts to explore the value and quality of a new data source and define data transformations before establishing broader governance rules.

A practical framework for the roles and responsibilities of self-service analytics is in the following picture.

Business owns

  • Departmental data sources and any new data sources which are not available in IT managed enterprise data warehouse
  • Simple data preparation: Data joining, data blending, simple data transformation without heavy lifting ETL, data cleansing, etc.
  • Content building: exploration, analysis, report and dashboard building by using departmental data or blending multiple data sources together
  • Release or publishing: sharing the analysis, report or dashboard with information end consumers for feedback, business review, metrics, etc.
  • User training and business process changes associated with the new reports & dashboard releases.

IT owns

  • Server and platform management, licensing, vendor management, etc.
  • Enterprise data management: delivering certified, trustworthy data to the business, building and managing the data warehouse, etc.
  • Create and maintain a data dictionary that helps business super users navigate the data warehouse.
  • Support business unit report developers by collaborating to build robust departmental dashboards and scorecards, converting ad hoc reports into production reports if necessary.
  • Training the business to use self-service analytics tools

It is a power shift from IT to business. Both IT and business leaders have to recognize this shift and be ready to support the new roles and responsibilities. What are the leader’s roles to support this shift?

  • Create a BI/Analytics Center of Excellence: Identify the players, create a shared vision, facilitate hand-offs between IT and business
  • Evangelize the value of self-service analytics: create a brand for self-service analytics and market it to drive an analytics and data-driven decision-making culture; run an internal data/analytics summit or conference to promote analytics
  • Create a federated BI organization: manage the steering committee or BI council, leverage BI & data gurus in each organization, and encourage IT people to go from order takers to consultants.

Please read my next blogs for Community.

Governed Self-Service Analytics : Governance (2/10)

How to govern the enterprise self-service analytics? Who makes the decisions for the process and policies? Who enforces the decisions?

In the traditional model, governance is done centrally by IT since IT handles all data access, ETL and dashboard development activities. In the new self-service model, many business super users are involved in data access, data preparation and development activities. The traditional top-down governance model will not work anymore; however, no governance at all will create chaos. What is needed for a self-service environment is a new bottom-up governance approach.

In the new self-service analytics model, since super business users do most of the dashboard development, the more effective governance structure includes representatives of those super business users.

In the picture, the blue box in the middle is the self-service analytics governing body for the enterprise. It consists of both business and IT team members. The governing body members are self-service analytics experts and stakeholders selected by each business unit. You can think of the governing body members as the representatives of their business units, or as representatives of the entire self-service analytics content builder community. The charter of this governing body is as follows:

  • Define roles and responsibilities between business & IT
  • Develop and share self-service best practices
  • Define content release or publishing process
  • Define analytics support process
  • Define data access, data connections and data governance process
  • Define self-moderating model
  • Define dashboard performance best practices
  • Help with hiring and training for new self-service analytics skills
  • Communicate self-service process to entire self-service content builder community and management teams
  • Enforce self-service analytics policies to protect the shared enterprise self-service environment
  • Make sure that self-service processes and policies align with enterprise processes and policies around data governance, architecture, business objectives, etc.

Should business or IT lead the governing body? While there are times when a business-led governing body can be more effective, do not discount an IT-led governing body. There are many good reasons to consider the IT-led governing body.

  • IT understands how to safely and accurately expose an organization’s data and can standardize how data is exposed to self-service super users.
  • IT has a centralized view of all analytics needs from all functions of the organization, which can help the enterprise develop streamlined, reusable processes and leading practices to help business groups be more efficient using the tool.
  • IT can also centralize functions such as infrastructure, licensing, administration, and deeper-level development, all of which further cuts costs and mitigates risk.

What are the key skills and expectations of the head of the governing body or leader of the center of excellence team? Different organizations use very different titles for this person, but the person at the helm of your governing body or center of excellence team should have the following skills:

  • Passion for self-service analytics and related technologies
  • The ability to lead, set strategy, and prioritize objectives based on needs/impact
  • An in-depth understanding of the self-service tools, the business analytics space, and the analytics needs of the business
  • The ability to align self-service analytics objectives with corporate strategy and direction
  • Comfort in partnering and negotiating with both business and IT stakeholders
  • A talent for navigating the organization to get things done

Please read my next blogs for roles and responsibilities

Governed Self-Service Analytics (1/10)

Organizations committed to improving data-driven decision-making processes are increasingly formulating an enterprise analytics strategy to guide the efforts in finding new patterns and relationships in data, understanding why certain results occurred, and forecasting future results. Self-service analytics has become the new norm due to the availability and simplicity of newer data visualization tools (like Tableau) and data preparation technologies (like Alteryx).

However, many organizations struggle to scale self-service analytics to the enterprise level, or even the business unit level, beyond the proof of concept. Then they blame the tools and start to try different tools or technologies. There is nothing wrong with trying something else; however, what many analytics practitioners do not realize is that technology alone was never enough to improve data-driven decision-making processes. Self-service tools alone do not resolve organizational challenges, data governance issues, and process inefficiencies. Organizations that are most successful with self-service analytics deployment tend to have a strong business and IT partnership around self-service; a strategy around data governance; and defined self-service processes and best practices. The business understands its current and future analytics needs, as well as the pain points around existing processes. And IT knows how to support an organization’s technology needs and plays a critical role in how data is made available to the enterprise. Formalizing this partnership between business and IT in the form of a Center of Excellence (COE) is one of the best ways to maximize the value of a self-service analytics investment.

What are the key questions that Center of Excellence will answer?

  1. Who is your governing body?
  2. How to draw a line between business and IT?
  3. What are the checks and balances for self-service releases?
  4. How to manage server performance?
  5. How to avoid multiple versions of KPIs?
  6. How to handle data security?
  7. How to provide trustworthy data & contents to end consumers?

The ultimate goal of the center of excellence is to have governed self-service in the enterprise. The governance can be classified into six areas with 30 processes in total:


Governing body

  • Governing structure
  • Multi tenant strategy
  • Roles & responsibilities
  • Direction alignment
  • Vendor management

Community

  • Intranet Space
  • Training strategy
  • Tableau User CoE meeting
  • Tableau licensing model
  • Support process

Publishing

  • Engagement process
  • Publishing permissions
  • Publishing process
  • Dashboard permission

Performance

  • Workbook management
  • Data extracts
  • Performance alerts
  • Server checkups for tuning & performance

Data Governance

  • Data protection
  • Data privacy
  • Data access consistency
  • Role level security
  • Data sources and structure

Content Certification

  • Content governance cycle
  • Report catalog
  • Report category
  • Data certification
  • Report certification

Please read my next blogs for each of those areas..

 

Advanced Deployment (10/10) : Desktop & Prep Deployment in enterprise

I want to close this advanced deployment series with a Desktop and Prep Builder enterprise deployment approach: how to let users get Desktop & Prep installed and activated automatically with a single package. My installer has the following features:

  • Install Desktop
  • Install Prep
  • Activate and register Desktop and Prep license
  • Get Desktop reporting setup
  • Customize the Desktop settings, for example Custom Discover Pane link, turn off Extension, add server URL
  • Hide license key

Why do we do this?

  1. Benefits for users:
    • Simple
    • Don’t have to enter email, address, etc
    • Don’t have to figure out which version to download or to use
    • Don’t need to enter license key
  2. Benefits for Tableau team:
    • License asset protection
    • Easy to track who installed
    • Control the Desktop settings

How to do it? Three things:


  1. Build installer package with both Desktop and Prep

I work in Mac env so the installer I built is for Mac only but the process works for Windows as well.

I use the free software Packages V1.2.8 (588). To make it easy for you, here is the actual working package used:

TableauV2020.1_share.pkgproj

  • Please download the above file
  • Remove the .txt from the filename (I had to add .txt since the website doesn’t allow me to upload a .pkgproj file)
  • Open it with Packages V1.2.8 or a newer version and you will see exactly how it works.
  • As you can see, the Payload has both Tableau Desktop.app and Tableau Prep.app, so the installer will install them one by one automatically. You download Tableau’s .dmg and install it once to get the .app.
  • If you build the package for V2020.1 and have problems notarizing your package, it is because Tableau’s original “/Applications/Tableau Prep Builder 2020.1.app/Contents/NOTICES.txt” has its own code signature that gets removed once you copy it. You have to find a way to keep the hidden attributes to make it work. After I gave this feedback to Tableau Dev, I heard that the issue would be fixed in Prep 2020.2, where NOTICES.txt is in the Resources folder.
  • Important notes about FlexNet
    • Make sure to install it in the correct path
    • Get the postinstall.sh from Tableau Desktop or Prep’s original package and add its contents to your package’s postinstall.sh
    • Make sure FlexNet’s postinstall.sh runs BEFORE the license activation command to avoid activation errors.
  • You should also have additional postinstall scripts, which are discussed in the next steps. You can also add a preinstall.sh if you choose.


2. Automate license activation and registration

The trickiest part is the license key distribution and user data collection process, which varies depending on your environment. Some tips:

  • If you have one enterprise master key for all of your Desktop/Prep licenses, make sure to grey out Help > Manage Product Keys so Desktop users will not see the key. This is actually done by the Tableau license team; they have an option to hide the key when the key is cut. Please work with your account manager for this.
  • The master key is kept on a hidden company website that the installer can reach but regular users can’t see.
  • The postinstall script downloads the key and then uses “Tableau Desktop.app/Contents/MacOS/Tableau -activate $license_key” to activate it.
  • You can still achieve automated activation even if you have individual keys kept somewhere for the installer to fetch.
  • 2020.1’s feature that lets a server login activate Desktop is another option to handle license activation.
  • Before you can use “Tableau Desktop.app/Contents/MacOS/Tableau -register”, you have to find a way to get the user’s data from the Mac or Windows PC (a sketch follows this list):
    • `defaults write $path Data.first_name $registration_info_given_name`;
    • `defaults write $path Data.last_name $registration_info_last_name`;
    • `defaults write $path_root Data.email $registration_info_email`;
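
To make the flow concrete, here is a minimal Python sketch of the activation and registration steps that a postinstall script could call (the same steps can live directly in a shell postinstall.sh). The key URL, the registration plist path and the registration values are placeholders for whatever your environment uses, not my actual setup.

import subprocess
import urllib.request

# Placeholders -- adjust to your environment
TABLEAU_BIN = "/Applications/Tableau Desktop.app/Contents/MacOS/Tableau"
KEY_URL = "https://internal.example.com/tableau/master_key.txt"  # hidden company page holding the master key
REG_PLIST = "/Users/Shared/Tableau/Registration"                 # the $path your defaults-write registration template uses

# 1) Fetch the hidden master key and activate silently
license_key = urllib.request.urlopen(KEY_URL).read().decode().strip()
subprocess.run([TABLEAU_BIN, "-activate", license_key], check=True)

# 2) Write the registration values (normally pulled from the logged-in user's account info)
registration = {
    "Data.first_name": "First",
    "Data.last_name": "Last",
    "Data.email": "user@example.com",
}
for key, value in registration.items():
    subprocess.run(["defaults", "write", REG_PLIST, key, value], check=True)

# 3) Register using the values written above
subprocess.run([TABLEAU_BIN, "-register"], check=True)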

3. Post Install configurations

This is where you can enforce Desktop configurations automatically:

  • Set the reporting server URL so all your Desktop usage data can be sent
  • Turn off Extensions if you want.
  • If you have your own Extension gallery and you want Desktop users to reach your own Extension gallery instead of Tableau’s gallery whenever they drag the Extension object onto a dashboard, you can add the following line to /etc/hosts:
    • “your_own_extension_gallery_ip_address\textensiongallery.tableau.com\n”;
  • I also rename Tableau Desktop xxxx.x.app to Tableau Desktop.app after installation so users do not have to think which app to use
  • I rename Tableau Prep Builder xxxx.x.app to Tableau PrepBuilder.app after installation as well
  • I also automatically launch a website for new Desktop users after installation.
  • Of course, you can change the Desktop Discover links to your own internal Tableau-related resources. I used to hack this with help from Tamas Foldi; finally Tableau released this feature in 2020.1.
  • I also include a few other plist updates here:
    • `defaults write $path AutoUpdate.Server Your_server_URL_without_https://`;
    • `defaults write $path AutoUpdate.AutoUpdateAllowed 1`;
    • `defaults write $path AutoUpdate.AutoUpdateAllowed False`;
    • `defaults write $path Settings.WorkgroupServer https://`; #this will modify default URL from http:// to https:// when Desktop is used the very first time#
    • `defaults write $path WorkgroupEffectiveServers https://`;
    • `defaults write $path Settings.WorkgroupServerKerberosCapable 0`;
    • `defaults write $path DiscoverPane.DiscoverPaneURL https://xxx/DesktopLink.html`;
    • `defaults write $path Settings.Extensions.DisableNetworkExtensions 1`;
    • `defaults write $path Telemetry.TelemetryEnabled 1`;
  • You can use your creativity and do a lot of things here, like sending them an email, etc

Re-Cap: Auto-installing Desktop & Prep is an awesome way to do enterprise deployment. It saves me a lot of time and users love it!


Advanced Deployment : Super Useful Sever Memory Management Features

This blog dives into a few powerful and super useful server memory management features. Most are undocumented, so please do check with your Tableau Technical Account Manager before you use them. In my opinion, they are must-have controls for large enterprise Tableau servers.

Problem Statement: If your Tableau server has memory governance or control issues, or most users sometimes see slow (spinning) renders due to memory pressure, read on…

TSM Commands:

  1. tsm configuration set -k native_api.memory_limit_enabled -v true -frc

What it does is enable the special memory limit feature. You need the --force-keys option (abbreviated as -frc here).

Now you can set specific memory limit for the Tableau server processes.

2. tsm configuration set -k native_api.memory_limit_per_process_gb -v 50 -frc

What it does is set each process’s memory limit to 50G. Let’s say you have 4 VizQL Server processes on one node; this command makes sure each of the 4 VizQL processes can’t use more than 50G of memory – a great safety net that an enterprise Tableau server must have.

What happens if memory goes over 50G?

  • When memory reaches 80% of the limit (50G x 80% = 40G), the Server Resource Manager will start a memory reclamation process.
  • When memory reaches 100% of the limit, the process will restart itself so it does not impact the whole server.


3. Hyper on  a Separate Server (HoSS) special setting needed

Starting with V2018.3, Tableau officially supports a new architecture configuration – a node that has nothing but File Store (Hyper) and Data Engine (Data Engine has to be on every node). The intent is to give as many server resources as possible to Hyper. You can have multiple HoSS nodes in one cluster.

How to set the memory limit of the Hyper nodes?

The above tsm configuration set -k native_api.memory_limit_per_process_gb -v 50 will also limit the ‘hyperd’ process (‘hyperd’ is the Hyper process name) memory to 50G, which is both good and bad. Unlike a VizQL node (where multiple VizQL processes can be configured on the same node), only ONE ‘hyperd’ process can be configured on one HoSS node. Can you have a higher memory limit for HoSS nodes while still keeping a lower limit for VizQL nodes? The answer is YES.

4. tsm configuration set -k hyper.srm_memory_limit_per_process_gb -v 180 -frc 

What it does is limit hyperd’s total memory for the node to 180G (this tsm key may not be available on V2018.x) while still keeping a max of 50G memory for each VizQL process. This setting is a life-saving feature for me!

The 180G is one example only; you may want to set it to 300G, 400G or more depending on your hardware and workload. My rule of thumb is to set it to about 70% of the hardware’s total memory. For example, if the server has 256G, set it to 180G.

How does HoSS memory work?

  • When memory reaches 80% of the limit (180G x 80% = 144G), the Server Resource Manager will start a memory reclamation process.
  • When memory reaches 100% of the limit, the process will restart itself – that is why you need 2 or more HoSS nodes. When one HoSS node’s hyperd restarts, the other HoSS node can still serve user queries.

Now you know how to control hyperd’s total memory. You should be asking how to control each query’s memory limit. Is there such a setting? The good news is yes, from V2019.3 (although I did not see this feature documented until the V2020.1 beta).

5. tsm configuration set -k hyper.session_memory_limit -v 5g -frc

What it does is limit each Hyper query’s memory to 5G – that makes sure one large query will not bring the whole hyperd process down (it can and does happen). This is a must-have feature for large Tableau deployments.

It controls the maximum memory consumption that an individual query can have. Specify the number of bytes. Append the letter ‘k’ to the value to indicate kilobytes, ‘m’ to indicate megabytes, ‘g’ to indicate gigabytes, or ‘t’ to indicate terabytes. For example, hyper.session_memory_limit='900m'. Alternatively, specify the session memory limit as a percentage of the overall available system memory.

This feature has been available since V2019.3 but Tableau did not document it. Likely Tableau’s official documentation will start to list this feature since V2020.1.

The following error message is what shows in the user’s browser when an individual query reaches the limit.


This feature also works when a Desktop user connects to a published data source. The following is the error message a Desktop user sees when connecting to a published data source on the server and the individual query limit is reached:

 

Conclusions: 

Enterprise server admins always need more control over server resources in order to scale Tableau. These commands are must-haves for large server deployments:

  • tsm configuration set -k native_api.memory_limit_enabled -v true -frc
  • tsm configuration set -k native_api.memory_limit_per_process_gb -v 50 -frc
  • tsm configuration set -k hyper.srm_memory_limit_per_process_gb -v 180 -frc
  • tsm configuration set -k hyper.session_memory_limit -v 5g -frc

Since these are not documented, please do talk with your Technical Account Manager and test them out before use. I have been using them.

Automation – Set Usage Based Extract Schedule

Are you facing a situation where your Tableau server backgrounder jobs have long delays?

You always have a limited number of backgrounders. How do you cut the average extract delay without adding extract backgrounders?

I got a lot of positive feedback about my blog SCALING TABLEAU (2/10) – SET EXTRACT PRIORITY BASED ON DURATION, which sets higher extract priority for smaller extracts. Grant Eaton recommended using 180 hrs (7.5 days) instead of the default 36 hrs to catch weekly jobs when turning on the ‘run faster extract refresh jobs first’ feature. The commands are as follows:

(2018.2 or newer)
tsm configuration set -k backgrounder.sort_jobs_by_run_time_history_observable_hours  -v 180

(10.0 – 2018.1)
tabadmin set backgrounder.sort_jobs_by_run_time_history_observable_hours 180

This blog talks about one more big technique to further improve extract efficiency :  Set Usage Based Extract Schedule

Challenge: There are many unnecessary extract refreshes because all schedules are available to every publisher, who has complete freedom to choose whatever schedule they want.

For example, a workbook is not used for one week but is still refreshed daily or hourly… Maybe usage was high initially but went down over time, yet the publisher never bothers to reduce the refresh frequency… They have no incentive to do so at all.

 Solution: Set Usage Based Extract Schedule – automatically reschedule the extract frequency based on usage (updated on 12/17/2019 with workbooks attached)

For example:

  • Hourly refresh changes to daily if the workbook is not used for 2 days
  • Daily changes to weekly if the workbook is not used for 2 weeks
  • Weekly changes to monthly if the workbook is not used for 2 months

A few implementation notes:

  1. Make sure to get agreement with business leaders before implementation
  2. Send automatic email to impacted workbook/data source owner when schedule changed
  3. How to identify unnecessary extracts? Feel free to download the attached workbooks at the end of this blog. Here is how it works:
    A. Find out the last_used
    select views_workbook_id, ((now())::date - max(last_view_time)::date) as last_used from _views_stats
    group by views_workbook_id
    B. Find out the refresh schedule by joining the tasks table with the schedules table
    C. Do the calculation and comparison.
  4. How to change the schedule frequency? There are APIs for Get Extract Refresh Tasks, Add Workbook to Schedule and Add Data Source to Schedule, but I have not seen an API to remove a workbook or data source from a schedule, and there is no API to change schedules. Similar to SET EXTRACT PRIORITY BASED ON DURATION, where I had to use a program to update tasks.priority directly (tasks is the table name and priority is the column name), this schedule change can be done by updating tasks.schedule_id (tasks is the table name and schedule_id is the column name); see the sketch after this list:
    UPDATE tasks
    SET schedule_id = xxx
    WHERE condition;
  5. How to figure out which schedule id to change to? Let’s say you have 10 daily schedules; when you change from hourly to daily, the best way is to randomly choose one of the 10 daily schedules to avoid the situation where, over time, too many jobs end up on one specific schedule.
  6. What if the publisher changes back from daily to hourly? They do have the freedom to change their extract schedules at any time. However, they will not beat your automatic scripts over time.
  7. How much improvement can you expect from this automation? It depends on your situation. I have seen 50%+ delay reductions.
  8. Is this supported by Tableau? No. You do this at your own risk, but for me the risk is low and the return is high.
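
For illustration, here is a minimal Python sketch of steps A–C plus the direct tasks update from the list above. It assumes direct read/write access to the workgroup repository database via psycopg2; the schedule ids, the 2-day threshold and the tasks.obj_id/obj_type column names are assumptions to verify against your own server before using anything like this.

import random
import psycopg2  # assumption: you have credentials for the workgroup repository database

HOURLY_TO_DAILY_DAYS = 2          # not used for 2 days -> drop from hourly to daily
DAILY_SCHEDULE_IDS = [101, 102]   # placeholder ids of your existing daily schedules

conn = psycopg2.connect(host="your-tableau-host", port=8060,
                        dbname="workgroup", user="admin_user", password="***")
with conn, conn.cursor() as cur:
    # A. Workbooks whose last view is older than the threshold
    cur.execute("""
        SELECT views_workbook_id,
               ((now())::date - max(last_view_time)::date) AS last_used
        FROM _views_stats
        GROUP BY views_workbook_id
        HAVING ((now())::date - max(last_view_time)::date) > %s
    """, (HOURLY_TO_DAILY_DAYS,))
    stale_workbook_ids = [row[0] for row in cur.fetchall()]

    # B/C. Move each stale workbook's extract task to a randomly chosen daily schedule.
    # A real script would first join tasks with schedules to confirm the task is currently
    # hourly; obj_id/obj_type are assumed column names -- verify against your schema.
    for wb_id in stale_workbook_ids:
        cur.execute("""
            UPDATE tasks
            SET schedule_id = %s
            WHERE obj_id = %s AND obj_type = 'Workbook'
        """, (random.choice(DAILY_SCHEDULE_IDS), wb_id))
conn.close()

Remember to also send the automatic notification email to each impacted owner, as noted in point 2 above.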

Here is the workbook that shows which workbook or data source refreshes should be re-scheduled:

Advanced Deployment : Make VIP extract priority stick

I already have a very popular blog about extract priority: SCALING TABLEAU (2/10) – SET EXTRACT PRIORITY BASED ON DURATION. It increased efficiency 70-80% on my server. Today’s blog is for a different use case – VIP extracts.

Do you get VIP extract priority requests? How do you handle a business requirement to keep a few mission-critical extracts at higher priority?

There are at least two options to make it happen:

  • Option 1: Create a high priority schedule – the problem with this approach is that you can’t hide this schedule, so other publishers can still choose it for their regular extracts – not easy for an admin to control. Some people name the schedule ‘ask admin before use’…
  • Option 2: As an admin, you can change the identified extract’s priority (whether it is embedded in a workbook or a published data source) to a small number (like 5) as VIP high priority – the problem with this approach is that every time the workbook or published data source has a new revision, the priority changes back to 50 (the default) automatically.

We used Option 2 for a long time, but requesters complained from time to time since it is a lot of work on both sides to communicate each new revision and manually change the extract priority.

Recently one idea came to life and I am extremely happy about it. Here is what we are able to achieve now: making the identified VIP extract priority stick – meaning that even if the owner changes the workbook or data source, it still keeps the VIP high priority, so the admin does not have to change it manually every time. Wow!

How to do it? The answer is again a Python program:

  • Create a simple CSV file with site_name, extract_type (value is ‘workbook’ or ‘data source’ – this will make the next step much easier), extract_name, VIP_priority (for example, 5, or 10, etc)
  • Create a simple Python script to read the CSV line by line and match each record against the Postgres database to find the exact extract task.
  • Change the extract priority to the priority in the CSV using the following command (a fuller sketch follows this list):

UPDATE tasks
SET priority = xxx
WHERE condition;

  • Schedule the Python script (maybe a few times a day)
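
Here is a minimal sketch of that Python program, under the assumption that you connect to the workgroup repository with psycopg2; the join columns (obj_id, site_id) and the CSV path are assumptions to check against your own repository schema.

import csv
import psycopg2  # assumption: direct write access to the workgroup repository database

conn = psycopg2.connect(host="your-tableau-host", port=8060,
                        dbname="workgroup", user="admin_user", password="***")
with open("vip_extracts.csv", newline="") as f, conn, conn.cursor() as cur:
    # CSV columns: site_name, extract_type ('workbook' or 'data source'), extract_name, VIP_priority
    for row in csv.DictReader(f):
        content_table = "workbooks" if row["extract_type"] == "workbook" else "datasources"
        # Find the content id on the named site, then reset its extract task priority.
        cur.execute(
            f"""UPDATE tasks
                SET priority = %s
                WHERE obj_id IN (SELECT c.id
                                 FROM {content_table} c
                                 JOIN sites s ON c.site_id = s.id
                                 WHERE c.name = %s AND s.name = %s)""",
            (int(row["VIP_priority"]), row["extract_name"], row["site_name"]))
conn.close()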

What happens is that when a new revision is created by the owner, the VIP extract priority is indeed changed back to 50 – this is something Tableau controls automatically (we do not want to customize Tableau features at all, so upgrades stay simple). However, a few hours later the priority is reset back to what it is supposed to be, based on the CSV file, which is the master control file.

I agree that it is not a perfect solution, but I am pretty happy with it – it is a lot better than manually updating each time publishers create a new revision of their VIP extracts, and it saves tons of communication.

Love to hear your alternative approach.


Automation – Increase Extract Efficiency

Recently I was talking with a Tableau Center of Excellence leader at a well-known hi-tech company. I was told that the reason his company could not go bigger with Tableau is the extract challenge – too many extract delays that they do not know how to handle…

I shared with him how I resolved it, and I’d love to share it with everyone. Here are a few things we did to solve the famous extract challenge:

  1. Reduce the subscription timeout from the default 30 mins to 5 mins (or 3 mins depending on your situation). Please read Automation – Timeout long subscriptions and auto send email to workbook owner.
  2. Turn on ‘sorting extract jobs based on last run time’ (updated on Oct 18, 2019: thank you to Yuri Fal, who pointed out that this setting is automatically enabled since V2019.3, so there is no need to set this config anymore per the KB).
    • tsm configuration set -k backgrounder.sort_jobs_by_run_time_history_observable_hours -v 168
    • What it does? Tableau Server can sort full extract refresh jobs so they are executed based on the duration of their “last run,” executing the fastest full extract refresh jobs first.
    • The “last run” duration of a particular job is determined from a random sample of a single instance of the full extract refresh job in last <n> hours. Full extract jobs are then prioritized to run in order from shortest to longest based on their “last” run duration.
    • The default is off. Tableau documentation recommends 36 hrs, but 168 hrs, which also covers weekly jobs, is a good approach.
  3. Automatically re-schedule extract jobs based on usage
    • Tableau gives complete flexibility to publishers, who can choose any schedule. It ends up with a lot of unnecessary extract jobs that the admin doesn’t know how to handle.
    • Tableau does not give the server admin much control over this at all. My approach is to create an add-on feature by writing scripts to re-schedule jobs automatically based on usage; please read my blog Automation – Set Usage Based Extract Schedule
    • This is a game changer for scaling extracts on Tableau server. An hourly schedule can become daily if the workbook is not used for 2 days; daily becomes weekly, weekly becomes monthly.
    • It is not an officially supported approach – you do this at your own risk.
  4. Change extract job’s priority based on avg execution duration
    • This is an add-on feature that you will have to build. Try this only if the above step 1-3 does not give you the extract waiting time you are looking for.
    • The intent is to change the extract priority to a higher priority (like 30) for any extracts with duration below the median. You can start by changing the extract priorities manually to see how it goes. Just be aware that any re-publishing of extracts will change the priority back to the default 50.
    • Unfortunately I have not seen an API for it. You will need to update the Postgres table directly, so it is NOT an officially supported approach.
    • All you have to do is create a program to update tasks.priority directly (where tasks is the table name and priority is the column name)
    • Please read Scaling Tableau (2/10) – Set Extract Priority Based on Duration

Conclusion: Extract delay is a common challenge for large Tableau deployments. Part of the reason is that publishers can choose any schedule without restrictions – a lot of publishers choose hourly schedules when daily is good enough, or daily schedules when weekly is good enough. They also schedule it and forget it – when the workbook is not used anymore, the extract refresh for the workbook keeps running… This blog summarizes two lesser-known but out-of-the-box Tableau server configurations that can be used, plus two additional pieces of script work that can get things done better with some effort.

Advanced Deployment – Turn off Ask Data for large data source

Ask Data’s entry point is published data sources. By default, Ask Data is ON, and data source analysis (Ask Data indexing) is triggered by user request for ALL published data sources.

What it means is that even if the user did not intend to use Ask Data when coming to the published data source (for example, to see refresh schedules or check when it was last refreshed), it will still trigger Ask Data indexing right away, which is an unnecessary server action.

The above behavior may not be a big deal for small published data sources, but it can make a difference for large published data sources, especially if your server has hundreds or thousands of them.

How to avoid unnecessary Ask Data indexing for large published data sources?

The best option is to turn Ask Data off by default for those very large published data sources.  How?

UPDATE datasources
SET nlp_setting = 'disabled'
WHERE size > 400000

What it does is turn off Ask Data for published data sources larger than about 400 MB (or whatever size you decide). A scripted version of this update appears after the list below.

Why this approach?

  1. The data source owner can always turn Ask Data on if they do want to use it. This is self-service.
  2. You are not turning off Ask Data for the whole server, which is doable but needs a tsm restart to change back.
  3. For those who may not know, the following commands will turn the whole server’s Ask Data off:
    tsm configuration set -k features.NLBox -v false
    tsm pending-changes apply
  4. The following commands will turn the whole server’s Ask Data on:
    tsm configuration set -k features.NLBox -v true
    tsm pending-changes apply
  5. I am not aware of a command to turn it on/off for a single site
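
For illustration, a small Python sketch of the bulk update above, with a preview step so you can see which data sources will be affected before changing anything. It assumes direct write access to the workgroup repository via psycopg2; the connection details and threshold are placeholders.

import psycopg2  # assumption: direct write access to the workgroup repository database

SIZE_THRESHOLD = 400000  # same unit as datasources.size used in the SQL above (~400 MB)

conn = psycopg2.connect(host="your-tableau-host", port=8060,
                        dbname="workgroup", user="admin_user", password="***")
with conn, conn.cursor() as cur:
    # Preview which published data sources are over the threshold
    cur.execute("SELECT name, size FROM datasources WHERE size > %s", (SIZE_THRESHOLD,))
    for name, size in cur.fetchall():
        print(f"Disabling Ask Data for: {name} (size={size})")
    # Same update as above -- owners can still turn Ask Data back on themselves
    cur.execute("UPDATE datasources SET nlp_setting = 'disabled' WHERE size > %s",
                (SIZE_THRESHOLD,))
conn.close()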

Re-Cap: If you are concerned that Ask Data consumes server resources whenever a user mouses over any published data source, even when the user doesn’t intend to use Ask Data, the server admin can turn Ask Data off for large published data sources by default with one simple SQL update. Please note that it is not supported by Tableau.

 

Advanced Deployment – Notification to all server users

As a server admin, do you have to broadcast notifications about server upgrade and other unexpected server events that would impact every server user?

It may not be a big deal when you have only hundreds of server users. However, it can become a big burden when your server has 20K, 100K or more users. Even if you have a good distribution list, over time you create a negative impression with your user community, since you normally only send them bad news.

How do you avoid broadcasting emails to large numbers of users while still communicating the message?

The answer is Tableau server portal customization with a banner like this one below:


Warning: It is not a supported solution and you proceed at your own risk.

  • You can schedule the banner’s display window.
  • You can also control the type of banner (like Informational or Alert, etc).
  • Users can click the DISMISS button if they do not want to see the banner again (after clearing the browser cache, the banner will show up again during the window even if DISMISS was clicked before).
  • Users can {Click Here} to see more details, since you do not want the banner to be more than one line
  • You can design different colored banners for different types of notification as well

How to use banner?

  • We turn the banner on a few days before a server upgrade to let users know that they will be accessing the Disaster Recovery instance
  • The banner is also on during the server upgrade to warn users that they are accessing the Disaster Recovery instance, which likely has different data than Production.

Re-cap: A Tableau portal banner is a great way to communicate planned and unplanned server event notifications to users. Please vote for https://community.tableau.com/ideas/3738

Advanced Deployment – Reduced upgrade time from 50 to 5 hrs

Tableau server backup can take a very long time. It takes 20+ hrs for my server even after implementing Filestore on the initial node and very aggressive content archiving. Of course, part of the reason is that Tableau does not have incremental backup.

There are two problems with a long backup: one is that your Disaster Recovery (DR) data is far behind Prod data; two is that the upgrade process takes a long time, likely the whole weekend.

This blog talks about how to reduce upgrade time from about 50 hrs to 5 hrs. 

  • A large Tableau server upgrade used to take 50 hrs pre-TSM
  • The same Tableau server upgrade takes about 30 hrs with TSM (V2018.2 or newer)
  • The same Tableau server upgrade can be done within 5 hrs with TSM
  1. It used to take 50 hrs for the upgrade


2. Thanks to TSM (v2018.2), which allows the new server version to be installed while the current version is still running, the post-TSM upgrade time is cut almost in half (meaning upgrades from v2018.2 or above to a newer version). Here is how it looks:


3. When I looked at the above timeline, I asked myself: is it possible to skip the cold backup process so the upgrade can be done within 5 hrs? It is possible, with two options:

  • Option 1: Assume you have a hot backup done. If the cold backup is skipped and the upgrade goes south, you have to restore from that backup. What it means is that you will miss about 20 hrs of server changes – anything after the hot backup started is gone and you have no way to know what those changes were. Are IT and the business willing to take this risk? If yes, you are lucky and you can just skip the cold backup.
  • Option 2: Most likely many others can’t take that risk. For IT, an upgrade can fail, but IT has to be able to get the data/workbooks/permissions back for the business if the upgrade fails. At minimum, IT has to know what the changes are. Here is what we did – I call it the BIG idea – track all the changes for a period of about 24 hrs:

4. How to skip cold backup but track changes?

  • How it works is that you take two hot backups. Hot backup 1 is restored to DR, while hot backup 2 is saved but not restored (there is no time to restore it before the upgrade).
  • Skip the cold backup and complete the upgrade within 5 hrs.
  • If the upgrade fails in such a way that a restore has to be done to get the server back, you can restore from hot backup 2, which is missing about 20 hrs of data (from point 1 to point 2). Then you will need to let impacted publishers know about those changes, so they can manually re-do the missing items provided by the server team:
        • Workbooks
        • Data Sources
        • Projects
        • Tasks
        • Subscriptions
        • Data-driven alerts
        • Permissions

5. The net result is that the server upgrade is done within 5 hrs. Wow! That is huge! If things go south, the IT server team has all the changes tracked – that is more like an incremental backup. The difference is that most likely business publishers have to re-do those changes.


6. How to track the changes for the following objects?

  • Workbooks
  • Data Sources
  • Projects
  • Tasks
  • Subscriptions
  • Data-driven alerts
  • Permissions

Just query your Postgres database directly; you can easily get all of them from one point in time to another since all those objects have timestamps – except Permissions, which is very tricky.
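
As an illustration for one object type, here is a small Python sketch that lists workbooks changed between the start of hot backup 1 and the start of the upgrade window; the other objects follow the same pattern on their own tables. It assumes read access to the workgroup repository and the standard created_at/updated_at timestamp columns; the timestamps and credentials are placeholders.

import psycopg2  # assumption: read access to the workgroup repository database

POINT_1 = "2019-11-02 08:00:00"  # when hot backup 1 started (example)
POINT_2 = "2019-11-03 06:00:00"  # when the upgrade window started (example)

conn = psycopg2.connect(host="your-tableau-host", port=8060,
                        dbname="workgroup", user="readonly", password="***")
with conn, conn.cursor() as cur:
    # Workbooks created or updated between the two points; repeat for datasources,
    # projects, tasks, subscriptions and data alerts using their own timestamp columns.
    cur.execute("""
        SELECT id, name, owner_id, updated_at
        FROM workbooks
        WHERE updated_at BETWEEN %s AND %s
        ORDER BY updated_at
    """, (POINT_1, POINT_2))
    for wb_id, name, owner_id, updated_at in cur.fetchall():
        print(f"{updated_at}  workbook {wb_id}  '{name}'  owner_id={owner_id}")
conn.close()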

Like it or not, Tableau’s permission tables do not have timestamps! I personally gave this feedback to the Tableau Dev team already, but it is what it is.

You can find the Tableau permission workbook at https://community.tableau.com/message/940284; one option is to run it twice and diff the results.

Re-cap: Both business and IT are extremely happy with the 5 hr upgrade process, when it used to be 50 hrs, or at least 30 hrs.


Advanced Deployment – Run FileStore on Network Storage

What? You can run FileStore on network storage? Yes, it is doable with fairly good performance. And the benefit is having DR data only 2 hrs behind prod, while it used to be 50 hrs behind for a large, extract-heavy deployment.

Before you continue: the intent of this blog is NOT to teach you how to configure your Tableau server to run on network storage, because it is not supported by Tableau yet. Instead, the intent is to share the possibility of an awesome new capability coming in the future…

The Problem Statement: When your server FileStore gets close to 1TB (it happens on large enterprise servers with extract-heavy deployments even if you are doing aggressive archiving), the backup and restore can take 20 hrs each. It means that DR data is at least 50 hrs behind, considering file transfer time.

  • The server upgrade can take the whole weekend
  • The server users will see 2 day old data whenever user traffic is routed to DR (like weekly maintenance)

The Solution: Configure FileStore on network storage so that all extract files can be snapshotted to DR much faster, leveraging the network storage’s built-in snapshot technology.

Impact: The DR data can be about 2 hrs behind prod vs 50 hrs.


How does it work?

  • After it is configured, the server works the same way as with FileStore on local disk. No user should notice any difference as long as you use Tier 0 (most expensive) SSD network storage (NetApp or EMC, for example)
  • The server admin should see no difference either when using TSM or the Tableau server admin views
  • Does it work for Windows or Linux? I am running it on Linux after working with Tableau Dev for months. Tableau Dev may have an alpha config for Windows, but I don’t know
  • Can we run the repository on network storage as well? That was what we had initially, but it also means a single repository for the whole cluster, which poses additional risk. I am running the repository on local disk and have two repositories.
  • Does it mean that you can’t have a 2nd Filestore in the cluster? You are right – a single Filestore only, on network storage. Is it risky? It has some risk, but it is common enterprise practice for many other large apps.

New process to backup and restore:


  • A regular tsm maintenance backup handles both the repository and the filestore extracts nicely together. Now that we do not want to back up the filestore anymore, use tsm maintenance backup --file <backup_file> --pg-only
  • Unfortunately, when you use a pg-only backup, the restore will fail if the repository and Filestore are not in a ‘stable’ status.
  • What happens is that the repository and filestore constantly sync internally within Tableau server. For example, when a new extract is added to the filestore, the extract’s handle has to be added to the repository. When an extract is deleted from the filestore (an old extract, a user workbook deletion, etc.), the handle has to be deleted from the repository, otherwise Postgres will fail during the integrity checks after a restore.
  • One critical step is to stop the sync job between the repository and filestore before the backup happens, to ensure both are in a ‘stable’ status and can be separately sent to DR
  • Of course, after the backup is done, restart the repository and filestore sync jobs to catch up.

What does this mean with Tableau’s new 2019.3 Amazon RDS External Repository?

When Tableau’s repository can run on an external Amazon RDS database, it potentially means the repository’s DR lag could be reduced further in the future. Hopefully the 2 hrs of repository backup/send/restore can be reduced to minutes. I have not tried this config yet.

Re-cap: You can run FileStore on network storage for much better DR – potentially from 2 days behind to 2 hrs behind.

 

Advanced Deployment – Content Migration

One big Tableau success factor is its self-service – business dashboard creators can publish their workbooks in a self-service manner without IT’s involvement. Although IT may be involved for some data readiness, ETL and data preparation, by and large most Tableau implementations empower business users to create and release their dashboards directly.

As you drive more and more Tableau adoption, you will soon realize that you also need good governance to ensure a single source of truth, data access policy consistency (among multiple self-service publishers) and workbook style consistency, and to avoid duplication of content, etc.

How to control or govern publishing? There are multiple ways to go, depending on the nature of the dashboards (data sensitivity, audience, purpose) and how much control you want to have:

  1. Stage vs Official project: Only approved publishers can publish to the Team Official project, while a lot more people can publish to the Team Stage project.
  2. Landing page area within Tableau: The landing page is actually another workbook that your end users go to. The workbook is more like ‘the table of contents’; it uses URL actions to go to each separate workbook, and only dashboards listed in this landing page workbook are official ones.
  3. Portal embedding Tableau views outside Tableau: Most audiences do not go to Tableau server directly for dashboards; they go to a portal that has all the ‘approved’ dashboards embedded. The governance/control process happens at the portal, since only approved content is available to end users via embedded portal access.
  4. Test vs Prod server: You don’t allow publishers to publish to the Prod server directly. Instead you let them publish to a Test server; then, with proper approvals, the dashboards are ‘pushed’ to the Prod server.

The control level and difficulty level are as follows: Test vs Prod server > Portal embedding > Landing page > Stage

There are many blogs about staging, landing pages and portal embedding; this blog focuses on Test vs Prod.

How to automate Test vs Prod server migration? It is common to have a test environment, but it is also common for a publisher to have publish access directly to both Test and Prod so self-service publishing can be done. However, for some special use cases (like external dashboards) where you absolutely do not want anyone to publish directly to the Prod server without approval at the workbook level, and you want to automate the process, here is how:

  1. The best way is to use Tableau’s new Deployment Tool, which is part of the 2019.3 beta and currently available for Windows. The Deployment Tool enables the governance of Tableau server workbook and data source migrations. Admins and technology groups can finally automate and scale the distribution of workbooks/data sources between development, staging and production environments. It needs an additional Server Management license, which is a new add-on.
  2. Custom scripts using the API for workbook and data source migrations. The high-level approach is to download the workbooks and data sources from the source server (using the API) and then publish them to the target server (using the API). Sounds easy, but the most difficult part is handling embedded passwords. There are a few scenarios: embedded passwords for live connections, for embedded data sources and for separately published data sources. The good news is that it is all doable. The bad news is that it is very, very difficult and it is more like a ‘password hack’. I do not recommend this approach unless you work with Tableau’s Professional Services team. My organization had this done by working with Tableau’s Professional Services team and it works great for us.

I am still testing Tableau’s new Deployment Tool to understand what it offers. My quick sense is that the Deployment Tool should work for most organizations for content migration purposes. However, I am not sure about its scalability – for very large enterprise customers that have a lot of workbooks/data sources to migrate from one server to another continuously, custom scripts with multi-threading will give you better scalability.

Advanced Deployment – Create Internal Tableau Extension Gallery

This blog series covers advanced deployment techniques that will give you some ideas of what can actually be done with Tableau server and Desktop. I’d like to share how we did the following in our Tableau implementation:

  • Internal Tableau Extension Gallery
  • Controlled publishing
  • Run FileStore on Network Storage
  • ‘Incremental’ backup
  • Customize Tableau Portal with alert feature
  • Custom Tableau server banner color
  • Customize Desktop Discover area links
  • Desktop single master key and key protection
  • Auto deploy Desktop and Prep Builder with license  activation
  • Ensure executive subscription always get latest data
  • VizQL server timeout and subscription timeout
  • Customize cache config
  • Measure server KPI

Internal Tableau Extension Gallery

Tableau Extensions, part of 2018.2, are a great innovation. They give you the ability to interact with data from third-party applications directly in Tableau. Capabilities like write-back to a database, custom actions, and deep integration with other apps are all at your fingertips.

Problem Statement: Dashboard extensions create a data vulnerability when any extension from https://extensiongallery.tableau.com/ is used, no matter whether from Desktop or Server, since it has to send your data (summary or detail) outside your firewall.


  • Extension can access workbook’s summary data by default and full data with additional confirmations.
  • Plus, they access the user’s IP address, Tableau Desktop or Safari versions, screen resolution, and device type.

While Tableau is still working on the ‘safe’ extension design, what can you do if you do not want to wait?

Solution: Build your own Tableau extension gallery and customize Desktop’s Extension Gallery link to automatically point to your own extension gallery.


Why do you have to customize the Desktop Extension link? If you don’t, most Desktop users will still go to https://extensiongallery.tableau.com/ by default. Once we figured out the auto-redirect, the communication and training work became much easier.

  1. How to build your internal Tableau Extension Gallery: an extension gallery is pretty much a web server. In addition to the regular web server components, you need to figure out the user and object relationships to make sure John can only update his own extensions and related libraries. You may want to design additional features like private extensions (only available to yourself) vs public extensions that can be shared with anyone in your organization.


2. How to auto-redirect the Tableau Extension Gallery to your internal Extension Gallery: unfortunately it is not easy to hack the URL in the Desktop package. I found that the easiest way is a DNS redirect.

  • Work with your company’s network team to see if they are willing to re-direct all traffic from https://extensiongallery.tableau.com/ to your own internal extension gallery.
  • If the above option is not doable, you will have to set each Tableau Desktop user’s PC or Mac hosts file to auto-redirect the traffic. This approach works if you control the Desktop deployment. For Mac, add one line “IP_Address extensiongallery.tableau.com” to the /etc/hosts file; Windows is a similar hosts file change. The IP address is your internal extension gallery’s IP address.

3. How to set up Tableau server for Extensions: the following are needed to keep data secure while Extensions are enabled:

  • Check ‘Enable Users to run extension on the site’ for each site
  • Uncheck ‘Enable unknown extension to run….’ for each site
  • Only add internal gallery extensions to the server safe list – this is easy to control since only the server admin can change the list and site admins can’t change it.


Re-cap: Tableau Extensions are a great innovation. However, if your org is concerned about data being sent outside your firewall while extensions are used, you can build your own internal extension gallery and redirect DNS from https://extensiongallery.tableau.com/ to your gallery for all Desktop users. Then, of course, you need to make sure the Extension config on each site is set correctly so no external extensions are used.

 

Automation – How to make some workbook cache shorter than others

You wish that live connection workbooks’ cache could be much shorter than that of workbooks with extracts. You also wish that the cache level could be controlled per data source or workbook. Unfortunately, Tableau server does not have those features yet. One Tableau server has only one cache policy – all workbooks have the same cache length…

This blog shows you that you can have the following server add-on features:

  1. Workbooks with ‘no-cache’ tags are not cached (technically a few minute cache only)
  2. Workbooks with live connection are cached very short, like hourly
  3. Workbooks with extracts have much longer caches, like 20 hrs

How to achieve different cache settings for specific workbooks, live connections and extracts?

The answer is to find a way to dump live connection workbooks’ cache much more often than the overall server cache policy allows.

How? You may have noticed that whenever you publish a newer version of a workbook, Tableau server no longer holds the old version’s cache, which makes perfect sense!

What it also means is that Tableau server cache is tied to the workbook’s timestamp! This is a very useful insight that was confirmed with the Tableau Professional Services team. When a workbook’s timestamp changes, its cache is automatically gone on Tableau server. The idea here is simply to update the workbook’s timestamp using additional scripts outside Tableau server, so the cache is forced out automatically under specific conditions that we define!

There is no API to update the workbook timestamp, but you can update the Postgres DB directly. It is only the ‘last published at’ column of the workbooks table. The pseudo code is as follows:

UPDATE workbooks
SET last_published_at = last_published_at + interval '1 millisecond'
WHERE condition;

  • What it does is add one millisecond to the workbook’s timestamp
  • The condition is what you define. For example, workbooks that have specific tags, or workbooks with true live connections to data sources (note: it is not easy to identify true live connection workbooks; I will talk about this in a separate blog). A fuller sketch follows this list.
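
Here is a minimal Python sketch of that scheduled script for the ‘no-cache’ tag case. It assumes direct write access to the workgroup repository via psycopg2; the last_published_at column name matches the ‘last published at’ column described above, and the tags/taggings join is an assumption to verify against your repository schema before using anything like this.

import psycopg2  # assumption: direct write access to the workgroup repository database

conn = psycopg2.connect(host="your-tableau-host", port=8060,
                        dbname="workgroup", user="admin_user", password="***")
with conn, conn.cursor() as cur:
    # Bump the publish timestamp by one millisecond for workbooks tagged 'no-cache',
    # which forces Tableau server to drop their cache, exactly as described above.
    cur.execute("""
        UPDATE workbooks w
        SET last_published_at = w.last_published_at + interval '1 millisecond'
        WHERE w.id IN (
            SELECT tg.taggable_id
            FROM taggings tg
            JOIN tags t ON tg.tag_id = t.id
            WHERE t.name = 'no-cache' AND tg.taggable_type = 'Workbook')
    """)
conn.close()

Scheduled every few minutes for ‘no-cache’ workbooks and hourly for live connection workbooks, this lets the rest of the server keep a long cache policy.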

Before I implemented the above cache mechanism, I struggled to set a cache policy that met the needs of both the live connection and extract use cases. I had to set the cache at about 1 hr for the whole server, which is OK for live connections but not effective enough for workbooks with extracts (slow performance). After we figured out the new cache approach, the server now has a 24 hour cache policy that improved view render time by 30%:

  1. Workbooks with ‘no-cache’ tags are cached only 5 minutes
  2. Workbooks with live connection are cached 1 hour
  3. Workbooks with extracts have a 24 hr cache. Of course, we turned on the ‘Pre-compute workbooks viewed recently’ flag for each site to leverage the cache warm-up feature after extract refreshes.

Read additional notes @ 

Automation – Data Source Archiving

If you follow my previous blog Automation – Advanced Archiving to archive workbooks, over time you may also need to archive data sources.

Why delete data sources?

If a workbook has embedded data sources, the embedded data is deleted when the workbook is deleted. However, if the workbook uses separately published data sources, those published data sources are not deleted when the workbook is deleted.

First of all, when the workbook is deleted, you do not want to delete the published data source right away. Why?

  • The published data source could be used by other workbooks
  • The published data source can still be used for new workbook later on

On the other hand, your server may have a lot of orphan published data sources not connected to any workbooks – those are the candidates for additional deletion, which is what this blog is about.

How to delete data sources?

The good news is that there is a Delete Data Source API: DELETE /api/api-version/sites/site-id/datasources/datasource-id

  • api-version: The version of the API to use, such as 3.4. For more information, see REST API Versions.
  • site-id: The ID of the site that contains the data source.
  • datasource-id: The ID of the data source to delete.

How to decide what data sources to delete?

That is the hard part. The high-level selection criteria should be as follows:

  1. Not connected to any workbooks: see https://community.tableau.com/thread/230904
  2. Created a few weeks ago: do not delete newly published data sources
  3. No usage for a period of time (like 3 months): it is possible the data source is used for Ask Data only, or accessed by others via Desktop. Join historical_events and historical_event_types and look for Access Type = Access Data Source with the specific hist data source id.


Another way to identify data sources not used for a long time is to use the following criteria:

select datasource_id, ((now())::date - max(last_view_time)::date) as last_used
from _datasources_stats
group by datasource_id
having ((now())::date - max(last_view_time)::date) > 90

Download Tableau Data Source Archiving Recommendation.twb
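
Putting the selection criteria and the Delete Data Source call together, here is a minimal Python sketch. It assumes read access to the workgroup repository via psycopg2 and uses the tableauserverclient library for the REST call (the post itself only references the raw REST endpoint); the datasources.luid column and the connection details are assumptions to verify in your environment, and a real script should also apply criteria 1 and 2 above before deleting anything.

import psycopg2                    # assumption: read access to the workgroup repository database
import tableauserverclient as TSC  # assumption: using tableauserverclient for the REST call

# 1) Candidate data sources with no views for 90+ days (criterion 3 above)
conn = psycopg2.connect(host="your-tableau-host", port=8060,
                        dbname="workgroup", user="readonly", password="***")
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT d.luid
        FROM datasources d
        WHERE d.id IN (
            SELECT datasource_id
            FROM _datasources_stats
            GROUP BY datasource_id
            HAVING ((now())::date - max(last_view_time)::date) > 90)
    """)
    candidate_luids = [row[0] for row in cur.fetchall()]
conn.close()

# 2) Delete them via the Delete Data Source REST API
auth = TSC.TableauAuth("admin_user", "***", site_id="your_site")
server = TSC.Server("https://your-tableau-host", use_server_version=True)
with server.auth.sign_in(auth):
    for luid in candidate_luids:
        server.datasources.delete(luid)   # DELETE /api/<version>/sites/<site-id>/datasources/<datasource-id>
        print(f"Deleted data source {luid}")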

Conclusions: It is a good idea to delete not only old workbooks but also old data sources. This is especially important if the workbook is deleted but the published data sources still have scheduled refreshes.

The idea is to delete orphan data sources that were published a while ago but no longer have any usage at all.