Category Archives: Tableau

TABLEAU SERVER AND CLOUD SECURITY (8/10): Large GROUP

My previous post, TABLEAU SERVER AND CLOUD SECURITY (7/10): ALL USERS GROUP, describes watchdog scripts that detect and delete any permissions granted to the All Users group. The scripts run hourly, remove the permissions, and then send an email alert to the content owners. This is an enforced Tableau server policy, although it has a few exceptions (for example, some dashboards owned by server admins are exempt).

The scripts helped a lot with permission mistakes and cleanup. However, from time to time the server admin team still got emails from the user community:

User feedback: Why do I see this dashboard? I have nothing to do with it….

This time the cause is not the All Users group but workbook permissions granted to a very large group….

This is now more about user education and departmental access management policy. However, as server admin I wanted to do something about it. What we came up with is a Large Group Permission Management alert: another watchdog program that automatically emails content owners if they grant permissions to a group with more than 1,000 members (or whatever threshold makes sense for your org).

The logic is very similar to the All Users group detection. The key difference is that an All Users group permission is removed once detected, while the large group alert is a reminder only and does not remove the large group from the permissions.
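
A minimal sketch of the detection idea in Python, not the actual watchdog script. It assumes the permission grants and group sizes have already been pulled from the server into plain Python structures; the `Grant` class, the field names and the sample data are all hypothetical.

```python
from dataclasses import dataclass

LARGE_GROUP_THRESHOLD = 1000  # assumption: tune this for your org


@dataclass(frozen=True)
class Grant:
    """One workbook permission granted to a group (hypothetical shape)."""
    workbook: str
    owner_email: str
    group: str


def find_large_group_grants(grants, group_sizes, threshold=LARGE_GROUP_THRESHOLD):
    """Return {owner_email: [(workbook, group, size), ...]} for oversized groups."""
    alerts = {}
    for grant in grants:
        size = group_sizes.get(grant.group, 0)
        if size > threshold:
            alerts.setdefault(grant.owner_email, []).append(
                (grant.workbook, grant.group, size)
            )
    return alerts


grants = [
    Grant("Sales Pipeline", "ann@example.com", "All Sales"),
    Grant("Team Scorecard", "bob@example.com", "Team Blue"),
]
group_sizes = {"All Sales": 4200, "Team Blue": 12}

alerts = find_large_group_grants(grants, group_sizes)
for owner, hits in alerts.items():
    for workbook, group, size in hits:
        # Reminder only: nothing is removed from the permissions.
        print(f"Reminder to {owner}: '{workbook}' is shared with '{group}' ({size} users)")
```

Unlike the All Users script, nothing here modifies permissions. The output is only the list of owners to remind.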

Re-cap:

Tableau Server and Cloud Security (1/10): Overview

This series is about Tableau Server and Tableau Cloud security. Tableau has a Platform Security white paper that covers Authentication, Authorization, Data Security and Network Security. It is good documentation, but I find those security components hard to explain to non-technical audiences. Instead, I created the following security model and found that general audiences grasp it very easily:

Let me double-click on each of the areas. It looks something like this:

  1. Infrastructure: covers network, SSO, InfoSec, server OS, etc.
  2. Tableau App Configuration: This is the application level, in other words the Tableau server, site or Cloud level. Some of those things are configurable. The rest of the blogs in this series cover each of those areas with the intent of maximizing security.
    • External user site/server
    • Site Segmentation
    • User Visibilities
    • User provisioning
    • Encryption
    • Extension
    • Explain Data
    • Sensitive Lineage Data
    • ConnectedApp
    • Mobile Security
    • Token
    • Guest Account
  3. Tableau Governance layer: A possible thin layer of governance processes and/or scripts to further enhance Tableau server security. This is more advanced work and needs the Tableau server PostgreSQL read-only user. I am not sure how to apply it to Cloud yet.
    • User deletion
    • Project setup
    • ‘All User’ permissions
    • Delete inactive content
    • Re-subscription 
    • PII data deletion or flag
    • Sensitive data protection
  4. Publish and Permission: Those are the content owner’s responsibilities. No matter how well Tableau server or Cloud is configured, content owners can still mess up data and content security. Business self-service content owners have to follow departmental data access guidance and grant access permissions accordingly. Those are covered in my other blogs and I do not plan to explain more here:
    • Workbook Permission
    • Project Permission Locking
    • Row Level Security
    • Sensitive Data Tagging
    • PII

Check out next blog TABLEAU SERVER AND CLOUD SECURITY (2/10): EXTERNAL SITE

SCALING TABLEAU (5/10) – LICENSE MANAGEMENT

Tableau license management has been a big pain point in scaling Tableau. This blog covers the following:

  • Tableau license types
  • What is your End User License Agreement
  • How to get most out of your Tableau licenses
  • Desktop and Server license management – The Enterprise Approach
1. Tableau license types

Tableau has the following licenses:

  • Tableau Creator: It includes Tableau Prep (for data profiling, shaping, and filtering before visualization) and Tableau Desktop (for creating beautiful viz). New customers get the subscription model (pay as you go) only, while customers who bought before the subscription model was available can stay on the old model as long as they keep paying license renewals. It also covers the publisher user on Tableau server if a server is used.
  • Tableau Explorer: This is a server-side license. It allows users to web edit existing workbooks and to create and publish new workbooks from existing published data sources on the server. Of course, it also allows users to fully interact with published content on the server.
  • Tableau Viewer: This is a server-side license as well. A Viewer can’t web edit existing workbooks and can’t create or publish new workbooks from existing published data sources on the server. It is for interacting with published content on the server. A Viewer can’t create custom views and can’t download full data, only summary data.
  • Tableau Server user based: For small to medium scale sharing and collaboration. One publisher or one interactor takes one seat. If you purchased 100 user-based licenses, you can assign a total of 100 named users on the server. You can change them as long as the total does not exceed 100 users at any given time. Tableau offers a subscription for user-based licenses, allowing you to use and update the server for a specific period of time.
  • Tableau Server core based: For medium to large scale sharing and collaboration. If you have 16 cores, you can have an unlimited number of interactors or publishers as long as your server is installed on machines with no more than 16 cores. Tableau also offers a subscription for core-based licenses, allowing you to use and update the server for a specific period of time.
  • Tableau Online: Similar to Tableau Server user-based, but hosted on Tableau’s cloud platform.
  • Enterprise License Agreement (ELA): You pay a fixed amount to Tableau for 3 or more years, then you enjoy the ‘Tableau buffet’: unlimited licenses of all types.

2. What is your End User License Agreement

Nobody wants to read the End User License Agreement. Here is a summary of what you should know:

  • Each Desktop license can be installed on two computers of the same user. You may get a warning when you try to activate a 3rd computer.
  • If a Desktop license key is used by Joe, who left the company or does not use it anymore, the key can be transferred to someone else. The correct process is to deactivate the key on Joe’s machine and reactivate it on someone else’s machine.
  • If you have a .edu email, you are lucky: students and teachers can get Desktop for free.
  • If you are part of a small non-profit org, you can get Desktop licenses almost for free.
  • Each server key can be installed on 3 instances: one prod and two non-prod.
  • What if you need 4 instances: prod, DR, test and dev? Let’s say you have two core-based keys: key A with 8 cores and key B with 8 cores. You can activate both keys in prod and DR, giving each 16 cores, then use key A alone (8 cores) for test and key B alone (8 cores) for dev. You are fine as long as each server key is used in 3 or fewer instances.
  • What if you do not want to pay the maintenance fee anymore? Since these are perpetual licenses, you are still entitled to use them even if you stop paying maintenance. What you are no longer entitled to is upgrades and support.
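
The two-key, four-instance layout above can be checked mechanically. This is a small sketch with made-up environment and key names, assuming only the rule stated above (one key in at most 3 instances).

```python
MAX_INSTANCES_PER_KEY = 3  # one prod plus two non-prod, per the summary above


def keys_over_limit(activations, limit=MAX_INSTANCES_PER_KEY):
    """activations maps environment name -> list of keys activated there.

    Returns {key: [environments]} for every key used in more than `limit` places.
    """
    usage = {}
    for env, keys in activations.items():
        for key in keys:
            usage.setdefault(key, set()).add(env)
    return {key: sorted(envs) for key, envs in usage.items() if len(envs) > limit}


# The layout from the example: both keys in prod and DR, one key each in test and dev.
activations = {
    "prod": ["A", "B"],
    "dr": ["A", "B"],
    "test": ["A"],
    "dev": ["B"],
}
over = keys_over_limit(activations)
print(over or "every key is within the 3-instance limit")
```

Key A lands in prod, DR and test, and key B in prod, DR and dev, so each key is used in exactly 3 instances and nothing is flagged.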

3. How to get most out of your Tableau licenses

  • If the registration info (name, email, last installed, product version) in the Tableau Customer Portal Keys report is null, the key has never been used, so you can re-assign it to someone else. You may be surprised how many keys are never used……
  • If the registration info (name, email, last installed, product version) in the Tableau Customer Portal Keys report is associated with someone who left the company and the key has a single registration, you can re-assign it to someone else.
  • If the registered product version is very old, the key owner is likely not an active Desktop user.
  • Enable Desktop license reporting when you upgrade to v10 to see who has not used Desktop in the last few months. Then you can potentially get the license transferred (see below for more).
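
The first two rules above are easy to script once the Keys report is exported. This sketch assumes each row has been loaded as a dict; the field names (`key`, `email`, `registrations`) and the sample rows are hypothetical, not the portal's actual export format.

```python
def triage_key(row, departed_emails):
    """Classify one Keys-report row as 'never used', 'owner left', or 'in use'."""
    email = row.get("email")
    if not email:
        # Null registration info: the key was never activated.
        return "never used"
    if email in departed_emails and row.get("registrations", 1) == 1:
        # Single registration tied to someone who left the company.
        return "owner left"
    return "in use"


keys_report = [
    {"key": "TD-0001", "email": None, "registrations": 0},
    {"key": "TD-0002", "email": "joe@example.com", "registrations": 1},
    {"key": "TD-0003", "email": "amy@example.com", "registrations": 1},
]
departed = {"joe@example.com"}

triage = {row["key"]: triage_key(row, departed) for row in keys_report}
reassignable = [key for key, status in triage.items() if status != "in use"]
print(reassignable)
```

The list of departed employees would have to come from your own HR or directory data.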

4. Desktop and Server license management – enterprise approach

When you have hundreds of Desktop licenses, you will need the following approaches to scale:

  • Co-term all of your licenses for easy renewals. Co-term means having the same renewal date for all of your Desktop and Server licenses, both what you already have and new purchases. This may take a few quarters to complete. Start by picking one renewal date, then get your Tableau sales rep, renewal rep, purchasing department and users to agree on that one date.
  • The Tableau champion should have visibility into every team’s Tableau licenses in the Customer Portal. Tableau’s land-and-expand sales approach creates multiple accounts in the Customer Portal, and each team can only see its own keys and renewals. If you drive enterprise Tableau, ask for access to all accounts in the Customer Portal.
  • Automate the Desktop installation, activation and registration process. Whether you are in a Windows or Mac environment, you can automate Desktop installation, activation and registration via the command line. Read details. This feature has also been available for Prep since 2018.1.2, although Prep’s Mac installation is designed differently from Desktop’s. For a Prep Mac silent installation you need to copy “/var/root/Library/Preferences/com.tableau.Registration.plist” to “$homedir/Library/Preferences/com.tableau.Registration.plist” for registration to succeed, because the Prep plist is installed in the root user’s directory. This is true for both 2018.2 and 2018.1, although Tableau may change this behavior later on.
  • Transition to a single master key. Tableau Desktop supports a single master key. Instead of having 500 individual Desktop keys, you can consolidate them all into one master key that can be activated by 500 users. The prerequisite is to co-term all individual keys. A few important notes:
    • When the single master key is created, make sure to ask Tableau to turn on the hidden key feature so Desktop users will not see the key anymore. You do not want the single master key to leak. With the feature on, the ‘Manage Product Keys’ menu no longer shows up in Desktop.
    • It also means that you have to use the quiet installer so the key can be activated without user interaction.
    • This hidden product key feature has also been available for Tableau Prep since 2018.1.2, although Prep has a separate key from Desktop.
    • If some users have two computers at work and both have Tableau Desktop installed, Tableau may count one user as two installs, which will mess up your total license counts. The Tableau license team can help you out.
  • Enable Desktop License Reporting in V10. This is an awesome feature for tracking Desktop usage even when Desktop users do not publish. The challenge is how to change the configuration on each user’s laptop. Here is what you need to know:
    • It works only if both Desktop and Server are on v10. v10.0.2 or above is better, as earlier v10 versions are buggy.
    • This feature is turned off on the server by default. You can turn it on using tabadmin:
      tabadmin set features.DesktopReporting true
      tabadmin config
      tabadmin restart
    • The most difficult part is updating the Windows Desktop registry or the Mac Desktop plist to point to the Tableau server that should receive the license usage. The best way is to do it in the Desktop v10 installer (see the Automate Desktop Installation, Activation and Registration process above).
    • You should point all of the company’s Desktops to one Tableau server, even if Desktop users publish to different servers. This way you have one place to see all enterprise Desktop usage.
    • By default, Tableau Desktop v10+ will ping Tableau server v10+ for usage reporting every 8 hrs. You can configure the interval on Desktop.
      Mac plist example:
      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
      <plist version="1.0">
        <dict>
          <key>Server</key>
          <string>https://mytableau02:8010,http://mytableau</string>
          <key>scheduleReportInterval</key>
          <string>3600</string>
        </dict>
      </plist>
    • The Desktop usage (every 8 hrs) is not sent to Tableau the company, only to your own Tableau server. The only thing Desktop sends to Tableau the company is the registration info. Of course, the registration info is also sent to your defined Tableau server(s).
    • Which table has the Desktop usage? The Postgres table name is desktop_reporting
    • Which dates does desktop_reporting have? It has 4 date columns:
      • Maintenance expiration date
      • Expiration date (3 months after maintenance expiration date)
      • Registration date (when registered)
      • Last report date (when Desktop was last used). Notice that it captures only the last time Desktop was used. If you want to know how often Desktop was used in the past 3 months, you can’t tell…..
    • How can you tell historical Desktop usage? You can build an incremental refresh on desktop_reporting by last report date, and so build out your own history for better Desktop license reporting. I am sure that Tableau is working on making this small table historical as well…..
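
The incremental-history idea can be sketched outside Tableau too. The Python below is an illustration only: it assumes each snapshot of desktop_reporting has already been read into a dict of user to last report date, and the user names and dates are made up.

```python
from datetime import date


def append_new_reports(history, snapshot):
    """Append (user, last_report_date) pairs not yet in history; return how many were added."""
    seen = set(history)
    added = 0
    for user, last_report in sorted(snapshot.items()):
        if (user, last_report) not in seen:
            history.append((user, last_report))
            seen.add((user, last_report))
            added += 1
    return added


history = []
# Two snapshots of the table taken on different days (hypothetical data).
first = append_new_reports(history, {"joe": date(2017, 1, 3), "amy": date(2017, 1, 4)})
second = append_new_reports(history, {"joe": date(2017, 1, 3), "amy": date(2017, 1, 9)})

# Count how many distinct report dates were captured per user.
days_used = {}
for user, _ in history:
    days_used[user] = days_used.get(user, 0) + 1
print(first, second, days_used)
```

Because the table keeps only the latest date, the history can only be as fine-grained as the snapshot schedule. Usage between two snapshots collapses into a single row.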

In summary, Tableau Creator and server license management is not a simple task. Hopefully these tips and tricks from the enterprise approach will ease your pain. It is good practice to build them out step by step while you are not yet too big or too messy.

 

SCALING TABLEAU (8/10) – LEVERAGE V10 FEATURES FOR ENTERPRISE

I love Tableau’s path of innovation. Tableau v10 has some of the capabilities enterprise customers wanted most. I have mentioned some of those features in my previous blogs. This blog summarizes the V10 enterprise features:

1. Set Extract Priority Based on Extract Duration

This is a very powerful v10 feature for server admins, although it is not mentioned enough in the Tableau community yet. With this feature, full extracts of the same priority run in order from shortest to longest, based on their last run duration.

The benefit is that smaller extracts do not have to wait a long time for big ones to finish. Tableau server executes the smaller ones first, so overall waiting time is reduced during peak hours.

What does a server admin have to do to leverage this feature?

  • By default, this feature is off. The server admin has to turn it on. It is not site specific: once it is on, it applies to all sites. Simply run the following tabadmin command to turn it on:
  •  tabadmin set backgrounder.sort_jobs_by_run_time_history_observable_hours  36
  • Please read my blog and the Tableau doc for details.
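
To make the ordering concrete, here is a toy illustration in Python. It is not how the backgrounder is implemented. It only shows the effect described above, assuming a lower priority number means higher priority; the job names and durations are invented.

```python
def run_order(jobs):
    """Sort by priority first, then by last run duration (shortest first)."""
    return sorted(jobs, key=lambda job: (job["priority"], job["last_duration_s"]))


jobs = [
    {"name": "big_sales_extract", "priority": 50, "last_duration_s": 5400},
    {"name": "small_hr_extract", "priority": 50, "last_duration_s": 40},
    {"name": "urgent_finance", "priority": 10, "last_duration_s": 900},
    {"name": "medium_ops_extract", "priority": 50, "last_duration_s": 600},
]

order = [job["name"] for job in run_order(jobs)]
print(order)
```

Within priority 50, the 40-second extract no longer waits behind the 90-minute one, which is where the peak-hour waiting time is saved.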

2. Revision History and Version Control

Tableau released one of the most wanted server features, version control and revision history, in V9.3. The feature was then much enhanced in V10 with previewing of old workbooks, one-click restore, and a maximum revisions setting:

  • The workbook previewing and restoring features are so convenient for publishers.
  • The maximum revisions setting is so cool for server admins, who can now control server space usage and do not have to run out of storage while enabling revision history.

How to deploy those features on server?

  • Turn it on: By default, Revision History is off. It can be turned on site by site. To turn it on, go to the site Settings, General, and select “Save a history of revisions”. If you are on V10, you have two choices: Unlimited or a number of revisions. Unlimited means that there is no limit on the version history, which you probably do not want. As a server admin, you always want to make sure that your server will not run out of space. You will find the number-of-revisions setting very handy, as it gives admins some peace of mind about server storage.
  • Decide the max revisions you want to keep. This is site specific, which means you can set different max revisions for different sites.
  • How to decide the max revisions to keep? How to find out the extra server space needed for revisions? Please read my blog

3. Cross database Joins and Cross Database Filter

X-DB joins and X-data source filters are two of the features most requested by the user community. They are two different but related things.

X-DB joins allow two or more separate data sources to be joined together at the row level. There are still some constraints on which kinds of data sources can be joined in V10, and Tableau plans to extend this in coming releases: V10 only allows an extract to be the primary data source when joining with another database, and does not yet allow two extracts to be joined together.

What do X-DB joins mean for the server admin?

  • Know that the server admin has no control over x-db joins. They are totally controlled by publishers. This feature is enabled out of the box and the server admin can’t turn it off. Hopefully you never need to.
  • Watch server performance. A lot of x-db join activity happens on Tableau server. I was a little skeptical about this feature because the server admin does not have any control or visibility. On the other hand, I have not encountered any issues since my v10 server upgrade in Nov 2016.
  • From the publisher’s perspective, x-db joins can be slow when joining two large datasets.

What is cross database filter?

Use case example: Let’s say you’re connected to multiple data sources, each with common dimensions like Date or Product. As part of your analysis, you want a single filter to apply across all the sources. That’s where this new feature comes in. Any time you have data sets that share a common dimension, you can filter across the data sets. A few things to know about the cross database filter:

  • It is not an x-db join but more like blending, where you can manage relationships to edit the blending between connected sources
  • You can only filter data across multiple primary data sources. You cannot filter data across secondary data sources.

4. Desktop License Reporting

Desktop License Reporting is included in V10. This is an awesome feature for tracking Desktop usage even when Desktop users do not publish. Please see details at http://enterprisetableau.com/licensing/

The challenge in leveraging this feature is how to change each user’s laptop to make the initial configuration. Here is what you need to know:

  • It works only if both Desktop and Server are on v10.
  • This feature is turned off on the server by default. You can turn it on using tabadmin:
    tabadmin set features.DesktopReporting true
    tabadmin config
    tabadmin restart
  • The most difficult part is updating the Windows Desktop registry or the Mac Desktop plist to point to the Tableau server that should receive the license usage. The best way is to do it in the Desktop v10 installer. Please see my previous blog for details.
  • You should point all of the company’s Desktops to one Tableau server, even if Desktop users publish to different servers. This way you have one place to see all enterprise Desktop usage.
  • By default, Tableau Desktop v10+ will ping Tableau server v10+ for usage reporting every 8 hrs. You can configure the interval on Desktop. It is controlled by the plist on Mac or the registry on Windows. It is not a tabadmin option. See here.
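
If you script the Mac side, Python’s standard plistlib module can generate the reporting plist. This sketch reuses the two keys and the placeholder server URLs from the plist example in the license management post on this page. Where the file must be written is left out on purpose, since that depends on your installer.

```python
import plistlib

# Keys and placeholder values copied from the plist example on this page.
settings = {
    "Server": "https://mytableau02:8010,http://mytableau",
    "scheduleReportInterval": "3600",  # seconds between usage reports
}

# Serialize to Apple's XML plist format, keeping the key order as written.
plist_bytes = plistlib.dumps(settings, fmt=plistlib.FMT_XML, sort_keys=False)
print(plist_bytes.decode("utf-8"))
```

Reading the bytes back with plistlib.loads is a cheap way to confirm that the generated file is well-formed before you push it to laptops.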

5. Subscribe Others

Finally Tableau delivered this long-requested feature in V10. A few things to know:

  • This feature has to be enabled at site level
  • You can create a custom ‘email from’ address for each site. This is handy since users who receive the subscription emails may want to contact the site admin rather than the server admin with questions.
  • Only workbook owners can subscribe others
  • Users must have an email address in their Account Settings, otherwise the subscribe others option is not highlighted. If a lot of users do not have an email address on Tableau server, you may have to mass update all users with valid email addresses before this feature can really be used.
  • You can’t subscribe groups, only users. If you really want to subscribe a group, one workaround is to create a dummy user and give the group email address to this dummy user.
  • You can’t subscribe users who are not valid users of the site
  • You can’t subscribe users who do not have permission to view the workbooks or views
  • Subscribed users can click the ‘Manage my subscriptions’ link at the bottom of the subscription emails to unsubscribe at any time.
  • Users can always subscribe themselves if they have view permission to the workbooks or views.
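
The rules above boil down to a handful of checks. The sketch below models them in plain Python to make the conditions explicit. It does not call any Tableau API, and the data shapes (`site_users`, `workbook`) are hypothetical.

```python
def subscribe_blockers(actor, target, workbook, site_users):
    """Return the reasons why `actor` cannot subscribe `target`; empty list means OK."""
    blockers = []
    if actor != workbook["owner"]:
        blockers.append("only the workbook owner can subscribe others")
    user = site_users.get(target)
    if user is None:
        blockers.append("target is not a user of this site")
        return blockers
    if not user.get("email"):
        blockers.append("target has no email address")
    if target not in workbook["viewers"]:
        blockers.append("target has no view permission")
    return blockers


site_users = {
    "ann": {"email": "ann@example.com"},
    "bob": {"email": ""},
    "cat": {"email": "cat@example.com"},
}
workbook = {"owner": "ann", "viewers": {"ann", "bob", "cat"}}

print(subscribe_blockers("ann", "cat", workbook, site_users))
print(subscribe_blockers("ann", "bob", workbook, site_users))
```

Running the same checks over all users of a site is one way to find out how many accounts still need an email address before you enable the feature.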

6. Device Specific Dashboard Layout 

After you’ve built a dashboard you can create layouts for it that are specific to particular phone or tablet devices. It will be the same URL, but Tableau will render a different layout depending on the device used to access the server.

Most users (especially executives) use phones to view information. This is a great feature for driving enterprise Tableau adoption. A few notes:

  • It is enabled out of the box. There is no server or site level setting to enable or disable this feature.
  • When publishing the dashboards, make sure to clear the option ‘Show Sheets as Tabs’. Otherwise this feature does not work
  • This feature works for Tableau Apps and it also works for mobile devices that do not have Tableau Apps installed.
  • The best practice is to remove some views from default layout so mobile device layout will have fewer views than default layout

What are the design tips:

  • Ask yourself: What key information does my end user need from my dashboard?
  • Click “device preview” to confirm how your dashboard looks across different devices.
  • (For small screens) Remove unnecessary views, filters, titles, and legends.
  • (For small screens) Determine if you need a scrollable dashboard (fit width). If so, stack dashboard objects and use a “peek.”
  • (On touch devices) On scrollable dashboards, pin your maps, and disable pan and zoom.

With device designer, you’ll rest assured knowing your data stands out with optimized dashboards on any device!

6. Data Source Analytics

Data source management has been brought into line with Workbooks, so that we now have revision history, usage information and users can have favourite data sources.

You can also change the view for data sources so that you can see them grouped by where they connect to, instead of the data source name.

Tableau has yet to deliver the data source lineage features announced at TC16 Austin: going from a data source column to the workbooks that use it, so you can do impact analysis when a data source changes, or from workbooks to the data source tables and/or columns they use, so you can spot potentially duplicated data sources. I am expecting those big new features in 2017.

7. Site Specific SAML

If using SAML authentication, you can make this site specific, instead of for the whole server.  This means that some sites on your Tableau Server can use SAML for single sign on, whilst others will just use normal authentication.

I know that it takes months for enterprise customers to leverage some of those new features. I hope this blog helps. Please feel free to post your tips and tricks for implementing those features.

SCALING TABLEAU (4/10) – USE SITES

Tableau server has a multi-tenancy feature called “sites” which can be leveraged by enterprise customers for better scalability, better security and advanced self-service.

This blog covers the following areas about Tableau sites:

  • Basic concepts
  • Common use cases
  • Governance processes and settings
  • When not to create a new site

1. Basic concepts about Tableau sites

Let’s start with some basic concepts. Understanding them will provide better clarity, avoid confusion, and reduce hesitation about leveraging sites.

Sites are partitions or compartmentalized containers. There is absolutely no ‘communication’ between sites. Nothing can be shared across sites.

A site admin has unrestricted access to the content on the specific site that he or she owns. A site admin can manage projects, workbooks, and data connections, and can add users and groups and assign site roles and site membership. Site admins can monitor pretty much everything within the site: traffic to views, traffic to data sources, background tasks, space, etc. A site admin can also manage extract refresh scheduling, etc.

One user can be assigned roles in multiple sites. The user can be site admin for site A and can also have any role in site B independently. For example, Joe, as a site admin for site A, can be added as a user to site B with the admin role (or Interactor role). However, Joe can’t transfer workbooks, views, users, data connections, user groups, or anything else between site A and site B. When Joe logs in to Tableau, he has a choice of site A or B. When Joe selects site A, he can see everything in site A but nothing in site B. It is not possible for Joe to assign site A’s workbooks or views to any users or user groups in site B.

All sites are equal from a security perspective. There is no concept of a super site or a site hierarchy. You can think of a site as an individual virtual server. A site is the opposite of ‘sharing’.

Is it possible to share anything across sites? The answer is no for site admins or any other users. However, if you are a creative server admin, you can write scripts that run at the server level to break this rule. For example, a server admin can use tabcmd to copy extracts from site A to site B, although this gets into areas that Tableau does not officially support.

2. Common use case of Tableau sites. 

  • If your Tableau server is an enterprise server for multiple business units (fin, sales, marketing, etc) and fin does not want sales to see fin content, create a site for each business unit so that one business unit’s site admin cannot see another business unit’s data or content.
  • If your Tableau server is an enterprise platform and you want to provide governed self-service to the business, the site approach (business as site admin and IT as server admin) provides maximum flexibility to the business while IT can still hold business site admins accountable for everything within their sites.
  • If your server deals with external partners and you do not want one partner to see another partner’s content at all, you can create one site for each partner. This also avoids the potential mistake of assigning a partner A user to partner B’s site.
  • If you have some very sensitive data or content (like internal auditing data), a separate site gives much better data security control, from development phase to production.
  • Use sites as a Separation of Duties (SoD) strategy to prevent fraud or potential conflicts of interest for some powerful business site admins.
  • You have so many publishers on your server that you want to distribute some admin work to those who are closer to the publishers, for agility.

Arguably, you can achieve all of the above by using Projects without sites. So why sites? First, sites just make things easier for a large Tableau server deployment. Many out-of-the-box server admin views go by site, so it is easier to know each BU’s usage if you have a site per BU. Second, if you have a few super knowledgeable business users, you can empower them better by granting them site admin access.

3. Governance processes around Tableau sites.

Thoughtful site management approaches, clearly defined roles and responsibilities, a documented request and approval process, and naming conventions have to be planned before you go with a site strategy, to avoid potential chaos later on. Here is the checklist:

    • Site structure: How do you want to segment a server into multiple sites? Should sites follow the organization or business structure? There is no right or wrong answer here. However, you do want to think and plan ahead.
    • How many sites should you have? It depends completely on your use cases, data sources, user base, and the levels of control you want to have. As a rule of thumb, I would argue that more than 50 sites on one server is too many, although I know of a very large corporation with about 300 sites that work well for them. I prefer to have fewer than 20 sites.
    • Who should be the site admin? Either IT or business users (or both) can be site admins. One site can have more than one admin. One person can admin multiple sites as well. When a new site is created, the server admin normally just adds one user as site admin, who can then add others as site admins.
    • What controls are at site level? All the following controls can be checked or unchecked at site level:
      • Storage limitation
      • Revision history on or off and max numbers of revisions
      • Allow the site to have web authoring. When web authoring is on, it does not mean that all views within the site are web editable. Web editing also has to be allowed at the workbook/view level for specific users or user groups before an end user can web edit.
      • Allow subscriptions. Each site can have one ‘email from address’ to send out subscriptions from that site.
      • Record workbook performance key events metrics
      • Create offline snapshots of favorites for iOS users.
      • Site-specific SAML with local authentication
      • Language and locale
    • What privileges should the server admin give to site admins? The server admin can give all of the above controls to the site admin when the site is created. The server admin can change those site level settings as well, and can even take those privileges back from the site admin at any time.
    • What is the new site creation process? I have a new site request questionnaire that the requester has to answer (see below). The answers help the server and governance team understand the use cases, data sources, user base, and data governance requirements, and decide whether the use cases fit Tableau server at all and whether the requester should share an existing site or get a new one. The key criteria are whether the same data sources exist in another site and whether the user base overlaps with another site. It is a balance between duplication of work and flexibility.
    • What is in the site request questionnaire?
      • Does your bigger team have an existing Tableau site already on Tableau server? If yes, you can use the existing site. Please contact the site admin who may need to create a project within the existing site for your team. List of existing sites and admins can be found @……. 
      • Who is the primary business / application contact?
      • What business process / group does this application represent? (like sales, finance, etc)?
      • Briefly describe the purpose and value of the application
      • Do you have an IT contact for your group for this application? Who is it?
      • What are the data sources?
      • Is there any sensitive data to be reported on? If yes, please describe the data source
      • Is there any private data in the source data? (like HR data, sensitive finance data)
      • Who is the audience of the reports? How large an audience do you anticipate? Are there any partners who will access the data?
      • Does the source data cover more than one Geo? If yes, what is the plan for data level security?
      • What are the primary data elements / measures to be reported on (e.g. booking, revenue, customer cases, expenses, etc)
      • What will be the dimensions by which the measure will be shown (e.g. Geo, product, calendar, etc)
      • How often does the source data need to be refreshed?
      • What is the anticipated volume of source data? How many quarters of data? Roughly how many rows of data? Roughly how many columns of data?
      • Is the data available in enterprise data warehouse?
      • Are similar reports already available on another existing reporting platform?
      • How many publishers for this application?

4. When should you not create a new site?

  • If the requested site will use the same data sources as one of the existing sites, you may want to create a project within the existing site to avoid potential duplicate extracts (or live connections) running against the same source database.
  • If the requested site's end users overlap heavily with those of an existing site, you may want to create a project within the existing site to avoid duplicating user maintenance work.
  • If the requester does not know that his or her bigger team already has a site.
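
The first two checks above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not part of any Tableau API: the function name `recommend_site`, the dictionary layout, the sample site names, and the 50% user overlap threshold are all my own assumptions, so pick whatever threshold makes sense for your org.

```python
def recommend_site(requested_sources, requested_users, existing_sites,
                   user_overlap_threshold=0.5):
    """Suggest reusing an existing site when data sources or users overlap.

    existing_sites maps a site name to {"sources": set, "users": set}.
    The threshold is a hypothetical cut-off, not a Tableau setting.
    """
    for name, site in existing_sites.items():
        # Same data source already used by this site?
        shared_sources = requested_sources & site["sources"]
        # Share of the requested users who already belong to this site.
        if requested_users:
            overlap = len(requested_users & site["users"]) / len(requested_users)
        else:
            overlap = 0.0
        if shared_sources or overlap >= user_overlap_threshold:
            return f"create a project in existing site '{name}'"
    return "create a new site"


# Made-up sample data for illustration only.
existing = {
    "Sales": {"sources": {"bookings_dw"}, "users": {"ann", "bob", "cho"}},
    "Finance": {"sources": {"gl_extract"}, "users": {"dev", "eli"}},
}

print(recommend_site({"bookings_dw"}, {"zed"}, existing))
print(recommend_site({"hr_feed"}, {"dev", "eli", "fay"}, existing))
print(recommend_site({"hr_feed"}, {"gus"}, existing))
```

The first request shares a data source with Sales, the second shares two of its three users with Finance, and the third overlaps with nothing, so only the third ends up with a new site.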

In summary, Tableau sites are a great feature for large Tableau server implementations. Sites can be very useful to segment data and content, distribute admin work, empower the business for self-service, etc. However, site misuse can create a lot of extra work or even chaos later on. A thoughtful site strategy and governance process have to be developed before you start to implement sites, although the process will mature as you go.

Scaling Tableau (1/10) – version control and revision history

Tableau released one of the most wanted server features, version control and revision history, in V9.3. The feature was then much enhanced in V10 with previewing of old workbooks, one-click restoring, and a maximum revisions setting. I love all of those new V10 features:

  • The workbook previewing and restoring features are so convenient for publishers.
  • The maximum revision setting is so cool for server admins, who can actually control server space usage so the server does not run out of storage while revision history is enabled. It also shows Tableau’s thought process for building in governance while enabling a new feature, which is important for scaling Tableau to the enterprise.

I will explain those features in detail here:

1. Turn it on. By default, Revision History is not turned on. It can be turned on site by site. To turn it on, go to site Settings, General and select “Save a history of revisions”. If you are on V10, you have two choices: Unlimited and # of revisions. Unlimited means that there is no limit on the max version history, which you probably do not want. As a server admin, you always want to make sure that your server will not run out of space. You will find # of revisions a very handy feature that gives admins some peace of mind about server storage.

2. How to decide the max. number of revisions?

I asked this question but did not find any guidance anywhere. I spent days on research and want to share my findings here. First of all, my philosophy is to give maximum flexibility to publishers by providing as many revisions as possible. On the other hand, I also want to be able to project the extra storage that revision history will create, for planning purposes.

How many revisions should you set? It depends on how much space you can allocate to revision history without dramatically impacting your backup/restore timing, and on how many workbooks the server has. Let’s say that you are OK with giving about 50G to all revision history. Figure out how many workbooks you have now and the total space taken by the xml portion of all workbooks (revision history only keeps the xml piece), and then you can calculate the max number of revisions. Here is how:

  • Open Desktop and connect to PostgreSQL: give your server name and port, workgroup as the database, and the readonly user and password. Select the Workbooks table and look at Size, Data Engine Extracts, and the number of records. The Data Engine Extracts field of the Workbooks table tells you whether the workbook has an embedded extract or not.
  • Say you have 500 workbooks in total, 200 of which have Data Engine Extracts as false, and the total size of those 200 workbooks is 200M. That means the avg twb is about 1M per workbook, which is what revision history will keep once it is turned on. The total xml size of all 500 workbooks is then about 500M.
  • If you turn on revision history and set max revisions to 50, the server storage for revision history would grow over time to about 50 x 500 x 1M = 25G. A 50G budget would therefore allow a max of about 100 revisions. Two other factors to consider: one is the new workbook creation rate, the other is that not every workbook will max out its revisions.
  • Once you set the revision number, you can monitor the storage usage for all revision history by looking at the Workbook_versions table, which keeps all the revision history. You can find the overall size, number of versions, and more insights about usage patterns. You can also join it to other tables to find the workbook name, user name, etc.

[Screenshot: example joins from Workbook_versions to find workbook and user names]
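
To make the sizing arithmetic from the bullets above easy to rerun with your own numbers, here is a minimal Python sketch. The function names are my own, and I am assuming 1G = 1,000M for simplicity; the inputs are the example figures from this post, not measurements from a real server.

```python
def revision_storage_estimate(total_workbooks, sample_count, sample_size_mb,
                              max_revisions):
    """Project revision-history storage in MB.

    sample_count and sample_size_mb describe the workbooks that have
    Data Engine Extracts = false, whose size is a proxy for the xml part.
    """
    avg_twb_mb = sample_size_mb / sample_count      # avg xml size per workbook
    total_xml_mb = avg_twb_mb * total_workbooks     # xml size of all workbooks
    projected_mb = total_xml_mb * max_revisions     # worst case, every workbook maxed out
    return avg_twb_mb, total_xml_mb, projected_mb


def max_revisions_for_budget(budget_mb, total_xml_mb):
    """Largest revision count whose worst-case storage fits the budget."""
    return int(budget_mb // total_xml_mb)


avg_mb, total_xml_mb, projected_mb = revision_storage_estimate(
    total_workbooks=500, sample_count=200, sample_size_mb=200, max_revisions=50
)
print(avg_mb, total_xml_mb, projected_mb)               # 1.0 500.0 25000.0
print(max_revisions_for_budget(50_000, total_xml_mb))   # 100
```

This is a worst-case projection: as noted above, not every workbook will max out its revisions, and new workbooks will keep adding to the total.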

3. Can interactors see the previous versions as well? No. End users with the interactor role can only see the current version.

4. Does the publisher have to do anything to keep the revision history of his or her workbooks? No. Once ‘Save a history of revisions’ is turned on for the site, every time a workbook is web edited or modified via Desktop, a new revision will be created automatically; there is no further action for the publisher. When the max number of revisions is reached, the oldest version will be deleted automatically. There is no notification to publishers either. All you need to communicate to publishers is the max number of revisions that will be kept. For example, say you keep 50 revisions and one workbook has 50 revisions already. When this workbook is changed again, Tableau server will keep only the most recent 50 revisions by deleting the oldest revision automatically.

5. Can you change the max revisions? Yes. Let’s say you have max revisions set to 50 and you want to reduce it to 25. Tableau server will delete the older revisions (if there are any) and keep only the most recent 25. What happens if you change back from 25 to 50? The older revisions are already gone and will not show up again.

6. What is the workflow for a publisher to restore an old workbook? Publishers or admins can see the revision history for their workbooks by clicking Details, then Revision History. From there, one simple click previews or restores any old workbook. Once it is restored, a new revision will be created automatically again.

7. How to restore a data source revision? V10 came with preview and restore features for workbooks only. You can view all revisions for data sources as well, but you will have to download the data source and upload it again if you want to restore an older version of a data source. I am sure Tableau’s scrum team has been working on one-click restoring of data sources as well.
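
The retention behavior described in questions 4 and 5 can be modeled with a small Python sketch. This is a toy model of the behavior I observed, not Tableau's actual implementation; the `RevisionHistory` class and its method names are made up for illustration.

```python
from collections import deque


class RevisionHistory:
    """Toy model: keep only the most recent max_revisions revisions."""

    def __init__(self, max_revisions):
        self.max_revisions = max_revisions
        self.revisions = deque()

    def publish(self, revision):
        # Every publish or web edit adds a revision, then trims the oldest.
        self.revisions.append(revision)
        self._trim()

    def set_max(self, max_revisions):
        # Lowering the limit deletes old revisions right away;
        # raising it later does not bring them back.
        self.max_revisions = max_revisions
        self._trim()

    def _trim(self):
        while len(self.revisions) > self.max_revisions:
            self.revisions.popleft()


history = RevisionHistory(max_revisions=50)
for number in range(1, 52):          # publish revisions 1..51
    history.publish(number)
print(len(history.revisions), history.revisions[0])   # 50 2

history.set_max(25)                   # reduce from 50 to 25
print(len(history.revisions), history.revisions[0])   # 25 27

history.set_max(50)                   # change back to 50
print(len(history.revisions), history.revisions[0])   # 25 27
```

The last line is the point of question 5: going back to 50 only raises the ceiling for future revisions.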

NetApp’s Tableau enterprise deployment added 2,500 users in less than 10 months

NetApp’s presentation about Tableau enterprise deployment was well received at Tableau Conference 2015 in Las Vegas: the survey shows 4.5 out of 5 for content and 4.3 out of 5 for speaker presentation.

The key success factors for large scale Tableau server deployment are:

1. Create an enterprise Tableau Council with members from both business and IT. NetApp’s Tableau Council has 10 members who are all Tableau experts from each BU & IT. Most of the Council members are from the business. The Council meets weekly to assess and define governance rules. The Council represents the larger Tableau community.

2. Enable and support a Tableau community within the company. NetApp has a very active 300+ member Tableau community whose members are mainly Tableau Desktop license owners. NetApp’s Tableau Intranet is the place for everything about Tableau. Anyone can post any question in the community intranet, and a few committed members ensure all questions are answered in a timely manner. NetApp also has a monthly Tableau user CoE meeting, Hackathons, a quarterly Tableau Day, and an internal Tableau training program.

3. Define clear roles and responsibilities in the new self-service analytics model. NetApp uses a site strategy: each BU has its own site.

  • BU site admins are empowered to manage everything within their sites: local or departmental data sources, workbooks, user groups and permissions, QA/release/publishing process, user support, etc.
  • IT owns server management, server licenses, enterprise data extracts, technical consulting, performance auditing & data security auditing, etc.
  • Business and IT partner on learning, training, support and governance.

4. Define the Tableau publishing or release process. The question here is how much IT should be involved in publishing or release. This is a simple question but very difficult to answer. Trust and integrity are at the heart of NetApp culture. NetApp’s approach is that IT is not involved in any workbook publishing. BU site admins are empowered to make decisions for their own QA/test/release/publishing process.

There are two simple principles. One is to test first before production. The second is a performance rule of thumb, the 5 second-10 second-20 second rule: a workbook render time of less than 5 seconds is good, a render time of more than 10 seconds is bad, and no one should publish any workbook whose render time is more than 20 seconds.

What if people do not follow the rules? NetApp wants to give BUs maximum flexibility and agility for release or publishing. However, if the rules are not followed, IT will have to step in and take control of the release process. When that happens, it will become a weekly release process. Is this something that IT wants to do? No. Is this something that IT may have to do if things go south? Yes, but hopefully not.
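
As a quick illustration, the 5-10-20 rule can be written as a small Python function. The function name and the labels are mine, and the talk did not give a label for render times between 5 and 10 seconds, so calling that band "acceptable" is my assumption.

```python
def classify_render_time(seconds):
    """Classify a workbook render time using the 5-10-20 second rule."""
    if seconds < 5:
        return "good"
    if seconds <= 10:
        # Assumption: the rule does not name this band explicitly.
        return "acceptable"
    if seconds <= 20:
        return "bad"
    return "do not publish"


for render_time in (3, 7, 15, 25):
    print(render_time, classify_render_time(render_time))
```

A site admin could run something like this against measured render times during QA before deciding whether a workbook is ready for production.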

5. Performance management – trust but verify approach. Performance is everyone’s concern when it comes to a shared platform, especially when each BU decides its own publishing criteria and IT does not gate the publishing.

How to protect the value of shared Tableau self-service environment? How to prevent one badly-designed query from bringing all servers to their knees? NetApp has done a couple of things:

  • First, set server policies to keep the Tableau platform healthy, like maximum workbook size, extract timeout limits, etc.
  • Second, send out daily workbook performance alerts to site admins about their long-running workbooks.
  • Third, make the workbook performance matrix public so everyone in the community has visibility into the worst-performing workbooks/views, creating some well-intended peer pressure.

It is the site admin’s responsibility to tune workbook performance. If action is not taken, the site admin will get a warning, which can lead to a site closure.

6. Must have data governance for a self-service analytics platform. The objective is to ensure that Tableau self-service complies with the company’s existing data governance processes, policies and controls.

Data governance is not ‘nice to have’ but ‘must have’, even for a Tableau environment. NetApp has a pretty mature enterprise data governance (EDM) process. The BI team works very closely with the EDM team to identify & enforce critical controls. For example, IT has masked all sensitive human resource & federal data in the enterprise tier 2 data warehouse at the database layer, so we have peace of mind when Tableau Desktop users explore the tier 2 data.

NetApp is also working on an auditing process to identify potential data governance issues and is working with the data management team to address them. This is the verifying piece of the ‘trust but verify’ model.

The goal is to create a governed self-service analytics platform.  It has been a journey toward maturity of the enterprise self-service analytics model.

The presentation deck is attached:

NetApp-Tableau-Presentation-Final1