
Automation – Remove Permissions for Slow Render Workbooks

My previous blog covered sending automated alerts (VizAlerts) for slow-render workbooks; this blog shows how to enforce timeouts and other governance rules for slow-render workbooks.

Problem statement: Slow-workbook alerts are important and absolutely necessary. However, sometimes the alerts and email reminders are not enough – what if some workbook owners never take action? What can the server governance team do about it?

Solution #1: Reduce vizqlserver.querylimit

  • vizqlserver.querylimit: Covers specifically the queries executed to return views or populate extracts. The default value is 1800 seconds (30 minutes). This can be reduced to, for example, 180 seconds.
  • What does it mean? Any query that runs longer than this limit is cancelled and the user receives a query-timeout error.
  • How to change this setting?
      • tsm configuration set -k vizqlserver.querylimit -v 300
      • tsm pending-changes apply

 

Solution #2: Remove workbook permission

Solution #1 (reducing vizqlserver.querylimit) should be good enough for most governance processes to enforce render time. It is very simple and effective. Its limitations are:

  • It applies to the whole server and is not site specific
  • It allows no exceptions at all, no matter which workbook

If you want more flexibility, you can build a custom solution. How? After a lot of research and discussion with many others, the best custom solution I found is to define your own selection criteria for slow-render workbooks and then remove those workbooks' permissions. Here is what I did:

  1. Find the render-time pattern of the slowest 1% of workbooks (based on the data on my server, that slowest 1% consumed about 20% of server VizQL CPU and memory)
  2. Find the weekly average render time for that slowest 1% (it was about 30 seconds on my server, averaged down by caching, etc.)
  3. Now you have selection criteria for identifying the slowest workbooks you want to take action on
  4. Use my previous blog to send automated alerts (VizAlerts) for slow-render workbooks
  5. I recommend sending 3 warnings before any enforcement action – meaning the slowest workbooks have to be slow and meet the selection criteria for 4 weeks in a row before enforcement
  6. In the 4th week, remove the workbook's permissions and send an automated email letting the workbook owner know that the permissions were removed; the email should list all removed permissions. This email can be sent with another VizAlert, and the permission query and deletion can be done using the REST API:
    • Query workbook permissions API: GET /api/api-version/sites/site-id/workbooks/workbook-id/permissions
    • Delete workbook permissions API:

      DELETE /api/api-version/sites/site-id/workbooks/workbook-id/permissions/groups/group-id/capability-name/capability-mode
      DELETE /api/api-version/sites/site-id/workbooks/workbook-id/permissions/users/user-id/capability-name/capability-mode
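Stitching the two endpoints together, a deletion script first queries a workbook's permissions, then issues one DELETE per granted capability. A minimal sketch of the URL construction (the server name, ids, and helper name are illustrative assumptions, not tested against a live server):

```python
# Build the DELETE URLs for removing a workbook's permissions.
# Endpoint patterns follow the REST API paths above; all ids are placeholders.

def permission_delete_urls(server, api_version, site_id, workbook_id, grants):
    """grants: dicts like {"grantee_type": "groups", "grantee_id": "...",
    "capability": "Read", "mode": "Allow"}, taken from the GET permissions call.
    Returns one DELETE URL per granted capability."""
    base = (f"{server}/api/{api_version}/sites/{site_id}"
            f"/workbooks/{workbook_id}/permissions")
    return [
        f"{base}/{g['grantee_type']}/{g['grantee_id']}"
        f"/{g['capability']}/{g['mode']}"
        for g in grants
    ]

urls = permission_delete_urls(
    "https://tableau.example.com", "3.4", "site-1", "wb-9",
    [{"grantee_type": "groups", "grantee_id": "g-1",
      "capability": "Read", "mode": "Allow"}])
print(urls[0])
```

Record the GET response before deleting, so the notification email can list exactly which permissions were removed.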

Conclusions:

  1. You should reduce vizqlserver.querylimit from the default 1800 seconds to somewhere in the 180-300 second range
  2. Next step is to implement slow workbook VizAlert warning
  3. Depending on your organization's culture, if the VizAlert alone does not change publishers' behavior, you can take enforcement action and remove workbook permissions automatically after a few warnings. Why removing permissions is the best approach:
  • It stops end users from accessing the workbook, so no more slow renders impacting everyone else on the server
  • It still allows the workbook owner to work on performance tuning, which is what you wanted
  • It gives the workbook owner the flexibility to re-grant workbook access right away to maintain business continuity for whatever reason
  • Any update to the workbook resets the 4-week clock – it buys the owner another 4 weeks to improve the workbook

 

 

 

Automation – VizAlert for Slow Render Workbooks

Tableau server performance management has been everyone's concern in shared self-service environments, since nobody wants to be impacted by others. This is especially true when each business unit decides its own publishing criteria and the central IT team does not gate the publishing process.

How do you protect a shared self-service environment? How do you prevent one badly designed query from bringing the whole server to its knees?

My previous blog Governed Self-Service Analytics: Performance Management (7/10) talked about a few actions. This blog covers one technical implementation: how to send an automatic alert to a workbook owner when the workbook's average render time exceeds a defined threshold. This helps, as most workbook owners are willing to take action. My next blog will cover enforcement when no action is taken over time.

What is the problem statement? 

  • Let workbook owners know their workbooks are a lot slower than many other workbooks on the same server

Is this necessary at all? Don’t they know already? 

  • Some workbook owners may know, but others may not. A well-performing workbook can become much slower over time due to data changes
  • Some workbook owners know their workbook is slow but not how slow compared with the many other workbooks on the server
  • Other workbook owners know it is slow but do not bother to improve it – if their users do not complain and the server admin team does not complain, they have no incentive to improve it

Why server admin team cares about my slow views if my users are happy about it?

  • This is a valid argument. One time I did research on my server: the 1% of views with render times > 30 seconds consumed about 20-30% of server CPU & memory resources!

Another common argument: "It used to take one day to get the answer outside Tableau, but now it takes only 5 minutes on Tableau server. I am happy with it!"

  • This is a valid argument too. My counter-answer is that you are working in a shared server environment – your slow view impacts others. If you had your own dedicated server, the admin would not care whether your view takes 5 minutes or much longer.

Now the auto-alert solution: create a VizAlert that is sent to workbook owners about their slow workbook views automatically on a weekly basis. More details and tips for the VizAlert:

  1. Use the weekly average render time to avoid one-off outliers
  2. Define your alert threshold; I use a 30-second weekly average
  3. Include the number of weeks the slow view/workbook has been on the alert
  4. If a workbook is on the alert in week 1 but drops off the next week, that is great – the owner took action!
  5. However, if the workbook is on the alert in weeks 1, 2, and 3, a conversation may be needed, or you may want to trigger the enforcement described in my next blog
  6. Be sure to include a link to best practices for designing efficient Tableau workbooks and other internal resources.
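The selection logic behind tips 1-5 can be sketched as follows (the 30-second threshold mirrors tip 2; the data shapes and function names are illustrative assumptions):

```python
# Flag workbooks whose weekly average render time exceeds the threshold,
# and count how many consecutive weeks they have been on the alert.

THRESHOLD_SECS = 30  # weekly-average render-time threshold (tip 2)

def weekly_average(render_times):
    """Average of one week's render times, in seconds."""
    return sum(render_times) / len(render_times)

def consecutive_weeks_on_alert(weekly_averages, threshold=THRESHOLD_SECS):
    """Count trailing consecutive weeks above the threshold."""
    streak = 0
    for avg in reversed(weekly_averages):
        if avg > threshold:
            streak += 1
        else:
            break
    return streak

# A workbook slow for 4 weeks in a row would trigger enforcement.
history = [12, 35, 41, 38, 33]  # weekly averages in seconds
print(consecutive_weeks_on_alert(history))
```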


Automation – Timeout long subscriptions and auto send email to workbook owner

Why Subscriptions are not preferred for Tableau server?

  1. Subscriptions send out email at defined intervals, which is convenient for some users. However, Tableau's strength is interactivity; subscriptions are counter-interactive.
  2. There is nothing wrong with users getting Tableau views in their inbox. The problem is that server admins and workbook owners have no way to tell whether users open the emails at all. Over time, admins can't tell whether users are actually using Tableau server.

What is the worst scenario for subscriptions?

I found that some ‘smart’ publishers misuse subscriptions: they know their workbook renders too slowly (say, a 30-minute render) for users to view interactively, so they use subscriptions to email the views instead. Why is this bad?

  • It can consume a lot of backgrounder capacity
  • It is simply bad behavior by publishers who should spend more time tuning their dashboards for better performance
  • It impacts other extract jobs. As a matter of fact, backgrounder priority is subscriptions, then incremental extracts, then full extracts. Imagine a 30-minute subscription job for 20 people – it can block a lot of small (2-3 minute) important extracts.

How do you implement a server governance process so that offending publishers feel the pain?

  • Reduce the subscriptions.timeout from default 30 mins to a shorter time, like 2-5 mins.
  • This value applies separately to each view in the workbook, so the total length of time to render all the views in a workbook (the full subscription task) may exceed this timeout value.
  • For example, if subscriptions.timeout is 3 minutes and the workbook has 4 views, the workbook could run up to 3 × 4 = 12 minutes before all views time out
  • The command is tsm configuration set -k subscriptions.timeout -v 180 (where 180 seconds is 3 minutes), followed by tsm pending-changes apply
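The per-view math above can be sketched in a couple of lines (the function name is mine, not a Tableau API):

```python
# subscriptions.timeout applies per view, so a workbook's worst-case run time
# under a subscription is the per-view timeout times the number of views.

def worst_case_subscription_secs(timeout_secs, view_count):
    """Upper bound on total render time for one subscription task."""
    return timeout_secs * view_count

print(worst_case_subscription_secs(180, 4))  # 720 seconds = 12 minutes
```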

What is the problem of this subscriptions.timeout?

The problem is that Tableau server sends the following failure notification to subscribers, who have no idea what happened. Many subscribers assume something is wrong with the server, while the workbook owner has no idea about the timeout.

The view snapshot in this email could not be properly rendered.

To see the view online, go to link to the workbook view

After 5 consecutive failures like this, the subscription is suspended automatically.

What is the solution to streamline the subscriptions.timeout message?

  1. Create a VizAlert that looks for the subscriptions.timeout error
  2. Find the workbook owner based on workbook_id
  3. Create a VizAlert message about the subscription timeout policy
  4. Send the VizAlert to both the subscribers and the workbook owner

Conclusion:

  1. Necessary governance processes are required to scale Tableau to the enterprise
  2. The core concept of governance is to encourage good behaviors and discourage bad ones.
  3. Subscribing users to a super-slow workbook is a bad behavior
  4. Reducing subscriptions.timeout to a few minutes (for example: tsm configuration set -k subscriptions.timeout -v 180) and using a VizAlert to notify the workbook owner about the timeout is one solution to counter this bad behavior.

Automation – Swap Backgrounders and VizQL dynamically

Tableau v2018.2 introduced Tableau Services Manager (TSM), which is effectively a new Tableau server architecture. Tableau does not change its architecture often at all; I do not remember when it last did. TSM comes with many awesome features, like support for dynamic topology changes, no dedicated backup primary, and installing a new server version while the current version is still running. This blog talks about dynamic topology changes.

The problem statement: Tableau server licenses are expensive. How do you get more out of your existing core server licenses? Or more efficiency out of your hardware infrastructure under the subscription model?

Contents: Tableau's backgrounder handles extract refreshes and VizQL handles viz rendering. Often VizQL has more idle time during the night, while the backgrounder has more idle time during the day. Is it possible to automatically configure more cores as backgrounders during the night and more as VizQL during the day? With TSM, this dream comes true.

How? Here is the script:

set PATH_BIN="D:\Tableau\Tableau Server\packages\bin.2019.1.xxx"
:: location of the log file
set PATH_LOGFILE="xxxx\topology.log"
set DATE_TODAY=%DATE:/=-%
set USERNAME=""
set PASSWORD=""
set BG=4
set VIZ=0
set NODE=node2
cd /d %PATH_BIN%

echo %date% %time%: ##### Topology change started ##### >>%PATH_LOGFILE%

echo %date% %time%: Check current topology >> %PATH_LOGFILE%
call tsm topology list-nodes -v -u %USERNAME% -p %PASSWORD% >> %PATH_LOGFILE%

echo %date% %time%: Changing topology to %BG% backgrounders >> %PATH_LOGFILE%
call tsm topology set-process --count %BG% --node %NODE% --process backgrounder -u %USERNAME% -p %PASSWORD% >> %PATH_LOGFILE%

echo %date% %time%: Changing topology to %VIZ% vizql >> %PATH_LOGFILE%
call tsm topology set-process --count %VIZ% --node %NODE% --process vizqlserver -u %USERNAME% -p %PASSWORD% >> %PATH_LOGFILE%

echo %date% %time%: Listing Pending Changes >> %PATH_LOGFILE%
call tsm pending-changes list >> %PATH_LOGFILE%

echo %date% %time%: Apply Pending Changes >> %PATH_LOGFILE%
call tsm pending-changes apply >> %PATH_LOGFILE%

echo %date% %time%: check changes have been applied >> %PATH_LOGFILE%

call tsm topology list-nodes -v >> %PATH_LOGFILE%

echo %date% %time%: ##### script completed ##### >> %PATH_LOGFILE%
:: Finish

Important Notes :

  • It actually works better from v2019.1 onwards. We found an issue with in-flight extract tasks when backgrounders were reduced, but that issue is fixed in v2019.1.
  • How do you decide the timing to swap backgrounders vs VizQL? It all depends on your server usage pattern, which you can find in the historical_events table (user clicks) and backgrounder_jobs (number of extracts by hour).
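To pick the swap windows, aggregate activity by hour of day and look at where the interactive peaks and the extract peaks fall. A sketch with made-up counts (in practice the dictionaries would come from queries against historical_events and backgrounder_jobs; the function name is mine):

```python
# Find the busiest hours for interactive use vs extract jobs, to decide
# when to swap cores between VizQL and backgrounders.

def busiest_hours(counts_by_hour, top_n=3):
    """Return the top_n hours (0-23) with the highest counts."""
    return sorted(counts_by_hour, key=counts_by_hour.get, reverse=True)[:top_n]

clicks = {h: 0 for h in range(24)}
clicks.update({9: 500, 10: 620, 14: 480})    # daytime interactive load
extracts = {h: 0 for h in range(24)}
extracts.update({1: 300, 2: 350, 3: 280})    # overnight extract load

print(busiest_hours(clicks))    # VizQL-heavy hours
print(busiest_hours(extracts))  # backgrounder-heavy hours
```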

Please comment if you have a better script or any other questions.

 

Automation – Remove any permissions for All Users group

Tableau server has a built-in All Users group in each site. As the name says, it contains all users. It can be useful when you need to share content with everyone. However, we found that the All Users group is a really bad feature for a large enterprise server – it is too easy for content owners to mistakenly grant permissions to All Users unintentionally, since it always shows at the top of the user & group list when granting permissions. The impact can be huge when a sensitive dashboard is mistakenly shared with All Users.

How to fix this problem? I tried many things, and what works is automatically removing any permissions granted to the All Users group. Before I share how to do this, let me explain the other options I explored and why they did not work:

  • Can you delete the All Users group? Server admins can't delete it from the server UI, but I found a way to delete it directly from the Postgres database (you need a read/write user/password). The problem is that every user lost his or her permissions after All Users was deleted. It appears that Tableau uses All Users in its internal permission process.
  • Can you rename the All Users group? Server admins can't rename it from the server UI either, but again you can rename it directly in the Postgres database (read/write user/password required) to something like ZZ_All Users. Unfortunately, it still shows at the top of the list even after being renamed to ZZ_All Users.

How to make sure nobody uses the All Users group? Unfortunately, I can't find any option other than deleting any possible ‘wrong’ permissions after the fact. It actually works well.

How to query All Users group permissions? See the key joins in the Postgres workgroup database described below, then filter the grantee to All Users.

You need to join next_gen_permissions with an identities custom SQL (select 'User' as Type, users.id as id, system_users.name as name
from users
join system_users on users.system_user_id = system_users.id
UNION
select 'Group' as Type, id, name from groups)

The objects side is another custom SQL (select 'Workbook' as Type, id, name, site_id from workbooks
UNION
select 'View' as Type, id, name, site_id from views
UNION
select 'Project' as Type, id, name, site_id from projects)

You can find the workbook @ Unused groups share.twb

You can use Python, Java, or whatever scripting tool you prefer. Running it daily would be good enough.
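Whatever tool you pick, the core of the daily job is just filtering the permission rows from the query above down to All Users grants and deleting those via the REST API. A sketch with assumed row fields (the field names and sample data are illustrative, not the exact Postgres column names):

```python
# Keep only the permission rows granted to the "All Users" group;
# those are the grants the daily job should delete.

ALL_USERS = "All Users"

def grants_to_remove(permission_rows):
    """permission_rows: dicts with grantee_type, grantee_name, object."""
    return [r for r in permission_rows
            if r["grantee_type"] == "Group" and r["grantee_name"] == ALL_USERS]

rows = [
    {"grantee_type": "Group", "grantee_name": "All Users", "object": "Sales WB"},
    {"grantee_type": "User",  "grantee_name": "joe",       "object": "Sales WB"},
    {"grantee_type": "Group", "grantee_name": "Finance",   "object": "HR WB"},
]
print(len(grants_to_remove(rows)))  # number of grants flagged for deletion
```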

BTW, this is NOT a Tableau-supported approach; use it at your own risk.

 

 

 

Automation – Workbook/Disk size governance

My previous post (Automation – Advanced Archiving) talks about auto deletion of unused workbooks. Over time, 2/3 of workbooks were removed, and both business and IT are happy about it: business users see only active and useful assets in their Tableau portal, and IT's backup/restore is faster.

However, I still found some active workbooks that are too big – 10GB+ with extracts, for example. Tableau server has no control to stop large data sources/workbooks from being published. Most of those large workbooks perform badly anyway.

How to discourage the bad behavior of publishing very large workbooks? How to govern disk size?

The answer is again deletion.

1. Delete large unused workbooks more aggressively

The best way to encourage smaller workbook sizes is to delete large workbooks more aggressively. For example, if your regular policy is to delete workbooks not used for 3 months, you can introduce a size factor:

  • Delete workbooks not used for 2 months if workbook size between 2G-5G
  • Delete workbooks not used for 1 month if workbook size between 5G-10G
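The tiered policy above reduces to a small lookup. A sketch (thresholds follow the bullets plus the 3-month regular policy; the function names are mine):

```python
# Size-tiered retention: bigger workbooks get a shorter idle allowance.

def allowed_idle_days(size_gb):
    """Days a workbook may go unused before deletion, by size tier."""
    if 5 <= size_gb < 10:
        return 30    # ~1 month for 5-10GB
    if 2 <= size_gb < 5:
        return 60    # ~2 months for 2-5GB
    return 90        # regular policy: ~3 months

def should_delete(size_gb, days_unused):
    return days_unused >= allowed_idle_days(size_gb)

print(should_delete(3.5, 75))  # 2-5GB tier allows only ~60 idle days
print(should_delete(1.0, 75))  # regular 90-day policy still applies
```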

2. Delete very large active workbooks

Can you have a policy to delete super-large but actively used workbooks? It really depends on your corporate policy and business-IT relationship. I have a policy to delete any workbook larger than 10GB daily – even if it is actively used. How does it work?

  • Business and IT agree on the policy: no workbook can be larger than 10GB on the server. Unfortunately, Tableau server does not have this feature, so we have our own automation program that runs hourly (it can be daily) to delete any workbook > 10GB in size.
  • Of course, a deletion notification is sent to the workbook owner with the policy stated in the message.

3. How to handle the situation that workbook size gradually increasing to the enforced deletion threshold? 

  • A separate size alert is necessary to let the data source / workbook owner know that the workbook is inches away from being deleted, so the owner can take action.

Feel free to add your comments ….

Automation – Advanced Archiving

My previous post (Automation – Set Usage Based Extract Schedule) provides a practical server governance approach that re-schedules self-service publisher’s extracts based on workbook usage automatically.

This blog talks about handling old workbooks that nobody uses anymore over a period of time.  The keyword is archiving. Many server admins are doing archiving. The tips and tricks in this blog will enlighten your thinking about this topic, which is why I call it advanced archiving.

  1. Do not archive but delete

The common IT way of doing things is to copy the ‘old’ workbooks/data sources somewhere else, so business workbook/data source owners can download them when needed. This is the old way of doing things, since it creates more support work for the technical team (workbook owners cannot find the archive URL or workbook, etc.). The much better way is no archiving at all – just deletion, then send the deleted workbooks to their owners.

2. Send old workbooks to owners automatically

For each workbook that meets the deletion criteria, call the server API (GET /api/api-version/sites/site-id/workbooks/workbook-id/content) to download the workbook. If the workbook is a .twb, perfect. If it is a .twbx, rename it to .zip, unzip it, ignore the .hyper (or .tde) files, and keep only the .twb. Then email the .twb to the workbook owner (and project leaders if needed) as an attachment. Key benefits:

  • Workbook owner can always search their email inbox to get the deleted workbooks if they need to re-publish again later on.
  • Do not email .hyper (or .tde) due to its size and data security concerns
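Step 2's twbx handling can be sketched with the standard zipfile module, since a .twbx is just a zip archive (the paths and function name are illustrative):

```python
# Pull the .twb out of a .twbx, skipping .hyper/.tde extract files,
# so only the small workbook definition gets emailed to the owner.

import os
import zipfile

def extract_twb(twbx_path, out_dir):
    """Extract only .twb members from a .twbx archive; return their paths."""
    extracted = []
    with zipfile.ZipFile(twbx_path) as z:   # a .twbx is zip-formatted
        for name in z.namelist():
            if name.lower().endswith(".twb"):
                z.extract(name, out_dir)
                extracted.append(os.path.join(out_dir, name))
    return extracted
```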

3. Delete first, then send notifications

It is a common mistake for server admins to send owners a list of workbooks to confirm before archiving, which creates unnecessary clicks on the server. Delete the old workbooks from the server first, then send the notification with the .twb attached and a policy link in the email body.

4. Delete more aggressively for larger workbooks

How do you define old? Some use 180 days, but I use 15-90 days depending on the size of the workbook:

  • Regular workbooks get deleted if not used for 90 days
  • Workbooks over 2GB get deleted if not used for 30 days
  • Workbooks over 5GB get deleted if not used for 15 days

5. Delete published data sources as well

When you delete workbooks, some published data sources end up with no connected workbooks over time:

  • Delete a standalone data source if it was created more than 2 (or 4) weeks ago – you do not want to delete recently published data sources

6. Technical implementation details

Use the historical_events table to drive the usage calculations. Include the number of days without usage in the email body along with the policy, so the workbook owner does not have to guess why the workbook was deleted. If you use size criteria as well, include the workbook size in the email body too.

7. Get buy-in from business management for those policy

You want to get buy-in from business leaders for these policies, document the policy, and always include a link to the policy in the email notification. Getting buy-in is a lot easier than most people think. Why? Business loves the fact that server deletion makes it much easier for interactors to find active content. The higher the level at which you engage, the easier the buy-in.

8. How to identify those workbooks not used for long time?

One way is to use the following criteria :

select views_workbook_id, (now()::date - max(last_access_time)::date) as days_unused
from _views_stats
group by views_workbook_id
having (now()::date - max(last_access_time)::date) > 90

 

Download the Tableau Workbook Archiving Recommendation.twb

Updated on June 8, 2019: Pls read Automation – Data Source Archiving

 

FEATURE ADOPTION – HOW TO SET UP SERVER CACHE

Updated Jan 2024:

Cache is one of the most confusing things about Tableau server. This blog tries to answer the following questions:

  • How does Tableau server cache work?
  • What server settings control cache expiration?
  • Is the cache refreshed after an extract refresh?
  • Does cache work the same way for live connections and extracts?
  • Can the server have different cache policies for live connections and for extracts?
  • Can specific workbooks or data sources have customized cache policies?

1. How does Tableau server cache work?

Cache stores the results of earlier computations so that future requests for that data can be served faster. From a server scalability and view performance perspective, the more the server caches, and the longer it holds the cache, the better. On the other hand, when a query hits the cache, users are not getting the latest data from the data source (extract or live connection).

You may say that an interactor can always click the refresh button to bypass the cache. However, we can't expect interactors to click twice for every view. So it is important to configure the cache to balance performance against data freshness.

2. What are the server settings to control cache?

There are 2 settings that control the behavior of the server cache:

1) The most important setting is overall server data cache setting that applies to whole server.

tsm data-access caching set -r <value>

Where <value> is one of these options:

  • low. This is the default and configures Tableau Server to cache and reuse data for as long as possible.
  • <n>. "<n>" specifies the maximum number of minutes data should be cached.
  • always or 0 (zero). These values indicate that the cache should be refreshed each time a page is reloaded.

Apply changes with the tsm pending-changes apply command. This will restart Tableau Server.

2) The second setting is the site-specific "Workbook Performance after a Scheduled Refresh" option – pre-compute workbooks viewed recently.

When you turn on this feature, you can also increase or decrease the number of workbooks that are cached after a scheduled refresh with the following tsm configuration set option:

   backgrounder.externalquerycachewarmup.view_threshold

By default, the threshold is set to 2.0. The threshold is equal to the number of views that a workbook has received in the past seven days divided by the number of refreshes scheduled in the next seven days. (If a workbook has not been viewed in the past seven days, it is unlikely that it will be viewed soon, so Tableau Server does not spend resources recomputing queries for the workbook.)
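The threshold arithmetic can be written out as a one-line check (the function name is mine; the 2.0 default matches the description above):

```python
# A workbook is warmed only when
#   views in the past 7 days / refreshes scheduled in the next 7 days
# meets the configured threshold (default 2.0).

def warm_cache(views_past_7d, refreshes_next_7d, threshold=2.0):
    if refreshes_next_7d == 0:
        return False  # nothing scheduled, nothing to warm
    return views_past_7d / refreshes_next_7d >= threshold

print(warm_cache(20, 7))  # 20/7 clears the default threshold
print(warm_cache(5, 7))   # rarely viewed relative to refreshes
```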

3.  How to configure server cache?

  • Step 1: Turn on ‘Pre-compute workbooks viewed recently’, then monitor overall server view render time and server memory over a period of time (a week or a few weeks)
  • Step 2: If render time has not improved and memory is not too high, you can include more workbooks in cache warming by changing backgrounder.externalquerycachewarmup.view_threshold to 1, then monitor again
  • Step 3: Increase the number of minutes data should be cached and continue to monitor server view render time and memory. On my server, I did not see a render-time impact until the cache duration reached 360+ minutes. If your server has no live connections and most extracts run daily, holding the cache for 24 hours or longer can be a good setting. However, some live-connection workbooks may have stale-data problems with a 24-hour cache.

4. How to set server to have different cache policy for live connections and extracts?

I had to hack this in the past, but Tableau now has an out-of-the-box feature that lets content owners set a live-connection cache (data freshness) policy at the individual workbook level.

To edit a workbook data freshness policy, you must be the workbook owner, and the workbook must have a live connection to the data source.

  1. Sign in to a site on Tableau Cloud or Tableau Server.
  2. From the Home or Explore page, navigate to the workbook you want to set a policy for.
  3. Click the details icon (an "i" inside a circle).
  4. From the Workbook Details dialog, click Edit Data Freshness Policy.
  5. Choose one of the following options:
    Site default (12 hours)
    Always live (Tableau will always get the latest data)
    Ensure data is fresh every
    Ensure data is fresh at
  6. Click OK.

Leveraging server cache is critical to scaling Tableau on a large server platform. It is tricky to get the cache right, but you will be greatly rewarded: overall view render time can improve by 30%+ when your server cache settings are right.

Read additional notes @ 

FEATURE ADOPTION – DASHBOARD EXTENSIONS

The good: Dashboard extensions give you the ability to interact with data from third-party applications directly in Tableau. Capabilities like write-back to a database, custom actions, and deep integration with other apps are all at your fingertips.

The bad: Dashboard extensions also mean potential data vulnerability when a third-party extension is used, even on Desktop alone:

  • An extension can access the workbook's summary data by default, and full data with additional confirmation.
  • An extension can access the user's IP address, Tableau Desktop or browser version, screen resolution, and device type.

How to adopt Dashboard Extensions at large enterprise?

  1. Extension for Desktop:
    • Extensions should be turned off by default on Desktop if your company controls users' Desktop installations
    • Some super technical Desktop users can turn extension on by themselves. Read here for details.
  2. Extension for Server :  Tableau server should have the following policy enforced:
    • Unknown extensions can't run on Tableau server – this is the most important setting. Just as the guest account should be turned off by default, "enable unknown extensions" should be off by default.
    • Unfortunately, you will have to do this for every single site – even if your default site has this turned off, newly created sites will still have it enabled by default. Please vote for the IDEA
    • Every extension has to be added to the safe list by server admins
    • Hopefully server admins have a policy that only https://*.company.com/xxx URLs can be on the safe list. That means a third-party extension has to be hosted on-premise before it can be used.
  3. Extension Gallery:
      • Some people may not agree with me here. For me, any third-party extension is unsafe, since its author can change the extension definition without your knowledge – including those in the Extension Gallery on the official Tableau website
      • The secure approach requires all extensions to be hosted on your company's own web server.
      • At a high level, an extension is not safe if it is hosted outside your company. An extension is considered ‘safe enough’ if it is hosted within your company's firewall.
      • Large enterprises should consider creating their own extension gallery so publishers can share extensions within the firewall.

Watch the webinar for the recommended settings and Tableau's plan to make extensions inherently secure – short term, mid-term, and long term.

SCALING TABLEAU (5/10) – LICENSE MANAGEMENT

Tableau license management has been a big pain point in scaling Tableau. This blog covers the following:

  • Tableau license types
  • What is in your End User License Agreement
  • How to get the most out of your Tableau licenses
  • Desktop and server license management – the enterprise approach

1. Tableau license types

Tableau has the following licenses:

  • Tableau Creator: Includes Tableau Prep (for data profiling, shaping, and filtering before visualization) and Tableau Desktop (for creating beautiful vizzes). New customers get the subscription model (pay as you go) only, while customers from before subscriptions were available can stay on the old model as long as they pay license renewals. It also covers a publisher seat on Tableau server if a server is used.
  • Tableau Explorer: This is a server-side license. It allows users to web edit existing workbooks and to create/publish new workbooks from existing published data sources on the server. Of course, it also allows full interaction with published content.
  • Tableau Viewer: This is a server-side license as well. Viewers can't web edit existing workbooks and can't create/publish new workbooks from published data sources. It is for interacting with published content; viewers can't create custom views and can't download full data, only summary data.
  • Tableau Server user-based: For small- to medium-scale sharing and collaboration. One publisher or one interactor takes one seat. If you purchased 100 user-based licenses, you can assign a total of 100 named users on the server – you can change them as long as the total never exceeds 100 users at any given time. Tableau offers subscriptions for user-based licensing, allowing you to use and update the server for a specific period of time.
  • Tableau Server core-based: For medium- to large-scale sharing and collaboration. If you have 16 cores, you can have an unlimited number of interactors or publishers as long as your server is installed on machines with 16 cores or fewer in total. Tableau also offers subscriptions for core-based licensing.
  • Tableau online: Similar to Tableau Server user-based but it is on Tableau’s cloud platform.
  • Enterprise License Agreement (ELA): You pay a fixed amount to Tableau for 3 or more years then you will enjoy ‘tableau buffet’ – get unlimited and all types of licenses.

2. What is your End User License Agreement

Nobody wants to read the End User License Agreement. Here is summary of what you should know:

  • Each Desktop license can be installed on two computers of the same user. You may get a warning when you try to activate a 3rd computer.
  • If a Desktop license key was used by Joe, who left the company or no longer uses it, the key can be transferred to someone else. The correct process is to deactivate the key on Joe's machine and reactivate it on the new user's machine.
  • If you have .edu email, you are lucky as you can get free Desktop as students or teachers.
  • If  you are part of small non-profit org, you can almost get free Desktop licenses.
  • Each server key can be installed in 3 instances: one prod and two non-prod.
  • What if you need 4 instances: prod, DR, test, and dev? Say you have two core-based keys: key A with 8 cores and key B with 8 cores. You can activate both keys in prod and DR (16 cores each), then use key A alone (8 cores) for test and key B alone (8 cores) for dev. You are fine as long as each server key is used in 3 or fewer instances.
  • What if you do not want to pay maintenance fee anymore? Since it is perpetual licenses, you are still entitled to use the licenses even you do not want to pay maintenance fee. What you are not entitled anymore is upgrade and support.

3. How to get most out of your Tableau licenses

  • If the registration info (name, email, last installed, product version) in the Tableau Customer Portal – Keys report is null, the key has never been used, so you can re-assign it to someone else. You may be surprised how many keys are never used.
  • If the registration info (name, email, last installed, product version) in the Tableau Customer Portal – Keys report is associated with someone who left the company and the key has a single registration, you can re-assign it to someone else.
  • If the registered product version is very old, the key owner is likely not an active Desktop user.
  • Enable Desktop license reporting when you upgrade to v10 to see who has not used Desktop in the last few months. Then you can potentially get those licenses transferred (see below for more).

4. Desktop and Server license management – enterprise approach

When you have hundreds of Desktop licenses, you will need the following approaches to scale:

  • Co-term all of your licenses for easy renewals. Co-terming means having the same renewal date for all of your Desktop and Server licenses: both what you already have and new purchases. This may take a few quarters to complete. Start by picking one renewal date, then agree on it with your Tableau sales rep, renewal rep, purchasing department, and users.
  • Give the Tableau champion visibility into every team's Tableau licenses in the Customer Portal. Tableau's land-and-expand sales approach creates multiple accounts in the Customer Portal, and each team can only see its own keys and renewals. If you drive enterprise Tableau, ask for access to all accounts in the Customer Portal.
  • Automate the Desktop installation, activation, and registration process. Whether you are in a Windows or Mac environment, you can automate Desktop installation, activation, and registration via command lines. Read details. This feature became available for Prep as well in 2018.1.2, although Prep's Mac installation is designed differently from Desktop's: Prep's plist is installed in the root user directory, so a silent Mac installation needs to copy "/var/root/Library/Preferences/com.tableau.Registration.plist" to "$homedir/Library/Preferences/com.tableau.Registration.plist" for registration to succeed. This is true for both 2018.2 and 2018.1, although Tableau may change this behavior later.
  • Transition to a single master key. Tableau Desktop supports a single master key: instead of having 500 individual Desktop keys, you can consolidate them all into one master key that can be activated by 500 users. The prerequisite is co-terming all individual keys. A few important notes:
    • When the single master key is created, make sure to ask Tableau to turn on the hidden-key feature so Desktop users will not see the key anymore – you do not want the master key to leak. On Desktop, the 'Manage Product Keys' menu then no longer shows up.
    • This also means you will have to use a quiet installer so the key can be activated without user interaction.
    • The hidden Manage Product Keys feature also became available for Tableau Prep in 2018.1.2, although Prep has a separate key from Desktop.
    • If some users have two work computers with Tableau Desktop installed on both, Tableau may count one user as two installs, which will throw off your total license counts. The Tableau license team can help you sort this out.
  • Enable Desktop License Reporting in v10. This is an awesome feature for tracking Desktop usage even when Desktop users do not publish. The challenge is how to change each user's laptop. Here is what you need to know:
    • It works only if both Desktop and Server are on v10. Use v10.0.2 or above, as earlier v10 versions are buggy.
    • This feature is turned off on the server by default; you can turn it on using tabadmin:
      tabadmin set features.DesktopReporting true
      tabadmin config
      tabadmin restart
    • The most difficult part is updating each Windows Desktop's registry or Mac Desktop's plist to point to the Tableau server where license usage should be sent. The best way is to bake this into your Desktop v10 installer (see the automated installation, activation, and registration process above).
    • Point all of the company's Desktops to one Tableau server, even if Desktop users publish to different servers. This way you have one place to see all enterprise Desktop usage.
    • By default, Tableau Desktop v10+ pings Tableau Server v10+ for usage reporting every 8 hours. You can configure the interval on Desktop via the Windows registry or the Mac plist. Mac plist example:
      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
      <plist version="1.0">
        <dict>
          <key>Server</key>
          <string>https://mytableau02:8010,http://mytableau</string>
          <key>scheduleReportInterval</key>
          <string>3600</string>
        </dict>
      </plist>
    • The Desktop usage ping (every 8 hours) is sent only to your own Tableau server, not to Tableau the company. All that is sent to Tableau from Desktop is the registration info. Of course, the registration info is also sent to your configured Tableau server(s).
    • Which table has Desktop usage? The Postgres table is desktop_reporting.
    • What dates does desktop_reporting have? It has 4 date columns:
      • Maintenance expiration date
      • Expiration date (3 months after the maintenance expiration date)
      • Registration date (when registered)
      • Last report date (when Desktop was last used). Notice it captures only the last time Desktop was used; if you want to know how often Desktop was used in the past 3 months, you can't tell.
    • How can you tell historical Desktop usage? Build an incremental refresh of desktop_reporting keyed on the last report date, and you will build out your own history for better Desktop license reporting. I am sure Tableau is working on making this small table historical as well.
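The incremental-refresh idea above can be sketched in Python. Note this is a hedged sketch: desktop_reporting only keeps the last report date per install, so you snapshot it periodically and keep every (user, last report date) pair you have ever seen. The query and column names below are assumptions and vary by server version; run the snapshot against the repository with a read-only connection (psycopg2, for example).

```python
# Assumed column names -- verify against your repository version.
SNAPSHOT_SQL = "SELECT registration_email, last_report_date FROM desktop_reporting"

def merge_snapshot(history, snapshot):
    """Append only the (user, last_report_date) pairs not already in history."""
    seen = {(r["email"], r["last_report_date"]) for r in history}
    for r in snapshot:
        key = (r["email"], r["last_report_date"])
        if key not in seen:
            history.append(dict(r))
            seen.add(key)
    return history
```

Persist `history` anywhere you like (a CSV, your own table) and feed it to a Tableau workbook for true usage-over-time reporting.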

In summary, Tableau Creator and server license management is not a simple task. Hopefully these tips and tricks of the enterprise approach will ease your pain. It is good practice to build these out step by step before you get too big or too messy.

 

FEATURE ADOPTION – TABLEAU PREP

Tableau Prep is a new product from Tableau designed to help Desktop users quickly and confidently combine, shape, and clean their data for analysis. The direct, visual experience gives you a deeper understanding of your data and makes data prep easier and more accessible.

1. Tableau Prep cool features:

  • Desktop integration: Tableau Desktop can be launched to preview the results at almost any Prep step
  • Data profiling pane: visualize data values and distributions
  • See your data at each of the data cleaning, shaping, filtering process
  • Repeatable process: click 'Run' to get new output when source data changes
  • Push output to server

2. Tableau Prep limitations:

  • If you are doing data preparation mainly using Excel or join/union between CSV and other datasets, Tableau Prep is for you!
  • Prep does not replace any existing enterprise ETL tool
  • Prep has about 20 data source connectors, while Desktop has 50+. Tableau is working on new Prep connectors
  • Prep output can't be scheduled for auto-refresh on Tableau Server. Tableau is working on a Tableau Prep server feature
  • A powerful PC or Mac is needed for Prep to process complicated logic on large volumes of data

3. Tableau Prep output:

  • The final Prep output can be .csv, .tde, or .hyper. Both .tde and .hyper are Tableau data engine formats that can only be opened by Tableau Desktop; .hyper is a new format that can only be opened by Desktop 10.5.* and above.
  • You can open a previously saved .tfl Prep workflow for further editing
  • You can also open a previously saved .tfl Prep workflow just to refresh the output when source data changes
  • Prep output can be pushed to Tableau Server as a published data source. However, Tableau Server can't refresh a Prep workflow; the Tableau Prep community is coming up with workarounds for this.

4. Publish Prep output to Tableau server – detailed steps:

  • Publisher permission is required for this feature, and again this is not auto-refresh
  • Add an output step to your Prep flow
  • Select 'Publish as a data source'
  • Select a server, sign in, then type in the full URL of your Tableau server
  • Enter your Tableau server user name and password when asked
  • After sign-in, select the project where you want the data source to be published
  • Give a name for the data source to be published on the server
  • Add a description of the data source (optional)
  • Click 'Run Flow'
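The same publish can be scripted. Here is a hedged sketch using the Tableau Server Client Python library (`pip install tableauserverclient`); the server URL, credentials, project name, and file path are all placeholders, and the exact API surface depends on your TSC version.

```python
import os

def datasource_name(file_path):
    """Default published name: the Prep output file name without extension."""
    return os.path.splitext(os.path.basename(file_path))[0]

def publish_prep_output(server_url, user, password, site, project_name, file_path):
    # Imported lazily so the helper above stays usable without the library.
    import tableauserverclient as TSC
    server = TSC.Server(server_url)
    with server.auth.sign_in(TSC.TableauAuth(user, password, site)):
        projects, _ = server.projects.get()
        project = next(p for p in projects if p.name == project_name)
        item = TSC.DatasourceItem(project.id, name=datasource_name(file_path))
        return server.datasources.publish(item, file_path,
                                          TSC.Server.PublishMode.Overwrite)
```

This still does not give you scheduled refresh of the Prep flow itself; it only re-publishes whatever .hyper/.tde the flow last produced.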

FEATURE ADOPTION – PRE-COMPUTE WORKBOOKS VIEWED RECENTLY

Tableau 10.3 has a big feature for server admins: automatic query cache refresh after extracts, also called 'Pre-compute workbooks viewed recently'.

It is a new flag that can be enabled per site without downtime. My Tableau server uses a lot of extracts; after we enabled this feature, overall server render time improved 20% across all views, which is huge!

It sounds simple, but a few other factors need to be in place for it to work. To keep it simple, here is what I recommend other server admins do, step by step:

    1. Get a baseline for your server's overall render-time performance. You can use 'Performance of Views' from the Tableau admin views.
    2. Enable this flag for sites that have a lot of extracts. If you do not use sites, it is the 'Pre-compute workbooks viewed recently' checkbox on the default site. Changing this flag does not require any downtime.
    3. Observe overall server 'Performance of Views' improvements after step 2.
    4. If there is no improvement at all, check your 'server connection caching' setting in the server configuration. If the setting is 'balanced', how many minutes do you have? The duration has to be long enough; otherwise, after an extract refresh the server pre-computes the cache, only for 'server connection caching' to dump it. I had to change 'server connection caching' to balanced at 1440 minutes to see performance improvements. Please note: every change to the 'server connection caching' setting requires a server restart.
    5. You are not done yet. Unfortunately, 'server connection caching' impacts ALL server connections, including live connections. When you change it to 12 or 24 hours, the million-dollar question is whether this is OK for live connections. 24 hours means live connections will hold their cache for 24 hours as well, which may be fine if the source data does not change often, and may not be fine at all for other live-connection use cases.
    6. Can you make the 'server connection caching' setting bypass all live connections and apply only to extracts? Unfortunately, there is no such server setting. Here are two workarounds:
      • Workaround 1: Embed all live-connection views in a portal, or in another Tableau workbook view, with refresh=yes, which bypasses all caches. It is a little work per workbook, but it works. If you have a lot of live-connection workbooks whose data changes often, you may have to go with workaround 2.
      • Workaround 2: Find a way to dump live-connection workbook caches independently of the out-of-the-box 'server connection caching' setting. There is no simple way; you will likely need to work with Tableau Professional Services, who have a solution that changes each live-connection workbook's last-updated timestamp, which dumps the cache. How does it work? Whenever you publish a new version of a workbook, the server has to dump its cache. Behind the scenes, the server uses the workbook's updated-at timestamp to decide whether to dump the cache, in addition to the 'server connection caching' setting. If you use Python to change the workbook timestamp hourly, the workbook cache gets dumped hourly. This is not a Tableau-supported option, since it updates the Postgres DB directly, but it works with relatively low risk.
    7. Observe the server CPU and memory impact of the above changes. If CPU or memory is high, increase the following threshold; if there is no impact, decrease the threshold to warm up more workbooks:
      • Configure the workbook cache-warmup threshold: tabadmin set backgrounder.externalquerycachewarmup.view_threshold 2 (# of views of the workbook in the last 7 days / # of refreshes in the next 7 days)
    8. Of course, if server CPU or memory is much higher after enabling this feature, you may have to turn it off.
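Workaround 2 from step 6 can be sketched in a few lines of Python. To repeat the caveat above: this writes to the Postgres repository directly, which Tableau does NOT support, so test on a non-prod instance first; the table and column names match the 10.x 'workgroup' schema as described, and the connection object is assumed to be a psycopg2 connection.

```python
# Touch updated_at for chosen live-connection workbooks so the server
# drops their cache on the next request. Schedule hourly (cron) for
# hourly cache dumps. UNSUPPORTED: direct repository write.
CACHE_BUST_SQL = ("UPDATE workbooks SET updated_at = NOW() "
                  "WHERE name = ANY(%(names)s)")

def bust_workbook_cache(conn, workbook_names):
    """conn: psycopg2 connection to the 'workgroup' repository DB."""
    with conn.cursor() as cur:
        cur.execute(CACHE_BUST_SQL, {"names": list(workbook_names)})
    conn.commit()
```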

 

Read more @ Tableau 10.3 and 10.4 new feature demo for admins, and download the attached workbook to decide whether you should override Tableau Server's default settings.

Feature Adoption – Lock Project Permission

Lock project permission is a great Tableau Server feature. This blog covers the following:

  • What is lock permission
  • Why lock permission
  • Who can lock/unlock
  • How it works for nested projects

1. What is Lock Permission?

Locked project permissions mean that all workbooks and data sources in the project always use the default permissions from the project level. Permissions for an individual workbook or data source can no longer be modified by anyone (including publishers).

When a new workbook is published to a locked project, it gets the default permissions. When a workbook is moved into a locked project, it gets the default permissions no matter what permissions it had before.

If a project's permissions are managed by the owner (unlocked) and the project is then locked, all content permissions are changed to the project-level defaults, no matter what permissions each item had before.

Project lock/unlock doesn’t change content’s ownership.

When a project is changed from locked to unlocked, all workbook and data source permissions remain the same, and can then be changed by the owner, project lead, or admin.

2. Why Lock Permission?

I am sure everyone can add a few more use cases. The most common ones are:

  • For more sensitive data – you want to make sure that all content permissions are setup correctly
  • To simplify permission process
  • Other cases. For example, if the workbook permissions in a project are so messed up that you want to redo them, one way is to lock the project so all workbook/data source permissions are cleaned up with one click. Then you can unlock it and make additional changes from there.
  • Note: you can't undo the permissions when you change from 'managed by the owner' to locked. Please take screenshots before the change.

Lock project permission vs. managed by the owner: the key difference is whether you want each publisher to be able to change their workbook's permissions in your project. When the project is locked, users who have the Publisher site role and publish permission on the project can still publish, but they can't change any workbook permissions. All workbook permissions default from the project-level permissions you set for workbooks, so every workbook in the project has exactly the same permissions. If you change workbook permissions at the project level, the change is applied automatically to all workbooks and data sources in the project.

3. Who Can Lock/Unlock

Only the project owner, a project leader, or an admin can lock and unlock the project's permissions.
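Locking and unlocking can also be scripted. In the Tableau REST API, a project's lock state is its contentPermissions field, with the values 'LockedToProject' and 'ManagedByOwner'. Below is a hedged sketch written against tableauserverclient-style objects; the `server` and `project` objects are assumed to come from a signed-in TSC session.

```python
def set_project_lock(server, project, locked):
    """Toggle a project's permission lock via its content_permissions field."""
    project.content_permissions = "LockedToProject" if locked else "ManagedByOwner"
    return server.projects.update(project)
```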

4. How It Works for Nested Project

Tableau's first release with nested projects is 10.5. I anticipate more nested project features in future releases. Here is how it works for nested projects as of v10.5:

  • Only the top-level parent project can be locked
  • If the parent is unlocked, a child can't be locked
  • For a locked project:
    • All child projects are locked
    • All child projects have the same permissions as the parent project
    • All permission changes apply to all child projects automatically
    • Child projects can still have different owners
    • Child projects can't have different project leaders

One big feature gap is that a child project can't be locked when the parent project is unlocked. Please vote for this feature idea @ https://community.tableau.com/ideas/8455

Want to learn more? Download the Understanding Tableau Server Permission slides and watch the webinar recordings (from 11:30).

Feature Adoption – Project Leader

If you feel that you do not quite understand what Tableau project leaders can do, you are not alone. I hope this blog gives you some clarity on a project leader's privileges and how to leverage them to enable more self-service.

A project leader's privileges over the workbooks and data sources published to the project and all its sub-projects:

  1. Change extract refresh schedules
  2. Modify any workbook (web edit or re-publish)
  3. Change workbook owners
  4. Change data source owners
  5. Change a data source's user name/password
  6. Delete workbooks
  7. Change workbook or data source permissions
  8. Move a workbook from one project to another, if the user is project leader for both the source and target projects
  9. Lock or unlock project permissions
  10. Certify or uncertify data sources (10.4 and above)
  11. Create or delete sub-project folders (10.5 and above)

Although one project can have more than one project leader, you may not want too many project leaders per project, since a project leader can do almost anything to any content within the project.

How to use Project Leader for more self-service?  A few examples:

  • Someone in your team left the company and, unfortunately, workbook ownership was not transferred before he left. Do you have to go to site or server admins to get the ownership changed? If the project has project leaders, who are likely business people in your team, they can transfer the workbook ownership to others in the team themselves, self-service.
  • Someone in your team is on vacation and his extract failed. You do not have to go to the server or site admin either: a project leader can change the data source owner and make all necessary fixes to the data sources.
  • Your management asks for an urgent change to a workbook, but the workbook owner is not available (vacation, etc.). A project leader can change the workbook owner and update the workbook, or grant himself the Save permission on the workbook and then update it (web edit & save, or re-publish).
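The ownership transfers in the examples above can also be scripted, for instance during off-boarding. A hedged sketch against tableauserverclient-style objects (the `server` and `workbook` objects are assumed to come from a signed-in TSC session; this is one possible approach, not the only one):

```python
def transfer_workbook_ownership(server, workbook, new_owner_user_id):
    """Reassign a workbook to a new owner, as a project leader/admin can."""
    workbook.owner_id = new_owner_user_id
    return server.workbooks.update(workbook)
```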

What is the difference between a project owner and a project leader?

  • A project can have only one owner but multiple project leaders
  • A project owner must be a user, while a project leader can be a user or a group
  • A project owner can only be a site or server admin, while a project leader can be any site user

Does a project leader have to have the Publisher site role or above?

  • No. A project leader does not need the Publisher site role, although most project leaders are publishers.
  • What does it mean to have a project leader with the Interactor site role? One use case is a large project that needs something like a project-admin role in your org. This project admin does not have to know Desktop or publishing; the job is to manage and maintain the project's contents and permissions per a defined process. This project admin can have the project leader permission.
  • Another use case is separation of duty: have an interactor (who can't publish content) be the project leader

Read more about this topic in my webinar summary, slides and recordings (from 40 mins).

Feature Adoption – Data-Driven Alert

Data-driven alerts are a great, long-awaited feature of Tableau v10.3. They allow interactors to set a threshold on a numeric axis of a view and receive emails only when the data exceeds (or falls below) the defined threshold. Interactors can also define how often to receive the emails when the condition is met. It works for custom views as well, and interactors can define multiple alerts with different thresholds on the same view. This blog explains how it works, why it is better than subscriptions, and what the limitations are.
How do Data-Driven Alerts work?

    • Numeric axis only
    • Interactors decide the alerts
    • Interactors can add any site user who has an email address to an alert
    • Permissions are checked before emails are sent, NOT when users are added (the opposite of subscriptions).
    • Email will not go to recipients who have no permission to the view, although those users can still be added without any warning or error.
    • Please vote for the idea: https://community.tableau.com/ideas/8048
    • Groups can't be added as recipients
    • Alerts can be created on saved custom views
    • Live connections – the server checks hourly (configurable)
    • Extracts – checked whenever a refresh happens
    • When the condition is true, the alert owner decides how often emails are sent.
    • Recipients can remove themselves from an alert but can't decide how often the emails go out when conditions are met

What controls do server admins have for data-driven alerts?

  1. Turn data-driven alerts on/off at the site level (Site > Settings, check or uncheck data-driven alerts)
  2. For live connections, decide how often the server checks alert conditions

Why are Data-Driven Alerts better?

  1. It is a backgrounder process (admins can check Backgrounder Tasks for Non-Extracts: 'Check if Data Alert Condition is True')
  2. Fewer emails go out, since alert owners can throttle emails when conditions are met
  3. It is a 'push' – every extract completion triggers one 'Check if Data Alert Condition is True' backgrounder task

Why are Subscriptions not preferred?

  1. Subscriptions send out email at defined intervals, which is convenient for some users, but Tableau's strength is interactivity, and subscriptions are counter-interactive.
  2. Each subscription is a simulated user click on Tableau Server, unlike data-driven alerts, which are a backgrounder process. User subscriptions count as usage in the http_requests table.
  3. There is nothing wrong with users getting Tableau views in their inbox. The problem is that server admins have no way to tell whether users open the emails at all; over time, admins can't tell whether users are actually using Tableau Server or not.


Tips and tricks for managing data-driven alerts in the enterprise

  1. Limit the number of subscription schedules to nudge users from subscriptions to data-driven alerts
  2. If your Tableau server has a content-archiving program for unused workbooks, exclude subscription usage (historical_event_types, 'Send Email')
  3. Monitor your server to understand the percentage of subscriptions vs. total clicks. It is hard to say what the right balance is, but if subscriptions are more than 10% of total usage, you probably have too many subscriptions.
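Tip 3 can be estimated straight from the repository. A hedged sketch: the SQL assumes 10.x historical_events / historical_event_types table and event names ('Send Email' vs 'Access View'), so verify the join and names on your own server before trusting the numbers.

```python
# Share of 'Send Email' (subscription) events among view-related events.
SUBSCRIPTION_SHARE_SQL = """
SELECT SUM(CASE WHEN het.name = 'Send Email' THEN 1 ELSE 0 END)::float
       / NULLIF(COUNT(*), 0)
FROM historical_events he
JOIN historical_event_types het ON het.type_id = he.historical_event_type_id
WHERE het.name IN ('Send Email', 'Access View')
"""

def too_many_subscriptions(share, threshold=0.10):
    """Apply the >10%-of-usage rule of thumb from tip 3."""
    return share is not None and share > threshold
```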

SCALING TABLEAU (10/10) – ARCHITECTURE & AUTOMATION

I'd like to complete my Scaling Tableau series of 10 blogs with the architecture and automation topic. If you follow the tips and approaches in this Scaling Tableau series and the Governed Self-Service series, you should have no problem deploying Tableau at enterprise scale, with thousands of Desktop publishers on a few-hundred-core server cluster supporting ten thousand extracts per day, ten thousand unique active users per day, and a few million clicks per month.

Architecture 

    • Prod, DR, and Test: It is advisable to have 3 environments for any large Tableau deployment:
      • DR: During regular maintenance when Prod is down, server user traffic is routed automatically to the DR cluster. The best practice is to restore Prod to DR once a day so DR has relatively fresh content. If you use extracts, it is a trade-off whether to refresh extracts on DR: if yes, it doubles the load on your data sources, but DR will have the latest data; if not, DR will have day-old data during the weekend Prod maintenance window. If you create extracts outside Tableau and use the Tableau SDK to push them to the server, you can easily push the extracts to both Prod and DR to keep DR fresh.
      • Test: It is not advisable to publish all workbooks to the Test instance before Prod, although that is a common traditional SDLC approach. If you do so, you create a lot of extra work for your publishers and server admin team. That does not mean you can ignore controls and governance on the prod versions of workbooks; the best practice is to control and govern workbooks in different projects within the Prod instance. Then what is the Test instance for? Tableau upgrades, OS upgrades, new drivers, new configuration files, performance tests, load tests, new TDC files, etc. Of course, Test can still be used to validate workbooks, permissions, and so on.
    • Server location: For best performance, Tableau Server should be installed in the same zone of the same data center as your data sources. However, your data sources are likely in different data centers, and the current Tableau Server cluster does not support WAN-separated nodes, so you will have to choose one location for your cluster. Many factors impact workbook performance: if your West Coast server has to connect live to a large East Coast data source, the workbook will not perform well. The options are to use extracts or to split into two clusters – one on the East Coast mainly for East Coast data sources, one on the West Coast. It is always a trade-off.
    • Bare metal vs. VM: Tableau Server performs better on bare-metal Windows servers, although VMs give you other flexibility. For benchmarking purposes, you can assume a VM is 10-20% less efficient than bare metal, but many other factors will affect your decision between bare metal and VM.
    • Server configuration: There is no universal standard configuration for your backgrounders, VizQL Server, Cache Server, Data Engine, etc. The best approach is to optimize your configuration based on TabMon feedback. A few common tips:
      • Get more RAM on each node, especially the Cache Server node
      • Make sure the Primary and File Engine nodes have enough disk for backup/restore. As a benchmark, your Tableau database size should be less than 25% of the disk.
      • It is OK to keep backgrounder node CPU at about 80% on average to fully leverage your core licenses.
      • It is OK to keep VizQL node CPU at about 50% on average
      • Installing the File Engine on the Primary reduces backup/restore time by 75%, although the Primary's cores then count against your license
      • The number of cores on a single node should be less than 24
      • Continuously optimize the configuration based on feedback from TabMon and other monitoring tools.

Automation

  • Fundamental things to automate:
    • Backup: Set up backups with automatic file rotation so you never have to worry about the backup disk running out of space. Back up data, server config, and logs daily. Please find my working code @ here
    • User provisioning: Automatically sync Tableau Server groups and group members from the company directory.
    • Extract failure alerts: Send email alerts whenever an extract fails. See details here
  • Advanced automation (Tableau has no API for these; more risk but great value. I have done all of the below):
    • Duration-based extract priority: If you face extract delays, adjusting extract priority can increase extract efficiency 40-70% without adding new backgrounders. The best practice is to set priority 10 for business-critical extracts, priority 20 for incremental refreshes, priority 30 for extracts with duration below the median (50% of all extract jobs), and priority 50 for the rest. How do you update priority? I have not seen an API for it, but I have a program that updates tasks.priority directly (something Tableau does not officially support, but it works well). Read my blog about extracts.
    • Re-schedule extracts based on usage: A common problem in the self-service world is that people do not bother to re-schedule existing extracts when usage drops. The server admin can re-schedule extracts based on usage: for example, daily extracts are re-scheduled to weekly if the workbook has had no usage in the past 2 weeks, and weekly extracts to monthly. All of this can be automated by updating the tasks table directly, although that is not an officially supported approach.
    • Delete old workbooks: I have deleted 50% of the workbooks on my Tableau server over a few quarters. Any workbook with no usage in the past 90 days is deleted automatically. This policy is well received because it helps users clean up old content, saves IT disk space, and avoids unnecessary attention to junk content. The best practice is to agree on this policy between business and IT via the governance process, and not to provide publishers a list of old workbooks before deletion (to avoid unnecessary clicks); only communicate to publishers after the workbooks are deleted. The best way to communicate is to automatically email each publisher their specific deleted .twb files (disregarding the .tde); publishers can always re-publish the workbooks themselves. Use the HISTORICAL_EVENTS table to identify old workbooks. I do not recommend archiving old workbooks, since it is extra work with little value. Please refer to Matt Coles' blog as a starting point.
    • Workbook performance alerts: If workbook render time is one of your challenges on the server, create alerts to workbook owners based on render time. It is good practice to create multiple warning levels, such as yellow and red, with different thresholds: yellow alerts are warnings, red alerts are for action. If an owner does not take corrective action within the agreed period for a red warning, a meeting should be arranged to discuss the situation. If the site admin refuses to act, the governance body has to decide on the agreed-upon penalty, which can go as far as site suspension. Please read my performance management blog for more details.
  • Things that should not be automated: Certain things you should not automate. For example, you may not want to automate site or project creation, since sites and projects should be carefully evaluated and discussed before creation. You may not want to automate granting the Publisher site role either, since it should also be controlled: proper training should be required before granting a new Publisher.
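The duration-based priority rule from the advanced-automation list above boils down to a small pure function. Applying it means writing the result back to tasks.priority in Postgres, which, as noted, Tableau does not officially support; the function itself is just the stated rule.

```python
def extract_priority(business_critical, incremental, duration_s, median_duration_s):
    """Map an extract task to a priority per the rule above."""
    if business_critical:
        return 10          # business-critical extracts run first
    if incremental:
        return 20          # incremental refreshes
    if duration_s < median_duration_s:
        return 30          # faster-than-median jobs (50% of all extract jobs)
    return 50              # everything else
```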

To recap scaling Tableau to the enterprise, there are 5 main areas to drive: Community, Learning, Data Security, Governance, and the Enterprise Approach. This series focuses on the Enterprise Approach. I hope it helps, and I'd love to hear your tips and tricks too.

SCALING TABLEAU (9/10) – CONTROL DESKTOP UPGRADE

After your Tableau server is upgraded (say, from 10.0 to 10.2), you want users' Desktop to prompt automatically for the 10.2 Desktop upgrade. Read this blog.

Tableau Desktop can check for product updates and install them automatically. However, most large Tableau customers have to turn this off (by modifying the AutoUpdateAllowed property), because if Desktop is automatically updated to a newer version than the Server, users can't publish. For example, you can't publish from 10.3 Desktop to a 10.2 server.

What we really need is a controlled Desktop updates that Tableau team  can control when the Desktop users should be prompted for the upgrade after server upgrade.

To address this, Tableau introduced controlled product updates for Tableau Desktop. The problem is that the out-of-the-box approach works for maintenance updates only.


Tableau’s version numbering is major.minor.maintenance. Tableau’s controlled product updates only work for maintenance version updates, not for minor version updates.

I figured out how to control Desktop updates for both minor and maintenance upgrades (i.e. from 10.0 to 10.2). It should work for major upgrades (from 9.* to 10.2) as well, but I have not tested that enough yet. This blog is a small deviation from Tableau’s out-of-the-box solution, but a big breakthrough in usability.

The use case: you already control your users’ Desktop configurations (for example, you have built a Mac installer package to update the Mac plist, or a Windows .bat to update the Windows registry), you plan to upgrade your Tableau Server from 10.0 to 10.2, and you want users’ Desktops to prompt for the 10.2 upgrade after your server is upgraded. It can be done by following the steps below:

  1. Create your own Tableau download server: Find an internal web server and create one Tableau download folder (let’s call it your own download server) to host one TableauAutoUpdate.xml and the new installation packages.

  • Make sure HTTPS is enabled
  • Validate the download server by opening a browser at https://xxx.corp.xyz.com/tableau/ and making sure you can see the list of files. You will get an error when clicking the XML file from the browser, which is OK.

2. Create your TableauAutoUpdate.xml from this example below:

<?xml version="1.0" ?>
<versions xmlns="https://xxx.com/Tableau">
<version hashAlg="sha512" latestVersion="10200.17.0505.1445" latestVersionPath="" name="10.0" public_supported="false" reader_supported="false" releaseNotesVersion="10.2.2" showEula="false">
<installer hash="86efa75ecbc40d6cf2ef4ffff18c3100f85381091e59e283f36b2d0a7a0d32e5243d62944e3ee7c8771ff39cc795099820661a49105d60e6270f682ded023669" name="TableauDesktop-10-2-2.pkg" size="316511726" type="desktopMac"/>
<installer hash="bb5f5ec1b52b3c3d799b42ec4f9aad39cc77b08916aba743b2bac90121215597300785152bafec5d754478e1de163eedfb33919457ad8c7ea93085f6deabff1e" name="TableauDesktop-64bit-10-2-2.exe" size="304921808" type="desktop64"/>
</version>
<version hashAlg="sha512" latestVersion="10200.17.0505.1445" latestVersionPath="" name="10.1" public_supported="false" reader_supported="false" releaseNotesVersion="10.2.2" showEula="false">
<installer hash="bb5f5ec1b52b3c3d799b42ec4f9aad39cc77b08916aba743b2bac90121215597300785152bafec5d754478e1de163eedfb33919457ad8c7ea93085f6deabff1e" name="TableauDesktop-64bit-10-2-2.exe" size="304921808" type="desktop64"/>
<installer hash="86efa75ecbc40d6cf2ef4ffff18c3100f85381091e59e283f36b2d0a7a0d32e5243d62944e3ee7c8771ff39cc795099820661a49105d60e6270f682ded023669" name="TableauDesktop-10-2-2.pkg" size="316511726" type="desktopMac"/>
</version>
<version hashAlg="sha512" latestVersion="10200.17.0505.1445" latestVersionPath="" name="10.2" public_supported="false" reader_supported="false" releaseNotesVersion="10.2.2" showEula="false">
<installer hash="bb5f5ec1b52b3c3d799b42ec4f9aad39cc77b08916aba743b2bac90121215597300785152bafec5d754478e1de163eedfb33919457ad8c7ea93085f6deabff1e" name="TableauDesktop-64bit-10-2-2.exe" size="304921808" type="desktop64"/>
<installer hash="86efa75ecbc40d6cf2ef4ffff18c3100f85381091e59e283f36b2d0a7a0d32e5243d62944e3ee7c8771ff39cc795099820661a49105d60e6270f682ded023669" name="TableauDesktop-10-2-2.pkg" size="316511726" type="desktopMac"/>
</version>
</versions>
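Before deploying the file, it is worth checking that it parses: a missing </version> close tag is easy to introduce when copying the version blocks, and a malformed file is silently ignored. A minimal sketch, assuming python3 is available on the web server (any XML validator such as xmllint works just as well):

```shell
# Check that an AutoUpdate XML file is well-formed before deploying it.
# Uses python3's standard-library parser; the helper name is my own.
check_xml() {
  python3 -c 'import sys, xml.etree.ElementTree as ET; ET.parse(sys.argv[1]); print("well-formed")' "$1"
}
# Usage: check_xml TableauAutoUpdate.xml
```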

  • Notice latestVersionPath="". This is the trick that lets you avoid creating multiple directories within the download folder to host the files.
  • How to create the sha512 hash? On a Mac, open Terminal and run shasum -a 512 TableauDesktop-10-2-2.pkg (replace with your package name).
  • Get the installer file size exactly right. On a Mac, open Terminal and run ls -l to get the file size in bytes.
  • What is latestVersion? You need to install the target Desktop once; you will then find the version info under About Tableau.
  • name="10.0" is the current Desktop version to be upgraded.
  • public_supported="false" or "true": whether Tableau Public is supported.
  • reader_supported="false" or "true": whether Tableau Reader is supported.
  • showEula="false" or "true": whether users must see and acknowledge Tableau’s standard End User License Agreement during installation.
  • type=”desktop64″ means the installer is for Windows 64-bit
  • type=”desktopMac” means the installer is for Mac.
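The hash and size steps above can be scripted so the two values always come from the same file. A small sketch (the helper name is my own; the fallback to sha512sum covers Linux web servers where the Mac shasum command may be absent):

```shell
# Print the sha512 hash and byte size for an installer package, ready to be
# pasted into the hash= and size= attributes of TableauAutoUpdate.xml.
pkg_info() {
  hash=$( (shasum -a 512 "$1" 2>/dev/null || sha512sum "$1") | awk '{print $1}')
  size=$(wc -c < "$1" | tr -d ' ')
  echo "hash=$hash"
  echo "size=$size"
}
# Usage: pkg_info TableauDesktop-10-2-2.pkg
```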

3. Create the installer packages (for Mac or Windows or both) and put them in the same folder as where TableauAutoUpdate.xml is.

  • Do not put the packages in any sub-directories.
  • Make sure the installer package names exactly match those used in TableauAutoUpdate.xml.
  • If you rename an installer package, you will have to re-create its sha512 hash.

4. Configure user computers to point to your own Tableau download server

  • Windows: Make an entry for each product and operating system type (32-bit and 64-bit) in your environment. The following entry is for 64-bit Tableau Desktop:
    HKEY_LOCAL_MACHINE\SOFTWARE\Tableau\Tableau <version>\AutoUpdate
    Server = "xxx.corp.xyz.com/tableau/"

    For example:

    HKEY_LOCAL_MACHINE\SOFTWARE\Tableau\Tableau 10.3\AutoUpdate
    Server = "xxx.corp.xyz.com/tableau/"
  • Mac: Change the settings file for each user to list the download server, using the defaults command:
    defaults write com.tableau.Tableau-<version> AutoUpdate.Server "xxx.corp.xyz.com/tableau/"
    For example:
      defaults write com.tableau.Tableau-10.2 AutoUpdate.Server "xxx.corp.xyz.com/tableau/"
  • Note: the AutoUpdate.Server value “xxx.corp.xyz.com/tableau/” does not include https:// in front, since Tableau adds it automatically. Please do not forget the trailing ‘/’.

5. How does it work once everything is set up correctly? When users launch an old version of Desktop, they get the upgrade popup automatically.

  • If ‘Download and install when I quit’ is selected, users can continue using Desktop; nothing happens until they close it.
  • As soon as Desktop is closed, the download of the correct new Desktop version starts.
  • The best part: as soon as the download completes, the installation starts immediately and automatically.
  • What happens if a user cancels in the middle of the download? No problem. The next time Desktop is launched, the popup shows up again.
  • What if a user cancels the AutoUpdate installation midway? No problem. The next time Desktop is launched, the popup shows up again. Since the new package is already downloaded, clicking ‘Download and install when I quit’ will not download again but kicks off the installation right away.
  • The new package is downloaded to /Download/TableauAutoUpdate/
  • What about the Release notes link? Right now, it points to https://www.tableau.com/support/releases. I’d love to configure it to point to your own internal upgrade project page, but I have not figured that out yet.

6. How to troubleshoot? Check /My Tableau Repository/Logs/log.txt. Search for ‘AUTOUPDATE’ and/or xxx.corp.xyz.com/tableau/ to get hints about why the popup did not happen.
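The log search above can be done from a terminal. A sketch, assuming the Mac default repository location and the placeholder host used in this post (the helper name is my own; adjust the path for Windows users):

```shell
# Pull autoupdate-related lines out of the Desktop log to see why the
# upgrade popup did not appear.
scan_log() {
  grep -iE 'AUTOUPDATE|xxx\.corp\.xyz\.com/tableau/' "$1"
}
# Usage: scan_log "$HOME/Documents/My Tableau Repository/Logs/log.txt"
```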

7. With AutoUpdate.Server configuration, you still need to turn off AutoUpdate.AutoUpdateAllowed.

SCALING TABLEAU (8/10) – LEVERAGE V10 FEATURES FOR ENTERPRISE

I love Tableau’s pace of innovation. Tableau v10 delivers some of the most-wanted capabilities for enterprise customers. I have mentioned some of them in my previous blogs. This blog summarizes the V10 enterprise features:

  1. Set Extract Priority Based on Extract Duration.  

This is a very powerful v10 feature for server admins, although it is not yet discussed enough in the Tableau community. What it does: full extracts in the same priority are run in order from shortest to longest, based on their last run duration.

The benefit is that smaller extracts no longer have to wait a long time for big ones to finish. Tableau Server executes the smaller ones first, so overall waiting time is reduced during peak hours.

What does a server admin have to do to leverage this feature?

  • By default, this feature is off and the server admin has to turn it on. It is not site specific: once on, it applies to all sites. Simply run the following tabadmin command to turn it on:
  • tabadmin set backgrounder.sort_jobs_by_run_time_history_observable_hours 36
  • Please read my blog and the Tableau docs for details.

2. Revision History and Version Control

Tableau released one of the most wanted server features, version control and revision history, in V9.3. The feature was then much enhanced in V10 with old-workbook previewing, one-click restoring, and a maximum-revisions setting:

  • The workbook previewing and restoring features are very convenient for publishers.
  • The maximum-revisions setting is great for server admins, who can actually control server space usage so the server does not run out of storage while revision history is enabled.

How to deploy those features on server?

  • Turn it on: By default, Revision History is off. It can be turned on site by site: go to the site’s Settings, General, and select “Save a history of revisions”. On V10, you have two choices: Unlimited or a number of revisions. Unlimited means there is no cap on revision history, which you probably do not want: as a server admin, you always want to make sure your server will not run out of space. The revision-count setting is a very handy feature that gives admins some peace of mind about server storage.
  • Decide the maximum number of revisions you want to keep. This is site specific, meaning you can set different maximums for different sites.
  • How to decide the max revisions to keep? How to find out the extra server space revisions will use? Please read my blog.

3. Cross database Joins and Cross Database Filter

X-DB joins and cross-data-source filters are two of the features most requested by the user community. They are two different but related things.

X-DB joins allow two or more separate data sources to be joined at row level. There are still some constraints on which kinds of data sources can be joined in V10, though Tableau plans to relax them in coming releases: V10 only allows an extract to be the primary data source when joining with another database, and does not yet allow two extracts to be joined together.

What do X-DB joins mean for server admins?

  • Know that server admins have no control over x-db joins; they are entirely controlled by publishers. The feature is enabled out of the box and server admins can’t turn it off. Hopefully you never need to.
  • Watch server performance. A lot of x-db join activity happens on the Tableau Server. I was a little skeptical about a feature server admins have no control over or visibility into; on the other hand, I have not encountered any issues since my v10 server upgrade in Nov 2016.
  • From the publisher’s perspective, x-db joins can be slow when joining two large datasets.

What is a cross-database filter?

Use case example: let’s say you’re connected to multiple data sources, each with common dimensions like Date or Product, and as part of your analysis you want a single filter to apply across all of them. That’s where this new feature comes in. Any time your data sets share a common dimension, you can filter across them. A few things to know about cross-database filters:

  • It is not an x-db join but more like blending, where you manage relationships to edit the blend between connected sources
  • You can only filter data across multiple primary data sources. You cannot filter data across secondary data sources.

4. Desktop License Reporting

Desktop License Reporting is included in V10. This is an awesome feature for tracking Desktop usage even when Desktop users do not publish. Please see details at http://enterprisetableau.com/licensing/

The challenge in leveraging this feature is changing each user’s laptop to make the initial configuration. Here is what you need to know:

  • It works only if both Desktop and Server are on v10.
  • This feature is turned off on the server by default; you can turn it on using tabadmin:
    tabadmin set features.DesktopReporting true
    tabadmin config
    tabadmin restart
  • The most difficult part is updating the Windows Desktop registry or Mac Desktop plist to point to the Tableau Server where you want license usage to be sent. The best way is to bake this into your Desktop v10 installer. Please see my previous blog for details.
  • You should have all the company’s Desktops pointing to one Tableau Server, even if Desktop users publish to different servers. This way you have one place to see all enterprise Desktop usage.
  • By default, Tableau Desktop v10+ pings Tableau Server v10+ for usage reporting every 8 hours. You can configure the interval on Desktop; it is controlled by the Mac plist or the Windows registry, not by a tabadmin option. See here.

5. Subscribe Others

Tableau finally delivered this long-requested feature in V10. A few things to know:

  • This feature has to be enabled at the site level.
  • You can create a custom from-address for each site’s emails. This is handy, since users who receive subscription emails may want to contact the site admin, rather than the server admin, with questions.
  • Only workbook owners can subscribe others.
  • The user has to have an email address in Account Settings; otherwise, subscribe-others will not be available. If many users do not have email addresses on the Tableau Server, you may have to mass-update all users with valid email addresses before this feature can really be used.
  • You can subscribe users only, not groups. If you really want to subscribe a group, one workaround is to create a dummy user and give the group email address to that dummy user.
  • You can’t subscribe users who are not valid users of the site.
  • You can’t subscribe users who do not have permission to view the workbooks or views.
  • Subscribed users can click the ‘Manage my subscriptions’ link at the bottom of the subscription emails to unsubscribe anytime.
  • Users can always subscribe themselves if they have view permission on the workbooks or views.

6. Device Specific Dashboard Layout 

After you’ve built a dashboard, you can create layouts for it that are specific to particular phone or tablet devices. The URL stays the same, but Tableau renders a different layout depending on the device used to access the server.

Many users (especially executives) use phones to view information, so this is a great feature for driving enterprise Tableau adoption. A few notes:

  • It is enabled out of the box. There is no server or site level setting to enable or disable it.
  • When publishing dashboards, make sure to clear the option ‘Show Sheets as Tabs’; otherwise this feature does not work.
  • It works with the Tableau Mobile app, and also on mobile devices that do not have the app installed.
  • A best practice is to remove some views from the default layout, so the mobile device layout has fewer views than the default one.

A few design tips:

  • Ask yourself: What key information does my end user need from my dashboard?
  • Click “device preview” to confirm how your dashboard looks across different devices.
  • (For small screens) Remove unnecessary views, filters, titles, and legends.
  • (For small screens) Determine if you need a scrollable dashboard (fit width). If so, stack dashboard objects and use a “peek.”
  • (On touch devices) On scrollable dashboards, pin your maps, and disable pan and zoom.

With the device designer, you can rest assured that your data stands out with optimized dashboards on any device!

7. Data Source Analytics

Data source management has been brought into line with workbooks: we now have revision history and usage information, and users can favourite data sources.

You can also change the data sources view so they are grouped by where they connect to, instead of by data source name.

Tableau has yet to ship the data source lineage features announced at TC16 Austin: from a data source column, tell which workbooks use it, so you can do impact analysis when the data source changes; and from a workbook, tell which data source tables and/or columns it uses, so you can spot potentially duplicated data sources. I am expecting those big features in 2017.

8. Site-Specific SAML

If you use SAML authentication, you can make it site specific instead of server-wide. This means some sites on your Tableau Server can use SAML single sign-on, whilst others use normal authentication.

I know it can take enterprise customers months to leverage some of these new features. Hope this blog helps. Please feel free to post your tips and tricks for implementing them.

SCALING TABLEAU (7/10) – UNDERSTAND SERVER PERMISSIONS

When I think about Tableau permissions, I have two words:

  • Robust: Tableau’s permission features are very comprehensive and robust. Definitely enterprise grade.
  • Confusing: On the other hand, Tableau’s permissions can be confusing, since there are so many different variables involved in setting them.

To understand permissions, let’s start by looking at the structures within Tableau Server. A server consists of multiple sites (see my Tableau site blog for details). From a permission perspective, one important thing to know is that there is absolutely no ‘communication’ between sites: nothing can be shared across sites.

Within each site there are projects; within each project there are workbooks and data sources, and each workbook can have multiple views. Each site also has its own users and groups. Sites are partitions, or compartmented containers.


If you think of projects/workbooks as containers, permissioning is assigning users and groups to containers. Permissions exist at every level: site, project, workbook, data source, and view. Let’s look at each of them.

1. Site Role

Tableau has many site roles, but the most commonly used ones are Publisher and Interactor, in addition to the admin roles.


What is a site role and how does it work?

  • Site role is site specific. A user can have the Publisher site role in one site but the Interactor site role in another.
  • Site roles can be granted by server admins, and by site admins if the site is configured to let site admins manage users (a site-level setting).
  • The site role is a ceiling: the maximum permissions the user can have in that site.
  • An Interactor site role can never publish, even with Publisher permission at the project level. Now you may start to see the confusing part of Tableau permissions.
  • An Interactor site role can’t save (or save as) in web editing, even with “save” allowed at the workbook level.
  • A site role does not define what a user can and can’t do at the project, workbook, or data source level. Think of the site role as a person’s legal right to work in the US, and publish permission at the project level as an employer’s job offer. Having the legal right to work does not mean you can work for a company unless you have a job offer from that company; on the other hand, even if a company makes an offer, you are not allowed to work without the legal right to work in the US.
  • You can check your own site role under ‘My Account Settings’, but you can’t check others’ site roles.

2. Project Level Permissions

Project-level permissions deal with who can view, publish to, and manage the project. When you click a project name and then Permissions, you will see the project permissions. You can set project-role permissions (Publisher, Viewer, and Project Leader), and you can also set workbook and data source permissions here, which serve as the default permissions when workbooks or data sources are published to the project.

  • Publisher: This is different from the site role ‘Publisher’. The project Publisher role defines whether the group or user can publish to this project. It is independent of the ‘Publisher’ site role. A user with the ‘Interactor’ site role can still have Publisher permission at the project level, although it does not matter, since an ‘Interactor’ can’t publish anywhere.
  • Project Leader:
    • Can set permissions on all items in the project
    • Can change refresh schedules: very handy if someone is on vacation and his workbook’s refresh schedule has to be changed.
    • Can change workbook or data source owners: a great feature project leaders should use when someone leaves the team or company.
    • Can lock the project permissions
  • Locked project permissions vs. managed by the owner: The key difference is whether you want each publisher to be able to change their own workbook permissions in your project. When the project is locked, those who have the Publisher site role and Publisher permission on the project can still publish, but they can’t change any workbook permissions. All workbook permissions default from the workbook permissions you set at the project level, so every workbook within the project has exactly the same permissions. If you change workbook permissions at the project level, the change is applied automatically to all workbooks and data sources in the project.
  • When to lock project permissions?
    • For more sensitive content, where you want to make sure permissions can’t deviate
    • To simplify permissions
    • Other cases. For example, if a project’s workbook permissions are so messed up that you want to redo them, one way is to lock the project: all workbook/data source permissions are cleaned up with one click. You can then unlock it and make additional changes from there.
    • You can’t undo the permissions when you change from ‘managed by the owner’ to locked. Please take screenshots before changing.

3. Workbook Level Permissions 

The workbook level has 14 different capabilities that you can set independently. To simplify the process, Tableau ships with a few templates (Viewer, Interactor, Editor, None, and Denied). Any modification to one of these templates is called Custom.

  • Download: A Tableau workbook has four different download controls: download image/PDF, download summary data, download full data, and download workbook (Download Workbook/Save As is a combined capability).
  • Shared customized: There are shared-customized and web edit capabilities. The customized-view feature builds on filters: if a user has filter permission, the user can change view filters, save preferred filters as customized views, and even make one of the customized views his or her default view, which is especially handy for slower views. The shared-customized capability controls whether a user can share his or her customized views with all other users who have access to the same views.
  • Web edit: A customized view is different from web edit. A customized view only allows filter-type changes, while web edit allows changing the whole design of the view (chart type, new calculations, new fields, etc.).
  • Download Workbook/Save As: Download is enabled with this permission. However, save-as is considered a publishing activity: a user with the ‘Interactor’ site role can’t save-as in web edit or publish from Desktop, even if Download Workbook/Save As is allowed on the workbook.
  • Save: Save means trusting others to overwrite your workbooks. “Save” must-knows:
    • It works for both Desktop and Web Edit
    • The user who ‘saves’ becomes the new owner of the workbook, since a workbook has only one owner at any given time
    • What about the previous owner’s permissions? In a ‘managed by the owner’ project, the new owner can give the previous owner any permissions, or none at all
    • Revision history will create a new workbook revision if it is turned on
    • The “Save” button appears only for owners. If you are not the content owner, you see “Save As”. Type the same name to overwrite a workbook; you will be asked to confirm overwriting, and then the “Save” button appears.

4. How Web Edit, Save As, and Save permissions work together

First, does the user have Web Edit permission on the workbook? If not, no “Edit” button appears.

Next, does the user have permission to publish on the site? If not, the user won’t get Save/Save As buttons even if you’ve granted the right Download/Web Save As permissions on the workbook.

Also, does the user have workbook-level Download/Web Save As permission? If not, no Save/Save As buttons for that workbook.

Finally, where can the user save? If you haven’t granted the user permission to save into a particular project, it doesn’t matter whether all the other permissions are set correctly, because the user has no place to store the changes. If the user can publish to multiple projects, the user gets a choice of which project to save into.
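The four checks chain together in order, and the first failing one decides what the user sees. A purely illustrative sketch (the helper name and yes/no flags are my own, not a Tableau API):

```shell
# Illustrative only: mirrors the Save/Save As decision chain described above.
# Inputs are yes/no flags for web edit, site publish, workbook download/save-as,
# and project publish permission, in that order.
can_save_as() {
  [ "$1" = yes ] || { echo "no Edit button";             return 1; }
  [ "$2" = yes ] || { echo "no Save/Save As (site)";     return 1; }
  [ "$3" = yes ] || { echo "no Save/Save As (workbook)"; return 1; }
  [ "$4" = yes ] || { echo "nowhere to save";            return 1; }
  echo "Save As available"
}
can_save_as yes yes yes yes   # -> Save As available
```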


5. Set data source permissions

When you publish a workbook, you have the option to publish its data source separately. The published data source then becomes reusable across workbooks, one refresh schedule updates all connected workbooks at the same time, it becomes a single source of truth, and, of course, it puts less load on the underlying databases.

When you publish a workbook that connects to a Tableau Server data source, rather than setting the credentials to access the underlying data, you set whether the workbook can access the published data source it connects to.

If you select to prompt users, a user who opens the workbook must have View and Connect permissions on the data source to see the data. If you select embed password, users can see the information in the workbook even if they don’t have View or Connect permissions.

To simplify permission settings, select ‘Embedded password’ for the published data sources a workbook connects to when publishing it:

  • When publishing the workbook, select ‘Embedded password’
  • Give only the publisher group ‘Connect’ permission at the data source level
  • Do nothing for the end-consumer (‘interactor’) group at the data source level

If you instead select ‘Prompt user’ authentication when publishing a workbook that uses a published data source, a user who opens the workbook must have View and Connect permissions on the data source to see the data. Here is the tricky part: you want to make sure you do not give the ‘interactor’ group ‘Connect’ permission at the data source level, but do give it the data source ‘Connect’ permission at the project level. The correct setup is as follows:

  • For interactor group only:
    • Connect permission at project level
    • ‘unspecified’ at data source level
  •  For publisher group only:
    • Connect permission at data source level

The reason: if you give the interactor group ‘Connect’ permission at the data source level, users with Desktop will be able to connect directly to the published data source, which can potentially bypass the filters or row-level security set up in the workbook or the published data source. With only project-level data source ‘Connect’ permission, the user cannot connect via Desktop but can still connect via the workbook. I could not find clear Tableau documentation for this, but my test results on v9 and v10 confirmed this behavior.

6. Set view permissions

When a workbook is saved without tabs, the default permissions are applied to the workbook and its views, but the view permissions can then be edited individually. Permissions for views are inherited from the workbook permissions, and if “Show sheets as tabs” is selected when publishing from Tableau Desktop or saving on Tableau Server, the workbook permissions override the permissions on individual views anyway.

The best practice is not to set view-level permissions at all.

7. Summary of best practices:

  • Permission groups, not users
  • Lock project permissions if possible
  • For owner managed projects, permission workbooks, not views
  • Assign project leaders
  • Plan your permissions
  • Use published data sources and ‘Embedded password’ when publishing workbooks
  • Apply additional row level security
  • Test permissions out
  • Continual reviews

See slides for more details….

SCALING TABLEAU (6/10) – ROW LEVEL SECURITY

Data security has been one of the top concerns for enterprise Tableau adoption. Tableau handles data security with permissions and row-level security. Permissions control which workbooks/views a user can see; row-level security controls which rows of data the user can see. For example, APAC users see only APAC sales and EMEA users see only EMEA sales, while both groups have the same permissions on the same workbook.

Does Tableau row-level security work with extracts? Yes. This blog provides everything you need to know to create row-level security for both extracts and live connections, including a new approach that leverages the V10 x-db join feature.

Use case: create one workbook in which server users see a subset of the data based on the Region (Central, East, South, West) and Segment (Consumer, Corporate, Home Office) they are assigned to.

Solution A – Workbook filter for Row Level Security by Group

  1. Create the following 12 Tableau Server groups (Central-Consumer, Central-Corporate, Central-HomeOffice, East-Consumer, East-Corporate, East-HomeOffice, …). The Central-Consumer group contains all Central region users who are assigned to the Consumer segment, and so on.
  2. Create a calculated field:
    ISMEMBEROF('Central-Consumer') AND [Region] = 'Central' AND [Segment] = 'Consumer' OR
    ISMEMBEROF('Central-Corporate') AND [Region] = 'Central' AND [Segment] = 'Corporate' OR
    ISMEMBEROF('Central-HomeOffice') AND [Region] = 'Central' AND [Segment] = 'HomeOffice' OR
    ISMEMBEROF('West-Consumer') AND [Region] = 'West' AND [Segment] = 'Consumer' OR
    ISMEMBEROF('West-Corporate') AND [Region] = 'West' AND [Segment] = 'Corporate' OR
    ISMEMBEROF('West-HomeOffice') AND [Region] = 'West' AND [Segment] = 'HomeOffice' OR
    ISMEMBEROF('East-Consumer') AND [Region] = 'East' AND [Segment] = 'Consumer' OR
    ISMEMBEROF('East-Corporate') AND [Region] = 'East' AND [Segment] = 'Corporate' OR
    ISMEMBEROF('East-HomeOffice') AND [Region] = 'East' AND [Segment] = 'HomeOffice' OR
    ISMEMBEROF('South-Consumer') AND [Region] = 'South' AND [Segment] = 'Consumer' OR
    ISMEMBEROF('South-Corporate') AND [Region] = 'South' AND [Segment] = 'Corporate' OR
    ISMEMBEROF('South-HomeOffice') AND [Region] = 'South' AND [Segment] = 'HomeOffice'
  3. Add the calculated field to the filter shelf and select ‘true’
  4. After publishing the workbook, grant Interactor permission to all 12 groups above.
  5. Make sure Web Editing and Download are set to No.

That is all. ISMEMBEROF returns true if the current server user is a member of the given group; it is the key function here, and it works for both extracts and live connections.
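Typing twelve near-identical branches invites exactly the kind of typo that silently locks users out (‘Corporate’ quietly becoming ‘Coporate’, say). As a sketch, the calculation text can be generated so group names and filter values always stay in sync; the region and segment lists below match the use case above, and the script simply prints text to paste into Tableau:

```shell
# Generate the 12-branch ISMEMBEROF() row-level security calculation so the
# group name and the [Region]/[Segment] values in each branch always match.
regions="Central East South West"
segments="Consumer Corporate HomeOffice"
calc=""
for r in $regions; do
  for s in $segments; do
    branch="ISMEMBEROF('$r-$s') AND [Region] = '$r' AND [Segment] = '$s'"
    if [ -z "$calc" ]; then calc="$branch"; else calc="$calc OR
$branch"; fi
  done
done
echo "$calc"
```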

Notice that the control is a workbook filter. If the workbook is downloaded, the filter can be changed and the row-level security no longer holds, which is why the workbook permission must set Download to No.

A better solution is to use a data source filter for the ISMEMBEROF calculation instead of a workbook filter.

Solution B – Data Source Filter for Row Level Security by Group

  1. You have the groups and the calculated field from Solution A, steps 1 and 2
  2. Edit the data source filters to include the calculated field and select ‘true’
  3. Publish the data source and grant Connect-only permission (no edit)
  4. After publishing the workbook, set permissions for all 12 groups. There is no need to put the calculated field on the workbook filter shelf anymore, since the filter now lives at the data source level.

Published data sources are reusable, a single source of truth, put less load on source databases, and now have governed row level security built in.

Solution B works for extracts too. The only tricky part is the workbook development process: you need a local extract copy to simulate user behavior from Desktop, then replace the local data source with the server published data source before publishing the workbook, and copy & paste all calculations. Please reference the manual fast way or a hacky way.

The above approaches control users’ visibility of data by Tableau server groups. They assume that you manage the group members outside Tableau. When you have too many data security groups to manage manually, you can automate group and member creation by using the Server REST API or your corporate directory automation tool.
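Since the groups are managed outside the workbook, creating the 12 region/segment groups can be scripted against the REST API. A minimal sketch using the tableauserverclient library; the server URL, credentials, and site name are placeholders you would adjust for your own server:

```python
# Sketch: bulk-create the 12 '<Region>-<Segment>' groups used by the
# ISMEMBEROF calculation, via the tableauserverclient (TSC) library.
import itertools

REGIONS = ["Central", "East", "South", "West"]
SEGMENTS = ["Consumer", "Corporate", "HomeOffice"]

def group_names(regions=REGIONS, segments=SEGMENTS):
    """Build every '<Region>-<Segment>' group name (4 x 3 = 12 groups)."""
    return [f"{r}-{s}" for r, s in itertools.product(regions, segments)]

def create_groups(server_url, username, password, site=""):
    # Placeholders: server_url / username / password / site are examples only.
    import tableauserverclient as TSC  # pip install tableauserverclient
    server = TSC.Server(server_url, use_server_version=True)
    auth = TSC.TableauAuth(username, password, site_id=site)
    with server.auth.sign_in(auth):
        for name in group_names():
            server.groups.create(TSC.GroupItem(name))

# create_groups("https://tableau.example.com", "admin", "secret", site="Sales")
```

The pure `group_names` helper keeps the naming rule in one place, so the calculated field and the group list cannot drift apart.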

When the group approach in Solutions A & B can’t scale, the following USERNAME() approach is another good option.

Solution C – Entitlement table x-db join for Row Level Security

Same use case, but you want to add Category as a dimension for row level security in addition to Region and Segment. Now you would need 100+ groups just for row level security, which is a lot to manage. We are going to use Tableau’s USERNAME() function, which returns the current server user name. This approach does not use groups anymore but assumes that you have a separate user entitlement table like the one below.

| UserName | Region | Segment | Category |
|---|---|---|---|
| U123 | East | Consumer | Furniture |
| U456 | East | Consumer | Office Supplies |

This user entitlement table can be an Excel file or a separate database table. We can use V10’s cross-database join feature for row level security:

  1. Create a cross-db join between the main data source (like an extract or MySQL) and the user entitlement Excel
  2. Create calculated field
    USERNAME() = [UserName]
  3. If you use a workbook filter, just add this calculated field to the filter and select ‘true’ – the same as Solution A
  4. Or if you use a published data source, edit the data source filters to include the calculated field and select ‘true’ – the same as Solution B
  5. You are done

USERNAME() returns the current server user name, while [UserName] is the user name column of your user entitlement Excel, which can also be a database table.
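To make the mechanics concrete, here is the join-plus-filter behavior simulated in plain Python with made-up sample rows. This is only an illustration of what the cross-db join and the USERNAME() = [UserName] filter achieve together, not Tableau's implementation:

```python
# Sketch: simulate the entitlement join + username filter.
# All table contents below are hypothetical sample data.
ENTITLEMENTS = [  # the user entitlement table (Excel or a database table)
    {"UserName": "U123", "Region": "East", "Segment": "Consumer", "Category": "Furniture"},
    {"UserName": "U456", "Region": "East", "Segment": "Consumer", "Category": "Office Supplies"},
]

FACT = [  # the main data source
    {"Region": "East", "Segment": "Consumer", "Category": "Furniture", "Sales": 100},
    {"Region": "East", "Segment": "Consumer", "Category": "Office Supplies", "Sales": 200},
    {"Region": "West", "Segment": "Corporate", "Category": "Furniture", "Sales": 300},
]

def visible_rows(current_user, fact=FACT, entitlements=ENTITLEMENTS):
    """Join fact rows to entitlement rows on the security dimensions, then keep
    only rows whose entitlement UserName matches the current server user."""
    keys = {(e["Region"], e["Segment"], e["Category"])
            for e in entitlements if e["UserName"] == current_user}
    return [row for row in fact
            if (row["Region"], row["Segment"], row["Category"]) in keys]

# visible_rows("U123") keeps only the East/Consumer/Furniture row.
```

A user with no entitlement rows simply sees nothing, which is the safe default you want from row level security.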

Please note: the current version of Tableau v10 does not support x-db joins between two extracts, although it does support x-db joins between an extract and Excel (or some selected databases). So if your primary data source is an extract, your user entitlement table can’t be an extract.

In addition to ISMEMBEROF, USERNAME() is another great Tableau server function for row level security. The V10 x-db join feature extends USERNAME()’s use cases a lot, since you can create your own user entitlement table outside your main database for agility and self-service.

When the user entitlement table is in the same database as the main FACT table, you may want to use the database’s native join feature for row level security:

Solution D – Query Banding or Initial SQL for Row Level Security

For databases (like Teradata) that support query banding, enter query band parameters such as:
  • ProxyUser = B_<ProxyUser>
  • TableauMode=<TableauMode>
  • TableauApp=<TableauApp>
  • Tableau Version=<TableauVersion>
  • WorkbookName=Name of DataSource

For databases that support Initial SQL (Vertica, Oracle, SQL Server, Sybase ASE, Redshift, Greenplum, etc.), these parameters are available:

    • [TableauServerUser] returns the current Tableau Server user’s username only
    • [TableauServerUserFull]
    • [TableauApp]
    • [WorkbookName]

In summary, ISMEMBEROF and USERNAME() are the two Tableau functions for row level security:

  • ISMEMBEROF returns true if the current server user is a member of the given group. It requires server groups to be set up.
  • USERNAME() returns the current server user name. It requires an entitlement table. V10 x-db joins allow the entitlement table to live outside the main data source.
  • Both can be implemented as a data source filter or a workbook filter.
  • Both work for extracts and live connections.

Although USERNAME() returns the current server user name, it does not pass that user name to a live-connected data source outside Tableau server. To pass the current server user name to the data source, you have to use query banding or Initial SQL, depending on the database you use. Query banding and Initial SQL work only for live connections, not for extracts.

Do you still want to know more?  Click here.

SCALING TABLEAU (4/10) – USE SITES

Tableau server has a multi-tenancy feature called “sites” which can be leveraged by enterprise customers for better scalability, better security and advanced self-service.

This blog covers following areas about Tableau sites:

  • Basic concepts
  • Common use cases
  • Governance processes and settings
  • When not to create a new site

1. Basic concepts about Tableau sites

Let’s start with some basic concepts. Understanding them provides better clarity, avoids confusion, and reduces hesitation to leverage sites.

Sites are partitions or compartmented containers. There is absolutely no ‘communication’ between sites. Nothing can be shared across sites.

Site admins have unrestricted access to the contents of the specific site they own: they can manage projects, workbooks, and data connections; add users and groups; assign site roles and site membership; and manage extract refresh schedules. Site admins can also monitor pretty much everything within the site: traffic to views, traffic to data sources, background tasks, space usage, etc.

One user can be assigned roles in multiple sites. A user can be site admin for site A and independently hold any role in site B. For example, Joe, as a site admin for site A, can be added to site B in the admin role (or the Interactor role). However, Joe can’t transfer workbooks, views, users, data connections, user groups, or anything else between site A and site B. When Joe logs in to Tableau, Joe chooses site A or site B: in site A, Joe can see everything in site A but nothing in site B – it is not possible for Joe to assign site A’s workbooks or views to any users or user groups in site B.

All sites are equal from a security perspective. There is no concept of a super site or a site hierarchy. You can think of a site as an individual virtual server. A site is the opposite of ‘sharing’.

Is it possible to share anything across sites? The answer is no for site admins or any other users. However, a creative server admin can write scripts that run at the server level to break this rule. For example, a server admin can use tabcmd to copy extracts from site A to site B, although this goes into territory that Tableau does not officially support.
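As an illustration of such a script (again, unsupported territory), here is a sketch that drives tabcmd from Python to download a workbook from one site and publish it to another. The server URL, site names, and paths are placeholders, and it assumes tabcmd is installed and on the PATH:

```python
# Sketch: server-admin-only cross-site copy via tabcmd (not officially
# supported by Tableau). All arguments below are placeholders.
import subprocess

def copy_workbook_across_sites(server, workbook_url, twbx_path, site_a, site_b,
                               user, password, run=subprocess.check_call):
    """Download a workbook from site A, then publish it to site B.
    `run` is injectable so the command construction can be tested offline."""
    run(["tabcmd", "login", "-s", server, "-t", site_a, "-u", user, "-p", password])
    run(["tabcmd", "get", f"/workbooks/{workbook_url}.twbx", "-f", twbx_path])
    run(["tabcmd", "login", "-s", server, "-t", site_b, "-u", user, "-p", password])
    run(["tabcmd", "publish", twbx_path])

# copy_workbook_across_sites("https://tableau.example.com", "Sales",
#                            "/tmp/sales.twbx", "SiteA", "SiteB", "admin", "secret")
```

Because permissions, owners, and data connections do not travel cleanly this way, treat it as a one-off migration aid, not a sync mechanism.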

2. Common use cases of Tableau sites

  • If your Tableau server is an enterprise server for multiple business units (finance, sales, marketing, etc.) and finance does not want sales to see its contents, create a site for each business unit so one business unit’s site admin cannot see another business unit’s data or contents.
  • If your Tableau server is an enterprise platform and you want to provide governed self-service to the business, the site approach (business as site admin and IT as server admin) provides maximum flexibility to the business while IT can still hold business site admins accountable for everything within their sites.
  • If your server serves external partners and you do not want one partner to see another partner’s contents at all, you can create one site per partner. This also avoids potential mistakes of assigning a partner A user to partner B’s site.
  • If you have some very sensitive data or contents (like internal audit data), a separate site gives much better data security control – from the development phase through production.
  • Use sites as a Separation of Duties (SoD) strategy to prevent fraud or potential conflicts of interest for powerful business site admins.
  • You simply have too many publishers on your server and want to distribute some admin work to those closer to the publishers, for agility.

Arguably, you can achieve all of the above by using projects without sites. Why sites, then? First, sites make things easier for large Tableau server deployments: many out-of-the-box server admin views go by site, so it is easier to know each BU’s usage if you have a site per BU. Second, if you have a few super-knowledgeable business users, you can empower them better by granting them site admin access.

3. Governance processes around Tableau sites.

Thoughtful site management approaches, clearly defined roles and responsibilities, a documented request and approval process, and naming conventions have to be planned before you adopt a site strategy, to avoid potential chaos later on. Here is the checklist:

    • Site structure: How do you want to segment a server to multiple sites? Should site follow organization or business structure? There is no right or wrong answer here. However you do want to think and plan ahead.
    • How many sites should you have? It completely depends on your use cases, data sources, user base, and the level of control you want. As a rule of thumb, I would argue that more than 50 sites on one server is too many, although I know a very large corporation with about 300 sites that work well for them. I prefer fewer than 20 sites.
    • Who should be the site admin? Either IT or business users (or both) can be site admins. One site can have more than one admin. One person can admin multiple sites as well. When a new site is created, server admin normally just adds one user as site admin who can add others as site admins.
    • What controls are at site level? All the following controls can be checked or unchecked at site level:
      • Storage limitation
      • Revision history on or off and max numbers of revisions
      • Allow the site to have web authoring. When web authoring is on, it does not mean that all views within the site are web-editable; web editing still has to be allowed at the workbook/view level for specific users or user groups before end users can edit on the web.
      • Allow subscriptions. Each site can have one ‘email from address’ to send out subscriptions from that site.
      • Record workbook performance key events metrics
      • Create offline snapshots of favorites for iOS users.
      • Site-specific SAML with local authentication
      • Language and locale
    • What privileges should the server admin give to site admins? The server admin can grant all the above controls to a site admin when the site is created, can change those site-level settings later, and can take those privileges back from the site admin at any time.
    • What is the new-site creation process? I have a new site request questionnaire that the requester has to answer (see below). The answers help the server and governance teams understand the use cases, data sources, user base, and data governance requirements, and decide whether the use case fits Tableau server, and whether to share an existing site or create a new one. The key criteria are whether the same data sources exist in another site and whether the user base overlaps with another site. It is a balance between duplication of work and flexibility.
    • What is in the site request questionnaire?
      • Does your bigger team have an existing Tableau site already on Tableau server? If yes, you can use the existing site. Please contact the site admin who may need to create a project within the existing site for your team. List of existing sites and admins can be found @……. 
      • Who is the primary business / application contact?
      • What business process / group does this application represent? (like sales, finance, etc)?
      • Briefly describe the purpose and value of the application
      • Do you have an IT contact for your group for this application? Who is it?
      • What are the data sources?
      • Is there any sensitive data to be reported on? If yes, please describe the data source
      • Is there any private data in the source data (like HR data or sensitive finance data)?
      • Who are the audiences of the reports? How many do you anticipate? Will any partners access the data?
      • Does the source data cover more than one geo? If yes, what is the plan for data level security?
      • What are the primary data elements / measures to be reported on (e.g. bookings, revenue, customer cases, expenses)?
      • What are the dimensions by which the measures will be shown (e.g. geo, product, calendar)?
      • How often does the source data need to be refreshed?
      • What is the anticipated volume of source data? How many quarters of data? Roughly how many rows and columns?
      • Is the data available in the enterprise data warehouse?
      • Are similar reports already available in an existing reporting platform?
      • How many publishers for this application?

4. When should you not create a new site?

  • If the requested site will use the same data sources as an existing site, you may want to create a project within the existing site instead, to avoid potentially duplicated extracts (or live connections) running against the same source database.
  • If the requested site’s end users overlap heavily with an existing site, you may want to create a project within the existing site instead, to avoid duplicating user maintenance work.
  • The requester does not know that his or her bigger team already has a site

As a summary, Tableau sites are a great feature for large Tableau server implementations. Sites can be very useful to segment data and contents, distribute admin work, empower the business for self-service, etc. However, site misuse can create a lot of extra work or even chaos later on. A thoughtful site strategy and governance process have to be developed before you start to implement sites, although the process evolves toward maturity as you go.

SCALING TABLEAU (3/10) – USE PUBLISHED DATA SOURCES

Tableau helps us to see and understand our data which is great. A lot of great things are happening every day when creative analysts have powerful Tableau Desktop  with unlocked enterprise source data  and Tableau server collaboration environment.

As Tableau adoption grows from teams to BUs to the enterprise, you quickly run into scalability challenges: extract delays, an enterprise data warehouse (EDW) struggling to meet ad-hoc workloads, etc.

My last blog talked about setting extract priority on the server to improve extract efficiency by 50%. This blog focuses on a best practice for data source connections that scales both the EDW and the server – use published data sources.

  1. What is Tableau published data source?

It is nothing but Tableau’s semantic layer. Those who have been in the BI space for a while may be familiar with Oracle BI’s Repository or Business Objects’ Universe. The problem with Repository or Universe is that they are too complex and designed only for specially trained IT professionals. Tableau is a new tool designed for business analysts who do not have to know SQL, with a much simplified semantic layer. The Tableau community never focused enough on published data sources until recently, when people started to realize that leveraging published data sources is not only a great best practice but almost a must-have when scaling Tableau to the enterprise.


2. What makes up a Tableau published data source?

  • Information about how to access or refresh the data:  server name & credentials, Excel path, etc.
  • The data connection information:  table joins,  field friendly names, etc
  • Customization and cleanup : calculations, sets, groups, bins, and parameters; define any custom field formatting; hide unused fields; and so on.

3. Why Tableau published data source?

  • Reusable: Published data sources are reusable connections to data. When you prep your data, add calculations, and make other changes to your fields, these changes are all captured in your data source. Then when you publish the data source, other people can use it to conduct their own analysis.
  • Single source of truth (SSoT): You can have a data steward who defines the data model while workbook publishers consume the published data source to create vizzes and analysis. Here is an example of how to set up permissions to achieve SSoT.

[Screenshot: permission setup for SSoT]

  • Less workload on the EDW: When you use extracts, one refresh of the published data source refreshes the data for all of its connected workbooks, which removes a lot of workload from your EDW. This can be a very big deal for your EDW.


4. How many data sources are embedded vs. published? You can find out from the Data_Connections table: look at the DBCLASS column – when the value is ‘sqlproxy’, it is a published data source. Work with your server admin if you do not have access to the workgroup database of Tableau’s Postgres repository.

If fewer than 20% of your data sources are published data sources, published data sources are not yet well leveraged in your org or BU.
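A sketch of this check, with the dbclass values read from the repository via psycopg2 and the ratio computed in a small pure function. The host, port, and credentials are placeholders, and the readonly repository user must be enabled on your server:

```python
# Sketch: measure published data source adoption from the Tableau repository
# ("workgroup" database). Connection details below are placeholders.

def published_ratio(dbclasses):
    """Share of data connections that point at a published data source
    (dbclass == 'sqlproxy')."""
    dbclasses = list(dbclasses)
    if not dbclasses:
        return 0.0
    return sum(1 for d in dbclasses if d == "sqlproxy") / len(dbclasses)

def run_report(host="tableau-server", password="CHANGEME"):
    import psycopg2  # pip install psycopg2-binary
    conn = psycopg2.connect(host=host, port=8060, dbname="workgroup",
                            user="readonly", password=password)
    with conn, conn.cursor() as cur:
        cur.execute("SELECT dbclass FROM data_connections")
        return published_ratio(d for (d,) in cur.fetchall())

# A run_report() result below 0.2 suggests published data sources
# are under-leveraged, per the 20% rule of thumb above.
```

Splitting the query from the ratio calculation keeps the logic testable without a live repository connection.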

5. How to encourage people to use published data sources?

  • Control who can access the EDW: say you have a team of 10 Desktop users. You may want to give 2 of them EDW access so you do not have to train all 10 people on table structure details, while the remaining 8 use published data sources created by the two data stewards.
  • If extracts are used, you can give all published data sources a higher priority as an incentive for people to use them. See my previous blog for details.
  • Make sure people know that the version control feature works for data sources as well.
  • As data stewards, add comments to columns – here is how a comment looks when you mouse over a field in the Desktop Data pane:

Here is how to add comments:

Conclusions: Published data sources are not a new Tableau feature, but they are not widely used even though they are reusable, a single source of truth, scalable, and lighter on your DB server. Tableau has been improving its publishing workflow, making data source publishing much easier since 9.3. Tableau v10 even gives you a new option to publish your data sources separately (or not) during the workbook publishing workflow. Data source revision history is a great feature for controlling data source versions. Tableau announced a big data governance roadmap at TC16. However, self-service practitioners do not have to wait for new Tableau features to leverage published data sources.

Scaling Tableau (2/10) – Set Extract Priority Based on Duration

Are you facing a situation where your Tableau server backgrounder jobs have much longer delays during peak hours?

There are many good reasons why extracts are scheduled at peak hours, likely right after nightly ETL completions or even triggered automatically by ETL completions.

You always have a limited number of backgrounders on your server. How do you cut the average extract delay without adding backgrounders and without rescheduling any extract jobs?

The keyword is job PRIORITY. There are some good priority suggestions in the community (like https://community.tableau.com/thread/152689). However, the most effective approach I found to prioritize extracts was duration-based priority in addition to business criticality – I managed to reduce extract waiting time by 50% after increasing the priority of all extracts with duration below the median runtime.

Here is what I will recommend as extract priority best practices:

  1. Priority 10 for any business-critical extracts: hopefully nobody disagrees with giving the highest priority to business-critical extracts.
  2. Priority 20 for all incremental extracts: not only does an incremental refresh normally take less time than a full one, it is also an awesome incentive to encourage more people to use incremental extracts.
  3. Priority 30 for any extracts with duration below the median (that is 50% of all extract jobs). This is another great incentive for publishers to make their extracts more efficient. It is the responsibility of both server admins and publishers to make backgrounder jobs more efficient. There are many things publishers can do to improve extract efficiency: tune the extracts, use incremental vs. full refreshes, hide unused columns, add extract filters to pull less data, reduce extract frequency, schedule extracts to off-peak hours, or better, run extracts outside of Tableau by using the Tableau SDK (see my blog @http://enterprisetableau.com/sdk/), etc.
  4. Priority 50 for all the rest (default).
  5. Turn on tabadmin set backgrounder.sort_jobs_by_run_time_history_observable_hours 36, which makes full extracts within the same priority run in order from shortest to longest based on their “last” run duration.
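The priority rules above can be sketched as a single function. The thresholds mirror the list; how you flag a job as "business critical", and how you compute the median runtime from your job history, is up to you:

```python
# Sketch of the extract priority rules (lower number = runs first).

def extract_priority(is_business_critical, is_incremental,
                     duration_seconds, median_seconds):
    """Map an extract refresh job to a backgrounder priority."""
    if is_business_critical:
        return 10   # rule 1: business-critical extracts first
    if is_incremental:
        return 20   # rule 2: reward incremental refreshes
    if duration_seconds < median_seconds:
        return 30   # rule 3: reward fast (below-median) extracts
    return 50       # rule 4: everything else keeps the default
```

Keeping the rules in one function makes the incentive structure explicit and easy to tweak as your workload changes.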

The combination of #3 and #5 will reduce your extract waiting time dramatically during peak hours.

What is this backgrounder sort by run time option (#5 above)? I am sure that you want to read official Tableau online help here.

In short, Tableau server can sort full extract refresh jobs with the same priority (like 50) so they are executed based on the duration of their “last run,” executing the fastest full extract refresh jobs first.

The “last run” duration of a particular job is determined from a random sample of a single instance of the full extract refresh job in the last <n> hours, which you can configure. By default this sorting is disabled (-1). If you enable it, Tableau’s suggested value is 36 (hours).

Let’s say that you have the following jobs scheduled at 5am, here is how extracts are prioritized:

| Priority | Job Name | Duration (min) | Run order with sort_jobs_by_run_time ON | Behavior with sort_jobs_by_run_time OFF |
|---|---|---|---|---|
| 10 | Job 10.1 | 2 | 1 | These 4 jobs go first, one by one, with no order among them (it could be Job 10.4, then Job 10.2, then Job 10.3, then Job 10.1) |
| 10 | Job 10.2 | 3 | 2 | |
| 10 | Job 10.3 | 9 | 3 | |
| 10 | Job 10.4 | 15 | 4 | |
| 20 | Job 20.1 | 1 | 5 | These 4 jobs go one by one after all priority 10 jobs; again, no order among them |
| 20 | Job 20.2 | 2 | 6 | |
| 20 | Job 20.3 | 14 | 7 | |
| 20 | Job 20.4 | 15 | 8 | |
| 30 | Job 30.1 | 2 | 9 | These 4 jobs go one by one after all priority 20 jobs; again, no order among them |
| 30 | Job 30.2 | 3 | 10 | |
| 30 | Job 30.3 | 5 | 11 | |
| 30 | Job 30.4 | 8 | 12 | |
| 50 | Job 50.1 | 1 | 13 | These 10 jobs go one by one after all priority 30 jobs; again, no order among them |
| 50 | Job 50.2 | 9 | 14 | |
| 50 | Job 50.3 | 20 | 15 | |
| 50 | Job 50.4 | 25 | 16 | |
| 50 | Job 50.5 | 30 | 17 | |
| 50 | Job 50.6 | 50 | 18 | |
| 50 | Job 50.7 | 55 | 19 | |
| 50 | Job 50.8 | 60 | 20 | |
| 50 | Job 50.9 | 70 | 21 | |
| 50 | Job 50.10 | 80 | 22 | |

For example, the max waiting time for the 1-minute Job 20.1 is 29 minutes (all priority 10 jobs) with the sort_jobs_by_run_time option ON. With the option OFF, the max waiting time could be 60 minutes (all priority 10 jobs + the other priority 20 jobs).

Re-cap on how extracts are run in this order:

  1. Any task already in process is completed first.
  2. Any task that you initiate manually using Run now starts when the next backgrounder process becomes available.
  3. Tasks set with the highest priority (the lowest number) start next, independent of how long they have been in the queue. For example, a task with a priority of 20 will run before a task with a priority of 50, even if the second task has been waiting longer.
  4. Tasks with the same priority are executed in the order they were added to the queue, except when tabadmin set backgrounder.sort_jobs_by_run_time_history_observable_hours 36 is turned on. When that option is on, the fastest full extract refresh jobs go first.
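The ordering rules above can be simulated with a few lines of Python, assuming a single backgrounder draining the queue. The job list mirrors the priority 10 and 20 rows of the table, and the simulation reproduces the 29-minute vs. 60-minute waiting times for Job 20.1:

```python
# Sketch: waiting time for a job on one backgrounder, with and without the
# sort-by-run-time option. Jobs are (name, priority, duration_minutes).
JOBS = [
    ("Job 10.1", 10, 2), ("Job 10.2", 10, 3), ("Job 10.3", 10, 9), ("Job 10.4", 10, 15),
    ("Job 20.1", 20, 1), ("Job 20.2", 20, 2), ("Job 20.3", 20, 14), ("Job 20.4", 20, 15),
]

def wait_minutes(job_name, jobs, sort_by_run_time):
    """Minutes the named job waits before starting. Without the sort option,
    order within a priority is arbitrary, so this returns the worst case
    (every same-priority peer runs first)."""
    me = next(j for j in jobs if j[0] == job_name)
    ahead = [j for j in jobs if j[1] < me[1]]          # higher priority always first
    peers = [j for j in jobs if j[1] == me[1] and j[0] != me[0]]
    if sort_by_run_time:
        ahead += [j for j in peers if j[2] < me[2]]    # only shorter peers jump ahead
    else:
        ahead += peers                                 # worst case: all peers first
    return sum(j[2] for j in ahead)
```

Running `wait_minutes("Job 20.1", JOBS, True)` gives 29 and `wait_minutes("Job 20.1", JOBS, False)` gives 60, matching the example above.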

A few final practical guidelines:

  • Step 1: If most of your extracts have priority 50, you may want to simply turn on tabadmin set backgrounder.sort_jobs_by_run_time_history_observable_hours 36 and see how much waiting time improvement you gain.
  • Step 2: If step 1 does not give you what you are looking for, change the extract priority to a higher value (like 30) for any extracts with duration below the median. This gives a big waiting time reduction. You can start by changing extract priorities manually to see how it goes. Just be aware that re-publishing an extract resets its priority to the default 50.
  • How to automate Step 2? I have not seen an API for it. However, I had a program update tasks.priority directly.
  • Why do I not recommend a higher priority for more frequent jobs? I know it is one of the best practices recommended by many Tableau practitioners. However, I think it drives the wrong behavior – it encourages publishers to increase their extract frequency from weekly to daily or hourly just to get higher priority, which in turn causes more extract delays. Duration-based and incremental-based priorities give publishers a much better incentive to make their extracts more efficient, which becomes a positive cycle.

Scaling Tableau (1/10) – version control and revision history

Tableau released one of the most wanted server features – version control and revision history – in V9.3. The feature was much enhanced in V10 with previews of old workbooks, one-click restore, and a maximum-revisions setting. I love all of those new V10 features:

  • The workbook preview and restore features are so convenient for publishers.
  • The maximum-revisions setting is so cool for server admins, who can actually control server space usage so you do not run out of storage while enabling revision history. It also shows Tableau’s thought process of building governance into a new feature, which is important for scaling Tableau to the enterprise. I will explain those features in detail here:
  1. Turn it on. By default, revision history is not turned on. It can be turned on site by site: go to the site’s Settings, General, and select “Save a history of revisions”. On V10 you have two choices: Unlimited or a number of revisions. Unlimited means there is no limit on revision history, which you probably do not want. As a server admin, you always want to make sure your server will not run out of space, and the number-of-revisions setting is a very handy way to get some peace of mind about server storage.

2. How to decide the max. number of revisions?

I asked this question but did not find guidance anywhere, so I spent days of research and want to share my findings here. First of all, my philosophy is to give maximum flexibility to publishers by providing as many revisions as possible. On the other hand, I also want to be able to project the extra storage that revision history will create, for planning purposes.

How many revisions should you set? It depends on how much space you can allocate to revision history without dramatically impacting your backup/restore timing, and on how many workbooks the server has. Let’s say you are OK giving about 50G to all revision history. Then figure out how many workbooks you have now and the total size of the XML portion of those workbooks (revision history only keeps the XML piece), and you can calculate the max number of revisions. Here is how:

  • Open Desktop and connect to PostgreSQL: enter your server name, the repository port, workgroup as the database, and the readonly user and password. Select the Workbooks table and look at Size, Data Engine Extracts, and the number of records. The Data Engine Extracts column tells you whether a workbook has an embedded extract.
  • Say you have 500 workbooks in total, 200 of them with Data Engine Extracts as false, and a total size of 200M for those 200. That means the average twb is about 1M per workbook – this is what revision history will keep once it is turned on. The total XML size across all 500 workbooks is then about 500M.
  • If you turn on revision history and set max revisions to 50, over time the server storage for revision history would be about 50 x 500 x 1M = 25G. Two other factors to consider: one is the new-workbook creation rate; the other is that not every workbook will max out its revisions.
  • Once you set the revision number, you can monitor the storage used by revision history in the Workbook_versions table, which keeps all the revision history. You can find the overall size, number of versions, and more insights about usage patterns. You can also do the following joins to find workbook names, user names, etc.

[Screenshot: workbook_versions joined to the workbooks and users tables]
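The sizing arithmetic above can be wrapped in two small helpers. The numbers are the example's (500 workbooks, roughly 1 MB of XML each, a 50 GB budget); substitute your own from the Workbooks table:

```python
# Sketch of the revision history sizing arithmetic from the example above.

def revision_storage_gb(num_workbooks, avg_twb_mb, max_revisions):
    """Worst-case storage if every workbook maxes out its revision history."""
    return num_workbooks * avg_twb_mb * max_revisions / 1024

def max_revisions_for_budget(budget_gb, num_workbooks, avg_twb_mb):
    """Largest revision cap that stays within the storage budget."""
    return int(budget_gb * 1024 // (num_workbooks * avg_twb_mb))

# With 500 workbooks at ~1 MB each, a cap of 50 revisions costs roughly 24 GB,
# and a 50 GB budget supports a cap of about 100 revisions.
```

In practice actual usage lands below the worst case, since new workbooks accumulate revisions gradually and many never reach the cap.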

3. Can interactors see previous versions as well? No. End users with the Interactor role can only see the current version.

4. Does the publisher have to do anything to keep revision history for his or her workbooks? No. Once ‘Save a history of revisions’ is turned on for the site, every time a workbook is web-edited or modified via Desktop, a new revision is created automatically – no further action from the publisher. When the max number of revisions is reached, the oldest version is deleted automatically, with no notification to publishers. All you need to communicate to publishers is the max number of revisions anyone can have. For example, if you keep 50 revisions and one workbook already has 50, when that workbook is changed again, Tableau server keeps only the most recent 50 revisions by deleting the oldest one automatically.

5. Can you change the max revisions? Yes. Say you have max revisions at 50 and reduce it to 25: Tableau server deletes the older revisions (if any) and keeps only the most recent 25. What happens if you change back from 25 to 50? The older revisions are gone and will not show up anymore.
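The retention behavior in points 4 and 5 acts like a fixed-size queue: adding a revision beyond the cap silently drops the oldest, and lowering the cap keeps only the most recent revisions. Python's deque illustrates both:

```python
# Sketch: revision retention as a bounded queue (illustration only).
from collections import deque

revisions = deque(maxlen=3)              # pretend the site cap is 3 revisions
for version in ["v1", "v2", "v3", "v4"]:
    revisions.append(version)            # "v4" silently pushes out "v1"

# Lowering the cap keeps only the most recent revisions, like point 5 above:
recapped = deque(revisions, maxlen=2)    # only "v3" and "v4" survive
```

As with Tableau, the dropped versions are simply gone; raising the cap again does not bring them back.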

6. What is the workflow for a publisher to restore an old workbook? Publishers or admins can see the revision history for their workbooks by clicking Details, then Revision History, and with one click preview or restore any old version. Once a version is restored, a new revision is created automatically.

7. How do you restore a data source revision? V10 shipped preview and restore features for workbooks only. You can view all revisions for data sources as well, but you have to download the data source and upload it again to restore an older version. I am sure Tableau’s scrum team is working on one-click restore for data sources too.

GOVERNED SELF-SERVICE ANALYTICS: Maturity Model (10/10)

My last 9 blogs covered all aspects of governed self-service and how to scale from department self-service to enterprise self-service. I received some very positive feedback and I am glad that my blogs inspired some readers:

Devdutta Bhosale says: “I read your article governed self-service analytics and as a Tableau server professional could instantly relate with some of challenges of implementing Enterprise BI with Tableau. Compared to Legacy BI tools such as BO, Micro-strategy, etc. enterprise BI is not the strength of Tableau especially compared to “the art of possible” with visualizations. I am so glad that you are writing so much in this space …. The knowledge you have shared has helped me follow some of the best practices with my recent Enterprise BI implementation at employer. I just wanted to say ‘thank you’ “.

Other readers also asked me how to measure governed self-service maturity. There are some BI maturity models from TDWI, Gartner, etc. However, I have not seen any practical self-service analytics model. Here is my first attempt at a self-service analytics maturity model. I spent a lot of time thinking through this model, and I read a lot before putting this blog together.

I will describe the self-service analytics maturity model as follows:

  • Level 1: Ad-hoc
  • Level 2: Department Adoption
  • Level 3: Enterprise Adoption
  • Level 4: Culture of Analytics

Level 1 (ad-hoc) is where one or a few teams start to use Tableau for quick visualizations and insights – in other words, where Tableau initially lands. Once Tableau’s initial value is recognized, adoption grows to the business unit or department level (level 2), which is where most Tableau implementations are today. Scaling further to enterprise adoption (level 3) needs business strategy alignment, bigger investment, and a governed self-service model, which is what this series of blogs is about. The ultimate goal is to drive a culture of analytics and enable data-driven decision-making, which is level 4.

What are the characteristics of each maturity level? I will look at the data, technology, governance, and business outcome perspectives for each of those maturity levels:


Level 1: Ad-hoc

  • Data
    • Heroic data discovery
    • Inconsistent data
    • Poor data quality
  • Technology
    • Team based technology choice
    • Shadow IT tools
    • Exploration
  • Governance
    • No governance
    • Overlapping projects
  • Outcome
    • Focuses on what happened
    • Analytics does not reflect business strategy
    • Business process monitoring metrics

Level 2: Department Adoption

  • Data
    • Data useful
    • Some data definition
    • Siloed data management
    • Limited data policies
  • Technology
    • Partially IT-supported architecture
    • Immature data preparation tools
    • Data mart like solutions
    • Early stage of big data technology
    • Scalability challenges
  • Governance
    • Functions and business line governance
    • Immature metadata governance
    • Islands of information
    • Unclear roles and responsibilities
    • Multiple versions of KPIs
  • Outcome
    • Some business functions recognize analytics value and ROI
    • Analytics is used to inform decision-making
    • More on cause analysis & some resistance to adopting all insights
    • Data governance is managed in a piecemeal fashion

Level 3: Enterprise Adoption

  • Data
    • Data quality certification
    • Process & data measurement
    • Data policies measured & enforced
    • Data exception management
    • Data accuracy & consistency
    • Data protection
  • Technology
    • Enterprise analytics architecture
    • Managed analytics sandboxes
    • Enterprise data warehouse
    • Content catalog
    • Enterprise tools for various power users
    • Advanced technology
    • Exploration
  • Governance
    • Executive steering committee
    • Governed self-service
    • CoE with continuous improvement
    • Data and report governance
    • Enterprise data security
    • Business and IT partnership
  • Outcome
    • Analytics insight as a competitive advantage
    • Relevant information as a differentiator
    • Predictive analytics to optimize decision-making
    • Enterprise information architecture defined
    • Mature governed self-service
    • Tiered information contents

Level 4: Culture of Analytics

  • Data
    • Information life-cycle management
    • Data lineage & data flow impact documented
    • Data risk management and compliance
    • Value creation & monetizing
    • Business Innovation
  • Technology
    • Event detection
    • Correlation
    • Critical event processing & stream
    • Content search
    • Data lake
    • Machine learning
    • Coherent architecture
    • Predictive
  • Governance
    • Data quality certification
    • Process & data measurement
    • Data policies measured & enforced
    • Data exception management
    • Data accuracy & consistency
    • Data protection
    • Organizational process performance
  • Outcome
    • Data drives continuous business model innovation
    • Analytical insight optimizes business process
    • Insight in line with strategic business objectives
    • Information architecture underpins business strategies
    • Information governance as part of business processes

This concludes the governed self-service analytics blogs. Here are the key takeaways for governed self-service analytics:

  1. Enterprise self-service analytics deployment needs a strong governance process
  2. Business and IT partnership is the foundation of good governance
  3. If you are in IT, give more trust to your business partners
  4. If you are in business, be a good citizen and follow the rules
  5. Community participation and neighborhood watch are an important part of successful governance
  6. The governance process evolves as your adoption grows

Thank you for reading.

Governed Self-Service Analytics: Content Management (9/10)

When executives get reports from an IT-driven BI system, they trust the numbers. But if the reports come from a spreadsheet, which can change anytime, their trust level drops. If the same spreadsheet is used to create a Tableau visualization that is shared with executives for decision-making, does the trust level increase? Can important business decisions be made based on Tableau reports?

I am not against Tableau or visualization at all. I am a super Tableau fan, and I love Tableau’s mission to help people see and understand their data. On the other hand, as we all know, any dashboard is only as good as its data. How do you provide trustworthy content to end consumers? How do you avoid the situation where numbers are put into a 10-K report while the team is still baking the data definitions?

The answer is to create a framework of content trust-level indicators for end consumers. We do not want to slow down innovation or discovery by self-service business analysts, who still create their own analytics and publish workbooks. After a dashboard is published, IT tracks its usage, identifies the most valuable content per defined criteria, and certifies the data and content so end consumers can use the certified reports the same way as reports from IT-driven BI. See the diagram below for the overall flow:


When you have data to explore or a new business question to answer, hopefully you have a report catalog where you can search for a similar report to leverage. If one exists, you do not have to develop it again, although you may need to request access to the report if you do not have it. If the visualization is not exactly what you are looking for but the data attributes are there, you can always modify it to create your own version.

If no existing report is available, you can also search the published data source catalog to see if there is a published data source to leverage. If so, you can create new workbooks on top of existing published data connections.

You may still need to bring your own data for your discovery. The early stage of discovery and analysis goes through multiple iterations, and initial user feedback helps reduce the overall time to market for your dashboards. At some point, when your dashboard is good enough and is moved to a production folder to be shared with many more users, it falls into the track, identify, and certify cycle.


What to track? Different organizations will have different answers. Here are some examples:

  • Data sources with high hits
  • Reports accessed most frequently
  • Most active users
  • Least used reports for retirement
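Once view-access events are exported (for example, from the Tableau repository), the tracking list above can be computed with a few lines. This is a hedged sketch; the function and data names are made up for illustration:

```python
from collections import Counter

def usage_report(access_events, top_n=3):
    """access_events: list of (user, report) tuples from your exported access log."""
    report_hits = Counter(report for _, report in access_events)
    active_users = Counter(user for user, _ in access_events)
    return {
        "most_accessed": report_hits.most_common(top_n),
        "most_active_users": active_users.most_common(top_n),
        # reports with almost no traffic are candidates for retirement
        "retirement_candidates": [r for r, n in report_hits.items() if n <= 1],
    }

events = [("amy", "Sales KPI"), ("bob", "Sales KPI"), ("amy", "Old Audit"), ("amy", "Sales KPI")]
summary = usage_report(events)
assert summary["most_accessed"][0] == ("Sales KPI", 3)
assert summary["retirement_candidates"] == ["Old Audit"]
```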

How to identify the most critical reports?

  • Prioritize based on usage (# of users, use cases, purpose, x-functional, benefits)
  • Prioritize based on data source and contents (e.g., the data already exists in a certified environment)
  • Prioritize based on users. If the CEO uses a report, it must be a critical one for the organization

How to certify the critical reports? It is an on-going process:

  • Incrementally add self-service data to the source of truth so the data governance process covers the new data sets (data definitions, data stewardship, data quality monitoring, etc.)
  • Recreate dashboards (if needed) for better performance, add-on functionality, etc.
  • Label the report with a report trustworthy indicator

The intent of the track, identify, and certify cycle is to certify the most valuable reports in your organization. The output of the process is the report trustworthy indicator, which helps end consumers understand how trustworthy the data and reports are.

End information consumers continue to use your visualizations, which are replaced with certified reports step by step – an on-going process. The certified reports carry trustworthy indicators.

What is the report trustworthy indicator? You can design multiple levels of trustworthy indicators. For example:

  • SOX certified:
    • Data Source Certified
    • Report Certified
    • Release Process Controlled
    • Key Controls Documented
    • Periodic Reviews
  • Certified reports:
    • Data Source Certified
    • Report Certified
    • Follow IT Standard Release Process
  • Certified data only
    • Data Source Partially Certified
    • Business Self-Service Releases
    • Follow Tableau Release Best Practices
  • Ad-Hoc
    • Business Self-Service Releases
    • Follow Tableau Release Best Practices

As a summary, content management helps reduce duplication of content and data sources, and provides end information consumers with a trustworthy level for each report so proper decisions can be made based on the reports and data. The content management process outlined above shows how to create enterprise governance without slowing down innovation.

Please read the next blog about the governance maturity model.

How business benefits from IT leadership with self-service analytics

Last week, I presented self-service analytics at a local Silicon Valley meet-up, the “Run IT as a Business” group. The audience included some IT executives and a few ex-CIOs. My presentation was well received with some very positive feedback:

  • “Mark gave an excellent presentation that was extremely informative!”
  • “well structured and very informative”
  • “This is one of the more interested presentations I’ve heard lately”

My talk focused on the new theme of BI and analytics – self-service analytics, which is white hot and growing rapidly. I shared how NetApp’s change management had users take ownership of the technology, which is the key success factor.

Slides for this talk are @ http://www.slideshare.net/mwu/run-it-as-business-meetup-selfservice-bi

Event feedback details @ http://www.meetup.com/Run-IT-as-a-Business-South-Bay/events/230661871/

Architecture differences between Tableau 8 and 9

We have talked about Tableau’s new features from a user perspective. Recently there was a question about the architecture differences between Tableau 8 and 9. I thought this was a good question. Here is my summary of the architecture differences between Tableau 8 and 9.

  1. New HA and failover components introduced in Tableau 9: the Coordinator Service (manages leader election and ensures there is a quorum for making decisions during failover) and the Cluster Controller (reports process status and coordinates failover).
  2. New File Store in Tableau 9 to ensure extracts are available on all nodes of a cluster
  3. New Cache Server manages a shared query cache across the server cluster and is used by the VizQL Server, Backgrounder, and Data Server processes
  4. New minimum hardware requirements – Tableau 9 will not install if the hardware does not meet the minimum requirements
  5. New API server – This process is used when you interact with the server via REST API.
  6. The Data Engine is no longer limited to two nodes per cluster. This new flexibility can improve server clusters used for extract-heavy scenarios.
  7. Gateway can be configured in multiple nodes for better HA.
  8. You must have at least 3 nodes in the cluster to achieve full HA mode, starting with Tableau 9.
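Items 1 and 8 are related: a coordination service can elect a leader only when a strict majority (quorum) of nodes agrees, which is why a 3-node cluster is the minimum for full HA. A minimal sketch of the majority rule (the function name is mine, not Tableau’s):

```python
def has_quorum(total_nodes, live_nodes):
    """A coordination service needs a strict majority of its nodes alive
    to elect a leader and keep making decisions."""
    return live_nodes > total_nodes // 2

# A 3-node cluster survives the loss of one node (2 of 3 is a majority)...
assert has_quorum(3, 2)
# ...but a 2-node cluster cannot: losing one node loses the majority.
assert not has_quorum(2, 1)
```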

Tableau 9.3 New Features

Tableau has sped up its release cycle from one release per year to three releases in 2015, and announced there will be four releases in 2016. Tableau is going to spend more on R&D this year than in all the company’s previous 13 years combined. I love the pace of innovation.

Tableau 9.3 was released on 3/24. I was able to demo some 9.3 new features at a Tableau server upgrade and new-feature demo webinar on the day 9.3 was released, which was cool.

I am excited about the Tableau 9.3 release, which features powerful upgrades to the self-service analytics environment. These include workbook revision history, union of Excel or text-based data sources, passing parameters in initial SQL, a Snowflake data connector, map enhancements, content analytics, etc.

Workbook Revision History

This is the feature that many Tableau fans have been waiting a long time for. In the past, publishers had to manage their own workbook versioning, which is a difficult task for many of them. When changes did not work out and they had to roll back to a previous version, publishers sometimes struggled to remember which version was the right one, and the Tableau server team was helpless. Now the 9.3 server keeps published workbook revision history so that publishers can go back to any of their previous versions if changes do not work out. This is huge!

Union & More Data Prep Features

Data prep is, unfortunately, where most analysts spend much of their time. Tableau continues enhancing data prep features so analysts can spend their valuable time on analysis and insights instead of copying and pasting data. 9.2 released sub-table detection, data grid editing, data pane searching, etc. 9.3 adds a union feature that combines data split across multiple files or tables into a single Tableau data source. Union works for Excel or text-based data sources only; I am sure Tableau will make union work for database tables as well. You can also do more data grid editing in 9.3: preview a data extract or Web Data Connector, create groups or bins, etc.

Parameters in Initial SQL for Row-Level Security

This is a huge feature for customers looking for a better row-level security solution. Initial SQL is a set of commands that can run when you open a workbook, refresh an extract, sign in to Tableau Server, or publish to Tableau Server. It can be used to set up temporary tables or a custom data environment for the session. Initial SQL is not new, but it was missing a critical capability – you could not dynamically pass parameters such as the username. Tableau 9.3 can pass parameters (TableauServerUser, etc.) to some databases. When TableauServerUser is passed to the database for the duration of that user’s session, you can leverage the database’s user security mapping (if you have implemented it) so the database returns only that user’s data, achieving row-level security. In 9.3, parameters in initial SQL are supported for Oracle, SQL Server, Sybase ASE, Redshift, and Greenplum only. Click here for details. For Teradata, you can use query banding to pass parameters to achieve row-level security.
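The mechanism can be emulated end-to-end in a few lines. In this hedged sketch, sqlite3 stands in for a supported database, the table and column names are made up, and the string substitution mimics how Tableau replaces parameters like [TableauServerUser] in initial SQL before running it:

```python
import sqlite3

# Tableau substitutes [TableauServerUser] into the initial SQL at session start.
INITIAL_SQL = "CREATE TEMP TABLE session_ctx AS SELECT '[TableauServerUser]' AS username;"

def open_session(server_user):
    conn = sqlite3.connect(":memory:")
    # Run the (parameter-substituted) initial SQL, then load hypothetical demo data.
    conn.executescript(INITIAL_SQL.replace("[TableauServerUser]", server_user))
    conn.executescript("""
        CREATE TABLE sales (region TEXT, amount INT);
        INSERT INTO sales VALUES ('west', 100), ('east', 200);
        CREATE TABLE user_regions (username TEXT, region TEXT);  -- the security mapping
        INSERT INTO user_regions VALUES ('amy', 'west'), ('bob', 'east');
    """)
    return conn

# Every query joins through the session table, so each user sees only their rows.
conn = open_session("amy")
rows = conn.execute("""
    SELECT s.region, s.amount FROM sales s
    JOIN user_regions m ON m.region = s.region
    JOIN session_ctx c ON c.username = m.username
""").fetchall()
assert rows == [("west", 100)]
```

The same join pattern, pointed at your real security-mapping table, is what makes the database render user-specific data only.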

Project Leader Content Management

I have a blog about how to use Tableau sites. I know that many Tableau customers avoid creating a new site unless they have to. How do you make sure site admins do not become a bottleneck as you scale out Tableau when your deployment has only one or very few sites? If you struggle with this, you will love the 9.3 features that allow project leaders to change workbook owners, run refresh schedules, and move content – tasks that could be done only by site/server admins in the past. This feature, together with 9.2’s project permission locking, really empowers project leaders.

Server Management

9.3 added a bunch of server management features: low disk-space alerts; a PostgreSQL improvement that allows failing over from one repository to another much more quickly without a server restart; a REST API underpinned by a completely new platform with significant performance and usability improvements for admins; and Postgres connectivity monitoring that lets server admins check the underlying PostgreSQL database for corruption with a new tabadmin command.

Publishing Workflow

Publishing data sources or workbooks becomes easier and faster in 9.3: Tableau Desktop remembers your Tableau Online or Tableau Server connection and signs you in to the last server you used. It is easier to publish, keep your data fresh, and stay connected with the new Publish Data Source flow.

Better Map

Maps are enhanced with postal codes for 39 European countries, districts in India, and US demographic data layers for 2016. Postal codes for the UK, France, Germany, and the US are also updated. Mapbox supports the new Mapbox GL in addition to 9.2’s Mapbox Classic.

Progressive Dashboard Load

It is cool that Tableau now has a progressive dashboard load feature, which means you can start analyzing your data sooner without having to wait for the entire dashboard to load.

First time being keynote speaker at the Tableau West Coast Customer Advisory Summit

Last week I was the customer keynote speaker at Tableau’s annual West Coast Customer Advisory Summit. My talk was about how NetApp scaled its Tableau enterprise deployment to 4,000+ users within one year. It was well received; a lot of people came to me and said they were inspired by my presentation. It was similar to the talk I gave at TC15 Las Vegas, but I added some recent work around our content certification framework. Since it is a closed-door summit, there is no recording, but I made my slides public; they can be downloaded @ http://www.slideshare.net/mwu/tableau-customer-advocacy-summit-march-2016

Tableau VP Dave Story shared the Tableau product roadmap. Other presentations covered Tableau alerting and Desktop license management, which were all very good. Of course, we also went through product feedback exercises where customers voted for their top-requested features. It was a fun one-day event, and it was great to meet other big Tableau customers and a lot of Tableau product managers.

First time hosting Webinar for the entire Tableau Server & Online Admin group

I love Tableau. My passion is Tableau server deployment – how to create a governed self-service model with Tableau, people, and process. My company’s Tableau server added 4,000+ users within one year (http://enterprisetableau.com/presentations). I got a lot of tips and help from the Tableau community during the last 2 years, and I want to give back, which is why I created a Silicon Valley Enterprise TUG focusing on Tableau server deployment that got a lot of positive feedback. Recently it was recommended to extend the Silicon Valley Enterprise TUG nationwide, which is why I became the co-owner of the Tableau Server & Online Admin group. This was the first webinar for this group.

This webinar went extremely well, with about 200 attendees via Zoom. Zoom is cool with its video, chat, and Q&A features. Speaker Mike Roberts (Zen Master) did an amazing job keeping the audience engaged for about 50 minutes while I was busy answering questions via Q&A messaging. Mike shared great insights on workbook performance:

  • Workbook metadata: What is actually in the workbook (filters, row shelf, column shelf, etc.)? Where do we get all the metadata WITHOUT using tabcmd or the REST API? PostgreSQL / psql.
  • Desktop vs. Server: Something that performs well in Desktop *should* perform equally well on Server, but sometimes that’s not true – how do you troubleshoot it?
  • Alerts: How to create performance alerts for your workbooks, since people often don’t know whether their workbook performance is getting slow.

According to Mike, a good workbook isn’t just one that performs well in Tableau Desktop. A good workbook has the following characteristics:

  • Data – general rule: more data = potential for high latency and poor performance
  • Design – proper use of filters, action, mark types, etc
  • Delivery – Where it’s delivered has a large impact on how it performs

Click here for all previous Server Admin webinar slides, summaries, and recordings.

Should you upgrade your Tableau Server?

Last week’s webinar about enterprise server upgrades (why upgrade, how to upgrade, and a new-feature demo) was well received, with audience survey feedback of 4.4 (on a 1-5 scale, 5 being awesome).

“Love & hate” best describes each Tableau release. People love the new features and Tableau’s pace of innovation, but enterprise customers dislike the effort, downtime, and risk associated with each upgrade.

Unfortunately, doing nothing is not a good option, since Desktop users may use the one-click Product Updates feature to upgrade their Desktop before the server is upgraded (unless you have Microsoft CCM or something similar in your enterprise). If that happens, users with the newer Desktop (at the major or minor release level, like 9.1 to 9.2, not the maintenance release level) can’t publish workbooks to the server, and any workbook opened and saved by 9.2 Desktop can’t be opened by 9.1 Desktop anymore. You will have a lot of frustrated Desktop users. It takes a lot of communication to ask all Desktop users not to upgrade their Desktop until the server is upgraded, and the longer the server upgrade takes, the more communication work for the enterprise server team. In other words, doing nothing on the server side is actually a lot of work as well.
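The compatibility rule described above comes down to comparing the major.minor release of Desktop and Server, ignoring the maintenance level. A hedged sketch of that check (the function is mine, for illustration only):

```python
def can_publish(desktop_version, server_version):
    """Desktop can publish only if its major.minor release does not exceed
    the server's; maintenance releases (the third digit) don't matter."""
    desktop = tuple(int(x) for x in desktop_version.split(".")[:2])
    server = tuple(int(x) for x in server_version.split(".")[:2])
    return desktop <= server

assert can_publish("9.1.5", "9.1.0")      # only the maintenance level differs: fine
assert not can_publish("9.2.0", "9.1.3")  # newer Desktop, older Server: blocked
```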

NetApp’s approach is to upgrade the server ASAP – NetApp did the 9.1 server upgrade within 20 days of general release and the 9.2 server upgrade within 10 days of general release, which is a win-win for Desktop users and the server team.

It is impossible to have a bug-free version, but Tableau’s releases are relatively good. We did not find any major issues at all with our 9.0, 9.1, and 9.2 upgrades.

How To Use Tableau Sites?

Tableau Server has a multi-tenancy feature called “sites,” which is mainly for enterprise customers. Site strategy was one of the hot topics at the most recent Silicon Valley Enterprise Tableau User Group meet-up, and many people are not clear on how to use sites.

This blog covers three areas about Tableau sites:

  • Basic concepts
  • Common use cases
  • Governance processes and settings

1. Basic concepts about Tableau sites

Let’s start with some basic concepts. Understanding them provides better clarity, avoids confusion, and reduces hesitation to leverage sites.

Sites are partitions, or compartmented containers. There is absolutely no communication between sites; nothing can be shared across sites.

A site admin has unrestricted access to the contents of the specific site he or she owns. Site admins can manage projects, workbooks, and data connections; add users and groups; and assign site roles and site membership. Site admins can monitor pretty much everything within the site – traffic to views, traffic to data sources, background tasks, space, etc. – and can manage extract refresh scheduling.

One user can be assigned roles in multiple sites. A user can be site admin for site A and also have any role in site B, independently. For example, Joe, as a site admin for site A, can be added to site B in the admin role (or the Interactor role). However, Joe can’t transfer workbooks, views, users, data connections, user groups, or anything else between site A and site B. When Joe logs in to Tableau, he chooses site A or B: when Joe selects site A, he can see everything in site A but nothing in site B – it is not possible for Joe to assign site A’s workbooks or views to any users or user groups in site B.

All sites are equal from a security perspective. There is no concept of a super site or a site hierarchy. You can think of a site as an individual virtual server; a site is the opposite of sharing.

Is it possible to share anything across sites? The answer is no for site admins or any other users. However, a creative server admin can write scripts that run at the server level to break this rule. For example, a server admin can use tabcmd to copy extracts from site A to site B, although a site admin can’t.

2. Common use cases for Tableau sites

  • If your Tableau server is an enterprise server for multiple business units (finance, sales, marketing, etc.) and finance does not want sales to see finance content, create a site for each business unit so one business unit’s site admin cannot see another’s data or content.
  • If your Tableau server is an enterprise platform and you want to provide governed self-service to the business, the site approach (business as site admin and IT as server admin) provides maximum flexibility to the business while IT can still hold business site admins accountable for everything within their sites.
  • If your server serves key partners and you do not want one partner to see another partner’s metrics at all, you can create one site per partner. This also avoids the potential mistake of assigning a partner A user to partner B’s site.
  • If you have very sensitive data or content (like internal auditing data), a separate site provides much better data security control – from the development phase to production.
  • Use sites as a Separation of Duties (SoD) strategy to prevent fraud or potential conflicts of interest for powerful business site admins.
  • You simply have too many publishers on your server, and you want to distribute some admin work to those closer to the publishers for agility.

3. Governance processes around Tableau sites.

Thoughtful site management approaches, clearly defined roles and responsibilities, a documented request and approval process, and naming conventions have to be planned before you go with a site strategy, to avoid potential chaos later on. Here is the checklist:

  • Site structure: How do you want to segment a server into multiple sites? Should sites follow the organization or business structure? There is no right or wrong answer here, but you do want to think and plan ahead. On our server, we partition our data, content, and users by business function and geography. We create sites with naming conventions of business_function or business_geography, for example Sales_partner, Marketing_APAC, Finance_audit, etc. When we look at a site name, we have some idea what the site is about.
  • How many sites should you have? It completely depends on your use cases, data sources, user base, and the level of control you want. As a rule of thumb, I would argue that anyone who plans to create more than 100 sites on a server has too many, although I know a very large corporation with about 300 sites that work well for them. Our enterprise server has 4,000 end users with 20+ sites; our separate engineering server has 4 sites for about 1,000 engineers.
  • Who should be the site admin? Either IT or business users (or both) can be site admins. One site can have more than one admin, and one person can admin multiple sites. When a new site is created, the server admin normally adds just one user as site admin, who can then add others.
  • What controls are at site level? All the following controls can be done at site level:
    • Allow site admin to manage users for the site
    • Allow the site to have web authoring. When web authoring is on, it does not mean that all views within the site are web-editable; web editing still has to be allowed for specific users or user groups at the workbook/view level before an end user can web-edit.
    • Allow subscriptions. Each site can have one ‘email from address’ to send out subscriptions from that site.
    • Record key workbook performance event metrics
    • Create offline snapshots of favorites for iPad users.
  • What privileges should the server admin give to site admins? The server admin can grant all the above controls to the site admin when the site is created, can change those site-level settings later, and can even take those privileges back from the site admin at any time.
  • What is the new-site creation process? I have a new-site request questionnaire that the requester has to answer. The answers help the server and governance teams understand the use cases, data sources, user base, and data governance requirements, and decide whether the use case fits the Tableau server, and whether it should share an existing site or get a new one. The key criteria are whether the same data sources exist in another site and whether the user base overlaps with another site. It is a balance between duplication of work and flexibility. Here are some scenarios where you may not want to create a new site:
    • If the requested site would use the same data sources as an existing site, you may want to create a project within the existing site to avoid duplicate extracts (or live connections) running against the same source database.
    • If the requested site’s end users overlap heavily with an existing site, you may want to create a project within the existing site to avoid duplicating user maintenance work.
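Naming conventions such as business_function or business_geography (Sales_partner, Marketing_APAC, Finance_audit) are easy to enforce at request time. A hypothetical validator sketch – the regex encodes my convention above, not anything Tableau requires:

```python
import re

# One capitalized business segment, an underscore, then a function/geography segment.
SITE_NAME = re.compile(r"^[A-Z][A-Za-z]+_[A-Za-z]+$")

def valid_site_name(name):
    """Return True if a proposed site name follows the convention."""
    return bool(SITE_NAME.match(name))

assert valid_site_name("Marketing_APAC")
assert valid_site_name("Sales_partner")
assert not valid_site_name("marketing apac")   # lowercase, no underscore: rejected
```

Running every new-site request through a check like this keeps site names predictable as the server grows.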

As a summary, Tableau sites are a great feature for large Tableau server implementations. Sites can be very useful for segmenting data and content, distributing admin work, empowering the business for self-service, etc. However, site misuse can create a lot of extra work or even chaos later on. A thoughtful site strategy and governance process have to be developed before you start to implement sites, although the process evolves toward maturity as you go.

Tableau Filters

Tableau filters change the content of the data that enters a Tableau workbook, dashboard, or view. Tableau has multiple filter types, each created for a different purpose. It is important to understand who can change each filter and the order in which each type is executed. The following filters are numbered by order of execution.

A. Secure Filters: Filters that can be locked down to prevent unauthorized data access in all interfaces (i.e., Tableau Desktop, Web Edit mode, or standard dashboard mode in a web browser).

1. Data source filters: To be “secure” they must be defined on a data source when it is published. If they are defined in a workbook with a live connection, Tableau Desktop users can still edit them. Think of these as “global” filters that apply to all data coming out of the data source; there is no way to bypass a data source filter.
2. Extract filters: These filters are only effective at the time the extract is generated. They will not automatically change the dashboard contents until the extract is regenerated/refreshed.

B. Accessible Filters: Can be changed by anyone who opens the dashboard in Tableau Desktop or in Web Edit mode, but not in regular dashboard mode in a web browser.

3. Context filters: You can think of a context filter as an independent filter. Any other filters you set are dependent filters, because they process only the data that passes through the context filter. Context filters are often used to improve performance; however, if a context filter doesn’t reduce the number of records by 10% or more, it may actually slow the dashboard down.
4. Dimension filters: Filters on dimensions, you can think of as SQL WHERE clause.
5. Measure filters: Filters on measures, you can think of as SQL HAVING clause.

C. User Filters: Can be changed by anyone in Tableau Desktop, in Web Edit mode, or in regular dashboard mode in a web browser.

6. Quick filters: Commonly used end user filters.
7. Dependent quick filters: Quick filters that depend on another quick filter. Dependent quick filters can quickly multiply and slow down dashboard performance.
8. Filter actions: Show related information between a source sheet and one or more target sheets. This type of action works well when you are building guided analytical paths through a workbook, or in dashboards that filter from a master sheet to show more detail. These feel the most “responsive” to end users, as they incur no processing time unless the user clicks them.
9. Table calculation filters: Filters on calculated fields. Because table calculations run after all other filters, these filter the displayed results without changing the underlying data returned.

Tableau 9.2 New Features

My last blog shared our Tableau enterprise server 9.2 upgrade experience. Now we are focusing on training and learning the 9.2 new features.

I am excited about the Tableau 9.2 release, which brings powerful upgrades to our enterprise Tableau self-service environment. These include automated data preparation features, powerful web editing, enhanced enterprise data security, native iPhone support, unlimited map customization, and improved performance to help users work with their data more easily and quickly.

Data Preparation Enhancements

New data preparation features in 9.2 mean people will spend less time preparing and searching for data and more time analyzing it. The Data Interpreter now not only cleans Excel spreadsheets but also automatically detects sub-tables and converts them to tables that can be analyzed in Tableau. Data grid improvements make it easier to craft the ideal data source and quickly move on to analysis, and enhancements to the Data pane help people take fewer steps to find and update metadata.

Greater Web-Editing Flexibility

Web Editing (or Web Authoring) is a feature that enables Tableau Server users to edit and create visualizations on the fly without a license for Desktop. New features added in 9.2 include:
• Data: Edit the data within your projects with new in-browser capabilities:
    • Create new fields from all or part of a formula.
    • Change a field’s data type, default aggregation, and geographic role.
    • Manage data blends.
    • Toggle fields between continuous and discrete.
    • View icons that indicate which fields are linking data sources when working in workbooks with blended data.
• Dashboards: Directly access worksheets within a dashboard, and easily export an image or PDF of the dashboard.

Enhanced enterprise data security

Use the new permission controls to set default permissions for projects as well as the associated workbooks and data sources. With one click, administrators and project leaders can now lock a project’s permissions. When locked, all workbooks and data sources within the project are set to the project’s permissions and cannot be edited by individual publishers. This increases security for critical and highly sensitive data.

Native iPhone Support

People could always view Tableau dashboards and visualizations on their iPhones, but the Tableau Mobile app is now available for the iPhone, making it easier to access and interact with data on the go. Tableau also introduced geolocation, which makes it possible to orient a map around your current location with a simple tap in a mobile browser or in the Tableau app for iPad and iPhone.

Unlimited Map Customization

Tableau 9.2 introduces more options for controlling map behavior and unlimited potential for map customization. Mapbox integration in Tableau Desktop means people can easily customize, brand, enhance and add context to maps delivering an unprecedented flexibility to create beautiful and contextually rich maps. Additionally, Tableau is expanding the support for international postal codes with the addition of Japanese postal codes and other data updates such as U.S. congressional districts.

Improved Performance

Who doesn’t want their visualizations and dashboards to render faster? Published workbooks take advantage of browser capabilities to display shape marks more quickly. Workbook legends are a little smarter and only redraw when visible changes are made. In addition, Tableau can cache more queries using external query cache compression, making better use of server memory.

Merry Christmas and Happy New Year!

Tableau server 9.2 upgrade experience

Tableau 9.2 was released on Dec 7th. Our production Tableau 16-core server was upgraded to 9.2 on Dec 17. The upgrade process took about 3 hours. It was very smooth and easy for us.

Why upgrade? We have 260+ Desktop users. Many of them saw the Desktop 9.2 upgrade reminder in the lower right corner of their Tableau Desktop, and some asked if they could upgrade. The problem is that workbooks developed in Desktop 9.2 can’t be published to a 9.1 Tableau server. It takes a lot of education to ask 260+ Desktop users to hold off on their Desktop upgrade. I wish I could display a pop-up message to override Tableau’s default Desktop upgrade reminder, but I do not have that option…

So our game plan was to upgrade the Tableau server ASAP. We upgraded the stage server on Dec 10th, and after one week of testing and validation, we upgraded our production server to 9.2. Of course, 9.2 has some great features (like iPhone support, smart data prep, Mapbox integration, and project permissions). Our intent is to let users leverage those new features as soon as possible.

We just followed Tableau’s overall upgrade process @ http://onlinehelp.tableau.com/current/server/en-us/upgrade_samehrdwr.htm

For our configuration, the upgrade procedures were as follows:

a. Back up the primary server configuration
b. Clean up logs
c. Create a backup copy
d. Uninstall workers
e. Uninstall the primary server
f. Install workers
g. Install the primary server
h. Verify configuration settings

Tableau Data Extract API, Tableau SDK and Web Data Connector

If you are confused about Tableau Data Extract API, Tableau SDK and Web Data Connector, please read this blog.

The Tableau Data Extract API, introduced in v8, is used to create binary TDE files from data sources. You can write Extract API programs in C, C++, Java, or Python to generate TDE files.

Tableau v9.1 incorporated the existing Extract API into the new Tableau SDK, which has the following features:

  • Extract API (existing v8 feature): creates extracts from data sources.
  • Server API (new in v9.1): enables automated publishing of extracts to the server.
  • Mac and Linux support (new in v9.1).

Tableau v9.1 also released the Web Data Connector, which lets you build Tableau connectors that read website data in JSON, XML, or HTML formats. Web Data Connectors are programmed in JavaScript and HTML.

Some comparisons:

| | Native Tableau Connectors | Custom SQL | ODBC Connections | Tableau SDK | Tableau Web Data Connector |
|---|---|---|---|---|---|
| Use case | Live or extracts | Relational data sources | ODBC-compliant data sources | Any data sources w/o native connectors, or Excel | Web source data only |
| Output | Live data or TDE | Live data or TDE | Live data or TDE | TDE file | TDE file |
| Language | n/a | SQL | SQL | C, C++, Java, Python 2.6/2.7 | JavaScript, HTML |
| Publishing & refreshing | Tableau Server | Tableau Server | Tableau Server | Managed outside Tableau Server | Tableau Server |

What are the steps for developing and implementing Tableau SDK?

  1. Developer: Develop the extract program using the Extract API (C, C++, Java, or Python).
  2. Publisher or Site Admin: Connect to the server (URL, user, password, site ID) and publish the extract.
  3. Once the TDE is published, others can use it the same way as any other TDE.

What are the steps for developing and implementing Web Data Connector?

  1. Developer: Develop the Web Data Connector (JavaScript and HTML).
  2. Server admin: Import the Web Data Connector to Tableau Server (for example: tabadmin import_webdataconnector connector1.html).
  3. Publisher: Embed the data source credentials in the workbook.
  4. Site Admin: Schedule the Web Data Connector refresh (similar to any other data source schedule).

In summary, there are so many data sources that Tableau cannot provide native connectors for all of them. So the Tableau Data Extract API was released in v8 to create TDE files from data sources, and v9.1 added the Server API to automate publishing TDE files to the server. From v9.1, Tableau calls the bundle of the Extract API and Server API the Tableau SDK.

The Web Data Connector is a brand new feature released in v9.1 to connect to web data sources. For security reasons, a new Web Data Connector has to be registered by the Tableau Server admin before it can be used. Web Data Connectors are coded in JavaScript and HTML; however, if you just use a Web Data Connector developed by others, you do not need to know JavaScript at all.

NetApp’s Tableau enterprise deployment added 2,500 users in less than 10 months

NetApp’s presentation about its Tableau enterprise deployment was well received at Tableau Conference 2015 in Las Vegas – the survey showed 4.5 out of 5 for content and 4.3 out of 5 for speaker presentation.

The key success factors for large scale Tableau server deployment are:

1. Create an enterprise Tableau Council with members from both business and IT. NetApp’s Tableau Council has 10 members, all Tableau experts from each BU and IT. Most of the Council members are from the business side. The Council meets weekly to assess and define governance rules, and it represents the larger Tableau community.

2. Enable and support the Tableau community within the company. NetApp has a very active 300+ member Tableau community, mainly Tableau Desktop license owners. NetApp’s Tableau intranet is the place for everything about Tableau. Anyone can post questions there, and a few committed members ensure all questions are answered in a timely manner. NetApp also has a monthly Tableau user CoE meeting, hackathons, a quarterly Tableau Day, and an internal Tableau training program.

3. Define clear roles and responsibilities in new self-service analytics model. NetApp uses site strategy – each BU has its own site.

  • BU site admins are empowered to manage everything within their site: local or departmental data sources, workbooks, user groups and permissions, the QA/release/publishing process, user support, etc.
  • IT owns server management, server licenses, enterprise data extracts, technical consulting, performance auditing, data security auditing, etc.
  • Business and IT partnership for learning, training, support and governance.

4. Define the Tableau publishing or release process. The question here is how much IT should be involved in publishing or release. This is a simple question but a very difficult one to answer. Trust and integrity are at the heart of NetApp culture. NetApp’s approach is that IT is not involved in any workbook publishing. BU site admins are empowered to make decisions for their own QA/test/release/publishing process.

There are two simple principles. One is to test first before production. The second is the performance rule of thumb, the 5-10-20 second rule: a workbook render time under 5 seconds is good; more than 10 seconds is bad; and no one should publish a workbook whose render time exceeds 20 seconds.
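The 5-10-20 second rule can be expressed as a small helper function. A minimal Python sketch (the function name and the label for the 5-10 second band are my own interpretation, not part of NetApp's tooling):

```python
def render_time_verdict(seconds):
    """Classify a workbook render time per the 5-10-20 second rule of thumb."""
    if seconds < 5:
        return "good"            # fine to publish
    if seconds <= 10:
        return "acceptable"      # the 5-10s band is not spelled out in the rule
    if seconds <= 20:
        return "bad"             # tune before publishing
    return "do not publish"      # over the hard 20-second line

print(render_time_verdict(3))    # good
print(render_time_verdict(25))   # do not publish
```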

What if people do not follow the rules? NetApp wants to give BUs maximum flexibility and agility for release and publishing. However, if the rules are not followed, IT will have to step in and take control of the release process, which would become a weekly release cycle. Is this something IT wants to do? No. Is it something IT may have to do if things go south? Yes, but hopefully not.

5. Performance management – a trust-but-verify approach. Performance has been everyone’s concern when it comes to a shared platform, especially when each BU decides its own publishing criteria and IT does not gate publishing.

How to protect the value of shared Tableau self-service environment? How to prevent one badly-designed query from bringing all servers to their knees? NetApp has done a couple of things:

  • First, set server policies to keep the Tableau platform healthy: maximum workbook size, extract timeout limits, etc.
  • Second, send daily workbook performance alerts to site admins about their long-running workbooks.
  • Third, make workbook performance metrics public so everyone in the community has visibility into the worst-performing workbooks/views, creating some well-intentioned peer pressure.
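The daily alert in the second bullet could be sketched like this. A minimal, hypothetical Python example: the sample log, the 20-second threshold, and the workbook names are all assumptions for illustration, not NetApp's actual implementation.

```python
from statistics import mean

# Hypothetical render-time log entries: (workbook, render_seconds)
log = [
    ("Sales Dashboard", 4.2), ("Sales Dashboard", 5.1),
    ("Ops Overview", 31.0), ("Ops Overview", 28.5),
    ("HR Summary", 9.7),
]

# Average render time per workbook.
by_workbook = {}
for name, seconds in log:
    by_workbook.setdefault(name, []).append(seconds)
averages = {name: mean(times) for name, times in by_workbook.items()}

# Flag workbooks whose average render time exceeds the alert threshold.
THRESHOLD = 20.0
to_alert = sorted(name for name, avg in averages.items() if avg > THRESHOLD)
print(to_alert)  # ['Ops Overview']
```

Averaging over a window (here, all logged renders per workbook) smooths out one-off cache misses, so the alert targets consistently slow workbooks rather than unlucky single renders.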

It is the site admin’s responsibility to tune workbook performance. If no action is taken, the site admin will get a warning, which can lead to a site closure.

6. A self-service analytics platform must have data governance. The objective is to ensure Tableau self-service compliance with the company’s existing data governance processes, policies, and controls.

Data governance is not a ‘nice to have’ but a ‘must have’, even for a Tableau environment. NetApp has a fairly mature enterprise data governance (EDM) process. The BI team works very closely with the EDM team to identify and enforce critical controls. For example, IT has masked all sensitive human resources and federal data in the enterprise tier 2 data warehouse at the database layer, so we have peace of mind when Tableau Desktop users explore the tier 2 data.

NetApp is also working on an auditing process to identify potential data governance issues and working with the data management team to address them; this is the ‘verify’ piece of the ‘trust but verify’ model.

The goal is to create a governed self-service analytics platform.  It has been a journey toward maturity of the enterprise self-service analytics model.

Attached is the presentation deck:

NetApp-Tableau-Presentation-Final1