Category Archives: Tableau Enterprise

SCALING TABLEAU (10/10) – ARCHITECTURE & AUTOMATION

I’d like to complete my scaling Tableau 10 serial blogs with architecture and automation topic. If you follow the tips/approaches in this scaling Tableau 10 serials and governance self-service 10 serials, you should not have any problems to deploy Tableau at enterprise with thousands of Desktop publishers on a few hundred core server cluster that supports ten thousand extracts/day, ten thousands unique active users/day with a few million clicks/month.

Architecture 

    • Prod, DR and Test : It is advisable to have 3 different env for any large Tableau deployment: Prod, DR and Test:
      • DR: During regular maintenance when Prod is down, server user traffics will be routed automatically to DR cluster. The best practice is to restore Prod to DR once a day so DR can have relative new contents. If you use extracts, it is trade-off if you want to refresh extracts on DR or not. If yes, it will be double load to your data sources, but your DR will have latest data. If not refresh DR, your DR will have one day old data during weekend prod maintenance period. If you create extracts outside Tableau and use Tableau SDK to push new extracts to server, you can easily push the extracts to both Prod and DR to keep DR data refresh.
      • Test: It is not advisable to publish all workbooks on Test instance before  Prod although it is a common transitional SDLC approach. If you do so, you are creating a lot of extra work for your publishers and server admin team. However it does not mean that you can ignore the controls and governance on prod version of workbooks. The best practice is to control and govern workbooks in different projects within the Prod instance. Then you may ask, what Test instance is for? For Tableau upgrade, OS upgrade, new drivers, new configuration file, performance test, load test, new TDC file, etc. Of course, Test can still be used to validate workbooks, permissions etc.
    • Server location: For best performance, Tableau server should be installed in the same zone of the same data center as where your data sources are. However your data sources are likely in different data centers, current Tableau server cluster does not support WAN nodes, you will have to choose one location to install your Tableau server cluster.  There are so many factors impacting workbook performance, if your West Coast server has to connect live to large data in East Coast data source, your workbook will not have a good performance. Option is to use extracts or split into two clusters – one in East Coast mainly for East Coast data sources, one is West Coast. It is always a trade-off.
    • Bare Metal vs. VM: Tableau server performs better on bare metal Windows server although VM gives you other flexibilities. For your benchmarking purpose, you can assume VM has 10-20% less efficiency vs. bare metals but there are so many other factors affecting your decision between bare metal vs. VM.
    • Server Configurations: There is no universal standard config for your backgrounders, VizQL server, Cache server, Data Engine, etc. The best approach is optimize your config based on your TabMon feedback. Here is a few common tips:
      • Get more RAM to each node, specially Cache server node
      • Make sure Primary and File Engine nodes have  enough disk for backup/restore purpose. As benchmarking, your Tableau database size should be less than  25% of disk.
      • It is Ok to keep CPU of backgrounder node at about 80% average to fully leverage your core licenses.
      • It is Ok to keep CPU of VizQL node at about 50% average
      • Install File Engine on Primary will reduce 75% backup/restore time although your Primary cores will be counted as licenses
      • Number of cores on single node should be less than 24
      • Continuously optimize config based on feedback from TabMon and other monitoring tools. 

Automation

  • Fundamental things to automate :
    • Backup: Setup backup with file auto rotation so you do not have to worry about backup disk out of space.  You should backup data, server config and logs daily. Pls find my working code @ here
    • User provisioning: Automatically sync Tableau server group and group members from company directory.
    • Extract failure alerts: Send email alerts whenever extract failed. See details here
  •  Advanced automation (Tableau has no API for those, more risk but great value. I have done all those below ):
    • Duration based extract priority : If you face some extract delays, adjust extract priority can increase 40-70% extract efficiency without adding new backgounders. The best practice is to set priority 10 for business critical extracts, priority 20 for incremental,  priority 30 for extract with duration below median average (this is 50% of all extract jobs). Priority 50 for all the rest. How to update priority? I have not seen API for it. However I just had a program to update tasks.priority directly (this is something that Tableau does not support officially but it works well). Read my blog about extracts.
    • Re-schedule extracts based on usage: One of the common problems in self-service world is that people do not bother to schedule the existing extracts when usage is less often than before. Server admin can re-schedule extracts based on usage: For example, the daily extracts should be re-scheduled to weekly if the workbook has no usage in past 2 week, weekly extracts should be re-scheduled to monthly if workbook has no usage in past 2 weeks. All those can be automated by updating tasks.priority directly although it is not officially supported approach.
    • Delete old workbooks: I have deleted 50% of workbooks on Tableau server in a few quarters. Any workbooks that have no usage  in past 90 days are deleted automatically. This policy is well received because it helps users to clean up old contents and it also help IT for disks and avoid unnecessary attentions to junk contents.  The best practice is to agree on  this policy between business and IT via governance process,  then do not provide a list of old workbooks to publishers before the deletion (to avoid unnecessary clicks). Only communicate to publishers after the workbooks are deleted. The best way is to communicate is to  send their specific .twb that got deleted  automatically by email while disregard the .tde. Publishers can always publish the workbooks again as self-service.   Use HISTORICAL_EVENTS table to identify old workbooks. I do not recommend to archive the old workbooks since it is extra work that does not have a lot of value.  Pls refer Matt Coles’ blog as start point.
    • Workbook performance alerts:  If workbook render is one of your challenges on server, you can create alerts being sent to workbook owners based on workbook render time. It is a good practice to create multi-level of warning like yellow and red warning with different threshold. Yellow alerts are warnings while red alerts are for actions. If owner did not take corrective actions during agreed period of time for red warning, a meeting should be arranged to discuss the situation. If the site admin refuses to take actions, the governance body has to make decision for agreed-upon penalty actions. The penalty can lead to site suspension. Please read more details for my performance management blog.
  •  Things should not be automated: Certain things you should not automate. For example, you may not want to automate site or project creation since sites/projects should be carefully evaluated and discussed before creation. You may not want to automate creation of Publisher site role since Publisher site role should be also under controlled. Proper training should be required before grant new Publisher.

As re-cap to scale Tableau to enterprise, there are mainly 5 areas of things to drive:  Community, Learning, Data Security, Governance and Enterprise Approach. This serials focuses more on the Enterprise Approach.  Hope is helps. I’d love to hear your tips and tricks too.

SCALING TABLEAU (9/10) – CONTROL DESKTOP UPGRADE

After your Tableau server is upgraded (let’s say from 10.0 to 10.2), you  want user’s Desktop to popup for 10.2 Desktop upgrade automatically. Read this blog.

Tableau Desktop can check for product updates and install them automatically. However, most large Tableau customers have to turn it off  (by modifying the setting for the AutoUpdateAllowed property value) since if Desktop is automatically updated to a newer version than Server, you can’t publish. For example, you can’t publish from 10.3 Desktop to 10.2 server.

What we really need is a controlled Desktop updates that Tableau team  can control when the Desktop users should be prompted for the upgrade after server upgrade.

In an attempt to achieve this, Tableau came up with control product updates for maintenance version only of Tableau Desktop. The problem is that out of box Tableau approaches works for maintenance updates only.

verssion

Tableau’s version is like this below: major.minor.maintenance.

Tableau control product updates only works for maintenance version updates, does not work for minor version updates.

I figured out  how to control your Desktop update for both minor & maintenance upgrade (i.e. from 10.0 to 10.2). It should work for major upgrade (from 9.* to 10.2) as well but I have not tested enough for it yet. This blog is a small deviation from Tableau’s out of box solution but is a big breakthrough on usability.

The  use case is that you already have control on your users’  Desktop configurations (for example,  you have built Mac installer package to update Mac plist or you have built Windows .bat to update registration of Windows), you plan to upgrade your Tableau server from 10.0 to 10.2, you want user’s Desktop to popup for 10.2 upgrade after your server is upgraded to 10.2.  It can be done by following the steps below:

  1. Create your own Tableau download server: Find an internal web server and create one Tableau download folder (let’s call it your own download server)  to host one TableauAutoUpdate.xml and the new installations packages. download folder

 

 

  • Make sure HTTPS is enabled
  • Validate download server by open browser @ https://xxx.corp.xyz.com/tableau/ to make sure that you are able to see the list of files. You will get error when click xml from browser, which is Ok.

2. Create your TableauAutoUpdate.xml from this example below:

<?xml version=”1.0″ ?>
<versions xmlns=”https://xxx.com/Tableau”>
<version hashAlg=”sha512″ latestVersion=”10200.17.0505.1445″ latestVersionPath=”” name=”10.0″ public_supported=”false” reader_supported=”false” releaseNotesVersion=”10.2.2″ showEula=”false”>
<installer hash=”86efa75ecbc40d6cf2ef4ffff18c3100f85381091e59e283f36b2d0a7a0d32e5243d62944e3ee7c8771ff39cc795099820661a49105d60e6270f682ded023669″ name=”TableauDesktop-10-2-2.pkg” size=”316511726″ type=”desktopMac”/>
<installer hash=”bb5f5ec1b52b3c3d799b42ec4f9aad39cc77b08916aba743b2bac90121215597300785152bafec5d754478e1de163eedfb33919457ad8c7ea93085f6deabff1e” name=”TableauDesktop-64bit-10-2-2.exe” size=”304921808″ type=”desktop64″/>
<version hashAlg=”sha512″ latestVersion=”10200.17.0505.1445″ latestVersionPath=”” name=”10.1″ public_supported=”false” reader_supported=”false” releaseNotesVersion=”10.2.2″ showEula=”false”>
<installer hash=”bb5f5ec1b52b3c3d799b42ec4f9aad39cc77b08916aba743b2bac90121215597300785152bafec5d754478e1de163eedfb33919457ad8c7ea93085f6deabff1e” name=”TableauDesktop-64bit-10-2-2.exe” size=”304921808″ type=”desktop64″/>
<installer hash=”86efa75ecbc40d6cf2ef4ffff18c3100f85381091e59e283f36b2d0a7a0d32e5243d62944e3ee7c8771ff39cc795099820661a49105d60e6270f682ded023669″ name=”TableauDesktop-10-2-2.pkg” size=”316511726″ type=”desktopMac”/>
<version hashAlg=”sha512″ latestVersion=”10200.17.0505.1445″ latestVersionPath=”” name=”10.2″ public_supported=”false” reader_supported=”false” releaseNotesVersion=”10.2.2″ showEula=”false”>
<installer hash=”bb5f5ec1b52b3c3d799b42ec4f9aad39cc77b08916aba743b2bac90121215597300785152bafec5d754478e1de163eedfb33919457ad8c7ea93085f6deabff1e” name=”TableauDesktop-64bit-10-2-2.exe” size=”304921808″ type=”desktop64″/>
<installer hash=”86efa75ecbc40d6cf2ef4ffff18c3100f85381091e59e283f36b2d0a7a0d32e5243d62944e3ee7c8771ff39cc795099820661a49105d60e6270f682ded023669″ name=”TableauDesktop-10-2-2.pkg” size=”316511726″ type=”desktopMac”/>
<installer hash=”bb5f5ec1b52b3c3d799b42ec4f9aad39cc77b08916aba743b2bac90121215597300785152bafec5d754478e1de163eedfb33919457ad8c7ea93085f6deabff1e” name=”TableauDesktop-64bit-10-2-2.exe” size=”304921808″ type=”desktop64″/>
</version>
</versions>

  • Notice latestVersionPath=””  This is the tricky setting so you do not have to create multiple directory within the above download folder to host download file.
  • How to create hash512? If you use Mac, open Terminal and run shasum -a 512 TableauDesktop-10-2-2.pkg  (you need to replace it with your package name)
  • Get the installer file size correctly. If you use Mac, open Terminal and run ls -l to get the file size in bytes
  • What is the LatestVersion? You need to install the target Desktop once, then you will find the  LatestVersion info from About Tableauversion
  • Name = “10.0” is the current Desktop version to be upgraded
  • public_supported=”false” or “true” – if support Tableau Public
  • reader_supported=”false” or “true” – if support Tableau Reader
  • showEula=”false” or “true” – if you want user to see & acknowledge Tableau’s standard End User License Agreement or not during installation
  • type=”desktop64″ means the installer is for Windows 64-bit
  • type=”desktopMac” means the installer is for Mac.

3. Create the installer packages (for Mac or Windows or both) and put them in the same folder as where TableauAutoUpdate.xml is.

  • Please do not put the package in any sub-directories.
  • Please make sure that the names of the installer packages are exactly the same used in TableauAutoUpdate.xml
  • If you change the name of the installer package, you will have to re-create the hash512

4. Configure user computers to point to your own Tableau download server

  • Windows: Make an entry for each product and operating system type (32-bit and 64-bit) in your environment. The following entry is for 64-bit Tableau Desktop:
    HKEY_LOCAL_MACHINE\SOFTWARE\Tableau\Tableau <version>\AutoUpdate
    Server = "xxx.corp.xyz.com/tableau/"

    For example:

    HKEY_LOCAL_MACHINE\SOFTWARE\Tableau\Tableau 10.3\AutoUpdate
    Server = "xxx.corp.xyz.com/tableau/"
    • Mac: Change the settings file for each user to list the download server. Use the defaultscommand.defaults write com.tableau.Tableau-<version> AutoUpdate.Server "xxx.corp.xyz.com/tableau/"For example:
      defaults write com.tableau.Tableau-10.2 AutoUpdate.Server "xxx.corp.xyz.com/tableau/"
  • Note:  AutoUpdate.Server “xxx.corp.xyz.com/tableau/” it does not have https in front since Tableau automatically adds https. Pls do not forget the ‘/’ at the end

5. How it works after you have done all the setups correctly?  When Desktop users launch an old version of Desktop, they will be getting the following popup automatically: user reminder

 

 

 

  • If ‘Download and install when I quit’ is selected, users can continue use Desktop, nothing happens till user close the Desktop.
  • Soon as Desktop is closed, download of the right new version Desktop will happen
  • The best piece of this is that soon as download is completed, the installation starts immediately and automatically
  • What happens if user canceled in the middle of download? No problem. Next time when Desktop is launched, the above popup will show up again
  • What if user cancel the AutoUpdates installation in the middle of installation? No problem. Next time when Desktop is launched, the above popup will show up again. Since new package is downloaded already, when user click ‘Download and install when I quit’,  it will not download again but kicks off installation right away.
  • The new package is downloaded to /Download/TableauAutoUpdate/
  • Do you need to do anything about the Release notes link? Right now, the link is https://www.tableau.com/support/releases. I’d love to config it so it can point to your own internal upgrade project link – I have not figured it out yet.

6. How to trouble shoot? Check /My Tableau Repository/Logs/log.txt.  Search for ‘AUTOUPDATE’ or/and xxx.corp.xyz.com/tableau/ to get hints why popup did not happen.

7. With AutoUpdate.Server configuration, you still need to turn off AutoUpdate.AutoUpdateAllowed.

SCALING TABLEAU (8/10) – LEVERAGE V10 FEATURES FOR ENTERPRISE

I love Tableau’s path of innovations. Tableau v10 has some most wanted new capabilities to enterprise customers. I have mentioned some of those features in my previous blogs. This blog summarizes V10 enterprise features:

  1. Set Extract Priority Based on Extract Duration.  

This is a very powerful v10 feature for server admin although it is not mentioned enough in Tableau community yet.   What this feature does is for the full extracts in the same priority  to run in order from shortest to longest based on their “last” run duration.

The benefit is to that smaller extracts do not have to wait for long time for big ones to finish. Tableau server will execute the smaller ones first so overall waiting time will be reduced during peak hours.

What server admin have to do to leverage this feature?

  • By default, this feature is off. Server admin has to turn it on. It is not site specific. Once it is on, it applies for all sites. Simplify run the following tabadmin to turn it on:
  •  tabadmin set backgrounder.sort_jobs_by_run_time_history_observable_hours  36
  • Please ready my blog  and Tableau doc for details.

2. Revision History and Version Control

Tableau released one of the most wanted server features – version control and revision history in V9.3. Then this feature is  much more enhanced in V10 with previewing old workbook,  one click restoring, and maximum revisions setting:

  • The workbook previewing and restoring features are so convenience for publishers.
  • The maximum revision setting is so cool for server admin who can actually control the server space usage so you do not have to run out of storage while enabling revision history.

How to deploy those features on server?

  • Turn it on: By default, Revision History is off. It can be turned on site by site. To turn it on, go to site Setting, General and select  “Save a history of revisions“.  If you are on V10, you have two choices of Unlimited and # of revisions. Unlimited means that there is no limit on the max version history, which you probably do not want to have. As a server admin, you always want to make sure that your server will not run out of space. You will find # of revision is a very handy feature so admins can have some peace of mind about server storage.Screen Shot 2016-11-27 at 3.27.57 PM
  • Decide the max revision you want to have which is site specific – it means that you can set diff max revisions for diff sites.
  • How to decide the max revisions to keep? How to find out extra server space for revisions?  Pls read my blog 

3. Cross database Joins and Cross Database Filter

X-DB joins and X-data source filters are two  most requested features by user community. Those are two different but related things.

X-DB joins allows two or more separate data sources to join together in row level. There are still some constraints on which kinds of data sources can be joined in V10 while Tableau plans to extend more in coming releases: V10 only allows extract to be primary data source while joins w other database and does not allow two extracts to join together yet.

What X-DB joins means for server admin?

  • Knowing that server admin has no control for x-db joins. It is totally controlled by publishers. This feature is enabled out of box and server admin  can’t turn it off – hopefully you never need to.
  • Watch server performance. A lot of x-db join activities happen on Tableau server. I was little skeptical about this feature that server admin does not have any control or visibility.  On the other side,  I have not uncounted any issues either after my v10 server upgrade since Nov 2016.
  •  From publisher perspective, the x-db joins can be slow if joins two large datasets.

What is cross database filter?

Use case example: Let’s say you’re connected to multiple data sources, each with common dimensions like Date or Product. And as part of your analysis, you want to have a single filter apply across all the sources.  That’s where this new feature comes in. Any time you have data sets that share a common dimension, you can filter across the data sets.  A few things to know about cross database filter

  • It is not x-db join but more like blending  where you can manage relationship to edit the blending from connected sources
  • You can only filter data across multiple primary data sources.You cannot filter data across secondary data sources.

4. Desktop License Reporting

Enable Desktop License Reporting is included in V10. This is an awesome feature to track Desktop usage even Desktop users do not publish. Pls see details about this @http://enterprisetableau.com/licensing/

The challenge to leverage this feature is how to change each user’s laptop to make the initially configuration. Here is what you need to know:

  • It work only if both Desktop and Server are on v10.
  • This feature is turned off on server by default, you can turn it on  using tabadmin
    tabadmin set features.DesktopReporting true
    tabadmin config
    tabadmin restart
  • The most difficult part is to update Windows Desktop’s registry or Mac Desktop’s plist to point to the Tableau server where you want license usage to be sent to. Best way is  to have Desktop v10 installer. Pls ref my previous blog for details.
  • You should have all company’s Desktop pointing to one Tableau server even Desktop users publish to different servers. This way you will have one place to see all enterprise Desktop usage.
  • By default, Tableau Desktop v10+ will ping Tableau server v10+ for usage reporting every 8 hrs. You can configure intervals on  Desktop.  It is controlled by plist of the Mac or registry of Windows. It is not tabadmin option. See here.

5. Subscribe Others

Finally Tableau delivered this long asking feature in V10. A few things to know:

  • This feature has to be enabled at site level
  • You can create custom email from address for each site. This is handy since users who received the subscription emails may not want to connect server admin rather site admin for questions.
  • Only workbook owners can subscribe others
  • The user has to have an email address in the Account Settings, otherwise subscribe others will not be highlighted.  If a lot of users do not have email address on Tableau server, you may have to mass update all users with valid email address before this feature can really be enabled.
  • You can’t subscribe to groups but users only. If you really want to subscribe group, one workaround is to create dummy user, then give group email to this dummy user.
  • You can’t subscribe to users who are not valid users of the site
  • You can’t subscribe to users who do not have permission to view the workbooks or views
  • The users who are subscribed can click ‘Manager my subscriptions’ link at the bottom of the subscribed emails to de-subscribe anytime.
  • Users can always subscribe themselves if they have view permission to the workbooks or views.

6. Device Specific Dashboard Layout 

After you’ve built a dashboard you can create layouts for it that are specific to particular phone or tablet devices. It will be the same URL but Tableau will render different layout depends on devices used to access the server.

Most of users (specially executive users) use phones to view information. This is great feature to drive Tableau enterprise adoption. A few notes:

  • It is enabled out of the box. There is no server or site level setting to enable or disable this feature.
  • When publish the dashboards, make sure to clear the option ‘ Show Sheets as Tabs’. Otherwise this feature does not work
  • This feature works for Tableau Apps and it also works for mobile devices that do not have Tableau Apps installed.
  • The best practice is to remove some views from default layout so mobile device layout will have fewer views than default layout

What are the design tips:

  • Ask yourself: What key information does my end user need from my dashboard?
  • Click “device preview” to confirm how your dashboard looks across different devices.
  • (For small screens) Remove unnecessary views, filters, titles, and legends.
  • (For small screens) Determine if you need a scrollable dashboard (fit width). If so, stack dashboard objects and use a “peek.”
  • (On touch devices) On scrollable dashboards, pin your maps, and disable pan and zoom.

With device designer, you’ll rest assured knowing your data stands out with optimized dashboards on any device!

6. Dataa Source Analytics

Data source management has been brought into line with Workbooks, so that we now have revision history, usage information and users can have favourite data sources.

You can also change the view for data sources so that you can see them grouped by where they connect to, instead of the data source name.

Tableau has yet to come up with data source lineage features announced in TC16 Austin  – from data source column to tell which workbooks use so you can do impact analysis when data source changes, or from workbooks to tell which data source table or/and columns for us to tell potential duplicated data sources. I am expecting those big new features in 2017.

7. Site Specific SAML

If using SAML authentication, you can make this site specific, instead of for the whole server.  This means that some sites on your Tableau Server can use SAML for single sign on, whilst others will just use normal authentication.

I know that it takes months for enterprise customers to leverage some of those new features. Hope this blog helps. Pls feel free to post your tips and tricks of implementing those features.

SCALING TABLEAU (7/10) – UNDERSTAND SERVER PERMISSIONS

When I think about Tableau permissions, I have two words:

  • Robust –  Tableau’s permission features are very comprehensive and robust. Definitely enterprise grade.
  • Confusion – On the other side, Tableau’s permission is kind of confusing  since it has too many different variables to set permissions.

To understand permissions, let’s start by looking into structures within Tableau server. A server consists of multiple sites (ref Tableau site blog for details). From permission perspective,  one important thing to know is that there is absolutely no ‘communication’ between sites. Nothing can be shared across sites.

Within each site, there are projects. Within project, there are workbooks and data sources, each workbook can have multiple views. Within each site, there are users and group. Sites are partitions or compartmented containers.

site structure

 

 

 

 

 

 

If you think projcet/workbooks are containers, permission is to assign users & groups into containers. Permissions are at all levels: site, project, workbooks, data sources, views. Let’s look into each of those.

1. Site Role

Tableau has many site roles but most common used ones are publisher, interactor in additional to admin.

site role

 

 

 

What is site role and how it works?

  • Site role is site specific. One user can have publisher site role in default but can have interactor site role in other site.
  • Site role can be granted by server admins and maybe site admins if the site site admin is allowed to manage users (site level setting).
  • Site role is ceiling as maximum permissions the user can have  for the site.
  • Interactor site role can never publish even with publisher permission at project level. Now you may start to see confusion part of Tableau permission.
  • Interactor site role can’t save or save as web editing even with “save” allowed at workbook level.
  • Site role does not define what a user can and can’t do at project,  workbook, or data source level. You can think site role as people’s legal right to work in US, while publish permission at project level is employer’s job offer.  Although you have to have legal rights to work in US, it does not mean that you can work for a company unless
    have a job offer from that company.  On the other side, even company gives an offer, Iy will not allowed to work if I do not have legal right to work in US.
  • You can check your own site role at ‘My Account Settings’  but you can’t check other’s site role.

2. Project Level Permissions

Project level permission  deals with who can view, publish, and manager the project. When you click project name, then permission , you will see the project permissions. You can set project roles (Publisher, Viewer and Project Leader) permission, you can also set workbook and data source permission here which will be as default permissions when workbooks or data sources are published to the project.project_permission

  • Publisher:  This is different from site role ‘Publisher’. Project publisher role defines if the  group or user can publish to this project. It is independent from site role ‘publisher’. Site role ‘Interactor’ can still have publisher permission at project level although it does not matter since site role ‘Interactor’ can’t publish to anywhere.
  • Project Leader: 
    • Can set permission to all items in the project
    • Can change refresh schedule: this can be a very handy feature if someone is on vacation and his workbook refresh schedule has to be changed.
    • Can change workbook or data source owner: This is great feature that project leader should do when someone leaving the team or company.
    • Can lock the project permission
  • Lock project permission vs managed by the owner:  The key difference is if you want each publisher to change their  workbook permission or not in your project.  When it is locked, those who have publisher site role and publisher permission to your project can still publish, but they can’t change any workbook permission. All the workbook permissions will default from project level permissions you set for workbooks. So all the workbooks within the project will have exactly the same permission. If you change workbook permissions at project level, it will be applied automatically to all the workbooks/data sources in the project. 
  • When to lock project permission? 
    • For more sensitive contents that you want to make sure permissions can’t be deviated
    • For simplifying permission purpose.
    • Other cases. For example, if you have one project, the workbook permissions are so messed and you want to re-do it. One way is to lock the permission, so all workbook/data source permissions will be cleared up with one simple click. Then you can unlock it to make some additional changes from there.
    • You can’t undo the permissions when you change from ‘managed by the owner’ to locked. Pls take screenshots before change

3. Workbook Level Permissions 

Workbook level has 14 different capacities that you can set independently. To simplify the process, Tableau out of box comes with a few templates (viewer, interactor, editor, none or Denied). When you make any modification to any of those templates, it will be called custom.workbook_permission

  • Download: Tableau workbook has 4 different download controls: Download image/pdf, download summary data, data load full data or download workbook (download workbook/save as is the combined capability).
  • Shared customized: There is shared customized and web edit. Customized view feature comes with filter. If user has filter permission, user can change view filter and can save the preferred filter as customized views, and can even make one of the customized views as default view to this user which is very handy specially for slower views. The shared customized controls if you want user to share his or her shared customized views to all other users who have access to the same views.
  • Web edit: Customized view is different from web edit. Customized view only allows filter type of change while web edit allows change the whole design of the view (like chart type, new calculations, new fields, etc).
  •  Download Workbook/Save As: Download will be highlighted with this permissions. However save as is considered publishing activities. If the user has site role as ‘Interactor’, the user can’t web edit save as or publish from Desktop even this Download Workbook/Save As is allowed at workbook.
  • Save: Save means to trust others to overwrite your workbooks.“Save” feature must know:
    • It works for both Desktop and Web Edit
    • The new user who ‘save’ will become new owner of workbook since workbook only has one owner at any given time
    • What about previous owner’s permission? The new owner can give previous owner any permission or no permission at all for the ‘managed by owner’ project
    • Revision history will create a new workbook revision if it is tuned on
    • “Save” button doesn’t appear except for owners. If you are not the content owner, the “Save As” button will appear. Type same name to overwrite a report, you will be asked to confirm overwriting, then “Save” button will appear.

4. How web edit, save as and save permissions work together

First, does the user have Web Edit permissions on the workbook. If no, then no “Edit” button appears.

Next, does the user have permissions to Publish on the Site. If not,  that user won’t get Save / Save As buttons even if you’ve granted correct Download / Web Save As permissions on the Workbook.

Also, does the user have workbook-level permissions to Download/Web Save As. If not, then No Save / Save As buttons for that workbook.

Finally, which Project can a user save? If you haven’t granted a user permissions to save into a particular project, then it doesn’t matter if all the other permissions are set correctly because the user doesn’t have any place to store their changes. If user has permission to publish to multiple projects, user will have choices of which project to save as.

web edit

 

 

 

 

 

5. Set data source permissions

When you publish a workbook, you can option to publish the data source separately from workbook. Then the published data sources  become reusable for more workbooks, one refresh schedule to update all its connected workbook at the same time so it becomes SSOT and of course less loads to data sources.

When you publish a workbook that connects to a Tableau Server data source, rather than setting the credentials to access the underlying data, you set whether the workbook can access the published data source it connects to.

If you select to prompt users, a user who opens the workbook must have View and Connect permissions on the data source to see the data. If you select embed password, users can see the information in the workbook even if they don’t have View or Connect permissions.

To simplify permission settings: When publish workbooks, select ‘Embedded password’ for published data sources it connects to:

  • When publish workbook, select ‘Embedded password’
  • Only give publisher group ’Connect’ permission at data source level
  • Do nothing for end consumer (‘interactor’) group at data source level

If you select ‘Prompt user’ data auth during workbook publishing while using published data source, a user who opens the workbook must have View and Connect permissions to the data source to see the data. Here is tricky part: You want to make sure that you do not give ‘interactor’ data source level ‘connect’ permission but give ‘interactor’ project level data source ‘connect’ permission. The correct setup is as followings:

  • For interactor group only:
    • Connect permission at project level
    • ‘unspecified’ at data source level
  •  For publisher group only:
    • Connect permission at data source level

The reason is that if you give  interactor group data source level ‘connect’ permission, they will be able to connect to the published data source if they have Desktop, which can potentially by-pass the filters or row level security setup in workbook or published data source. When user has project level data source ‘connect’ permission, the user is not able to connect via Desktop but is able to connect via workbook only. I could not find clear Tableau documentation for this but my test results in v9 and v10 confirmed this setting.

6. Set view permissions

When the workbook is saved without tabs, the default permissions are applied to the workbook and views, but view permissions can then be edited. Permissions for views in workbooks are inherited from the workbook permissions. If a user selects “Show sheets as tabs” when publishing a workbook from Tableau Desktop or saving it on Tableau Server, the workbook permissions override the permissions on individual views anyway.

Best practice is not to get view level permission at all.

6. Summary of best practices:

  • Permission groups, not users
  • Lock project permissions if possible
  • For owner managed projects, permission workbooks, not views
  • Assign project leaders
  • Plan your permissions
  • Use published data sources and ‘Embedded password’ when publish workbook
  • Apply additional row level security
  • Test permissions out
  • Continual reviews

 

SCALING TABLEAU (6/10) – ROW LEVEL SECURITY

Data security has been one of the top concerns for Tableau enterprise adoption. Tableau handles data security by permission and row level security. Permission controls what workbooks/views an user can see. Row level security controls what data sets this user can see. For example APAC users see APAC sales, EMEA users see EMEA sales only while both APAC and EMEA users have the same permission to the same workbook.

Does Tableau row level security works with extracts? Yes. This blog provides everything you need to know to create row level security controls for extracts and live connections, includes a new approach leveraging V10 x-db join features.

Use case : To create one workbook that server users can see subset of the data based on their Region (Central, East, South and West) and segments (Consumer, Corporate and Home Office) they are assigned to.

Solution A – Workbook filter for Row Level Security by Group

  1. Create following 12 Tableau server groups (Central-Consumer, Central-Corporate, Central-HomeOffice, East-Consumer, East-Corporate, East-HomeOffice,….). Central-Consumer group has all the Central region users who are assigned to Consumer segment….
  2.  Create calculated field
    ISMEMBEROF(‘Central-Consumer’) AND [Region] = ‘Central’ AND [Segment] = ‘Consumer’ OR
    ISMEMBEROF(‘Central-Coporate’) AND [Region] = ‘Central’ AND [Segment] = ‘Coporate’ OR
    ISMEMBEROF(‘Central-HomeOffice’) AND [Region] = ‘Central’ AND [Segment] = ‘HomeOffice’ OR
    ISMEMBEROF(‘West-Consumer’) AND [Region] = ‘West’ AND [Segment] = ‘Consumer’ OR
    ISMEMBEROF(‘West-Coporate’) AND [Region] = ‘West’ AND [Segment] = ‘Coporate’ OR
    ISMEMBEROF(‘West-HomeOffice’) AND [Region] = ‘West’ AND [Segment] = ‘HomeOffice’ OR
    ISMEMBEROF(‘East-Consumer’) AND [Region] = ‘East’ AND [Segment] = ‘Consumer’ OR
    ISMEMBEROF(‘East-Coporate’) AND [Region] = ‘East’ AND [Segment] = ‘Coporate’ OR
    ISMEMBEROF(‘East-HomeOffice’) AND [Region] = ‘East’ AND [Segment] = ‘HomeOffice’ OR
    ISMEMBEROF(‘South-Consumer’) AND [Region] = ‘South’ AND [Segment] = ‘Consumer’ OR
    ISMEMBEROF(‘South-Coporate’) AND [Region] = ‘South’ AND [Segment] = ‘Coporate’ OR
    ISMEMBEROF(‘South-HomeOffice’) AND [Region] = ‘South’ AND [Segment] = ‘HomeOffice’
  3. Add the calculated field to filter and select ‘true’
  4. After publish the workbook, set interactor permission to all the above 12 groups.
  5. Make sure Web Editing as No, Download as No.

That is all. ISMEMBEROF returns true if server current user is member of given group. ISMEMBEROF is the key function to use here. It  works for both extracts and live connection.

Notice that the control is a workbook filter. If workbook is downloaded, filter can be changed so the row level security will not work anymore, which is why workbook permission has to set download permission as No.

The better solution is to use data source filter for ISMEMBEROF calculation instead of workbook filter

Solution B – Data Source Filter for Row Level Security by Group

  1. You have the groups and calculated field from Solution A step 1 and step 2
  2. Edit data source filters to include the calculated field and select ‘true’pds
  3. Publish the data sources and set connect only permission (no edit)
  4. After publish the workbook, set permission to all the above 12 groups. There is no need to put the above calculated field to workbook filter anymore since filter is at data source level now.

Published data sources are reusable, single source of truth, less loads to data sources and now you have governed row level security built-in.

The Solution B works extracts. The only thing is that it is little tricky during workbook development process where you will need to make local extract local copy to simulate the user behavior from Desktop, and replace data sources from local to server published data source before publish the workbook, you will need to copy & paste all calculations. Pls reference manual fast way  or a hacky way.

The above approaches control user’s visibility of data sets by Tableau server groups.  It assumes that you will manage the group members outside Tableau. When have too many data security groups to manage manaually,  you can automate the group member creation by using Server REST API or your corp directory automation tool.

When group approach in Solution A & B can’t scale, the following USERNAME() approach will be another good option.

Solution C – Entitlement table x-db join for Row Level Security

Same use case but you want to add category as dimension for row level security in additional to Region and Segment. Now you will need 100+ groups just for row level security purpose which can be a lot to manage.  We are going to use Tableau’s USERNAME() function which returns current server user name. It does not use group anymore but assume that you will have separate user entitlement table below.

UserName Region Segment Category
U123 East Comsumer Furniture
U456 East Comsumer Office Supplier

This ser entitlement table can be Excel or separate database table. We can use V10’s cross database join feature for row level security:

  1. Create cross-db join between main datasource (like extract, MySQL) and use entitlement Excel
  2. Create calculated field
    USERNAME() = [UserName]
  3. If you use workbook filter, just add this calculated field into filter and set ‘true’ – the same as Solution A
  4. Or you use published data source, just edit data source filters to include the calculated field and select ‘true’ – the same as Solution B.
  5. You are done

The USERNAME() will return the server current user name. While [UserName] is the user name column of your use entitlement excel which can be a database table.

Please note: The current version of Tableau v10 does not support x-db joins between two extracts although it does support  x-db joins between an extract and excel (or some selective database). So if your primary data source is an extract, your use entitlement table can’t be  extract anymore.

In additional to ISMEMBEROF, the  USERNAME() is another great Tableau server function for row level security.  V10 x-db join feature extends USERNAME()’s use case a lot of more now since you can create your own use entitlement table outside your main database for agility and self-service.

When use entitlement table is in the same database as main FACT table, you may want to use database’ native join feature for row level security :

Solution D – Query Banding or Initial SQL for Row Level Security

For database (like TeraData) support query band, enter query banding:

 

  • ProxyUser = B_<ProxyUser>
  • TableauMode=<TableauMode>
  • TableauApp=<TableauApp>
  • Tableau Version=<TableauVersion>
  • WorkbookName=Name of DataSource

For database( Vertica, Oracle, SQL Server, Sybase ASE, Redshift, and Greenplum, etc)  support Initial SQL:

    • [TableauServerUser] returns the current Tableau Server user’s username only.
    • [TableauServerUserFull]
    • [TableauApp]
    • [WorkbookName}

As summary, ISMEMBEROF and  USERNAME() are two Tableau functions for row level security:

  • ISMEMBEROF returns true if server current user is member of given group. It needs server groups to be setup.
  • USERNAME() returns server current user name. It needs entitlement table. V10 x-db joins allows the entitlement table to be outside main data source.
  • Both can be implemented as Data Source Filter or workbook filter.
  • Both work for extracts and live connections.

Although USERNAME() returns server current user name, it does not pass the current user name to live connected datasource outside Tableau server.  In order to pass the server current user name to data source, you will have to use query banding or initial SQL depends on database you use. Query banding or initial SQL works only for live connections and does not work for extracts.

SCALING TABLEAU (5/10) – LICENSE MANAGEMENT

Tableau license management has been a big pain point to scale Tableau. This blog covers the followings:

  • Tableau license types
  • What is your End User License Agreement
  • How to get most out of your Tableau licenses
  • Desktop and Server license management – The Enterprise Approach
  1. Tableau license types

Tableau has following licenses:

  • Desktop Professional: The most common Desktop license that can connect to about 50 data sources.
  • Desktop Personal: The less used Desktop license that can connect to a few data sources only. It is about half price of Desktop Professional.
  • Tableau Server seat based : Small to medium scale sharing and collaboration purpose. One publisher or one interactor takes one seat. If you purchased 100 seat based licenses, you can assign a total 100 named users on server – you can change them as long as total does not exceed 100 users at any given time.
  • Tableau server core based: Medium to large scale sharing and collaboration purpose. If you have 16 cores, you can have unlimited number of interactors or publishers as long as your server is installed on < 16 core machines.
  • Tableau online: Similar to Tableau Server seat-based but it is on Tableau’s cloud platform.
  • Enterprise License Agreement (ELA): You pay a fixed amount to Tableau for 3 years then you will agreed-upon of Desktop and Server licenses. Tableau starts to see ELA to large enterprises.
  • Subscription: Tableau may move to subscription model by selling licenses valid for a period of time only.

2. What is your End User License Agreement

Nobody wants to read the End User License Agreement. Here is summary of what you should know:

  • Each Desktop license can be installed in two computers of the same user.  You may get a warning when you try to activate 3rd computer.
  • If a Desktop license key is used by Joe who left company or does not use it anymore, this key can be transferred to someone else. The correct process is to deactivate the key from Joe’s machine and reactive it on someone else machine.
  • If you have .edu email, you are lucky as you can get free Desktop as students or teachers.
  • If  you are part of small non-profit org, you can almost get free Desktop licenses.
  • Each server key can be installed in 3 instances: one prod and two non-prod.
  • What if you have to have 4 instances: prod, DR, test, dev? Let’s say you have two core-based keys: key A 8 cores and key B 8 cores. You can activate both keys in prod and DR w 16 each, then you can have key A 8 cores only for test and key B 8 core only for dev. You are good as long as one server key is used in 3 or less instances.
  • What if you do not want to pay maintenance fee anymore? Since it is perpetual licenses, you are still entitled to use the licenses even you do not want to pay maintenance fee. What you are not entitled anymore is upgrade and support.

3. How to get most out of your Tableau licenses

  • If the registration info (name, email, last installed, product version) in Tableau Customer Portal – Keys report is null, it means that this key is never used so you can re-assign it to someone else. You may be surprised how many keys are never used……
  • If the registration info (name, email, last installed, product version) in Tableau Customer Portal – Keys report is associated with someone who left company and this key has single registration, you can re-assign it to someone else.
  • If the registered product version is very old, likely the key owner is not active Desktop user.
  • Enable Desktop license reporting work when you upgrade to v10 to see who does not use Desktop for last a few months. Then potentially you can get license transferred (see below for more).

4. Desktop and Server license management – enterprise approach

When you have hundreds of Desktop licensees, you will need following approaches to scale:

  • Co-term all of your licenses for easy renewals.  Co-term means to have the same renewal date for all of your Desktop & Server: both what you have  and new purchases. This may take a few quarters to complete. Start to pick one renewal date, then agree with your Tableau sales rep, renewal rep,  purchasing department and users for the one renewal date.
  • The Tableau champion to have visibility on every team’s Tableau licenses in Customer Portal. Tableau’s land and expand sales approach creates multiple accounts in Customer Portal. Each team can only see their own keys & renewals. If you drive enterprise Tableau, ask for access for all accounts in Customer Portal.
  • Automate Desktop Installation, Activation and Registration process. No matter you are in Windows or Mac environment, you can automate Desktop installation, activation and registration via  Command lines. Read details.
  • Transit to Single Master Key. Tableau Desktop supports single master key. Instead of having 500 individual Desktop keys, you can consolidate all into one single master key which can be activated by 500 users. The pre-request is co-term all individual keys. A few important notes:
    • When single master key is created, make sure to ask Tableau to turn on hidden key feature so Desktop users will not see the key anymore. You do not want the single master key to be leaked out. See screenshot on Desktop where ‘Manage Product Keys’ menu does not show up anymore:screenshot_1028
    •  What it also means is that you will have to use quiet installer so key can be activated w/o user’s interaction.
    • If you have some users who have two computers at work and both have Tableau Desktop installs. Tableau may consider one user as two installs which will mess up your total license counts. Tableau license team can help you out.
  • Enable Desktop License Reporting in V10. This is an awesome feature to track Desktop usage even Desktop users do not publish. The challenge is how to change each user’s laptop. Here is what you need to know:
    • It work only if both Desktop and Server are on v10. It will be better on v10.0.2 or above as earlier v10 versions are buggy.
    • This feature is turned off on server by default, you can turn it on  using tabadmin
      tabadmin set features.DesktopReporting true
      tabadmin config
      tabadmin restart
    • The most difficult part is to update Windows Desktop’s registry or Mac Desktop’s plist to point to the Tableau server where you want license usage to be sent to. Best way is  to have Desktop v10 installer (ref the Automate Desktop Installation, Activation and Registration process).
    • You should have all company’s Desktop pointing to one Tableau server even Desktop users publish to different servers. This way you will have one place to see all enterprise Desktop usage.
    • By default, Tableau Desktop v10+ will ping Tableau server v10+ for usage reporting every 8 hrs. You can configure intervals on  Desktop. screenshot_1029 Windows example
      Mac plist example:
    • <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
        <plist version="1.0">
          <dict>
            <key>Server</key>
            <string>https://mytableau02:8010,http://mytableau</string> 
            <key>scheduleReportInterval</key>
            <string>3600</string>
          </dict>
      </plist>
    • The Desktop usage (every 8 hrs) is not sent to Tableau company but to your own Tableau server only. What is sent to Tableau company from Desktop is only the registration info. Of course, the registration info is also sent to your defined Tableau server(s).
    • What table has Desktop usage? The Postgres table name is desktop_reporting
    • What dates desktop_reporting has? It  has 4 columns for dates:
      • Maintenance expiration date
      • Expiration date (3 month after maintenance expiration date)
      • Registration date (when registered)
      • Last report date (when last time Desktop used).  Notice it captures only last time when Desktop is used. If you want to know how often Desktop is used in past 3 months, you can’t tell …..
    • How can tell historical Desktop usage? What you can do is to build incremental refresh for the desktop_reporting by last report date, then you will build out your own history for better Desktop license reporting. I am sure that Tableau is working on getting this small table to historical as well…..

As summary, Tableau Desktop and server license management is not a simple task. Hopefully those tips and tricks of The Enterprise Approach will easy your pains. It is a good practice to build those out step by step when you are not too big or not too messy.

SCALING TABLEAU (4/10) – USE SITES

Tableau server has a multi-tenancy feature called “sites” which can be leveraged by enterprise customers for better scalability, better security and advanced self-service.

This blog covers following areas about Tableau sites:

  • Basic concepts
  • Common use cases
  • Governance processes and settings
  • When should not create a new site

1. Basic concepts about Tableau sites

Let’s start with some basic concepts. Understanding those basic concepts will provide better clarity, avoid confusions, and reduce hesitations to leverage sites.

Sites are partitions or compartmented containers. There is absolutely no ‘communication’ between sites. Nothing can be shared across sites.

Site admin has unrestricted access to the contents on the specific site that he or she owns. Site admin can manage projects, workbooks, and data connections. Site admin can add users, groups, assign site roles and site membership. Site admins can monitor pretty much everything within the site: traffic to views, traffic to data sources, background tasks, space, etc. Site admin can manage extract refresh scheduling, etc.

One user can be assigned roles into multiple sites. The user can be site admin for site A and can also have any roles in site B independently. For example, Joe, as a site admin for site A, can be added as a user to site B as admin role (or Interactor role). However Joe can’t transfer workbooks, views, users, data connections, users groups, or anything between site A and site B sites. When Joe login Tableau, Joe has choice of site A or B: When Joe selects site A, Joe can see everything in site A but Joe can’t see anything in site B – It is not possible for Joe to assign site A’s workbook/view to any users or user groups in site B.

All sites are equal from security perspective. There is no concept of super site or site hierarchy. You can think of a site is an individual virtual server.  Site is opposite of ‘sharing’.

Is it possible to share anything across sites? The answer is no for site admins or any other users. However if you are a creative server admin, you can write scripts run on server level to break this rule. For example, server admin can use tabcmd to copy extracts from site A to site B although this goes to the areas where Tableau does not support anymore officially.

2. Common use case of Tableau sites. 

  • If your Tableau server is an enterprise server for multiple business units (fin, sales, marketing, etc), fin does not wants sales to see fin contents, create sites for each business unit so one business unit site admin will not be able to see other business unit’s data or contents.
  • If your Tableau server is an enterprise platform and you want to provide a governed self-service to business. Site approach (business as site admin and IT as server admin) will provide the maximum flexibility to the business while IT can still hold business site admins accounted for everything within his or her sites.
  • If your server deals with some external partners, you do not want one partner to see other partner’s contents at all. You can create one site for each partner. This will also avoid potential mistakes of assigning partner A user to partner B site.
  • If you have some very sensitive data or contents (like internal auditing data), a separate site will make much better data security control – from development phase to production.
  • Using sites as Separation of Duties (SoD) strategy to prevent fraud or some potential conflicting of interests for some powerful business site admins.
  • You just have too many publishers on your server that you want to distribute some admin work to those who are closer to the publishers for agility reasons.

Arguably, you can achieve all of those above by using Projects w/o using sites. Why sites again?  First, Sites just make things easier for large Tableau server deployment. Many out of box server admin views go by site. So it will be easier to know each BU’s usage if you have site by BU. Second, if you have a few super knowledgable business users, you can empower them better when you grant them site admin access.  

3. Governance processes around Tableau sites.

Thoughtful site management approaches, clearly defined roles and responsibilities, documented request and approval process and naming conversions have to be planned ahead before you go with site strategy to avoid potential chaos later on. Here is the checklist:

    • Site structure: How do you want to segment a server to multiple sites? Should site follow organization or business structure? There is no right or wrong answer here. However you do want to think and plan ahead.
    • How many sites you should have? It completely depends on your use cases, data sources, user base, levels of controls you want to have. As a rule of thumb, I will argue anyone who plans to create more than 50 sites on a server would be too many sites although I know a very large corporation has about 300 sites that work well for them. I will prefer to have  less than 20 sites.
    • Who should be the site admin? Either IT or business users (or both) can be site admins. One site can have more than one admin. One person can admin multiple sites as well. When a new site is created, server admin normally just adds one user as site admin who can add others as site admins.
    • What controls are at site level? All the following controls can be checked or unchecked at site level:
      • Storage limitation
      • Revision history on or off and max numbers of revisions
      • Allow the site to have web authoring. When web authoring is on, it does not mean that all views within the site are web editable. The workbook/view level has to be set web editing allowed by specific users or user groups before the end user can have web editing.
      • Allow subscriptions. Each site can have one ‘email from address’ to send out subscriptions from that site.
      • Record workbook performance key events metrics
      • Create offline snapshots of favorites for iOS users.
      • Site-specific SAML with local authentication
      • Language and locale
    • What privileges server admin should give to site admins? Server admin can give all the above controls to site admin when the site is created. Server admin can change those site level settings as well. Server admin can even take back those privileges at anytime from site admin.
    • What is new site creation process? I have new site request questionnaires that requester has to answer (see below). The answers help server and governance team to understand the use cases, data sources, user base, and data governance requirements to decide if their use cases fit Tableau server or not, if they should share an existing site or a new site should be created. The key criteria are if same data sources exist in other site, if the user base overlaps with other site. It is balance between duplication of work vs. flexibility.
    • What is site request questionnaires?
      • Does your bigger team have an existing Tableau site already on Tableau server? If yes, you can use the existing site. Please contact the site admin who may need to create a project within the existing site for your team. List of existing sites and admins can be found @……. 
      • Who is the primary business / application contact?
      • What business process / group does this application represent? (like sales, finance, etc)?
      • Briefly describe the purpose and value of the application
      • Do you have an IT contact for your group for this application? Who is it?
      • What are the data sources?
      • Are there any sensitive data to be reporting on? If yes, pls describe the data source
      • Are there any private data as part of source data? (like HR data, sensitive finance data)
      • Who are the audiences of the reports? How many audiences do you anticipate? Are there any partners who will access the data
      • Does the source data have more than one Geo data? If yes, what is the plan for data level security?
      • What are the primary data elements / measures to be reporting on (e.g. booking, revenue, customer cases, expenses, etc)
      • What will be the dimensions by which the measure will be shown (e.g. Geo, product, calendar, etc)
      • How often the source data needs to be refreshed?
      • What is anticipated volume of source data? How many quarters of data? Roughly how many rows of the data? Roughly how many columns of the data?
      • Is the data available in enterprise data warehouse?
      • Are the similar reports available in other existing reporting platform already?
      • How many publishers for this application?

4. When should not create a new site?

  • If the requested site will use the same data sources as one of the existing sites, you may want to create a project within the existing site to avoid potential duplicate extracts (or live connections) running against the same source database.
  • If the requested site overlaps end users a lot with one existing site, you may want to create a project within the existing site to avoid duplicating user maintenance works.
  • The requester does not know that his or her bigger team has a site site

As a summary, Tableau site is a great feature for large Tableau server implementations. Sites can be very useful to segment data and contents, distribute admin work, empower business for self-service, etc. However site misuse can create a lot extract work or even chaos later on. Thoughtful site strategy and governance process have to be developed before you start to implement sites although the process evolves toward its maturity as you go.

SCALING TABLEAU (3/10) – USE PUBLISHED DATA SOURCES

Tableau helps us to see and understand our data which is great. A lot of great things are happening every day when creative analysts have powerful Tableau Desktop  with unlocked enterprise source data  and Tableau server collaboration environment.

As Tableau adoption goes from teams to BU to enterprise, you quickly run into scalability challenges : Extract delays and enterprise data warehouse (EDW) struggles to meet ad-hoc workloads, etc.

My last blog talks about setting extract priority on server to improve 50% extract efficiency. This blog will focus on best practices for data source connection to scale EDW & server – use published data sources.

  1. What is Tableau published data source?

It is nothing but Tableau’s semantic layer. For those who have been in BI space for a while, you may be familiar with Oracle BI’s Repository or Business Objects’ Universe. The problem of   Repository or Universe is that they are too complex and are designed for specially trained IT professions only. Tableau is a new tool designed for business analysts who do not have to know SQL.  Tableau has much simplified semantic layer. Tableau community has never focused enough on published data sources till recent when people start to realize that leveraging  published data source is not only a great best practice but almost must to have in scaling Tableau to enterprise.

screenshot_941

 

 

 

 

 

 

2.  Again, what makes up Tableau published data source?

  • Information about how to access or refresh the data:  server name & credentials, Excel path, etc.
  • The data connection information:  table joins,  field friendly names, etc
  • Customization and cleanup : calculations, sets, groups, bins, and parameters; define any custom field formatting; hide unused fields; and so on.

3. Why Tableau published data source?

  • Reusable: Published data sources are reusable connections to data. When you prep your data, add calculations, and make other changes to your fields, these changes are all captured in your data source. Then when you publish the data source, other people can use it to conduct their own analysis.
  • Single source of truth (SSoT): You can have data steward who defines the data model while workbook publishers who can consume the publish data source to create viz and analysis.  Here is an example of how to set up permission to achieve SSoT.

screenshot_943

  • Less workload to EDW: When you use extracts , one refresh of the published data source will refresh all data to its connected workbooks, which reduces a lot workloads to your EDW. This can be a very big deal to your EDW.

screenshot_945

 

 

 

4. How many data sources are embedded vs published data sources? You can find it out from Data_Connections table. Look for the DBCLASS column, when value = ‘sqlproxy’, it means that it is a published data source.  Work with your server admin if you do not have access to workgroup table of Tableau Postgre  database.

If you have <20% data sources are published data sources, it means that published data sources is not well leveraged yet in your org or BU.

5. How to encourage people to use published data sources?

  • Control who can access to EDW: Let’s say you have a team of 10  Desktop users, you may want to give 2 of them the EDW access so you do not have to train all 10 people about table structure details  while have the rest of 8 people to use published data sources created by the two data stewards.
  • If extracts are used, you can create higher priority to all published data sources as incentive for people to use published data sources. See my previous blog for details.
  • Make sure people know the version control feature works for data source as well
  • As data stewards,  add comments to columns – here is how comment looks like when Screen Shot 2016-12-10 at 5.31.24 PMmouse over in Desktop Data pan:

 

 

 

Here is how to add comments:Tableaucomments1

 

 

 

Conclusions: Published data sources are  not new Tableau feature but are not widely used  although they are reusable, SSoT, scalable, less workload to your DB server. Tableau has been improving its publishing workflow by making data source publishing much easier than before since 9.3. Tableau v10 even gives you a new option to publish your data sources separately or not during workbook publish workflow. Data source revision history is great feature to control data source version.  Tableau has announced big roadmap about data governance in TC16. However self-service practitioners do not have to wait any new Tableau features in order to leverage the published data sources.

Scaling Tableau (2/10) – Set Extract Priority Based on Duration

Are you facing the situation that your Tableau server backgrounder jobs have much longer delay during peak hours ?

There are many good reasons why extracts are scheduled at peak hours, likely right after nightly ETL completions or even triggered automatically by ETL completions.

You always have limited backgrounders that you can have on your server. How to cut average extract delay without adding extract backgrounders and without rescheduling any extract jobs?

The keyword is job PRIORITY. There are some good priority suggestions in community(like https://community.tableau.com/thread/152689).  However what I found the most effective approach to prioritize the extracts was duration based priority in additional to business criticality – I managed to reduce  50% extract waiting time  after increased priority for all extracts with duration below median average runtime.

Here is what I will recommend as extract priority best practices:

  1. Priority 10 for any business critical extracts :  Hope nobody will disagree with me to give highest priority to business critical extracts..
  2. Priority 20 for all incremental extracts : Not only normally incremental takes less time than full, but also it is an awesome incentive to encourage more and more people use incremental extracts
  3. Priority 30 for any  extracts with duration below median average (this is 50% of all extract jobs). This  is another great incentive for publishers to make their extracts more effective.  It is the responsibilities of both server admin and publishers to make backgrounder jobs more effective. There are many things that publishers can do to improve the extract efficiency : tune extracts to be more efficient, use incremental vs full extracts, hide unused columns, add extract filters to pull less data,  reduce extract frequency, schedule extracts to off-peak hours, or better run extracts outside of Tableau by using Tableau SDK (see my blog @http://enterprisetableau.com/sdk/), etc.
  4. Priority 50 for all the rest (default)
  5. Turn on tabadmin set backgrounder.sort_jobs_by_run_time_history_observable_hours  36 which will prioritize full extracts in the same priority  to run in order from shortest to longest based on their “last” run duration.

The combination of #3 and #5 will reduce your extract waiting time dramatically during peak hours.

What is this backgrounder sort by run time option (#5 above)? I am sure that you want to read official Tableau online help here.

In short, Tableau server can sort full extract refresh jobs with the same priority (like 50) so they are executed based on the duration of their “last run,” executing the fastest full extract refresh jobs first.

The “last run” duration of a particular job is determined from a random sample of a single instance of the full extract refresh job in last <n> hours which you can config.  By default this sorting is disabled (-1). If enabling this, Tableau’s suggested value is 36 (hours)

Let’s say that you have the following jobs scheduled at 5am, here is how extracts are prioritized:

Priority Job Name Duration (min) Background priority with sort_jobs_by_run_time option ON Background priority with sort_jobs_by_run_time option OFF
10 Job 10.1 2 1 Those 4 jobs will go fist one by one w/o any priority among them. i.e. it could be Job 10.4, then Job 10.2, then Job 10.3 and Job 10.1
Job 10.2 3 2
Job 10.3 9 3
Job 10.4 15 4
20 Job 20.1 1 5 Those 4 jobs will go one by one after all pririty 10 jobs. Again no priority among them.
Job 20.2 2 6
Job 20.3 14 7
Job 20.4 15 8
30 Job 30.1 2 9 Those 4 jobs will go one by one after all pririty 20 jobs. Again no priority among them.
Job 30.2 3 10
Job 30.3 5 11
Job 30.4 8 12
50 Job 50.1 1 13 Those 10 jobs will go one by one after all pririty 30 jobs. Again no priority among them.
Job 50.2 9 14
Job 50.3 20 15
Job 50.4 25 16
Job 50.5 30 17
Job 50.6 50 18
Job 50.7 55 19
Job 50.8 60 20
Job 50.9 70 21
Job 50.10 80 22

For example, the max waiting time for the 1 min Job 20.1 will be 29 mins (all priority 10 jobs) with sort_jobs_by_run_time option ON. However the max waiting time could be 60 min with sort_jobs_by_run_time option OFF (all priority 10 jobs + other priority 2 jobs).

Re-cap on how extracts are run in this order:

  1. Any task already in process is completed first.
  2. Any task that you initiate manually using Run now starts when the next backgrounder process becomes available.
  3. Tasks set with the highest priority (the lowest number) start next, independent of how long they have been in the queue. For example, a task with a priority of 20 will run before a task with a priority of 50, even if the second task has been waiting longer.
  4. Tasks with the same priority are executed in the order they were added to the queue except if tabadmin set backgrounder.sort_jobs_by_run_time_history_observable_hours  36 is turned on. When the above option is on, the fastest full extract refresh jobs go first.

A few final practice guide:

  • Step 1: If most of your extracts have priority 50. You may want to try just turn on tabadmin set backgrounder.sort_jobs_by_run_time_history_observable_hours  36 to see how much waiting time improvement you can gain.
  • Step 2: If step 1 does not give you what you are looking for, try to change extract priority to higher priority (like 30 ) for any  extracts with duration below median average. This will give you big waiting time reduction.   You can start to  change the extract priorities manually to see how it goes. Just be aware that any re-publishing of extracts will change priority back to default 50.
  • How to automate Step 2?   I have not seen API for it. However I just had a program to update tasks.priority directly.
  • Why I do not recommend to have higher priority for more frequent jobs?  I know that it is one of the recommended best practices by a lot of Tableau practitioners. However I just think that it drives a wrong behavior – it will encourage publishers to increase the extract from weekly to daily or to hourly just in order to get their jobs higher priority, which in turn causing more extract delays. I think that job duration and incremental high priority give much better incentive for publishers to make their extracts more effective, which becomes a positive cycle.

Scaling Tableau (1/10) – version control and revision history

Tableau released one of the most wanted server features – version control and revision history in V9.3. Then this feature is  much more enhanced in V10 with previewing old workbook,  one click restoring, and maximum revisions setting. I love all of those new V10 features:

  • The workbook previewing and restoring features are so convenience for publishers.
  • The maximum revision setting is so cool for server admin who can actually control the server space usage so you do not have to run out of storage while enabling revision history. It also shows Tableau’s thought process for built-in governance process while enabling a new feature, which is important to scale Tableau to enterprise.   I will explain those features in details here:
  1. Turn it on. By default, Revision History is not turned on. It can be turned on site by site. To turn it on, go to site Setting, General and select  “Save a history of revisions“.  If you are on V10, you have two choices of Unlimited and # of revisions. Unlimited means that there is no limit on the max version history, which you probably do not want to have. As a server admin, you always want to make sure that your server will not run out of space. You will find # of revision is a very handy feature so admins can have some peace of mind about server storage.Screen Shot 2016-11-27 at 3.27.57 PM

2. How to decide the max. number of revisions?

I asked this question but I did not find any guidances anywhere. I spent days of research and I wanted to share my findings here. First  of all,  my philosophy is to give the max flexibility to publishers by providing as many revisions as possible. On the other side, I also want to be able to project extra storage that the revision history will create for planning purpose.

How many revision you should set? It depends on how much space you can allocate to revision history w/o dramatically impacting your backup/restore timing and how many workbooks the server have. Let’s say that you are Ok to give about 50G to all revision history. Then figure out how many workbooks you have now, and what is the total space for all the xml portion of workbooks (revision history only keeps xml piece), then you can calculate max number of revisions. Here is how:

  • Open Desktop, connect to PostgreSQL, give your server name, port, workgroup as database, give readonly user and password. Select  Workbooks table, look for Size, Data Engine Extracts, and number of records.  The Data Engine Extracts of  Workbooks table tells you if the workbook is embedded workbook or not.
  • If you have total 500 workbooks with 200 of them have Data Engine Extracts as false and total size as 200M for all workbooks with Data Engine Extracts as false.  It means that the avg twb is about 1M per workbook – this is what revision history will keep once it is turned on. Then the total xml size of workbook is about 500M.
  • When you turn on revision history and if you set max revision as 50, overtime, the server storage for revision history would be about 50 x 500 x 1M = 50G overtime.  Two other factors to consider: One is new workbook creation rate, two is that not every workbook would max out revision.
  • Once you set the revision number, you can monitor the storage usage for all revision history by looking at  Workbook_versions table which keeps all the revision history.  You can find the overall size, number of versions, and more insights about use pattens. You can also do the following joins to find out workbook name and use name, etc.

Screen Shot 2016-11-27 at 10.10.39 PM

3.  Can interactors see the previous version as well? No. The end users of interactors can only see the current version.

4. Does publish have to do anything to keep revision history of his or her workbooks? No. Once ‘Save a history for revision’ is turned on for site, every time the  workbook is web edited or modified via Desktop, a new revision w be created automatically – there is no further action for publisher. When the max number of revision is reached out, the oldest version will be deleted automatically. There is no notification to publishers either. All you need to communicate to publisher is that max number of revisions that any publisher can have.  For example, if you keep  50 revisions and one workbook has 50 revision already. When this workbook is changed again, Tableau server will keep the most recent  50 revisions only by deleting the oldest revision automatically.

5. Can you change the max revisions? Yes. Let’s say you have max revision as 50 and you want to reduce it to 25. Tableau server will delete the old revisions (if there are any) and keep the most recent 25 revisions only. What happens if you change back from 25 to 50? All the older revisions are gone and will not show up anymore.

6. What is workflow for publisher to restore an old workbook? Publishers or admin can see revision history for their workbooks by click details, revision history. With one simple click to preview any old workbook or restore. Once it is restored, a new revision will be created automatically again.

7. How to restore data source revision? V10 came with review and restore features for workbooks only. You can view all revisions for data sources as well but you will have to download the data source and upload it gain if you want to restore older version of data source. I am sure Tableau’s scrum team has been working on one click restoring of data source as well.