
New Feature Adoption – Data-Driven Alert

Data-driven alerts are a great, long-awaited feature of Tableau V10.3. They allow interactors to set a threshold on a numeric axis of a view and receive emails only when the data goes above (or below) that threshold. Interactors can also define how often to receive emails while the condition is met. It works for custom views as well, and interactors can define multiple alerts with different thresholds for the same view. This blog explains how data-driven alerts work, why they are better than subscriptions, and what their limitations are.
How do Data-Driven Alerts work?

    • Numeric axis only
    • Interactors decide alerts
    • Interactors can add any site users who have an email address to an alert
    • Permissions are checked before emails are sent, NOT when users are added (the opposite of subscriptions)
    • Emails will not go to recipients who have no permission to the view, although those users can still be added without any warning or error. Please vote for this IDEA: https://community.tableau.com/ideas/8048
    • Can't add groups as recipients
    • Alerts can be created on saved custom views
    • Live connections – the server checks hourly (configurable)
    • Extracts – checked whenever a refresh happens
    • When the condition is true, the alert owner decides how often emails are sent
    • Recipients can remove themselves from an alert, but can't decide how often the emails go out when conditions are met

What controls do server admins have for data-driven alerts?

  1. Turn data-driven alerts on/off at the site level (Site > Settings, check or uncheck data-driven alerts)
  2. For live connections, decide how often the server checks alert conditions

Why are Data-Driven Alerts better?

  1. They run as a backgrounder process (admins can check Backgrounder Tasks for Non-Extracts: "Check if Data Alert Condition is True")
  2. Fewer emails go out, since alert owners can throttle emails when conditions are met
  3. They are a 'push' – every single extract completion triggers one "Check if Data Alert Condition is True" backgrounder task

Why are Subscriptions not preferred?

  1. Subscriptions send out emails at defined intervals, which is convenient for some users, but Tableau's strength is interactivity, and subscriptions are counter-interactive.
  2. Each subscription simulates a user click on Tableau Server, unlike data-driven alerts, which run as a backgrounder process. Subscription traffic shows up as usage in the http_requests table.
  3. There is nothing wrong with users getting Tableau views in their inbox. The problem is that server admins have no way to tell whether users open the emails at all. Over time, admins can't tell whether users are actually using Tableau Server or not.


Tips and tricks to manage data-driven alerts in the enterprise

  1. Limit the number of subscription schedules to 'force' users from subscriptions to data-driven alerts
  2. If your Tableau Server has a content-archiving program for unused workbooks, exclude subscription usage (historical_event_types name 'Send Email') when measuring usage
  3. Monitor your server to understand the percentage of subscriptions vs. total clicks. It is hard to say what the right balance is, but if subscriptions are more than 10% of your total usage, it suggests you have too many subscriptions (see the sketch below).
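
A minimal sketch of that monitoring query, run against Tableau's workgroup Postgres repository with the readonly account (port 8060 is the usual default; treat the connection details and the 30-day window here as assumptions to adapt for your environment):

    # Compare subscription email volume against total server traffic
    # over the last 30 days, using the repository tables mentioned above.
    import psycopg2

    conn = psycopg2.connect(host="your-tableau-server", port=8060,
                            dbname="workgroup", user="readonly", password="...")
    cur = conn.cursor()

    # Subscription activity: 'Send Email' events from historical_events
    cur.execute("""
        SELECT count(*)
        FROM historical_events he
        JOIN historical_event_types het
          ON he.historical_event_type_id = het.type_id
        WHERE het.name = 'Send Email'
          AND he.created_at > now() - interval '30 days'
    """)
    subscriptions = cur.fetchone()[0]

    # Total traffic: all rows in http_requests for the same window
    cur.execute("""
        SELECT count(*) FROM http_requests
        WHERE created_at > now() - interval '30 days'
    """)
    total = cur.fetchone()[0]

    print("Subscriptions are {:.1%} of total requests".format(subscriptions / total))

If the ratio stays above the ~10% line discussed above, that is your signal to review subscription schedules.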

SCALING TABLEAU (10/10) – ARCHITECTURE & AUTOMATION

I'd like to complete my Scaling Tableau series of ten blogs with the architecture and automation topics. If you follow the tips and approaches in this Scaling Tableau series and the Governed Self-Service series, you should have no problem deploying Tableau at an enterprise with thousands of Desktop publishers on a few-hundred-core server cluster that supports ten thousand extracts per day and ten thousand unique active users per day, with a few million clicks per month.

Architecture 

    • Prod, DR and Test: It is advisable to have three different environments for any large Tableau deployment: Prod, DR and Test:
      • DR: During regular maintenance when Prod is down, user traffic is routed automatically to the DR cluster. The best practice is to restore Prod to DR once a day so DR has relatively fresh content. If you use extracts, whether to refresh extracts on DR is a trade-off: if you do, it doubles the load on your data sources, but DR will have the latest data; if you don't, DR will have day-old data during the weekend Prod maintenance window. If you create extracts outside Tableau and use the Tableau SDK to push new extracts to the server, you can easily push them to both Prod and DR to keep DR fresh.
      • Test: It is not advisable to publish all workbooks to the Test instance before Prod, although that is a common traditional SDLC approach. If you do so, you create a lot of extra work for your publishers and server admin team. That does not mean you can ignore controls and governance on the Prod versions of workbooks; the best practice is to control and govern workbooks in different projects within the Prod instance. So what is the Test instance for? Tableau upgrades, OS upgrades, new drivers, new configuration files, performance tests, load tests, new TDC files, etc. Of course, Test can still be used to validate workbooks, permissions and so on.
    • Server location: For best performance, Tableau Server should be installed in the same zone of the same data center as your data sources. However, your data sources are likely spread across data centers, and the current Tableau Server cluster does not support WAN-separated nodes, so you will have to choose one location for your cluster. Many factors affect workbook performance, but if your West Coast server has to connect live to a large East Coast data source, the workbook will not perform well. The options are to use extracts, or to split into two clusters – one on the East Coast mainly for East Coast data sources, one on the West Coast. It is always a trade-off.
    • Bare metal vs. VM: Tableau Server performs better on bare-metal Windows servers, although VMs give you other flexibility. For benchmarking purposes you can assume a VM is 10-20% less efficient than bare metal, but many other factors will affect your decision between the two.
    • Server configurations: There is no universal standard configuration for your backgrounders, VizQL Server, Cache Server, Data Engine, etc. The best approach is to optimize your configuration based on TabMon feedback. Here are a few common tips:
      • Get more RAM on each node, especially the Cache Server node
      • Make sure the Primary and File Engine nodes have enough disk for backup/restore purposes. As a benchmark, your Tableau database size should be less than 25% of the disk.
      • It is OK to keep the CPU of backgrounder nodes at about 80% average to fully leverage your core licenses.
      • It is OK to keep the CPU of VizQL nodes at about 50% average.
      • Installing the File Engine on the Primary reduces backup/restore time by about 75%, although your Primary cores will then count against your licenses.
      • The number of cores on a single node should be less than 24.
      • Continuously optimize the configuration based on feedback from TabMon and other monitoring tools.

Automation

  • Fundamental things to automate:
    • Backup: Set up backups with automatic file rotation so you do not have to worry about the backup disk running out of space. You should back up data, server config and logs daily. Please find my working code here
    • User provisioning: Automatically sync Tableau Server groups and group members from the company directory.
    • Extract failure alerts: Send email alerts whenever an extract fails. See details here
  • Advanced automation (Tableau has no API for these; more risk but great value. I have done all of the below):
    • Duration-based extract priority: If you face extract delays, adjusting extract priority can improve extract efficiency by 40-70% without adding new backgrounders. The best practice is to set priority 10 for business-critical extracts, priority 20 for incremental extracts, priority 30 for extracts with duration below the median (i.e. 50% of all extract jobs), and priority 50 for all the rest. How to update priority? I have not seen an API for it, but I had a program update tasks.priority directly (something Tableau does not officially support, but it works well – see the sketch after this list). Read my blog about extracts.
    • Re-schedule extracts based on usage: A common problem in the self-service world is that people do not bother to re-schedule existing extracts when usage drops. Server admins can re-schedule extracts based on usage: for example, daily extracts can be re-scheduled to weekly if the workbook has had no usage in the past two weeks, and weekly extracts to monthly. All of this can be automated by updating the tasks table directly, although it is not an officially supported approach.
    • Delete old workbooks: I deleted 50% of the workbooks on our Tableau Server within a few quarters. Any workbook with no usage in the past 90 days is deleted automatically. This policy is well received because it helps users clean up old content, and it helps IT save disk and avoid wasting attention on junk content. The best practice is to agree on this policy between business and IT via the governance process, then NOT to provide a list of old workbooks to publishers before deletion (to avoid unnecessary clicks); only communicate with publishers after the workbooks are deleted. The best way to communicate is to email each publisher the specific .twb files that were deleted automatically, while discarding the .tde files; publishers can always publish the workbooks again as self-service. Use the HISTORICAL_EVENTS table to identify old workbooks. I do not recommend archiving the old workbooks, since it is extra work without much value. Please refer to Matt Coles' blog as a starting point.
    • Workbook performance alerts: If workbook render time is one of your challenges on the server, you can send alerts to workbook owners based on render time. It is a good practice to create multiple warning levels, like yellow and red, with different thresholds: yellow alerts are warnings, while red alerts require action. If an owner does not take corrective action within the agreed period for a red warning, a meeting should be arranged to discuss the situation. If the site admin refuses to take action, the governance body has to decide on the agreed-upon penalty, which can go as far as site suspension. Please read my performance management blog for more details.
  • Things that should not be automated: Certain things you should not automate. For example, you may not want to automate site or project creation, since sites and projects should be carefully evaluated and discussed before creation. You may not want to automate granting the Publisher site role either, since it should also be controlled: proper training should be required before granting a new Publisher.
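
Tableau exposes no API for tasks.priority, so the duration-based adjustment in the first advanced item above has to write to the workgroup Postgres database directly. Here is an unsupported sketch of that idea; the tblwgadmin login and the background_jobs.correlation_id-to-tasks.id mapping are assumptions to verify against your own repository, and you should test on a non-production instance first:

    # Set priority 30 for extracts whose last run beat the median
    # duration, 50 for the rest. Priorities 10 and 20 stay reserved for
    # business-critical and incremental extracts, managed separately.
    import psycopg2

    conn = psycopg2.connect(host="your-tableau-server", port=8060,
                            dbname="workgroup", user="tblwgadmin", password="...")
    cur = conn.cursor()

    # Most recent completed run duration (seconds) per extract task, past week
    cur.execute("""
        SELECT DISTINCT ON (correlation_id)
               correlation_id,
               extract(epoch FROM completed_at - started_at) AS secs
        FROM background_jobs
        WHERE job_name = 'Refresh Extracts'
          AND completed_at > now() - interval '7 days'
        ORDER BY correlation_id, completed_at DESC
    """)
    rows = cur.fetchall()

    durations = sorted(secs for _, secs in rows)
    median = durations[len(durations) // 2]

    for task_id, secs in rows:
        cur.execute("""UPDATE tasks SET priority = %s
                       WHERE id = %s AND priority NOT IN (10, 20)""",
                    (30 if secs < median else 50, task_id))
    conn.commit()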

To recap scaling Tableau to the enterprise: there are five main areas to drive – Community, Learning, Data Security, Governance and the Enterprise Approach. This series focuses most on the Enterprise Approach. Hope it helps. I'd love to hear your tips and tricks too.

SCALING TABLEAU (9/10) – CONTROL DESKTOP UPGRADE

After your Tableau Server is upgraded (say from 10.0 to 10.2), you want users' Desktops to prompt automatically for the 10.2 Desktop upgrade. This blog shows how.

Tableau Desktop can check for product updates and install them automatically. However, most large Tableau customers have to turn this off (by modifying the AutoUpdateAllowed property value), because if Desktop is automatically updated to a newer version than the Server, you can't publish. For example, you can't publish from 10.3 Desktop to a 10.2 Server.

What we really need is controlled Desktop updates, where the Tableau team controls when Desktop users are prompted to upgrade after a server upgrade.

In an attempt to achieve this, Tableau came up with controlled product updates for Tableau Desktop. The problem is that the out-of-the-box approach works for maintenance updates only.


Tableau's version numbering is major.minor.maintenance.

Tableau's controlled product updates only work for maintenance version updates, not for minor version updates.

I figured out how to control Desktop updates for both minor and maintenance upgrades (e.g. from 10.0 to 10.2). It should work for major upgrades (from 9.* to 10.2) as well, but I have not tested that enough yet. This approach is a small deviation from Tableau's out-of-the-box solution but a big breakthrough in usability.

The use case: you already control your users' Desktop configurations (for example, you have built a Mac installer package to update the Mac plist, or a Windows .bat to update the Windows registry), you plan to upgrade your Tableau Server from 10.0 to 10.2, and you want users' Desktops to prompt for the 10.2 upgrade after your server is upgraded to 10.2. It can be done by following the steps below:

  1. Create your own Tableau download server: Find an internal web server and create one Tableau download folder (let's call it your own download server) to host one TableauAutoUpdate.xml and the new installation packages.

  • Make sure HTTPS is enabled
  • Validate the download server by opening a browser at https://xxx.corp.xyz.com/tableau/ to make sure you can see the list of files. You will get an error when clicking the xml from the browser, which is OK.

2. Create your TableauAutoUpdate.xml from this example below:

<?xml version="1.0" ?>
<versions xmlns="https://xxx.com/Tableau">
<version hashAlg="sha512" latestVersion="10200.17.0505.1445" latestVersionPath="" name="10.0" public_supported="false" reader_supported="false" releaseNotesVersion="10.2.2" showEula="false">
<installer hash="86efa75ecbc40d6cf2ef4ffff18c3100f85381091e59e283f36b2d0a7a0d32e5243d62944e3ee7c8771ff39cc795099820661a49105d60e6270f682ded023669" name="TableauDesktop-10-2-2.pkg" size="316511726" type="desktopMac"/>
<installer hash="bb5f5ec1b52b3c3d799b42ec4f9aad39cc77b08916aba743b2bac90121215597300785152bafec5d754478e1de163eedfb33919457ad8c7ea93085f6deabff1e" name="TableauDesktop-64bit-10-2-2.exe" size="304921808" type="desktop64"/>
</version>
<version hashAlg="sha512" latestVersion="10200.17.0505.1445" latestVersionPath="" name="10.1" public_supported="false" reader_supported="false" releaseNotesVersion="10.2.2" showEula="false">
<installer hash="86efa75ecbc40d6cf2ef4ffff18c3100f85381091e59e283f36b2d0a7a0d32e5243d62944e3ee7c8771ff39cc795099820661a49105d60e6270f682ded023669" name="TableauDesktop-10-2-2.pkg" size="316511726" type="desktopMac"/>
<installer hash="bb5f5ec1b52b3c3d799b42ec4f9aad39cc77b08916aba743b2bac90121215597300785152bafec5d754478e1de163eedfb33919457ad8c7ea93085f6deabff1e" name="TableauDesktop-64bit-10-2-2.exe" size="304921808" type="desktop64"/>
</version>
<version hashAlg="sha512" latestVersion="10200.17.0505.1445" latestVersionPath="" name="10.2" public_supported="false" reader_supported="false" releaseNotesVersion="10.2.2" showEula="false">
<installer hash="86efa75ecbc40d6cf2ef4ffff18c3100f85381091e59e283f36b2d0a7a0d32e5243d62944e3ee7c8771ff39cc795099820661a49105d60e6270f682ded023669" name="TableauDesktop-10-2-2.pkg" size="316511726" type="desktopMac"/>
<installer hash="bb5f5ec1b52b3c3d799b42ec4f9aad39cc77b08916aba743b2bac90121215597300785152bafec5d754478e1de163eedfb33919457ad8c7ea93085f6deabff1e" name="TableauDesktop-64bit-10-2-2.exe" size="304921808" type="desktop64"/>
</version>
</versions>

  • Notice latestVersionPath="" – this is the trick that lets you avoid creating multiple directories within the download folder to host the files.
  • How to create the SHA-512 hash? On a Mac, open Terminal and run shasum -a 512 TableauDesktop-10-2-2.pkg (replace with your package name). A helper script follows this list.
  • Get the installer file size right. On a Mac, open Terminal and run ls -l to get the file size in bytes.
  • What is latestVersion? You need to install the target Desktop once, then read the version info from About Tableau.
  • name="10.0" is the current Desktop version to be upgraded.
  • public_supported="false" or "true" – whether Tableau Public is supported
  • reader_supported="false" or "true" – whether Tableau Reader is supported
  • showEula="false" or "true" – whether users must see and acknowledge Tableau's standard End User License Agreement during installation
  • type="desktop64" means the installer is for Windows 64-bit
  • type="desktopMac" means the installer is for Mac.
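
If you have several packages to hash, a small helper can print the hash and size attributes in one pass. This is just a convenience, equivalent to the shasum and ls -l commands above:

    # Print the sha512 hash and byte size for each installer package given
    # on the command line, in the form TableauAutoUpdate.xml expects.
    import hashlib
    import os
    import sys

    for path in sys.argv[1:]:
        sha = hashlib.sha512()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                sha.update(chunk)
        print('name="{}" hash="{}" size="{}"'.format(
            os.path.basename(path), sha.hexdigest(), os.path.getsize(path)))

Run it as, for example: python installer_info.py TableauDesktop-10-2-2.pkg TableauDesktop-64bit-10-2-2.exe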

3. Create the installer packages (for Mac, Windows or both) and put them in the same folder as TableauAutoUpdate.xml.

  • Please do not put the packages in any sub-directories.
  • Please make sure the installer package names exactly match the names used in TableauAutoUpdate.xml.
  • If you rename an installer package, update the name in TableauAutoUpdate.xml to match (the SHA-512 hash itself depends only on the file contents).

4. Configure user computers to point to your own Tableau download server

  • Windows: Make an entry for each product and operating system type (32-bit and 64-bit) in your environment. The following entry is for 64-bit Tableau Desktop:
    HKEY_LOCAL_MACHINE\SOFTWARE\Tableau\Tableau <version>\AutoUpdate
    Server = "xxx.corp.xyz.com/tableau/"

    For example:

    HKEY_LOCAL_MACHINE\SOFTWARE\Tableau\Tableau 10.3\AutoUpdate
    Server = "xxx.corp.xyz.com/tableau/"
  • Mac: Change the settings file for each user to list the download server, using the defaults command:
      defaults write com.tableau.Tableau-<version> AutoUpdate.Server "xxx.corp.xyz.com/tableau/"
    For example:
      defaults write com.tableau.Tableau-10.2 AutoUpdate.Server "xxx.corp.xyz.com/tableau/"
  • Note: the AutoUpdate.Server value "xxx.corp.xyz.com/tableau/" has no https:// in front, since Tableau adds https automatically. Please do not forget the '/' at the end.

5. How does it work once everything is set up correctly? When users launch an old version of Desktop, they get the upgrade popup automatically:

  • If 'Download and install when I quit' is selected, users can continue using Desktop; nothing happens until they close it.
  • As soon as Desktop is closed, the download of the correct new Desktop version starts.
  • The best part: as soon as the download completes, the installation starts immediately and automatically.
  • What happens if a user cancels in the middle of the download? No problem: the next time Desktop is launched, the popup shows up again.
  • What if a user cancels the AutoUpdate installation mid-install? No problem: the next time Desktop is launched, the popup shows up again. Since the new package is already downloaded, clicking 'Download and install when I quit' skips the download and kicks off the installation right away.
  • The new package is downloaded to /Download/TableauAutoUpdate/
  • Do you need to do anything about the Release Notes link? Right now the link is https://www.tableau.com/support/releases. I'd love to configure it to point to your own internal upgrade project page, but I have not figured that out yet.

6. How to troubleshoot? Check /My Tableau Repository/Logs/log.txt. Search for 'AUTOUPDATE' and/or xxx.corp.xyz.com/tableau/ to get hints on why the popup did not happen.

7. With the AutoUpdate.Server configuration in place, you still need to turn off AutoUpdate.AutoUpdateAllowed.

SCALING TABLEAU (8/10) – LEVERAGE V10 FEATURES FOR ENTERPRISE

I love Tableau's path of innovation. Tableau v10 has some of the capabilities most wanted by enterprise customers. I have mentioned some of these features in my previous blogs; this blog summarizes the V10 enterprise features:

  1. Set Extract Priority Based on Extract Duration.  

This is a very powerful v10 feature for server admins, although it is not discussed enough in the Tableau community yet. What it does: full extracts within the same priority run in order from shortest to longest, based on their last run duration.

The benefit is that smaller extracts do not have to wait a long time for big ones to finish. Tableau Server executes the smaller ones first, so overall wait time is reduced during peak hours.

What do server admins have to do to leverage this feature?

  • By default, this feature is off, so a server admin has to turn it on. It is not site-specific: once on, it applies to all sites. Simply run the following tabadmin command to turn it on:
  •  tabadmin set backgrounder.sort_jobs_by_run_time_history_observable_hours 36
  • Please read my blog and the Tableau docs for details.

2. Revision History and Version Control

Tableau released one of the most wanted server features – version control and revision history – in V9.3. The feature was much enhanced in V10 with previews of old workbook revisions, one-click restore, and a maximum-revisions setting:

  • The workbook preview and restore features are very convenient for publishers.
  • The maximum-revisions setting is great for server admins, who can actually control server space usage so you do not run out of storage while revision history is enabled.

How do you deploy these features on the server?

  • Turn it on: By default, revision history is off. It can be turned on site by site: go to the site's Settings > General and select "Save a history of revisions". On V10 you have two choices: Unlimited or a fixed number of revisions. Unlimited means there is no cap on revision history, which you probably do not want: as a server admin, you always want to make sure your server will not run out of space. The revision-count limit is a very handy feature that gives admins some peace of mind about server storage.
  • Decide the maximum number of revisions you want to keep. It is site-specific, meaning you can set different maximums for different sites.
  • How do you decide the maximum revisions to keep, and how do you find the extra server space revisions consume? Please read my blog.

3. Cross-Database Joins and Cross-Database Filters

X-DB joins and cross-data-source filters are two of the features most requested by the user community. They are different but related things.

X-DB joins allow two or more separate data sources to be joined at row level. There are still some constraints on which kinds of data sources can be joined in V10, though Tableau plans to extend this in coming releases: V10 only allows an extract to be the primary data source when joined with other databases, and does not yet allow two extracts to be joined together.

What do X-DB joins mean for server admins?

  • Know that server admins have no control over x-db joins; they are totally controlled by publishers. The feature is enabled out of the box and server admins can't turn it off – hopefully you never need to.
  • Watch server performance. A lot of x-db join activity happens on Tableau Server. I was a little skeptical about a feature server admins have no control over or visibility into; on the other hand, I have not encountered any issues since my v10 server upgrade in Nov 2016.
  • From the publisher's perspective, x-db joins can be slow when joining two large datasets.

What is a cross-database filter?

Use case example: say you're connected to multiple data sources, each with common dimensions like Date or Product, and as part of your analysis you want a single filter to apply across all the sources. That's where this new feature comes in: any time you have data sets that share a common dimension, you can filter across them. A few things to know about cross-database filters:

  • It is not an x-db join; it is more like blending, where you manage relationships to edit the blend across connected sources
  • You can only filter data across multiple primary data sources. You cannot filter data across secondary data sources.

4. Desktop License Reporting

Desktop License Reporting is included in V10. This is an awesome feature for tracking Desktop usage even when Desktop users do not publish. Please see the details @http://enterprisetableau.com/licensing/

The challenge in leveraging this feature is changing each user's laptop to make the initial configuration. Here is what you need to know:

  • It works only if both Desktop and Server are on v10.
  • This feature is turned off on the server by default; you can turn it on using tabadmin:
    tabadmin set features.DesktopReporting true
    tabadmin config
    tabadmin restart
  • The most difficult part is updating each Windows Desktop's registry or Mac Desktop's plist to point to the Tableau Server where license usage should be sent. The best way is to bake this into your Desktop v10 installer. Please see my previous blog for details.
  • You should have all of the company's Desktops pointing to one Tableau Server, even if Desktop users publish to different servers. This way you have one place to see all enterprise Desktop usage.
  • By default, Tableau Desktop v10+ pings Tableau Server v10+ for usage reporting every 8 hours. You can configure the interval on the Desktop: it is controlled by the Mac plist or the Windows registry, not a tabadmin option. See here.

5. Subscribe Others

Tableau finally delivered this long-requested feature in V10. A few things to know:

  • The feature has to be enabled at the site level
  • You can create a custom email 'from' address for each site. This is handy since users who receive subscription emails may want to contact the site admin, rather than the server admin, with questions.
  • Only workbook owners can subscribe others
  • A user has to have an email address in Account Settings, otherwise 'subscribe others' is not highlighted. If a lot of users have no email address on Tableau Server, you may have to mass-update all users with valid email addresses before this feature can really be used.
  • You can subscribe individual users only, not groups. If you really want to subscribe a group, one workaround is to create a dummy user and give that dummy user the group's email address.
  • You can't subscribe users who are not valid users of the site
  • You can't subscribe users who do not have permission to view the workbooks or views
  • Subscribed users can click the 'Manage my subscriptions' link at the bottom of subscription emails to unsubscribe at any time.
  • Users can always subscribe themselves if they have view permission on the workbooks or views.

6. Device Specific Dashboard Layout 

After you've built a dashboard, you can create layouts for it that are specific to particular phone or tablet devices. The URL stays the same, but Tableau renders a different layout depending on the device used to access the server.

Most users (especially executives) use phones to view information, so this is a great feature for driving enterprise Tableau adoption. A few notes:

  • It is enabled out of the box; there is no server- or site-level setting to enable or disable it.
  • When publishing dashboards, make sure to clear the option 'Show Sheets as Tabs', otherwise this feature does not work
  • It works with the Tableau Mobile app, and also on mobile devices that do not have the app installed.
  • The best practice is to remove some views from the default layout so the mobile device layout has fewer views than the default

Design tips:

  • Ask yourself: What key information does my end user need from my dashboard?
  • Click “device preview” to confirm how your dashboard looks across different devices.
  • (For small screens) Remove unnecessary views, filters, titles, and legends.
  • (For small screens) Determine if you need a scrollable dashboard (fit width). If so, stack dashboard objects and use a “peek.”
  • (On touch devices) On scrollable dashboards, pin your maps, and disable pan and zoom.

With device designer, you’ll rest assured knowing your data stands out with optimized dashboards on any device!

7. Data Source Analytics

Data source management has been brought into line with workbooks: we now have revision history and usage information, and users can favorite data sources.

You can also change the data source view to group them by where they connect to, instead of by data source name.

Tableau has yet to deliver the data source lineage features announced at TC16 Austin: from a data source column, tell which workbooks use it so you can do impact analysis when the data source changes; and from a workbook, tell which data source tables and/or columns it uses so you can spot potentially duplicated data sources. I am expecting these big features in 2017.

8. Site-Specific SAML

If you use SAML authentication, you can make it site-specific instead of server-wide. This means some sites on your Tableau Server can use SAML for single sign-on, whilst others use normal authentication.

I know it can take months for enterprise customers to leverage some of these new features. Hope this blog helps. Please feel free to post your tips and tricks for implementing them.

SCALING TABLEAU (7/10) – UNDERSTAND SERVER PERMISSIONS

When I think about Tableau permissions, I have two words:

  • Robust – Tableau's permission features are very comprehensive and robust. Definitely enterprise-grade.
  • Confusing – on the other hand, Tableau permissions can be confusing, since there are so many different variables involved in setting them.

To understand permissions, let's start by looking at the structures within Tableau Server. A server consists of multiple sites (see my Tableau sites blog for details). From a permissions perspective, one important thing to know is that there is absolutely no 'communication' between sites; nothing can be shared across sites.

Within each site there are projects; within projects there are workbooks and data sources; and each workbook can have multiple views. Within each site there are also users and groups. Sites are partitions, or compartmented containers.


If you think of projects/workbooks as containers, permissioning assigns users and groups to containers. Permissions exist at all levels: site, project, workbook, data source, and view. Let's look into each of these.

1. Site Role

Tableau has many site roles, but the most commonly used ones are Publisher and Interactor, in addition to admin roles.


What is a site role and how does it work?

  • Site roles are site-specific. One user can have the Publisher site role on the Default site but the Interactor site role on another site.
  • Site roles can be granted by server admins, and by site admins if site admins are allowed to manage users (a site-level setting).
  • A site role is a ceiling: the maximum permissions the user can have on that site.
  • The Interactor site role can never publish, even with Publisher permission at the project level. Now you may start to see the confusing part of Tableau permissions.
  • The Interactor site role can't Save or Save As in web editing, even with 'Save' allowed at the workbook level.
  • A site role does not define what a user can and can't do at the project, workbook or data source level. Think of the site role as a person's legal right to work in the US, and publish permission at the project level as an employer's job offer: having the legal right to work does not mean you can work for a company unless you have a job offer from that company, and on the other side, even if a company makes an offer, you will not be allowed to work without the legal right to work in the US.
  • You can check your own site role under 'My Account Settings', but you can't check others' site roles.

2. Project Level Permissions

Project-level permissions deal with who can view, publish to, and manage the project. When you click a project name, then Permissions, you will see the project permissions. You can set project role permissions (Publisher, Viewer and Project Leader), and you can also set workbook and data source permissions here, which become the default permissions when workbooks or data sources are published to the project.

  • Publisher: This is different from the 'Publisher' site role. The project Publisher role defines whether the group or user can publish to this project. It is independent of the site role: a user with the Interactor site role can still be given Publisher permission at the project level, although it does not matter, since the Interactor site role can't publish anywhere.
  • Project Leader:
    • Can set permissions on all items in the project
    • Can change refresh schedules: very handy if someone is on vacation and his workbook's refresh schedule has to be changed.
    • Can change workbook or data source owners: a great feature that a project leader should use when someone leaves the team or company.
    • Can lock the project permissions
  • Locked project permissions vs. managed by the owner: The key difference is whether you want each publisher to be able to change their workbook permissions within your project. When the project is locked, those who have the Publisher site role and Publisher permission on your project can still publish, but they can't change any workbook permissions. All workbook permissions default from the project-level workbook permissions you set, so all workbooks within the project have exactly the same permissions. If you change workbook permissions at the project level, the change is applied automatically to all workbooks and data sources in the project.
  • When to lock project permissions?
    • For more sensitive content, where you want to make sure permissions can't deviate
    • To simplify permissions.
    • Other cases. For example, if the workbook permissions in a project are such a mess that you want to redo them, one way is to lock the permissions, so all workbook/data source permissions are cleaned up with one simple click. Then you can unlock the project and make additional changes from there.
    • You can't undo permissions when you change from 'managed by the owner' to locked. Please take screenshots before the change.

3. Workbook Level Permissions 

The workbook level has 14 different capabilities that you can set independently. To simplify the process, Tableau ships with a few templates out of the box (Viewer, Interactor, Editor, None, or Denied). When you modify any of these templates, the result is called Custom.

  • Download: Tableau workbooks have four different download controls: download image/PDF, download summary data, download full data, and download workbook (Download Workbook/Save As is a combined capability).
  • Share Customized: There are both Share Customized and Web Edit capabilities. The customized-view feature works with filters: if users have filter permission, they can change view filters, save preferred filter states as customized views, and even make one of their customized views their own default view, which is very handy, especially for slower views. Share Customized controls whether a user can share his or her customized views with all other users who have access to the same views.
  • Web Edit: A customized view is different from web editing. Customized views only allow filter-type changes, while Web Edit allows changing the whole design of the view (chart type, new calculations, new fields, etc.).
  • Download Workbook/Save As: Download is enabled by this permission. However, Save As is considered a publishing activity: a user with the Interactor site role can't Save As in web edit or publish from Desktop, even if Download Workbook/Save As is allowed on the workbook.
  • Save: Save means trusting others to overwrite your workbooks. 'Save' must-knows:
    • It works for both Desktop and Web Edit
    • A user who saves becomes the new owner of the workbook, since a workbook has only one owner at any given time
    • What about the previous owner's permissions? The new owner can give the previous owner any permissions, or none at all, in a 'managed by the owner' project
    • Revision history will create a new workbook revision if it is turned on
    • The 'Save' button only appears for owners. If you are not the content owner, the 'Save As' button appears instead; type the same name to overwrite the workbook, confirm the overwrite, and the 'Save' button will then appear.

4. How web edit, save as and save permissions work together

First: does the user have Web Edit permission on the workbook? If not, no 'Edit' button appears.

Next: does the user have permission to publish on the site? If not, the user won't get Save/Save As buttons even if you've granted the correct Download/Web Save As permissions on the workbook.

Also: does the user have workbook-level Download/Web Save As permission? If not, there are no Save/Save As buttons for that workbook.

Finally: which project can the user save into? If you haven't granted the user permission to save into a particular project, it doesn't matter whether all the other permissions are set correctly, because the user has no place to store their changes. If the user can publish to multiple projects, they can choose which project to save into. The sketch below makes the order of these checks explicit.
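
Here is the same decision chain as an illustrative function (the names and site-role values are mine, not a Tableau API):

    def can_web_edit_and_save(site_role, workbook_perms, publishable_projects):
        # 1. Web Edit on the workbook gates the Edit button itself
        if "web_edit" not in workbook_perms:
            return "no Edit button"
        # 2. The site role is the ceiling: Interactors never see Save/Save As
        if site_role != "Publisher":
            return "can edit, but no Save / Save As buttons"
        # 3. Workbook-level Download/Web Save As is still required
        if "download_save_as" not in workbook_perms:
            return "can edit, but no Save / Save As on this workbook"
        # 4. Finally, there must be at least one project to save into
        if not publishable_projects:
            return "all set, but nowhere to store the changes"
        return "full web edit with Save / Save As"

    # An Interactor with generous workbook permissions still cannot save:
    print(can_web_edit_and_save("Interactor", {"web_edit", "download_save_as"}, ["Sales"]))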


5. Set data source permissions

When you publish a workbook, you have the option to publish the data source separately from the workbook. A published data source becomes reusable across workbooks, and one refresh schedule updates all its connected workbooks at the same time, so it becomes a single source of truth and, of course, puts less load on the underlying data sources.

When you publish a workbook that connects to a Tableau Server data source, rather than setting the credentials to access the underlying data, you set whether the workbook can access the published data source it connects to.

If you select to prompt users, a user who opens the workbook must have View and Connect permissions on the data source to see the data. If you select embed password, users can see the information in the workbook even if they don’t have View or Connect permissions.

To simplify permission settings, when publishing workbooks select 'Embedded password' for the published data sources they connect to:

  • When publishing the workbook, select 'Embedded password'
  • Give only the publisher group 'Connect' permission at the data source level
  • Do nothing for the end-consumer ('interactor') group at the data source level

If you select 'Prompt user' authentication during workbook publishing while using a published data source, a user who opens the workbook must have View and Connect permissions on the data source to see the data. Here is the tricky part: make sure you do NOT give the 'interactor' group 'Connect' permission at the data source level, but DO give it data source 'Connect' permission at the project level. The correct setup is as follows:

  • For the interactor group only:
    • Connect permission at the project level
    • 'Unspecified' at the data source level
  • For the publisher group only:
    • Connect permission at the data source level

The reason: if you give the interactor group 'Connect' permission at the data source level, users with Desktop will be able to connect directly to the published data source, potentially bypassing the filters or row-level security set up in the workbook or published data source. With project-level data source 'Connect' permission only, the user cannot connect via Desktop but can still connect via the workbook. I could not find clear Tableau documentation on this, but my test results on v9 and v10 confirmed this behavior.

6. Set view permissions

When a workbook is saved without tabs, the default permissions are applied to the workbook and its views, but view permissions can then be edited individually. Permissions for views in workbooks are inherited from the workbook permissions, and if a user selects "Show sheets as tabs" when publishing from Tableau Desktop or saving on Tableau Server, the workbook permissions override the permissions on individual views anyway.

The best practice is not to set view-level permissions at all.

7. Summary of best practices:

  • Permission groups, not users
  • Lock project permissions where possible
  • For owner-managed projects, permission workbooks, not views
  • Assign project leaders
  • Plan your permissions
  • Use published data sources and 'Embedded password' when publishing workbooks
  • Apply additional row-level security
  • Test permissions out
  • Review continually

 

SCALING TABLEAU (6/10) – ROW LEVEL SECURITY

Data security has been one of the top concerns for enterprise Tableau adoption. Tableau handles data security with permissions and row-level security. Permissions control which workbooks/views a user can see; row-level security controls which rows of data that user can see. For example, APAC users see APAC sales and EMEA users see EMEA sales only, while both have the same permissions on the same workbook.

Does Tableau row-level security work with extracts? Yes. This blog provides everything you need to know to create row-level security controls for extracts and live connections, including a new approach that leverages the V10 x-db join feature.

Use case: create one workbook where server users see a subset of the data based on the Region (Central, East, South, West) and Segment (Consumer, Corporate, Home Office) they are assigned to.

Solution A – Workbook filter for Row Level Security by Group

  1. Create the following 12 Tableau Server groups (Central-Consumer, Central-Corporate, Central-HomeOffice, East-Consumer, East-Corporate, East-HomeOffice, …). The Central-Consumer group has all the Central-region users who are assigned to the Consumer segment, and so on.
  2. Create a calculated field:
    (ISMEMBEROF('Central-Consumer') AND [Region] = 'Central' AND [Segment] = 'Consumer') OR
    (ISMEMBEROF('Central-Corporate') AND [Region] = 'Central' AND [Segment] = 'Corporate') OR
    (ISMEMBEROF('Central-HomeOffice') AND [Region] = 'Central' AND [Segment] = 'HomeOffice') OR
    (ISMEMBEROF('West-Consumer') AND [Region] = 'West' AND [Segment] = 'Consumer') OR
    (ISMEMBEROF('West-Corporate') AND [Region] = 'West' AND [Segment] = 'Corporate') OR
    (ISMEMBEROF('West-HomeOffice') AND [Region] = 'West' AND [Segment] = 'HomeOffice') OR
    (ISMEMBEROF('East-Consumer') AND [Region] = 'East' AND [Segment] = 'Consumer') OR
    (ISMEMBEROF('East-Corporate') AND [Region] = 'East' AND [Segment] = 'Corporate') OR
    (ISMEMBEROF('East-HomeOffice') AND [Region] = 'East' AND [Segment] = 'HomeOffice') OR
    (ISMEMBEROF('South-Consumer') AND [Region] = 'South' AND [Segment] = 'Consumer') OR
    (ISMEMBEROF('South-Corporate') AND [Region] = 'South' AND [Segment] = 'Corporate') OR
    (ISMEMBEROF('South-HomeOffice') AND [Region] = 'South' AND [Segment] = 'HomeOffice')
  3. Add the calculated field as a filter and select 'True'
  4. After publishing the workbook, grant Interactor permission to all 12 groups.
  5. Make sure Web Editing and Download are both set to No.

That is all. ISMEMBEROF returns true if the current server user is a member of the given group. ISMEMBEROF is the key function here, and it works for both extracts and live connections.

Notice that the control is a workbook filter. If the workbook is downloaded, the filter can be changed and the row-level security no longer holds, which is why the workbook permissions must set Download to No.

The better solution is to use a data source filter for the ISMEMBEROF calculation instead of a workbook filter.

Solution B – Data Source Filter for Row Level Security by Group

  1. Create the groups and calculated field from Solution A, steps 1 and 2
  2. Edit the data source filters to include the calculated field and select 'True'
  3. Publish the data source and grant Connect-only permission (no edit)
  4. After publishing the workbook, set permissions for all 12 groups. There is no need to put the calculated field in a workbook filter anymore, since the filter is at the data source level now.

Published data sources are reusable and a single source of truth, they put less load on the underlying data sources, and now they have governed row-level security built in.

Solution B works with extracts. The only catch is that it is a little tricky during workbook development: you need a local extract copy to simulate user behavior from Desktop, then replace the local data source with the server-published data source before publishing the workbook, and you will need to copy and paste all calculations. Please see the manual fast way or a hacky way.

The approaches above control users' visibility of data by Tableau Server groups, and assume you manage the group memberships outside Tableau. When you have too many data-security groups to manage manually, you can automate group and membership creation using the Server REST API (see the sketch below) or your corporate directory automation tool.
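
As a starting point for that automation, here is a minimal REST API sketch. The endpoint paths follow Tableau's REST API reference; the API version, credentials and user id below are placeholders to adapt:

    # Sign in, create one row-level-security group, and add a user to it.
    import requests
    import xml.etree.ElementTree as ET

    SERVER = "https://your-tableau-server"
    API = "2.5"  # 10.x-era REST API version; adjust for your server
    NS = {"t": "http://tableau.com/api"}

    def sign_in(user, password, site=""):
        body = ('<tsRequest><credentials name="{}" password="{}">'
                '<site contentUrl="{}"/></credentials></tsRequest>'
                ).format(user, password, site)
        r = requests.post("{}/api/{}/auth/signin".format(SERVER, API), data=body)
        r.raise_for_status()
        root = ET.fromstring(r.text)
        return (root.find(".//t:credentials", NS).get("token"),
                root.find(".//t:site", NS).get("id"))

    token, site_id = sign_in("admin", "...")
    headers = {"X-Tableau-Auth": token}

    # Create the group
    r = requests.post("{}/api/{}/sites/{}/groups".format(SERVER, API, site_id),
                      data='<tsRequest><group name="Central-Consumer"/></tsRequest>',
                      headers=headers)
    group_id = ET.fromstring(r.text).find(".//t:group", NS).get("id")

    # Add a user (by Tableau user id, looked up from your directory sync)
    requests.post("{}/api/{}/sites/{}/groups/{}/users".format(SERVER, API, site_id, group_id),
                  data='<tsRequest><user id="USER-LUID-HERE"/></tsRequest>',
                  headers=headers)

Loop this over your directory's group and member lists to keep the 12 row-level-security groups in sync.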

When the group approach in Solutions A and B can't scale, the following USERNAME() approach is another good option.

Solution C – Entitlement table x-db join for Row Level Security

Same use case, but now you want to add Category as a dimension for row-level security in addition to Region and Segment. You would need 100+ groups just for row-level security, which is a lot to manage. Instead, we use Tableau's USERNAME() function, which returns the current server user name. This approach does not use groups at all, but assumes you have a separate user entitlement table like the one below.

UserName  Region  Segment   Category
U123      East    Consumer  Furniture
U456      East    Consumer  Office Supplies

This user entitlement table can be an Excel file or a separate database table. We can use V10's cross-database join feature for row-level security:

  1. Create a cross-db join between the main data source (e.g. an extract or MySQL) and the user entitlement Excel file
  2. Create a calculated field:
    USERNAME() = [UserName]
  3. If you use a workbook filter, just add this calculated field as a filter and select 'True' – the same as Solution A
  4. Or, if you use a published data source, edit the data source filters to include the calculated field and select 'True' – the same as Solution B.
  5. You are done

USERNAME() returns the current server user name, while [UserName] is the user-name column of your user entitlement Excel file (which can also be a database table).

Please note: the current version of Tableau v10 does not support x-db joins between two extracts, although it does support x-db joins between an extract and Excel (or some selected databases). So if your primary data source is an extract, your user entitlement table can't be an extract as well.

In addition to ISMEMBEROF, USERNAME() is another great Tableau Server function for row-level security. The V10 x-db join feature extends USERNAME()'s use cases a lot, since you can now keep your user entitlement table outside your main database for agility and self-service.

When the user entitlement table is in the same database as the main fact table, you may want to use the database's native features for row-level security:

Solution D – Query Banding or Initial SQL for Row Level Security

For databases (like Teradata) that support query banding, enter the query band:

  • ProxyUser = B_<ProxyUser>
  • TableauMode=<TableauMode>
  • TableauApp=<TableauApp>
  • TableauVersion=<TableauVersion>
  • WorkbookName=Name of DataSource

For databases that support Initial SQL (Vertica, Oracle, SQL Server, Sybase ASE, Redshift, Greenplum, etc.), the available parameters are:

    • [TableauServerUser] returns the current Tableau Server user's username only.
    • [TableauServerUserFull]
    • [TableauApp]
    • [WorkbookName]

For example, on SQL Server you could set the Initial SQL to EXEC sp_set_session_context 'tableau_user', [TableauServerUser]; and reference that session value in a row-level-security view.

In summary, ISMEMBEROF and USERNAME() are the two Tableau functions for row-level security:

  • ISMEMBEROF returns true if the current server user is a member of the given group. It requires server groups to be set up.
  • USERNAME() returns the current server user name. It requires an entitlement table; V10 x-db joins allow the entitlement table to live outside the main data source.
  • Both can be implemented as a data source filter or a workbook filter.
  • Both work for extracts and live connections.

Although USERNAME() returns the current server user name, it does not pass that user name to a live-connected data source outside Tableau Server. To pass the current server user name to the data source, you have to use query banding or Initial SQL, depending on the database you use. Query banding and Initial SQL work only for live connections, not for extracts.

SCALING TABLEAU (5/10) – LICENSE MANAGEMENT

Tableau license management has been a big pain point in scaling Tableau. This blog covers the following:

  • Tableau license types
  • What is your End User License Agreement
  • How to get most out of your Tableau licenses
  • Desktop and Server license management – The Enterprise Approach

1. Tableau license types

Tableau has the following licenses:

  • Desktop Professional: The most common Desktop license; it can connect to about 50 kinds of data sources.
  • Desktop Personal: The less-used Desktop license; it can connect to only a few data sources. It is about half the price of Desktop Professional.
  • Tableau Server, seat-based: For small- to medium-scale sharing and collaboration. Each publisher or interactor takes one seat. If you purchased 100 seat-based licenses, you can assign a total of 100 named users on the server; you can change who they are as long as the total does not exceed 100 users at any given time.
  • Tableau Server, core-based: For medium- to large-scale sharing and collaboration. If you have 16 cores, you can have an unlimited number of interactors or publishers as long as your server is installed on machines totaling fewer than 16 cores.
  • Tableau Online: Similar to seat-based Tableau Server, but on Tableau's cloud platform.
  • Enterprise License Agreement (ELA): You pay Tableau a fixed amount for 3 years and receive an agreed-upon number of Desktop and Server licenses. Tableau is starting to sell ELAs to large enterprises.
  • Subscription: Tableau may move to a subscription model, selling licenses valid for a period of time only.

2. What is your End User License Agreement

Nobody wants to read the End User License Agreement, so here is a summary of what you should know:

  • Each Desktop license can be installed on two computers of the same user. You may get a warning when you try to activate a third computer.
  • If a Desktop license key was used by Joe, who left the company or no longer uses it, the key can be transferred to someone else. The correct process is to deactivate the key on Joe's machine and reactivate it on the other person's machine.
  • If you have a .edu email address, you are lucky: students and teachers can get Desktop for free.
  • If you are part of a small non-profit org, you can get Desktop licenses almost for free.
  • Each server key can be installed on 3 instances: one prod and two non-prod.
  • What if you need 4 instances: prod, DR, test and dev? Say you have two core-based keys, key A with 8 cores and key B with 8 cores. You can activate both keys on prod and on DR (16 cores each), then use key A's 8 cores alone for test and key B's 8 cores alone for dev. You are fine as long as each server key is used on 3 or fewer instances.
  • What if you do not want to pay the maintenance fee anymore? Since these are perpetual licenses, you are still entitled to use them even if you stop paying maintenance; what you lose is upgrades and support.

3. How to get most out of your Tableau licenses

  • If the registration info (name, email, last installed, product version) in the Tableau Customer Portal Keys report is null, the key has never been used, so you can re-assign it to someone else. You may be surprised how many keys are never used.
  • If the registration info in the Keys report is associated with someone who left the company and the key has a single registration, you can re-assign it to someone else.
  • If the registered product version is very old, the key owner is likely not an active Desktop user.
  • Enable Desktop License Reporting when you upgrade to v10 to see who has not used Desktop in the last few months; then you can potentially get those licenses transferred (see below for more).

4. Desktop and Server license management – enterprise approach

When you have hundreds of Desktop licenses, you will need the following approaches to scale:

  • Co-term all of your licenses for easy renewals. Co-terming means having the same renewal date for all of your Desktop and Server licenses: both what you already have and new purchases. This may take a few quarters to complete. Start by picking one renewal date, then agree on it with your Tableau sales rep, renewal rep, purchasing department and users.
  • Give the Tableau champion visibility into every team's Tableau licenses in the Customer Portal. Tableau's land-and-expand sales approach creates multiple accounts in the Customer Portal, and each team can only see its own keys and renewals. If you drive enterprise Tableau, ask for access to all accounts in the Customer Portal.
  • Automate the Desktop installation, activation and registration process. Whether you are in a Windows or Mac environment, you can automate Desktop installation, activation and registration via command lines. Read the details.
  • Transition to a single master key. Tableau Desktop supports a single master key: instead of having 500 individual Desktop keys, you can consolidate them all into one master key that can be activated by 500 users. The prerequisite is to co-term all the individual keys. A few important notes:
    • When the single master key is created, make sure to ask Tableau to turn on the hidden-key feature so Desktop users will not see the key anymore; you do not want the single master key to leak. Once hidden, the 'Manage Product Keys' menu no longer shows up in Desktop.
    • This also means you will have to use a quiet installer, so the key can be activated without user interaction.
    • If some users have two computers at work with Tableau Desktop installed on both, Tableau may count one user as two installs, which will throw off your total license counts. The Tableau licensing team can help you out.
  • Enable Desktop License Reporting in V10. This is an awesome feature for tracking Desktop usage even when Desktop users do not publish. The challenge is changing each user's laptop. Here is what you need to know:
    • It works only if both Desktop and Server are on v10, ideally v10.0.2 or above, as earlier v10 versions are buggy.
    • This feature is turned off on the server by default; you can turn it on using tabadmin:
      tabadmin set features.DesktopReporting true
      tabadmin config
      tabadmin restart
    • The most difficult part is updating each Windows Desktop's registry or Mac Desktop's plist to point to the Tableau Server where license usage should be sent. The best way is a Desktop v10 installer (see the Automate Desktop Installation, Activation and Registration process above).
    • You should have all of the company's Desktops pointing to one Tableau Server, even if Desktop users publish to different servers. This way you have one place to see all enterprise Desktop usage.
    • By default, Tableau Desktop v10+ pings Tableau Server v10+ for usage reporting every 8 hours. You can configure the interval on the Desktop. Mac plist example:
      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
        <plist version="1.0">
          <dict>
            <key>Server</key>
            <string>https://mytableau02:8010,http://mytableau</string> 
            <key>scheduleReportInterval</key>
            <string>3600</string>
          </dict>
      </plist>
    • The Desktop usage (every 8 hrs) is not sent to Tableau company but to your own Tableau server only. What is sent to Tableau company from Desktop is only the registration info. Of course, the registration info is also sent to your defined Tableau server(s).
    • What table has Desktop usage? The Postgres table name is desktop_reporting
    • What dates desktop_reporting has? It  has 4 columns for dates:
      • Maintenance expiration date
      • Expiration date (3 months after the maintenance expiration date)
      • Registration date (when registered)
      • Last report date (when Desktop was last used). Notice it captures only the last time Desktop was used; if you want to know how often Desktop was used in the past 3 months, you can't tell.
    • How can you tell historical Desktop usage? Build an incremental refresh of desktop_reporting keyed on last report date; over time you will build out your own history for better Desktop license reporting (see the SQL sketch below). I am sure Tableau is working on making this small table historical as well.
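
If you prefer SQL over a Tableau incremental extract, here is a minimal sketch of the same idea. Two assumptions to verify against your own environment: the snake_case column name last_report_date, and desktop_history, a hypothetical history table you create yourself outside the workgroup database.

      -- Nightly job: snapshot yesterday's active Desktops into your own history table.
      -- desktop_history is a hypothetical table mirroring desktop_reporting plus a snapshot date.
      INSERT INTO desktop_history
      SELECT current_date AS snapshot_date, d.*
      FROM desktop_reporting d
      WHERE d.last_report_date >= current_date - 1;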

In summary, Tableau Desktop and Server license management is not a simple task. Hopefully these tips and tricks of The Enterprise Approach will ease your pain. It is good practice to build these out step by step, before things get too big or too messy.

SCALING TABLEAU (4/10) – USE SITES

Tableau server has a multi-tenancy feature called “sites” which can be leveraged by enterprise customers for better scalability, better security and advanced self-service.

This blog covers the following areas of Tableau sites:

  • Basic concepts
  • Common use cases
  • Governance processes and settings
  • When not to create a new site

1. Basic concepts about Tableau sites

Let's start with some basic concepts. Understanding them will provide clarity, avoid confusion, and reduce hesitation about leveraging sites.

Sites are partitions or compartmented containers. There is absolutely no ‘communication’ between sites. Nothing can be shared across sites.

A site admin has unrestricted access to the contents of the specific site he or she owns: managing projects, workbooks, and data connections; adding users and groups; assigning site roles and site memberships; and managing extract refresh schedules. Site admins can monitor pretty much everything within the site: traffic to views, traffic to data sources, background tasks, storage space, etc.

One user can be assigned roles in multiple sites, independently. For example, Joe, a site admin for site A, can also be added to site B in an admin role (or an Interactor role). However, Joe can't transfer workbooks, views, users, data connections, user groups, or anything else between site A and site B. When Joe logs in to Tableau, he chooses site A or B: when he selects site A, he can see everything in site A but nothing in site B, and it is not possible for him to assign site A's workbooks or views to any users or user groups in site B.

All sites are equal from a security perspective; there is no concept of a super site or a site hierarchy. You can think of a site as an individual virtual server. A site is the opposite of 'sharing'.

Is it possible to share anything across sites? The answer is no for site admins or any other users. However, if you are a creative server admin, you can write scripts at the server level to break this rule. For example, a server admin can use tabcmd to copy contents from site A to site B, although this goes into territory that Tableau does not officially support.

2. Common use cases of Tableau sites

  • If your Tableau server is an enterprise server for multiple business units (finance, sales, marketing, etc.) and finance does not want sales to see its contents, create a site for each business unit so one business unit's site admin cannot see another business unit's data or contents.
  • If your Tableau server is an enterprise platform and you want to provide governed self-service to the business, the site approach (business as site admins and IT as server admins) provides maximum flexibility to the business while IT can still hold business site admins accountable for everything within their sites.
  • If your server serves external partners, you do not want one partner to see another partner's contents at all. Create one site for each partner; this also avoids the potential mistake of assigning a partner A user to partner B's site.
  • If you have very sensitive data or contents (like internal auditing data), a separate site gives much better data security control, from the development phase through production.
  • Use sites as a Separation of Duties (SoD) strategy to prevent fraud or potential conflicts of interest for powerful business site admins.
  • You simply have too many publishers on your server and want to distribute some admin work to people closer to the publishers, for agility.

Arguably, you can achieve all of the above with projects, without using sites. So why sites? First, sites just make things easier for a large Tableau server deployment: many out-of-the-box server admin views go by site, so it is easier to know each BU's usage if you have a site per BU. Second, if you have a few super-knowledgeable business users, you can empower them better by granting them site admin access.

3. Governance processes around Tableau sites.

Thoughtful site management approaches, clearly defined roles and responsibilities, a documented request and approval process, and naming conventions have to be planned before you go with a site strategy, to avoid potential chaos later on. Here is the checklist:

    • Site structure: How do you want to segment a server into multiple sites? Should sites follow the organizational or business structure? There is no right or wrong answer here, but you do want to think it through and plan ahead.
    • How many sites should you have? It completely depends on your use cases, data sources, user base, and the level of control you want. As a rule of thumb, I would argue that more than 50 sites on one server is too many, although I know of a very large corporation with about 300 sites that work well for them. I prefer to have fewer than 20 sites.
    • Who should be the site admin? Either IT or business users (or both) can be site admins. One site can have more than one admin, and one person can administer multiple sites. When a new site is created, the server admin normally adds just one user as site admin, who can then add others as site admins.
    • What controls exist at the site level? All of the following can be checked or unchecked per site:
      • Storage limitation
      • Revision history on or off and max numbers of revisions
      • Allow the site to have web authoring. When web authoring is on, it does not mean all views within the site are web-editable; web editing still has to be allowed at the workbook/view level for specific users or user groups before an end user can web-edit.
      • Allow subscriptions. Each site can have one ‘email from address’ to send out subscriptions from that site.
      • Record workbook performance key events metrics
      • Create offline snapshots of favorites for iOS users.
      • Site-specific SAML with local authentication
      • Language and locale
    • What privileges should the server admin give to site admins? The server admin can hand all of the above controls to the site admin when the site is created, can change those site-level settings later, and can even take the privileges back from the site admin at any time.
    • What is the new-site creation process? I have a new site request questionnaire that the requester has to answer (see below). The answers help the server and governance teams understand the use cases, data sources, user base, and data governance requirements, to decide whether the use case fits Tableau server at all and whether the team should share an existing site or get a new one. The key criteria are whether the same data sources already exist in another site and whether the user base overlaps with another site. It is a balance between duplication of work and flexibility.
    • What is in the site request questionnaire?
      • Does your bigger team already have a Tableau site on the server? If yes, you can use the existing site. Please contact the site admin, who may need to create a project within the existing site for your team. The list of existing sites and admins can be found @…….
      • Who is the primary business / application contact?
      • What business process / group does this application represent? (like sales, finance, etc)?
      • Briefly describe the purpose and value of the application
      • Do you have an IT contact for your group for this application? Who is it?
      • What are the data sources?
      • Is there any sensitive data to be reported on? If yes, please describe the data source.
      • Is there any private data as part of the source data (like HR data, sensitive finance data)?
      • Who are the audiences of the reports? How many do you anticipate? Are there any partners who will access the data?
      • Does the source data cover more than one geo? If yes, what is the plan for data-level security?
      • What are the primary data elements / measures to be reported on (e.g. booking, revenue, customer cases, expenses, etc.)?
      • What will be the dimensions by which the measure will be shown (e.g. Geo, product, calendar, etc)
      • How often does the source data need to be refreshed?
      • What is the anticipated volume of source data? How many quarters of data? Roughly how many rows? Roughly how many columns?
      • Is the data available in enterprise data warehouse?
      • Are similar reports already available on another existing reporting platform?
      • How many publishers for this application?

4. When should you not create a new site?

  • If the requested site will use the same data sources as an existing site, you may want to create a project within the existing site instead, to avoid duplicate extracts (or live connections) running against the same source database (a quick check is sketched after this list).
  • If the requested site's end users overlap heavily with an existing site, you may want to create a project within the existing site to avoid duplicating user maintenance work.
  • The requester does not know that his or her bigger team already has a site.
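
A minimal sketch of that duplicate-data-source check against the workgroup database, assuming the datasources table carries name and site_id columns (verify on your server version):

      -- Published data source names that appear in more than one site
      SELECT name, COUNT(DISTINCT site_id) AS site_count
      FROM datasources
      GROUP BY name
      HAVING COUNT(DISTINCT site_id) > 1;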

In summary, sites are a great feature for large Tableau server implementations. Sites can be very useful to segment data and contents, distribute admin work, empower the business for self-service, etc. However, site misuse can create a lot of extra work or even chaos later on. A thoughtful site strategy and governance process have to be developed before you start to implement sites, although the process will evolve toward maturity as you go.

SCALING TABLEAU (3/10) – USE PUBLISHED DATA SOURCES

Tableau helps us see and understand our data, which is great. A lot of great things happen every day when creative analysts have powerful Tableau Desktop, unlocked enterprise source data, and a Tableau server collaboration environment.

As Tableau adoption grows from teams to BUs to the enterprise, you quickly run into scalability challenges: extract delays, an enterprise data warehouse (EDW) struggling to meet ad-hoc workloads, etc.

My last blog talked about setting extract priority on the server to improve extract efficiency by 50%. This blog focuses on a best practice for data source connections that scales both the EDW and the server: use published data sources.

  1. What is a Tableau published data source?

It is nothing but Tableau's semantic layer. If you have been in the BI space for a while, you may be familiar with Oracle BI's Repository or Business Objects' Universe. The problem with the Repository or the Universe is that they are too complex and designed only for specially trained IT professionals. Tableau is a newer tool designed for business analysts who do not have to know SQL, with a much simplified semantic layer. The Tableau community never focused enough on published data sources until recently, when people started to realize that leveraging published data sources is not only a great best practice but almost a must-have for scaling Tableau to the enterprise.

2. Again, what makes up a Tableau published data source?

  • Information about how to access or refresh the data: server name and credentials, Excel path, etc.
  • The data connection information: table joins, friendly field names, etc.
  • Customization and cleanup: calculations, sets, groups, bins, and parameters; custom field formatting; hidden unused fields; and so on.

3. Why Tableau published data source?

  • Reusable: Published data sources are reusable connections to data. When you prep your data, add calculations, and make other changes to your fields, these changes are all captured in your data source. Then when you publish the data source, other people can use it to conduct their own analysis.
  • Single source of truth (SSoT): You can have a data steward who defines the data model while workbook publishers consume the published data source to create vizzes and analyses. Here is an example of how to set up permissions to achieve SSoT:

[Screenshot: example permission setup for SSoT]

  • Less workload on the EDW: When you use extracts, one refresh of the published data source refreshes the data for all of its connected workbooks, which takes a lot of workload off your EDW. This can be a very big deal for your EDW.

4. How many of your data sources are embedded vs. published? You can find out from the data_connections table: look at the DBCLASS column; when the value is 'sqlproxy', the connection points to a published data source. Work with your server admin if you do not have access to the workgroup database of Tableau's Postgres repository.

If fewer than 20% of your data sources are published data sources, published data sources are not yet well leveraged in your org or BU.
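
A minimal sketch of that check (table and column per above), run against the workgroup database as the readonly user:

      -- Ratio of connections to published data sources vs. embedded connections
      SELECT CASE WHEN dbclass = 'sqlproxy' THEN 'published' ELSE 'embedded' END AS connection_type,
             COUNT(*) AS connections
      FROM data_connections
      GROUP BY 1;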

5. How to encourage people to use published data sources?

  • Control who can access the EDW: Say you have a team of 10 Desktop users. You may want to give 2 of them EDW access so you do not have to train all 10 on table structure details, while the remaining 8 use published data sources created by those two data stewards.
  • If extracts are used, you can give higher priority to all published data sources as an incentive for people to use them. See my previous blog for details.
  • Make sure people know the version control feature works for data sources as well.
  • As a data steward, add comments to columns – here is how a comment looks on mouse-over in the Desktop Data pane: [screenshot]

Here is how to add comments: [screenshot]
Conclusions: Published data sources are not a new Tableau feature, but they are not widely used even though they are reusable, a single source of truth, scalable, and lighter on your DB server. Since 9.3 Tableau has been improving its publishing workflow, making data source publishing much easier than before. Tableau v10 even gives you the option to publish your data sources separately (or not) during the workbook publishing workflow, and data source revision history is a great feature for controlling data source versions. Tableau announced a big data governance roadmap at TC16, but self-service practitioners do not have to wait for any new Tableau features to start leveraging published data sources.

Scaling Tableau (2/10) – Set Extract Priority Based on Duration

Are you facing the situation where your Tableau server backgrounder jobs have much longer delays during peak hours?

There are many good reasons why extracts are scheduled at peak hours, likely right after nightly ETL completions or even triggered automatically by ETL completions.

You can only have a limited number of backgrounders on your server. How do you cut average extract delay without adding extract backgrounders and without rescheduling any extract jobs?

The keyword is job PRIORITY. There are some good priority suggestions in the community (like https://community.tableau.com/thread/152689). However, the most effective approach I found was duration-based priority in addition to business criticality – I managed to reduce extract waiting time by 50% after raising the priority of all extracts with durations below the median runtime.

Here is what I recommend as extract priority best practices:

  1. Priority 10 for any business-critical extracts: I hope nobody disagrees with giving the highest priority to business-critical extracts.
  2. Priority 20 for all incremental extracts: Not only does an incremental refresh normally take less time than a full one, it is also an awesome incentive for more and more people to use incremental extracts.
  3. Priority 30 for any extracts with durations below the median (i.e., 50% of all extract jobs). This is another great incentive for publishers to make their extracts more efficient. It is the responsibility of both server admins and publishers to make backgrounder jobs more efficient. There is a lot publishers can do to improve extract efficiency: tune the extracts, use incremental instead of full refreshes, hide unused columns, add extract filters to pull less data, reduce extract frequency, schedule extracts in off-peak hours, or better yet run extracts outside of Tableau using the Tableau SDK (see my blog @http://enterprisetableau.com/sdk/), etc.
  4. Priority 50 for all the rest (default)
  5. Turn on tabadmin set backgrounder.sort_jobs_by_run_time_history_observable_hours 36, which orders full extracts of the same priority to run from shortest to longest based on their “last” run duration.

The combination of #3 and #5 will reduce your extract waiting time dramatically during peak hours.
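
To find candidates for item #3, you can mine the backgrounder history in the workgroup database. A minimal sketch, with three assumptions to verify on your server: the usual background_jobs columns (job_name, title, created_at, started_at, completed_at), the job name 'Refresh Extracts' for full refreshes, and Postgres 9.4+ for percentile_cont:

      -- Extract jobs whose average runtime over the last week is below the median
      WITH recent AS (
        SELECT title,
               EXTRACT(EPOCH FROM (completed_at - started_at)) AS run_secs
        FROM background_jobs
        WHERE job_name = 'Refresh Extracts'      -- assumed job name
          AND completed_at IS NOT NULL
          AND created_at > NOW() - INTERVAL '7 days'
      )
      SELECT title, ROUND(AVG(run_secs)) AS avg_secs
      FROM recent
      GROUP BY title
      HAVING AVG(run_secs) < (SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY run_secs) FROM recent)
      ORDER BY avg_secs;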

What is this backgrounder sort-by-run-time option (#5 above)? I am sure you will want to read the official Tableau online help here.

In short, Tableau server can sort full extract refresh jobs with the same priority (like 50) so they are executed based on the duration of their “last run,” executing the fastest full extract refresh jobs first.

The “last run” duration of a particular job is determined from a random sample of a single instance of that full extract refresh job within the last <n> hours, which you can configure. By default this sorting is disabled (-1). If you enable it, Tableau's suggested value is 36 (hours).

Let's say you have the following jobs scheduled at 5am. Here is how the extracts are prioritized:

Priority 10 – Job 10.1 (2 min), Job 10.2 (3 min), Job 10.3 (9 min), Job 10.4 (15 min)
  • Sort option ON: run positions 1–4, shortest first (10.1, 10.2, 10.3, 10.4)
  • Sort option OFF: these four jobs still go first, one by one, but in no guaranteed order among them (e.g., Job 10.4, then 10.2, then 10.3, then 10.1)

Priority 20 – Job 20.1 (1 min), Job 20.2 (2 min), Job 20.3 (14 min), Job 20.4 (15 min)
  • Sort option ON: run positions 5–8, shortest first
  • Sort option OFF: these four jobs go one by one after all priority 10 jobs, in no guaranteed order among them

Priority 30 – Job 30.1 (2 min), Job 30.2 (3 min), Job 30.3 (5 min), Job 30.4 (8 min)
  • Sort option ON: run positions 9–12, shortest first
  • Sort option OFF: one by one after all priority 20 jobs, in no guaranteed order among them

Priority 50 – Job 50.1 (1 min), Job 50.2 (9 min), Job 50.3 (20 min), Job 50.4 (25 min), Job 50.5 (30 min), Job 50.6 (50 min), Job 50.7 (55 min), Job 50.8 (60 min), Job 50.9 (70 min), Job 50.10 (80 min)
  • Sort option ON: run positions 13–22, shortest first
  • Sort option OFF: one by one after all priority 30 jobs, in no guaranteed order among them

For example, with the sort_jobs_by_run_time option ON, the max waiting time for the 1-minute Job 20.1 is 29 minutes (all four priority 10 jobs: 2 + 3 + 9 + 15). With the option OFF, the max waiting time could be 60 minutes (all priority 10 jobs plus the other priority 20 jobs: 29 + 2 + 14 + 15).

To recap, extracts run in this order:

  1. Any task already in process is completed first.
  2. Any task that you initiate manually using Run now starts when the next backgrounder process becomes available.
  3. Tasks set with the highest priority (the lowest number) start next, independent of how long they have been in the queue. For example, a task with a priority of 20 will run before a task with a priority of 50, even if the second task has been waiting longer.
  4. Tasks with the same priority are executed in the order they were added to the queue except if tabadmin set backgrounder.sort_jobs_by_run_time_history_observable_hours  36 is turned on. When the above option is on, the fastest full extract refresh jobs go first.

A few final practice guidelines:

  • Step 1: If most of your extracts have priority 50, you may want to just turn on tabadmin set backgrounder.sort_jobs_by_run_time_history_observable_hours 36 and see how much waiting-time improvement you gain.
  • Step 2: If step 1 does not give you what you are looking for, raise the priority (to 30, say) of any extracts with durations below the median. This will give you a big waiting-time reduction. You can start by changing extract priorities manually to see how it goes. Just be aware that any re-publishing of an extract resets its priority to the default 50.
  • How to automate Step 2? I have not seen an API for it. However, I simply had a program update tasks.priority directly (see the sketch after this list).
  • Why do I not recommend higher priority for more frequent jobs? I know it is one of the best practices recommended by a lot of Tableau practitioners. However, I think it drives the wrong behavior – it encourages publishers to increase their extract frequency from weekly to daily or hourly just to get higher priority, which in turn causes more extract delays. Duration-based and incremental-refresh priorities give publishers a much better incentive to make their extracts more efficient, which becomes a positive cycle.
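
For reference, here is a minimal sketch of that direct update. This is unsupported territory: writing to the workgroup database requires the tblwgadmin superuser rather than the readonly user, the task type value is my assumption, and the ids are placeholders – test on a non-production server first.

      -- Raise priority for selected extract refresh tasks; re-publishing resets priority to 50.
      UPDATE tasks
      SET priority = 30
      WHERE type = 'RefreshExtractTask'   -- assumed task type name; check your tasks table
        AND id IN (101, 102);             -- hypothetical task ids from your duration analysis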

Scaling Tableau (1/10) – version control and revision history

Tableau released one of the most-wanted server features – version control and revision history – in V9.3. The feature was enhanced considerably in V10 with old-workbook previewing, one-click restore, and a maximum-revisions setting. I love all of those new V10 features:

  • The workbook previewing and restoring features are so convenient for publishers.
  • The maximum-revisions setting is so cool for server admins, who can actually control server space usage so you do not run out of storage while enabling revision history. It also shows Tableau building a governance process into a new feature, which is important for scaling Tableau to the enterprise. I will explain those features in detail here:
  1. Turn it on. By default, revision history is not turned on; it can be enabled site by site. Go to site Settings, General and select “Save a history of revisions”. If you are on V10, you have two choices: Unlimited or # of revisions. Unlimited means there is no cap on version history, which you probably do not want – as a server admin, you always want to make sure your server will not run out of space. You will find “# of revisions” a very handy feature that gives admins some peace of mind about server storage.

2. How to decide the max. number of revisions?

I asked this question but did not find guidance anywhere. I spent days researching it and want to share my findings here. First of all, my philosophy is to give maximum flexibility to publishers by providing as many revisions as possible. On the other side, I also want to be able to project the extra storage that revision history will create, for planning purposes.

How many revisions should you allow? It depends on how much space you can allocate to revision history without dramatically impacting your backup/restore timing, and on how many workbooks the server has. Let's say you are OK giving about 50G to revision history. Figure out how many workbooks you have now and the total space of the XML portion of those workbooks (revision history only keeps the XML piece), and you can calculate the max number of revisions. Here is how:

  • Open Desktop and connect to PostgreSQL: enter your server name, the port, workgroup as the database, and the readonly user and password. Select the workbooks table and look at Size, Data Engine Extracts, and the number of records. The Data Engine Extracts flag tells you whether a workbook has embedded extracts or not.
  • Say you have 500 workbooks in total, 200 of them with Data Engine Extracts = false, and a total size of 200M across those 200 workbooks. That means the average twb is about 1M per workbook – this XML is what revision history will keep once it is turned on. So the total XML size across all 500 workbooks is about 500M.
  • If you turn on revision history and set max revisions to 50, over time the server storage for revision history would be about 50 x 500 x 1M = 25G. Two other factors to consider: one is the new-workbook creation rate; two is that not every workbook will max out its revisions.
  • Once you set the revision number, you can monitor storage usage for all revision history by looking at the workbook_versions table, which keeps all the revision history. You can find the overall size, the number of versions, and more insights about usage patterns. You can also join to other tables to find the workbook name, user name, etc., as sketched below.

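A minimal sketch of such a monitoring join (the workbook_versions column names size and workbook_id are assumptions – verify them against your workgroup schema):

      -- Revision-history footprint per workbook, biggest first
      SELECT w.name AS workbook,
             COUNT(*) AS revisions,
             SUM(wv.size) AS total_bytes      -- size column assumed
      FROM workbook_versions wv
      JOIN workbooks w ON w.id = wv.workbook_id
      GROUP BY w.name
      ORDER BY total_bytes DESC;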

3. Can interactors see previous versions as well? No. End users with the Interactor role can only see the current version.

4. Does a publisher have to do anything to keep revision history for his or her workbooks? No. Once “Save a history of revisions” is turned on for the site, a new revision is created automatically every time a workbook is web-edited or modified via Desktop – no further action from the publisher. When the max number of revisions is reached, the oldest version is deleted automatically, with no notification to publishers. All you need to communicate to publishers is the max number of revisions anyone can have. For example, if you keep 50 revisions and a workbook already has 50, the next change makes Tableau server keep only the most recent 50 by deleting the oldest automatically.

5. Can you change the max revisions? Yes. Say your max is 50 and you reduce it to 25: Tableau server will delete the older revisions (if there are any) and keep only the most recent 25. What happens if you change back from 25 to 50? The older revisions are already gone and will not come back.

6. What is the workflow for a publisher to restore an old workbook? Publishers or admins can see the revision history for their workbooks by clicking Details, then Revision History. One simple click previews or restores any old version. Once restored, a new revision is created automatically again.

7. How do you restore a data source revision? V10 shipped preview-and-restore features for workbooks only. You can view all revisions for data sources as well, but you will have to download the data source and upload it again if you want to restore an older version. I am sure Tableau's scrum team is working on one-click restore for data sources as well.

GOVERNED SELF-SERVICE ANALYTICS: Maturity Model (10/10)

My last 9 blogs covered all aspects of governed self-service and how to scale from department self-service to enterprise self-service. I received some very positive feedback and I am glad that my blogs inspired some readers:

Devdutta Bhosale says: “I read your article governed self-service analytics and as a Tableau server professional could instantly relate with some of challenges of implementing Enterprise BI with Tableau. Compared to Legacy BI tools such as BO, Micro-strategy, etc. enterprise BI is not the strength of Tableau especially compared to “the art of possible” with visualizations. I am so glad that you are writing so much in this space …. The knowledge you have shared has helped me follow some of the best practices with my recent Enterprise BI implementation at employer. I just wanted to say ‘thank you’ “.

Other readers also asked me how to measure governed self-service maturity. There are BI maturity models by TDWI, Gartner, etc., but I have not seen a practical self-service analytics model. Here is my first attempt at a self-service analytics maturity model. I spent a lot of time thinking it through, and read a lot, before putting this blog together.

I describe the self-service analytics maturity model as follows:

  • Level 1: Ad-hoc
  • Level 2: Department Adoption
  • Level 3: Enterprise Adoption
  • Level 4: Culture of Analytics

Level 1, Ad-hoc, is where one or a few teams start using Tableau for quick visualizations and insights – in other words, where Tableau initially lands. When Tableau's initial value is recognized, adoption moves to the business unit or department level (level 2), which is where most Tableau implementations are today. Scaling further to enterprise adoption (level 3) requires business strategy alignment, bigger investment, and a governed self-service model, which is what this series of blogs is about. The ultimate goal is to drive a culture of analytics and enable data-driven decision-making, which is level 4.

What are the characteristics of each maturity level? I will look at each level from the data, technology, governance, and business outcome perspectives:


Level 1: Ad-hoc

  • Data
    • Heroic data discovery
    • Inconsistent data
    • Poor data quality
  • Technology
    • Team based technology choice
    • Shadow IT tools
    • Exploration
  • Governance
    • No governance
    • Overlapping projects
  • Outcome
    • Focuses on what happened
    • Analytics does not reflect business strategy
    • Business process monitoring metrics

Level 2: Department Adoption

  • Data
    • Data useful
    • Some data definition
    • Siloed data management
    • Limited data policies
  • Technology
    • Practically IT supported architecture
    • Immature data preparation tools
    • Data mart like solutions
    • Early stage of big data technology
    • Scalability challenges
  • Governance
    • Functions and business line governance
    • Immature metadata governance
    • Islands of information
    • Unclear roles and responsibilities
    • Multiple versions of KPIs
  • Outcome
    • Some business functions recognize analytics value and ROI
    • Analytics is used to inform decision-making
    • More focus on cause analysis & some resistance to adopting insights
    • Data governance is managed in a piecemeal fashion

Level 3: Enterprise Adoption

  • Data
    • Data quality certification
    • Process & data measurement
    • Data policies measured & enforced
    • Data exception management
    • Data accuracy & consistency
    • Data protection
  • Technology
    • Enterprise analytics architecture
    • Managed analytics sandboxes
    • Enterprise data warehouse
    • Content catalog
    • Enterprise tools for various power users
    • Advanced technology
    • Exploration
  • Governance
    • Executive steering committee
    • Governed self-service
    • CoE with continuous improvement
    • Data and report governance
    • Enterprise data security
    • Business and IT partnership
  • Outcome
    • Analytics insight as a competitive advantage
    • Relevant information as a differentiator
    • Predictive analytics to optimize decision-making
    • Enterprise information architecture defined
    • Mature governed self-service
    • Tiered information contents

Level 4: Culture of Analytics

  • Data
    • Information life-cycle management
    • Data lineage & data flow impact documented
    • Data risk management and compliance
    • Value creation & monetizing
    • Business Innovation
  • Technology
    • Event detection
    • Correlation
    • Critical event processing & stream
    • Content search
    • Data lake
    • Machine learning
    • Coherent architecture
    • Predictive
  • Governance
    • Data quality certification
    • Process & data measurement
    • Data policies measured & enforced
    • Data exception management
    • Data accuracy & consistency
    • Data protection
    • Organizational process performance
  • Outcome
    • Data drives continuous business model innovation
    • Analytical insight optimizes business process
    • Insight in line with strategic business objectives
    • Information architecture underpins business strategies
    • Information governance as part of business processes

This concludes the governed self-service analytics blogs. Here are the key takeaways for governed self-service analytics:

  1. Enterprise self-service analytics deployment needs a strong governance process
  2. Business and IT’s partnership is the foundation for a good governance
  3. If you are IT, you need to give more trust to your business partners
  4. If you are business, be a good citizen and follow the rules
  5. Community participation and neighborhood watch are an important part of successful governance
  6. The governance process evolves as your adoption grows

Thank you for reading.

Governed Self-Service Analytics: Content Management (9/10)

When executives get reports from an IT-driven BI system, they trust the numbers. But if the reports come from a spreadsheet, which can change at any time, the trust level drops. If the same spreadsheet is used to create a Tableau visualization shared with executives for decision-making, does the trust level increase? Can important business decisions be made based on those Tableau reports?

I am not against Tableau or visualization at all – I am a super Tableau fan and love Tableau's mission to help people see and understand their data. On the other side, as we all know, any dashboard is only as good as its data. How do you provide trustworthy contents to end consumers? How do you avoid the situation where numbers go into a 10-K report while the team is still baking the data definitions?

The answer is to create a framework of content trust-level indicators for end consumers. We do not want to slow down innovation or discovery by self-service business analysts, who still create their own analytics and publish workbooks. After a dashboard is published, IT tracks usage, identifies the most valuable contents per defined criteria, and certifies the data and contents so end consumers can use the certified reports the same way as reports from IT-driven BI. See the diagram below for the overall flow:

[Diagram: overall content flow]

When you have data to explore or a new business question to answer, hopefully you have a report catalog to search for a similar report to leverage. If one exists, you do not have to develop it again, although you may need to request access to the report. If the visualization is not exactly what you are looking for but the data attributes are there, you can always modify it to create your own version.

If no existing report is available, you can search the published data source catalog to see if there is a published data source you can leverage. If yes, you can create new workbooks on top of the existing published data connections.

You may still need to bring your own data for discovery. The early stage of discovery and analysis takes multiple iterations, and initial user feedback helps reduce the overall time to market for your dashboards. At some point, when your dashboard is good enough and is moved to a production folder to share with many more users, it falls into the track, identify, and certify cycle.

[Diagram: the track, identify, and certify cycle]

What to track? Different organizations will have different answers. Here are examples:

  • Data sources with high hits
  • Reports accessed most frequently
  • Most active users
  • Least used reports for retirement

How to identify the most critical reports?

  • Prioritize based on usage (# of users, use cases, purpose, x-functional, benefits)
  • Prioritize based on data source and contents (data exist in certified env, etc)
  • Prioritize based on users. If the CEO uses the report, it must be a critical one for the organization.

How to certify the critical reports? It is an on-going process:

  • Incrementally add self-service data to source of truth so data governance process can cover the new data sets (data definitions, data stewardship, data quality monitoring, etc)
  • Recreate dashboards (if needed) for better performance, add-on functionality, etc.
  • Label the report with a report trustworthiness indicator

The intent of the track, identify, and certify cycle is to certify the most valuable reports in your organization. The output of the process is a report trustworthiness indicator that helps end consumers understand how trustworthy the data and reports are.

End information consumers continue to use your visualizations, which are replaced with certified reports step by step – an ongoing process. The certified reports carry trustworthiness indicators.

What is the report trustworthiness indicator? You can design multiple levels of trustworthiness. For example:

  • SOX certified:
    • Data Source Certified
    • Report Certified
    • Release Process Controlled
    • Key Controls Documented
    • Periodic Reviews
  • Certified reports:
    • Data Source Certified
    • Report Certified
    • Follow IT Standard Release Process
  • Certified data only
    • Data Source Partially Certified
    • Business Self-Service Releases
    • Follow Tableau Release Best Practices
  • Ad-Hoc
    • Business Self-Service Releases
    • Follow Tableau Release Best Practices

In summary, content management helps reduce duplication of contents and data sources, and gives end information consumers the trustworthiness level of each report so proper decisions can be made based on the reports and data. The content management process outlined above shows how to create enterprise governance without slowing down innovation.

Please read the next blog, about governance maturity levels.

Governed Self-Service Analytics: Data Governance (8/10)

I was on a panel discussion about self-service analytics at Tableau Conference 2015, in front of a group of executives. Guess what the no. 1 most frequently asked question was – data governance. How do you make sure data does not get out of hand? How do you make sure self-service analytics does not break the organization's existing processes and policies around data protection and data governance?

Data governance is a big topic. This blog focuses on three things:

  • Data governance for self-service analytics
  • How to enforce data governance in self-service environment
  • How to audit self-service environment
  1. Data governance for self-service analytics

First of all, what is data governance?

Data governance is a business discipline that brings together data quality, data management, data policies, business process management, and risk management surrounding the handling of data.

The intent is to put people in charge of fixing and preventing issues with data so that the enterprise can become more efficient.

The value of enterprise data governance is as follows:

  • Visibility & effective decisions: Consistent and accurate data visibility enables more accurate and timely business decisions
  • Compliance, security and privacy: Enable business to efficiently and accurately meet growing global compliance requirements

What data should be governed?

Data is any information in any of our systems, and it is a valuable corporate asset that indirectly contributes to the organization's performance. Data in a self-service analytics platform (like Tableau) is definitely part of the data governance scope. All of the following data should be governed:

  • Master Data: Data that is shared commonly across the company in multiple systems, applications, and/or processes. Master data should be controlled, cleansed, and standardized at one single source. Examples: customer master, product item master. Master data enables information optimization across systems, enables data enrichment and cleansing, and increases accuracy in reporting.
  • Reference Data: Structured data used in an application, system, or process – often common lists set once a fiscal year or with periodic updates. Examples: currency codes, country codes, chart of accounts, sales regions, etc.
  • Transactional Data: The information recorded from transactions. Examples: user clicks, user registrations, sales transactions, shipments, etc. The majority of enterprise data is transactional. It can be financial, logistical, or work-related, involving everything from a purchase order to shipping status to employee hours worked to insurance costs and claims. As part of a transactional record, transactional data is grouped with its associated master data and reference data; it records a time and the relevant reference data needed for that particular transaction record.

What are data governance activities?

  • Data ownership and definition: The data owner decides and approves the use of data, such as data sharing/usage requests from other functions. Typically data owners are the executives of the business areas. One data owner is supported by many data stewards, who are the operational points of accountability for data, data relationships, and process definitions; the steward represents the executive owners and stakeholders. Data definition is the data steward's responsibility, although many people can contribute to the definitions. In a self-service environment where data is in many analysts' hands, it is a business advantage to leverage those analysts' knowledge and know-how by allowing each self-service analyst to comment on and tag the data, and then aggregating those comments/tags. This is again the community concept.
  • Monitoring and corrective actions: This is an ongoing process to define process flows, data flows, quality requirements, business rules, etc. In a self-service environment where more and more self-service developers can change metadata and create calculated fields to transform the data, this can be an advantage – or it can become chaos if data sources and processes are not defined within one business group.
  • Data process and policy: This is about exception handling.
  • Data accuracy and consistency: Commonly known as data quality. This is where most of the time and effort is spent.
  • Data privacy and protection: There are too many examples of data leakage damaging a brand and costing organizations millions. Some fundamental rules have to be defined and enforced for the self-service enterprise to have peace of mind.

2. How to enforce privacy and protection in a self-service environment?

The concept here is to think ahead about top-sensitive data before making data available for self-service consumption. To avoid potential chaos and costly mistakes, define the most sensitive datasets for your organization, then have IT create enforcement at the database layer so self-service users can't mess up. Here is a list of examples of what should be enforced for peace of mind:

  • No private data is allowed on the self-service server: SSNs, federal customer data, credit cards, etc. Most self-service platforms (like Tableau) are designed for ease of use and do not have sophisticated data encryption technologies.
  • Remove sensitive data fields (like addresses and contacts) at the database level before making the data available for self-service consumption. It is really hard to control those attributes once you open them up to business analytics super users.
  • Use sites as partitions to separate data, users, and contents for better data security. For example, finance is a separate site with finance users only; salespeople have no visibility into the finance site.
  • Create a separate server instance for external users if possible, and put it in the DMZ. A different level of network security then applies as an additional layer of security.
  • Create a site for each partner/vendor to avoid potential problems. When multiple partners or vendors access your Tableau server, never put two vendors in the same site; create one site per vendor to avoid surprises.

3. How to audit the self-service environment?

You can't enforce everything, and you do not want to either; enforcement comes with disadvantages too, like inflexibility. Choose the most critical things to enforce and leave the rest as best practices for people to follow. Knowing that the self-service analytics community always tests the boundaries, you should have auditing in your toolbox – and, most importantly, let the community know that you have an auditing process.

  • What to audit:
    • All the enforced contents should be part of audit scope to make sure your enforcement works in the intended way
    • All the policies that your BU or organization agreed upon
    • Any other ad-hoc checks as needed
  • Who should review the audit results:
    • Self-service governance body should review the results
    • BU data executive owners are the main audience for audit reports. It is possible that executives gave special approval in advance for self-service analysts to work on datasets they normally would not have access to. When there are too many exceptions, it is an indication of a potential problem.
  • Roles and responsibilities: Normally IT provides the audit results while the business evaluates risks and makes decisions about process changes.
  • How to audit: Unfortunately, Tableau does not have many server audit features, so this is where creativity comes into play. VizAlerts can be used, and often building workbooks directly on the Tableau repository database is the only way to audit; see the sketch below.
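
For example, a minimal audit sketch that lists every workbook with its site and owner (the join keys follow the common workgroup schema – in particular, users.system_user_id linking to system_users is my assumption to verify on your version):

      -- Who owns what, per site: a starting point for content audits
      SELECT s.name  AS site,
             w.name  AS workbook,
             su.name AS owner
      FROM workbooks w
      JOIN sites s         ON s.id  = w.site_id
      JOIN users u         ON u.id  = w.owner_id
      JOIN system_users su ON su.id = u.system_user_id
      ORDER BY s.name, w.name;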

Please read the next blog, about content management.

Governed Self-Service Analytics: Performance Management (7/10)

Performance management is everyone's concern when it comes to a shared self-service environment, since nobody wants to be impacted by others. This is especially true when each business unit decides its own publishing criteria and the central IT team does not gate the publishing process.

How do you protect the shared self-service environment? How do you prevent one badly designed query from bringing the whole server to its knees?

  • First, set server parameters to enforce policy.
  • Second, create daily alerts for any slow dashboards.
  • Third, make performance metrics public to your internal community so everyone has visibility into the worst-performing dashboards, creating some well-intentioned peer pressure.
  • Fourth, hold site admins or business leads accountable for self-service dashboard performance.

You will be in good shape if you do those four things. Let me explain each of them in detail.


  1. Server policy enforcement

Server policy settings are for enforced policies. For anything that can be enforced, it is better to enforce it so everyone can have peace of mind. The enforced parameters should be agreed between business and IT, ideally in the governance council, and can always be reviewed and revised when the situation changes.

Examples of commonly enforced parameters: overall size allocation for a site, extract timeout, etc.

  2. Exception alerts

There are only a limited number of parameters you can control through enforcement; all the rest have to be governed by process. Alerts are the most common approach to exception management:

  • Performance alerts: Create alerts when dashboard render time exceeds the agreed threshold (see the sketch after this list).
  • Extract size alerts: Create alerts when extract size exceeds defined thresholds (extract timeout can be enforced on the server, but size cannot).
  • Extract failure alerts: Create alerts for failed extracts. Very often stakeholders will not know an extract failed; it is essential to let owners know so action can be taken promptly.
  • You can create many more alerts: CPU usage, overall storage, memory, etc.
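
A minimal sketch of the render-time check behind such a performance alert, querying the http_requests table of the workgroup database (the 'bootstrapSession' action value is my assumption for view loads – check which actions your server version logs):

      -- View requests that took longer than 10 seconds in the last day
      SELECT currentsheet,
             EXTRACT(EPOCH FROM (completed_at - created_at)) AS render_secs
      FROM http_requests
      WHERE action = 'bootstrapSession'          -- assumed view-load action
        AND created_at > NOW() - INTERVAL '1 day'
        AND completed_at - created_at > INTERVAL '10 seconds'
      ORDER BY render_secs DESC;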

How do you implement the alerts? There are multiple choices; my favorite is VizAlerts for Tableau: https://community.tableau.com/groups/tableau-server-email-alert-testing-feedbac

Who should receive the alerts? It depends. A lot of alerts are for the server admin team only (CPU usage, memory, storage, etc.), but most extract and performance alerts are for the content owners. One best practice for content alerts is to always include site admins and/or project owners. Why? Workbook owners change jobs, so the original owner may no longer be responsible for the workbooks. I was talking with a well-known Silicon Valley company recently; a lot of workbook owners had changed in the last 2 years, and they had a hard time figuring out whom to go after for workbook issues. The site admin should be able to help identify the new owners. If site admins are not close enough to the workbook level in your implementation, choose project leaders instead.

What should the threshold be? There is no universal answer, but nobody wants to wait more than 10 seconds. The rule of thumb is that anything under 5 seconds is good, and anything over 10 seconds is not. I got a question when I presented this at a local Tableau event: what if a specific query used to take 30 minutes, and the team made great progress reducing it to 3 minutes – do we allow it to be published and run on the server? The answer is: it depends. If the view is critical enough for the business, it is of course worth waiting 3 minutes for the results. Everything has exceptions. However, if the 3-minute query chokes everything else on the server and users click the view often, you may want to rethink the architecture. Maybe the right answer is to spin off another server for this mission-critical 3-minute application alone, so the rest of the users are not impacted.

Yellow and red warnings: It is good practice to create multiple warning levels, like yellow and red with different thresholds. Yellow alerts are warnings, while red alerts require action.

You may say: hi Mark, this all sounds great, but what if people do not take action?

This is exactly where some self-service deployments go wrong, and where governance comes into play. In short, you need strong and agreed-upon process enforcement:

  • Some organizations use a charge-back process to motivate good behavior. Charge-back will influence people's behavior but cannot enforce anything.
  • The key process enforcement is a penalty system for when red-alert actions are not taken in time.

If an owner does not take corrective action within the agreed period for a red warning, a meeting should be arranged to discuss the situation. If the site admin refuses to act, the governance body has to decide on the agreed-upon penalty, which can go as far as site suspension. Once a site is suspended, nobody except server admins can access any of its contents. The site owners have to work on the improvement actions and show compliance before the site can be reactivated. The good news is that all contents are still there when a site is suspended, and it takes a server admin less than 10 seconds to suspend or reactivate a site.

I had this policy agreed with the governance body and communicated it to as many self-service developers as I could. I never got pushback about it; it is clear to me that the self-service community likes a strong, clearly defined governance process that ensures everyone's success. I once suspended a site for other reasons, but I never had to suspend one due to performance alerts. The reason why is my third trick: visibility of the worst-performing dashboards.

  3. Make performance metrics public

It takes some effort to make your server's dashboard performance metrics public to your whole internal community, but it turns out to be one of the best things a server team can do. It has a few benefits:

  • It serves as a benchmark for the community to understand what is good and good enough, since the metric shows each site's overall performance compared with others on the server.
  • It shows all the long-rendering dashboards, providing peer pressure.
  • It shows patterns that help people focus on the problematic areas.
  • It creates a great opportunity for the community to help each other – one of the most important success factors. It turns out the problematic areas are often teams newly onboarded to the server, and the community always has plenty of ideas to make dashboards perform a lot better. This is why we never had to suspend any sites: when a lot of red alerts appear and the community is aware of them, it is the whole community that makes things happen, which is awesome.
  4. Hold site admins accountable

Early in my career I managed a Hewlett-Packard product assembly line. Hewlett-Packard has some well-known quality control processes, and one thing I learned was that each assembler is responsible for his or her own quality: although there is QA at the end of the line, each workstation has a checklist before passing work to the next station. This simple philosophy applies to today's software development and self-service analytics environments. The site admin is responsible for the performance of the workbooks in the site, and can in turn hold workbook owners accountable for shared workbooks. Flexibility comes with accountability.

I believe in Theory Y (people have good intent and want to perform better) and have been practicing it for years. The whole intent of server dashboard performance management is to provide performance visibility to the community and content owners, so owners know where the issues are and can take action.

What I often see is that a well-performing dashboard can go bad over time due to data changes and many other factors. The alerts will catch all of those exceptions whether your dashboard was released yesterday, last week, last month, or last year – this approach is a lot better than gating the release process, which is the common IT practice.

During a recent run-IT-as-a-business meetup, the audience was skeptical when I said that IT did not gate any workbook publishing process and that it is completely self-service. They started to realize it made sense when I talked about the performance alerts that catch it all anyway. What the business likes most about this approach is the freedom to push urgent workbooks to the server even when they are not performing great – they can always come back later to tune them, both for a better user experience and to be good citizens.

Please continue to the next blog, about data governance.

GOVERNED SELF-SERVICE ANALYTICS: PUBLISHING (6/10)

The publishing process & policy covers the following areas: engagement process, publisher roles, publishing process, and dashboard permissions.

The first step is to get a space on the shared enterprise self-service server for your group's data and self-service dashboards, which is called the engagement process. The main questions are:

  • From the requester's perspective: how do I request a space on the shared enterprise self-service server for my group?
  • From the governance perspective: who decides, and how, whether self-service is the right fit?

Once a business group has a space on the shared enterprise self-service server, the business group has to ask the following questions:

  • Who can publish dashboard from your group?
  • Who oversees or manages all publishers in my group?

After you have given publishing permission to some super users from your business group, those publishers need to know the rules, guidance, server constraints, and best practices for effective dashboard publishing. Later on, you also want to make sure your publishers are not creating islands of information or multiple versions of KPIs.

  • What are publishing rules?
  • How to avoid duplications?

The purpose of publishing is to share your reports, dashboards, stories, and insights with others who can make data-driven decisions. The audiences are normally defined before you publish the dashboards, although from a workflow perspective dashboard permissions are assigned after publishing. The questions are:

  • Who can access the published dashboards?
  • What is the approval process?

Engagement Process

Self-service analytics does not replace traditional BI tools but co-exists with them. It is very rare to find a corporation where the self-service analytics platform is the only reporting platform. Very likely you have at least one IT-controlled enterprise reporting platform designed for standard reporting, answering known questions using data populated from the enterprise data warehouse. In addition to this traditional BI platform, your organization has decided to implement a new self-service analytics platform to answer unknown questions and do ad-hoc analysis using all available data sources.

Recognizing that traditional BI and self-service BI co-exist is important for this engagement process, because guidance has to be defined for which platform does which kinds of reporting. After this guidance is defined and agreed upon, continuous communication and education are needed to make sure all self-service super users are on the same page about this strategic guidance.

Whenever there is a request for a new self-service analytics application, a fitness assessment has to be done before proceeding. The following checklist serves this purpose:

  • Does your bigger team already have a site on the self-service analytics server? If yes, you can use the existing site.
  • Who is the primary business / application contact?
  • What business process / group does this application represent (sales, finance, etc.)?
  • Briefly describe the purpose and value of the application.
  • Do you have an IT contact for your group for this application? Who is it?
  • What are the data sources?
  • Is there any sensitive data to be reported on (like federal, customer, or client data)? If yes, describe the source data in detail.
  • Is there any private data in the source data (like HR data or sensitive finance data)?
  • Who are the audiences of the reports? How many do you anticipate? Are there any partners who will access the data?
  • Does the source data span more than one enterprise data source? If yes, what is the plan for data-level security?
  • What are the primary data elements / measures to be reported on (e.g., bookings, revenue, customer cases, expenses)?
  • What will be the dimensions by which the measures are shown (e.g., product, period, region)?
  • How often does the source data need to be refreshed?
  • What is the anticipated volume of source data? How many quarters of data? Roughly how many rows and columns?
  • Is the data available in the enterprise data warehouse?
  • How many self-service report developers will this application have?
  • Do you agree with the organization's Self-Service Analytics Server Governance policy (URL ….)?
  • Do you agree with the organization's Self-Service Analytics Data Governance policy (URL ….)?

The above questionnaire also covers your organization's high-level policies on data governance, data privacy, service-level agreements, etc., since most existing self-service tools have constraints in those areas. On one side, we want to encourage business teams to leverage the enterprise investment in the self-service analytics platform. On the other side, we want to make sure that every new application is set up for success and does not create chaos that can be very expensive to fix later on.

Publisher Roles

I have heard a lot of exciting stories about how easily people can get new insights with visualization tools (like Tableau). I have experienced a few of those insightful moments myself. However, I also heard a story about a new Tableau Desktop user who, fresh out of fundamentals training, quickly published something and shared it with the team – and caused a lot of confusion about the KPIs being published. What went wrong? It is not about the tool, and it is not about the training; it is about publisher roles and the related process.

The questions are as follows:

  • Who can publish dashboard from my group?
  • Who oversees or manages all publishers in my group?

Sometimes you may have easy answers to those questions, but in many other cases you will not. One common approach is to use projects or folders to create boundaries between publishers. Each project has a project leader role that oversees all publishers within the project.

You can also define a common innovation zone where many publishers can share their new insights with others. Just be aware that dashboards in the innovation zone are in the early discovery phase only and are not officially agreed KPIs. Most of those dashboards will go through multiple iterations of feedback and improvement before they become useful insights. We do encourage people to share their innovations as soon as possible for feedback and improvement. It is better to distinguish official KPIs from innovation-zone work by using different color templates to avoid potential confusion for end audiences.

Publishing Process

To protect the shared self-service environment, you need a clearly defined publishing process:

  • Does IT have to be involved before a dashboard is published to the server?
  • Do you have to promote from a non-production instance or non-production folder to a production folder?
  • What is the performance guidance?
  • Should you use live connections or extracts?
  • How often should you schedule your extracts? Can you use full refreshes?
  • What are the data security requirements?
  • Do your dashboards introduce new business glossary terms? If yes, did you spell out their definitions?
  • Do the new glossary definitions need approval from data stewardship? Did you get the approval?
  • Who supports the new dashboards?
  • Does the new dashboard potentially duplicate existing ones?

Each organization or business group will have different answers to those questions. The answers form the basic publishing process, which is essential for scaling and for avoiding chaos.

Here is a summary of what most companies do – call it the common best practices:

  1. IT is normally not involved in the release or publishing process for dashboards designed by a business group – this is the concept of self-service.
  2. IT and business agree on performance and extract guidance in advance. IT enforces some of the guidance via server policy settings (like extract timeout thresholds). For the many other parameters that can't be systematically enforced, business and IT agree on an alert process to detect exceptions – for example, a performance alert sent to the dashboard owner and project owner (or site admin) if dashboard render time exceeds 10 seconds (see the sketch after this list).
  3. Business terms and glossary definitions are an important part of the dashboards.
  4. A business support process is defined so end information consumers know how to get help when they have questions about a dashboard or its data.
  5. Dashboards are classified as certified or non-certified. Non-certified dashboards are for feedback purposes, while certified ones are officially approved and supported.
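
To make that alert concrete, here is a minimal sketch (not a definitive implementation) that polls Tableau Server's PostgreSQL repository for slow view loads and emails the owning team. It assumes the readonly repository user has been enabled and relies on the http_requests table as it exists in the 9.x repository; the host names, credentials, and email plumbing are illustrative.

    import psycopg2  # PostgreSQL driver, used to read the Tableau repository
    import smtplib
    from email.mime.text import MIMEText

    THRESHOLD_SECONDS = 10  # the render-time guidance agreed with IT

    # 'readonly' is the reporting account Tableau exposes on port 8060
    conn = psycopg2.connect(host='tableau.example.com', port=8060,
                            dbname='workgroup', user='readonly',
                            password='*****')

    # view requests that exceeded the threshold during the last hour
    sql = """
        SELECT currentsheet,
               EXTRACT(EPOCH FROM (completed_at - created_at)) AS seconds
        FROM   http_requests
        WHERE  action = 'show'
          AND  created_at > now() - interval '1 hour'
          AND  EXTRACT(EPOCH FROM (completed_at - created_at)) > %s
    """

    with conn.cursor() as cur:
        cur.execute(sql, (THRESHOLD_SECONDS,))
        slow_views = cur.fetchall()

    if slow_views:
        body = '\n'.join('%s took %.1f s' % (v, s) for v, s in slow_views)
        msg = MIMEText(body)
        msg['Subject'] = 'Tableau performance alert: views over %ss' % THRESHOLD_SECONDS
        msg['From'] = 'tableau-alerts@example.com'
        msg['To'] = 'dashboard-owners@example.com'
        smtplib.SMTP('mail.example.com').send_message(msg)

In practice you would schedule something like this from cron and route each alert to the actual workbook owner looked up from the repository.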

Dashboard Permissions

When you design a dashboard, most likely the audience is already defined. The audience has business questions; your dashboards are there to answer them. Audiences should be classified into groups, and your dashboards can be assigned to one or multiple groups.

If your dashboards have row-level security requirements, their complexity increases many times over. It is advisable for business to work with IT on the row-level security design. Many self-service tools have limitations in row-level security, although they all claim the capability.

The best practice is to let the database handle row-level security, which ensures data-access consistency when you have multiple reporting tools running against the same database. There are two challenges to figure out (a sketch of the database side follows the list):

  • The self-service visualization tool has to be able to pass a session user variable dynamically to the database. Tableau supports this for some databases (like the query banding feature for Teradata, or initial SQL for Oracle).
  • The database has user/group role tables implemented.
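
As an illustration of the database side, here is a minimal SQL sketch; the table and column names (user_region_map, sales) are hypothetical. An entitlement table maps users to the rows they may see, and a secured view joins the fact table against it using the session user the BI tool passed in:

    -- entitlement table: which user may see which region (illustrative names)
    CREATE TABLE user_region_map (
        username VARCHAR(64) NOT NULL,
        region   VARCHAR(64) NOT NULL
    );

    -- secured view: each session user gets only his or her rows
    CREATE VIEW sales_secured AS
    SELECT s.*
    FROM   sales s
    JOIN   user_region_map m
           ON  m.region   = s.region
           AND m.username = CURRENT_USER;  -- or the user passed via query banding / initial SQL

Reports then query sales_secured instead of sales, so every tool hitting the database gets the same row-level behavior.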

In summary, publishing involves a set of controls, processes, policies, and best practices. While supporting self-service and self-publishing, rules and processes have to be defined to avoid potentially expensive mistakes later on.

Please read the next blog, about performance management.

Governed Self-Service Analytics: Multi-Tenancy (5/10)

Tableau's multi-tenancy feature is called a site. I hear many people asking whether they should use sites, and when. For some large Tableau deployments, people also ask whether to create separate Tableau instances. All of those are Tableau architecture questions – in other words, multi-tenancy strategy.

How do you approach this? I use the following Goal – Strategy – Tactics framework to guide the decision-making process.

It starts with goals. The self-service analytics system has to meet the following expectations, which are the ultimate goals: fast, easy, cost-effective, data security, self-service, and support for structured and unstructured data.

Now keep those goals in mind while scaling Tableau out from individual teams to a department, and then from department to enterprise.

How do we maintain self-service, fast, and easy – with solid data security and cost effectiveness – while dealing with thousands of users? This is where you need well-defined strategies to avoid chaos.

First of all, each organization has its own culture, operating principles, and business environment. Some strategies that work very well in one company may not work for others. You just have to figure out the best approach that matches your business requirements. Here is some food for thought:

  1. Do you have to maintain only one Tableau instance in your organization? The answer is no. For an SMB the answer may be yes, but I have seen many large organizations run multiple Tableau instances for better data security and better agility. I am not saying that Tableau Server can't scale out or scale up – I have read Tableau's architecture white paper on how many cores one server can scale to. However, there are many other considerations; you just do not want to put every application in one instance.
  2. What are the common use cases where you may want to create a separate instance? Here are some examples:
    • You have both internal employees and external partners accessing your Tableau server. Tableau allows both to access the same instance. However, if you would have to create a lot of data security constraints to let external partners in, the same constraints would apply to all internal users, adding extra complexity. Depending on the constraints, if the fast and easy goals are compromised, you may want a separate instance that completely separates internal users from external users – that way you have complete peace of mind.
    • Network separation. It is increasingly common for corporations to separate the engineering network from the rest of the corporate network for better IP protection. When this is the case, creating a separate Tableau instance within the engineering network is an easy and simple strategy.
    • Network latency. If your data source is in APAC while your Tableau server is in the US, you will likely have challenges with dashboard performance. You should either sync your database to the US or run a separate Tableau server instance in APAC to achieve your 'fast' goal.
    • Enterprise mission-critical applications. Although Tableau started as ad-hoc exploration for many users, some Tableau dashboards become mission-critical business applications. If you have any of those, congratulations – you have a good problem to deal with. Once some apps become mission-critical, you have no choice but to tighten change control and related processes, which unfortunately are killers of self-service and exploration. The best way to resolve this conflict is to spin off a separate instance with more rigor for the mission-critical apps while leaving the rest of Tableau as fast, easy self-service.

What about Tableau server licenses? Tableau Server has a seat-based license model and a core-based license model. The seat-based model goes by users, so a separate instance should not have much impact on the total number of licenses.

Now let's say you have 8 core-based licenses for existing internal users and you plan to add external users. If you would have to add 8 more cores for the external users anyway, a separate instance has no impact on licensing. What if you only want a handful of external users? Then you have a trade-off decision to make. Alternatively, you can keep your 8 cores for internal users and get a handful of seat-based licenses for the external users only.

How about platform cost and additional maintenance cost for a separate instance? VMs and hardware are relatively cheap today. I agree there is some additional work initially to set up a separate instance, but server admin work does not double just because you have another server instance. On the other side, when your server is too big, there is a lot more coordination with all the business functions for maintenance, upgrades, and everything else. I have seen some large corporations that are happier with multiple instances than with one huge instance.

How about sites? I have a blog about how to use sites. In summary, sites are useful for better data security, easier governance, empowering self-service, and distributing administrative work. Here are some cases where a new site should not be created:

  • Do not create a new site if the requested site will use the same data sets as an existing site; create a project within the existing site instead, to avoid potential duplicate extracts (or live connections) running against the same source database.
  • Do not create a new site if the requested site's end users overlap heavily with an existing site; create a project within the existing site instead, to avoid duplicating user maintenance work.

In summary, as you plan to scale Tableau from department to enterprise, you do not have to put all of your enterprise users on one huge Tableau instance. Keep the goals in mind while deciding the best strategy for your business: easy, fast, simple, self-service, data security, and cost effectiveness. The strategies are separate instances and sites.

Please read the next blog, about the release process.

Governed Self-Service Analytics: Community (4/10)

A self-service analytics community is a group of people who share a common interest in self-service analytics and a common belief in a data-driven decision-making culture.

Why are people motivated to join an internal self-service community?

The motivations are as follows:

  • Empowerment: Self-service stems from – and affects – a wider macro trend of DIY on one hand and collaboration on the other: content builders take the lead for the services they require, and often collaborate with others. The key is to offer members empowerment and control over the process, so they can choose the level of service they would like to engage in, thus shaping the overall experience.
  • Convenience: The benefit of community self-service is obvious – members get fast access to the information they need without having to email or call IT or a contact center. According to Forrester, 78% of people prefer to get answers via a company's website versus telephone or email.
  • Engagement: It is shared ideas, interests, and professions that bring people together to form a community. Members join because they wish to share, contribute, and learn from one another. Some members contribute, while others benefit from the collective knowledge shared within the community. This engagement is amplified when members start discussion and debate about tools, features, processes, services, and any new products being introduced. The discussions within the community inform and familiarize people with new and better ways of getting things done – the best practices.

How to start creating an internal user community?

When you start creating an internal user community, keep in mind that a lot of community activities depend completely on your intranet, so you need to ensure the community can be easily accessed by the maximum number of people. Below is the checklist:

  • Determine a purpose or goal for it. One example: the place you find anything and everything about self-service analytics. The community is the place for sharing, learning, and collaborating.
  • Decide who your target audience will be. Most likely the audience is current and future content developers, not server end users.
  • Design the site keeping in mind the tools for interaction and the structure of your community.
  • Decide upon the manner in which you will host the community.
  • Create the community using tools available within your organization.
  • Create interesting content for the community.
  • Invite or attract members to join your community. Try to find out who has developer licenses and send invitations to all of them.
  • Administer it properly so that the community flourishes and expands. It is good practice to have at least two volunteer moderators who answer users' questions in a timely manner and close out open questions where possible.

Who are the community members?

The audience is all the content builders or content developers from business and IT across the organization. Of course, the governing body or council members are the core of the community, and it is good practice for council members to lead most, if not all, community activities. The community also includes future potential content builders, and the council should put some focus on reaching out to them. The end information consumers – those who receive dashboards or reports – are normally not part of the community, as they do not care much about the tools, technology, or processes associated with self-service. All end information consumers care about is the data, insights, and actions.

What are the community activities?

A quick summary is in the picture below. More details will be discussed later on.

  • Intranet: Your community home. It is the place for anything and everything about your self-service analytics: the tool, processes, policies, best practices, system configuration, usage, data governance policies, server policies, publishing process, license purchasing process, tips, FAQ, etc.
  • Training: The knowledge base on the community intranet is good, but not good enough. Although most of the new self-service tools are designed for ease of use, they still have learning curves. Training has to be organized to better leverage the investment.
  • User Meetings: A user summit or regular best-practice sharing session is a must-have community activity.
  • License Model: When a lot of business super users have dashboard development tools, what is the most cost-effective license model for those tools? Do you want to charge back for server usage?
  • Support Process: Who supports the dashboards developed by business super users? What are IT's and business's roles in supporting end users?
  • External Community: Most self-service software vendors have very active local, virtual, or industry communities. How do you leverage the external community? How do you learn its best practices?

Key takeaway: building a strong community is the critical piece of a successful enterprise self-service analytics deployment.

Please read the next blog, about multi-tenancy strategy.

Governed Self-Service Analytics: Roles & Responsibilities (3/10)

When business super users are empowered to do discovery, data exploration, analysis, dashboard building, and sharing of dashboards with business teams for feedback, business takes on far more responsibility than it used to in a traditional BI & analytics environment. One critical component of self-service analytics governance is a clear roles-and-responsibilities framework between business and IT. This is one reason why the governing body must have stakeholders from both business and IT departments. The governing body should think holistically about analytics capabilities throughout the organization. For example, it could use business analysts to explore the value and quality of a new data source and define data transformations before establishing broader governance rules.

A practical framework for the roles and responsibilities of self-service analytics is in the following picture.

Business owns

  • Departmental data sources and any new data sources that are not available in the IT-managed enterprise data warehouse
  • Simple data preparation: data joining, data blending, and simple data transformation – without heavy-duty ETL or data cleansing
  • Content building: exploration, analysis, and report and dashboard building using departmental data or by blending multiple data sources together
  • Release or publishing: sharing the analysis, report, or dashboard with information end consumers for feedback, business reviews, metrics, etc.
  • User training and the business process changes associated with new report & dashboard releases

IT owns

  • Server and platform management, licensing, vendor management, etc.
  • Enterprise data management: delivering certified, trustworthy data to business, building and managing the data warehouse, etc.
  • Creating and maintaining the data dictionary that helps business super users navigate the data warehouse
  • Supporting business unit report developers by collaborating to build robust departmental dashboards and scorecards, converting ad hoc reports into production reports if necessary
  • Training business users on self-service analytics tools

It is a power shift from IT to business. Both IT and business leaders have to recognize this shift and be ready to support the new roles and responsibilities. What are leaders' roles in supporting this shift?

  • Create a BI/Analytics Center of Excellence: identify the players, create a shared vision, and facilitate hand-offs between IT and business
  • Evangelize the value of self-service analytics: create a brand for self-service analytics and market it to drive an analytics and data-driven decision-making culture; run an internal data/analytics summit or conference to promote analytics
  • Create a federated BI organization: manage a steering committee or BI council, leverage the BI & data gurus in each organization, and encourage IT people to go from order takers to consultants

Please read my next blog, about Community.

Governed Self-Service Analytics: Governance (2/10)

How to govern the enterprise self-service analytics? Who makes the decisions for the process and policies? Who enforces the decisions?

In the traditional model, governance is done centrally by IT, since IT handles all of the data access, ETL, and dashboard development activities. In the new self-service model, many business super users are involved in data access, data preparation, and development. The traditional top-down governance model no longer works – but no governance at all creates chaos. What a self-service environment needs is a new bottom-up governance approach.

In the new self-service analytics model, since super business users do most of the dashboard development, the more effective governance structure includes representatives of those super business users.

In the picture, the blue box in the middle is the enterprise self-service analytics governing body. It consists of both business and IT team members – self-service analytics experts & stakeholders selected by each business unit. You can think of the governing body members as representatives of their business units, or of the entire self-service analytics content builder community. The charter of this governing body is as follows:

  • Define roles and responsibilities between business & IT
  • Develop and share self-service best practices
  • Define the content release or publishing process
  • Define the analytics support process
  • Define data access, data connections, and the data governance process
  • Define the self-moderating model
  • Define dashboard performance best practices
  • Help with hiring and training for new self-service analytics skills
  • Communicate the self-service process to the entire self-service content builder community and management teams
  • Enforce self-service analytics policies to protect the shared enterprise self-service environment
  • Make sure self-service processes and policies align with enterprise processes and policies around data governance, architecture, business objectives, etc.

Should business or IT lead the governing body? While there are times when a business-led governing body can be more effective, do not discount an IT-led one. There are many good reasons to consider an IT-led governing body:

  • IT understands how to safely and accurately expose an organization's data and can standardize how data is exposed to self-service super users.
  • IT has a centralized view of all analytics needs from all functions of the organization, which can help the enterprise develop streamlined, reusable processes and leading practices to help business groups be more efficient using the tool.
  • IT can also centralize functions such as infrastructure, licensing, administration, and deeper level development, all which further cut down costs and mitigates risks.

What are the key skills and expectations for the head of the governing body, or the leader of the center of excellence team? Different organizations use very different titles for this person, but whoever is at the helm should have the following skills:

  • A passion for self-service analytics and related technologies
  • The ability to lead, set strategy, and prioritize objectives based on needs and impact
  • An in-depth understanding of the self-service tool, the business analytics space, and the analytics needs of the business
  • The ability to align self-service analytics objectives with corporate strategy and direction
  • Comfort in partnering and negotiating with both business and IT stakeholders
  • A talent for navigating the organization to get things done

Please read my next blog, about roles and responsibilities.

Governed Self-Service Analytics (1/10)

Organizations committed to improving data-driven decision-making processes are increasingly formulating an enterprise analytics strategy to guide the effort of finding new patterns and relationships in data, understanding why certain results occurred, and forecasting future results. Self-service analytics has become the new norm due to the availability and simplicity of newer data visualization tools (like Tableau) and data preparation technologies (like Alteryx).

However, many organizations struggle to scale self-service analytics to the enterprise level, or even the business unit level, beyond a proof of concept. Then they blame the tools and start trying different tools or technologies. There is nothing wrong with trying something else; however, what many analytics practitioners do not realize is that technology alone was never enough to improve data-driven decision-making processes. Self-service tools alone do not resolve organizational challenges, data governance issues, and process inefficiencies. Organizations that are most successful with self-service analytics deployment tend to have a strong business and IT partnership around self-service, a strategy around data governance, and defined self-service processes and best practices. The business understands its current and future analytics needs, as well as the pain points of existing processes. And IT knows how to support the organization's technology needs and plays a critical role in how data is made available to the enterprise. Formalizing this partnership between business and IT in the form of a Center of Excellence (COE) is one of the best ways to maximize the value of a self-service analytics investment.

What are the key questions that Center of Excellence will answer?

  1. Who is your governing body?
  2. How to draw a line between business and IT?
  3. What are the checks and balances for self-service releases?
  4. How to manage server performance?
  5. How to avoid multiple versions of KPIs?
  6. How to handle data security?
  7. How to provide trustworthy data & contents to end consumers?

The ultimate goal of the center of excellence is governed self-service across the enterprise. The governance can be classified into six areas with 30 processes in total:


Governing body

  • Governing structure
  • Multi-tenancy strategy
  • Roles & responsibilities
  • Direction alignment
  • Vendor management

Community

  • Intranet Space
  • Training strategy
  • Tableau User CoE meeting
  • Tableau licensing model
  • Support process

Publishing

  • Engagement process
  • Publishing permissions
  • Publishing process
  • Dashboard permission

Performance

  • Workbook management
  • Data extracts
  • Performance alerts
  • Server checkups for tuning & performance

Data Governance

  • Data protection
  • Data privacy
  • Data access consistency
  • Row-level security
  • Data sources and structure

Content Certification

  • Content governance cycle
  • Report catalog
  • Report category
  • Data certification
  • Report certification

Please read my next blogs for each of those areas.

How business benefits from IT leadership with self-service analytics

Last week, I presented Self-Service Analytics at a local Silicon Valley meet-up of the “Run IT as Business” group. The audience included some IT executives and a few ex-CIOs. My presentation was well received, with some very positive feedback:

  • “Mark gave an excellent presentation that was extremely informative!”
  • “well structured and very informative”
  • “This is one of the more interested presentations I’ve heard lately”

My talk focused on the new theme of BI and analytics – self-service analytics, which is white hot and growing rapidly. I shared how NetApp's change management got users to take ownership of this technology, which is the key success factor.

Slides for this talk are @ http://www.slideshare.net/mwu/run-it-as-business-meetup-selfservice-bi

Event feedback details @ http://www.meetup.com/Run-IT-as-a-Business-South-Bay/events/230661871/

Architecture differences between Tableau 8 and 9

We have talked about Tableau's new features from a user perspective. Recently there was a question about the architecture differences between Tableau 8 and 9, and I thought it was a good one. Here is my summary of the architecture differences between Tableau 8 and 9.

  1. New HA and failover components introduced in Tableau 9: the Coordination Service (manages leader election and ensures there is a quorum for making decisions during failover) and the Cluster Controller (reports process status and coordinates failover for HA).
  2. New File Store in Tableau 9 to ensure extracts are available on all nodes of a cluster.
  3. New Cache Server that manages a shared query cache across the server cluster, used by the VizQL Server, Backgrounder, and Data Server processes.
  4. New minimum hardware requirements – Tableau 9 will not install if the hardware does not meet them.
  5. New API Server – this process is used when you interact with the server via the REST API.
  6. The Data Engine is no longer limited to two data engine nodes per cluster. This new flexibility can improve server clusters used for extract-heavy scenarios.
  7. The Gateway can be configured on multiple nodes for better HA.
  8. You must have a minimum of 3 nodes in the cluster to achieve full HA mode, starting with Tableau 9.

Tableau 9.3 New Features

Tableau has sped up its release cycle from one release per year to three releases in 2015, and has announced four releases for 2016. Tableau is going to spend more R&D dollars this year than in all the previous 13 years of the company combined. I love the pace of innovation.

Tableau 9.3 was released on 3/24. I was able to demo some 9.3 new features at a Tableau server upgrade and new-feature webinar on the day 9.3 was released, which was cool.

I am excited about the Tableau 9.3 release, which features powerful upgrades for a self-service analytics environment. These include Workbook Revision History, union of Excel or text-based data sources, passing parameters in initial SQL, a Snowflake data connector, map enhancements, content analytics, and more.

Workbook Revision History

This is the feature many Tableau fans have been waiting on for a long time. In the past, publishers had to manage their own workbook versioning, which is difficult for many of them. When changes did not work out and they had to roll back, publishers sometimes struggled to remember which earlier version was the right one – and the Tableau server team was helpless. Now the 9.3 server keeps published workbook revision history, so publishers can go back to any of their previous versions if changes did not work out. This is huge!

Union & More Data Prep Features

Data prep is, unfortunately, where most analysts spend much of their time. Tableau continues to enhance data prep features so analysts can spend their valuable time on analysis and insights instead of copying & pasting data. 9.2 released sub-table detection, data grid editing, Data pane searching, etc. 9.3 adds a union feature that combines data split across multiple files or tables into a single Tableau data source. Union works for Excel or text-based data sources only; I am sure Tableau will make union work for database tables as well. You can also do more data grid editing in 9.3: preview a data extract or Web Data Connector, create groups or bins, etc.

Parameters in Initial SQL for Row-Level Security

This is a huge feature for customers looking for a better row-level security solution. Initial SQL is a set of commands that run when you open a workbook, refresh an extract, sign in to Tableau Server, or publish to Tableau Server; it can be used to set up temporary tables or a custom data environment for the session. Initial SQL is not new, but it was missing a critical capability – you could not dynamically pass parameters like the username. Tableau 9.3 can pass parameters (TableauServerUser, etc.) to some databases. When TableauServerUser is passed to the database for the duration of that user's session, you can leverage the database's user security mapping (if you have implemented one), so the database returns only that user's data – achieving row-level security. In 9.3, parameters in initial SQL support Oracle, SQL Server, Sybase ASE, Redshift, and Greenplum only. Click here for details. For Teradata, you can use query banding to pass parameters to achieve row-level security.
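
For example, on SQL Server the initial SQL could stage the signed-in user into a temp table that your views or custom SQL then join against. A minimal sketch – the table and column names are illustrative, and [TableauServerUser] is the parameter Tableau substitutes at run time:

    -- typed into Tableau's Initial SQL dialog (SQL Server flavor)
    CREATE TABLE #session_user (username NVARCHAR(128));
    INSERT INTO #session_user VALUES ([TableauServerUser]);

A database-side view can then join the fact table to #session_user (and to a user/role mapping table) so only that user's rows come back.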

Project Leader Content Management

I have a blog about how to use Tableau sites, and I know many Tableau customers avoid creating a new site unless they have to. How do you keep site admins from becoming a bottleneck as you scale out Tableau when your deployment has only one or very few sites? If you struggle with this, you will love the 9.3 features that allow project leaders to change workbook owners, run refresh schedules, and move content – tasks that in the past could be done only by site/server admins. Together with 9.2's project permission locking, this really empowers project leaders.

Server Management

9.3 added a bunch of server management features: low disk-space alerts; a PostgreSQL improvement that allows failing over from one repository to another much more quickly, without a server restart; a REST API underpinned by a completely new platform, with significant performance and usability improvements for admins; and Postgres connectivity monitoring that lets server admins check the underlying PostgreSQL database for corruption with a new tabadmin command.

Publishing Workflow

Publishing data sources or workbooks becomes easier and faster in 9.3: Tableau Desktop remembers your Tableau Online or Tableau Server connection and signs you in to the last server you used. It is easier to publish, keep your data fresh, and stay connected with the new Publish Data Source flow.

Better Map

Maps are enhanced with postal codes for 39 European countries, districts in India, and US demographic data layers for 2016. Postal codes for the UK, France, Germany, and the US are also updated. Mapbox integration supports the new Mapbox GL in addition to 9.2's Mapbox Classic.

Progressive Dashboard Load

It is cool that Tableau now has progressive dashboard loading, which means you can start analyzing your data sooner, without waiting for the entire dashboard to load.

First time as keynote speaker at Tableau's West Coast Customer Advisory Summit

Last week I was the customer keynote speaker at Tableau's annual West Coast Customer Advisory Summit. My talk was about how NetApp scaled its enterprise Tableau deployment to 4,000+ users within one year. It was well received – a lot of people came to me and said they were inspired by the presentation. It was a similar talk to the one I gave at TC15 Las Vegas, with some recent work on our content certification framework added. Since it is a closed-door summit there is no recording, but I made my slides public; they can be downloaded @ http://www.slideshare.net/mwu/tableau-customer-advocacy-summit-march-2016

Tableau VP Dave Story shared the Tableau product roadmap. Other presentations covered Tableau alerting and Desktop license management, all very good. Of course, we also went through customer feedback exercises where customers voted for their top-asked features. It was a fun one-day event, and it was great to meet other big Tableau customers and a lot of Tableau product managers.

First time hosting Webinar for the entire Tableau Server & Online Admin group

I love Tableau. My passion is Tableau server deployment – how to create a governed self-service model with Tableau, people, and process. My company's Tableau server added 4,000+ users within one year (http://enterprisetableau.com/presentations). I got a lot of tips & help from the Tableau community during the last 2 years and want to give back, which is why I created a Silicon Valley Enterprise TUG focused on Tableau server deployment; it got a lot of positive feedback. Recently it was recommended to extend the Silicon Valley Enterprise TUG nationwide, which is why I became co-owner of the Tableau Server & Online Admin group. This was the first webinar for this group.

This webinar went extremely well, with about 200 attendees via Zoom. Zoom is cool with its video, chat, and Q&A features. Speaker Mike Roberts (Zen Master) did an amazing job keeping the audience engaged for about 50 minutes while I was busy answering questions via Q&A messaging. Mike shared great insights on workbook performance:

  • Workbook Metadata: What's actually in the workbook (filters, row shelf, column shelf, etc.)? Where do we get all the metadata WITHOUT using tabcmd or the REST API? PostgreSQL / psql (see the query sketch after this list).
  • Desktop vs. Server: Something that performs well in Desktop *should* perform equally well on Server. Sometimes that's not true – how do you troubleshoot it?
  • Alerts: How to create performance alerts for your workbooks, since people often don't know when their workbook performance degrades.
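
As a flavor of the repository approach, here is a minimal query sketch against the built-in admin views of the 'workgroup' database; it assumes the readonly repository user and the 9.x view names (run it with psql against port 8060):

    -- list every workbook, its sheets, and its owner
    SELECT w.name AS workbook,
           v.name AS sheet,
           u.name AS owner
    FROM   _workbooks w
    JOIN   _views     v ON v.workbook_id = w.id
    JOIN   _users     u ON u.id = w.owner_id
    ORDER  BY w.name, v.name;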

According to Mike, a good workbook isn't just one that performs well in Tableau Desktop. A good workbook has the following characteristics:

  • Data – general rule: more data = potential for high latency and poor performance
  • Design – proper use of filters, action, mark types, etc
  • Delivery – Where it’s delivered has a large impact on how it performs

The slides used are @ https://docs.com/DataRoberts/3584/tableau-workbook-internals

The code used is @ https://github.com/ps-data/ps-analytics-public

The Webinar recording is @ here.

Should you upgrade your Tableau Server?

Last week's Webinar about enterprise server upgrades (why upgrade, how to upgrade, and a new-feature demo) was well received, with audience survey feedback of 4.4 (on a 1-5 scale, 5 being awesome).

“Love & hate” best describes each Tableau release. People love the new features and Tableau's pace of innovation, but enterprise customers dislike the effort, downtime, and risk associated with each upgrade.

Unfortunately, doing nothing is not a good option, since Desktop users have a one-click Product Updates feature that can upgrade their Desktop before the server is upgraded (unless you have Microsoft CCM or something similar in your enterprise). When that happens, users with a newer Desktop version (at the major or minor release level, like 9.1 to 9.2, not the maintenance release level) can't publish workbooks to the server, and any workbook opened and saved in Desktop 9.2 can no longer be opened in Desktop 9.1. You will have a lot of frustrated Desktop users. It takes a lot of communication to ask all Desktop users not to upgrade until the server is upgraded, and the longer the server upgrade takes, the more communication work for the enterprise server team… which means that doing nothing on the server side is actually a lot of work as well.

NetApp's approach is to upgrade the server ASAP – NetApp did the 9.1 server upgrade within 20 days of general release and the 9.2 server upgrade within 10 days of general release, which is a win-win for Desktop users and the server team.

It is impossible to have a bug-free version, but Tableau's releases are relatively good. We did not find any major issues with our 9.0, 9.1, and 9.2 upgrades.

How To Use Tableau Sites?

Tableau server has a multi-tenancy feature called “sites”, aimed mainly at enterprise customers. Site strategy was one of the hot topics at the most recent Silicon Valley Enterprise Tableau User Group meet-up, and many people are not clear on how to use sites.

This blog covers three areas about Tableau sites:

  • Basic concepts
  • Common use cases
  • Governance processes and settings

1. Basic concepts about Tableau sites

Let's start with some basic concepts. Understanding them provides clarity, avoids confusion, and reduces hesitation about leveraging sites.

Sites are partitions or compartmented containers. There is absolutely no ‘communication’ between sites. Nothing can be shared across sites.

A site admin has unrestricted access to the contents of the specific site he or she owns: managing projects, workbooks, and data connections; adding users and groups; assigning site roles and site membership; and managing extract refresh schedules. Site admins can monitor pretty much everything within the site: traffic to views, traffic to data sources, background tasks, space usage, etc.

One user can be assigned roles in multiple sites. A user can be site admin for site A and also have any role in site B, independently. For example, Joe, as site admin for site A, can be added to site B in an admin role (or an Interactor role). However, Joe can't transfer workbooks, views, users, data connections, user groups, or anything else between site A and site B. When Joe logs in to Tableau, Joe chooses site A or B: when Joe selects site A, he can see everything in site A but nothing in site B – it is not possible for Joe to assign site A's workbooks or views to any users or user groups in site B.

All sites are equal from a security perspective. There is no concept of a super site or a site hierarchy. You can think of a site as an individual virtual server. A site is the opposite of 'sharing'.

Is it possible to share anything across sites? The answer is no, for site admins or any other users. However, a creative server admin can write scripts that run at the server level to work around this rule. For example, a server admin can use tabcmd to copy extracts from site A to site B even though a site admin can't (a sketch follows).
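
A minimal sketch of that workaround, with placeholder server, user, and content names (tabcmd syntax as of 9.x):

    # sign in to site A and download the data source
    tabcmd login -s https://tableau.example.com -t SiteA -u admin -p *****
    tabcmd get "/datasources/Sales.tdsx" -f Sales.tdsx

    # sign in to site B and publish the same data source there
    tabcmd login -s https://tableau.example.com -t SiteB -u admin -p *****
    tabcmd publish Sales.tdsx --name "Sales"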

2. Common use cases of Tableau sites

  • If your Tableau server is an enterprise server for multiple business units (finance, sales, marketing, etc.) and, say, finance does not want sales to see its content, create a site for each business unit so one business unit's site admin cannot see another unit's data or content.
  • If your Tableau server is an enterprise platform and you want to provide governed self-service to the business, the site approach (business as site admin, IT as server admin) provides maximum flexibility to the business while IT can still hold business site admins accountable for everything within their sites.
  • If your server serves some key partners and you do not want one partner to see another partner's metrics at all, you can create one site per partner. This also avoids the potential mistake of assigning a partner A user to partner B's site.
  • If you have very sensitive data or content (like internal audit data), a separate site gives much better data security control – from the development phase through production.
  • Use sites as a Separation of Duties (SoD) strategy to prevent fraud or potential conflicts of interest for powerful business site admins.
  • You have so many publishers on your server that you want to distribute some admin work to people closer to the publishers, for agility.

3. Governance processes around Tableau sites.

Thoughtful site management approaches, clearly defined roles and responsibilities, a documented request and approval process, and naming conventions have to be planned before you adopt a site strategy, to avoid potential chaos later. Here is the checklist:

  • Site structure: How do you want to segment a server into multiple sites? Should sites follow the organization or business structure? There is no right or wrong answer here, but you do want to think and plan ahead. On our server, we partition data, content, and users by business function and geography. Our site naming convention is business_function or business_geography – for example, Sales_Partner, Marketing_APAC, Finance_Audit. When we look at a site name, we have some idea what the site is about.
  • How many sites should you have? It completely depends on your use cases, data sources, user base, and the level of control you want. As a rule of thumb, I would argue that more than 100 sites on one server is too many, although I know a very large corporation that runs about 300 sites successfully. Our enterprise server has 4,000 end users across 20+ sites; our separate Engineering server has 4 sites for about 1,000 engineers.
  • Who should be the site admin? Either IT or business users (or both) can be site admins. One site can have more than one admin, and one person can admin multiple sites. When a new site is created, the server admin normally adds one user as site admin, who can then add others.
  • What controls are at site level? All the following controls can be done at site level:
    • Allow site admin to manage users for the site
    • Allow the site to have web authoring. When web authoring is on, it does not mean that all views within the site are web-editable; web editing still has to be allowed at the workbook/view level for specific users or user groups before an end user can edit on the web.
    • Allow subscriptions. Each site can have one ‘email from address’ to send out subscriptions from that site.
    • Record workbook performance key events metrics
    • Create offline snapshots of favorites for iPad users.
  • What privileges should the server admin give to site admins? The server admin can grant all of the above controls to a site admin when the site is created, can change those site-level settings later, and can take those privileges back at any time.
  • What is the new-site creation process? I have a new-site request questionnaire that the requester has to answer. The answers help the server and governance teams understand the use cases, data sources, user base, and data governance requirements, to decide whether the use case fits Tableau server and whether to share an existing site or create a new one. The key criteria are whether the same data sources already exist in another site and whether the user base overlaps with another site. It is a balance between duplication of work and flexibility. Here are some scenarios where you may not want to create a new site:
    • If the requested site will use the same data sources as an existing site, you may want to create a project within the existing site to avoid potential duplicate extracts (or live connections) running against the same source database.
    • If the requested site's end users overlap heavily with an existing site, you may want to create a project within the existing site to avoid duplicating user maintenance work.

As a summary, Tableau sites are a great feature for large Tableau server implementations. Sites can be very useful for segmenting data and content, distributing admin work, empowering business for self-service, etc. However, site misuse can create a lot of extra work or even chaos later on. A thoughtful site strategy and governance process have to be developed before you start implementing sites, although the process will evolve toward maturity as you go.

Tableau Filters

Tableau filters change the content of the data that enters a Tableau workbook, dashboard, or view. Tableau has multiple filter types, each created for a different purpose. It is important to understand who can change each filter type and the order in which the types are executed. The following filters are numbered by order of execution.

A. Secure Filters: Filters that can be locked down to prevent unauthorized data access in all interfaces (i.e., Tableau Desktop, Web Edit mode, or standard dashboard mode in a web browser).

1. Data source filters: To be “secure” they must be defined on a data source when it is published. If they are defined in the workbook with live connection, Tableau Desktop users can still edit them. Think of these as a “global” filter that applies to all data that comes out of the data source. There is no way to bypass a data source filter.
2. Extract filters: These filters are only effective at the time the extract is generated. They will not automatically change the dashboard contents until the extract is regenerated/refreshed.

B. Accessible Filters: Can be changed by anyone that opens the dashboard in Tableau Desktop or in Web Edit mode, but not in regular dashboard mode in a web browser.

3. Context filters: You can think of a context filter as an independent filter. All other filters you set are dependent filters, because they process only the data that passes through the context filter. Context filters are often used to improve performance; however, if a context filter does not reduce the number of records by 10% or more, it may actually slow the dashboard down.
4. Dimension filters: Filters on dimensions; think of them as a SQL WHERE clause.
5. Measure filters: Filters on measures; think of them as a SQL HAVING clause (see the SQL sketch below).
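
In SQL terms, the distinction between items 4 and 5 looks like this (an illustrative query with hypothetical table and column names):

    -- a dimension filter behaves like WHERE; a measure filter like HAVING
    SELECT region, SUM(sales) AS total_sales
    FROM   orders
    WHERE  region IN ('East', 'West')    -- dimension filter
    GROUP  BY region
    HAVING SUM(sales) > 100000;          -- measure filter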

C. User Filters: Can be changed by anyone in Tableau Desktop, in Web Edit mode, or in regular dashboard mode in a web browser.

6. Quick filters: Commonly used end-user filters.
7. Dependent quick filters: Quick filters that depend on another quick filter. Dependent quick filters can quickly multiply and slow down dashboard performance.
8. Filter actions: Show related information between a source sheet and one or more target sheets. This type of action works well when you are building guided analytical paths through a workbook, or in dashboards that filter from a master sheet to show more detail. These seem the most “responsive” to end users, as they incur no processing time unless the user clicks them.
9. Table calculation filters: Filters on calculated fields.

Tableau 9.2 New Features

My last blog shared our enterprise Tableau server 9.2 upgrade experience. Now we are focusing on training for and learning the 9.2 new features.

I am excited about the Tableau 9.2 release, which features powerful upgrades to our enterprise Tableau self-service environment. These include automated data preparation features, powerful web editing, enhanced enterprise data security, native iPhone support, unlimited map customization, and improved performance to help users work with their data more easily, smartly, and quickly.

Data Preparation Enhancements

New data preparation features in 9.2 mean people will spend less time preparing and searching for data and more time analyzing it. The Data Interpreter now not only cleans Excel spreadsheets but also automatically detects sub-tables and converts them to tables that can be analyzed in Tableau. Data grid improvements make it easier to craft the ideal data source and quickly move on to analysis, and enhancements to the Data pane help people take fewer steps to find and update metadata.

Greater Web-Editing Flexibility

Web Editing (or Web Authoring) is a feature that enables Tableau Server users to edit and create visualizations on the fly without a license for Desktop. New features added in 9.2 include:
  • Data: Edit the data within your projects with new in-browser capabilities:
    • Create new fields from all or part of a formula.
    • Change a field's data type, default aggregation, and geographic role.
    • Manage data blends.
    • Toggle fields between continuous and discrete.
    • View icons that indicate which fields link data sources when working in workbooks with blended data.
  • Dashboards: Directly access worksheets within a dashboard, and easily export an image or PDF of the dashboard.

Enhanced enterprise data security

Use the new permission controls to set default permissions for projects as well as their associated workbooks and data sources. With one click, administrators and project leaders can now lock a project's permissions. When locked, all workbooks and data sources within the project are set to the project's permissions and cannot be edited by individual publishers. This increases security for critical and highly sensitive data.

Native iPhone Support

People could always use their iPhones with Tableau dashboards and visualizations, but the Tableau Mobile app is now available for the iPhone, making it easier to interact with and access data on the go. Tableau also introduced geolocation, which makes it possible to orient a map around your current location with a simple tap in a mobile browser or in the Tableau iPad and iPhone app.

Unlimited Map Customization

Tableau 9.2 introduces more options for controlling map behavior and unlimited potential for map customization. Mapbox integration in Tableau Desktop means people can easily customize, brand, enhance, and add context to maps, delivering unprecedented flexibility to create beautiful and contextually rich maps. Additionally, Tableau is expanding support for international postal codes with the addition of Japanese postal codes, plus other data updates such as U.S. congressional districts.

Improved Performance

Who doesn't want their visualizations and dashboards to render faster? Published workbooks take advantage of browser capabilities to display shape marks more quickly, and workbook legends are smarter about redrawing only when visible changes are made. In addition, Tableau can cache more queries using external query cache compression, making better use of server memory.

Merry Christmas and Happy New Year!

Tableau server 9.2 upgrade experience

Tableau 9.2 was released on Dec 7th. Our production Tableau 16-core server was upgraded to 9.2 on Dec 17. The upgrade process took about 3 hours. It was very smooth and easy for us.

Why upgrade? We have 260+ Desktop users. A lot of them saw the Desktop 9.2 upgrade reminder in the lower-right corner of their Tableau Desktop, and some asked if they could upgrade. The problem is that any workbook developed in Desktop 9.2 can't be published to a 9.1 Tableau server. It takes a lot of education to ask 260+ Desktop users to hold off on their Desktop upgrade. I wish I could show a pop-up message to override Tableau's default Desktop upgrade reminder, but I do not have that option…

So our game plan was to upgrade the Tableau server ASAP. We upgraded the Stage server on Dec 10th and, after one week of test & validation, upgraded our production server to 9.2. Of course, 9.2 has some great features (like iPhone support, smart data prep, Mapbox integration, project permissions, etc.), and our intent is to let users leverage those new features as soon as possible.

We just followed Tableau’s overall upgrade process @ http://onlinehelp.tableau.com/current/server/en-us/upgrade_samehrdwr.htm

For our configuration, the upgrade procedure was as follows (sample tabadmin commands for the cleanup and backup steps appear after the list):

a. Backup primary server configuration

b. Clean up logs

c. Create backup copy

d. Uninstall workers

e. Uninstall primary server

f. Install workers

g. Install primary server

h. Verify configuration settings
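
For steps b and c, the corresponding tabadmin commands on a 9.x primary look roughly like the following; the backup file name is our own convention, so confirm the exact flags against the tabadmin reference for your version:

  tabadmin cleanup                      # step b: remove old log files
  tabadmin backup tabserver_pre92 -d    # step c: create a backup; -d appends the date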

Tableau Data Extract API, Tableau SDK and Web Data Connector

If you are confused about Tableau Data Extract API, Tableau SDK and Web Data Connector, please read this blog.

The Tableau Data Extract API, introduced in v8, creates binary TDE files from data sources. You can code against the Extract API in C, C++, Java, or Python to generate TDE files, as in the sketch below.
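
Here is a minimal sketch in Python, modeled on Tableau's published SDK samples. The file name, column names, and values are made up for illustration, and the module paths follow the v9.1 tableausdk package (the v8 standalone module was named dataextract), so treat the exact names as approximate for your version:

  # Build a tiny TDE file with the Extract API (names per the v9.1 tableausdk package).
  from tableausdk import Types
  from tableausdk.Extract import Extract, ExtractAPI, Row, TableDefinition

  ExtractAPI.initialize()

  extract = Extract('orders.tde')                        # hypothetical output file
  schema = TableDefinition()                             # a TDE's table must be named 'Extract'
  schema.addColumn('Product', Types.Type.CHAR_STRING)
  schema.addColumn('Quantity', Types.Type.INTEGER)
  table = extract.addTable('Extract', schema)

  row = Row(schema)                                      # one row per insert
  row.setCharString(0, 'Beans')
  row.setInteger(1, 42)
  table.insert(row)

  extract.close()                                        # flushes the .tde to disk
  ExtractAPI.cleanup()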

Tableau v9.1 incorporated the existing Extract API into the new Tableau SDK, which has the following features:

  • Extract API (existing v8 feature): creates extracts from data sources
  • Server API (new in v9.1): enables auto-publishing of extracts to the server
  • Mac + Linux support (new in v9.1)

Tableau v9.1 also released the Web Data Connector, which lets you build Tableau connectors that read website data in JSON, XML, or HTML formats. Web Data Connectors are programmed in JavaScript & HTML.

Some comparisons:

Connector type             | Use case                                         | Output           | Language                     | Publishing & refreshing
Native Tableau connectors  | Live or extracts                                 | Live data or TDE | n/a                          | Tableau server
Custom SQL                 | Relational data sources                          | Live data or TDE | SQL                          | Tableau server
ODBC connections           | ODBC-compliant data sources                      | Live data or TDE | SQL                          | Tableau server
Tableau SDK                | Any data source w/o a native connector, or Excel | TDE file         | C, C++, Java, Python 2.6/2.7 | Managed outside Tableau server
Tableau Web Data Connector | Web source data only                             | TDE file         | JavaScript, HTML             | Tableau server

What are the steps for developing and implementing the Tableau SDK?

  1. Developer: Develop against the Extract API (C, C++, Java, or Python) to create the TDE.
  2. Publisher or site admin: Connect to the server (URL, user, password, site ID) and publish the extract – see the sketch after this list.
  3. Once the TDE is published, others can leverage it the same way as any other TDE.
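
A minimal publish sketch for step 2, again in Python and modeled on Tableau's Server API sample; the server URL, credentials, site ID, project, and data source name are all placeholders:

  # Auto-publish a TDE with the v9.1 Server API (names per Tableau's samples).
  from tableausdk.Server import ServerAPI, ServerConnection

  ServerAPI.initialize()

  conn = ServerConnection()
  conn.connect('http://your-tableau-server', 'publisher', 'password', 'siteID')

  # publishExtract(path, project name, data source name, overwrite)
  conn.publishExtract('orders.tde', 'default', 'Orders', False)

  conn.disconnect()
  conn.close()
  ServerAPI.cleanup()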

What are the steps for developing and implementing a Web Data Connector?

  1. Developer: Develop the Web Data Connector (JavaScript & HTML).
  2. Server admin: Import the Web Data Connector to the Tableau server (for example, tabadmin import_webdataconnector connector1.html).
  3. Publisher: Embed the credentials for the data source in the workbook.
  4. Site admin: Schedule the Web Data Connector refresh (similar to any other data source scheduling).

In summary, there are so many data sources that Tableau cannot provide native connectors for all of them. So the Tableau Data Extract API was released in v8 to create TDEs out of data sources, and v9.1 added the Server API to automate publishing those TDEs to the server. From v9.1, Tableau calls the Extract API and Server API bundle the SDK.

Web Data Connector is a brand-new feature released in v9.1 to connect to web data sources. For security reasons, a new Web Data Connector has to be registered by a Tableau server admin before it can be used. Web Data Connectors are coded in JavaScript & HTML; however, if you just use a Web Data Connector developed by someone else, you do not need to know JavaScript at all.

NetApp’s Tableau enterprise deployment added 2,500 users in less than 10 months

NetApp’s presentation about its Tableau enterprise deployment was well received at Tableau Conference 2015 in Las Vegas – the survey shows 4.5 out of 5 for content and 4.3 out of 5 for speaker presentation.

The key success factors for a large-scale Tableau server deployment are:

1. Create an enterprise Tableau Council with members from both business and IT. NetApp's Tableau Council has 10 members, all Tableau experts, drawn from each BU and IT; most are from the business side. The Council meets weekly to assess and define governance rules, and it represents the larger Tableau community.

2. Enable and support the Tableau community within the company. NetApp has a very active 300+ member Tableau community, mainly Tableau Desktop license owners. NetApp's Tableau intranet is the place for everything about Tableau: anyone can post questions there, and a few committed members ensure all questions are answered in a timely manner. NetApp also runs a monthly Tableau user CoE meeting, hackathons, a quarterly Tableau Day, and an internal Tableau training program.

3. Define clear roles and responsibilities in the new self-service analytics model. NetApp uses a site strategy – each BU has its own site.

  • BU site admins are empowered to manage everything within their site: local or departmental data sources, workbooks, user groups and permissions, the QA/release/publishing process, user support, etc.
  • IT owns server management, server licenses, enterprise data extracts, technical consulting, performance auditing & data security auditing, etc.
  • Business and IT partner on learning, training, support, and governance.

4. Define the Tableau publishing or release process. The question here is how much IT should be involved in publishing or release. It is a simple question but very difficult to answer. Trust and integrity are at the heart of NetApp's culture, so NetApp's approach is that IT is not involved in any workbook publishing; BU site admins are empowered to make decisions for their own QA/test/release/publishing process.

There are two simple principles. One is to test first before production. The other is the performance rule of thumb: the 5-second/10-second/20-second rule. A workbook render time under 5 seconds is good, a render time over 10 seconds is bad, and no one should publish a workbook whose render time exceeds 20 seconds.

What if people do not follow the rules? NetApp wants to give BUs maximum flexibility and agility for release and publishing. However, if the rules are not followed, IT will have to step in and take control of the release process, and when that happens, it becomes a weekly release process. Is this something IT wants to do? No. Is this something IT may have to do if things go south? Yes, but hopefully not…

5. Performance management – a trust-but-verify approach. Performance is everyone's concern when it comes to a shared platform, especially when each BU decides its own publishing criteria and IT does not gate the publishing.

How do you protect the value of a shared Tableau self-service environment? How do you prevent one badly designed query from bringing the whole server to its knees? NetApp has done a couple of things:

  • First, set server policies to keep the Tableau platform healthy: maximum workbook size, extract timeout limits, etc.
  • Second, send daily workbook performance alerts to site admins about their long-running workbooks (see the sketch after this list).
  • Third, make workbook performance metrics public so everyone in the community has visibility into the worst-performing workbooks/views, creating some peer pressure with good intent.
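
One way to implement that daily alert is to read view render times from the server's readonly PostgreSQL repository. The Python sketch below is only a sketch under our assumptions: the host, password, threshold, and the http_requests filter and column names (action, created_at, completed_at, currentsheet) reflect our understanding of the workgroup schema and should be verified against your server version, and the readonly user must be enabled first.

  # Sketch: list yesterday's slow view loads from the Tableau repository.
  # Host, password, and threshold are placeholders; verify the schema per version.
  import psycopg2

  conn = psycopg2.connect(host='your-tableau-server', port=8060,
                          dbname='workgroup', user='readonly', password='...')

  SLOW_SECONDS = 10  # the 'bad' threshold from the 5/10/20-second rule

  cur = conn.cursor()
  cur.execute("""
      SELECT currentsheet,
             EXTRACT(EPOCH FROM (completed_at - created_at)) AS seconds
      FROM http_requests
      WHERE action = 'bootstrapSession'            -- initial view render requests
        AND created_at >= CURRENT_DATE - INTERVAL '1 day'
        AND completed_at - created_at > INTERVAL '%s seconds'
      ORDER BY seconds DESC
  """ % SLOW_SECONDS)

  for sheet, seconds in cur.fetchall():
      print('%s rendered in %.1f seconds' % (sheet, seconds))  # feed the alert email

  cur.close()
  conn.close()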

It is the site admin's responsibility to tune workbook performance. If no action is taken, the site admin gets a warning, which can escalate to a site closure.

6. Data governance is a must-have for a self-service analytics platform. The objective is to ensure Tableau self-service complies with the company's existing data governance processes, policies, and controls.

Data governance is not 'nice to have' but 'must have', even for a Tableau environment. NetApp has a pretty mature enterprise data governance (EDM) process, and the BI team works very closely with the EDM team to identify & enforce critical controls. For example, IT has masked all sensitive human resources & federal data in the enterprise tier 2 data warehouse at the database layer, so we have peace of mind when Tableau Desktop users explore the tier 2 data.

NetApp is also working on an auditing process to identify potential data governance issues and partnering with the data management team to address them; this is the 'verify' piece of the 'trust but verify' model.

The goal is to create a governed self-service analytics platform; it has been a journey toward a mature enterprise self-service analytics model.

The presentation deck is attached.