Tableau Data Extract API, Tableau SDK and Web Data Connector

If you are confused about the Tableau Data Extract API, the Tableau SDK and the Web Data Connector, this blog post clears up the differences.

The Tableau Data Extract API, introduced in v8, creates binary TDE files from data sources. You can write Extract API programs that generate TDE files in C, C++, Java or Python.
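
To get a feel for the API, here is a minimal sketch in Python. The module and class names (tableausdk.Extract, TableDefinition, Row) are assumptions based on the v9.1 SDK's Python layout, so verify them against your installed SDK version:

```python
# Minimal Extract API sketch (Python). Module/class names assume the
# v9.1 tableausdk package layout; verify against your installed SDK.
from tableausdk.Types import Type
from tableausdk.Extract import ExtractAPI, Extract, TableDefinition, Row

ExtractAPI.initialize()

extract = Extract('orders.tde')               # creates the binary TDE file
schema = TableDefinition()
schema.addColumn('Customer', Type.UNICODE_STRING)
schema.addColumn('Amount', Type.DOUBLE)
table = extract.addTable('Extract', schema)   # the table must be named 'Extract'

row = Row(schema)
row.setString(0, 'Acme')                      # column 0: Customer
row.setDouble(1, 1234.5)                      # column 1: Amount
table.insert(row)

extract.close()
ExtractAPI.cleanup()
```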

Tableau v9.1 incorporated the existing Extract API into the new Tableau SDK, which has the following features:

  • Extract API (existing v8 feature): creates extracts from data sources
  • Server API (new in v9.1): enables automated publishing of extracts to the server
  • Mac and Linux support (new in v9.1)

Tableau v9.1 also released the Web Data Connector, which is used to build Tableau connectors that read website data in JSON, XML or HTML formats. Web Data Connectors are programmed in JavaScript and HTML.

Some comparisons:

| | Native Tableau connectors | Custom SQL | ODBC connections | Tableau SDK | Tableau Web Data Connector |
| --- | --- | --- | --- | --- | --- |
| Use case | Live or extracts | Relational data sources | ODBC-compliant data sources | Any data source without a native connector, or Excel | Web source data only |
| Output | Live data or TDE | Live data or TDE | Live data or TDE | TDE file | TDE file |
| Language | n/a | SQL | SQL | C, C++, Java, Python 2.6/2.7 | JavaScript, HTML |
| Publishing & refreshing | Tableau Server | Tableau Server | Tableau Server | Managed outside Tableau Server | Tableau Server |

What are the steps for developing and implementing a Tableau SDK extract?

  1. Developer: Develop an Extract API program (C, C++, Java or Python) that generates the TDE.
  2. Publisher or site admin: Connect to the server (URL, user, password, site ID) and publish the extract; the Server API can script this step, as shown in the sketch below.
  3. Once the TDE is published, others can leverage it the same way as any other TDE.
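
Step 2 can be scripted with the Server API itself. A hedged sketch follows, again assuming the v9.1 tableausdk Python module names (ServerAPI, ServerConnection, publishExtract); the server URL, credentials and site ID are placeholders:

```python
# Sketch of auto-publishing a TDE via the v9.1 Server API. Class and
# method names are assumptions based on the tableausdk.Server module;
# check your SDK version's documentation before relying on them.
from tableausdk.Server import ServerAPI, ServerConnection

ServerAPI.initialize()

conn = ServerConnection()
conn.connect('http://tableau-server', 'username', 'password', 'siteID')
# Publish orders.tde to the 'default' project as data source 'Orders';
# the last flag controls whether an existing data source is overwritten.
conn.publishExtract('orders.tde', 'default', 'Orders', False)
conn.disconnect()
conn.close()

ServerAPI.cleanup()
```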

What are the steps for developing and implementing a Web Data Connector?

  1. Developer: Develop the Web Data Connector (JavaScript and HTML); a skeleton is sketched below.
  2. Server admin: Import the Web Data Connector into Tableau Server (for example, tabadmin import_webdataconnector connector1.html).
  3. Publisher: Publish a workbook that embeds credentials for the data source.
  4. Site admin: Schedule the Web Data Connector refresh (similar to scheduling any other data source).
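
For reference, a connector skeleton is quite small. The sketch below uses the v9.x (WDC 1.x) callback API; the function names (getColumnHeaders, headersCallback, getTableData, dataCallback) and the library URL are assumptions to verify against the WDC documentation for your Tableau version:

```html
<!-- Minimal Web Data Connector sketch (WDC 1.x era, Tableau 9.x).
     The script URL and callback names are assumptions; check the WDC
     docs for your Tableau version. -->
<html>
<head>
  <script src="https://connectors.tableau.com/libs/tableauwdc-1.1.1.js"></script>
  <script>
    var connector = tableau.makeConnector();

    // Declare the columns of the extract Tableau will build.
    connector.getColumnHeaders = function () {
      tableau.headersCallback(['customer', 'amount'], ['string', 'float']);
    };

    // Fetch the data (hard-coded here; normally an AJAX call to a web
    // source returning JSON/XML/HTML) and hand the rows back to Tableau.
    connector.getTableData = function (lastRecordToken) {
      var rows = [{ customer: 'Acme', amount: 1234.5 }];
      tableau.dataCallback(rows, lastRecordToken, false); // false = no more pages
    };

    tableau.registerConnector(connector);
  </script>
</head>
<body>
  <button onclick="tableau.submit()">Get Data</button>
</body>
</html>
```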

In summary: there are so many data sources that Tableau cannot provide native connectors for all of them. So the Tableau Data Extract API was released in v8 to create TDEs from arbitrary data sources, and v9.1 added the Server API to automate publishing those TDEs to the server. From v9.1 on, Tableau calls the Extract API and Server API bundle the Tableau SDK.

The Web Data Connector is a brand new feature released in v9.1 to connect to web data sources. For security reasons, a new Web Data Connector has to be registered by a Tableau Server admin before it can be used. Web Data Connectors are coded in JavaScript and HTML; however, if you just use a Web Data Connector developed by others, you do not need to know JavaScript at all.

NetApp’s Tableau enterprise deployment added 2,500 users in less than 10 months

NetApp’s presentation about its Tableau enterprise deployment was well received at Tableau Conference 2015 in Las Vegas – surveys scored it 4.5 out of 5 on content and 4.3 out of 5 on speaker presentation.

The key success factors for large scale Tableau server deployment are:

1. Create an enterprise Tableau Council with members from both business and IT. NetApp’s Tableau Council has 10 members, all Tableau experts drawn from each BU and IT, with most coming from the business side. The Council meets weekly to assess and define governance rules, and it acts as the representative body for the larger Tableau community.

2. Enable and support a Tableau community within the company. NetApp has a very active 300+ member Tableau community, made up mainly of Tableau Desktop license owners. NetApp’s Tableau intranet is the place for everything about Tableau: anyone can post questions there, and a few committed members ensure all questions are answered promptly. NetApp also runs a monthly Tableau user CoE meeting, hackathons, a quarterly Tableau Day, and an internal Tableau training program.

3. Define clear roles and responsibilities in the new self-service analytics model. NetApp uses a site strategy: each BU has its own site.

  • BU site admins are empowered to manage everything within their sites: local or departmental data sources, workbooks, user groups and permissions, the QA/release/publishing process, user support, etc.
  • IT owns server management, server licenses, enterprise data extracts, technical consulting, performance auditing and data security auditing, etc.
  • Business and IT partner on learning, training, support and governance.

4. Define the Tableau publishing or release process. The question here is how much IT should be involved in publishing and releases. It is a simple question but very difficult to answer. Trust and integrity are at the heart of NetApp’s culture, so NetApp’s approach is that IT is not involved in any workbook publishing: BU site admins are empowered to make decisions about their own QA/test/release/publishing process.

There are two simple principles. First, test before production. Second, follow the performance rule of thumb, the 5-second/10-second/20-second rule: a workbook render time under 5 seconds is good, a render time over 10 seconds is bad, and no one should publish a workbook whose render time exceeds 20 seconds.

What if people do not follow the rules? NetApp wants to give BUs maximum flexibility and agility for releases and publishing. However, if the rules are not followed, IT will have to step in and take control of the release process, which would then become a weekly release cycle. Is this something IT wants to do? No. Is it something IT may have to do if things go south? Yes, but hopefully not.

5. Performance management – a trust but verify approach. Performance is everyone’s concern on a shared platform, especially when each BU decides its own publishing criteria and IT does not gate publishing.

How do you protect the value of a shared Tableau self-service environment? How do you prevent one badly designed query from bringing all the servers to their knees? NetApp has done a few things:

  • First, set server policies to keep the Tableau platform healthy: maximum workbook size, extract timeout limits, etc.
  • Second, send daily workbook performance alerts to site admins about their long-running workbooks.
  • Third, make workbook performance metrics public so everyone in the community has visibility into the worst-performing workbooks and views, creating some peer pressure with good intent.

It is the site admin’s responsibility to tune workbook performance. If no action is taken, the site admin gets a warning, which can ultimately lead to site closure.

6. Data governance is a must-have for a self-service analytics platform. The objective is to ensure the Tableau self-service environment complies with the company’s existing data governance processes, policies and controls.

Data governance is not a ‘nice to have’ but a ‘must have’, even in a Tableau environment. NetApp has a fairly mature enterprise data management (EDM) process, and the BI team works very closely with the EDM team to identify and enforce critical controls. For example, IT has masked all sensitive human resources and federal data in the enterprise tier 2 data warehouse at the database layer, so there is peace of mind when Tableau Desktop users explore the tier 2 data.

NetApp is also building an auditing process to identify potential data governance issues and working with the data management team to address them; this is the ‘verify’ piece of the ‘trust but verify’ model.

The goal is to create a governed self-service analytics platform. It has been a journey toward the maturity of the enterprise self-service analytics model.

The presentation deck is attached.