Governed Self-Service Analytics: Data Governance (8/10)

I was in the panel discussion at Tableau Conference 2015 about self-service analytics to a group of executives. Guess what is the no.1 most frequent asked question – data governance. How to make sure that data not get out of hands? How to make sure that the self-service analytics does not break the existing organization’s process, policy around data protections, data governance?

Data governance is a big topic. Tableau’s Data Management Add-on model (Data Catalog, Tableau Prep Conduct) are making great progress toward data management. This blog focuses following 3 things:

  • Data governance for self-service analytics
  • How to enforce data governance in self-service environment
  • How to audit self-service environment
  1. Data governance for self-service analytics

First of all, what is data governance?

Data governance is a business discipline that brings together data quality, data management, data policies, business process management, and risk management surrounding the handling of data.

The intent is to put people in charge of fixing and preventing issues with data so that the enterprise can become more efficient.

The value of enterprise data governance is as followings:

  • Visibility & effective decisions: Consistent and accurate data visibility enables more accurate and timely business decisions
  • Compliance, security and privacy: Enable business to efficiently and accurately meet growing global compliance requirements

What data should be governed?

Data is any information in any of our systems. Data is a valuable corporate asset that indirectly contributes to organization’s performance.   Data in self-service analytics platform (like Tableau) definitely is part of data governance scope. All the following data should be governed:

  • Master Data: Data that is shared commonly across the company in multiple systems, applications and/or processes. Master Data should be controlled, cleansed and standardized at one single source. Examples: Customer master, product item master. Master data enable information optimization across systems, enable data enrichment, data cleaning and increase accuracy in reporting.
  • Reference Data: Structured data used in an application, system, or process. Often are common lists set once a fiscal year or with periodic updates. Examples like current codes, country codes, chart of accounts, sales regions, etc.
  • Transactional Data: The information recorded from transactions. Examples like user clicks, user registrations, sales transactions, shipments, etc. The majority of the enterprise data should be the transactional data. Can be financial, logistical or work-related, involving everything from a purchase order to shipping status to employee hours worked to insurance costs and claims. As a part of transactional records, transactional data is grouped with associated master data and reference data. Transactional data records a time and relevant reference data needed for a particular transaction record.

What are data governance activities?

  • Data ownership and definition: The data owner decides and approves the use of data, like data sharing/usage requests by other functions. Typically data owners are the executives of the business areas. One data owner is supported by many data stewards who are the operational point of accountability for data, data relationship and process definitions. The steward represents the executive owners and stakeholders. Data definition is what data steward’s responsibility although many people can contribute to the data definitions. In the self-service environment where data is made available to many analyst’s hands, it is business advantage to be able to leverage those data analyst’s knowledge and know-how about the data by allowing each self-service analyst to comment, tag the data, and then find a way to aggregate those comments/tags. This is again the community concept.
  • Monitor and corrective actions: This is an ongoing process to define process flow, data flow, quality requirement, business rules, etc. In the self-service environment where more and more self-service developers have capability to change metadata and create calculated fields to transform the data, it can be an advantage and can also become chaos if data sources and process are not defined within one business group.
  • Data process and policy: This is about exception handlings.
  • Data accuracy and consistency: Commonly known as data quality. This is where most of time and efforts are spent.
  • Data privacy and protection: There are too many examples that data leakage damages brand and causes millions for organizations. Some fundamental rules have to be defined and enforced for self-service enterprise to have a piece of mind.

2. How to enforce privacy and protection in self-service environment?

The concept here is to have thought leadership about top sensitive data before make data available for self-service consumption. To avoid potential chaos and costly mistakes, here is high level approach I use:

  • Define what are the top sensitive dataset for your organization (for example Personally Identifiable Information (PII) Tier classifications.
  • Then use Tableau’s Data Lineage Feature to find out any workbooks using those PII data.
  • Send alerts to workbook owners for the list of workbooks using PII.

3. What are the additional considerations for data privacy governance?  

  • No privacy and private data is allowed to self-service server. Like SSN, federal customer data, credit cards, etc. Most of those self-service platform (like Tableau) is defined for easy of use, and does not have the sophisticate data encrypt technologies.
  • Remove the sensitive data fields (like address, contacts) in database level before making the data available for self-service consumption. The reason is that it is really hard to control those data attributes once you open them to some business analytics super users.
  • Use site as partition to separate data, users, and contents for better data security. For example, finance is a separate site that has finance users only. Sales people have no visibility on finance site.
  • Create separate server instance for external users if possible. Put the external server instance in DMZ zone. Different level of network security will be applied as additional layer of security.
  • Create site for each partner / vendor to avoid potential problems. When you have multiple partners or vendors accessing your Tableau server, never put two vendors into same site. Try to create one site for each vendor to avoid potential surprises.

4. How to audit self-service environment?

You can’t enforce everything. You do not want to enforce everything either. Enforcement comes with disadvantages too, like inflexibility. You want to choose the most critical things to enforce, and then you leave the remaining as best practices for people to follow. Knowing the self-service analytics community always tries to find the boundary, you should have audit in your toolbox. And most importantly let community know that you have the auditing process.

  • What to audit:
    • All the enforced contents should be part of audit scope to make sure your enforcement works in the intended way
    • For all the policy that your BU or organization agreed upon.
    • For any other ad-hoc as needed
  • Who should review the audit results:
    • Self-service governance body should review the results
    • BU data executive owners are the main audiences of auditing reports. It is possible that executives gave special approvals in advanced for self-service analysts to work on some datasets that she or he does not have access normally. When they are too many exceptions, it is an indication of potential problem.
  • Roles and responsibilities of audit: Normally IT provides audit results while business evaluate risks and make decisions about process changes.
  • How to audit: Unfortunately Tableau does not have a lot of server audit features. There is where a lot of creativities come into play. VizAlert can be used. Often creating workbooks from Tableau database directly is the only way to audit.

Please read next blog about content management.

One thought on “Governed Self-Service Analytics: Data Governance (8/10)”

Leave a Reply