Governed Self-Service Analytics: Content Management (9/10)

When executives get reports from IT-driven BI system, they trust the numbers. But if the reports are from spreadsheet, which can change anytime, they lower the trust level. If same spreadsheet is used to create Tableau visualization and be shared to executives for decision-making, does the trust level get increased? Can important business decisions be made based on the Tableau reports?

I am not against Tableau or visualization at all. I am a super Tableau fan. I love Tableau’s mission to help people to see and understand their data better. On the other side, as we all know that any dashboard is only as good as its data. How to provide trustworthy contents to end consumers? How to avoid the situation that some numbers are put into10K report while team is still baking the data definition?

The answer is to create a framework of content trust level indicator for end consumers. We do not want to slow down any innovation or discovery by self-service business analysts who still create their own analytics and publish workbooks. After dashboard is published, IT tracks the usages, identifies most valuable contents per defined criteria, certifies the data & contents so end consumers can use the certified reports the same way as reports from IT-driven BI. See the diagram below for overall flow:

Content

When you have a data to explore or you have a new business question to answer, hopefully you have report catalog to search if similar report is available to leverage. If yes, you do not have to develop it anymore although you may need to request an access to the report if you do not have access to it. If the visualization is not exactly what you are looking for but data attributes are there, you can always modify it to create your own version of visualization.

If there is no existing report available, you can also search published data source catalog to see if there is available published data source for you to leverage. If yes, you can create new workbooks by leveraging existing published data connections.

You may still need to bring your own data for your discovery. The early stage of discovery and analysis goes multi-iteration. Initial user feedback helps to reduce the overall time to market for your dashboards. At some point of time when your dashboard is good enough and is moved to production folder to share with a lot of more users, it will fall into track, identify and certify cycle.

Content cycle

What to track? Different organizations will have different answers. Here are examples:

  • Data sources with high hits
  • Reports accessed most frequently
  • Most active users
  • Least used reports for retirement

How to identify the most critical reports?

  • Prioritize based on usage (# of users, use cases, purpose, x-functional, benefits)
  • Prioritize based on data source and contents (data exist in certified env, etc)
  • Prioritize based on users. If CEO uses the report, it must be critical one for the organization

How to certify the critical reports? It is an on-going process:

  • Incrementally add self-service data to source of truth so data governance process can cover the new data sets (data definitions, data stewardship, data quality monitoring, etc)
  • Recreating dashboards (if needed) for better performance, add-on functionality, etc
  • Label the report with report trustworthy indicator

The intent of tracking, identifying and certifying cycle is to certify the most valuable reports in your organization. The output of the process is the report trustworthy indicator that helps end consumers to understand the level of trustworthy of data and reports.

End information consumers continue to use your visualizations that would be replaced with certified reports steps by steps, which is an on-going process. The certified reports will have trustworthy indicators on them.

What is the report trustworthy indicator? You can design multi level of trustworthy indicators. For example:

  • SOX certified:
    • Data Source Certified
    • Report Certified
    • Release Process Controlled
    • Key Controls Documented
    • Periodic Reviews
  • Certified reports:
    • Data Source Certified
    • Report Certified
    • Follow IT Standard Release Process
  • Certified data only
    • Data Source Partially Certified
    • Business Self-Service Releases
    • Follow Tableau Release Best Practices
  • Ad-Hoc
    • Business Self-Service Releases
    • Follow Tableau Release Best Practices

Content gov
As summary, content management helps to reduce the duplications of contents and data sources, and provide end information consumers with trustworthy level of the reports so proper decisions can be made based on the reports and data. The content management process outline above shows how to create the enterprise governance without slowing down innovations.

Please read next blog about governance mature level.

Governed Self-Service Analytics: Data Governance (8/10)

I was in the panel discussion at Tableau Conference 2015 about self-service analytics to a group of executives. Guess what is the no.1 most frequent asked question – data governance. How to make sure that data not get out of hands? How to make sure that the self-service analytics does not break the existing organization’s process, policy around data protections, data governance?

Data governance is a big topic. This blog focuses following 3 things:

  • Data governance for self-service analytics
  • How to enforce data governance in self-service environment
  • How to audit self-service environment
  1. Data governance for self-service analytics

First of all, what is data governance?

Data governance is a business discipline that brings together data quality, data management, data policies, business process management, and risk management surrounding the handling of data.

The intent is to put people in charge of fixing and preventing issues with data so that the enterprise can become more efficient.

The value of enterprise data governance is as followings:

  • Visibility & effective decisions: Consistent and accurate data visibility enables more accurate and timely business decisions
  • Compliance, security and privacy: Enable business to efficiently and accurately meet growing global compliance requirements

What data should be governed?

Data is any information in any of our systems. Data is a valuable corporate asset that indirectly contributes to organization’s performance.   Data in self-service analytics platform (like Tableau) definitely is part of data governance scope. All the following data should be governed:

  • Master Data: Data that is shared commonly across the company in multiple systems, applications and/or processes. Master Data should be controlled, cleansed and standardized at one single source. Examples: Customer master, product item master. Master data enable information optimization across systems, enable data enrichment, data cleaning and increase accuracy in reporting.
  • Reference Data: Structured data used in an application, system, or process. Often are common lists set once a fiscal year or with periodic updates. Examples like current codes, country codes, chart of accounts, sales regions, etc.
  • Transactional Data: The information recorded from transactions. Examples like user clicks, user registrations, sales transactions, shipments, etc. The majority of the enterprise data should be the transactional data. Can be financial, logistical or work-related, involving everything from a purchase order to shipping status to employee hours worked to insurance costs and claims. As a part of transactional records, transactional data is grouped with associated master data and reference data. Transactional data records a time and relevant reference data needed for a particular transaction record.

What are data governance activities?

  • Data ownership and definition: The data owner decides and approves the use of data, like data sharing/usage requests by other functions. Typically data owners are the executives of the business areas. One data owner is supported by many data stewards who are the operational point of accountability for data, data relationship and process definitions. The steward represents the executive owners and stakeholders. Data definition is what data steward’s responsibility although many people can contribute to the data definitions. In the self-service environment where data is made available to many analyst’s hands, it is business advantage to be able to leverage those data analyst’s knowledge and know-how about the data by allowing each self-service analyst to comment, tag the data, and then find a way to aggregate those comments/tags. This is again the community concept.
  • Monitor and corrective actions: This is an ongoing process to define process flow, data flow, quality requirement, business rules, etc. In the self-service environment where more and more self-service developers have capability to change metadata and create calculated fields to transform the data, it can be an advantage and can also become chaos if data sources and process are not defined within one business group.
  • Data process and policy: This is about exception handlings.
  • Data accuracy and consistency: Commonly known as data quality. This is where most of time and efforts are spent.
  • Data privacy and protection: There are too many examples that data leakage damages brand and causes millions for organizations. Some fundamental rules have to be defined and enforced for self-service enterprise to have a piece of mind.

2. How to enforce privacy and protection in self-service environment?

The concept here is to have thought leadership about top sensitive data before make data available for self-service consumption. To avoid potential chaos and costly mistakes, define what are the top sensitive dataset for your organization, then have IT to create enforcement in database layer so self-service users can’t mess up. Here is list of examples of what should be enforced to have a piece of mind:

  • No privacy and private data is allowed to self-service server. Like SSN, federal customer data, credit cards, etc. Most of those self-service platform (like Tableau) is defined for easy of use, and does not have the sophisticate data encrypt technologies.
  • Remove the sensitive data fields (like address, contacts) in database level before making the data available for self-service consumption. The reason is that it is really hard to control those data attributes once you open them to some business analytics super users.
  • Use site as partition to separate data, users, and contents for better data security. For example, finance is a separate site that has finance users only. Sales people have no visibility on finance site.
  • Create separate server instance for external users if possible. Put the external server instance in DMZ zone. Different level of network security will be applied as additional layer of security.
  • Create site for each partner / vendor to avoid potential problems. When you have multiple partners or vendors accessing your Tableau server, never put two vendors into same site. Try to create one site for each vendor to avoid potential surprises.

3. How to audit self-service environment?

You can’t enforce everything. You do not want to enforce everything either. Enforcement comes with disadvantages too, like inflexibility. You want to choose the most critical things to enforce, and then you leave the remaining as best practices for people to follow. Knowing the self-service analytics community always tries to find the boundary, you should have audit in your toolbox. And most importantly let community know that you have the auditing process.

  • What to audit:
    • All the enforced contents should be part of audit scope to make sure your enforcement works in the intended way
    • For all the policy that your BU or organization agreed upon.
    • For any other ad-hoc as needed
  • Who should review the audit results:
    • Self-service governance body should review the results
    • BU data executive owners are the main audiences of auditing reports. It is possible that executives gave special approvals in advanced for self-service analysts to work on some datasets that she or he does not have access normally. When they are too many exceptions, it is an indication of potential problem.
  • Roles and responsibilities of audit: Normally IT provides audit results while business evaluate risks and make decisions about process changes.
  • How to audit: Unfortunately Tableau does not have a lot of server audit features. There is where a lot of creativities come into play. VizAlert can be used. Often creating workbooks from Tableau database directly is the only way to audit.

Please read next blog about content management.

Governed Self-Service Analytics: Performance Management (7/10)

Performance management has been everyone’s concerns when it comes to a shared self-service environment since nobody wants to be impacted by others. This is especially true when each business unit decides their own publishing criteria where central IT team does not gate the publishing process.

How to protect the shared self-service environment? How to prevent one badly designed query from bringing all servers to their knees?

  • First, set server parameters to enforce policy.
  • Second, create daily alerts for any slow dashboards.
  • Third, made performance metrics public to your internal community so everyone in the community has visibility of the worse performed dashboards to create some peer pressures with good intent.
  • Fourth, hold site admin or business leads to be accounted for the self-service dashboard performance.

You will be in good shape if you do those four things above. Let me explain each of those in details.

performance

  1. Server policy enforcement

The server policy setting is for enforced policies. For anything that can be enforced, it is better to enforce those so everyone can have a piece of mind. The enforced parameters should be agreed upon business and IT, ideally in the governance council. The parameters can always be reviewed and revised when situation changes.

Examples of commonly enforced parameters like overall sizing allocation for a site, extracting time out, etc.

  1. Exception alerts

There are only a few limited parameters that you are able to control as enforcement. All the rest will have to be governed by process. The alerts are most common approach to server as exception management:

  • Performance alert: Create alerts when dashboard render time exceeds agreed threshold.
  • Extract size alerts: Create alerts when extract size exceed define thresholds (Extract timeout can be enforced on server but not size).
  • Extract failure alerts: Create alerts for failed extracts. Very often stakeholders will not know the extract failed. It is essential to let owners know his or her extracts failed so actions can be taken timely.
  • You can create a lot of more alerts, like CPU usage, overall storage, memory, etc.

How to do the alerts? There are multiple choices. My favorite one is VizAlert for Tableau https://community.tableau.com/groups/tableau-server-email-alert-testing-feedbac

Who should receive the alerts? It depends. A lot of alerts are for server admin team only, like CPU usage, memory, storage, etc. However most of the extracts and performance alerts are for the content owners. One best practice for content alert is always to include site admins or/and project owners as part of alerts. Why? Workbook owners may change jobs so the original owner may not be responsible for the workbooks anymore. I was talking with a well known Silicon Valley company recently, they are telling me that a lot of workbook owner changed in last 2 years, they had hard time to figure out whom they should go after for issues related to workbooks. Site admin should be able to help to identify the new owners. If site admin is not close enough to workbook level in your implementation, you can choose project leaders instead of site admin.

What should be the threshold? There is no universal answer. But nobody wants to wait for more than 10 seconds. The rule of thumb is that anything less than 5 seconds good. However anything more than 10 seconds is no good. I got a question when I present this in one local Tableau event. The question was what if one specific query used to take 30 minutes, and team made great progress to reduce it to 3 minutes. Do we allow this query to be published and run on server? The answer is depends. If the view is so critical for business, it will be of course worth of waiting 3 minutes for results to render. Everything has exception. However if the 3-minute query chokes everything else on the server and users may click the view to trigger the query often, you may want to re-think the architecture. Maybe the right answer will be to spin-off another server for this mission critical 3-minute application only so the rest of users will not impact.

Yellow and red warning: It is a good practice to create multi-level of warning like yellow and red warning with different threshold. Yellow alerts are warnings while red alerts are for actions.

You may say, hi Mark, this all sounds great but what if people do not take the actions.

This is exactly where some self-service deployments go wrong. There is where governance comes to play. In short, you need to have strong and agreed-upon process enforcement:

  • Some organizations use charging back process to motivate good behaviors. The charge back will influence people’s behaviors but will not be able to enforce anything.
  • The key process enforcement is a penalty system when red alert actions are not taken timely.

If owner did not take corrective actions during agreed period of time for red warning, a meeting should be arranged to discuss the situation. If the site admin refuses to take actions, the governance body has to make decision for agreed-upon penalty actions. The penalty can lead to site suspension. Once a site is suspended, nobody can excess any of the contents anymore except server admins. The site owners have to work on the improvement actions and show the compliances before site can be re-activated. The good news is that all the contents are still there when a site is suspended and it takes less than 10 seconds for server admin to suspend or re-active a site.

I had this policy that was agreed with governance body. I communicate to as many self-service developers about this policy as I can. I never got push back about this policy. It is clear to me that self-service community likes to have a strong and clearly defined governance process to ensure everyone’s success. I suspended a site for some other reasons but never had to suspend a site due to performance alerts. What happens is that is my third tricky about worse performed dashboard visibility.

  1. Make performance metric public

It takes some efforts to make your server dashboard performance metric public to all your internal community. But it turns out that it is one of the best things that a server team can do. It has a few benefits:

  • It serves as a benchmarking for community to understand what is good and good enough since the metric shows your site overall performance comparing with others on the server
  • It shows all the long render dashboards to provide peer pressures.
  • It shows patterns that help people to focus the problematic areas
  • It creates great opportunity for community to help each other. This is one most important success factor. What turns out is that the problematic areas are often the new team on-boarded to the server. It community always have so many ideas to make dashboard perform a lot of better. This is why we never had to suspend any sites since when it comes with a lot of red alerts that community is aware of, it is the whole community that makes things happen, which is awesome.
  1. Hold site admin accounted for

I used to manage Hewlett Packard’s product assembly line during my early career. Hewlett Packard has some well-known quality control processes. One thing that I learned was that each assembler is response for his or her own quality. Although there is QA at the end of line but each workstation has a checklist before pass to next station. This simple philosophy applies today’s software development and self-service analytics environment. The site admin is responsible for the performance of workbooks in the sites. The site admin can further hold workbook owners accounted for the shared workbooks. Flexibility comes with accountability too.

I believe theory Y (people have good intent and want to perform better) and I have been practicing theory Y for years. The whole intent of server dashboard performance management is to provide performance visibility to community and content owners so owners know where the issues are so they can take actions.

What I see often is that a well-performed dashboard may become bad over-time due to data changes and many other factors. The alerts will catch all of those exceptions no matter your dashboards are released yesterday, last week, last month or last year – this approach is a lot of better than gating releases process which is a common IT practice.

During a recent Run-IT as business meet-up, audiences were skeptical when I said that IT did not gate any workbook publishing process and it is completely a self-service. Then audiences started to realize that it did make sense when I started to talk about performance alerts that will catch it all together. What business likes most about this approach is the freedom to push some urgent workbooks to server even workbooks are not performing great – they can always come back later on to tune them and make them perform better for both better use experiences and being good citizen.

Please continue to read next blog about data governance.

GOVERNED SELF-SERVICE ANALYTICS: PUBLISHING (6/10)

The publishing process & policy covers the followings areas:  Engagement Process; Publisher Roles; Publishing Process and Dashboard Permissions.

PublishingFirst step is to get a  space on the shared enterprise self-service server for your group’s data and self-service dashboard, which is called Engagement Process. The main questions are:

  • From requester perspective, how to request a space on shared enterprise self-service server for my group
  • From governance perspective, who decides and how to decide the self-service right fit?

Once a business group has a space on the shared enterprise self-service server, the business group has to ask the following questions:

  • Who can publish dashboard from your group?
  • Who oversees or manages all publishers in my group?

After you have given a publishing permission to some super users from your business group, those publishers need to know the rules, guidance, constraints on server, and best practices for effective dashboard publishing. Later on you may also want to make sure that your publishers are not creating the islands of information or creating multiple versions of KPIs.

  • What are publishing rules?
  • How to avoid duplications?

The purpose of publishing is to share your reports, dashboards, stories and insights to others who can make data-driven decisions. The audiences are normally defined already before you publishing the dashboards although dashboard permissions are assigned after publishing from workflow perspective. The questions are:

  • Who can access the published dashboards?
  • What is the approval process?

Engagement Process

Self-service analytics does not replace traditional BI tools but co-exists with traditional BI tools. It is very rare that you will find self-service analytics platform is the only reporting platform in your corporation. Very likely that you will have at least one IT-controlled enterprise-reporting platform designed for standard reporting to answer known questions using data populated from enterprise data warehouse. In additional to this traditional BI reporting platform, your organization has decided to implement a new self-service analytics platform to answer unknown questions and ad-hoc analysis using all the available data sources.  Co-exist

This realization of traditional BI and self-service BI co-existing is important to understand this engagement process because guidance has to be defined which platform does what kinds of reporting. After this guidance is defined and agreed, continuous communication and education has to be done to make sure all self-service super users are in the same page for this strategic guidance.

Whenever there is a new request for a new self-service analytics application, fitness assessment has to be done before proceed. The following checklist serves this purpose:

  • Does your bigger team have an existing site already on self-service analytics server? If yes, you can use the existing site.
  • Who is the primary business / application contact?
  • What business process / group does this application represent? (like sales, finance, etc)?
  • Briefly describe the purpose and value of the application?
  • Do you have an IT contact for your group for this application? Who is the contact?
  • What are the data sources?
  • Are there any sensitive data to be reporting on (like federal data, customer or client data)? If yes, describe in details about the source data.
  • Are there any private data as part of source data? (like HR data, sensitive finance data)
  • Who are the audiences of the reports? How many audiences do you anticipate? Are there any partners who will access the data?
  • Does the source data have more than one enterprise data? If yes, what is the plan for data level security?
  • What are the primary data elements / measures to be reporting on (e.g. booking, revenue, customer cases, expenses, etc)
  • What will be the dimensions by which the measure will be shown (e.g. product, period, region, etc)
  • How often the source data needs to be refreshed?
  • What is anticipated volume of source data? How many quarters of data? Roughly how many rows of the data? Roughly how many columns of the data?
  • Is the data available in enterprise data warehouse?
  • How many self-service report developers for this application?
  • Do you agree with organization’s Self-Service Analytics Server Governance policy (URL ….)?
  • Do you agree with organization’s Self-Service Analytics Data Governance policy (URL ….)?

The above questionnaires also include your organization’s high-level policies on data governance, data privacy, service level agreement, etc since most of the existing self-service tools have some constraints in those areas. On one side, we want to encourage business teams to leverage the enterprise investment of the self-service analytics platform. On the other side, we want to make sure that every new application is setup for success and do not create chaos that can be very expensive to fix later on.

Publisher Roles

I heard a lot of exciting stories about how easy people can get new insights with visualization tools (like Tableau). Myself experienced a few of those insightful moments as well. However I also heard a story about new Tableau Desktop user who just came out of fundamental training, he quickly published something and shared to the team but caused a lot of confusions about the KPIs being published. What is wrong? It is not about the tool. It is not about the training but publishing roles and related process.

The questions are as followings:

  • Who can publish dashboard from my group?
  • Who oversees or manages all publishers in my group?

Sometimes you may have easy answers to those questions but you may not have easy answers for many other cases. One common approach is to use projects or folders to separate boundary for various publishers. Each project has project leader role who overalls all publishers within the project.

You can also define a common innovation zone where a lot of publishers can share their new insights to others. However just be aware that the dashboards in innovation zone are early discovery phase only and not officially agreed KPIs. Most of the dashboards will go through multiple iterations of feedback and improvement before become useful insights. We do encourage people to share their new innovations as soon as possible for feedback and improvement purpose. It will be better to distingue official KPIs with innovation by using different color templates to avoid the potential confusions to end audiences.

Publishing Process

To protect the shared self-service environment, you need to have clear defined publishing process:

  • Does IT have to be involved before publish a dashboard to the server?
  • Do you have to go from a non-production instance or non-production folder to a production folder?
  • What is the performance guidance?
  • Should you use live connection or extracts?
  • How often you should schedule your extracts? Can you use full refresh?
  • What are the data security requirements?
  • Do you have some new business glossary in your dashboards? If yes, did you spell out the definition of the new business glossary?
  • Does the new glossary definition need to get approval from data stewardship? Did you get the approval?
  • Who supports the new dashboards?
  • Does this new dashboard create potential duplication with existing ones?

Each organization or each business group will have different answers to those above questions. The answers to above questions form the basic publishing process that is essential for scalability and avoid chaos.

Here is summary of what most companies do – so call the common best practices:

  1. IT normally is not involved for the releasing or publishing process for those dashboards designed by business group – this is the concept of self-service
  2. IT and business agreed on the performance and extract guidance in advanced. IT will enforce some of the guidance on  server policy settings (like extract timeout thresholds, etc). For many other parameters that can’t be systematically enforced, business and IT agreed on alert process to detect the exceptions. For example a performance alert that will be sent to dashboard owner and project owner (or site admin) if dashboard renders time exceeds 10 seconds.
  3. Business terms or glossary definition are important part of the dashboards.
  4. Business support process is defined so end information consumers know how to get help when they have questions about the dashboard or data.
  5. Dashboards are clarified as certified and non-certified. Non-certified dashboards are for feedback purpose while certified ones are officially approved and supported.

Dashboard Permissions

When you design a dashboard, most likely you have audiences defined already. The audiences have some business questions; your dashboards are to answer those questions. The audiences should be classified into groups and your dashboards can be assigned to one or multiple groups.

If your dashboards have row level security requirements, the complexity of dashboards will be increased many times. It is advised that business works with IT for the row level security design. Many self-service tools have limitations for row level security although they all claim row level security capability.

The best practice is to let database handle row level security to ensures data access consistence when you have multiple reporting tools against the same database. There are two challenges to figure out:

  • Self-service visualization tool has to be able to pass session user variable dynamically to database. Tableau starts to support this feature for some database (like query banding feature for Teradata or initial SQL for Oracle)
  • Database has user/group role tables implemented.

As summary, publishing involves a set of controls, process,  policy and best practices. While support self-service and self-publishing, rules and processes have to be defined to avoid potential expensive mistakes later on.

Please read next blog for performance management