Automation – Data Source Archiving

If you followed my previous blog, Automation – Advanced Archiving, to archive workbooks, over time you may also need to archive data sources.

Why delete data sources?

If the workbook has embedded data sources, the embedded data is deleted when the workbook is deleted. However, if the workbook uses separately published data sources, those published data sources are not deleted when the workbook is deleted.

First of all, when a workbook is deleted, you do not want to delete its published data source right away. Why?

  • The published data source could be used by other workbooks
  • The published data source can still be used for new workbooks later on

On the other hand, your server may accumulate a lot of orphan published data sources that are not connected to any workbook. Those are the candidates for deletion, which is what this blog is about.

How to delete data sources?

The good news is that there is a Delete Data Source API: DELETE /api/api-version/sites/site-id/datasources/datasource-id

api-version: The version of the API to use, such as 3.4. For more information, see REST API Versions.
site-id: The ID of the site that contains the data source.
datasource-id: The ID of the data source to delete.
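
Here is a minimal sketch of calling this endpoint from Python with only the standard library. The function names and the example server URL are hypothetical, and it assumes you have already signed in and hold a token to send in the X-Tableau-Auth header. Treat it as a starting point, not a finished tool.

```python
import urllib.request


def build_delete_url(server, api_version, site_id, datasource_id):
    """Build the Delete Data Source endpoint URL from its three path parameters."""
    return (
        f"{server.rstrip('/')}/api/{api_version}"
        f"/sites/{site_id}/datasources/{datasource_id}"
    )


def delete_datasource(server, api_version, site_id, datasource_id, auth_token):
    """Send the DELETE request and return the HTTP status code.

    auth_token is assumed to come from an earlier REST API sign-in call.
    urlopen raises urllib.error.HTTPError if the server rejects the request.
    """
    request = urllib.request.Request(
        build_delete_url(server, api_version, site_id, datasource_id),
        method="DELETE",
        headers={"X-Tableau-Auth": auth_token},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Hypothetical values, for illustration only.
print(build_delete_url("https://tableau.example.com", "3.4", "site-id", "datasource-id"))
```

Keeping the URL builder separate from the network call makes it easy to log or dry-run the list of deletions before anything is actually removed.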

How to decide what data sources to delete?

That is the hard part. The high-level selection criteria should be as follows:

  1. Not connected to any workbooks: see https://community.tableau.com/thread/230904
  2. Created at least a few weeks ago: do not delete newly published data sources
  3. No usage for a period of time (like 3 months): the data source may be used only for Ask Data, or accessed by others via Desktop. Join historical_events and historical_event_types and look for Access Type = Access Data Source with the specific hist data source id
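
The three criteria above can be sketched as a simple filter. This is only an illustration: the DataSourceInfo record and its field names are hypothetical (you would fill them from your own repository queries), and the 21-day and 90-day thresholds are my assumptions for "a few weeks" and "3 months".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DataSourceInfo:
    # Hypothetical record; populate it from your own repository queries.
    name: str
    connected_workbooks: int
    created_at: datetime
    last_accessed_at: Optional[datetime]  # None if never accessed


def is_archive_candidate(ds, now, min_age_days=21, idle_days=90):
    """Return True only if all three criteria hold."""
    # 1. Must not be connected to any workbook.
    if ds.connected_workbooks > 0:
        return False
    # 2. Must not be newly published.
    if now - ds.created_at < timedelta(days=min_age_days):
        return False
    # 3. Must have no usage within the idle window.
    if ds.last_accessed_at is None:
        return True
    return now - ds.last_accessed_at > timedelta(days=idle_days)


now = datetime(2024, 1, 1)
sources = [
    DataSourceInfo("orphan_old", 0, datetime(2023, 1, 1), datetime(2023, 6, 1)),
    DataSourceInfo("orphan_recently_used", 0, datetime(2023, 1, 1), datetime(2023, 12, 20)),
    DataSourceInfo("orphan_new", 0, datetime(2023, 12, 25), None),
    DataSourceInfo("still_linked", 2, datetime(2023, 1, 1), None),
]
candidates = [ds.name for ds in sources if is_archive_candidate(ds, now)]
print(candidates)
```

Only the first sample record passes all three checks; the others are kept because they were used recently, published recently, or are still connected to workbooks.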

Another way to identify data sources that have not been used for a long time is the following query:

select datasource_id, (now()::date - max(last_view_time)::date) as last_used
from _datasources_stats
group by datasource_id
having (now()::date - max(last_view_time)::date) > 90

Download Tableau Data Source Archiving Recommendation.twb

Conclusions: It is a good idea to delete not only old workbooks but also old data sources. This is especially important if the workbook has been deleted but the published data source still has a scheduled refresh.

The idea is to delete orphan data sources that were published some time ago and no longer have any usage.
