Query Tags: Turning Precision into Visibility


Background

Six years ago I wrote a post entitled “Actionable Intelligence” in which I described flipping the script on the data storytelling world. Rather than presenting a story about the data you have, you first consider what story you need to tell. Then you make sure you have the data to tell it. When I wrote “The Hidden Lens Every CFO Needs On Cloud Data Spend” I shaped a story of how technical precision can be turned into much-needed strategic visibility. Query Tags are the key to revealing that hidden lens of traceability. While that post was intended more for the big cheeses in the organization, this post is for the data people in the organization. You know: those moving the data, as well as those consuming the data.

The Foundation: A Flexible JSON Framework for Query Tags

In “The Hidden Lens Every CFO Needs On Cloud Data Spend” I explained that Query Tags provide the hidden lens into accountability and intelligence when it comes to cloud database costs. Each cloud database vendor varies in how you apply those query tags, but all are TEXT based. You could define your query tag to be “Bob the Big Cheese Told Me to Insert this Data.” While that might be true, it is not a good foundation from which to move forward. I suggest a JSON framework for those query tags instead: one that lets you identify all of the pieces of information necessary to answer the questions your organization can’t answer from the black hole that is the current bill.

Based on the expanded list I suggested in that post, your JSON structure would look like this, with simple field/value pairs:

{
  "business_unit": "",
  "project": "",
  "environment": "",
  "process": "",
  "priority": "",
  "application": "",
  "deployment": "",
  "process_owner": ""
}

If your organization is a giant conglomerate and business_unit won’t provide the clarity intended, perhaps you will need to add other fields like “region,” “territory,” “team,” etc. If everything your organization does is in the cloud, you won’t have need for the “deployment” value, as none of the entries would be “on-premise.” In other words, this is my suggestion for a framework, not necessarily the exact blueprint you should adopt for your company.
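Because the tag is just TEXT, nothing stops a typo like “busines_unit” from silently slipping through. If you generate tags programmatically, it can help to validate them against your agreed framework before they ever reach the database. Here is a minimal Python sketch; the field list mirrors the framework above, and the helper name is my own invention, not part of any Qlik or Snowflake API:

```python
import json

# The framework fields from this post; adjust the list to fit your organization.
FRAMEWORK_FIELDS = [
    "business_unit", "project", "environment", "process",
    "priority", "application", "deployment", "process_owner",
]

def make_query_tag(**values):
    """Return a compact JSON query tag, rejecting fields outside the framework."""
    unknown = set(values) - set(FRAMEWORK_FIELDS)
    if unknown:
        raise ValueError(f"unknown tag fields: {sorted(unknown)}")
    # Fill missing fields with empty strings so every tag has the same shape.
    tag = {field: values.get(field, "") for field in FRAMEWORK_FIELDS}
    return json.dumps(tag)

print(make_query_tag(business_unit="Sales", project="Q1_Forecasting",
                     environment="Prod", priority="High"))
```

Keeping every tag the same shape, even when some fields are blank, makes the downstream reporting on these tags far simpler.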

It’s always helpful to see an example, so here you go:

{
  "business_unit": "Sales",
  "project": "Q1_Forecasting",
  "environment": "Prod",
  "process": "Data Load",
  "priority": "High",
  "application": "Qlik Replicate",
  "deployment": "On-Premise",
  "process_owner": "Qlik Dork"
}

Tagging at the Source: Implementing with Qlik Replicate

Until now I’ve been agnostic in terms of implementations. For the remainder of the post I will be talking solely to Qlikkies and the content will be all about how they can implement this framework. Specifically, these instructions revolve around Snowflake as the cloud database provider.

As you know, Qlik Replicate involves push-down SQL, which means you can “tag” all of the SQL queries that are pushed down to Snowflake.

Step 1: Edit your Snowflake Endpoint for your project

Step 2: Click on the “Advanced” tab

Step 3: Click on Internal Parameters.

Step 4: After the “Internal Parameters” dialog is presented to you, type “afterConnectScript” into the search box. As soon as you finish, it will show a drop-down with “afterConnectScript,” and you need to click on it. The concept here is that once your session is connected to Snowflake, we want to set a session value that will be used for the remainder of our session. In other words, we want everything tagged with what we indicated should be used, i.e., our framework.

Step 5: The system will then allow you to input a value.

Step 6: Please refer to the Snowflake documentation for ALTER SESSION to understand all of the values that we could potentially set. The key is that it uses this basic syntax: alter session set query_tag = '{tag you want}';

As I shared before, it wants a TEXT value. For the framework example I used earlier, the Value for the afterConnectScript parameter would be:

alter session set query_tag = '{ "business_unit": "Sales", "project": "Partner Testing", "environment": "Test", "process": "Replication", "priority": "Trial", "application": "Replicate", "deployment": "On-Premise", "process_owner": "QlikDork" }'

Obviously you need to input your actual values, and not those from this example. Once you have input the value press “Ok.”

Qlik Replicate provides an afterConnectScript parameter that can be used to set the query tags for the session as data is being moved into your cloud database
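Because the afterConnectScript value has to be a single line of text, it can be less error-prone to generate it than to hand-edit the embedded JSON. Here is a short Python sketch; the helper name is mine, and only the alter session set query_tag syntax itself comes from Snowflake:

```python
import json

def after_connect_script(tag: dict) -> str:
    """Build a one-line afterConnectScript value that sets the query tag.

    Snowflake's query_tag is plain TEXT, so the JSON is embedded inside a
    single-quoted SQL string literal. Single quotes inside the tag values
    would need escaping, so this sketch simply rejects them.
    """
    body = json.dumps(tag)
    if "'" in body:
        raise ValueError("single quotes in tag values are not handled here")
    return f"alter session set query_tag = '{body}'"

tag = {
    "business_unit": "Sales",
    "project": "Partner Testing",
    "environment": "Test",
    "process": "Replication",
    "priority": "Trial",
    "application": "Replicate",
    "deployment": "On-Premise",
    "process_owner": "QlikDork",
}
print(after_connect_script(tag))
```

The printed statement is what you would paste into the parameter value, swapping in your own field values.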

Step 7: Press Save for your endpoint

Step 8: Now when you run this Replication task, all of the queries that are pushed down to Snowflake will be “tagged” and will provide visibility and clarity. No more black holes of just a bunch of queries against Snowflake.

Snowflake query details revealing the query tags that we sent for the session in which the query was executed

Closing the Loop: Query Tags in Qlik Sense

Your organization may very well interact with your cloud database much more for analytics/artificial intelligence than for pushing data into it. Thus it is even more vital that you utilize query tags for your consumption queries.

Step 1: The first step is to ensure that all of your Snowflake connections are configured to allow you to add the query tags needed. Simply edit each of your Snowflake connections and check the box under Miscellaneous to allow non-select queries. You may well have done that for some of your connections in the past for various reasons. So, don’t be surprised if you edit them and find this already checked.

Step 2: When you issue a LIB CONNECT TO statement, a “session” is established with Snowflake in order to communicate and issue queries. Again, please refer to the Snowflake documentation for ALTER SESSION to understand all of the values that we could potentially set. As with our use in Qlik Replicate, we want to alter that session and apply the JSON framework to our extraction queries against Snowflake. To do that we simply need to add a line of code like this. Notice that in addition to configuring the connection to allow non-select queries, we also have to add a statement that clearly indicates we don’t expect any return from our command:

LIB CONNECT TO 'Snowflake Connection Name';
SQL alter session set query_tag = 'query tag' !EXECUTE_NON_SELECT_QUERY;

In our case, imagine that we are going to be extracting the Sales data that was moved into Snowflake by Replicate. The particular project is to track Quarterly Achievement numbers. For Replicate I used a priority of “Trial,” suggesting you might have trial/low/medium/high type priorities. I also encouraged you to make the values fit your organization, so this time I used T1 as the priority, indicating it is Tier 1. Many organizations use labels like that to indicate “if this fails on the weekend or late at night, the on-call person better get it running again.” While our Qlik Replicate tasks run on-premise, this is Qlik Analytics in the cloud, thus SaaS is the deployment type.

SQL alter session set query_tag = '{ "business_unit": "Sales", "project": "Quarterly Achievement", "environment": "Production", "process": "Extract", "priority": "T1", "application": "Qlik Sense", "deployment": "SaaS", "process_owner": "QlikDork" }' !EXECUTE_NON_SELECT_QUERY;

Traceability

In many enterprise environments, one extraction application is utilized to extract many different things and place them in a QVD library. Feel free to utilize additional SQL alter session commands as needed to be as clear as possible. We don’t want anything hidden.

LIB CONNECT TO 'Snowflake Connection Name';
SQL alter session set query_tag = 'tag for Finance data extracts' !EXECUTE_NON_SELECT_QUERY;

FinanceTable1:
FinanceTable2:

SQL alter session set query_tag = 'tag for HR data extracts' !EXECUTE_NON_SELECT_QUERY;

HRTable1:
HRTable2:

SQL alter session set query_tag = 'tag for Sales data extracts' !EXECUTE_NON_SELECT_QUERY;

SalesTable1:
SalesTable2:

For your process value you might be tempted to always use EXTRACT, since Qlik Sense is always reading data. However, since the point of using Query Tags is to add traceability, feel free to be precise with values like FULL EXTRACT or INCREMENTAL EXTRACT. In addition, if you utilize the On Demand Application Generation (ODAG) feature or Dynamic Views, consider specifying that in your query tag with LIVE EXTRACT, ODAG EXTRACT, or whatever term you need to call that out, so that your organization can clearly understand the difference in load for in-memory applications versus these on-demand technologies.
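The payoff for all of this tagging comes on the reporting side. Snowflake’s ACCOUNT_USAGE QUERY_HISTORY view exposes each query’s session tag in its QUERY_TAG column alongside usage metrics, so once your tags are consistent JSON, rolling costs up by any framework field is trivial. Here is a Python sketch of that roll-up; the sample rows and credit figures are invented for illustration, standing in for rows you would fetch from QUERY_HISTORY yourself:

```python
import json
from collections import defaultdict

# Invented sample rows, shaped like what you might fetch from Snowflake's
# SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view (QUERY_TAG plus a cost metric).
rows = [
    {"query_tag": '{"business_unit": "Sales", "process": "Extract"}', "credits_used": 1.2},
    {"query_tag": '{"business_unit": "Sales", "process": "Replication"}', "credits_used": 0.8},
    {"query_tag": '{"business_unit": "Finance", "process": "Extract"}', "credits_used": 2.5},
    {"query_tag": "", "credits_used": 0.3},  # untagged queries still show up
]

def credits_by(field, history):
    """Total the cost metric by one field of the JSON query tag."""
    totals = defaultdict(float)
    for row in history:
        try:
            tag = json.loads(row["query_tag"])
        except (json.JSONDecodeError, TypeError):
            tag = {}  # empty or non-JSON tags land in one "(untagged)" bucket
        totals[tag.get(field, "(untagged)")] += row["credits_used"]
    return dict(totals)

print(credits_by("business_unit", rows))
print(credits_by("process", rows))
```

Because the framework gives every tag the same fields, the same function answers “cost by business unit,” “cost by process,” or “cost by process owner” with no extra work, which is exactly the visibility the bill alone can’t give you.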

Summary

This is absolutely additional work on top of your existing tasks. However, taking this time to ensure precise query tags makes your invisible work visible. It adds strategic value to your otherwise technical work. I see you … and you do matter.
