Visualizing Data at REST

My boy Sir Isaac Newton is famous for a few laws he wrote about motion. His first such law on the topic says:

An object at rest will remain at rest unless acted on by an unbalanced force. An object in motion continues in motion with the same speed and in the same direction unless acted upon by an unbalanced force

Apparently back in his day motion was not only a big deal, it was also out of control and needed some laws written to govern it. If I lived back then I would have been mad about apples falling on my head as well and probably would have come up with some pretty awesome laws of my own.

But the Qlik Dork is living in a day and age where 0's and 1's are falling out of the sky and are totally out of control. Not sure there is any governing body to prevent me, so I figured it was time for me to dictate a few Laws of Data:

Qlik Dork’s First Law of Data – “Data at REST is stupid. Put that data into motion by visualizing it and your company will pick up momentum.”

Qlik Dork’s Second Law of Data – “Data at REST is expensive. Put that data into motion by visualizing it and the data will start paying for itself.”

Qlik Dork's Third Law of Data – "Data at REST is useless. Unless you expect an 8th day of the week to be added to the calendar I've got bad news for you … your 'Someday' won't get here. So start making use of your data 'Today.'"


Big Data is kind of old news. How old? Even I jumped on the bandwagon all the way back in September of 2016. In this post I'm not going to talk about Big Data; instead, as the heading suggests, I'm going to write about BIGger DATA.

Marketing slides would say that Qlik has 10 points of integration with Cloudera. Woo-hoo. You can actually get the value out of all that Big Data with Qlik. But those of you reading this blog aren't novices. You already know that Qlik is a data ingesting beast and already knew that Big Data is simply another source of data. Qlik's Associative Model will gladly allow you to associate your 0's and 1's from cocktail napkins, spreadsheets, flat files, databases, EDW's and of course Big Data sources.

Kind of crazy that I can actually make light of that point, but let's face it, we already know that. Nothing new to look at there. Where I'm going today is beyond even that. I'm suggesting that the data Cloudera collects about your ingestion of Big Data is also a source of meaningful data, which is right now just sitting at rest but can and should be visualized so you can get value out of it as well. In other words, BIGger DATA.

Don't laugh too hard, but my use of the uppercase when using the word REST isn't accidental. It's intentional. What I'm driving at in this post is using the Qlik REST data connector to take advantage of all of the data sitting there, at rest, about what you are doing with your Big Data. Sneaky huh??? That's how I roll.

Visualizing Data at REST

Let me lay the groundwork for what I'm talking about. Cloudera provides a nice Cloudera Manager tool that you can utilize to see a lot of different things about your implementation. I can see what's going on across the system.

I can also drill into different systems to see what’s going on with them specifically.

There is data behind those things. Meaningful. Important. Useful. Data. Visualizing all of that wonderful information inside Qlik provides you the ability to get an overall and subsystem visual as well:

No big deal you say? Happy to click around and hunt and peck to find everything are you? Well, how would you find any issues in any of the systems that have to do with memory, or see their history? In Qlik I would simply use the Smart Search feature to find anything that has to do with the term "memory" but that's just me.

What about the metadata collected about the queries that are used to pull the Big Data? Doesn't that have value? OF COURSE IT DOES so you know I want to visualize that as well. Not only can we show you which queries were fired and how many times they were fired, we can show a distribution plot of how long the queries took each time. Glad I was the guy who selected "Hello World" in 18 ms and not 5.5 minutes. Hate to be that guy.

More importantly I'd hate to be the team that is just leaving all this data lying at rest on your system. If you don't understand why … see my 3 Laws of Data above.

Using REST to get your data at rest about your Big Data usage

I’d like to begin this section by sharing that all of the real work in coding that you will see was done by my buddy, and Cloudera implementation stud David Frericks. Yes, there are others at Qlik that are bigger data dorks than I. And David is the guy the Qlik Dork goes to.

Did you even know that Cloudera had a fully supported set of REST API’s that could be used to pull this wonderful data that is only resting as of now? Guessing the answer is no or it would be redundant to write the question. You can check it all out right here.

Let me walk you through a very simple REST API call. Let’s say we want to get all of the Impala queries that were run on the system. There is a REST API for that.

http://your_VM_IP_here:7180/api/v15/clusters/Cloudera QuickStart/services/impala/impalaQueries?from=2017-01-01&limit=1000&offset=0

We would implement that call using the Qlik Rest Connector like so. Notice that I’ve filled in my Cloudera system IP address and put my Cloudera system credentials in.

That will then allow me to see what that REST API call will surface in terms of data.

I say heck yeah, go get that for me … and boom it loads that data. It’s important to understand what’s happening behind the scenes at this point. Script is written that will execute and process that call.

Phooey you say? TMI you say?

No my friends here is where the rubber meets the road … with script you can overcome any kind of REST API barriers that may exist with Cloudera or with others … say those pesky times when they implement a MAXIMUM number of rows being returned.

Qlik can loop and get all of the rows for you. But wait there’s more.
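To make that concrete, here is a minimal sketch of what such a loop could look like in the load script. The connection name, the field names and the JSON path are all assumptions for illustration; the SELECT itself would normally be whatever the REST connector wizard generates for the impalaQueries call, with only the Url overridden on each pass.

LIB CONNECT TO 'Cloudera_Manager_REST';   // hypothetical REST connection name

LET vPageSize = 1000;   // the maximum rows the API will hand back per call
LET vOffset = 0;
LET vRowsBefore = 0;

DO
    // Re-issue the wizard-generated SELECT, overriding only the Url so the
    // offset moves forward by one page each time through the loop
    ImpalaQueries:
    LOAD queryId, statement, queryState, durationMillis;   // assumed field names
    SQL SELECT queryId, statement, queryState, durationMillis
    FROM JSON (wrap on) "queries"
    WITH CONNECTION (
        Url "http://your_VM_IP_here:7180/api/v15/clusters/Cloudera QuickStart/services/impala/impalaQueries?from=2017-01-01&limit=$(vPageSize)&offset=$(vOffset)"
    );

    // Stop when a page comes back short, i.e. we have read everything
    LET vRowsThisPage = NoOfRows('ImpalaQueries') - vRowsBefore;
    LET vRowsBefore = NoOfRows('ImpalaQueries');
    LET vOffset = vOffset + vPageSize;
LOOP WHILE vRowsThisPage = vPageSize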

What about those times where you have 1 REST API call that returns some key information and you then need to go get the details using another REST API call? Or when you have multiple systems implemented and want to get all of the information about the systems, the calls that were fired, their history, their blah-blah-blah-blah. Yeah, Qlik lets you do all of that because of its ability to do the ETL on the fly.

When I started the section with a shout out to my friend David Frericks it wasn't in jest. Dude provided a serious baseline of work in making the Cloudera REST API's fly and has produced some great work.

Follow up

"Hope isn't a Strategy." I read that recently in a book called "Wintality" by Baylor Barbee and thought it was an awesome quote to summarize what I see happening right now in most people when it comes to big data. Lots of HOPE. They are hoping that the time will come when they will learn, understand, begin diving into "Big Data." I'm assuming that you are reading this post because you are the kind that is more action oriented and you have, or are formulating, your strategy for visualizing your Big Data and hopefully now your Big'ger' Data as I've laid it out for you.

Be sure to check out some great examples of how Qlik brings Big Data to life.

Hungry for even more? Want to see a phenomenal example of how Qlik can call SOLR Search on the fly using the REST API, ingest the data and produce a web application using the Qlik Sense Visualization API calls? Then check this out.




The World “is” Flat

Oh sure those nerdy science types will give you explanations and "supposed" evidence that the world isn't flat. Blah-blah-blah.

But let's face it … we are humans and the way our brains work it's simply easier to see things in 2 dimensions. So unless you have a new-fangled holographic imaging system, even a sphere-shaped globe appears to you in 2 dimensions.

Oh sure those nerdy data visualization types will tell you that your eyes interpret shades of light and dark as distance  … but let’s face it … that just makes the situation worse for us. People fool us in paintings with shades of light/dark into thinking something is 3 dimensional when in fact it isn’t.

Solid Logical Argument for the World being Flat

Don’t trust them. I propose that in our weakened human state of brain power it’s easier if we just get our maps on a piece of paper that is flat rather than some hologram. Hard to argue with solid logic right? If it’s easier … then it must be true.

Alright! You win. The world isn't flat. But trust me, it would be a whole lot easier if it was, so when you get to the end of this post you are going to wish you had simply agreed with me.


Here is the problem: not only is the world not flat, it isn't round either. Did I mention that measuring it continually evolves as we get more and more precise instrumentation? Meaning that as we gain technical proficiency our perception of the shape of the globe actually changes.

It gets better. That shape you think continents have … they change.

But wait there’s more … there is this thing called Continental Drift which means those great big things that you think are locked down, are actually moving.

Where are you?

All this poses a serious issue for us … because a lot of how we define ourselves has to do with “where we are.” Tell me where you are so that I can come and verify you really exist.

Seriously! Go for it. Where are you?

You see my point yet? How are you even going to tell me where you are in a world that is ever changing its shape and position?

Enter Latitude and Longitude

Oh look, there is a smarty pants in Illinois pulling out his phone right now and he says he is at X Latitude and Y Longitude. Super duper. That helps me a lot … if you can also tell me what Datum was used to calculate the Latitude and Longitude you just gave me.

Because if you are just telling me a Lat/Long value it doesn’t really help me reproduce where you are to verify that you exist. I’ve got to be using the same system.

Never heard the word “datum” before? Yeah me neither until I recently ventured into this whole geo analytics stuff. But don’t worry you can find all you ever wanted to know by clicking here to read this great resource. It only took me 3-4 reads before I could pronounce most of the words so feel free to give it a shot.

Precision Matters

Once you are comfortable telling me your Latitude and Longitude and can tell me if that was determined by the NAD27, NAD83 or WGS84 Datum I can get close to your location. Close to proving you exist may or may not be good enough though.

Ok let’s be real … none of you need me to come and prove you exist. But let me ask you this “Do you care if you can find that address you are trying to get to?” Because Latitude and Longitude can actually come in 2 flavors: Centroid and Polygon.

No. For real. It's not as simple as you think. Take for instance a location that is part of some giant outdoor shopping mall, or a giant apartment complex. As you read about Centroid and Polygon calculations you will discover that sometimes the lat/long of an address is calculated as a guesstimate based on the entire range of addresses. (For your continued reading pleasure as you dive into Geo Analytics be sure to favorite this awesome GIS Dictionary site thanks to ESRI.)

If you check out public sites you will find that they provide you the latitude and longitude in both flavors.

Get the “point”

You get the pun there? It would be funnier if you could see my facial expressions as I said “Get  the point” but hey this is a written blog and I haven’t started my Dork Casts video channel yet.

Regardless of flat/round/sphere/ellipsoid … there are a lot, and I mean a lot, of publicly available data sets out there and frankly I want you to tap into as many as you can. In the embedded video you will see that I find one such site and say to myself “Gee I’d like to visualize this data.” It’s a “Shapefile” that contains the polygons (shapes) as well as the data that I need for coloring.

Here is where the Qlik GeoAnalytics application comes in really handy … I can load the shapefile in its ZIP format, ingest the data and the shapes, and get right to visualizing the data. Oh yeah, I actually walk you through all that Datum stuff in the video as well so perhaps it will help you make sense of it.

Proving you exist by your location is kind of a joke. But visualizing the world around you is serious business. If you are interested in population health like me then you likely want to tap into all sorts of Social Determinants of Health. But even if you are using your analytics/visualization skills in a totally different field … the point is there is geo data out there … just waiting for you to explore.


Removing the clutter

Ask any Data Visualization expert and one of the best pieces of advice that they will give you is "remove the clutter" so that your "data can tell the story." Would it be going too far if I suggested that perhaps you might even want to remove the underlying map when you are doing Geo Analytics?

I recently came across a post from a data visualization “guru” that I follow named Ken Flerlage. His post was entitled Visualizing Earthquakes.  As always Ken’s post was very clean and informative and could be printed as an “infographic” with little additional work.

Normally I would press "like" and move on, but for some reason I was very intrigued with something that caught my eye. So unlike the multitude of times in the past I felt like I "had" to get involved, so I pulled the data set down myself and started playing with the data using Qlik GeoAnalytics.

North and South America

The first thing that caught my eye with Ken's image, and that jumped off the screen when I visualized it, was that as large an area as North and South America are, it's crazy to me that the fault lines clearly lie along the Pacific Ocean side of both continents. Hard to miss isn't it?

In fact, here is where the title for the blog post jumped into my head … Did I even need the map? Did the map add value? Or did the map in this case actually detract from the story that might jump into your mind if it wasn't even there?

So I decided to experiment and I removed the map. You tell me. Does the map need to be there, or is there a pretty cool story that lies in the data itself?

Hopefully the question in your head is … Are there other areas in the world where the distinction is so incredibly clear cut?



What I found was a resounding … Yes there are. If you knew that the following image was for another continent could you guess?

If you guessed Antarctica, you might be geographically challenged. I’m guessing that you nailed it immediately … Africa.

Very clean northern and eastern border lines with very few points on the western edges of what is a gigantic continent.

With "normal" dashboards you probably already knew that removing the clutter was an important aspect of really telling a visual story with your data. But when you started reading, would you ever have imagined that removing the map itself might actually be a good thing to do?

But wait! The story gets more interesting, and this is why I believe data visualization is so intriguing. Yes, less is more, but sometimes more is more as well. There are times where visualizing the same thing in multiple ways can make a huge impact. You see, while you could take the image above and lay it right down along the continent of Africa on a map, the fact is that the points are inland from the coast. Yet they follow the coastline unbelievably well, wouldn't you say? And just like North and South America, the earthquakes spare the entire western edge of Africa almost entirely.

Infographics vs Analytics

I can't recall in the 18 months of doing this blog ever explaining the difference between visual analytics and an infographic. This is a great time. An infographic is meant to convey a story from the author's perspective. Visual for sure. But also static. If you read Ken's post you will see that he goes the extra mile in conveying information even about the sizing of the points. For my analysis I simply used a linear scaling of the point sizes based on the magnitude of the quakes. Ken does a great job of visually helping readers understand that the points should really range a gigantic amount in size. Informative. For my "analytics" I only cared about helping the end user doing the analytics realize that some points were 2.0 and others were 8.5. So I used scale and color but I didn't go the "extra mile" of "informing with detail" as Ken did. I've pulled some images and shared the "story" that I thought was intriguing but the application is very much one that is for analytics.

If I wanted to go the extra mile and I had research to back my “coastline” theories I could certainly turn it into an infographic, but that’s not what I do. That extra mile is what reporters and people like Ken do when documenting the story they want to share visually.  The infographic is a form of data visualization intended to inform and answer questions, while analytics are intended to answer questions you may have and also prompt new ones.

For instance in my application you would see the following image and you could immediately count 8 to answer the question "How many earthquakes has Australia had?" But more importantly it would hopefully spur you to ask "Why, on a continent so large, have they only had 8 earthquakes, nearly all along the southern edge, and yet nearly all of the surrounding area is covered with earthquakes?"

While many online infographics are becoming interactive to some degree, analytics would enable you to filter to what you want. “Show me only earthquakes over 6.5 in magnitude” “How many square miles were affected by earthquakes in the USA?” “How much money did earthquakes cost from 1990-2000 vs 2000 – 2010?” “Which decade from 1900 until present had the most earthquakes?” “If you added the magnitudes together what would a line chart look like over time?”

In this case that time component might have its own story to share. So of course you would want to see earthquakes animated over time. In this video I cover quickly the same points above, but I also show the animation. In one screen I animate the decades from 1900 until current and the points are sized and colored based on magnitude. Again only using a linear scaling, not a logarithmic scale, as the point sizes would be so big for some they would hide others. Each decade simply displays the earthquakes for that decade. I then do the same animation, but instead I color the points based on the decade. 1900 is a light yellow, while 2010-current is a deep red. The goal for this is that I can then aggregate the earthquake points starting in 1900 and you can tell where the points are for 1910-1920 as they are added and so on.
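If you want to experiment with the same idea, a pair of chart expressions along these lines would do it. This is only a sketch: the Magnitude and Decade field names are assumptions, and the two RGB values are simply a light yellow and a deep red.

// Point size: simple linear scaling straight off the magnitude field
Magnitude

// Point color: fade from light yellow (1900) to deep red (the current decade)
ColorMix1((Decade - 1900) / (2010 - 1900), RGB(255, 255, 200), RGB(165, 15, 21))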

Analytics would then involve running the same things but coloring the points based on the dollar values that were involved. Or the number of deaths inflicted. Always, always, always searching for ways to visualize in the best way for the data itself to tell the story and help you answer the next question you have.

Popcorn Time

As you enjoy some popcorn while watching this short video of the animation take the time to ponder a philosophical question: What clutter can you remove from your life so that “your personal” story shines brighter?




What in the world were you thinking???

I can tell you that as the father of two daughters, the grandfather of 7 and a 20 year veteran coach/instructor for thousands of adolescent female athletes I've probably said "What in the world were you thinking" at least a thousand times. You know what I mean … children so often do things that just completely defy all logic or known thought processes.

The irony is that as adults we say this mostly in jest as we roll our eyes. All the while knowing full well the problem wasn't what they thought but the fact that they didn't think. They simply allowed themselves to be distracted by something else.

Two years ago I began this blogging journey and I've greatly enjoyed every minute of research, every post and every conversation that was sparked about Data Visualization topics. But I have to be honest, watching the battle of hype versus hope play out right before my eyes on the Data Science and Big Data fronts has kind of driven me crazy. So as this blogging journey is about me, I find that I need to begin at least intermixing what I'm learning and how I feel about Data Science and Big Data in with my posts on Data Visualization.

The American Recovery and Reinvestment Act of 2009 pushed $20 Billion into data producing factories in the form of EHR systems. Contrary to the common myth, data storage isn't cheap. You need bigger data centers, with more racks of disks, which require more power, which requires more cooling, which requires more backups, more network bandwidth both internally and externally for redundancy, and more staff to manage the infrastructure. Ugh!

Not really sure what they were thinking. To my knowledge real factories don’t produce goods that can’t be consumed. Yet here many of you sit 7 years later with data centers full of unused 0’s and 1’s. Producing them at a frantic pace, but doing nothing with them. Because the push was to collect data but there was no plan on how to utilize the data.

Data Science


Over the past several years I have spent a great many hours consuming free training about Data Science via Coursera. Why would I read "Data Science for Dummies" when geniuses like Roger Peng and Jeffrey Leek of Johns Hopkins are teaching Data Science courses? Free courses! Free courses that I can take from the comfort of my own sofa I should add. When they recently authored Executive Data Science – A guide to training and managing the best Data Scientists, I figured I could afford to pay for their book since I had already MOOChed off of their expertise so much. I bring up their book because they had a profound concept that you may want to write in permanent ink on your monitor … "The key word in data science is not data, it is science. Data science is only useful when the data are used to answer a question. That is the science part of the equation."

No wonder these guys are professors at Johns Hopkins. Seriously as I start this series on Big Data and Data Science I wanted to ensure that we are all on the same footing. As I refer to the term “Data Science” it’s always, always, always going to be in regards to applying science to data to answer some business question.

Data Science, like anything new, has been greatly overhyped for sure. Many businesses jumped in with both feet and lots of money praying that they would magically uncover a "Beer and Diapers" or "predicting pregnancy" story of their own that would help their company make a billion dollars in the following quarter. What in the world were they thinking? Data science isn't black magic that you just conjure up answers with … it's science. It follows scientific principles. It takes discipline.

Unfortunately, thanks to some of those predictable failures born of a lack of reasoning, many, many more are sitting on the sidelines watching their business lose money hand over fist while ignoring the fact that data science is available. They don't understand how data science works so they simply ignore it instead. What in the world are they thinking?

Is data science for everyone? Of course not. But tucking your head in the sand while other companies use it as a competitive asset just isn't a good business practice. If you want to separate "hype" from "hope" so you know whether it is right for you, start with "What is the question I am trying to answer with data?" Follow that up with "Do I even have the data I need to answer it?" If the answer to both is yes, then allow the science to lead you to the answers that are hidden in your data.

Big Data

One of the reasons for so many dashed hopes and dreams is that some organizations started building massive data lakes thinking the more data they had, the better the answers they could get. They had no business questions in mind, they just figured if they assembled enough data files together on disk drives that problems would somehow solve themselves. Quite simply they ignored the science and focused on the data. I don't want you to make the same mistake.

If you are going to undertake anything new like Data Science or Big Data you have to understand that major changes like this require organizational change as well. They aren't just a technical matter. If you are going to go with a Big Data solution then for goodness sakes please start by following sound advice like that found in Benjamin Bowen's book titled Hadoop Operations. He makes it clear that organizations must combine three facets of strategy: Technical, Organizational and Cultural.

The difficulty for many who have succeeded in Analytics but are afraid to jump into Big Data is the simple fact that it's hard to truly understand what Big Data really is. I can't blame someone for not wanting to invest in something that they can't understand. At least "science" is a word that people can relate to, and that's why Peng/Leek focused on their phrase immediately as they began their book. It gives you a point of reference.

Unfortunately Big Data is an entirely different beast. I wish I could write something profound like "The most important word in Big Data is big" or "The most important word in Big Data is data" to help you focus. But the truth is the most important word in "Big Data" is neither big, nor data. The most important way to describe it is actually a set of 3 words: Volume, Velocity and Variety. However the hard part, even for the Qlik Dork, to explain is that none of them alone explains the concept; you need to refer to them in combination, and here is why:

Volume – Just because your organization has Gajigbytes of data doesn’t mean you need to turn to Big Data. Relational database systems, especially Teradata, can be grown to be as large as you will ever need so it’s not just volume that forces the issue.

Velocity – Simply means the speed with which the data is coming. There are all sorts of interfaces that handle rapidly moving data traffic so again, that alone doesn’t constitute a need for Big Data.

Variety – In the context of the Big Data field it is most often used to refer to the differences between structured and unstructured data. Unstructured data would be things like documents, videos, sound recordings etc. Don’t let me shock you when I say this but “I was storing those things into SQL Server 20 years ago as BLOB’s (binary large objects.)” So guess what, again this “variety” by itself isn’t what big data is about.

So what then is Big Data? It is a combination of all 3 of those things, and oh by the way you also need to include business components like time and money. Big Data is centered around the fact that you can use commodity hardware, including much cheaper disks than you would typically use for large Storage Area Network (SAN) disk infrastructure. The reason that it is typically considered "faster" in terms of storage is that it doesn't deal with transactions and rows, it simply deals with big old blocks of data, so massive files are a breeze to store. The fact that it is block/file oriented means it doesn't really matter what you throw at it. A stack of CSV or XLS or XML files, a bunch of streaming video or HL7 or sound, no problem. You throw and go.

So you can store a wide variety of data, quicker and at less cost than you would using a traditional RDBMS type system. Bonus is also the time savings because nobody in IT really needs to be involved in the process once the infrastructure is put in place. You can have data available and within no time your analysts or your data scientists can begin consuming the data. No requirements documents. No prioritization process. No planning meetings. Very little overhead. And oh by the way it allows the business to actually own the process of solving the problems that the business has. Crazy concept I know.


Enough of my musing, let’s just get down to a few practical examples.

Vaccinations and Side Effects

This week I met two of the most wonderful young Data Scientists. Liam Watson and Misti Vogt just graduated from Cal State Fullerton and delivered a presentation at the Teradata Conference in Atlanta, Georgia on a phenomenal use for data regarding the side effects of vaccinations. In the coming weeks I will be presenting their research and application, but I wanted to quickly plant a seed regarding their work that I think makes an excellent pitch for those of you who may be on the fence about proceeding with Data Science or Big Data.

Much of the “science” of what they did revolved around data that parents completed to report side effects after getting their child vaccinated. The form, like so many in the healthcare and other industries is a typical check this box for this condition, check that box for that condition … Other (Please type in) kind of thing. The check boxes would be considered structured data. The “other” would certainly be considered unstructured 0’s and 1’s that get manufactured in our EHR factories and left to accumulate dust.


If these two used Static Reporting they would have had no choice but to simply ignore the “other” category and count up how many of A, B, C, D or E were checked. But let’s face it if these two were ordinary I wouldn’t be talking about them. Instead they chose the path of using Data Science (which says you can’t leave data behind just because it doesn’t fit your simple report query model and isn’t clean) and they needed to use Big Data because it provides them with so many wonderful text analytics functions.

What they uncovered was that White Blood Cell Disorder which came from the hand input “Other” text box was the third highest side effect. To me that’s like gold. It’s a discovery that quite simply would be overlooked in a traditional environment because it didn’t fit the “we can only deal with structured data mold.”

There is a lot of time and effort expended in tracking physicians and beating them over the heads if they don’t sign off on documentation in a timely manner. I certainly understand that without their signature the organization doesn’t get paid. But I can’t help but wonder what gold may be lying in the textual notes that physicians dictate daily. Don’t believe your organization is ready for Data Science and Big Data to mine for that gold? Not sure what you are thinking.


I recently recorded a video showcasing a stunning use of Data Science and Big Data that was created by two of Qlik's partners, Bardess Group and Cloudera. The application demonstrates the impact that accumulating data quickly from a wide variety of sources like weather, flights, mosquito populations, suspected and reported Zika infections and supply chain data could have when brought to bear on a problem like Zika.

Right now most organizations are still struggling to understand their own costs and understand their own clinical variances. Move to a population health model? Unthinkable for them as they can’t produce the static reports nor consume them fast enough to understand their own patients, let alone begin consuming data from payers, the census bureau etc.

As you watch the video and you hear the variety of data sources involved in the Zika demo, imagine the time and energy that would have to go into a project to do the same thing in a traditional way. As much as I “like” the work they’ve done to help with the Zika virus issue (and the work is continuing with aid agencies and hospitals), I “love, love, love” the use case it makes for the healthcare world that we need to embrace Data Science and Big Data not run from it because neither fits our current working models.


Blaise Pascal, the 17th century mathematician, once wrote "People almost invariably arrive at their beliefs not on the basis of proof, but on the basis of what they find attractive." We have science that can help us find truth in data and yet we continue to perpetuate treatment plans based on myths and hearsay.

We know our current organizational structures are failing to keep pace with the onslaught of changes and the amounts of data we are generating. But instead of changing to grow cultures that are more data fluent, organizations are converting employees to 2×2 cubes so that they can "collaborate" more. No more data is being consumed but at least the status quo is maintained and employees now get to hear endless conversations with spouses and children.

Would I be wrong if I guessed that your organization has a backlog of hundreds of reports, while the previous 10,000 are seldom if ever read? What if I guessed that the morale of the report writers is at an all time low because new requests are far outpacing their ability to generate them?

In his book Big Data for Executives author David Macfie puts it pretty eloquently: "In a traditional system the data is always getting to you after the event. With Data Science/Big Data the goal is to get the information into your hands before the event occurs." Put simply, static reporting and traditional processes simply aren't designed to handle the crisis of overrun data centers. I'm not sure what in the world organizations that are doubling down on static reports are thinking.

To be honest I’m not entirely sure what in the world I was thinking taking so long to write this as my thoughts have been bubbling up for so long. If you have yet to actually begin researching or are among those burying your head in the sand and ignoring Data Science and Big Data then you know what is coming … What in the world are you thinking?


Visualizing Data that does not exist … aka Readmissions Dashboard

Many who make requests seem to have a belief that Business Intelligence is magic. They lose their ability to listen to logic and reason and simply ask you to do the impossible.


Pulling data from 18 different sources, many of which you don't even have access to? Child's play, like pulling a rabbit from a hat.

Turning bad into good and interpreting the meaning of the data? A little tougher, kind of like making your stunning assistant float in midair.

Creating a readmissions dashboard? Hey, we aren't Houdini.

That data doesn’t even really exist. Oh sure it exists in the minds of the people who want you to produce it out of thin air, but I’ve yet to see a single Electronic Health Record that stored readmission data. They only store admission data, not RE-admission data.

Patient Name | Admission Date | Discharge Date
John Doe     | 1/1/2016       | 1/4/2016
John Doe     | 1/7/2016       | 1/10/2016
John Doe     | 1/30/2016      | 2/4/2016

Those who want dashboards for Readmissions look at data like the above and talk to you like you are insane because in their minds it is clear as day that John Doe was readmitted on 1/7, 3 days after their first visit, and was then readmitted again on 1/30, 20 days after his second visit.

You try to explain to them that there is nothing in any of those rows of data that says that. They have filled in the missing data in their minds but in reality it doesn't exist in the EHR. They respond with "All you need to do is have the 'report' do the same thing and compare the admission date to the discharge date for subsequent visits." You respond with "Let's say I could make SQL, which is a row based tool, magically compare rows. What should I do about the following, which is more like the real data?"

Patient Name | Admission Date | Discharge Date | Patient Type
John Doe     | 1/1/2016       | 1/4/2016       | Inpatient
John Doe     | 1/7/2016       | 1/10/2016      | Outpatient
John Doe     | 1/30/2016      | 2/4/2016       | Inpatient

They say "Oh that's easy, when you get to the visit on 1/30 just skip the visit from 1/7 because it's an outpatient row and we don't really care about those, and compare the 1/30 admission to the 1/4 discharge." To which you respond "Well that's easy enough. Now I'll not only somehow make SQL, which can't compare rows, magically compare rows, but if it is an outpatient row I'll tell SQL to skip it and compare it to something 2 rows above, or maybe 3 rows above, or 10 rows above."

Just then you remember the reality is more complicated than that. In reality you aren't just comparing all inpatient visits (other than for fun); what you really care about is whether the visits were for the same core diagnosis or not.

Enc ID | Patient Name | Admission Date | Discharge Date | Patient Type | Diagnosis
1      | John Doe     | 1/1/2016       | 1/4/2016       | Inpatient    | COPD
2      | John Doe     | 1/7/2016       | 1/10/2016      | Outpatient   | Stubbed toe
3      | John Doe     | 1/30/2016      | 2/4/2016       | Inpatient    | Heart Failure
4      | John Doe     | 2/6/2016       | 2/10/2016      | Inpatient    | COPD
5      | John Doe     | 2/11/2016      | 2/16/2016      | Inpatient    | Heart Failure

You don't want to compare the 1/30 visit to the 1/4 discharge because the diagnoses aren't the same; you only want to compare the 2/6 visit to the 1/4 discharge, and you need to compare the 2/11 visit with the 2/4 discharge.

If you think this is like making a 747 disappear before a crowd of people on all sides, just wait it gets worse.

Not only does the EHR not include the "readmission" flags, it doesn't really tell you what core diagnosis the visit should count as. Instead what they really store is a table of 15-25 diagnosis codes.

Enc ID | ICD9_1 | ICD9_2 | ICD9_3 | ICD9_4 | ICD9_…. | ICD9_25
1      | 491.1  | 023.2  | 33.5   | V16.9  | 37.52   |

Good thing for your company you used to be a medical coder so you actually understand what the mysterious ICD9 or ICD10 codes stand for. You know for instance that the 491.1 really means “Mucopurulent chronic bronchitis.” It would be nice if that correlated directly to saying “This patient visit is for COPD.” But since we are uncovering magic why not explain the whole trick. You see if the primary diagnosis code is any of the following:

491.1, 491.20, 491.21, 491.22, 491.8, 491.9, 492.0, 492.8, 493.20, 493.21, 493.22, 494.0, 494.1, 496

 Then the visit may be the result of COPD but you also have to check all of the other diagnosis codes and ensure that none of them contain any of the following other diagnosis codes:

33.51, 33.52, 37.51, 37.52, 37.53, 37.54, 37.62, 37.63, 33.50, 33.6, 50.51, 50.59, 52.80, 52.82, 55.69, 196.0, 196.1, 196.2, 196.3, 196.5, 196.6, 196.8, 196.9, 197.0, 197.1, 197.2, 197.3, 197.4, 197.5, 197.6, 197.7, 197.8, 198.0, 198.1, 198.2, 198.3, 198.4, 198.5, 198.6, 198.7, 198.81, 198.82, 198.89, 203.02, 203.12, 203.82, 204.02, 204.12, 204.22, 204.82, 204.92, 205.02, 205.12, 205.22, 205.82, 205.92, 206.02, 206.12, 206.22, 206.82, 206.92, 207.02, 207.12, 207.22, 207.82, 208.02, 208.12, 208.22, 208.82, 208.92, 480.3, 480.8, 996.80, 996.81, 996.82, 996.83, 996.84, 996.85, 996.86, 996.87, 996.89, V42.0, V42.1, V42.4, V42.6, V42.7, V42.81, V42.82, V42.83, V42.84, V42.89, V42.9, V43.21, V46.11

If you have ever been asked to produce a Readmissions Dashboard you probably understand why I’ve correlated this to magic. Every time you think you know how to grab the rabbit by the ears to accomplish the trick, the rabbit changes into an elephant.

Fortunately your assistant isn’t the traditional 6 foot blonde, your assistant is Qlik. I’m going to explain how to make the 747 disappear in three easy steps that any of you will be able to reproduce:

Step 1

The heavy lifting for this trick actually involves the ICD9/10 codes. If you combine the 15-25 diagnosis codes into 1 field, then you can use it to more easily compare the values to determine what core diagnosis you need to assign to each encounter. Qlik helps you accomplish that with simple concatenation as you are loading your encounter diagnosis data:

ICD9_Diagnoses_1 & ', ' & ICD9_Diagnoses_2 & ', ' & ICD9_Diagnoses_3 & ', ' & ICD9_Diagnoses_4 & ', ' & ICD9_Diagnoses_5 & ', ' & ICD9_Diagnoses_6 & ', ' & ICD9_Diagnoses_7 & ', ' & ICD9_Diagnoses_8 & ', ' & ICD9_Diagnoses_9 & ', ' & ICD9_Diagnoses_10 & ', ' & ICD9_Diagnoses_11 & ', ' & ICD9_Diagnoses_12 & ', ' & ICD9_Diagnoses_13 & ', ' & ICD9_Diagnoses_14 & ', ' & ICD9_Diagnoses_15 as [All Diagnosis]

Step 2

One of the really nifty tricks that Qlik can perform in data loading is a preceding load. A preceding load simply means you have the ability to write code that refers to fields that don't exist yet and won't exist until the code is actually run. The following code is abbreviated slightly so that it's easier to follow logically, but the entire set of code is attached to the post so that you can download it. The "Load *" right below Encounters tells Qlik to load all of the other fields from the second load statement first, then come back and apply the code below it. This way we can construct the [All Diagnosis] field and refer to it within this code. You could repeat all of the logic for concatenating all of the fields for all 5-10 of the core diagnoses you want to track, or you could load the encounters and simply do a subsequent join load, but you don't have to. The preceding load makes your life easy and works super fast.


// This is the preceding load
Load *,
// If the primary matches then it's possibly COPD and if none of the other 14 are one of the values listed then it definitely is COPD
IF ( Match([ICD9 Diagnoses 1] , '491.1', '491.20' … '493.21', '493.22', '494.0', '494.1', '496') > 0
And WildMatch([All Diagnosis], '*33.51*', '*33.52*', '*37.51*' … '*V43.21*', '*V46.11*') = 0, 'COPD',
// If we found COPD great, otherwise we need to check for Sepsis
IF (Match ([ICD9 Diagnoses 1] , '003.1', '027.0', … '785.52' ) > 0
And WildMatch([All Diagnosis], '*33.50*', '*33.51*' … '*V43.21*', '*205.32*') = 0, 'Sepsis',
'Nothing')) as [Core Diagnosis];

// This is the regular load from the database or file
Load
[ICD9 Diagnoses 1],
[ICD9 Diagnoses 2] …..

Step 3

The final step, which many believe to be the hardest, is actually the easiest to do within Qlik. In fact, truth be told, when I was a young whipper snapper starting out on my Qlik journey I tried to do everything in SQL because I knew it so well, and did minimal ETL within Qlik itself until I found out about this Qlik ETL function. The function is simply called "Previous." It does exactly what it sounds like … it allows you to look at the previous row of data. Seriously, while you are on row 2 you can check the value of a field on row 1. In practice it works just like this:

IF(MRN = Previous(MRN) …..

How cool is that? How do I use it for solving this readmissions magic trick? Just like this:

IF(MRN = Previous(MRN),'Yes', 'No') as [Inpatient IsReadmission Flag],

If the MRN of the row I'm on now is the same as the MRN of the previous row, then yes, this is a readmission; otherwise no, this is not a readmission, it is a new patient's first admission. Actually that's the simplified version of my code.

My code actually thinks through how the results would need to be visualized. Besides an easy human language Yes/No flag, someone is going to want to get a count of the readmissions, right? Does the Qlik Dork want to have charts or expressions that would have to use IF statements to say if the flag = Yes? Of course not. I want the ability to have a field that is both human readable, Yes/No, and computer readable for counting, 1/0. That's where the magic of the DUAL function comes into play. It gives me a single field that can be used for both needs.

IF(MRN = Previous(MRN),Dual('Yes', 1),Dual('No',0)) as [Inpatient IsReadmission Flag],

Using the Dual data type allows me to provide the end user with a list box while also allowing me to provide very fast performing expressions:

Sum([Inpatient IsReadmission Flag])

How does the entire Readmissions load work? After loading the encounters, and allowing the preceding load to qualify the encounters into core diagnosis types, I simply do a self-join to the encounter table referring only to the inpatient records and ordering the data by the MRN and the Admission date and time.

Left Join (Encounters)
Load
EncounterID,    // join key back to the Encounters table
IF(MRN = Previous(MRN),Dual('Yes', 1),Dual('No',0)) as [Inpatient IsReadmission Flag],
IF(MRN = Previous(MRN),Previous([Discharge Dt/Tm])) as [Inpatient Previous Discharge Date],
IF(MRN = Previous(MRN),Previous(EncounterID)) as [Inpatient Previous EncounterID],
IF(MRN = Previous(MRN),NUM(Interval([Admit Dt/Tm]-Previous([Discharge Dt/Tm])),'#,##0.00')) as [Inpatient Readmission Difference],
IF(MRN = Previous(MRN),IF(Interval([Admit Dt/Tm]-Previous([Discharge Dt/Tm])) <= 30.0, Dual('Yes', 1), Dual('No',0)), Dual('No',0)) as [Inpatient IsReadmission within 30]
Resident Encounters
Where [Patient Type] = 'Inpatient'
Order by MRN, [Admit Dt/Tm];

If you are paying attention you'll notice that the above is simply our "for fun" counts to show all inpatient readmissions and has nothing to do with any of the core diagnoses. In order to perform that trick I do the same basic steps but I enhance my where clause to only look for encounters that have a core diagnosis of COPD, and I simply name my flags and other fields differently.

Left Join (Encounters)
Load
EncounterID,    // join key back to the Encounters table
IF(MRN = Previous(MRN),Dual('Yes', 1),Dual('No',0)) as [COPD IsReadmission Flag],
IF(MRN = Previous(MRN),Previous([Discharge Dt/Tm])) as [COPD Previous Discharge Date],
IF(MRN = Previous(MRN),Previous(EncounterID)) as [COPD Previous EncounterID],
IF(MRN = Previous(MRN),NUM(Interval([Admit Dt/Tm]-Previous([Discharge Dt/Tm])),'#,##0.00')) as [COPD Readmission Difference],
IF(MRN = Previous(MRN),IF(Interval([Admit Dt/Tm]-Previous([Discharge Dt/Tm])) <= 30.0, Dual('Yes', 1), Dual('No',0)), Dual('No',0)) as [COPD IsReadmission within 30],
IF(MRN = Previous(MRN),IF(Interval([Admit Dt/Tm]-Previous([Discharge Dt/Tm])) <= 90.0,'Yes', 'No'), 'No') as [COPD IsReadmission within 90]
Resident Encounters
Where [Patient Type] = 'Inpatient' and [Core Diagnosis] = 'COPD'
Order by MRN, [Admit Dt/Tm];

And just when you think I've pulled as much handkerchief out of my sleeve as it can possibly hold, I do the same steps again, this time for Sepsis.

Left Join (Encounters)
Load
EncounterID,    // join key back to the Encounters table
IF(MRN = Previous(MRN),Dual('Yes', 1),Dual('No',0)) as [Sepsis IsReadmission Flag],
IF(MRN = Previous(MRN),Previous([Discharge Dt/Tm])) as [Sepsis Previous Discharge Date],
IF(MRN = Previous(MRN),Previous(EncounterID)) as [Sepsis Previous EncounterID],
IF(MRN = Previous(MRN),NUM(Interval([Admit Dt/Tm]-Previous([Discharge Dt/Tm])),'#,##0.00')) as [Sepsis Readmission Difference],
IF(MRN = Previous(MRN),IF(Interval([Admit Dt/Tm]-Previous([Discharge Dt/Tm])) <= 30.0,Dual('Yes', 1), Dual('No',0)), Dual('No',0)) as [Sepsis IsReadmission within 30],
IF(MRN = Previous(MRN),IF(Interval([Admit Dt/Tm]-Previous([Discharge Dt/Tm])) <= 90.0,'Yes', 'No'), 'No') as [Sepsis IsReadmission within 90],
IF(MRN = Previous(MRN),IF(Interval([Admit Dt/Tm]-Previous([Discharge Dt/Tm])) <= 120.0,'Yes', 'No'), 'No') as [Sepsis IsReadmission within 120],
IF(MRN = Previous(MRN),IF(Interval([Admit Dt/Tm]-Previous([Discharge Dt/Tm])) > 120.0,'Yes', 'No'), 'No') as [Sepsis IsReadmission > 120]
Resident Encounters
Where [Patient Type] = 'Inpatient' and [Core Diagnosis] = 'Sepsis'
Order By MRN, [Admit Dt/Tm];

And then for AMI. And then for CHF. And then for … Oh you know the handkerchief can go on forever and eventually we end up with a data model that includes all of these awesome fields that didn’t exist when we began so that we can actually do our work.


Voila a Readmissions Dashboard

Not only can we then provide a really nice looking dashboard which includes accurate statistics, we can do it using very simple expressions that are incredibly fast.
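For example, because the flags were built with Dual(), chart expressions can stay as simple as straight sums. These are only illustrative; the rate expression assumes you want COPD 30-day readmissions divided by all COPD inpatient encounters, which may or may not match your organization's definition.

// 30-day inpatient readmission count
Sum([Inpatient IsReadmission within 30])

// Hypothetical COPD 30-day readmission rate
Sum([COPD IsReadmission within 30]) / Count({<[Core Diagnosis]={'COPD'}, [Patient Type]={'Inpatient'}>} DISTINCT EncounterID)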



Click this link to get the entire Readmissions Code start script: ReadmissionsCodeScript


Have you ever wondered …

Have you ever wondered what events happen to patients after a particular surgery is performed?


Well I did. Like I seriously can't sleep when I start wondering about things like that. I start believing crazy things like we can change the world by using analytics. What do you do when you get crazy analytical questions in your head? Do you just let them go or do you dig and scratch and claw until you pull the data together and solve the puzzle?

In this case even though it’s just a hypothetical example for a blog post I still worked crazy hours setting up the data, building the application, filming the video and writing this post. Why? Because I think there is huge value in tracking not just the variances in costs and timing for individual procedures but in analyzing an entire series of events as well.

Notice I used the word "events" and not just "procedures." Certainly it would be nice to know if having 1 procedure leads to another procedure in 75% of the cases for a physician. But wouldn't it also be nice to know how often a procedure leads to a patient having a Code Blue? Or having to have a tube placed? You know … KEY MEDICAL EVENTS in a patient's stay. Or … even their return after a stay?

Ok now that we are all agreed me working crazy hours to set this up is a valuable exercise let’s examine what I will demonstrate in my video.

  1. I use an Aster NPath SQL-MR query just like in a previous post to process a set of surgical event data that I’ve loaded.
  2. I also take advantage of Qlik's ability to do some cool ETL things on the fly and I capture the First Event and the Last Event so that in the UI I can choose which procedure I want to start with, or likewise in your world you could select the last event to occur and find the various paths that preceded that event's occurrence (see the sketch just after this list).
  3. While I was at it I also load in some sample patient demographic information to demonstrate that the advanced analytics you can do with Teradata Aster doesn’t have to be visualized in a vacuum. Of course you will want to take advantage of the Qlik Associative model and load data from as many sources as needed.
  4. The application consists of two basic screens. The first is a blah-blah-blah you can filter the data using demographic information and see the results of the NPath query visualized in a Sankey Diagram just like you would expect. The second screen is more a “Are you kidding me I didn’t know you could do Alternate States in Qlik Sense like you can in QlikView” kind of thing you would expect from a Qlik Dork. I demonstrate the ability to compare the event paths between different patient sets thanks to the great extensions built by Svetlin Simeonov.
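Here is a minimal, self-contained sketch of that First Event / Last Event idea. The table, field names and delimiter are all hypothetical; in the real application the path string would come back from the Aster nPath query rather than an inline table.

// Hypothetical surgical paths; nPath typically returns the whole path as one delimited string
SurgicalPaths:
Load *,
    Trim(SubField(EventPath, '>', 1)) as [First Event],
    Trim(SubField(EventPath, '>', SubStringCount(EventPath, '>') + 1)) as [Last Event];
Load * Inline [
PatientID, EventPath
1001, Knee Replacement > PACU > Code Blue > Discharge
1002, Knee Replacement > PACU > Discharge
];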

I could have just shared the video, but where is the fun in that? I had to do a little creative setup so that you would understand what you were watching.

Do I think you are going to run right out and start building an application like this to analyze surgical events?

Of course I do. I’m a dreamer. I wouldn’t put this kind of effort into something if I didn’t believe it would spark an interest in at least a few of the readers to really start putting advanced analytics to work. Perhaps not for this specific situation but certainly there is some other big problem you’ve wanted to tackle that is like this. You have all of the pieces you need at your fingertips … so GET GOING!



Thousands, and Millions and Billions … Oh My!!!!!

When most people think of Qlik they think of our patented Qlik Indexing Engine having all of your data in memory. I love demonstrating the lightning fast speeds and responsiveness with hundreds of millions of rows of data. More and more recently though I’m getting the smiles mixed with “That’s awesome but can you handle billions of rows of data?”

C’mon really????? Billions of rows of data? Gosh that’s an awful lot of data. I’m afraid.

Just kidding even that much data doesn’t scare me.

In fact it thrills me.

Gives me goose bumps to think about the kind of decisions that can be made when that much data is made available to the analysts and the decision makers. It also provides an opportunity for me to discuss one of the least known features that Qlik offers. It’s called Direct Discovery and it allows you to consume even billions of rows of data.

Direct Discovery

Direct Discovery is a two step process. In step 1 Qlik reads enough information to allow the end user to select a cohort. Step 2 then uses the primary key information for that cohort to go back to the massive data store and read all of the details live.

Oh wait you want an example? More details? Well since you asked so nicely.

Typically with Qlik you would read all of your data from the source with a command like:

SQL Select {my fields} from {some table};

It would bring all of the data back, perform our Qlik magic on it to compress it in memory and you would be off to the races. With Direct Discovery the query is different and uses a different syntax. You start with something like this:

When the data load encounters that Qlik actually issues 2 separate commands to the source:

  1. Select distinct record_id
  2. Select distinct procedure

Why? Because it's easier and faster of course. The data source only has to prepare a minimal number of records. Your network only has to transmit a minimum amount of data. Finally, Qlik only has to read a minimum amount of data.

The final part of the syntax would be something like:
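For instance, with hypothetical field names standing in for whatever detail columns you actually need:

// the field names below are hypothetical
DETAIL
    patient_name,
    surgeon,
    event_description,
    event_timestamp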

from surgery_events;

The fields that you identify in the DETAIL section of the command are usable immediately within Qlik despite the fact that it doesn't actually retrieve the data for them. You can see the field names in the data model viewer; they just show as having 0 rows of data. You can see the fields in a field list. You can add the fields to charts. There just isn't any real data for them. Yet anyway.

Your application is then designed to allow the end user to select a cohort using the DIMENSION fields in some way and then Qlik will go and retrieve the data live from the data source for that cohort.

I've had so much fun working with Teradata Aster lately that it only made sense for me to use my Teradata database as a data source. It provides a robust, high performance and highly reliable storage mechanism for those with massive amounts of data. In the video I use the command above to extract the dimensions, select a cohort of patients, then allow Qlik to extract the data live. Just for fun I also utilize the Aster Management Console to show you the commands that Teradata processes from Qlik to further solidify how it all works. Kind of the extra step you'd expect from me.

You want more don’t you?

The ferocious appetite in you to consume massive amounts of data wants more information doesn’t it? You can check out all of the details on the Qlik Sense help page for Direct Discovery:

The following post contains a fantastic PDF document explaining even more including some nifty variables you can use like the one I documented in the video:

Yes you can even use the Direct Discovery feature for cases where you want closer to real time information from smaller sets of data. You know those situations where you only have a few hundred million rows of data but you still need the functionality of pulling live rather than having pre-loaded all of the detailed data.


Visualizing Advanced Analytics

Advanced Analytics with Aster

I recently stumbled upon Teradata’s Aster and I’m pretty fired up. It turns out there is an entire community dedicated to helping data visualization people like myself learn how to implement advanced analytic functions. The site includes a link to download Aster Express free of charge and includes a slew of great training videos.

Click here to see the Teradata Aster Community

I can almost hear the Data Scientists reading this post laughing at me for just discovering that. Meanwhile all of the Data Visualization people stopped reading and have already clicked the link and started downloading.

Visualization with Qlik Sense

Well if you Data Scientists are so cool, did you know that there is likewise an online community site dedicated to helping you learn how to visualize your super cool analytic results? Well did you? The Qlik Sense Community offers similar free downloads for the product as well as a slew of great training videos.

Click here to see the Qlik Sense Community

Guess me and the other Data Visualization peeps get the last laugh after all.

Kidding, and sharing of links aside, this is a serious post about how Data Science and Data Visualization can be married through the partnership of Qlik Sense and Teradata Aster. They are an easy and natural fit. Why?

Because Aster uses an SQL’ish syntax they call SQL-MR. Qlik Sense can easily fire any native SQL-MR directly against Aster, retrieve the results and then visualize them. No need to build out views. No need to save the results into tables. Simply fire the SQL-MR queries directly as written.
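In load-script terms that can be as simple as the sketch below. The connection name, table and column names are assumptions, and the nPath arguments are only indicative of the kind of SQL-MR you might paste in; the point is that whatever SQL-MR you have written gets handed to Aster exactly as-is.

LIB CONNECT TO 'Aster_ODBC';   // hypothetical ODBC connection to Aster

WebClickPaths:
LOAD
    path,
    path_count;
// The SQL-MR below is passed straight through to Aster; no views, no staging tables
SQL SELECT path, count(*) AS path_count
FROM nPath(
    ON bank_web_clicks
    PARTITION BY customer_id
    ORDER BY click_timestamp
    MODE (NONOVERLAPPING)
    PATTERN ('PAGE+')
    SYMBOLS (TRUE AS PAGE)
    RESULT (ACCUMULATE (page_name OF PAGE) AS path)
)
GROUP BY path;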

By offering a complete set of open APIs, Qlik Sense provides developers around the world the ability to construct visualizations that enhance what is available natively in the product. Like what, you ask? Well, a Sankey for one thing, so you can visualize paths. Network/graphing objects for another, so you can visualize networks. Like … oh, go see for yourself at:

Click here to see the Qlik Sense Community for Extensions

For your viewing pleasure

I could write and write and write and bore you to tears … or … I could take advantage of this chance to show off my cool new Qlik Dork video stinger and demonstrate the functionality … visually.

In a mere 3:57 I take the pure NPath SQL-MR query that John Thuma demonstrated in the Aster training video series for bank web clicks data and implement it inside of Qlik Sense. I then take the results and display them both in raw form and in a Sankey.

Wowed yet? Don’t be; that’s just me getting warmed up. In a paltry 3:05 this second video demonstrates how you can modify the NPath query so that the results aren’t aggregated. Why wouldn’t I allow it to aggregate the million-plus paths? So that I can tie the raw paths together with customer demographic information, allowing you to then discover the paths for selected cohorts. No way!!!

Yes way. C’mon, I’m the Qlik Dork; of course I would go the extra step for you. I even utilize a mapping object to select customers from selected states, all while the Sankey diagram is being updated to show the paths that were returned from Aster based on the selections.
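If you’re wondering how the raw paths and the demographics end up linked, it’s nothing exotic: the un-aggregated nPath results and the demographics table share a customer key, and Qlik’s associative model does the rest. A rough sketch, with all names hypothetical (the videos may wire it up differently, e.g. pulling the paths live):

// Raw, per-customer paths (the un-aggregated nPath results)
CustomerPaths:
SQL SELECT customer_id, path FROM npath_raw_results;

// Demographics keyed on the same field; map/state selections happen here
CustomerDemographics:
SQL SELECT customer_id, state, age_band, income_band FROM customer_demographics;

Because both tables contain customer_id, selecting states on the map narrows the customers, which in turn narrows the paths the Sankey draws.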

But wait! There’s more.

I know you are now fired up and you want more. Don’t worry, my friends, I’m just getting started down this path of marrying Data Science and Data Visualization. What can you expect next? Keep it a secret, but given my background in healthcare it may just have something to do with utilizing an NPath SQL-MR query in Aster to analyze the events for surgical patients … but you didn’t hear it from me. After all, it’s not like I’m trying to actually help people do real-world stuff like that.



To achieve, or not to achieve action

Portrait of William Shakespeare

That is the question.

At least it’s the question that we in the business intelligence community should be focusing on. Why weave my title so closely to one of the most famous lines by William Shakespeare?

Simple. Our ability to drive actionable intelligence relies heavily on our ability to weave a story around the data insights that we have discovered.

Discovering that we have 10 serious issues in our company, plus $5 in your pocket, will get you a cup of coffee at Starbucks. But being able to share the information about even one of those issues in a way that leads to actual change will put such a spring in your step that the coffee will be unneeded.

In her fantastic book “Storytelling with Data,” author Cole Nussbaumer Knaflic introduces two great phrases that really brought clarity to me: Exploratory Analysis vs. Explanatory Analysis.

Exploratory Analysis is the set of actions we take to do data discovery. It’s the drilling around. Poking under the hood. Using our human intuition to question the data. And the lights that dawn as a result.

Explanatory Analysis, on the other hand, is the art of using the data to communicate a story that helps induce action from those who have the power to take it. It involves our ability to use one of the oldest forms of human communication, storytelling, which has sadly become a lost art.

Emotional Call to Action

Storytelling can involve some very in-your-face messages as a way to ensure that leadership has a call to action. For example, imagine that we’ve spent a few days consuming clinical and financial data using a dashboard similar to the following, with multiple linked screens, that we utilized to find an issue with a particular set of selections.


We could hold a meeting and put leadership to sleep showing them how cool our ability to navigate is, or we can simply lead with a slide like the following that grabs attention.


You probably don’t want to use humorous sarcasm in your presentation to point the finger at a group, but I think it works for this post as you kind of expect it from me. The slide includes enough details to incite some action, and by all means include the actions you want to see taken. Of course you may have to prove your details, and that’s exactly why the Storytelling feature in Qlik Sense is so valuable: you can jump in and out of your story to demonstrate the exploratory analysis you have done to support the explanatory analysis you are using in the meeting.


Perhaps your data doesn’t really require such an emotional tug to ensure action is taken. Perhaps all you are trying to do is provide some narration to draw attention and help explain the data.

Consider the following chart before and after a few narrative elements are added to help the audience focus on the important things:





As I share on my About page, I am far from an expert on any of the things I write about. I’m reading. Learning. Growing. Every single day, just like you, with the help of many others in the industry. Data is my thing and I own that. But I will be honest and tell you that providing narration for my stories is not something that comes naturally to me.

In fact the key points above … yeah, I stole them. Well, not so much stole them as copied them to the clipboard and pasted them into my storyboard from what I think is one of the coolest new pieces of technology that I’ve seen in a long time. It’s a narration extension for Qlik Sense: you simply tell it which chart you want it to consider and it does the narration for you. That is a serious help to someone like me who is trying to learn how to help my audience understand the data that I’m presenting to them.

The fact that Qlik chose to construct its architecture using an open API, and the fact that anyone who can code can gain access to the patented Qlik technology while adding value through their own secret sauce, is what makes it possible for a group like Narrative Science, which is blazing trails in the field of natural language, to build such an awesome extension.

The following video will let you see the Narrative Science extension in action. If you are a Qlik customer you can download this exciting new object from this download location, which includes instructions on how to install it and has its own video demonstrating its powerful capabilities.

To achieve, or not to achieve action

There was a day when all we had to do in our field was surface data. Yeah, those days have long since passed. Our jobs now entail not only finding the needles in the data haystacks but helping our leadership teams understand them so that they can take action. I challenge you today to grow not only in the field of Exploratory Analysis but also in the emerging field of Explanatory Analysis.

Become a storyteller.

Add narration to your charts rather than just pasting them into presentations because you think they look pretty.

Use your newly developed skills to “incite action” and effect real change in your organizations.

Finally, quit being selfish and keeping my tips to yourselves. For crying out loud, start sharing these pages with others.


A Bunch of Whiny Brats

Ever have one of those days where you feel like you are surrounded by a bunch of whiny brats?

No, I’m not talking about your children (or grandchildren in my case). I’m talking about your leadership team.

You beat your head against the wall to surface data from a cocktail napkin and merge it with 147 other data sources from database systems, Excel sheets and external data sources on the web, and you make it work. You put all of the data into an amazing analytical application that is truly Functional Art, one that even Alberto Cairo would give you two thumbs up for. But without even so much as a pat on the back for the great job, the first response is “We want something simpler. We already have an Executive Portal; can’t you just embed those charts into the site we already have a link to?”

A bunch of whiny brats, right? It’s just one more link to save to your favorites. It’s just one more application to learn. But noooooo, they want to press the easy button because, unlike you who has to learn 189 things per day to stay current, they don’t want to change their delicate little processes.

Embedded Analytics

Well, don’t be dismayed my friend, there are whiny brats like that all over the world and the Qlik platform enables you to support them. I’m not joking. The Qlik APIs enable you to take the gorgeous work you’ve done and embed the KPIs or charts directly into your existing portal, and in this quick six-and-a-half-minute video I show you exactly how to do that.

Ok, now how could anyone complain about this, right? You can embed your genius analytical solutions right into the portal they use every day. You can embed Finance-related data right into their SharePoint page and it relates and allows interaction.

C’mon even your leadership team has to stand back in awe. Amazed at your skill and the innovation of Qlik’s platform to support that kind of functionality. Right?

Wrong! These are whiny little brats you are dealing with. Their first reaction is “That’s pretty nice, but I don’t want to see the same 5 charts that Bob sees. I need to control my own dashboard because I’m the center of my universe.”

Are you kidding me??? They have access to key information on their mobile device from their executive portal and that isn’t enough?

No it’s not enough.

The reality is that your leadership team aren’t whiny little brats; they are savvy business people who need to constantly push the envelope. They need access to the company data that has been kept from them for years. For crying out loud, their mothers use Pinterest every day to “pin” recipes and come back to them whenever they want. Yet there you stand telling them that every time they want something added to or removed from the portal they have to fill out a ticket request and wait for you, the bottleneck, before they can access the information they need to do their job?

Self Service Dashboards

C’mon, this is Qlik we are talking about. A company named by Forbes as one of the Top 10 Innovative Growth companies. Of course they can provide Self Service Dashboard capabilities. What do you think they are doing, just helping you visualize data on your own workstation?

How simple can they make it? You know that Pinterest site that has had “pins” pressed over 50 billion times … yeah … they’ve made it that simple, and in this short 4-minute video I’ve made it just as simple for you to see how.

An Innovative Platform

“There are no dreams too large, no innovation unimaginable and no frontiers beyond our reach.” – John S. Herrington

“There’s a way to do it better – find it.” – Thomas Edison

Unless your leaders can consume it, your company’s data is not an asset; it is a very expensive liability. Qlik is providing you a platform where the only limit on how you surface it is your own mind. You have, right now, at your disposal the tools to surface your data via embedded analytics on your existing portals, as well as to let your staff surface only the data they are actually interested in via their own personalized dashboards, simply by “pinning” objects.

Just building data visualizations isn’t the answer. Presenting Actionable Intelligence in a way that can be consumed and acted upon is the goal. Now that you know what’s available, it’s just a matter of whether or not you want to innovate the way data is consumed within your company.
