Step by Step… Into Garbage Data Quality: My Smart Knee Experience
My Smart Knee
In December of 2023 I had a fight with a step ladder, and it won. As a result I’ve needed 3 surgeries, the most recent being a total knee replacement (TKR). Doesn’t sound fun, but as a data guy I was really excited that my surgeon was going to install a smart knee. No, not a bionic knee like the Six Million Dollar Man. Just like any other TKR device, but with the added value of a Bluetooth communication device in it that could report data on my progress. Visions of bar charts, trend charts, and best practices for visualizing knee data danced through my head as the surgeon went to work.
Alas, I have no best practices in visualization to offer you. Because the data obtained is hot garbage.
In their book “Data Quality Fundamentals: A Practitioner’s Guide to Building Trustworthy Data Pipelines” the authors used a phrase I will never forget:
“Data for the sake of data is as useful as a fish riding a bicycle.”
The irony of it is that I’m at the stage in my recovery where I’m supposed to start using a stationary bike to gain flexibility and motion. As I go through pain later today doing that, at least I will have a smile on my face from the image I generated for this post.
Freshness
One of the overlays in a Venn Diagram for Data Quality and Modern Data Stack Pipelines would be freshness. If data isn’t flowing then the data pipeline is clogged up, and we all know how bad it smells when pipes are clogged up. The same goes for Data Quality. If the data is stale it simply can’t be trusted, and nobody is going to act on it. Before even seeing a single data value, my brain was already triggered … what do you mean you don’t know when you last talked to the device? The single easiest piece of data I would expect from a “smart device” is the ability to “log/trace” the communications. But here I was confronted with the inability to even know how fresh the data was when I looked at it. For the sake of this post I will refer to a Dork Trust Scale that goes from 0 – I would never act on this data to 10 – I will believe this data and use it to motivate me. I was already lowering my expectations and we will say I was at an 8 on my Dork Trust Scale.
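A minimal sketch of the kind of freshness check I expected, in Python. The function name `is_fresh`, the 24-hour threshold, and the timestamps are assumptions made up for illustration; nothing here comes from the device's actual software.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_fresh(last_sync: Optional[datetime], now: datetime, max_age: timedelta) -> bool:
    """Return True only when the last sync time is known and recent enough."""
    if last_sync is None:
        # If nobody can say when the device last reported, treat the data as stale.
        return False
    return now - last_sync <= max_age


# Hypothetical timestamps, purely for illustration.
now = datetime(2025, 1, 6, 12, 0, tzinfo=timezone.utc)
threshold = timedelta(hours=24)

print(is_fresh(None, now, threshold))                       # unknown last sync
print(is_fresh(now - timedelta(hours=3), now, threshold))   # synced 3 hours ago
print(is_fresh(now - timedelta(days=2), now, threshold))    # synced 2 days ago
```

The point of the design is that an unknown last-sync time counts as stale. Not knowing when you last heard from the device is itself a data quality failure.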
Data vs Gut Driven
I’ve been in the industry long enough that I’ve challenged a lot of people with the old “You need to be more DATA DRIVEN, and less GUT DRIVEN” phrase. Yet, there I sat as the CEO of my Smart Knee looking at this chart. Instantly my gut took over and said, “This is WRONG,” “This CAN’T be right,” “I DON’T TRUST this.”
I’m not trying to give you TMI, but it takes me at least 15 steps to get from my bed to the restroom. It takes 20 to get from my reclining chair to the kitchen or the bathroom. Thus, my gut was telling me … this data was hot garbage. Before completely lowering the value on my Dork Trust Scale, I had to look at the “labels” on the chart. It was titled “qualified steps.” What does that mean? So, I pressed the question mark in the upper right corner expecting to see the semantic definitions for their “terms.” Bummer was, it took me to a contact phone number. I did call it, spent 30 minutes on the line, and was told that the term “Qualified Steps” was rather technical. While I only partially understood their definition, the point I tried to get across to the person on the line was that I had left my house on Saturday and walked quite a bit, and walked many times on Sunday, actually feeling good that my “gait” was improving. But they showed 0 steps. The complete absence of values for 2 whole days was simply too much for my gut to handle. My Dork Trust Scale had dropped to a 5.
Data Fidelity
Another core data quality concept is that of relationships. Fields/values don’t exist in a vacuum. Many years ago my dear friend Prashant Natarajan introduced me to the wonderful phrase “data fidelity” in his best-selling work: “Data fidelity is the measure of how faithfully a dataset preserves both the intrinsic correctness of individual fields and the contextual dependencies between them, ensuring that each field’s value retains its intended meaning only when the related fields that give it context are also valid.” In the sales world, it would mean you can’t act on the value in the field Discount Percentage if you don’t have the value for the Original Price.
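To make the sales example concrete, here is a small Python sketch. The function name `discount_is_actionable` and the validity ranges (0 to 100 percent, price above zero) are assumptions of mine for illustration, not part of any quoted definition.

```python
from typing import Optional


def discount_is_actionable(discount_pct: Optional[float],
                           original_price: Optional[float]) -> bool:
    """A discount is only meaningful when the price that gives it context is valid too."""
    if discount_pct is None or original_price is None:
        # A missing related field strips the other field of its meaning.
        return False
    # Assumed validity ranges, chosen for illustration only.
    return 0 <= discount_pct <= 100 and original_price > 0


print(discount_is_actionable(15.0, None))    # discount with no original price
print(discount_is_actionable(15.0, 200.0))   # both fields present and valid
```

A perfectly valid 15 percent is still useless on its own. The check fails the pair, not just the single field.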
In the case of my knee I looked at the chart for the Distance Traveled and my heart sank. You see, Distance Traveled and Qualified Steps taken are directly related. Yet I was being shown that I doubled my qualified steps on Monday vs Friday, while the value for Distance Traveled didn’t even show up. There was NO data fidelity. Not only that, but if you look at the labels in the chart you will see the developers clearly don’t understand basic axis values.
But it gets even worse. Walking Speed is also related to Steps. Ignoring the axis values being incorrect (after all, it’s a smart knee, not a smart visualization tool), you will notice that my speed seemed to improve slightly on Saturday. Ordinarily, you would think, “Good for you, Qlik Dork, you were getting better.” After all, that’s kind of the goal: to use it as motivation to continue improving.
But this chart actually has the opposite effect. How did I improve on my speed when in fact they reported 0 steps for Saturday? Worse, they reported steps on Monday, yet no distance traveled and clearly no speed.
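The contradictions above are exactly the kind of thing a simple cross-field check can catch before a chart ever gets drawn. Here is a rough Python sketch. The `DailyReading` structure, the field names, and the numeric values are all hypothetical, made up for illustration. I don't know how the device actually stores its data, and the numbers are not my real readings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DailyReading:
    day: str
    qualified_steps: Optional[int]
    distance: Optional[float]   # unit left unspecified; hypothetical field
    speed: Optional[float]      # unit left unspecified; hypothetical field


def fidelity_issues(r: DailyReading) -> List[str]:
    """List the ways steps, distance, and speed contradict each other for one day."""
    issues: List[str] = []
    steps = r.qualified_steps or 0
    has_distance = r.distance is not None and r.distance > 0
    has_speed = r.speed is not None and r.speed > 0

    if steps == 0 and has_speed:
        issues.append("speed reported with zero steps")
    if steps == 0 and has_distance:
        issues.append("distance reported with zero steps")
    if steps > 0 and not has_distance:
        issues.append("steps reported without distance")
    if steps > 0 and not has_speed:
        issues.append("steps reported without speed")
    return issues


# Made-up values that mirror the pattern described above, not actual device output.
saturday = DailyReading("Saturday", qualified_steps=0, distance=None, speed=0.5)
monday = DailyReading("Monday", qualified_steps=200, distance=None, speed=None)

for reading in (saturday, monday):
    print(reading.day, fidelity_issues(reading))
```

A day that fails a check like this shouldn't be plotted as if it were trustworthy. Flag it, or hold it back until the related fields agree.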
Summary
If you are struggling with why nobody is adopting your reports, your analytics or your AI … consider where your data quality is at. You know that I’m a data driven guy. That I’m passionate about helping people make data driven decisions. Yet here I am, my friends, telling you that my Dork Trust Scale is at 0 for my “smart knee.” The data doesn’t even meet a modicum of Data Quality standards. While I was really looking forward to having a data driven approach for my road to recovery, here I am relying on my gut to tell me if I’m getting better or not.
But this post isn’t just about my not-so-smart knee. It’s about helping you understand how vitally critical data quality is in enabling the executives/leaders in your organization to make data driven decisions. If you think you can slap charts together and then wait for them to complain about data quality, and then fix it … you are sadly mistaken. If your executive team thinks they are simply funding your efforts for a fish to ride a bike, and that they will free up even more money for those new AI projects you would like to undertake … you are sadly mistaken.
Data Quality needs to be built into your data pipeline from the ground up.





