
Thursday, December 5, 2013

Consequences of Data Quality, or Lack Thereof!

Over the last few years, data quality has come up more and more often. Now that everyone is talking about big data, it is coming up again and again.

Data quality is a bit hard to define, as I've found from some posts in LinkedIn Groups. It means different things to different people, sometimes to the downfall of the people involved.
It really depends on the purpose of the data in question. Is it sales figures, contact/customer or employee information, general information on response times, or something else of equal importance?

Each of these has a different level of said "QUALITY". Imagine you have entered some sales figures and they are off by a factor of 10, or appear to be duplicated. That would cause a lot of havoc in sales projections and other strategies. You might even go off and order extra product that later becomes a front-page listing at overstock.com.


I think, though, that this ever-growing quantity of data could be causing some other unintended consequences. We already know about the issues with a wrong phone number or address and other typos. We know about the wrong information being entered about what a customer is interested in or has bought. We know about bad scans on products because the item in the customer's hand did not come up properly in the system, and so on.
People understand all of those things, at least in general terms. Sometimes these issues are hard to find in the system, because they can come up at a multitude of points along the process.




But there is a bigger problem: too much data, and a lack of coordination across that large amount of data.

Let me give two examples that will help to illustrate the problem.

One person related the story of how they received multiple phone calls from the same company on the same day, but from different offices. You might ask what the heck this has to do with data quality. It is rather simpler than you might think. In this case there may not be duplication in the data (i.e., the person entered twice in the database), but there is duplication in the access to the data and in the use of the data.
It is very apparent that this company does not have its people cross-checking to make sure they are not selling to the same person from multiple offices. This goes back to the old rule of sales regions. If you have the "South" region here in the United States, you'd be contacting only people in states like Florida, Georgia, Tennessee, Mississippi, etc. You would not be selling to a company based in Seattle, Washington, because it is out of your region. Things can get complicated with multiple locations or offices, though you would technically not have the same guy listed for both Seattle and Atlanta. Also, maybe the sales force is small for this company and guys in different offices sell all over the place. However, proper protocol should have them at least entering some kind of contact date or sales information, so that other sales reps know the person is already being contacted.
The person telling the story does not appear to have a particularly common name, so it should be easy to figure out that you're about to contact the same person. Or maybe not, and therein lies the problem.

See more on this story at Data Quality & Daily Life.

The bottom line in this situation is that multiple people had access to the data and had no clue that anyone else was using it, or how they were using it. That can be labeled a data quality issue just as much as anything else. The data wasn't quality data, because no one knew how to use it properly or the record did not have the proper information associated with it.
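As a rough illustration of the kind of protocol described above, here is a minimal sketch of a shared contact log that a rep checks before picking up the phone. Everything in it is an assumption made for illustration: the field names, the 30-day window, and the `record_contact` and `recent_contacts` helpers are hypothetical and not taken from any particular CRM.

```python
from datetime import date, timedelta


def recent_contacts(log, customer_id, today, window_days=30):
    """Return log entries for this customer within the last window_days."""
    cutoff = today - timedelta(days=window_days)
    return [
        entry for entry in log
        if entry["customer_id"] == customer_id and entry["date"] >= cutoff
    ]


def record_contact(log, customer_id, office, rep, today):
    """Log a contact only if nobody has reached this customer recently.

    Returns (True, []) when the contact was logged, or
    (False, previous_entries) when another rep got there first.
    """
    previous = recent_contacts(log, customer_id, today)
    if previous:
        return False, previous
    log.append(
        {"customer_id": customer_id, "office": office, "rep": rep, "date": today}
    )
    return True, []
```

With something like this in place, the second office in the story would have been told that the first office had already called that morning, instead of dialing the same person again.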

The second example happened to me. I'm a member of an auto club that provides roadside assistance and related services. I've been a member of this group for over 15 years. I've also been at my current location for over 4 years, and I've requested auto insurance quotes from them in the past. No big deal. The sad part came when an offer to become a member arrived in my mail. As I said, I've been a member for a long time. You'd think that whatever databases they have, they would at least check for overlap in the system (dedup), or at least try to verify that a person they want to send membership information to is not already a member. You'd think they would want to review anything that came up as a potential duplicate or current member before taking further action.
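The check I have in mind can be sketched in a few lines. This is only a rough sketch under stated assumptions: the `match_key` and `split_prospects` helpers and the `name`/`address` fields are hypothetical, and the matching is a crude exact match on normalized text. Real dedup tools use much fuzzier matching than this.

```python
import re


def match_key(name, address):
    """Build a crude comparison key: lowercase, drop punctuation, collapse spaces."""
    def norm(text):
        text = re.sub(r"[^a-z0-9 ]", "", text.lower())
        return " ".join(text.split())
    return (norm(name), norm(address))


def split_prospects(prospects, members):
    """Separate prospects into those safe to mail and those that look like members."""
    member_keys = {match_key(m["name"], m["address"]) for m in members}
    to_mail, to_review = [], []
    for prospect in prospects:
        if match_key(prospect["name"], prospect["address"]) in member_keys:
            # Possible current member: hold for review instead of mailing.
            to_review.append(prospect)
        else:
            to_mail.append(prospect)
    return to_mail, to_review
```

The point of returning a separate review list is that a possible match gets looked at by a person before any membership offer goes out.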

This could, in reality, highlight one of the problems we now find ourselves in. We have so much data on hand. Could Big Data be causing us problems? Do we have too much information to deal with accurately and appropriately?

I know from experience with the data I deal with that there are what seem like a million pieces of information available, or that have been compiled, on people. Routinely, we append consumer enhancements in a bundled form that includes things like credit cards, owner/renter, presence of child and related items. However, we did do some work for the Obama campaign the first go-around for the state of South Carolina. You can append information on gun ownership, hunting, RVs, newlyweds or new parents, cellphone ownership and related tech topics, buying habits, ethnic information, age, income and housing information, and any number of interests. Imagine how targeted your advertising can become if you have this information. Also imagine how cumbersome your database may become if you included information on all these areas for each and every record or household in your system.

The great thing about a database is that you can split it into smaller chunks and still centrally link everything via your primary key.
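To make the idea of splitting and linking concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`household`, `household_interest`) and the sample row are made up for illustration; the only point is that the core record stays small and the optional enhancements live in their own table, tied back by the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Core record: only what every operation needs.
    CREATE TABLE household (
        household_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        address TEXT NOT NULL
    );
    -- Optional enhancements, linked back by the primary key.
    CREATE TABLE household_interest (
        household_id INTEGER NOT NULL REFERENCES household(household_id),
        interest TEXT NOT NULL,
        PRIMARY KEY (household_id, interest)
    );
""")
conn.execute("INSERT INTO household VALUES (1, 'Sample Household', '12 Main St')")
conn.executemany(
    "INSERT INTO household_interest VALUES (?, ?)",
    [(1, "hunting"), (1, "rv_owner")],
)

# Join the enhancement table only when a campaign needs the extra detail.
rows = conn.execute(
    """
    SELECT h.name, i.interest
    FROM household AS h
    JOIN household_interest AS i ON i.household_id = h.household_id
    WHERE h.household_id = ?
    ORDER BY i.interest
    """,
    (1,),
).fetchall()
```

Day-to-day work can then query the small `household` table alone, and the wider enhancement data is pulled in only when someone actually needs it.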




This still does not justify the need for 300 different fields of data on one individual in all circumstances.
When we start to talk about big data, we need to discuss these issues. Have we, with so much data available at our fingertips, gotten away from doing due diligence in our sales endeavors or customer service? Have we justified the need for something, or just taken things for granted since we have what seems like unlimited storage space thanks to the cloud?



So as we are deluged with extreme quantities of data, we may very well need to ask, "Is this information really necessary for our operations?" Those who have the responsibility of designing or maintaining databases should be asking that question all the time. Is this information adding value to what we are doing on a regular basis? Is there a better way to deal with this Big Data?

"And big data is really, really big. According to industry think tank IDC, the world now generates more data in 24 hours than existed in the history of planet earth up to the year 2000. If you want to put a figure on it, it was 2.5 quintillion bytes daily."

"Clearly this amount of data can be used as a goldmine of information for businesses around the world. But in its raw, unprocessed state, data can be more of a hindrance than a help, and the single customer view is no use if it doesn’t assist each department in achieving its objectives."
How Big Data Can Help You Close the Deal

As we get to the point of having terabytes of data archived on a regular basis, we need to consider the amount of space that takes, the amount of energy consumed moving it around and maintaining it, and the amount of time spent doing all of that.
We also have to consider data corruption and related problems, along with the usual data input errors.

We really need to change our way of thinking. Bigger may not be better, but it can be if we use what we are given in an intelligent and prudent way.




So let us not fall into the trap of claiming that data is knowledge or power. Unless we put the resources at hand to proper use, we may just end up with a bunch of offended customers and a soon to be "out of business" company.

"The success of its [Dell's] big data experiment proves that information is power, but that information on its own is useless. In order to harness the power of data – whether it’s big data, or data from your own CRM –  the data needs to be processed, cleansed, merged and combined.
But more importantly, it needs to be used selectively so that each department gains useful insight and can benefit from the single customer view."
How Big Data Can Help You Close the Deal




Here are a couple of additional pages on data quality that I think are worth the read:

Big Data Can Tell Big Lies Through Fifth Normal Form (5NF)
“We hold these truths to be self-evident” (or do you trust your data?) 
How Big Data Can Help You Close the Deal
Data Quality & Daily Life 
Data Quality and the Blemishing Effect 


So now you have a few more things to think about in the data quality realm.  Are we doing ourselves a disservice as we head for the holy grail and embrace big data at full warp speed?

So look for the not so obvious as you work to ensure the data you are using is of the quality you want and need. Anything else would be doing yourself and others a disservice. But it might just help out good old overstock.com!

Buaidh - NO - Bas
