“The perfect is the enemy of the good.”
For decades now, information-management professionals have stressed the importance of a simple maxim: “Garbage in, garbage out.” In other words, if your application, system, or database contains suspect data, then that’s exactly what you’ll get when you and your employees run reports.
As I write in the new book:
To be sure, GIGO still holds water in a Big-Data world. After all, no organization wants to pay its employees incorrectly or report incorrect financial results because of bogus records and sloppy data entry. By the same token, though, data perfection is unattainable. Visual Organizations recognize that data visualizations may include bad, suspect, duplicate, or incomplete data, but that doesn’t stop them from proceeding. In fact, a dataviz can help users identify fishy information and purify data faster than manual hunting and pecking. Data quality is a continuum, not a binary. Use data visualization to improve data quality.
Explore Early and Often
In my enterprise system consulting days, I would almost always run small tests to see if my reports, dashboards, ETL applications, and other tools worked as soon as I realistically could. That is, I would tend to err on the side of action. (I used agile methods to the extent possible, even on Waterfall projects. For more on the difference between the two methodologies, click here.)
My rationale for often “jumping the gun” was pretty straightforward: Why not see what I was dealing with before a client gave me all of the data or access to it? While I didn’t use the term back then, why not create the equivalent of a minimum viable product? (Of course, this wasn’t always feasible. Sometimes I could not proceed without key information.)
And the same thing applies to visual organizations. Sure, sometimes outliers are easily visible, as is the case below:
Other times, my clients’ data was much messier; the issue or issues did not immediately manifest themselves. In either case, creating simple visualizations would help me understand my forthcoming challenges, or at least the data ones. No dataviz could possibly prepare me for some of the personnel issues and politics I would face, but that’s a discussion for beers sometime.
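To make the idea concrete, here is a minimal sketch of that kind of early, rough look at a dataset. Everything in it is an assumption for illustration: the `salaries` list is made-up payroll data, the function names `iqr_outliers` and `text_bars` are hypothetical, and the 1.5 × IQR rule is just one common convention for flagging outliers, not a method taken from the book. It uses only the Python standard library and draws a crude text bar chart rather than a real dataviz.

```python
from statistics import quantiles

# Hypothetical payroll extract (illustrative only); None marks a missing value.
salaries = [52000, 48500, 51000, 49750, 530000, 50500, None, 47000]

def iqr_outliers(values):
    """Return values lying more than 1.5 * IQR outside the quartiles, ignoring missing entries."""
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4)  # first, second, and third quartiles
    margin = 1.5 * (q3 - q1)
    return [v for v in present if v < q1 - margin or v > q3 + margin]

def text_bars(values, width=40):
    """Build one line per value: a bar scaled to the largest value, or a MISSING tag."""
    top = max(v for v in values if v is not None)
    lines = []
    for v in values:
        if v is None:
            lines.append("   MISSING |")
        else:
            lines.append(f"{v:>10} | " + "#" * round(v / top * width))
    return lines

# A quick look: the suspect record dwarfs every other bar.
print("\n".join(text_bars(salaries)))
print("Flagged for review:", iqr_outliers(salaries))
```

Even in this crude form, one bar dwarfs the rest and the missing record stands out, which is usually enough to start asking the client the right questions. A flagged value is only a candidate for review; whether it is a typo or a real figure still takes human judgment.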
Simon Says: Listen to Voltaire
In some cases, the data has to be perfect before proceeding. Submitting corporate taxes and financial reports certainly comes to mind. In others, though, playing with questionable or even highly impure data offers a number of significant benefits, especially when exploring unknown and/or unfamiliar datasets. Voltaire’s quote at the beginning of this post serves as solid advice.
What say you?