Data scientists have grown more popular over the last few years, perhaps even reaching rock-star status. As such, a certain mythology has developed around them. (Have you ever met a true data scientist?) It goes something like this:
Pay them a great deal of money and they’ll descend upon your organization wearing priest-like robes. In a few months, they’ll magically transform raw data into fascinating insights, quadrupling your company’s profits in the process. And, above all, you should never question them because they’re much smarter than you.
Against this backdrop, it’s not surprising that slamming them has become increasingly popular. In his popular recent post “Why you should never trust a data scientist,” Pete Warden writes:
The data scientists I know are honest people, but there’s [sic] no external checks in the system to keep them that way. The best you can hope for is blog and Twitter feedback, but without access to the data, or even a full paper on the techniques, you can’t dig very deeply.
In his post, Warden cites specific examples of how people have misinterpreted data visualizations based on largely social data (read: Facebook and Twitter).
Now, it’s always been easy to shoot the messenger. I can remember in my ERP and CRM consulting days sitting in tense meetings. Transforming millions of records of decades-old legacy data into a contemporary system would pose plenty of problems. (Only first-time clients believed that systems from different eras stored data in remotely similar ways.)
With the project woefully behind schedule and over budget, I would eventually present an update on the data conversion issues. During initial runs of conversion programs, it wasn’t uncommon to receive thousands of errors, although many of them were “soft” (read: you could ignore them if you liked). On one particularly contentious project, the PM for one of my clients nearly screamed at me for my supposed incompetence. In her view, “my” data and “my” programs weren’t working. As a result, it was my fault that we were all behind. I tried to politely explain to her that I wasn’t keying in journal entries at her hospital in 1992.
Needless to say, the PM and I didn’t work together for much longer after that.
Simon Says: Don’t Shoot the Messenger
At a minimum, organizations need new tools and employees with new skills to successfully navigate the era of Big Data. Luckily, plenty of new filesystems, frameworks, platforms, and applications exist. These are necessary but insufficient conditions for success in a Big Data world.
Faced with petabytes of unstructured data, organizations will still make plenty of mistakes, both errors of commission and errors of omission. Sometimes, and maybe even often, we will encounter instances in which “the truth” isn’t obvious, even for expensive data scientists. While they’re certainly not above reproach, screaming at the very people trying to help you understand your data (and solve thorny business issues) is most certainly not the way to go.
Big Data is not a package slam. Expect mistakes, not perfection. It’s a process, not a project.
What say you?