Sherlock Holmes, Sexy Data Scientist


Everyone has heard by now that data scientist is the sexiest job of the 21st century, and anyone who has seen the latest BBC incarnation of Sherlock Holmes can’t deny that Sherlock, as played by Benedict Cumberbatch, has a certain something. “Smart is the new sexy” is actually a line from the show. I don’t entirely agree that it’s new, though. Smart has always been sexy to some of us. The ability to infer important conclusions from limited and messy data has simply become more widely recognized as desirable lately.

One fascinating thing to me about Holmes is that he is famous for deductive reasoning. The truth is, though, that he doesn’t use deductive reasoning. He uses abductive reasoning. Deductive reasoning is very reliable, but it only works in cases where the starting facts are certain, very black and white. In business, just as in crime solving, this is rarely the case.

A = B, B = C, therefore A = C. That’s deductive reasoning. For example:

Every person breathes air.
Benedict Cumberbatch is a person.
Therefore, Benedict Cumberbatch breathes air.

The conclusion in this case is absolutely immutably true, since both of the statements it’s based on are true. That’s the nature of deductive reasoning.
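
For readers who think in code, the syllogism can be sketched as a tiny rule engine. This is only an illustration: the `deduce` function and the way facts and rules are stored as pairs are assumptions made for this example, not a standard library or anything from the show.

```python
def deduce(rules, facts):
    """Apply 'every X does Y' rules to known facts until nothing new appears.

    rules: list of (premise, conclusion) pairs, e.g. ("person", "breathes air")
    facts: list of (subject, property) pairs, e.g. ("Benedict Cumberbatch", "person")
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for subject, prop in list(known):
            for premise, conclusion in rules:
                if prop == premise and (subject, conclusion) not in known:
                    known.add((subject, conclusion))
                    changed = True
    return known


# Every person breathes air. Benedict Cumberbatch is a person.
rules = [("person", "breathes air")]
facts = [("Benedict Cumberbatch", "person")]

conclusions = deduce(rules, facts)
print(("Benedict Cumberbatch", "breathes air") in conclusions)  # True
```

Nothing here involves probability. If the premises are true, the conclusion cannot be false, which is exactly why deduction is so reliable and so limited.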

Looking at someone’s pocket watch, a walking stick, or in the modern TV show, a cell phone, and determining everything from whether or not the owner is left-handed to what they do for a living and if they have a relative with a drinking problem is abductive reasoning. Abduction is inferring larger conclusions based on the most likely explanation for a limited data set.

C occurs most commonly when A is true. C is occurring, therefore A is probably true. That’s abductive reasoning. For example:

Watson’s phone has scratches around the charging port.
People who drink often have trouble hitting the tiny port and leave scratches around it.
Therefore, the person who gave Watson the phone probably has a drinking problem.

The thing about abductive reasoning is that it can be tremendously valuable, but it can also be completely wrong. In the case of the phone, Holmes inferred from the inscription that it came from Watson’s brother Harry, a logical and sensible thing to conclude since Harry is usually a man’s name. But Harry, in this case, was short for Harriet. Watson’s brother was actually a sister, and Holmes, while uncannily right in a dozen other conclusions, was completely wrong in that one.

Another important thing about abductive reasoning is that the more data you give it, the more certain the conclusions become, as long as the new evidence supports the original conclusion.

C occurs most commonly when A is true. D occurs most commonly when A is true. C and D are both occurring. Therefore, A is very likely to be true.
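
One common way to put numbers on this pattern is Bayes’ rule. The sketch below is only an illustration: the `update` function is a hypothetical helper, every probability in it is invented for the example, and it assumes C and D are independent clues once you know whether A is true.

```python
def update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return the probability of A after seeing one piece of evidence (Bayes' rule)."""
    numerator = p_evidence_if_true * prior
    return numerator / (numerator + p_evidence_if_false * (1 - prior))


# Made-up numbers, purely for illustration.
prior = 0.10  # belief in A before looking at any evidence

# C shows up 60% of the time when A is true, 20% of the time when it is not.
after_c = update(prior, 0.60, 0.20)

# D shows up 50% of the time when A is true, 10% of the time when it is not.
after_c_and_d = update(after_c, 0.50, 0.10)

print(f"{prior:.3f}")          # 0.100
print(f"{after_c:.3f}")        # 0.250
print(f"{after_c_and_d:.3f}")  # 0.625
```

Each supporting clue pushes the estimate up, but it never reaches 1. A clue that is more common when A is false would pull the estimate back down, which is the numerical version of Holmes being wrong about Harry.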

Sherlock Holmes isn’t the only one who uses abductive reasoning. Artificial intelligence systems, expert systems, detectives, doctors and data scientists all use the same kind of logic.

Data scientists have to use that style of reasoning, since the data they have is limited and often less than perfect. Businesses rarely operate in an atmosphere of absolute, immutable truths. The essence of the data scientist’s job is to figure out what the most logical explanation of the data is. The more data they can gather, the more accurate their predictions tend to become.

Data scientists are, in many ways, the modern Sherlocks, but the trouble with Sherlock Holmes is that he is a rare and brilliant individual. It would create quite a bottleneck if every business puzzle had to be solved by a few reclusive geniuses. So now we have a high-demand skill and a great need that can only be met by a few people. Data scientists just got really sexy.


But what if your business doesn’t have a genius Sherlock Holmes data scientist on the payroll? Not having a data scientist is less of a problem than you might think. Expert systems try to bottle that genius so that it can be far more widely available. So do many forms of analytic software, with varying degrees of success. Many of the conclusions that now require a data scientist to build the logical bridges will, in the near future, be easy for any business-savvy professional to reach. As a recent article I read pointed out, even now, most statistical analysis is not done by statistical analysts.

Only the really tricky cases require a genius. Over time, analytic software will become friendlier to business users, and more data scientists will be trained and ready to hit the job market. Eventually, data scientists will become less of a bottleneck to analytic productivity. But I predict that one thing will stay the same: Sherlock Holmes will still be sexy.

About Paige Roberts

As Actian’s Hadoop Analytics Evangelist, Paige identifies innovative big data and analytics trends and explores technology alternatives to help organizations drive actionable business value from their data. A seasoned software industry veteran with more than 15 years’ experience, Paige has worn a variety of hats including engineering, consulting, marketing and training. Follow her at @RobertsPaige.
