Put your data scientists on a leash

Giancarlo Cobino
May 9, 2020
Photo by ZHANG FENGSHENG on Unsplash

Not so short as to castrate their creativity, but short enough to protect your production environments and, ultimately, your organization.

Data Science can be a powerful resource for any organization and can improve its chances of beating the competition. In a world driven by data, being able to read change through the analysis of data is a big advantage. Companies can reach customers, retain them, and offer products and services in line with their interests.

However, Data Scientists are very open-minded and tend to give their creativity too much freedom, looking for patterns where there are none and for insights that turn out to be wrong, which leads to false conclusions.

It’s not about Data Scientists’ value

The outcome does not depend on the Data Scientists' ability to model the world. In 99.9% of cases this is not the issue. Data Scientists are very skilled, competent and able to define the best models to predict anything. They can be more than competent and, at the same time, unable to define what is in the best interest of the organization they work for.

Governing their work is not an act of distrust, just a way to better define the perimeter of their modeling. Otherwise, something can go wrong, as I semi-seriously illustrate here.

First problem: ungoverned creativity

Imagine a business user sitting in front of a Data Scientist, asking him to identify a specific set of clients who are keen to buy a product. It sounds like child's play to the Data Scientist, who stares at the guy in front of him without really looking, absorbed in a thousand thoughts about what he could do. “Only one set of clients? Come on, be serious… at least 127 different clusters”. “Those users will buy our whole catalog based on my recommendations”. And so forth.

At the end of the day, the outcome will be:

  • The Data Scientist will end up with 83,422 clusters, but not the right one
  • The Data Scientist will find the right cluster, get bored with it, and start looking for the next best possible cluster of clients, expected in at least 2 years
  • The business user will wait for the cluster of clients for too long to stay focused and interested

Lesson: creativity is good but only if applied to reality.

Second problem: ungoverned models

Data Scientists are good. They learn from any possible source: a master's degree, a PhD, evenings at the pub, movies! Everything counts. So, after a dozen rounds of back and forth with the business, the Data Scientist is able to identify the right cluster of customers. This keeps the business happy for a while. More evenings at the pub, drinking and thinking about next steps.

Until, on a (seemingly!) very normal morning, the Data Scientist goes back to his desk, sips his favorite tea and thinks about new ideas. The model that predicts the right cluster for the business is gone, lost somewhere on someone's laptop. Mathematics, functions, training, fine-tuning. A new model is on its way to production… a shame that the old one is completely forgotten.

What happens next:

  • The Data Scientist has forgotten about the first model
  • The business keeps pushing for the “automatic” prediction (meaning someone who runs the code by hand) a couple of times, then drops it
  • No more predictions, no more drinks, no more thinking

Lesson: Data Scientists are good but don’t rely too much on them.

Third problem: Data Scientists do it better

Imagine a beautiful meeting room overlooking the ocean. The Data Science team sits on one side of the table, the marketing team on the other. The marketing team does the talking; the Data Science team listens, annoyed and frustrated. It goes on for hours. Marketing explains what they want to a group of people fiercely convinced that they can do better. “We listen to the data, we let them talk”, a Data Scientist finally shouts.

What happens next:

  • Data Scientists take months to develop the most sophisticated models, responding to a question they haven’t truly understood
  • The marketing team continues to work on confusing Excel files
  • In the last meeting nobody listens to anyone else. To put an end to it, someone proposes going for a drink.

Lesson: before any work starts, make sure the Data Scientists understand what you actually want.

This is all in jest, of course. Or maybe it isn't?

Giancarlo Cobino

Quant portfolio manager in the past. Now Machine Learning enthusiast, focused on the whole lifecycle of ML projects. Insights with control!