Hiring Joe is not enough

Giancarlo Cobino
Dec 27, 2020

Why hiring the right people in Machine Learning is not enough.

Joe has recently been hired as a Machine Learning Engineer at Reproducible Inc, a software house that uses Machine Learning for various activities. It was the idea of Francine, the Head of Data. She thought that this could increase the productivity of the Machine Learning models and extend their lifetime in production.

In fact, Francine has spent the last three months answering to her boss, trying to explain the mistrust of Machine Learning within the company. That same mistrust has brought down what had been built: the models were not reliable, neither in performance nor in consistency, and users stopped using them.

Francine could not understand. She called in the Data Scientists (Paul, Elizabeth, Margaret and Charles), who claimed that everything had been done properly. Then she called in the Data Engineers, who acknowledged that from a Data Science perspective everything was all right. However, they also raised some concerns regarding the code itself, which was neither well structured nor robust. Francine was confused, but they concluded that the company needed a Machine Learning Engineer, someone who knows Machine Learning and is also able to put some sense into the code.

Francine ended the meeting with a smile on her face. She finally had a solution. So she asked HR to post a job ad online and look for the best possible ML Engineer. A lot of people applied but, after several rounds of interviews, Joe was chosen.

And here he is, Joe. A Master’s degree in Computer Science, some years spent in software development and Machine Learning, which he then combined into the new golden profession: Machine Learning Engineer. He sits in front of his laptop, with everyone (metaphorically) looking over his shoulder, watching what he does.

Photo by Sebastian Herrmann on Unsplash

But Joe doesn’t lift a finger. He is more interested in understanding what has been done. And what he sees is rather disturbing. He decides to write a list, which looks pretty much like this:

  1. Everything lives on Paul’s, Elizabeth’s, Margaret’s and Charles’s laptops. They themselves take care of testing the code and verifying its effectiveness.
  2. Data fly from one place to another with no controls in place. They reside, again, on the Data Scientists’ laptops, and usually contain sensitive information.
  3. CI/CD are concepts that haven’t reached the Data Scientists at Reproducible Inc.
  4. Versioning consists of v_x.y suffixes on the Data Scientists’ folders. There is no record of what has been changed or for what reason.
  5. Deployment. Well, deployment is not a deployment.

Five points brought to the attention of Francine, who, however, does not share Joe’s shock. She appears very calm. She just asks Joe to put everything back on track, because this is the reason why he was hired in the first place. Joe is concerned. He doesn’t know where to start. So he goes back to his desk and starts thinking. He ends up with a plan: a series of tasks to structure the code, secure the data, and introduce CI/CD, code versioning and automated deployment. He is exhausted but happy. He goes back to Francine and explains it in detail. Francine is enthusiastic and suggests starting right away.

  1. First week into the plan: Joe talks to every Data Scientist and they are all on board, happy about the new course of things. They swear that it is precisely what they were waiting for.
  2. Second week into the plan: Joe is concerned. He has held several meetings to understand what the Data Scientists are up to, and nothing has changed. They keep using their Notebooks, moving data manually and leaving them on their laptops. Joe talks to Francine and says: “We need to buy a proper tool”. Francine doesn’t agree. It is too expensive and the budget has already been spent on Joe.
  3. Third week into the plan: Joe is working hard to save what can be saved. He refactors the code, introduces unit tests, sets up a Git account and structures the repository efficiently. Meanwhile, the Data Scientists are working hard on their models, improving their performance. Everything looks great. In the daily meeting with Francine, everyone radiates efficiency and order. Joe keeps his concerns to himself, worried that voicing them could harm the team.
  4. Two months into the plan: the refactoring is done. The models perform better than before. Everyone seems satisfied, business included. Francine thinks that her decisions have been right, including her refusal to buy a proper collaborative tool for Machine Learning.
  5. The day of deployment: Joe has created the endpoints for the predictions. He had limited resources to deploy the models but has done his best. Joe takes his time, checks that everything is fine and then gives the green light.
  6. First two weeks into deployment: at the beginning, everything looks good. Then the system shows some glitches. As the number of users increases, it tends to slow down, and this affects sales. Management starts to worry, but predictions are still good, so they calm down.
  7. First month into deployment: predictions deteriorate. Business talks to the Data Scientists, who make some adjustments. Joe checks that everything complies with best practices.
  8. Second month into deployment: predictions are out of control. Business gets nervous. The Data Scientists are under pressure and go back to old habits: Notebooks, patches, no versioning, manual deployment. Joe is isolated.
  9. Third month into deployment: the ineffectiveness of improvised fixes is clear. Joe goes to Francine and resigns. She asks why, and he explains that creating a virtuous process has proved impossible. The business decides to stop using the Machine Learning models. Francine is back at square one, having invested a significant amount of money.

Conclusion: Joe alone is not enough to turn a bad process into a good one. As in traditional software development, hiring someone who knows how to drive the car doesn’t make the car driveable. You need the proper tools, the resources, the skills and the willingness to face a massive change in your way of working. Data Science is different from traditional software development, and yet it is similar. Everything that goes into production must respect certain standards, and the only way to achieve that is to rely on the right tools and share knowledge across the team. Joe alone can’t do that. He needs a mandate from the top, support from the bottom, the right budget to buy the right tools and quite some time to put a strategy into practice.

Giancarlo Cobino

Quant portfolio manager in the past. Now Machine Learning enthusiast, focused on the whole lifecycle of ML projects. Insights with control!