Why You Need a Data Science Way of Working

Much of the extra maintenance in AI & Data Science work is due to technical debt: problems caused by fragile technical environments, code that is not production-ready, and a lack of collaboration structures. Accumulated technical debt will force you to spend a disproportionate amount of time on maintenance and bug fixing. Why, you ask? Because that’s the price we pay for breaking the principles of the data universe.

“Machine Learning: The High Interest Credit Card of Technical Debt” *

Luckily, we can combat technical debt by developing a way of working around sound engineering principles, mature tools and collaboration best practices, and thereby make our AI & Data Science work more productive and low-maintenance.

So, if you want to reduce your technical debt, a Data Science Way of Working might be something for you. Keep on reading and I’ll explain in more detail.

Background and Trends
Many organizations have started their AI & Data Science journey and noticed a few bumps in the road. Some rather large bumps. Others want to get started but are not sure where to begin and what to prioritize. And that’s not so weird since the technical landscape has been rapidly evolving.

Recently, many AI & Data Science tools have matured and drastically reduced the team size needed to handle them. The cloud megatrend has made Machine Learning accessible to regular non-tech companies, and we are now seeing teams make it past the proof-of-concept graveyard and succeed in deploying ML models into production.

Why Would Anyone Need a Data Science Way of Working?
The simple answer? To avoid getting time-trapped by too much maintenance and firefighting in the daily AI & Data Science work. We want to solve problems so they stay solved. If we accumulate too much technical debt, it steals time from developing new solutions, and we lose our capacity to deliver increased value to the rest of the organization.

The famous coach Tim Gallwey has a formula that’s been widely adopted by top performers worldwide:

Performance = Potential – Interference **

Reducing the interference is exactly what we want to do to increase our performance. Even a group of data rock stars would be low performers with enough blockers and time thieves.

Technical debt, high maintenance and an unstable production environment are all interference and should therefore be handled. When we reduce them, we unleash the potential that gives us a much-wanted increase in delivery capacity.

If a company’s strategy depends on the ability to predict behaviors and actions, it’s crucial to enable that strategic capability by having the fundamentals and prerequisites in place. And as the author Stephen Covey often states: to get the golden eggs, we need to take care of the goose that lays them ***.

We need to work in a way that doesn’t hurt us in the long run, and avoid frequently trading short-term wins for long-term losses. The “anything goes” method quickly builds up complexity, silos and technical debt. A problem can often be solved in several different ways with various tools, programming languages or project structures. If not kept in check, the data science team can become like a group of solo artists trying to play together without deciding the genre, tempo, and song structure. Imagine what that would imply, in terms of maintenance, for a team running ML models in production. It would not be good.

To combat complexity and build strong teams with a high-scale, low-maintenance AI portfolio, it’s helpful to establish a Data Science Way of Working. It brings a number of positive second- and third-order effects. Forming a stance on “how we do things here” is not only a recipe for reducing technical debt, lowering maintenance work and clarifying in-house upskilling, but also a big help in the recruitment process.

Common Challenges that Accumulate Technical Debt
Here are some of the most common challenges we’ve seen in AI & Data Science teams. These are all sources of interference that hurt performance:

Code Quality Issues
Many Data Scientists do not have software development experience, and a lot of the code is written in “Notebook format”. Notebooks are great for quick visualizations and exploration of data and models. At the same time, they promote some bad coding habits, lack structure, and don’t utilize the full power of proper version control.

Unstable Production Environment
Solutions break because some part of the system was upgraded to a version that is incompatible with other core parts. Managing packages and handling dependencies in compute clusters has often been challenging.
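One way to catch this kind of breakage early is to compare what is actually installed against the versions the team has agreed on. The following is a minimal sketch using only the Python standard library, not a replacement for a proper lock file or environment manager. The function name `find_version_drift` and the pinned versions in the usage section are hypothetical and only serve as illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def find_version_drift(pinned, get_version=version):
    """Compare pinned package versions with what is installed.

    Returns a dict mapping package name to (expected, installed),
    where installed is None if the package is missing.
    Packages that match their pin are left out.
    """
    drift = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            drift[name] = (expected, installed)
    return drift


if __name__ == "__main__":
    # Hypothetical pins; in practice these would come from the
    # team's agreed requirements or lock file.
    pinned = {"numpy": "1.26.4", "pandas": "2.2.2"}
    for name, (expected, installed) in find_version_drift(pinned).items():
        print(f"{name}: expected {expected}, found {installed}")
```

Running a check like this at the start of a job, or as a step in a deployment pipeline, makes an incompatible upgrade visible before it breaks a solution in production.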

Missing Team Collaboration
Silos between projects and a lack of knowledge sharing create solutions that depend on individual developers. When development is done individually, without any team quality checks along the way, the risk of fragile solutions increases significantly.

In addition, we often see problems in these areas:

  • No Structure for the Development Workflow and/or Automation
  • Limited Development Environment
  • Data Quality and Availability Problems
  • Unclear Roles and Responsibilities

How do we Overcome these Sources of Interference?
One way is to establish a way of working used by the whole team, something that unites them and is adapted to the specific company context. Combined with time-tested collaboration best practices and principles, it minimizes technical debt and thereby the maintenance work needed in the future.

It’s incredibly helpful to have a shared view on what makes up a proof of concept (PoC), a minimum viable product (MVP), or what it means to develop for production. When we know what each phase means and demands of us, it becomes a lot easier to maintain productionized solutions. The core of the Data Science Way of Working is about being in sync and having clear expectations.

 

Here are some tips to get started with more productive AI & Data Science work:

  • Use a Data Science Solution Lifecycle where each phase has appropriate requirements and checklists
  • Synchronize standards to break down collaboration barriers
  • Implement MLOps in steps, starting by carrying out the machine learning tasks manually
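
As an illustration of the first tip, a lifecycle with per-phase checklists can start out as a very small data structure. The sketch below is one possible shape, not a prescribed implementation: the `Phase` class and the checklist items are hypothetical, while the phase names (PoC, MVP, production) follow the ones mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class Phase:
    """One phase of the solution lifecycle with its checklist."""
    name: str
    checklist: list[str]
    done: set[str] = field(default_factory=set)

    def complete(self, item: str) -> None:
        """Mark a checklist item as done; reject unknown items."""
        if item not in self.checklist:
            raise ValueError(f"{item!r} is not on the {self.name} checklist")
        self.done.add(item)

    def missing(self) -> list[str]:
        """Return the open items, in checklist order."""
        return [item for item in self.checklist if item not in self.done]

    def ready_to_advance(self) -> bool:
        """A phase is finished when nothing on its checklist is open."""
        return not self.missing()


# Hypothetical checklist items, for illustration only.
lifecycle = [
    Phase("PoC", ["problem statement agreed", "baseline model evaluated"]),
    Phase("MVP", ["code under version control", "dependencies pinned"]),
    Phase("Production", ["monitoring in place", "owner assigned"]),
]

poc = lifecycle[0]
poc.complete("problem statement agreed")
print(poc.name, "open items:", poc.missing())
```

Even a simple structure like this gives the team a shared, explicit answer to what each phase demands before a solution moves on to the next one.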


Need help establishing a Data Science Way of Working?
We’re here if you need us!

 

Text references:
* https://research.google/pubs/pub43146/
** https://thesystemsthinker.com/the-inner-game-of-work-building-capability-in-the-workplace/
*** https://www.business2community.com/health-wellness/remembering-the-ppc-balance-and-the-golden-eggs-0498012

Alexander Mafi
alexander.mafi@capgemini.com
Alexander has been developing digital products for the last 10 years, both as an entrepreneur and as a consultant focusing on Machine Learning and decision automation. As the Focus Area Lead in AI & Data Science he frequently switches between PowerPoint and programming. Sometimes being the teacher or product leader, and sometimes being the data scientist. On a constant quest for finding the best principles within data & AI.
