Insight 2: Solid engineering is the bottleneck because complexity accumulates

TLDR: Dealing with the complexity of the real world is more difficult, but also more valuable, than solving singular novel problems.


In our second insight, “Engineering is the bottleneck,” we argue that the vast majority of companies need AI engineers, not AI scientists. We also discuss solid engineering principles that can make or break industrial AI systems.

The mess that is engineering

In commercial software engineering and data science, most problems are best solved by applying solid engineering principles, rather than relying on novel research techniques.

Unfortunately, many laymen and managers confuse software engineering with scientific research. They imagine the day-to-day work of a software engineer as a conveyor belt of complex problems waiting to be solved. Once a problem is solved, the engineers move on to the next one, and the best engineers are those who solve the most challenging problems the fastest. We believe this view represents a misunderstanding of the field.

Only a tiny percentage of companies face deep, novel research problems that can be solved in isolation. Rather than the complexity of individual problems, software engineering usually revolves around the complexity accumulated from past solutions.

Software engineering is like maintaining a box of wires

Software engineering is more like maintaining a giant box of wires such as the one shown in the picture. The wires must connect in a certain way for the box to work, and every day, the requirements change slightly. Over time, more and more wires are put in, and it becomes increasingly challenging to grasp the entire system.

This is why the progress of software projects stagnates: You have to take yesterday’s solution into account when solving today’s problem. Solving many small, interdependent problems at the same time is by no means easy: Dealing with the complexity of the real world is often harder than tackling singular, novel problems.

The same accumulation of complexity occurs in data science. Complex procedures quickly add up, slowing down every subsequent change. For data scientists, day-to-day work means dealing with practical issues such as legacy data formats, algorithms from different frameworks and languages, and the mess that is GPU programming. Setting up stable ETL pipelines and working with out-of-core algorithms both require excellent engineering skills. Choosing the right tool for the job often requires broad experience with many tools.

The value of broad knowledge

Because complexity in software projects accumulates, good initial decisions are tremendously valuable. Going down the wrong path at the beginning of a project can cost weeks of valuable development time. Knowing the right open-source tool for the job can even spare you from building an entire project yourself.

This means that broad knowledge of many tools and frameworks often beats specialized knowledge of a particular algorithm. That is the opposite of what matters for an academic career, where deep knowledge of a narrow topic is actively encouraged. In academia, you might spend hours solving an old problem in a new way. For data science in industry, this is usually a waste of valuable time.

However, shallow knowledge with no depth is useless. You should acquire the theoretical background (e.g., statistics, linear algebra, and computer science) to dive deep into these topics when necessary.

To gain a broad knowledge of machine learning algorithms, we encourage you to read through the excellent documentation of the scikit-learn (“sklearn”) library. Because the library contains solutions to many common industry problems, it is an excellent reference for building a broad understanding of the field.

Classifier comparison: An illustration of the nature of decision boundaries of different classifiers.

For example, if you have only worked on academic classification problems, you may never have had to deal with open-world assumptions, where the inputs seen in production can belong to none of the classes in the training data. As a result, you might not know anything about outlier or novelty detection and fail to realize that it is a vital part of the problem you are trying to solve.
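To make the open-world issue concrete, here is a minimal sketch in plain Python. It is a toy, not the method of any particular library: the function names, the "cat"/"dog" labels, the sample points, and the distance threshold are all hypothetical and chosen only for illustration. A nearest-centroid classifier is given an extra rule that refuses to assign a class when the input is far from everything seen during training.

```python
import math
from statistics import mean


def fit_centroids(points, labels):
    """Compute one centroid per class from the labelled training points."""
    grouped = {}
    for point, label in zip(points, labels):
        grouped.setdefault(label, []).append(point)
    return {
        label: tuple(mean(coords) for coords in zip(*members))
        for label, members in grouped.items()
    }


def predict_open_world(centroids, point, max_distance):
    """Return the nearest class, or None if the point is far from every class.

    max_distance is a hypothetical, hand-picked threshold; in practice it
    would have to be tuned on validation data.
    """
    label, distance = min(
        ((lbl, math.dist(point, centroid)) for lbl, centroid in centroids.items()),
        key=lambda pair: pair[1],
    )
    return label if distance <= max_distance else None


# Toy training data with two known classes.
train_points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
train_labels = ["cat", "cat", "dog", "dog"]
centroids = fit_centroids(train_points, train_labels)

# A point close to a known class is classified as usual.
known = predict_open_world(centroids, (0.1, 0.0), max_distance=1.0)

# A point far from both classes is rejected instead of being forced into one.
novel = predict_open_world(centroids, (20.0, -3.0), max_distance=1.0)

# With no threshold (closed-world behaviour), the same point is silently
# assigned to whichever class happens to be nearest.
forced = predict_open_world(centroids, (20.0, -3.0), max_distance=math.inf)
```

A closed-world classifier has no way of saying "none of the above". In a real project you would more likely reach for a ready-made estimator than a hand-rolled threshold; scikit-learn, for instance, ships novelty and outlier detectors such as IsolationForest, OneClassSVM, and LocalOutlierFactor.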

Because of the ever-growing amount of freely available tools, and their increasing complexity, choosing the right tool is more art than science, and requires a lot of skill and experience.

The shortage of developers

Unfortunately, the world is experiencing a severe shortage of skilled software engineers. Software is useful whenever automation is possible, and typical businesses have thousands of problems lying around waiting to be tackled with automation. Until no human is doing work that a machine could do, there will be a need for software engineers. At the same time, good software engineering is as much an art as a science, so it is tough to teach in a traditional university setting.

The shortage of skilled software developers has had the unfortunate consequence that the bar for what constitutes a software engineer has become quite low. Since the field is in constant flux, and since solutions are growing ever more complex, it is difficult for non-engineers to evaluate candidates. Even the least skilled engineers can find work, meaning that many companies are suffering severely from solutions that look like the box of wires shown in the above picture.

We encourage you to fight the good fight: Untangle the mess – remember that complexity accumulates.

To sum up

In this second insight, “Engineering is the bottleneck,” we argue that the vast majority of companies need AI engineers, not AI scientists. Most companies are better off developing pragmatic solutions based on existing research than conducting their own research. It is simply not cost-efficient for most companies to invest in novel technology when so much cutting-edge research is freely available and ready to be implemented. We also argue that it is more difficult to become a good engineer, because engineering skills are harder to quantify than science skills and harder to teach in a classical university setting.

In our next insight, “Deductive reasoning is preferable to inductive reasoning”, we will dive even deeper into the discussion of solid engineering principles that can make or break industrial AI systems and give our point of view on why the simplest possible solution should always be tried first.
