
Machine Learning Developers Face Data and IT Integration Challenges

This market research report was originally published at Tractica's website. It is reprinted here with the permission of Tractica.

Enterprises have been using analytical machine learning (ML) techniques for years to solve business problems related to making predictions on raw data. Today, perception-based techniques driven by deep learning and neural networks are gaining traction around understanding vision and language, both of which have applications within enterprise settings.

When people speak of enterprise artificial intelligence (AI), it is quite common to treat it as a generic term or one large entity without being specific about what technology is being used. This can lead to misunderstandings around what AI can and cannot do, the software and hardware that is required, or even the talent needed to develop the AI solution.

As per Tractica’s definition of AI, statistical ML techniques like random forests, support vector machines, naive Bayes, and linear regression, among others, are part of the umbrella term of AI. However, they should be treated separately from the deep learning branch of AI. Therefore, it made sense to question ML developers separately about what challenges they are facing, the software and hardware tools they use, and the application markets where they see most activity.

A recent survey conducted by Tractica in collaboration with ITPro Today has uncovered specific trends related to ML/data science development within the enterprise. The survey included responses from 50 ML developers. It is useful to view this group in a different light from deep learning developers, who seem to get most of the attention due to their association with large internet companies like Google, Facebook, and Amazon, as well as the cloud-based AI frameworks that have been in the news.

What Are the Bottlenecks to Enterprise ML Development?

In terms of the bottlenecks faced by ML developers, data preparation ranks at the top, followed by external code integration and enterprise backend integration. This is consistent with what we have been hearing from large, medium, and small enterprises. Cleaning data, labeling it, and checking it for bias are all part of the hard plumbing that is required today to make sure ML-driven AI processes are working. At the same time, bringing in code from external environments continues to be challenging, even though vendors claim that this is not the case.
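The three data preparation tasks named above (cleaning, labeling, and bias checking) can be sketched in a few lines of Python. This is only an illustration and is not drawn from the survey: the function name `audit_rows`, the field names, and the sample records are all hypothetical, and a real pipeline would need far more than these counts.

```python
from collections import Counter, defaultdict


def audit_rows(rows, label_key, group_key):
    """Return simple data-quality counts for a list of dict records."""
    # Cleaning: count records with any missing (None or empty-string) field.
    missing = sum(
        1 for r in rows if any(v is None or v == "" for v in r.values())
    )

    # Cleaning: count exact duplicate records beyond the first copy.
    seen = Counter(tuple(sorted(r.items())) for r in rows)
    duplicates = sum(n - 1 for n in seen.values())

    # Labeling: count records that have no label yet.
    unlabeled = sum(1 for r in rows if r.get(label_key) in (None, ""))

    # Bias check: positive-label rate per group, over labeled records only.
    totals, positives = defaultdict(int), defaultdict(int)
    for r in rows:
        label = r.get(label_key)
        if label in (None, ""):
            continue
        totals[r.get(group_key)] += 1
        positives[r.get(group_key)] += 1 if label == 1 else 0
    rates = {g: positives[g] / totals[g] for g in totals}

    return {
        "missing": missing,
        "duplicates": duplicates,
        "unlabeled": unlabeled,
        "positive_rate_by_group": rates,
    }


# Made-up records, for illustration only.
rows = [
    {"id": 1, "region": "north", "approved": 1},
    {"id": 2, "region": "north", "approved": 0},
    {"id": 2, "region": "north", "approved": 0},     # exact duplicate
    {"id": 3, "region": "south", "approved": None},  # not yet labeled
    {"id": 4, "region": "south", "approved": 0},
]
report = audit_rows(rows, label_key="approved", group_key="region")
print(report)
```

A large gap between groups in `positive_rate_by_group` does not prove bias on its own, but it is the kind of signal that would prompt a closer look before a model is trained on the data.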

Rank the following bottlenecks in order of their severity to your machine learning development workflow (where 1 = highest pain point and 5 = lowest pain point)


Sources: Tractica, ITPro Today

The backend integration of ML platforms is also an issue, as enterprises want a smoother flow between models being built and then being deployed in the IT infrastructure. This is not always a smooth process. As ML scales in the enterprise, backend integration will be a bigger issue than it is currently. Interestingly, visualization is a lower priority issue. However, from a vendor perspective, this is an area of differentiation; most standard solutions have “drag and drop” model development capabilities. This trend suggests that most enterprises like to work at the code level when it comes to ML, rather than work from simplified programming interfaces.

While model portability has been cited as an issue by some vendors and developers, in this survey, it is the lowest barrier. External code integration seems to be a bigger issue. These responses suggest the possibility that ML developers like to use open source platforms to build models and then bring them into their enterprise proprietary platforms, rather than use proprietary platforms to build and port models.

What Are the Most Popular Software Tools for ML?

In terms of software tools for ML, IBM Watson is the most popular platform according to the survey, followed by SAP, Anaconda, and SAS, none of which is far behind. These are traditional enterprise analytics vendors that have made inroads into the AI/ML space. Many of the newer players in the market like H2O.ai have gained prominence, especially with the Driverless AI concept, which borrows from Google’s AutoML. Also, some of the other players like Databricks, RapidMiner, and Alteryx have been receiving good feedback, as they have been able to innovate more quickly compared to the larger vendors. Over time, Tractica expects the ML software market to consolidate as the hyperscaler platforms like Google Cloud, Microsoft Azure, and AWS encroach upon the ML developer space.

What software tools do you use in the AI (machine learning/data science) model development process?


Sources: Tractica, ITPro Today

What Are the Most Popular Hardware Tools for ML?

While ML development has largely treated hardware as an afterthought, shifts are occurring in the market. NVIDIA’s launch of data science solutions like Titan RTX, Quadro RTX, and RAPIDS is meant to accelerate graphics processing unit (GPU) adoption for ML and take away share from Intel’s central processing unit (CPU) dominance. As the survey shows, Intel Xeon-based CPUs are the most preferred option for ML developers. However, GPUs are not far behind, especially when considering both NVIDIA and other GPU platforms. Over time, we expect ML workloads to be split between GPUs and CPUs, with users choosing between them based on performance requirements and budget. GPUs give users a better performance speedup, while CPUs are the lower cost solution. Another consideration is the growing relevance of ARM-based processor solutions, which can be targeted at ML workloads.

What hardware tools do you use in the AI model development process?


Sources: Tractica, ITPro Today

Where Is ML Being Deployed?

Finally, the survey found interesting choices of industries where ML is being deployed. Public sector (government) and business services (HR, customer relationship management [CRM], etc.) seem to be the most popular areas for ML development, followed by healthcare and manufacturing. Within the Other category, finance and security seem to be the most popular industries.

What industries have you applied machine learning and data science applications in?


Sources: Tractica, ITPro Today

The public sector surprisingly stands out and is much bigger in terms of its activity level than expected. Traditionally, the public sector has been slow to deploy ML solutions due to slow progress in digital transformation and pervasive bureaucracy. Unfortunately, the survey does not cover the use cases involved. We will need to dig into the ML activity within the public sector to better understand what this represents and whether it is an anomaly. Healthcare and manufacturing also stand out within Tractica’s latest Artificial Intelligence Market Forecasts analysis as sectors that are adopting ML-based AI solutions, specifically around patient data analysis and predictive maintenance use cases.

Aditya Kaul
Research Director, Tractica