Machine learning is moving in a direction where the data scientist does the creative work and the ML platform takes care of the tedious process management.
A machine learning platform is an environment for writing code, accessing common libraries, managing computation resources, deploying machine learning (ML) solutions and collaborating with other team members.
The most common enterprise solutions are AWS SageMaker, Azure Machine Learning, Google Vertex AI, IBM Watson and Databricks. Read the comparison of major cloud ML platforms.
Who are the users of machine learning platforms?
Data Scientists. From a development point of view, data scientists are the ones who build the individual ML models. They can create experiments, write code and present results to others.
Data Engineers. Most data processing pipelines should run before the data ends up on a machine learning platform. Feature engineering, however, is often done on the ML platform side. In the proof of concept phase, feature engineering might simply mean adding appropriate columns to an Excel file.
Machine Learning Engineers. This role is responsible for making sure that data scientists can collaborate with each other and deploy results quickly.
Customers. ML platforms exist to benefit stakeholders. The end users can be either external customers or an internal development team.
Read more: Difference between Data Scientist and Data Engineer.
Which components make a machine learning platform?
An ML platform consists of a few different components.
Workspace. There needs to be a user interface to write and edit code. Usually the editor is web based and is referred to as a workbench, workspace, studio or machine learning IDE (Integrated Development Environment). Python, R, Scala and SQL are the most common programming languages for data science.
Computation. A virtual machine (a slice of a physical server) is needed to run the code. There can also be several of them, in which case we talk about a computation cluster. If you were working on your laptop, that would be your computation engine. On a cloud platform you can scale the computation resources depending on the type of workload.
MLOps tools. ML platforms provide functionality that makes the whole ML process easier. This could be code versioning, ML model versioning, library installation, algorithms or deployment tools. While code editing and computation can mostly be managed from a single view, MLOps is scattered across several components.
Data storage. Usually data is not stored in machine learning platforms to a large extent. Rather, data is kept in the source systems and databases. Only the connections are created on the ML platform side to make access easier. A common approach is to mount a cloud file system like AWS S3 or Azure Data Lake to read source data and write the results back.
You can use a text editor to write code, a laptop to supply the processing power and file naming conventions as your MLOps. You could say that ML platforms make each individual task just slightly easier.
The benefits come from bringing extra value to each individual building block and bundling them into one integrated solution. The code editor can be accessed easily from a browser. Cloud computation scales up and down, unlike a laptop. MLOps tools have robust APIs to simplify the processes.
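To make the do-it-yourself alternative concrete, here is a minimal sketch of model versioning based purely on file naming conventions. The naming scheme (model_name_v1.pkl, model_name_v2.pkl and so on) and all function names are assumptions made up for this illustration, not part of any platform. It uses only the Python standard library.

```python
import pickle
import re
import tempfile
from pathlib import Path


def list_versions(model_dir: Path, model_name: str) -> dict:
    """Map version number -> file path, using the assumed naming convention
    <model_name>_v<number>.pkl (a hypothetical scheme for this sketch)."""
    pattern = re.compile(rf"{re.escape(model_name)}_v(\d+)\.pkl")
    found = {}
    for path in model_dir.glob("*.pkl"):
        match = pattern.fullmatch(path.name)
        if match:
            found[int(match.group(1))] = path
    return found


def save_model(model: object, model_dir: Path, model_name: str) -> Path:
    """Pickle the model under the next free version number."""
    model_dir.mkdir(parents=True, exist_ok=True)
    version = max(list_versions(model_dir, model_name), default=0) + 1
    path = model_dir / f"{model_name}_v{version}.pkl"
    with path.open("wb") as f:
        pickle.dump(model, f)
    return path


def load_latest_model(model_dir: Path, model_name: str) -> object:
    """Load the model file with the highest version number."""
    versions = list_versions(model_dir, model_name)
    if not versions:
        raise FileNotFoundError(f"no saved versions of {model_name!r}")
    with versions[max(versions)].open("rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model_dir = Path(tmp)
        # A plain dict stands in for a trained model object.
        save_model({"max_depth": 3}, model_dir, "churn")
        save_model({"max_depth": 5}, model_dir, "churn")
        print(sorted(p.name for p in model_dir.iterdir()))
        print(load_latest_model(model_dir, "churn"))
```

This works for one person on one machine, but it has no access control, no shared view for the team and no link between a model file and the run that produced it. Those are the gaps the integrated MLOps tools aim to close.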
MLOps - Managing the machine learning lifecycle
Let’s take a closer look at the MLOps tools to manage the machine learning lifecycle.
Save experiment results. Many platforms provide an API to save training results in a systematic way for later review. You could save parameters such as the maximum depth of a decision tree, metrics such as precision and tags such as a version number.
Manage ML models. ML models can be saved after training and loaded for predictions. There might be versioning to easily save a new model version after re-training. Just seeing all models in one list view is a big benefit.
Orchestration. Run scripts for data pre-processing, model training and scoring at regular intervals. Some tools have advanced orchestration to chain multiple jobs together.
Deploy and share. A development version of a model can be deployed to production. It can be published through a REST API endpoint to be used by other applications.
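As an illustration of what saving experiment results means in practice, the sketch below appends each training run to a JSON Lines file and retrieves the best run by a chosen metric. This is not the API of any particular platform. The function names, the file layout and the example values are all assumptions made up for this sketch, which uses only the Python standard library.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path


def log_run(log_file: Path, params: dict, metrics: dict, tags: dict) -> str:
    """Append one experiment run to a JSON Lines file and return its id."""
    run = {
        "run_id": uuid.uuid4().hex,
        "timestamp": time.time(),
        "params": params,    # e.g. maximum depth of a decision tree
        "metrics": metrics,  # e.g. precision
        "tags": tags,        # e.g. version number
    }
    log_file.parent.mkdir(parents=True, exist_ok=True)
    with log_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps(run) + "\n")
    return run["run_id"]


def load_runs(log_file: Path) -> list:
    """Read all logged runs back for later review."""
    with log_file.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_run(log_file: Path, metric: str) -> dict:
    """Return the run with the highest value of the given metric."""
    return max(load_runs(log_file), key=lambda run: run["metrics"][metric])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_file = Path(tmp) / "runs.jsonl"
        # The metric values below are made up for the example.
        log_run(log_file, {"max_depth": 3}, {"precision": 0.81}, {"version": "1"})
        log_run(log_file, {"max_depth": 5}, {"precision": 0.86}, {"version": "2"})
        print(best_run(log_file, "precision")["params"])
```

A platform's tracking API typically adds what this sketch lacks: a shared store, a user interface to compare runs and a link from each run to the saved model.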
Why cloud ML platforms?
Here are some benefits that machine learning platforms provide compared to traditional analytical tools that are designed for individuals.
Scalability. The machine learning platforms are hosted in the cloud. This means that the size of the computation unit or the number of cluster nodes can be adjusted easily. You may want to run demanding tasks occasionally, but have long periods without significant usage.
Manage ML models. ML cloud platforms provide convenient tools to manage a large number of ML models out of the box. You might be able to see them in one view and have a common interface to save a trained model and load it for predictions.
Programming environment. Managing Python versions, library versions, database drivers, data lake mounting, environment variables and other configuration quickly becomes a mess without a proper platform.
Data and computation in the same environment. In the optimal situation the ML platform and data are hosted in the same data center. In this case performance, security, costs and the timeliness of the data are much easier to keep under control.
Co-operation. As ML platforms run in the cloud anyway, many of them provide easy access for all team members. If one data scientist writes the code, others can easily contribute, even simultaneously.
Metadata. When the platform knows how the data is stored and what kind of data it is, querying is faster, less error-prone and more convenient. Often this makes it possible to run SQL queries against any kind of source data.
Deployment. Many of the platforms make it easy to publish a REST API endpoint to make predictions.
Version control. Integrated version control for the code might be included.
Monitoring. Easily capture issues with production jobs and models.
Availability. Just open the browser and you are ready to work.
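To show what the deployment benefit in the list above replaces, here is a rough sketch of publishing a prediction endpoint by hand with the Python standard library. The /predict path, the request format ({"features": [...]}) and the predict function, which just averages the features as a stand-in for a trained model, are all assumptions made up for this example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features: list) -> float:
    """Hypothetical stand-in for a trained model: the mean of the features."""
    return sum(features) / len(features)


def handle_payload(body: bytes) -> tuple:
    """Validate a JSON request body and return (HTTP status, response dict)."""
    try:
        payload = json.loads(body)
        features = [float(x) for x in payload["features"]]
        if not features:
            raise ValueError("empty feature list")
    except (ValueError, KeyError, TypeError) as exc:
        return 400, {"error": f"invalid request: {exc}"}
    return 200, {"prediction": predict(features)}


class PredictionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only the assumed /predict path is served.
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, result = handle_payload(self.rfile.read(length))
        data = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Start the server; blocks until interrupted."""
    HTTPServer((host, port), PredictionHandler).serve_forever()
```

Calling serve() and sending a POST request with the body {"features": [1, 2, 3]} to http://127.0.0.1:8000/predict returns a JSON response with the prediction. Authentication, scaling, logging and model loading are all missing here, and that is the work a platform's deployment tooling takes over.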
Downsides of ML platforms
Costs. Sometimes the expenses might be high and difficult to understand.
Choice of platform. There are so many vendors and alternative cloud components for the same job that finding the best combination is difficult.
Learning curve. Getting familiar with all available tools and the ML lifecycle process takes time.