Data science platforms can be categorized into a few different buckets:
- Data science workspace
- MLOps platform
- Data processing
- Full stack data science platform
Data science workspace
In practical terms, this means a Jupyter notebook or a similar user interface. Data science workspaces are used by data analysts and data scientists to create pre-studies, visualize data, share insights and experiment with early-stage machine learning models.
Data science workspaces are more than notebooks. A good workspace provides hosting, collaboration, data access, built-in version control, integrations, convenience utilities and developer tools.
MLOps platform
MLOps applies DevOps practices to machine learning. The aim is to operationalize machine learning model development and, finally, to productionize the resulting models.
Operationalization makes the training process repeatable. It includes tasks such as structuring the code, logging the results and saving new model versions.
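To make the idea concrete, here is a minimal sketch of a repeatable training run using only the Python standard library. Everything in it is an assumption for illustration: the function names (`train`, `evaluate`, `save_new_version`, `run_training`), the `model_v<N>.json` file layout and the tiny least-squares line fit that stands in for a real model. An actual MLOps platform would handle these steps with its own tooling.

```python
import json
import logging
import time
from pathlib import Path

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("training")


def train(xs, ys):
    """Fit y = slope * x + intercept by least squares (stand-in for a real model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def evaluate(model, xs, ys):
    """Mean squared error of the fitted line on the given data."""
    errors = [(model["slope"] * x + model["intercept"] - y) ** 2
              for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


def save_new_version(model, metrics, model_dir):
    """Write the model to the next free version file (hypothetical layout)."""
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    existing = [int(p.stem.split("_v")[1]) for p in model_dir.glob("model_v*.json")]
    version = max(existing, default=0) + 1
    path = model_dir / f"model_v{version}.json"
    path.write_text(json.dumps({"version": version, "model": model, "metrics": metrics}))
    return path


def run_training(xs, ys, model_dir="models"):
    """One repeatable run: train, log the result, save a new model version."""
    started = time.time()
    model = train(xs, ys)
    metrics = {"mse": evaluate(model, xs, ys)}
    path = save_new_version(model, metrics, model_dir)
    log.info("saved %s mse=%.4f in %.2fs", path.name, metrics["mse"], time.time() - started)
    return path
```

Calling `run_training(xs, ys)` twice on the same directory leaves two version files side by side, so an earlier model is never overwritten and each run leaves a log line behind.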
Productionizing makes machine learning models usable for others. This could mean deploying the model behind an API and setting up monitoring for the solution.
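As a rough sketch of what deployment and monitoring can look like at their simplest, the example below exposes a model through a small HTTP API built on Python's standard `http.server` module and keeps request and error counters as a stand-in for monitoring. The endpoint paths (`/predict`, `/stats`), the hard-coded `MODEL` parameters and the `handle_predict` and `serve` helpers are all hypothetical; a production setup would normally rely on the platform's serving and monitoring features instead.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical model parameters; in practice these would be loaded from a saved model.
MODEL = {"slope": 2.0, "intercept": 1.0}

# Minimal monitoring state: how many requests arrived and how many failed.
STATS = {"requests": 0, "errors": 0}


def handle_predict(body):
    """Turn a JSON request body (bytes) into (status_code, response_dict)."""
    STATS["requests"] += 1
    try:
        x = float(json.loads(body)["x"])
    except (ValueError, KeyError, TypeError):
        STATS["errors"] += 1
        return 400, {"error": 'expected a JSON body like {"x": 1.5}'}
    return 200, {"prediction": MODEL["slope"] * x + MODEL["intercept"]}


class Handler(BaseHTTPRequestHandler):
    def _send(self, status, payload):
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_POST(self):
        # POST /predict runs the model on the request body.
        if self.path != "/predict":
            self._send(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        self._send(*handle_predict(self.rfile.read(length)))

    def do_GET(self):
        # GET /stats exposes the counters for a monitoring tool to poll.
        if self.path == "/stats":
            self._send(200, STATS)
        else:
            self._send(404, {"error": "not found"})


def serve(port=8000):
    """Start the API on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Running `serve()` starts the API locally. Keeping the prediction logic in `handle_predict`, separate from the HTTP handler, means it can be tested without starting a server.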
Many MLOps platforms include a simple workspace as well, but it is not optimized for the exploratory phase or for collaboration.
Data processing
These platforms exist primarily to process large amounts of data.
In special cases, processing performance is the determining factor when choosing a data science platform. Teams are then willing to sacrifice other features, such as user experience and libraries.
Full stack data science platform
Full stack data science platforms have both the workspace and the MLOps tools, and both are reasonably advanced.
Some full stack data science platforms also have special capabilities for data processing.
Databricks might be the best example of a platform that combines all these aspects.