Datalore is a collaborative data science platform from JetBrains, the company best known for its Python IDE PyCharm. It takes the notebook experience to the next level.
Datalore - For whom and why
Collaboration is the keyword here, combined with simplicity.
Datalore has better collaboration options than, for example, Vertex AI notebooks in Google Cloud or Azure notebooks. Databricks is comparable in terms of collaboration, but otherwise it is a more complex environment.
The primary use case for Datalore is writing code in an always-available online environment. It does not have capabilities for managing a machine learning lifecycle.
Datalore is a good fit if you like other JetBrains products, need the Kotlin language, or otherwise want simple online notebooks.
Most important features in Datalore
Each notebook is its own environment
In Datalore, a notebook is not just the one file where you write your code. Rather, a notebook is an environment where you can have multiple tabs (which you would usually call notebooks). It is like having multiple sheets in an Excel workbook.
You can manage libraries per notebook. Environment variables, however, are shared among the notebooks.
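Reading a shared environment variable from notebook code would typically go through Python's standard `os.environ`. Here is a minimal sketch, where `DATA_BUCKET` is a made-up variable name (not something Datalore defines) and the fallback value exists only so that the snippet runs anywhere:

```python
import os

# DATA_BUCKET is a hypothetical variable name used for illustration only.
# The default keeps the snippet runnable when the variable is not set.
bucket = os.environ.get("DATA_BUCKET", "example-bucket")

print(f"Reading data from {bucket}")
```

Using `os.environ.get` with a default avoids a `KeyError` in a notebook where the variable has not been defined yet.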
A notebook can be defined as “reactive”, which means that it runs all the cells at once. Even when I clicked only the second cell, Datalore still ran the whole notebook. This can be convenient when running more established notebooks as jobs.
My absolute favorite feature is the ability to create and share reports from notebooks. In the report builder you simply choose the cells you want to include. It is possible to show both the code and the output.
The report can be made public or shared only within your team. If the report is defined as interactive, the viewers can control the data to some extent. In the static mode only commenting is allowed.
The variable viewer is like a developer’s table of contents for the workflow, even though you also get an actual table of contents from the markdown headers…
Datalore can automatically suggest library and variable names once you start writing. You would think this is typical among data science platforms, but it definitely is not!
Other UI goodies
Split view makes it easy to run two notebooks side by side. Distraction-free mode hides the top menu.
Database integration and cloud storage
Datalore has an awesome database integration feature. It makes it easy to run SQL queries, which speeds up time to insight. Google BigQuery and PostgreSQL are supported, among dozens of other databases.
Fetching data from cloud storage services is an alternative way to get started with analysis work. At the moment, Datalore supports Amazon S3 and Google Cloud Storage.
Compute instances start and restart almost instantly. This is a big deal for developer experience.
Too many installed libraries might make startup slower. Fortunately, most packages are included by default.
Community and Professional plans are for individuals.
With the Community edition you get 120 hours of compute per month on a basic machine for free. This is approximately 6 hours per work day, more than enough for most of us.
The Professional plan is 20 $ a month. It includes 750 computation hours per month on basic CPU machines, 120 hours on powerful ones or 20 hours of GPU time. You can have unlimited notebooks running at the same time and create as many interactive reports as you like.
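The quotas are easier to compare as hours per work day. Here is a small sketch of that arithmetic. The 120 and 750 hour figures are the basic-machine quotas quoted above; the 20 work days per month is my own rounding assumption.

```python
# Monthly compute quotas quoted above, in hours on a basic CPU machine.
QUOTA_HOURS = {"Community": 120, "Professional": 750}

# Assumption: roughly 20 work days in a month.
WORK_DAYS_PER_MONTH = 20


def hours_per_work_day(plan: str) -> float:
    """Return the monthly quota spread evenly over the work days."""
    return QUOTA_HOURS[plan] / WORK_DAYS_PER_MONTH


for plan in QUOTA_HOURS:
    print(f"{plan}: {hours_per_work_day(plan):.1f} hours per work day")
```

Under this assumption the Community quota works out to 6.0 hours and the Professional quota to 37.5 hours per work day. The latter is more than a day holds, so it can only be used up with several machines running in parallel.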
Datalore Enterprise is a self-hosted solution for teams. It is free for 1-4 users. The price for teams of 5 or more is not published. In summer 2022 it was said to be 125 $ per user per month.
Collaboration in Datalore
Datalore supports creating teams. Teams can collaborate in a shared workspace.
Notebooks can be shared between team members. Teammates can even work simultaneously in the same notebook.
The Community and Professional versions are managed options for individuals.
The Enterprise version can be deployed anywhere as a Docker container, either locally or on any cloud provider. Kubernetes would obviously be an option.
It is slightly surprising that Datalore does not provide a managed deployment for teams. I expect this feature to arrive at some point.
You can centrally manage the database connections to make data access as easy as possible.
Notebooks can be scheduled to run at a specific time.
The environment is Jupyter compatible. This means you can both import and export as
Notebook files and Workspace files
Notebook files are a convenient place to store analysis-specific files. Many notebook environments make it difficult to read your own .py modules, but Datalore shines at this.
Workspace files are easy to share with collaborators.
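To illustrate what reading your own .py module looks like in plain Python, here is a minimal sketch. It writes a hypothetical helper module, `cleaning.py`, into a temporary directory that stands in for the notebook's file storage, and then imports it. The module name, its function and the directory are all made up for the example and say nothing about how Datalore lays out files internally.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for the notebook's file storage (hypothetical location).
module_dir = Path(tempfile.mkdtemp())

# A tiny helper module, written here only so the example is self-contained.
(module_dir / "cleaning.py").write_text(
    "def strip_names(names):\n"
    "    return [name.strip().lower() for name in names]\n"
)

# Make the directory importable, then import the module by name.
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
cleaning = importlib.import_module("cleaning")

print(cleaning.strip_names(["  Alice ", "BOB"]))
```

When the module file already sits next to the notebook, the `sys.path` step is usually unnecessary and a plain `import cleaning` is enough.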
Special notebook functionalities
The notebook view of Datalore has several interesting add-ins compared to default Jupyter.
The interface has special cell types for KPIs and dates. Pandas DataFrames can be sorted and filtered conveniently, directly in the UI.
The report builder view might change how we think about business intelligence. It gives data scientists a fast track to showcase their results to their audiences.
Parallel computation for large datasets
Datalore does not support parallel computation frameworks such as Dask or Spark.
However, Datalore integrates with Google BigQuery, which is one way of preparing larger datasets. You can run SQL directly in the notebooks, so the heavy lifting is relatively easy to do on the database side.
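The idea of letting the database do the heavy lifting can be sketched as follows. The example uses Python's built-in `sqlite3` with an in-memory database as a stand-in for BigQuery, and the `events` table and its columns are invented for illustration. The point is only that the aggregation runs in SQL, so the notebook receives a handful of summary rows instead of the raw data.

```python
import sqlite3

# In-memory SQLite as a stand-in for a real warehouse such as BigQuery.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("FI", 10.0), ("FI", 5.0), ("DE", 7.5), ("DE", 2.5), ("SE", 1.0)],
)

# The GROUP BY runs inside the database; only one row per country comes back.
query = """
    SELECT country, COUNT(*) AS n_events, SUM(amount) AS total
    FROM events
    GROUP BY country
    ORDER BY country
"""
summary = conn.execute(query).fetchall()
conn.close()

for country, n_events, total in summary:
    print(country, n_events, total)
```

With a real warehouse the raw table could hold billions of rows, while the result that reaches the notebook stays as small as the number of groups.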
Programming languages supported by Datalore
R and Scala have been added recently.
Datalore is the only data science platform that supports Kotlin! Maybe because Kotlin was originally developed by JetBrains…
Issues with Datalore
After signing up to the platform I received an error: “No license key found”. When trying to run notebooks I got another error: “Can't start the machine”. For example, the Plans tab was not showing in the Account settings.
The issue was fixed swiftly by Datalore customer service. The reason was that my email was tagged as a disposable one, maybe because it is hosted not on Gmail or Outlook but on the German mailbox.org, partly configured on my own.
Summary of Datalore
Datalore is a generic data science platform. It provides essential features to work with notebooks and collaborate with other team members. It comes with some fancy notebook utilities.
Code auto-completion is something that is missing from many other ML platforms. It might have a surprisingly big impact on the development experience.