Data Science

Quantum computer companies

30 November 2024 5 min Data science

In this article about quantum computing, it is time to see what kinds of companies exist in the ecosystem.

Quantum basics explained with a billiards example

3 November 2024 4 min Data science

Quantum basics explained with a billiards example. How would quantum state, measurement and entanglement look in the physical world?

How to copy and paste text in Datalore terminal?

6 October 2023 1 min Data science

Datalore is an online data science environment. Typical CTRL+C and CTRL+V commands do not work in the Datalore terminal, so here is the solution.

List of business intelligence tools

13 September 2023 3 min Data science

List of business intelligence, reporting and data pipeline tools.

Change Python version in Vertex AI

6 September 2023 3 min Data science

Vertex AI is a bare-bones analytics environment in Google Cloud. Even a task as simple as changing the Python version requires multiple steps.

Google Colab - Easily accessible Python workspace

30 August 2023 3 min Data science

Google Colab is a low-barrier option for running Python scripts. Here is a brief introduction to its essential features.

Datalore tech review

28 August 2023 6 min Data science

Datalore is a collaborative data science platform. The notebook experience has been taken to the next level.

Datalore pricing - Which licensing model to choose?

26 August 2023 3 min Data science

The online Python workspace Datalore has three main pricing and licensing models. Together they provide a logical path towards building your company’s data analysis ecosystem.

Wealth management web app - Technical implementation

15 July 2023 5 min Data science

Here is a presentation of the wealth management app I have developed.

Public report in Looker Studio requires login - Instructions to solve

25 April 2023 3 min Data science

Looker Studio is a free reporting and business intelligence tool in Google Cloud. Read this if you are redirected to the Google account login page after trying to share a Looker Studio report publicly.

Types of data science platforms - Workspace, MLOps or full stack?

18 April 2023 2 min Data science

Data science platforms can be categorized into a few different buckets.

What kind of teams benefit from data science platforms?

17 April 2023 2 min Data science

Various kinds of teams from business innovation to academic research can benefit from data science platforms. Here are some examples.

Value of data science platforms

16 April 2023 2 min Data science

Let’s go through the most typical use cases for a data science platform and the benefits of adopting one.

List of data science platforms

15 April 2023 5 min Data science

Comprehensive list of data science platforms, sometimes also known as machine learning platforms, AI platforms or DSML platforms.

ClearML - Robust MLOps platform for end-to-end solutions

14 April 2023 2 min Data science

Robust MLOps platform for end-to-end solutions.

cnvrg.io - Flexible Kubernetes deployments for advanced data science teams

13 April 2023 2 min Data science

For technically advanced teams looking for flexible Kubernetes deployments.

SmartPredict - Specific use cases with fully managed low code

12 April 2023 2 min Data science

Specific use cases with fully managed low code.

neptune.ai - Experiment tracking platform for MLOps

11 April 2023 2 min Data science

Log experiments and ML model versions from any environment.

Paperspace Gradient - ML platform with their own data centers and IPU processors

28 March 2023 2 min Data science

The Paperspace Gradient machine learning platform is best known for its extensive GPU support. They have recently partnered with Graphcore to provide new-generation processors.

Saturn Cloud - Data science workspace with Dask cluster

20 March 2023 2 min Data science

Saturn Cloud is a great choice for data science teams who want to maximize the flexibility of their environment. Integrated parallel processing with Dask differentiates it from its competitors. With open source tools, teams can design workflows that best fit their specific needs.

Datalore - Introduction to the advanced analytics platform

15 March 2023 1 min Data science

Datalore is a fairly recent online platform for advanced data analytics.

Bodo is a faster alternative for Spark to run massive ETL jobs in Python

14 March 2023 3 min Data science

Bodo is a platform for data processing with Python and SQL. It is especially suitable for large datasets thanks to its unique parallel processing technology.

Vertex AI User-Managed notebooks auto shutdown

13 March 2023 3 min Data science

User-Managed notebooks in Vertex AI are virtual workspaces for data exploration. However, they lack automatic shutdown after being idle for a specific amount of time.

30 questions for Google Cloud Professional Machine Learning Engineer exam

1 February 2023 9 min Data science

Around 30 questions I remember from the Google Cloud Professional Machine Learning Engineer certification exam. You can find all sources for exam practice questions in my preparation tips.

I became a certified Google Cloud Professional Machine Learning Engineer!

13 January 2023 4 min Data science

After 4 months of intense studying I passed the Google Cloud certification for Professional Machine Learning Engineer!

MLOps in Google Cloud

11 January 2023 8 min Data science

Google Cloud Platform has an excellent toolset to operationalize and productionize machine learning models.

Neural networks for natural language processing

10 January 2023 3 min Data science

Natural Language Processing (NLP) refers to tools and methods for exploring text data, identifying patterns and making predictions.

Neural networks for image recognition

9 January 2023 2 min Data science

Some notes about image recognition, written while preparing for the Google Cloud MLE certification.

Keras for basic neural networks

7 January 2023 3 min Data science

Keras is one of the high-level APIs in the Tensorflow deep learning stack. It is the recommended framework for getting started with neural networks if you do not have special requirements.

Tensorflow Extended (TFX) for MLOps

6 January 2023 3 min Data science

Tensorflow Extended (known as TFX) is a framework for defining ML pipelines. The extensions are compatible with core Tensorflow.

Tensorflow for ML Engineers

5 January 2023 7 min Data science

I have more experience with the Pandas and Scikit-Learn Python libraries than with Tensorflow. I was surprised by how large the Tensorflow ecosystem is with its ML engineering extensions.

Recommendation systems in Google Cloud

3 January 2023 3 min Data science

Recommendation systems are useful for personalizing the user experience and finding relevant items in huge catalogs.

Introduction to neural networks for ML

2 January 2023 8 min Data science

I first heard about artificial neural networks around 2017. Since then I have tried to understand their behavior and explain them in a simple way.

Dataflow for ML Engineers in Google Cloud

1 January 2023 2 min Data science

The Dataflow product in Google Cloud is essential for advanced data processing pipelines in machine learning solutions.

Machine learning fundamentals

28 December 2022 13 min Data science

Notes about fundamental ML concepts for the Google Cloud ML Engineer certification.

Vertex AI for ML Engineers in Google Cloud

2 December 2022 11 min Data science

Some Google materials refer to Vertex AI as fully managed Tensorflow.

BigQuery for ML Engineers in Google Cloud

1 December 2022 5 min Data science

BigQuery is by far the most important storage and processing service in Google Cloud from an ML perspective.

Machine learning products in Google Cloud

27 November 2022 7 min Data science

This is a summary of the Google Cloud Platform (GCP) products relevant to the Machine Learning Engineer role.

Running Flask frontend and backend in Kubernetes

28 May 2022 7 min Data science

Kubernetes has been everywhere lately, especially in the context of MLOps. I gave it a try by creating a web app with Python Flask.

Free data science workspaces

25 December 2021 4 min Data science

Google Colab, Databricks Community Edition, Visual Studio Code and Docker are some options for creating a free data science workspace.

Comparison of machine learning platforms in major clouds

27 November 2021 10 min Data science

Comparing the major machine learning platforms AWS SageMaker, Azure Machine Learning, Google Vertex AI and Databricks.

What is a machine learning platform?

21 November 2021 5 min Data science

What is a machine learning platform? Introducing different components such as workbench, MLOps tools and cloud computation.

Machine learning in predictive maintenance

29 October 2020 1 min Data science

Machine learning in predictive maintenance. The two-part blog series provides insights for cost savings and an example script in Python.

Faking your geographical location to a web service - A hobby project

28 October 2020 1 min Data science

How to fool a web service about your actual location? In an experiment, I pretended to be in Ireland while traveling in Sweden.

Difference between data scientist and data engineer roles

17 August 2020 6 min Data science

In my opinion, the big difference is that a data scientist focuses more on business problems while a data engineer solves technical problems.

DataCamp - Learn data science online

13 August 2020 6 min Data science

Experiences from DataCamp online training. Structured data science courses are easy to organize for yourself or a team.

PySpark execution logic and code optimization

9 February 2020 1 min Data science

The article goes through the PySpark execution logic and provides guidelines to optimize the speed and performance.

Clustering data using SQL - An example with industrial IoT data

10 November 2019 2 min Data science

Clustering time series data with SQL - Nice 3D visualization using simple logic. Python notebook example on GitHub with industrial data.

Spark + Python tutorial for data developers

7 October 2019 2 min Data science

A tutorial for parallel computation with Spark and Python. The example was run on the AWS cloud computing platform.

Introduction to AWS Glue for big data ETL

1 September 2019 1 min Data science

The AWS Glue service works especially well for big data batch processing. Read the full post at data.solita.fi.

Finnish stemming and lemmatization in python

23 June 2019 1 min Data science

I wrote a post on Solita's blog about text analytics with the headline "Finnish stemming and lemmatization in python". The post has code examples.

Experiences from funding application classification by text analytics

27 January 2019 1 min Data science

Experiences from funding application classification by text analytics

Combining machine learning and business - Practical example

15 May 2018 1 min Data science

I give an example of a machine learning use case in a format that should also be understandable to less technical people.

Maximizing uptime in Hiab hackathon

1 November 2017 1 min Data science

Read about Solita team's solution in a hackathon organized by Hiab. The task was to take advantage of data to maximize machine uptime.

CSV headers to list using Python

1 September 2017 1 min Data science

Python code to automatically list header fields of multiple CSV files. The original use case was related to data warehouse documentation.

Visualization and clustering of earthquake dataset

25 October 2016 2 min Data science

The built-in dataset quakes in RStudio has 1000 records of earthquakes near Fiji. The first year of observations is 1964 but the last year remains unknown. The essential variables for the visualization were the map coordinates, the depth of the quake and the magnitude.