Paperspace Gradient - ML platform with their own data centers and IPU processors

The Paperspace Gradient machine learning platform is best known for its extensive GPU support. Paperspace has recently partnered with Graphcore to offer new-generation Intelligence Processing Units (IPUs).

Saturn Cloud - Data science workspace with Dask cluster

Saturn Cloud is a great choice for data science teams who want to maximize the flexibility of their environment. Integrated parallel processing with Dask differentiates it from its competitors.

Datalore - Collaborative data science platform with great notebook experience

Datalore is a collaborative data science platform from JetBrains, the company best known for its Python IDE PyCharm. It takes the notebook experience to the next level.

Bodo is a faster alternative to Spark for running massive ETL jobs in Python

Bodo is a platform for data processing with Python and SQL. It is especially suitable for large datasets thanks to its unique parallel processing technology.

Vertex AI User-Managed notebooks auto shutdown

User-managed notebooks in Vertex AI are virtual workspaces for data exploration, but they lack automatic shutdown after being idle for a specific amount of time.

30 questions for Google Cloud Professional Machine Learning Engineer exam

Around 30 questions I memorized from the Google Cloud Professional Machine Learning Engineer certification exam. You can find all sources for exam training questions in my preparation tips.

I became a certified Google Cloud Professional Machine Learning Engineer!

After 4 months of intense studying I passed the Google Cloud certification for Professional Machine Learning Engineer! Google Cloud Platform certificates are considered challenging compared to those of other cloud platforms.

MLOps in Google Cloud

Google Cloud Platform has an excellent toolset to operationalize and productionize machine learning models. Vertex AI is the key MLOps product, while Google Kubernetes Engine is a valid alternative for custom workflows.

Neural networks for natural language processing

Natural Language Processing (NLP) refers to tools and methods for exploring text data, identifying patterns and making predictions.

Neural networks for image recognition

Some notes about image recognition, written while preparing for the Google Cloud MLE certification.

Keras for basic neural networks

Keras is one of the high-level APIs in the TensorFlow deep learning stack. It is the recommended framework for getting started with neural networks if you do not have special requirements.

TensorFlow Extended (TFX) for MLOps

TensorFlow Extended (known as TFX) is a framework for defining ML pipelines. The extensions are, as expected, compatible with core TensorFlow.

TensorFlow for ML Engineers

I have more experience with the Pandas and Scikit-Learn Python libraries than with TensorFlow. I was surprised by how large the TensorFlow ecosystem is with its ML engineering extensions.

Recommendation systems in Google Cloud

Recommendation systems are useful for personalizing the user experience and finding relevant items in huge catalogs. Recommendations have become a significant subtopic of machine learning.

Introduction to neural networks for ML

I first heard about artificial neural networks around 2017. Since then I have tried to understand their behavior and explain them in a simple way.

Dataflow for ML Engineers in Google Cloud

The Dataflow product in Google Cloud is essential for advanced data processing pipelines in machine learning solutions. It performs typical data engineering tasks and allows the same code to run in both batch and streaming mode.

Machine learning fundamentals

Notes about fundamental ML concepts for the Google Cloud ML Engineering certification. Data exploration: perform mainly univariate and bivariate analysis during initial exploration.

Vertex AI for ML Engineers in Google Cloud

Some Google materials refer to Vertex AI as "fully managed TensorFlow". Vertex AI assumes that data is prepared elsewhere before the model is trained.

BigQuery for ML Engineers in Google Cloud

BigQuery is by far the most important storage and processing service in Google Cloud from an ML perspective. It has many integrated ML features.

Machine learning products in Google Cloud

This is a summary of Google Cloud Platform (GCP) products relevant to the Machine Learning Engineer role. Google's philosophy seems to be that moving to their platform requires minimal changes to the existing solution.

Running Flask frontend and backend in Kubernetes

Kubernetes has been everywhere lately, especially in the context of MLOps. I gave it a try by creating a web app with Python Flask.

Free data science workspaces

Google Colab, Databricks Community Edition, Visual Studio Code and Docker are some options for creating a free data science workspace.

Comparison of machine learning platforms in major clouds

Comparing the major machine learning platforms AWS SageMaker, Azure Machine Learning, Google Vertex AI and Databricks.

What is a machine learning platform?

What is a machine learning platform? Introducing different components such as workbench, MLOps tools and cloud computation.

Machine learning in predictive maintenance

Machine learning in predictive maintenance. The two-part blog series provides insights for cost savings and an example script in Python.

Faking your geographical location to a web service - A hobby project

How to fool a web service about your actual location? In an experiment I pretended to be in Ireland while traveling in Sweden.

Difference between data scientist and data engineer roles

In my opinion the big difference is that a data scientist focuses more on business problems while a data engineer solves technical problems.

DataCamp - Learn data science online

Experiences from DataCamp online training. Structured data science courses are easy to organize for yourself or a team.

PySpark execution logic and code optimization

The article goes through the PySpark execution logic and provides guidelines for optimizing speed and performance.

Clustering data using SQL - An example with industrial IoT data

Clustering time series data with SQL - Nice 3D visualization using simple logic. Python notebook example on GitHub with industrial data.

Spark + Python tutorial for data developers

A tutorial for parallel computation with Spark and Python. The example has been run on the AWS cloud computing platform.

Introduction to AWS Glue for big data ETL

The AWS Glue service works especially well for big data batch processing. Read the full post at data.solita.fi.

Finnish stemming and lemmatization in python

I wrote on Solita's blog about text analytics under the headline "Finnish stemming and lemmatization in python". The post has code examples.

Experiences from funding application classification by text analytics

Combining machine learning and business - Practical example

I give an example of a machine learning use case in a format that should also be understandable for less technical people.

Maximizing uptime in Hiab hackathon

Read about Solita team's solution in a hackathon organized by Hiab. The task was to take advantage of data to maximize machine uptime.

CSV headers to list using Python

Python code to automatically list header fields of multiple CSV files. The original use case was related to data warehouse documentation.
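
As a rough illustration of the idea, here is a minimal sketch that collects the header row of every CSV file in a folder. It is not the code from the post itself: the function name `list_csv_headers`, the `*.csv` pattern, the comma delimiter and the UTF-8 encoding are all assumptions made for this example.

```python
import csv
from pathlib import Path


def list_csv_headers(folder, pattern="*.csv", delimiter=","):
    """Return {file name: [header fields]} for every matching file in a folder.

    Hypothetical helper for illustration; delimiter and encoding are assumed.
    """
    headers = {}
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle, delimiter=delimiter)
            # The first row is assumed to be the header; an empty file gives [].
            headers[path.name] = next(reader, [])
    return headers


if __name__ == "__main__":
    # Print one line per file found in the current directory.
    for name, fields in list_csv_headers(".").items():
        print(f"{name}: {', '.join(fields)}")
```

The resulting dictionary can be pasted into documentation or written out as a table, which fits the data warehouse documentation use case mentioned above.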

Sports betting tutorial - Can you make a living?

You can make a living by sports betting. The blog is not sponsored; I'm sharing my own experiences. Read the tutorial.

Visualization and clustering of earthquake dataset

The built-in dataset quakes in RStudio has 1000 records of earthquakes near Fiji. The first year of observations is 1964 but the last year remains unknown.

Virus problem - A statistical puzzle

The problem: A virus is spreading across the world - it kills without treatment. Your task is to solve a statistical puzzle.

Data science and business intelligence - Definitions

It's easy to spot hype terms like data science and big data on LinkedIn or on exhibition posters. I summarized the definitions.

Django tutorial - For data oriented web developers

The Python-based Django web framework offers a great platform for creating data-oriented web applications of any size.