I am preparing for the Google Cloud Professional Machine Learning Engineer certification.
The certification is mandatory in the project I am working on.

Google Cloud ML Engineer certification - Training and preparing

Kubernetes has been everywhere lately, especially in the context of MLOps. I gave it a try by creating a web app with Python Flask.

Running Flask frontend and backend in Kubernetes

Kubernetes has been everywhere lately, especially in the context of MLOps, where it is used to manage a plethora of tasks such as training, serving and registering models.

Google Colab, Databricks Community Edition, Visual Studio Code and Docker are some options for creating a free data science workspace.

Free data science workspaces

I have written multiple blog posts about machine learning (ML) engineering and machine learning platforms. Those systems usually aim to productionize ML solutions, are fairly big investments and focus on managing the whole ML lifecycle.

Comparing the major machine learning platforms AWS SageMaker, Azure Machine Learning, Google Vertex AI and Databricks.

Comparison of machine learning platforms in major clouds

This blog post compares machine learning platforms from the major cloud providers Azure, AWS and Google Cloud. The Databricks platform is also included.

What is a machine learning platform? Introducing different components such as workbench, MLOps tools and cloud computation.

What is a machine learning platform?

Machine learning is moving in a direction where the data scientist does the creative work and the ML platform takes care of the unpleasant process management.

Machine learning in predictive maintenance. The two-part blog series provides insights for cost savings and an example script in Python.

Machine learning in predictive maintenance

Predictive maintenance aims to repair the equipment before the failure actually happens. Scheduled maintenance minimizes production downtime, especially in industrial companies.

How to fool a web service about your actual location? In an experiment I pretended to be in Ireland while traveling in Sweden.

Faking your geographical location to a web service - A hobby project

I wrote on my previous employer’s blog about an experiment where I tried to fake my geographical location to a web service.

In my opinion the big difference is that a data scientist focuses more on business problems while a data engineer solves technical problems.

Difference between data scientist and data engineer roles

Having worked the past few years in both data science and data engineering projects, I have gained a pretty good understanding to answer that question.

Experiences from DataCamp online training. Structured data science courses are easy to organize for yourself or a team.

DataCamp - Learn data science online

DataCamp is an online learning platform for data science. The data science course catalog contains a wide selection of Python, R, SQL and Excel videos and assignments.

The article goes through the PySpark execution logic and provides guidelines to optimize the speed and performance.

PySpark execution logic and code optimization

Last fall I wrote about the PySpark framework on my previous employer’s blog. As the name indicates, the topic is extremely technical.

Clustering time series data with SQL - Nice 3D visualization using simple logic. Python notebook example on GitHub with industrial data.

Clustering data using SQL - An example with industrial IoT data

Clustering time series data with SQL. The purpose of this experiment was to prove that doing data science doesn’t always require fancy tools.

A tutorial for parallel computation with Spark and Python. The example was run on the AWS cloud computing platform.

Spark + Python tutorial for data developers

Go to the Spark + Python tutorial in AWS Glue on Solita’s data blog. Spark and parallel computing: a shop cashier can only serve a limited number of customers at a given time.

The AWS Glue service works especially well for big data batch processing. Read the full post at data.solita.fi.

Introduction to AWS Glue for big data ETL

The Amazon Web Services (AWS) cloud computing platform consists of many individual services. Each of them solves a single well-defined problem.

I wrote on Solita's blog about text analytics with the headline "Finnish stemming and lemmatization in python". The post has code examples.

Finnish stemming and lemmatization in python

I wrote on Solita’s Data blog about text analytics with the headline Finnish stemming and lemmatization in python. Read the post here.

Experiences from funding application classification by text analytics

I wrote on Solita’s data blog about a text analytics project. The goal was to automate manual classification of funding applications.

I give an example of a machine learning use case in a format that should also be understandable for less technical people.

Combining machine learning and business - Practical example

You can find the article on Solita’s data blog at data.solita.fi. I finally managed to publish my blog post on the topic A Machine Learning Example For Business.

Read about Solita team's solution in a hackathon organized by Hiab. The task was to take advantage of data to maximize machine uptime.

Maximizing uptime in Hiab hackathon

This post was published on the blog of my employer Solita. Read here how our team won the competition.

Python code to automatically list header fields of multiple CSV files. The original use case was related to data warehouse documentation.

CSV headers to list using Python

A data warehouse project required documentation for incoming CSV files. The intent was to list all header fields of tens of CSV files, grouped by file name.
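
The idea can be sketched in a few lines of Python. This is a minimal illustration and not the code from the original post: the function name `list_csv_headers`, the `*.csv` file pattern, the comma delimiter and the UTF-8 encoding are all assumptions.

```python
import csv
from pathlib import Path


def list_csv_headers(folder, delimiter=","):
    """Return {file name: [header fields]} for every .csv file in folder.

    Hypothetical helper: assumes the header is the first row of each file
    and that the files are UTF-8 encoded.
    """
    headers = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            reader = csv.reader(f, delimiter=delimiter)
            # An empty file has no first row, so fall back to an empty list.
            headers[path.name] = next(reader, [])
    return headers
```

Calling `list_csv_headers("incoming")` on a folder of CSV files would return a dictionary keyed by file name, which can then be written into the documentation in whatever format is needed.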

You can make a living from sports betting. The blog is not sponsored as I'm sharing my own experiences. Read the tutorial.

Sports betting tutorial - Can you make a living?

It is actually possible to make a living from sports betting. This blog is not sponsored - these are my own experiences.

Visualization and clustering of earthquake dataset

The built-in dataset quakes in RStudio had 1000 records of earthquakes near Fiji. The first year of observations is 1964 but the last year remains unknown.

The problem: A virus is spreading across the world - it kills without treatment. Your task is to solve a statistical puzzle.

Virus problem - A statistical puzzle

This imaginary problem is not based on any real situation. A virus is spreading across the world - it kills without treatment.

It's easy to spot hype terms like data science and big data on LinkedIn or exhibition posters. I summarized the definitions.

Data science and business intelligence - Definitions

It’s easy to spot hype terms like data science and big data on LinkedIn or exhibition posters. I summarized the definitions of the most frequently used buzzwords.

The Python-based Django web framework offers a great platform for creating data-oriented web applications of any size.

Django tutorial - For data oriented web developers

Django is a web framework for the Python programming language, which in practice means a well-designed folder structure and pre-made class modules for the most common functionalities of a web service.