Running Flask frontend and backend in Kubernetes
Kubernetes has been everywhere lately, especially in MLOps, where it is used to manage a plethora of different tasks such as training, serving and registering models.
Free data science workspaces
I have written multiple blog posts about machine learning (ML) engineering and machine learning platforms. Those systems are usually aimed at productionizing ML solutions, are fairly big investments and focus on managing the whole ML lifecycle.
Comparison of machine learning platforms in major clouds
This blog post compares the machine learning platforms of the major cloud providers Azure, AWS and Google Cloud. The Databricks platform is also included.
What is a machine learning platform?
Machine learning is moving in a direction where the data scientist does the creative work and the ML platform takes care of the tedious process management.
Machine learning in predictive maintenance
Predictive maintenance aims to repair equipment before a failure actually happens. Scheduled maintenance minimizes production downtime, especially in industrial companies.
Faking your geographical location to a web service - A hobby project
I wrote on my previous employer’s blog about an experiment where I tried to fake my geographical location to a web service.
Difference between data scientist and data engineer roles
Having worked in both data science and data engineering projects over the past few years, I have gained a pretty good understanding of how to answer that question.
DataCamp - Learn data science online
DataCamp is an online learning platform for data science. Its course catalog contains a wide selection of Python, R, SQL and Excel videos and assignments.
PySpark execution logic and code optimization
Last fall I wrote about the PySpark framework on my previous employer’s blog. As the name indicates, the topic is extremely technical.
Clustering data using SQL - An example with industrial IoT data
Clustering time series data with SQL. The purpose of this experiment was to show that doing data science doesn’t always require fancy tools.
Spark + Python tutorial for data developers
Go to the Spark + Python tutorial in AWS Glue on Solita’s data blog. Spark and parallel computing: a shop cashier can only serve a limited number of customers at a given time.
Introduction to AWS Glue for big data ETL
The Amazon Web Services (AWS) cloud computing platform consists of many individual services, each of which solves a single well-defined problem.
Finnish stemming and lemmatization in python
I wrote on Solita’s data blog about text analytics under the headline Finnish stemming and lemmatization in python. Read the post here.
Experiences from funding application classification by text analytics
I wrote on Solita’s data blog about a text analytics project. The goal was to automate the manual classification of funding applications.
Combining machine learning and business - Practical example
You can find the article on Solita’s data blog site data.solita.fi. I finally managed to publish my blog post on the topic A Machine Learning Example For Business.
Maximizing uptime in Hiab hackathon
This post was published on the blog of my employer Solita. Read here how our team won the competition.
Csv headers to list using Python
A data warehouse project required documentation for incoming CSV files. The intent was to list all header fields of tens of CSV files, grouped by file name.
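A minimal sketch of that task using only the standard library, assuming the files sit in one directory, use UTF-8 and have their header on the first row. The function name `csv_headers_by_file` is hypothetical, not taken from the original post.

```python
import csv
from pathlib import Path


def csv_headers_by_file(directory: str, pattern: str = "*.csv") -> dict[str, list[str]]:
    """Return {file name: [header fields]} for every CSV file matching pattern.

    Hypothetical helper: assumes UTF-8 files, the csv module's default
    comma delimiter and a header on the first row.
    """
    headers: dict[str, list[str]] = {}
    # sorted() keeps the listing in a stable, alphabetical file order
    for path in sorted(Path(directory).glob(pattern)):
        # newline="" is how the csv module expects files to be opened
        with path.open(newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            # The first row is treated as the header; an empty file yields []
            headers[path.name] = next(reader, [])
    return headers
```

For example, `csv_headers_by_file("incoming")` would return a dictionary keyed by file name whose values are the header fields, ready to be printed or written into the documentation. Files using another delimiter, such as a semicolon, would need `csv.reader(f, delimiter=";")`.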
Sports betting tutorial - Can you make a living?
It is actually possible to make a living from sports betting. This post is not sponsored; these are my own experiences.
Visualization and clustering of earthquake dataset
The built-in R dataset quakes, used here in RStudio, contains 1000 records of earthquakes near Fiji. The first year of observations is 1964, but the last year remains unknown.
Virus problem - A statistical puzzle
This imaginary problem is not based on any real situation. A virus is spreading across the world, and without treatment it kills.
Data science and business intelligence - Definitions
It’s easy to spot hype terms like data science and big data on LinkedIn or on exhibition posters. I summarized the definitions of the most frequently used buzzwords.
Django tutorial - For data oriented web developers
Django is a web framework for the Python programming language. In practice, this means a well-designed folder structure and pre-made class modules for the most common functionality of a web service.