Kubernetes has been everywhere lately, especially in the context of MLOps, where it is used to manage a plethora of tasks such as training, serving and registering models.
I have written multiple blog posts about machine learning (ML) engineering and machine learning platforms. Those systems usually aim to productionize ML solutions, are fairly big investments and focus on managing the whole ML lifecycle.
This blog post compares machine learning platforms from the major cloud providers Azure, AWS and Google Cloud. The Databricks platform is also included.
Machine learning is moving in a direction where the data scientist does the creative work and the ML platform takes care of the unpleasant process management.
Predictive maintenance aims to repair equipment before a failure actually happens. Scheduled maintenance minimizes production downtime, especially in industrial companies.
I wrote on my previous employer’s blog about an experiment where I tried to fake my geographical location to a web service.
Having worked for the past few years in both data science and data engineering projects, I have gained a pretty good understanding to answer that question.
DataCamp is an online learning platform for data science. The data science course catalog contains a wide selection of Python, R, SQL and Excel videos and assignments.
Last fall I wrote about the PySpark framework on my previous employer’s blog. As the name indicates, the topic is extremely technical.
Clustering time series data with SQL. The purpose of this experiment was to prove that doing data science doesn’t always require fancy tools.
Go to the Spark + Python tutorial in AWS Glue in Solita’s data blog. Spark and parallel computing: a shop cashier can only serve a limited number of customers at a given time.
The Amazon Web Services (AWS) cloud computing platform consists of many individual services. Each of them solves a single well-defined problem.
I wrote on Solita’s Data blog about text analytics under the headline Finnish stemming and lemmatization in python. Read the post here.
I wrote on Solita’s data blog about a text analytics project. The goal was to automate the manual classification of funding applications.
You can find the article on Solita’s data blog site data.solita.fi. I finally managed to publish my blog post on the topic A Machine Learning Example For Business.
This post has been published on the blog of my employer Solita. Read here how our team won the competition.
A data warehouse project required documentation for incoming CSV files. The intent was to list all header fields of tens of CSV files, grouped by file name.
It is actually possible to make a living from sports betting. This blog is not sponsored; these are my own experiences.
The built-in dataset quakes in RStudio has 1000 records of earthquakes near Fiji. The first year of observations is 1964 but the last year remains unknown.
This imaginary problem is not based on any real situation. A virus is spreading across the world, and it is fatal if left untreated.
It’s easy to spot hype terms like data science and big data on LinkedIn or exhibition posters. I summarized the definitions of the most frequently used buzzwords.
Django is a web framework for the Python programming language, which in practice means a well-designed folder structure and pre-made class modules for the most common functionalities of a web service.