Google Colab, Databricks Community Edition, Visual Studio Code and Docker are some options for creating a free data science workspace.
Comparing the major machine learning platforms AWS SageMaker, Azure Machine Learning, Google Vertex AI and Databricks.
What is a machine learning platform? Introducing different components such as workbench, MLOps tools and cloud computation.
Machine learning in predictive maintenance. The two-part blog series provides insights for cost savings and an example script in Python.
How to fool a web service about your actual location? In an experiment, I pretended to be in Ireland while traveling in Sweden.
In my opinion, the big difference is that a data scientist focuses more on business problems while a data engineer solves technical problems.
Experiences from DataCamp online training. Structured data science courses are easy to organize for yourself or a team.
The article goes through the PySpark execution logic and provides guidelines for optimizing speed and performance.
Clustering time series data with SQL – Nice 3D visualization using simple logic. Python notebook example on GitHub with industrial data.
A tutorial for parallel computation with Spark and Python. The example was run on the AWS cloud computing platform.
The AWS Glue service works especially well for big data batch processing. Read the full post on data.solita.fi.
Excel Power Map is designed to visualize spatial data. Watch the demo video about visualizing annual asylum seeker data.
I wrote a post on Solita’s blog about text analytics with the headline “Finnish stemming and lemmatization in python”. The post has code examples.
Experiences from funding application classification by text analytics
You can find the article on Solita’s data-related blog site data.solita.fi. Finally, I managed to publish my blog post on the topic A Machine […]
Read about the Solita team’s solution in a hackathon organized by Hiab. The task was to take advantage of data to maximize machine uptime.
Unpivoting columns to rows with Excel Power Query. Watch a 30-second video on how to do it without any formulas.
Python code to automatically list header fields of multiple CSV files. The original use case was related to data warehouse documentation.
Parsing first name, last name and company from email in Excel. Do you have a list of emails that you want to split by first […]
It is actually possible to make a living from sports betting. This blog is not sponsored – these are my own experiences. Betting – […]
The built-in dataset quakes in RStudio had 1000 records of earthquakes near Fiji. The first year of observations is 1964 but the last year remains […]
This imaginary problem does not rely on any real situation. A virus is spreading across the world – it kills without treatment. A medicine does exist […]
Next I will briefly introduce how the Power tools in the Microsoft Excel Power BI family are related to each other. That way you quickly realize whether […]
It’s easy to spot hype terms like data science and big data on LinkedIn or exhibition posters. I summarized the definitions of the most frequently used […]
Django is a web framework for the Python programming language, which in practice means a well-designed folder structure and pre-made class modules for the most common functionalities in […]