PySpark execution logic and code optimization

This article walks through PySpark's execution logic and gives guidelines for optimizing speed and performance.

Spark + Python tutorial for data developers

A tutorial on parallel computation with Spark and Python. The example was run on the AWS cloud computing platform.

Introduction to AWS Glue for big data ETL

The AWS Glue service works especially well for big data batch processing. Read the full post at data.solita.fi.

Excel Power Map - Spatial data visualization as a time series

Excel Power Map is designed to visualize spatial data. Watch the demo video, which visualizes annual asylum seeker data.

Finnish stemming and lemmatization in Python

I wrote a post on Solita's blog about text analytics, titled "Finnish stemming and lemmatization in Python". The post includes code examples.

Blogging about professional topics - Experiences and tips

Blogging about professional topics is an excellent way to increase visibility for yourself and your company. Read the tips and experiences!

Experiences from funding application classification by text analytics

Combining machine learning and business - Practical example

I give an example of a machine learning use case in a format that should also be understandable to less technical readers.

Maximizing uptime in Hiab hackathon

Read about the Solita team's solution in a hackathon organized by Hiab. The task was to use data to maximize machine uptime.