PySpark execution logic and code optimization
This article walks through PySpark's execution logic and gives guidelines for optimizing speed and performance.
Spark + Python tutorial for data developers
A tutorial on parallel computation with Spark and Python. The example was run on the AWS cloud computing platform.
Introduction to AWS Glue for big data ETL
The AWS Glue service works especially well for big data batch processing. Read the full post at data.solita.fi.
Excel Power Map - Spatial data visualization as a time series
Excel Power Map is designed to visualize spatial data. Watch the demo video about visualizing annual asylum seeker data.
Finnish stemming and lemmatization in python
I wrote a post on text analytics for Solita's blog, titled "Finnish stemming and lemmatization in python". The post includes code examples.
Blogging about professional topics - Experiences and tips
Blogging about professional topics is an excellent way to increase visibility for yourself and your company. Read the tips and experiences!
Experiences from funding application classification by text analytics
Combining machine learning and business - Practical example
I present a machine learning use case in a format that should also be understandable to less technical readers.
Maximizing uptime in Hiab hackathon
Read about the Solita team's solution in a hackathon organized by Hiab. The task was to use data to maximize machine uptime.