
The Spatial Data Science with High Performance Computing (HPC) website is designed for students, professionals, and especially personnel of higher-education institutions in Finland who work with CSC Finland’s supercomputers and are interested in using their computational resources in parallel for spatial data science workflows with Python. Although the website focuses on CSC Finland’s computing resources, the parallelization techniques work similarly on other computing clusters, so the lessons can be useful for anyone with access to a computing cluster on which to distribute their computations.
The website opens with a short Introduction to CSC’s resources, followed by a Getting started section that briefly explains how to set up an HPC environment for the subsequent exercises (only applicable to CSC Finland’s clients). Next come short examples of how to use the HPC storage, in case you want to use it directly. Finally, the website contains various Lessons covering different spatial data processes that use HPC resources in parallel, showing how supercomputers can facilitate your work with Python tools designed for parallel processing.
The resources on this website are part of the Geoportti Research Infrastructure, a shared service for researchers, teachers, and students using geospatial data and geocomputing tools. Geoportti RI helps researchers in Finland to use, refine, preserve, and share their geospatial resources. Learn more about Geoportti RI services.
Prerequisites#
The HPC Lessons require prior knowledge of spatial algorithms (spatial analysis) and some programming skills. More specifically:
Intermediate/advanced Python programming skills
Basic knowledge of JupyterLab
Basic knowledge of spatial algorithms
Basic knowledge of CSC’s setup, specifically for the Puhti supercomputer (optional)
Instructions on how to access and set up the Puhti supercomputer are included, so you will be able to follow along even if you have not used it before.
Course format#
The course starts with an Introduction giving a theoretical overview of HPC resources, and continues with a Getting started section that shows you how to access and create an online session in your browser for using the HPC computational resources in the Lessons. It then gives a short overview of how to manage the Allas HPC storage, in case you need it for your project. After that, the Lessons begin; you can browse an overview of them directly on the website.
Running Lessons#
To download all the materials on this website, you can use Git to clone the repository:
$ git clone https://github.com/AaltoGIS/GeoHPC.git
Alternatively, you can download the files as a ZIP file.
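If you prefer not to use Git, GitHub also serves the repository as an archive. A minimal sketch of fetching and unpacking it from the command line, assuming the default branch is named main:

$ wget https://github.com/AaltoGIS/GeoHPC/archive/refs/heads/main.zip
$ unzip main.zip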
Once you have cloned the repository and installed the environment specified in the section Installing customized HPC environment, you will be able to run the Lessons located under the folder:
GeoHPC/source/lessons
Simply open each notebook and follow the instructions cell by cell.
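For example, from a terminal you could start JupyterLab directly in the lessons folder (a minimal sketch for local use; on Puhti you would normally start the session through the web interface described in Getting started):

$ cd GeoHPC/source/lessons
$ jupyter lab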
Content#
Find a detailed structure of the website here.
Presentations
Webinars
Getting started
HPC storage
Lesson 1
Lesson 2
- Lesson 2. Shortest Path
- Shortest Path (Dijkstra’s) in OSM driving network between residential buildings and Rautatieasema
- Hands-on coding
Lesson 3 - SYKE
- Lesson 3. Land Cover Classification
- Land Cover classification using Random Forest with in-situ and EO data in Finland
- Import needed Python libraries
- Import Point-EO from local
- Download LUCAS 2018 from Allas
- LUCAS 2018 in Finland
- LUCAS 2018 in Lapland
- Download input rasters from Allas
- Using GeoCubes for downloading data (Optional)
- Split multi-band rasters to single-band (Optional)
- Visualization of input raster - NDVI example
- Land Use Class frequency visualization
- Create a datacube
- Store bands for point-eo sampling script
- Sample raster using point-eo for sampling
- Training with Random Forest model
- Parallelization of Random Forest
- Visualization of the final result
Lesson 4 - FGI
- Lesson 4 - Merging Tiled Vectors
- Hands-on coding (Local)
- Step 1: count KOHDEOSO attribute value occurrences
- Step 2: merging features and creating joined layers
- Parallelization
- Moving to Puhti (HPC)
- Clone repository
- Set up HOME folder and data
- Set up settings file
- Writing a serial job file (see the sketch after this list)
- About python env
- 1 List files: RUN sbatch job_list_files.sh
- Writing a parallel job file
- 2 Create partial index: RUN sbatch job_create_partial_index.sh
- 3 Join partial index: RUN sbatch job_join_partial_index.sh
- 4 Create partial layer: RUN sbatch job_create_partial_layer.sh
- 5 Join partial layer: RUN sbatch job_join_partial_layer.sh
- Wrap-up
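The Puhti steps in Lesson 4 are submitted with sbatch as Slurm batch jobs. As a hedged sketch of what a serial job file such as job_list_files.sh might contain (the account, partition, resource values, and the script it calls are placeholders, not the course’s actual settings):

#!/bin/bash
#SBATCH --job-name=list_files        # name shown in the Slurm queue
#SBATCH --account=project_XXXXXXX    # placeholder: your CSC project number
#SBATCH --partition=small            # Puhti partition for small serial jobs
#SBATCH --time=00:15:00              # maximum run time (hh:mm:ss)
#SBATCH --ntasks=1                   # a serial job uses a single task
#SBATCH --mem=4G                     # memory reservation

# Hypothetical step script; the lesson's own files define the real one
python list_files.py

You would submit such a file with, for example, sbatch job_list_files.sh, as the list above indicates; the parallel job files differ mainly in requesting more tasks or CPUs so the work can be distributed.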