
The Spatial Data Science with High Performance Computing (HPC) website is designed for students, professionals, and especially personnel of higher-education institutions in Finland who work with CSC Finland’s supercomputers and are interested in using their computational resources in parallel for spatial data science workflows with Python. Although the website focuses on CSC Finland’s computing resources, the parallelization techniques work similarly on other computing clusters, so the lessons can be useful for anyone with access to a computing cluster on which to distribute their computations.
The website opens with a short Introduction to CSC’s resources, followed by a Getting started section that briefly explains how to set up an HPC environment for the subsequent exercises (only applicable to CSC Finland’s clients). Next come short examples of how to use the HPC storage, in case you want to use it directly. Finally, the website contains various Lessons covering different spatial data processes that use HPC resources in parallel, showing how supercomputers can facilitate your work with Python tools designed for parallel processing.
The resources on this website are part of the Geoportti Research Infrastructure, a shared service for researchers, teachers, and students using geospatial data and geocomputing tools. Geoportti RI helps researchers in Finland to use, refine, preserve, and share their geospatial resources. Learn more about Geoportti RI services.
Prerequisites#
The HPC Lessons require prior knowledge of spatial algorithms (spatial analysis) and some programming skills. More specifically:
Intermediate/advanced Python programming skills
Basic knowledge of JupyterLab
Basic knowledge of spatial algorithms
Basic knowledge of CSC’s setup, specifically for the Puhti supercomputer (optional)
Instructions on how to access and set up the Puhti supercomputer are included, so you will be able to follow along even if you have not used it before.
Course format#
The course starts with an Introduction giving a theoretical overview of HPC resources, and continues with a Getting started section that shows you how to access and create an online session in your browser for using the HPC computational resources in the Lessons. It then gives a short overview of how to manage the Allas HPC storage, in case you need it for your project. After that, the Lessons begin; you can browse an overview of them directly on the website.
Running Lessons#
To download all the materials on this website, you can use Git to clone the repository:
$ git clone https://github.com/AaltoGIS/GeoHPC.git
Alternatively, you can download the files as a ZIP file.
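If you prefer not to use Git, GitHub also serves the repository as an archive. A minimal sketch of fetching and unpacking it from the command line, assuming the default branch is named main:

$ wget https://github.com/AaltoGIS/GeoHPC/archive/refs/heads/main.zip
$ unzip main.zip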
Once you have cloned the repository and installed the environment specified in the section Installing customized HPC environment, you will be able to run the Lessons located under the folder:
GeoHPC/source/lessons
Simply open each notebook and follow the instructions cell by cell.
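For example, from a terminal you could start JupyterLab directly in the lessons folder (a minimal sketch for local use; on Puhti you would normally start the session through the web interface described in Getting started):

$ cd GeoHPC/source/lessons
$ jupyter lab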
Content#
Find a detailed structure of the website here.
Presentations
Webinars
Getting started
HPC storage
Lesson 1
Lesson 2
- Lesson 2. Shortest Path
- Shortest Path (Dijkstra’s) in OSM driving network between residential buildings and Rautatieasema
- Hands-on coding
Lesson 3 - SYKE
- Lesson 3. Land Cover Classification
- Land Cover classification using Random Forest with in-situ and EO data in Finland
- Import needed Python libraries
- Import Point-EO from local
- Download LUCAS 2018 from Allas
- LUCAS 2018 in Finland
- LUCAS 2018 in Lapland
- Download input rasters from Allas
- Using GeoCubes for downloading data (Optional)
- Split multi-band rasters to single-band (Optional)
- Visualization of input raster - NDVI example
- Land Use Class frequency visualization
- Create a datacube
- Store bands for point-eo sampling script
- Sample raster using point-eo for sampling
- Training with Random Forest model
- Parallelization of Random Forest
- Visualization of the final result
Lesson 4 - FGI
- Lesson 4 - Merging Tiled Vectors
- Hands-on coding (Local)
- Step 1: count KOHDEOSO attribute value occurrences
- Step 2: merging features and creating joined layers
- Parallelization
- Moving to Puhti (HPC)
- Clone repository
- Set up HOME folder and data
- Set up settings file
- Writing a serial job file (see the sketch after this list)
- About python env
- 1 List files: RUN sbatch job_list_files.sh
- Writing a parallel job file
- 2 Create partial index: RUN sbatch job_create_partial_index.sh
- 3 Join partial index: RUN sbatch job_join_partial_index.sh
- 4 Create partial layer: RUN sbatch job_create_partial_layer.sh
- 5 Join partial layer: RUN sbatch job_join_partial_layer.sh
- Wrap-up
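The Puhti steps in Lesson 4 are submitted with sbatch as Slurm batch jobs. As a hedged sketch of what a serial job file such as job_list_files.sh might contain (the account, partition, resource values, and the script it calls are placeholders, not the course’s actual settings):

#!/bin/bash
#SBATCH --job-name=list_files        # name shown in the Slurm queue
#SBATCH --account=project_XXXXXXX    # placeholder: your CSC project number
#SBATCH --partition=small            # Puhti partition for small serial jobs
#SBATCH --time=00:15:00              # maximum run time (hh:mm:ss)
#SBATCH --ntasks=1                   # a serial job uses a single task
#SBATCH --mem=4G                     # memory reservation

# Hypothetical step script; the lesson's own files define the real one
python list_files.py

You would submit such a file with, for example, sbatch job_list_files.sh, as the list above indicates; the parallel job files differ mainly in requesting more tasks or CPUs so the work can be distributed.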