The Spatial Data Science with High Performance Computing (HPC) website is a course designed for students, professionals, and especially staff of higher-education institutions in Finland who work with CSC’s supercomputers and want to use those computational resources in parallel for spatial data science workflows with Python.
The course consists of a short Introduction to CSC’s resources, followed by a Getting started section that briefly explains how to set up an HPC environment for the later lessons. Next come short examples of how to use the HPC storage, in case you want to use it directly. Finally, separate Lessons cover different spatial data processes that use HPC resources in parallel, showing how supercomputers can make your work easier when you use the right Python programming tools.
Prerequisites
The HPC Lessons require previous knowledge of spatial algorithms (spatial analysis) and some programming skills. More specifically:
Intermediate to advanced Python programming skills
Basic knowledge of JupyterLab
Basic knowledge of spatial algorithms
Basic knowledge of CSC’s setup, specifically for the Puhti supercomputer (optional)
Instructions for accessing and setting up the Puhti supercomputer are included, so you will be able to follow along even if you have not used it before.
Course format
The course starts with an Introduction that gives a theoretical overview of HPC resources. It continues with a Getting started section that shows you how to access the HPC resources and create an online session in your browser for running the Lessons. An overview of how to manage the Allas HPC storage follows, in case you need it for your projects (nice to have). The Lessons come last; an overview of them is available on the website.
Running Lessons
Once you have cloned the repository and installed the environment specified in the section Installing customized HPC environment, you will be able to run the Lessons located in the folder:
GeoHPC/source/lessons
Simply open each notebook and follow the instructions cell by cell.
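To check which notebooks are available before opening them, a short script can list them. This is only a minimal sketch: it assumes the repository was cloned into the current working directory, so that GeoHPC/source/lessons resolves from there, and the helper name list_lesson_notebooks is hypothetical rather than part of the course code.

```python
from pathlib import Path


def list_lesson_notebooks(lessons_dir="GeoHPC/source/lessons"):
    """Return the sorted notebook paths found under lessons_dir.

    The default path assumes the current working directory is the
    folder that contains the cloned GeoHPC repository.
    """
    root = Path(lessons_dir)
    if not root.is_dir():
        # Folder not found: the repository is probably somewhere else.
        return []
    return sorted(
        path
        for path in root.rglob("*.ipynb")
        # Skip the autosave copies that Jupyter keeps in hidden folders.
        if ".ipynb_checkpoints" not in path.parts
    )


if __name__ == "__main__":
    for notebook in list_lesson_notebooks():
        print(notebook)
```

If the script prints nothing, the folder was not found from the current directory; pass the full path to the lessons folder as the argument instead.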
Content
Find a detailed structure of the website here.
Course information
Getting started
HPC storage
Lesson 1
Lesson 2
- Lesson 2. Shortest Path
- Shortest Path (Dijkstra’s) in OSM driving network between residential buildings and Rautatieasema
- Hands-on coding
Lesson 3 - SYKE
- Lesson 3. Land Cover Classification
- Land Cover classification using Random Forest with in-situ and EO data in Finland
- Import needed Python libraries
- Import Point-EO from local
- Download Lucas 2018 from Allas
- Lucas 2018 in Finland
- Lucas 2018 in Lapland
- Download input rasters from Allas
- Using GeoCubes for downloading data (Optional)
- Split multi-band rasters to single-band (Optional)
- Visualization of input raster - NDVI example
- Land Use Class frequency visualization
- Create a datacube
- Store bands for point-eo sampling script
- Sample raster using point-eo for sampling
- Training with Random Forest model
- Parallelization of Random Forest
- Visualization of the final result
Lesson 4 - FGI
- Lesson 4 - Merging Tiled Vectors
- Hands on coding (Local)
- Step 1: count KOHDEOSO attribute value occurrences
- Step 2: merging features and creating joined layers
- Parallelization
- Moving to Puhti (HPC)
- Clone repository
- Set up HOME folder and data
- Set up settings file
- Writing a serial job file
- About python env
- 1 List files RUN sbatch job_list_files.sh
- Writing a parallel job file
- 2 Create partial index RUN sbatch job_create_partial_index.sh
- 3 Join partial index RUN sbatch job_join_partial_index.sh
- 4 Create partial layer RUN sbatch job_create_partial_layer.sh
- 5 Join partial layer RUN sbatch job_join_partial_layer.sh
- Wrap-up