# PyHPC – Self-Guided Parallel Programming Workshop with Python
This project is a one-week self-paced workshop designed to explore parallel and high-performance computing (HPC) concepts and tools using the Python programming language.
It's inspired by typical content from introductory graduate-level HPC courses, but adapted to be practical, flexible, and free from academic bureaucracy.
Each day covers a different technique to exploit parallelism at the CPU, GPU, or cluster level, with simple examples and performance comparisons.
## Workshop Structure
The repository is organized into five folders, one per day, each containing a `README.md` that serves as a guide.
| Day | Topic | Brief Description |
|---|---|---|
| 1 | Multiprocessing | Using multiple processes for CPU-bound tasks |
| 2 | Multithreading and the GIL | Concurrency in Python, synchronization, and GIL limitations |
| 3 | MPI with mpi4py | Distributed computing on clusters, or simulated locally |
| 4 | GPU with PyCUDA/Numba | Kernel programming and CPU vs. GPU comparison |
| 5 | Parallel Libraries | Exploring PyTorch, Dask, Numba, and comparative benchmarking |
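As a taste of Day 1, here is a minimal multiprocessing sketch (the function names and workload are illustrative, not part of the repo):

```python
from multiprocessing import Pool

def square(n: int) -> int:
    """CPU-bound toy task: square one number."""
    return n * n

def parallel_squares(numbers, workers: int = 4):
    """Distribute the toy task across `workers` worker processes."""
    with Pool(processes=workers) as pool:
        return pool.map(square, numbers)

if __name__ == "__main__":
    print(parallel_squares(range(8)))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

For a trivial task like this the process startup cost dominates; the payoff appears once each call does real CPU work, which is exactly the kind of comparison Day 1 measures.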
## How to Use It
- You can follow the daily order or skip directly to topics of interest.
- Each directory contains examples, exercises, and space for notes.
- Ideally, the learner (that would be me) should keep track of results, comparisons, and observations.
## Suggested Requirements
- Python 3.9+
- Libraries: `multiprocessing`, `threading`, `mpi4py`, `pycuda`, `numba`, `torch`, `dask`, `line_profiler`, etc.
- An NVIDIA GPU environment for the CUDA examples.
- A real or simulated cluster environment (e.g., `mpiexec -n 4 python script.py`) for Day 3.
## License
MIT – free to use, copy, and modify.
## Background Notes

Parallel libraries provide tools to distribute computational work across multiple processor cores or machines, which can significantly reduce execution time for compute-intensive applications.
CUDA enables parallel computation on NVIDIA GPUs by launching functions called kernels. A kernel runs simultaneously on many threads; threads are grouped into blocks, and the blocks together form a grid that maps the computation onto the GPU's architecture.
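No GPU at hand? The grid/block indexing scheme can be previewed in plain Python. This is a conceptual sketch only, not real CUDA; an actual kernel would use `numba.cuda` or PyCUDA, and all names here are illustrative:

```python
def fake_kernel(block_idx, block_dim, thread_idx, a, b, out):
    """One 'thread' of a vector-add kernel: derive a global index
    from block/thread coordinates, then handle a single element."""
    i = block_idx * block_dim + thread_idx  # CUDA: blockIdx.x * blockDim.x + threadIdx.x
    if i < len(out):                        # guard: the last block may have spare threads
        out[i] = a[i] + b[i]

def launch(kernel, grid_dim, block_dim, *args):
    """Sequentially emulate a kernel launch: every (block, thread) pair runs once."""
    for block in range(grid_dim):
        for thread in range(block_dim):
            kernel(block, block_dim, thread, *args)

a = list(range(10))
b = [10] * 10
out = [0] * 10
launch(fake_kernel, 3, 4, a, b, out)  # 3 blocks x 4 threads = 12 threads for 10 elements
print(out)  # [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
```

On a real GPU the (block, thread) pairs execute concurrently rather than in this sequential loop, which is where the speedup comes from.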
MPI (Message Passing Interface) is a standard for parallel computing that enables processes to communicate and synchronize with each other. mpi4py is a Python binding to the MPI standard, allowing Python code to leverage parallel processing capabilities.
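The core send/receive pattern can be previewed without an MPI installation using the stdlib `multiprocessing` module. This is a rough analogy, not mpi4py itself; with mpi4py the equivalent exchange uses `comm.send(data, dest=...)` and `comm.recv(source=...)` inside a script run under `mpiexec`:

```python
from multiprocessing import Process, Pipe

def worker(conn):
    """'Rank 1': receive a message, transform it, send a reply back."""
    data = conn.recv()                  # analogous to comm.recv(source=0)
    conn.send([x * 2 for x in data])    # analogous to comm.send(result, dest=0)
    conn.close()

def exchange(payload):
    """'Rank 0': send work to the other process and collect the result."""
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send(payload)           # blocking message passing, as in MPI
    result = parent_conn.recv()
    p.join()
    return result

if __name__ == "__main__":
    print(exchange([1, 2, 3]))  # [2, 4, 6]
```

The key idea carried over to Day 3 is that the processes share no memory: all coordination happens through explicit messages.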
The GIL is a mutex (mutual exclusion lock) that allows only one native thread to execute Python bytecode at a time within a single process. This limits true parallelism in CPython.
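Even with the GIL, compound operations such as `+=` on a shared variable are not atomic across threads, so explicit synchronization still matters. A minimal sketch:

```python
import threading

counter = 0
lock = threading.Lock()

def add_many(n: int) -> None:
    """Increment the shared counter n times, holding the lock for each update."""
    global counter
    for _ in range(n):
        with lock:       # without this, increments can be lost between threads
            counter += 1

threads = [threading.Thread(target=add_many, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000: correct only because every increment is lock-protected
```

Note that these threads gain no speedup for CPU-bound work precisely because of the GIL; Day 2 contrasts this with the multiprocessing approach of Day 1.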
High-Performance Computing (HPC) includes various strategies to accelerate computation by leveraging hardware parallelism. Choosing the right approach depends on the nature of the problem, the scale of data, and available infrastructure.