03 - MPI with `mpi4py`
Section 3.1: Introduction to MPI and `mpi4py`
Key Concept
MPI (Message Passing Interface) is a standard for parallel computing that enables processes to communicate and synchronize with each other. `mpi4py` is a Python binding to the MPI standard, allowing Python code to leverage parallel processing capabilities.
Topics
- Distributed Memory: Understand that each process has its own memory space.
- Communication: Learn about sending and receiving data between processes.
- Collective Operations: Familiarize yourself with common operations like `broadcast`, `scatter`, and `gather`.
- Process Management: Understand how processes are launched and managed within an MPI application.
- In-session Exercise: Consider how you would divide a large dataset among multiple processes. What communication strategy would you use?
- Common Pitfall: Incorrectly handling data distribution can lead to data races or incomplete results.
- Best Practice: Use `MPI.COMM_WORLD.Get_rank()` to identify the process rank so each process can tailor its computation (see the sketch below).
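As a minimal, illustrative sketch (the filename `hello_mpi.py` is just an example), the following shows the rank/size pattern referenced above; launch it with `mpiexec -n 4 python hello_mpi.py`:

```python
# hello_mpi.py -- minimal mpi4py example (illustrative filename)
from mpi4py import MPI

comm = MPI.COMM_WORLD      # communicator containing every launched process
rank = comm.Get_rank()     # this process's ID (0 .. size-1)
size = comm.Get_size()     # total number of processes

print(f"Hello from rank {rank} of {size}")
```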
Section 3.2: Basic MPI Workflow
Key Concept
This section outlines the fundamental steps involved in writing a simple parallel program using MPI.
Topics
- Initialization: Explain the need to initialize the MPI environment.
- Communication Setup: Describe how to define communication patterns between processes.
- Task Decomposition: Discuss dividing the overall problem into smaller tasks for parallel execution.
- Finalization: Explain the importance of properly terminating the MPI environment.
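A minimal sketch of this workflow with `mpi4py` is shown below; note that `mpi4py` initializes MPI when the module is imported and finalizes it automatically at interpreter exit, so no explicit `Init`/`Finalize` calls appear. The problem size and splitting scheme are arbitrary choices for illustration.

```python
from mpi4py import MPI      # importing mpi4py initializes the MPI environment

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Task decomposition: each process sums a contiguous slice of the range
total = 1_000_000
chunk = total // size
start = rank * chunk
stop = total if rank == size - 1 else start + chunk
local_sum = sum(range(start, stop))

# Communication: combine the partial sums on rank 0
global_sum = comm.reduce(local_sum, op=MPI.SUM, root=0)
if rank == 0:
    print("total:", global_sum)
# Finalization happens automatically when the interpreter exits
```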
Section 3.3: Data Structures in Parallel
Key Concept
Choosing appropriate data structures is crucial for efficient parallel programming.
Topics
- Global vs. Local Data: Differentiate between data accessible to all processes and data specific to a single process.
- Data Partitioning: Explore techniques for dividing data among processes (e.g., domain decomposition).
- Synchronization Primitives: Introduce concepts like barriers and locks for coordinating parallel operations.
- Data Alignment: Briefly mention the importance of data alignment for performance.
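A possible illustration of data partitioning plus a barrier, assuming NumPy is available (the array contents and chunking scheme are arbitrary):

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Global data lives only on rank 0; each rank works on its own local chunk
if rank == 0:
    global_data = np.arange(20.0)
    chunks = np.array_split(global_data, size)   # simple 1-D domain decomposition
else:
    chunks = None

local_data = comm.scatter(chunks, root=0)        # every rank receives one chunk
print(f"rank {rank} owns {local_data}")

comm.Barrier()                                   # all ranks synchronize here before the next phase
```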
Section 3.4: Common MPI Patterns
Key Concept
Several common patterns arise in parallel programming, offering reusable solutions for frequently encountered problems.
Topics
- Data Parallelism: Explain how to apply the same operation to different parts of the data.
- Task Parallelism: Describe how to execute different tasks concurrently.
- Pipeline Parallelism: Introduce the concept of breaking down a complex task into stages executed in a pipeline.
- Reduce Operation: Highlight the use of `comm.reduce` (or the buffer-based `comm.Reduce` in mpi4py) for aggregating data across processes.
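The sketch below combines the data-parallel and reduce patterns using the buffer-based `comm.Reduce` on NumPy arrays; the per-rank data is a made-up stand-in for real results:

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Data parallelism: every rank applies the same operation to its own block
local_block = np.full(4, rank, dtype="d")        # stand-in for per-rank data
local_partial = local_block.sum()

# Reduce pattern: aggregate the partial results on rank 0 (buffer-based API)
sendbuf = np.array([local_partial], dtype="d")
recvbuf = np.empty(1, dtype="d") if rank == 0 else None
comm.Reduce(sendbuf, recvbuf, op=MPI.SUM, root=0)

if rank == 0:
    print("aggregated result:", recvbuf[0])
```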
Section 3.5: `mpiexec` and distributed execution
Key Concept
`mpiexec` is the command-line tool used to launch parallel processes across multiple nodes in a cluster. It allows you to distribute your workload and leverage the combined computational power of your system.
Topics
- Process Launch: `mpiexec` starts multiple instances of a program.
- Resource Specification: You specify the number of processes and the host(s) to run them on.
- Environment Variables: `mpiexec` sets implementation-specific environment variables used to bootstrap inter-process communication (e.g., Open MPI sets `OMPI_COMM_WORLD_SIZE` and `OMPI_COMM_WORLD_RANK`; MPICH/Hydra sets `PMI_SIZE` and `PMI_RANK`).
- Basic Command Structure: `mpiexec -n <num_processes> <program> [args...]`
- In-session Exercise: (5 min) What environment variables does your MPI implementation's `mpiexec` set, and what is their purpose?
- Common Pitfalls: Forgetting to specify the number of processes (`-n`) leaves you with an implementation-specific default process count, which is rarely what you intend.
- Best Practices: Always verify the number of processes and host configuration before launching a job.
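As a rough illustration of the launcher environment, the sketch below prints any of the Open MPI or MPICH launcher variables it can see; the filename and the specific variable names checked are assumptions about your MPI implementation. Launch with `mpiexec -n 4 python launch_info.py`:

```python
# launch_info.py -- inspect launcher-provided environment variables (illustrative)
import os
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Variable names differ by implementation; these are the Open MPI and
# MPICH/Hydra names, used here purely as examples
candidates = ["OMPI_COMM_WORLD_RANK", "OMPI_COMM_WORLD_SIZE", "PMI_RANK", "PMI_SIZE"]
visible = {name: os.environ[name] for name in candidates if name in os.environ}

print(f"rank {rank}: launcher variables seen: {visible}")
```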
Section 3.6: MPI Communication Fundamentals
Key Concept
MPI communication enables processes to exchange data and coordinate their actions during parallel execution. It's the core mechanism for distributed computation.
Topics
- Message Passing: Processes send and receive data via messages.
- Communication Patterns: Common patterns include point-to-point (sending to a specific process) and collective operations (e.g., broadcast, gather).
- Data Distribution: Data can be distributed across processes using techniques like domain decomposition.
- MPI Functions: MPI provides a set of functions for communication (e.g., `MPI_Send` and `MPI_Recv`, exposed in mpi4py as `comm.send`/`comm.recv` and the buffer-based `comm.Send`/`comm.Recv`).
- In-session Exercise: (5 min) Describe the difference between point-to-point and collective communication.
- Common Pitfalls: Incorrectly specifying message sizes or buffer sizes can lead to communication errors.
- Best Practices: Use appropriate data types for communication to minimize data transfer overhead.
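A minimal point-to-point sketch in `mpi4py` (run with at least two processes, e.g. `mpiexec -n 2 python send_recv.py`; the filename, tag, and payload are arbitrary):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Point-to-point: rank 0 sends a Python object, rank 1 receives it
if rank == 0:
    payload = {"step": 1, "values": [3.14, 2.71]}
    comm.send(payload, dest=1, tag=11)
elif rank == 1:
    payload = comm.recv(source=0, tag=11)
    print(f"rank 1 received {payload}")
```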
Section 3.7: Common MPI Operations
Key Concept
MPI provides a set of fundamental operations for exchanging data and coordinating processes, forming the building blocks of parallel algorithms.
Topics
- Send/Receive: Fundamental for data exchange between processes.
- Broadcast: Copies data from one process to all other processes.
- Gather/Scatter: Collects data from multiple processes into one, or distributes data from one process to multiple processes.
- Barrier: Synchronizes processes to ensure they all reach a certain point in the computation before proceeding.
- In-session Exercise: (5 min) Explain when you would use a `Barrier` operation.
- Common Pitfalls: Incorrectly using `Gather` or `Scatter` can lead to data inconsistencies.
- Best Practices: Understand the data layout and process ranks when using collective operations.
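A small sketch tying these operations together (the configuration values are invented for illustration):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Broadcast: rank 0 shares a configuration dict with every process
config = {"n_samples": 1000} if rank == 0 else None
config = comm.bcast(config, root=0)

# Gather: each rank produces a local result and rank 0 collects them all
local_result = rank * config["n_samples"]
all_results = comm.gather(local_result, root=0)

# Barrier: wait until every rank has finished this phase before moving on
comm.Barrier()
if rank == 0:
    print("gathered:", all_results)
```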
Section 3.8: Process communication: `send`, `recv`, `broadcast`, `scatter`, `gather`
Key Concept
This section covers fundamental mechanisms for inter-process communication (IPC), enabling processes to exchange data and coordinate their actions. These primitives provide building blocks for distributed computation and parallel processing.
Topics
- `send` and `recv`: Basic point-to-point communication; one process sends data to another, and the receiver retrieves it.
- `broadcast`: Sends the same data from one process to all participating processes.
- `scatter`: Distributes data from one process to multiple recipients, with each recipient receiving a portion.
- `gather`: Collects data from multiple processes into one process.
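The sketch below exercises `scatter` and `gather` in a master/worker style; the work items and the squaring step are placeholders for real per-process computation:

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# The root prepares one work item per rank, then scatters them
work = [list(range(r * 3, r * 3 + 3)) for r in range(size)] if rank == 0 else None
my_items = comm.scatter(work, root=0)

# Every rank processes its own portion independently
processed = [x * x for x in my_items]

# The root gathers the partial results back into a single list
combined = comm.gather(processed, root=0)
if rank == 0:
    print("combined results:", combined)
```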
Exercise
- (5 min) Consider a scenario where you need to calculate the average of a large dataset distributed across multiple processes. Which communication primitives would be most suitable for gathering the partial results?
- (5 min) Imagine you have a master process that needs to distribute a large array of data to several worker processes. Which communication primitive would be most appropriate?
Pitfalls
- Deadlock: Incorrectly ordering `send` and `recv` operations can lead to processes blocking indefinitely.
- Data Races: Concurrent access to shared data without proper synchronization can result in unpredictable behavior.
- Buffer Overflow: Sending more data than the receiver's buffer can hold can cause crashes or data corruption.
Best Practices
- Synchronization: Use appropriate synchronization mechanisms (e.g., barriers in MPI, or mutexes and semaphores for shared-memory IPC) to protect shared data.
- Data Validation: Validate data before sending and after receiving to prevent errors.
- Error Handling: Implement robust error handling to manage communication failures.
Section 3.9: Modern alternatives: `joblib` and `Ray` for distributed computing
Key Concept
`joblib` and `Ray` provide simpler and more scalable solutions for parallelizing Python code than traditional approaches like `multiprocessing`. `Ray` offers a more comprehensive framework for distributed applications.
Topics
- `joblib`: Simple parallelization for CPU-bound tasks, particularly useful for large datasets.
- `Ray`: A distributed execution framework for building scalable applications, including parallel computing, machine learning, and reinforcement learning.
- Scalability: Both tools offer different levels of scalability, with `Ray` designed for larger, more complex deployments.
- Ease of Use: Both aim to simplify parallelization, reducing boilerplate code compared to `multiprocessing`.
- In-session Exercise: (5-10 min) Consider how you might parallelize a data preprocessing step (e.g., feature scaling) using either `joblib` or `Ray`, as sketched below. What are the key considerations for choosing between them?
- Common Pitfalls: Over-parallelization can lead to overhead exceeding the benefits of parallelism.
- Best Practices: Profile your code to identify bottlenecks before parallelizing.
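As a sketch of the exercise above, the block below parallelizes a toy scaling step first with `joblib`, then with `Ray`; it assumes both packages are installed, and the function names and mean/std values are invented for illustration:

```python
from joblib import Parallel, delayed
import ray

def scale(x, mean=4.5, std=3.0):
    """Toy stand-in for a per-item preprocessing step (e.g., feature scaling)."""
    return (x - mean) / std

data = list(range(10))

# joblib: parallelize a loop over local CPU cores with minimal boilerplate
scaled_joblib = Parallel(n_jobs=4)(delayed(scale)(x) for x in data)
print("joblib:", scaled_joblib)

# Ray: the same computation expressed as remote tasks; on a cluster you would
# point ray.init() at the head node instead of starting a local runtime
ray.init()

@ray.remote
def scale_remote(x):
    return scale(x)

futures = [scale_remote.remote(x) for x in data]
print("ray:", ray.get(futures))   # blocks until all remote tasks finish
```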
Section 3.10: Comparison: MPI vs modern distributed frameworks
Key Concept
Modern distributed frameworks offer higher-level abstractions and, for many workloads, competitive performance compared with traditional MPI, but they require a shift in thinking about application design.
Topics
- Abstraction Level: MPI operates at the message level; modern frameworks offer object-oriented or data-centric abstractions.
- Scalability: Modern frameworks often scale more efficiently to large clusters and heterogeneous environments.
- Programming Model: MPI requires explicit communication management; modern frameworks often handle this implicitly.
- Ease of Use: Modern frameworks generally have simpler APIs and better integration with common data formats.
- In-session Exercise: Briefly brainstorm how you would refactor a simple MPI program to use a data-centric approach.
- Common Pitfalls: Over-optimizing communication details in a modern framework can negate its benefits.
- Best Practices: Prioritize data locality and minimize data movement across nodes.
Exercise: Distributed FFT computation
Objective: Implement a basic distributed FFT computation using MPI to divide a large array into smaller chunks and compute the FFT of each chunk.
Instructions:
- You are given a Python script `fft_mpi.py` that uses MPI to distribute the computation of the Fast Fourier Transform (FFT) of a large array. The script divides the input array into chunks and assigns each chunk to a different MPI process.
- Modify the `fft_mpi.py` script to print the index of the chunk being processed and the size of the chunk for each process.
- Run the modified script with 4 MPI processes.
- Observe the output and verify that the array is indeed divided into chunks and each process is working on a portion of the data.
Expected Learning Outcome: You will understand how to divide a computational task into smaller parts and distribute it across multiple processes using MPI, and how to track the progress of each process.
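Since the course script itself is not reproduced here, the following is only a hypothetical sketch of what `fft_mpi.py` could look like with the requested print added; the signal length and chunking scheme are assumptions:

```python
# Hypothetical sketch of fft_mpi.py; the actual course script may differ.
# Run with:  mpiexec -n 4 python fft_mpi.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Rank 0 builds the full array and splits it into one chunk per process
if rank == 0:
    signal = np.random.rand(4096)
    chunks = np.array_split(signal, size)
else:
    chunks = None

chunk = comm.scatter(chunks, root=0)

# Step 2 of the exercise: report the chunk index (the rank here) and its size
print(f"process {rank}: chunk index {rank}, chunk size {len(chunk)}")

local_fft = np.fft.fft(chunk)               # FFT of this process's chunk only
gathered = comm.gather(local_fft, root=0)   # collect the per-chunk spectra on rank 0
if rank == 0:
    print("collected", len(gathered), "chunk spectra")
```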
Exercise: Fallback: `joblib` parallel processing if MPI setup fails
Objective: To understand how to use `joblib` as a fallback for parallel processing when MPI is unavailable.
Instructions:
- You are given a Python script `calculate_sum.py` that calculates the sum of a large list of numbers. The script uses MPI for parallelization.
- First, attempt to run `calculate_sum.py` using MPI.
- Then, modify the script to include a fallback mechanism using `joblib` parallel processing. This fallback should be activated if MPI fails to initialize.
- Run the modified script and observe the output.
Expected Learning Outcome: You will understand how to implement a fallback strategy for parallel processing, allowing your code to run on systems without MPI.
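One possible shape for the fallback logic is sketched below; it is not the course's `calculate_sum.py`, and the chunking scheme, worker count, and fallback trigger are all assumptions:

```python
# Hypothetical fallback sketch; the actual calculate_sum.py may differ.
numbers = list(range(1_000_000))

def partial_sum(chunk):
    return sum(chunk)

try:
    # Preferred path: MPI parallelism
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    if size < 2:
        raise RuntimeError("single MPI process detected, using fallback")
    chunk = numbers[rank::size]                       # round-robin split across ranks
    total = comm.reduce(partial_sum(chunk), op=MPI.SUM, root=0)
    if rank == 0:
        print("MPI total:", total)
except Exception as err:
    # Fallback path: joblib on the local machine
    from joblib import Parallel, delayed
    n_jobs = 4
    chunks = [numbers[i::n_jobs] for i in range(n_jobs)]
    partials = Parallel(n_jobs=n_jobs)(delayed(partial_sum)(c) for c in chunks)
    print(f"joblib fallback ({err}); total:", sum(partials))
```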
Exercise: Monte Carlo simulation across processes
Objective: Estimate the probability of a random point falling within a circle using a Monte Carlo simulation distributed with MPI.
Instructions:
- You are given a Python script `monte_carlo.py` that performs a Monte Carlo simulation. This script currently runs the simulation on a single process.
- Modify the script to distribute the simulation workload across multiple processes using MPI.
- Run the modified script with `mpiexec -n <number_of_processes> python monte_carlo.py`.
- Compare the results (number of points inside the circle) with the single-process simulation.
Expected Learning Outcome: You will understand how to distribute a computationally intensive task across multiple processes using MPI and how to compare the results to a single-process execution.
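A hypothetical sketch of the distributed version (the sample count and seeding scheme are arbitrary, and the real `monte_carlo.py` may be structured differently):

```python
# Hypothetical sketch of an MPI-parallel monte_carlo.py.
# Run with:  mpiexec -n 4 python monte_carlo.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

total_points = 1_000_000
local_points = total_points // size             # each process draws its own share

rng = np.random.default_rng(seed=rank)          # independent random stream per rank
x = rng.random(local_points)
y = rng.random(local_points)
local_inside = int(np.count_nonzero(x * x + y * y <= 1.0))

# Combine the per-process counts on rank 0
inside = comm.reduce(local_inside, op=MPI.SUM, root=0)
if rank == 0:
    sampled = size * local_points
    print(f"points inside circle: {inside} of {sampled}")
    print(f"probability estimate: {inside / sampled:.5f}")
```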