03 - MPI with `mpi4py`

July 2025

Section 3.1: Introduction to MPI and mpi4py

Key Concept

MPI (Message Passing Interface) is a standard for parallel computing that enables processes to communicate and synchronize with each other. mpi4py is a Python binding to the MPI standard, allowing Python code to leverage parallel processing capabilities.

Topics

  • Distributed Memory: Understand that each process has its own memory space.
  • Communication: Learn about sending and receiving data between processes.
  • Collective Operations: Familiarize yourself with common operations like broadcast, scatter, and gather.
  • Process Management: Understand how processes are launched and managed within an MPI application.

  • In-session Exercise: Consider how you would divide a large dataset among multiple processes. What communication strategy would you use?

  • Common Pitfall: Incorrectly handling data distribution can lead to data races or incomplete results.
  • Best Practice: Use comm.Get_rank() on MPI.COMM_WORLD to identify each process's rank and tailor its computation, as sketched below.
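
A minimal sketch of these ideas in mpi4py (assumes mpi4py is installed; the file name hello_mpi.py is illustrative):

```python
# hello_mpi.py - every process runs this same script but sees a different rank
from mpi4py import MPI

comm = MPI.COMM_WORLD      # communicator containing all launched processes
rank = comm.Get_rank()     # this process's ID within the communicator
size = comm.Get_size()     # total number of processes

# Distributed memory: each process has its own copy of these variables.
print(f"Hello from process {rank} of {size}")
```

Launched with something like `mpiexec -n 4 python hello_mpi.py`, each of the four processes prints its own rank.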

Section 3.2: Basic MPI Workflow

Key Concept

This section outlines the fundamental steps involved in writing a simple parallel program using MPI.

Topics

  • Initialization: Explain the need to initialize the MPI environment.
  • Communication Setup: Describe how to define communication patterns between processes.
  • Task Decomposition: Discuss dividing the overall problem into smaller tasks for parallel execution.
  • Finalization: Explain the importance of properly terminating the MPI environment.
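
A rough sketch of this workflow in mpi4py (illustrative only; mpi4py initializes MPI when it is imported and finalizes it automatically at interpreter exit):

```python
# workflow_sketch.py - initialization, decomposition, communication, finalization
from mpi4py import MPI          # importing mpi4py initializes the MPI environment

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Task decomposition: each process handles a strided share of the work.
local_result = sum(i * i for i in range(rank, 100, size))

# Communication: combine the partial results on the root process.
total = comm.reduce(local_result, op=MPI.SUM, root=0)

if rank == 0:
    print("total =", total)
# Finalization: mpi4py calls MPI_Finalize automatically when the program exits.
```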

Section 3.3: Data Structures in Parallel

Key Concept

Choosing appropriate data structures is crucial for efficient parallel programming.

Topics

  • Global vs. Local Data: Differentiate between data accessible to all processes and data specific to a single process.
  • Data Partitioning: Explore techniques for dividing data among processes (e.g., domain decomposition).
  • Synchronization Primitives: Introduce concepts like barriers and locks for coordinating parallel operations.
  • Data Alignment: Briefly mention the importance of data alignment for performance.
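
A sketch of block partitioning with NumPy (illustrative; the problem size and names are made up):

```python
# partition_sketch.py - each rank builds only its local block of a "global" array
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

N = 1_000_000                                             # global problem size
counts = [N // size + (1 if r < N % size else 0) for r in range(size)]
start = sum(counts[:rank])                                # offset in the global index space

# Local data: only this rank's block ever exists in this process's memory.
local = np.arange(start, start + counts[rank], dtype=np.float64)

# Global result: combine local contributions so every rank sees the same value.
global_sum = comm.allreduce(local.sum(), op=MPI.SUM)
if rank == 0:
    print("global sum:", global_sum)
```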

Section 3.4: Common MPI Patterns

Key Concept

Several common patterns arise in parallel programming, offering reusable solutions for frequently encountered problems.

Topics

  • Data Parallelism: Explain how to apply the same operation to different parts of the data.
  • Task Parallelism: Describe how to execute different tasks concurrently.
  • Pipeline Parallelism: Introduce the concept of breaking down a complex task into stages executed in a pipeline.
  • Reduce Operation: Highlight the use of reduce operations (comm.reduce in mpi4py, with operators such as MPI.SUM or MPI.MAX) for aggregating data across processes; see the sketch after this list.
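
A sketch combining data parallelism with a reduce (illustrative; the random data is a stand-in for real work):

```python
# patterns_sketch.py - same operation on different data, aggregated with a reduce
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Data parallelism: every rank applies the same operation to its own data.
rng = np.random.default_rng(seed=rank)        # different samples on each rank
local_max = rng.random(10_000).max()

# Reduce: aggregate the per-rank results onto the root process.
global_max = comm.reduce(local_max, op=MPI.MAX, root=0)

# Task parallelism: ranks can also branch to different work based on their ID.
if rank == 0:
    print("global max over", size, "ranks:", global_max)
```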

Section 3.5: mpiexec and distributed execution

Key Concept

mpiexec is the command-line tool used to launch parallel processes across multiple nodes in a cluster. It allows you to distribute your workload and leverage the combined computational power of your system.

Topics

  • Process Launch: mpiexec starts multiple instances of a program.
  • Resource Specification: You specify the number of processes and the host(s) to run them on.
  • Environment Variables: mpiexec sets implementation-specific environment variables used to wire up inter-process communication (e.g., OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE for Open MPI, PMI_RANK and PMI_SIZE for MPICH); inside Python, prefer comm.Get_rank() and comm.Get_size() over reading them directly.
  • Basic Command Structure: mpiexec -n <num_processes> python <script.py> [arguments]
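
A small sanity-check script for experimenting with launches (illustrative; the multi-host flag syntax varies between MPI implementations):

```python
# launch_check.py - verify how many processes a launch actually started
# Typical launch:  mpiexec -n 4 python launch_check.py
# Across hosts (flag and host syntax depend on the MPI implementation), e.g.:
#   mpiexec -n 8 -host node1,node2 python launch_check.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
print(f"rank {comm.Get_rank()} of {comm.Get_size()} on {MPI.Get_processor_name()}")
```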

  • In-session Exercise: (5 min) What are the key environment variables set by mpiexec and what is their purpose?

  • Common Pitfalls: Omitting the number of processes (-n) falls back to an implementation-dependent default (often a single process), so a job may silently run without parallelism.
  • Best Practices: Always verify the number of processes and host configuration before launching a job.

Section 3.6: MPI Communication Fundamentals

Key Concept

MPI communication enables processes to exchange data and coordinate their actions during parallel execution. It's the core mechanism for distributed computation.

Topics

  • Message Passing: Processes send and receive data via messages.
  • Communication Patterns: Common patterns include point-to-point (sending to a specific process) and collective operations (e.g., broadcast, gather).
  • Data Distribution: Data can be distributed across processes using techniques like domain decomposition.
  • MPI Functions: MPI defines a family of communication calls (MPI_Send and MPI_Recv in C; comm.send/comm.recv and the buffer-based comm.Send/comm.Recv in mpi4py).
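
A point-to-point sketch (requires at least two processes; the payload is arbitrary picklable data):

```python
# p2p_sketch.py - rank 0 sends a Python object, rank 1 receives it
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    payload = {"step": 1, "values": [3.14, 2.71]}
    comm.send(payload, dest=1, tag=11)      # pickle-based send to rank 1
elif rank == 1:
    data = comm.recv(source=0, tag=11)      # blocks until the message arrives
    print("rank 1 received", data)
```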

  • In-session Exercise: (5 min) Describe the difference between point-to-point and collective communication.

  • Common Pitfalls: Incorrectly specifying message sizes or buffer sizes can lead to communication errors.
  • Best Practices: Use appropriate data types for communication to minimize data transfer overhead.

Section 3.7: Common MPI Operations

Key Concept

MPI provides a set of fundamental operations for exchanging data and coordinating processes, forming the building blocks of parallel algorithms.

Topics

  • Send/Receive: Fundamental for data exchange between processes.
  • Broadcast: Copies data from one process to all other processes.
  • Gather/Scatter: Collects data from multiple processes into one, or distributes data from one process to multiple processes.
  • Barrier: Synchronizes processes to ensure they reach a certain point in the computation before proceeding.
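
A sketch of broadcast and barrier (the config dictionary is illustrative):

```python
# collectives_sketch.py - broadcast a value from the root, then synchronize all ranks
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Broadcast: only the root defines the value; afterwards every rank has a copy.
config = {"n_steps": 100, "dt": 0.01} if rank == 0 else None
config = comm.bcast(config, root=0)
print(f"rank {rank} sees {config}")

# Barrier: no rank proceeds past this line until all ranks have reached it.
comm.Barrier()
if rank == 0:
    print("all ranks passed the barrier")
```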

  • In-session Exercise: (5 min) Explain when you would use a Barrier operation.

  • Common Pitfalls: Incorrectly using Gather or Scatter can lead to data inconsistencies.
  • Best Practices: Understand the data layout and process IDs when using collective operations.

Section 3.8: Process communication: send, recv, broadcast, scatter, gather

Key Concept

This section covers fundamental mechanisms for inter-process communication (IPC), enabling processes to exchange data and coordinate their actions. These primitives provide building blocks for distributed computation and parallel processing.

Topics

  • send and recv: Basic communication; one process sends data to another, and the receiver retrieves it.
  • broadcast: Sends data from one process to multiple recipients.
  • scatter: Distributes data from one process to multiple recipients, potentially with each recipient receiving a portion.
  • gather: Collects data from multiple processes into one process.
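
A sketch of scatter followed by gather (illustrative; the round-robin split is one of several possible partitionings):

```python
# scatter_gather_sketch.py - distribute chunks, do local work, collect results
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    data = list(range(100))
    chunks = [data[i::size] for i in range(size)]   # round-robin split, one chunk per rank
else:
    chunks = None

my_chunk = comm.scatter(chunks, root=0)             # each rank receives one chunk
partial = sum(my_chunk)                             # local work on the chunk

partials = comm.gather(partial, root=0)             # root collects all partial results
if rank == 0:
    print("total:", sum(partials))
```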

Exercise

  • (5 min) Consider a scenario where you need to calculate the average of a large dataset distributed across multiple processes. Which communication primitives would be most suitable for gathering the partial results?

Pitfalls

  • Deadlock: Incorrectly ordering send and recv operations can lead to processes blocking indefinitely.
  • Message Mismatches: Receiving into a buffer smaller than the incoming message, or mismatching tags and sources, can cause crashes, hangs, or corrupted data.

Best Practices

  • Synchronization: Coordinate processes with MPI mechanisms such as barriers (comm.Barrier()) and carefully ordered send/recv pairs; shared-memory locks like mutexes and semaphores do not apply across MPI processes, which do not share memory.
  • Error Handling: Implement robust error handling to manage communication failures.

Section 3.9: Modern alternatives: joblib and Ray for distributed computing

Key Concept

joblib offers a simple way to parallelize Python code on a single machine, while Ray is a more comprehensive framework for distributed applications that can scale from a laptop to a cluster. Both reduce the boilerplate of working with multiprocessing directly.

Topics

  • joblib: Simple parallelization for CPU-bound tasks, particularly useful for large datasets.
  • Ray: A distributed execution framework for building scalable applications, including parallel computing, machine learning, and reinforcement learning.
  • Scalability: Both tools offer different levels of scalability, with Ray designed for larger, more complex deployments.
  • Ease of Use: Both aim to simplify parallelization, reducing boilerplate code compared to multiprocessing.
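
A sketch of the same parallel map expressed with joblib and with Ray (illustrative; preprocess is a stand-in function, and Ray must be installed separately):

```python
# joblib_ray_sketch.py - one embarrassingly parallel loop, two frameworks
from joblib import Parallel, delayed
import ray

def preprocess(x):
    return x * x                      # stand-in for a real preprocessing step

data = list(range(1000))

# joblib: parallel loop over worker processes on a single machine.
results_joblib = Parallel(n_jobs=4)(delayed(preprocess)(x) for x in data)

# Ray: the same map expressed as remote tasks (starts a local cluster by default).
ray.init(ignore_reinit_error=True)

@ray.remote
def preprocess_remote(x):
    return x * x

results_ray = ray.get([preprocess_remote.remote(x) for x in data])
print(results_joblib[:5], results_ray[:5])
```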

  • In-session Exercise: Consider how you might parallelize a data preprocessing step (e.g., feature scaling) using either joblib or Ray. What are the key considerations for choosing between them? (5-10 min)

  • Common Pitfalls: Over-parallelization can lead to overhead exceeding the benefits of parallelism.

  • Best Practices: Profile your code to identify bottlenecks before parallelizing.

Section 3.10: Comparison: MPI vs modern distributed frameworks

Key Concept

Modern distributed frameworks offer higher-level abstractions, built-in fault tolerance, and easier scaling than traditional MPI, though MPI often retains an edge in raw communication performance for tightly coupled numerical workloads. Adopting them requires a shift in thinking about application design.

Topics

  • Abstraction Level: MPI operates at the message level; modern frameworks offer object-oriented or data-centric abstractions.
  • Scalability: Modern frameworks handle elastic scaling, heterogeneous resources, and node failures more gracefully; MPI remains the workhorse for tightly coupled jobs on HPC clusters.
  • Programming Model: MPI requires explicit communication management; modern frameworks often handle this implicitly.
  • Ease of Use: Modern frameworks generally have simpler APIs and better integration with common data formats.

  • In-session Exercise: Briefly brainstorm how you would refactor a simple MPI program to use a data-centric approach.

  • Common Pitfalls: Over-optimizing communication details in a modern framework can negate its benefits.
  • Best Practices: Prioritize data locality and minimize data movement across nodes.

Exercise: Distributed FFT computation

Objective: Implement a basic distributed FFT computation using MPI to divide a large array into smaller chunks and compute the FFT of each chunk.

Instructions:

  • You are given a Python script fft_mpi.py that uses MPI to distribute the computation of the Fast Fourier Transform (FFT) of a large array. The script divides the input array into chunks and assigns each chunk to a different MPI process.
  • Modify the fft_mpi.py script to print the index of the chunk being processed and the size of the chunk for each process.
  • Run the modified script with 4 MPI processes.
  • Observe the output and verify that the array is indeed divided into chunks and each process is working on a portion of the data.

Expected Learning Outcome: You will understand how to divide a computational task into smaller parts and distribute it across multiple processes using MPI, and how to track the progress of each process.
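
The provided fft_mpi.py is not reproduced here, but a hedged sketch of the chunking pattern it describes might look like the following (note that the per-chunk FFTs are not the FFT of the whole array; combining them would need extra steps beyond this exercise):

```python
# Illustrative sketch only - not the provided fft_mpi.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    signal = np.random.default_rng(0).random(2**16)
    chunks = np.array_split(signal, size)            # one contiguous chunk per process
else:
    chunks = None

chunk = comm.scatter(chunks, root=0)
print(f"rank {rank}: chunk index {rank}, chunk size {chunk.size}")   # the requested output
local_fft = np.fft.fft(chunk)                         # FFT of this rank's chunk only
all_ffts = comm.gather(local_fft, root=0)             # root collects the per-chunk results
```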


Exercise: Fallback: joblib parallel processing if MPI setup fails

Objective: To understand how to use joblib as a fallback for parallel processing when MPI is unavailable.

Instructions:

  • You are given a Python script calculate_sum.py that calculates the sum of a large list of numbers. The script uses MPI for parallelization.
  • First, attempt to run calculate_sum.py using MPI.
  • Then, modify the script to include a fallback mechanism using joblib parallel processing. This fallback should be activated if MPI fails to initialize.
  • Run the modified script and observe the output.

Expected Learning Outcome: You will understand how to implement a fallback strategy for parallel processing, allowing your code to run on systems without MPI.
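
One hedged sketch of such a fallback (not the provided calculate_sum.py; here a missing mpi4py import triggers the joblib path, and a real script might also treat a single-process world as a reason to fall back):

```python
# Illustrative fallback pattern - not the provided calculate_sum.py
numbers = list(range(1_000_000))

def partial_sum(chunk):
    return sum(chunk)

try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    local = partial_sum(numbers[rank::size])             # strided share per rank
    total = comm.reduce(local, op=MPI.SUM, root=0)
    if rank == 0:
        print("MPI total:", total)
except ImportError:
    # MPI unavailable: fall back to joblib worker processes on one machine.
    from joblib import Parallel, delayed
    n_jobs = 4
    chunks = [numbers[i::n_jobs] for i in range(n_jobs)]
    partials = Parallel(n_jobs=n_jobs)(delayed(partial_sum)(c) for c in chunks)
    print("joblib total:", sum(partials))
```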


Exercise: Monte Carlo simulation across processes

Objective: Estimate the probability that a random point falls inside a circle (a Monte Carlo estimate of π), distributing the work across processes with MPI.

Instructions:

  • You are given a Python script monte_carlo.py that performs a Monte Carlo simulation. This script currently runs the simulation on a single process.
  • Modify the script to distribute the simulation workload across multiple processes using MPI.
  • Run the modified script with mpiexec -n <number_of_processes> python monte_carlo.py.
  • Compare the results (number of points inside the circle) with the single-process simulation.

Expected Learning Outcome: You will understand how to distribute a computationally intensive task across multiple processes using MPI and how to compare the results to a single-process execution.
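
A hedged sketch of the distributed version (not the provided monte_carlo.py; the sample count and per-rank seeding are illustrative):

```python
# Illustrative sketch only - not the provided monte_carlo.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

total_samples = 1_000_000
local_samples = total_samples // size            # each rank draws its share

rng = np.random.default_rng(seed=rank)           # independent stream per rank
x = rng.random(local_samples)
y = rng.random(local_samples)
local_hits = int(np.count_nonzero(x * x + y * y <= 1.0))

hits = comm.reduce(local_hits, op=MPI.SUM, root=0)
if rank == 0:
    print("points inside:", hits, "pi estimate:", 4.0 * hits / (local_samples * size))
```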

