02 - Multithreading and GIL

July 2025

Section 2.1: What is the GIL (Global Interpreter Lock) in CPython

Key Concept

The GIL is a mutex (mutual exclusion lock) that allows only one native thread to execute Python bytecode at a time within a single process. This limits true parallelism in CPython.

Topics

  • Single-threaded bytecode execution: Only one thread can run Python code at a time.
  • Memory management: The GIL simplifies memory management by ensuring thread safety.
  • C extension interaction: C extensions can release the GIL around long-running native operations (as NumPy and most I/O routines do), allowing some parallelism in those sections.
  • Impact on CPU-bound tasks: The GIL significantly limits performance for CPU-bound operations.

  • In-session exercise: Consider a scenario where you have a CPU-intensive task that could be parallelized. Briefly sketch out how you might approach it using multiprocessing instead of threading.

  • Common Pitfall: Assuming threads automatically lead to performance gains for CPU-bound tasks.
  • Best Practice: For CPU-bound tasks, consider using multiprocessing instead of threading.
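
To see the pitfall and the best practice side by side, here is a minimal sketch (function names and workload sizes are illustrative) that runs the same CPU-bound function under a thread pool and a process pool. On a multi-core machine the process pool should finish markedly faster, because each process has its own GIL:

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def sum_squares(n):
    """CPU-bound work: no I/O, so threads gain nothing under the GIL."""
    return sum(i * i for i in range(n))

def timed(executor_cls, n, workers=4):
    """Run the same workload on `workers` workers and time it."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        list(pool.map(sum_squares, [n] * workers))
    return time.perf_counter() - start

if __name__ == "__main__":
    t_threads = timed(ThreadPoolExecutor, 500_000)   # serialized by the GIL
    t_procs = timed(ProcessPoolExecutor, 500_000)    # one GIL per process
    print(f"threads: {t_threads:.2f}s  processes: {t_procs:.2f}s")
```

The `if __name__ == "__main__"` guard is required because the process pool may re-import the main module when spawning workers.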

Section 2.2: Alternatives to Threading: Multiprocessing

Key Concept

Multiprocessing creates separate processes, each with its own memory space, bypassing the GIL and enabling true parallelism.

Topics

  • Process Isolation: Each process has its own memory space, preventing data sharing by default.
  • Inter-process Communication (IPC): Requires explicit mechanisms (e.g., queues, pipes) to share data between processes.
  • Overhead: Creating and managing processes has higher overhead than threads.
  • Suitable for CPU-bound tasks: Multiprocessing is well-suited for tasks that are limited by CPU performance.

  • In-session exercise: Describe a situation where you would choose multiprocessing over threading.

  • Common Pitfall: Forgetting to implement proper IPC mechanisms between processes.
  • Best Practice: Use appropriate IPC mechanisms (e.g., queues) to efficiently share data between processes.

Section 2.3: Introduction to asyncio for I/O-bound tasks

Key Concept

asyncio enables concurrent execution of code by allowing functions to pause and resume while waiting for I/O operations to complete. This is particularly beneficial for tasks that spend a lot of time waiting for external resources like network requests or file reads.

Topics

  • Coroutines: Functions defined with async def that can be paused and resumed.
  • Event Loop: The central mechanism that manages and schedules coroutines.
  • await keyword: Pauses a coroutine's execution until an awaited operation completes.
  • Concurrency vs. Parallelism: asyncio provides concurrency, not true parallelism (unless combined with multiprocessing).

Exercise

  • Consider a scenario where you need to fetch data from multiple APIs. How could asyncio help improve the overall execution time compared to sequential calls?

Common Pitfalls

  • Blocking Operations: Avoid using blocking functions directly within coroutines, as they can stall the event loop.
  • Ignoring Exceptions: Properly handle exceptions within coroutines to prevent unexpected program termination.

Best Practices

  • Use asyncio.gather: Run multiple coroutines concurrently for improved efficiency.
  • Context Managers: Utilize async with for managing asynchronous resources (e.g., network connections).
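
The asyncio.gather best practice can be sketched as follows; asyncio.sleep stands in for real network I/O, and the names and delays are illustrative:

```python
import asyncio
import time

async def fetch(name, delay):
    """Simulated I/O call: the await is where other coroutines get to run."""
    await asyncio.sleep(delay)
    return f"{name}: done"

async def main():
    start = time.perf_counter()
    # gather schedules all three coroutines concurrently on one event loop
    results = await asyncio.gather(
        fetch("api-a", 0.2),
        fetch("api-b", 0.2),
        fetch("api-c", 0.2),
    )
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
print(results)
print(f"{elapsed:.2f}s")   # ~0.2s, not 0.6s: the waits overlap
```

Note that gather returns results in the order the coroutines were passed, regardless of which finished first.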

Section 2.4: The threading module: using Thread, Lock, RLock, Event, Condition

Key Concept

The threading module lets you run code concurrently within a single process. Because of the GIL it does not speed up CPU-bound Python code, but it improves throughput and responsiveness for I/O-bound tasks, and it provides tools for managing shared resources and synchronizing threads to avoid race conditions.

Topics

  • Thread: Creates and manages individual threads of execution.
  • Lock: Protects shared resources by ensuring only one thread accesses them at a time.
  • RLock: Similar to Lock, but allows multiple acquisitions by the same thread, useful for recursive functions.
  • Event: A signaling mechanism; threads can wait for an event to be set by another thread.
  • Condition: Allows threads to wait for a specific condition to become true, providing more sophisticated synchronization.

Exercise

  • (5 min) Consider a scenario where multiple threads need to increment a shared counter. What synchronization primitive would you use to prevent data corruption?

Pitfalls

  • Race Conditions: Failure to protect shared resources with appropriate synchronization can lead to unpredictable and incorrect results.
  • Deadlock: Threads can become blocked indefinitely if they are waiting for resources held by other threads.

Best Practices

  • Minimize Lock Scope: Reduce the amount of code protected by a lock to improve concurrency.
  • Use Context Managers (with lock:): Ensure locks are always released, even if exceptions occur.
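
Both best practices appear in one sketch: a shared counter incremented by several threads, with the lock scope kept to the single statement that mutates shared state (this is also the answer to the exercise above):

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        with lock:            # released automatically, even on exceptions
            counter += 1      # lock scope: just the shared-state mutation

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # 400000; without the lock, updates can be lost
```

The `counter += 1` statement is a read-modify-write sequence, not a single atomic operation, which is why unprotected increments from multiple threads can drop updates.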

Section 2.5: When threading is useful vs when to avoid it

Key Concept

Threading is beneficial for I/O-bound tasks where the program spends significant time waiting for external operations. However, it's often not the best choice for CPU-bound tasks due to the Global Interpreter Lock (GIL).

Topics

  • I/O-bound tasks: Good for tasks involving network requests, file operations, or database queries.
  • Concurrency: Enables multiple tasks to appear to run simultaneously.
  • Responsiveness: Prevents the program from freezing while waiting.
  • Not for CPU-bound tasks (generally): The GIL limits true parallelism for computationally intensive operations.

  • In-session Exercise: Consider a scenario where you need to download multiple files from the internet. How would threading be beneficial in this situation? (5 min)

  • Common Pitfalls: Race conditions and deadlocks can occur if shared resources are not properly protected.
  • Best Practices: Use thread-safe data structures and synchronization primitives (locks, semaphores) to manage shared resources.
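
A sketch of the download scenario from the exercise, with time.sleep standing in for network latency (the URLs and timings are illustrative): because each thread spends its time waiting, five downloads take roughly as long as one.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def download(url):
    """Simulated download: the sleep stands in for waiting on the network."""
    time.sleep(0.2)
    return f"{url}: 200 OK"

urls = [f"https://example.com/file{i}" for i in range(5)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(download, urls))
elapsed = time.perf_counter() - start
print(f"{len(results)} downloads in {elapsed:.2f}s")   # ~0.2s; sequential would be ~1s
```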

Section 2.6: Practical differences between threading, asyncio, and multiprocessing

Key Concept

These concurrency models take different approaches, each with trade-offs in resource usage, complexity, and workload fit: threading suits I/O-bound tasks with a moderate number of workers, asyncio suits large numbers of concurrent I/O operations, and multiprocessing suits CPU-bound tasks that need true parallelism.

Topics

  • Threading: Shares memory space; limited by the Global Interpreter Lock (GIL) for CPU-bound tasks.
  • asyncio: Single-threaded, event-driven; excels at managing many concurrent I/O operations efficiently.
  • Multiprocessing: Creates separate processes with independent memory spaces; bypasses the GIL, ideal for CPU-bound tasks.

  • In-session Exercise: Consider a scenario where you need to download multiple files from the internet. Which concurrency model would be most appropriate and why?

  • Common Pitfalls: Assuming multiprocessing always results in faster execution; process startup and inter-process communication add overhead that can swamp small workloads.
  • Best Practices: Use threading or asyncio for tasks that wait on external resources (network, disk), and multiprocessing for computationally intensive tasks.
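
One practical difference worth seeing in code: asyncio tasks are cheap enough to run by the thousands on a single thread, which is where it outperforms a thread-per-task design. A small sketch (the task count and delay are illustrative):

```python
import asyncio
import time

async def tick():
    await asyncio.sleep(0.1)   # stands in for a brief I/O wait

async def main(n):
    start = time.perf_counter()
    # 1000 concurrent tasks on one thread; 1000 OS threads would be far costlier
    await asyncio.gather(*(tick() for _ in range(n)))
    return time.perf_counter() - start

elapsed = asyncio.run(main(1000))
print(f"1000 concurrent waits in {elapsed:.2f}s on one thread")
```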

Section 2.7: Advanced synchronization patterns

Key Concept

Beyond basic locking, this section explores coordination patterns for more demanding scenarios: signaling between threads, staged computation, coordination across processes or machines, and tuning synchronization behavior at runtime.

Topics

  • Event-driven synchronization: Threads coordinate on the occurrence of specific events (e.g., threading.Event) rather than polling at fixed intervals.
  • Barrier (phase) synchronization: threading.Barrier holds a group of threads until all of them reach the same point, keeping staged computations in lockstep.
  • Distributed synchronization: Coordinating timing and state across multiple independent processes or nodes, where network latency and partial failure must be accounted for.
  • Adaptive synchronization: Adjusting synchronization parameters (timeouts, backoff intervals) dynamically based on system conditions.

  • In-session exercise: Consider a scenario where you need to synchronize data acquisition from multiple sensor threads. What factors would influence your choice of synchronization pattern? (5 min)

  • Common Pitfalls: Assuming fixed polling intervals are always adequate; neglecting latency and network delays in distributed systems.

  • Best Practices: Prioritize robust error handling and logging in distributed synchronization; use well-defined event identifiers; put a timeout on every blocking wait.
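
The event-driven pattern can be sketched with threading.Event; the producer's delay and the payload values are illustrative:

```python
import threading
import time

data_ready = threading.Event()
results = []

def producer():
    time.sleep(0.1)          # simulate acquiring a reading
    results.append(42)
    data_ready.set()         # signal that the event has occurred

def consumer():
    # Block until the producer signals, with a safety timeout
    if data_ready.wait(timeout=2.0):
        results.append(results[0] * 2)

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)   # [42, 84]
```

The consumer sleeps inside wait() instead of polling, and the timeout keeps it from blocking forever if the producer fails.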

Exercise: Concurrent data acquisition simulation

Objective: Simulate fetching data from multiple sources concurrently using Python's multithreading.

Instructions:

  • Create a Python script that simulates fetching data from a list of URLs. Each URL represents a data source.
  • Implement a function fetch_data(url) that simulates fetching data from a URL (e.g., by sleeping for a random amount of time).
  • Use the threading module to create multiple threads, each running the fetch_data function with a different URL.
  • Measure the total execution time of the multithreaded script and compare it to the execution time of a sequential script that performs the same data fetching.

Expected Learning Outcome: Understand how multithreading can improve the performance of I/O-bound tasks and gain practical experience with the threading module.
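
One possible sketch of this exercise (the URLs, delays, and shared-dict approach are illustrative; each thread writes its own distinct key):

```python
import random
import threading
import time

def fetch_data(url, results):
    """Simulated fetch: sleep for a random interval, then record a result."""
    time.sleep(random.uniform(0.05, 0.2))
    results[url] = f"data from {url}"

urls = [f"https://source-{i}.example" for i in range(5)]
results = {}

start = time.perf_counter()
threads = [threading.Thread(target=fetch_data, args=(u, results)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
# Total time is close to the slowest single fetch, not the sum of all five
print(f"fetched {len(results)} sources in {elapsed:.2f}s")
```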


Exercise: Multiple threads reading files

Objective: To understand how multithreading can be used to read multiple files concurrently.

Instructions:

  • Create a Python script that reads the contents of three text files (e.g., file1.txt, file2.txt, file3.txt). If the files don't exist, create them with some sample content.
  • Use the threading module to create three threads, each responsible for reading one of the files.
  • Print the first 50 characters of the content read from each file in the main thread.

Expected Learning Outcome: You should be able to implement a basic multithreaded program to perform a simple task and observe the concurrent execution of threads.
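
One possible sketch of this exercise, using the file names and setup given in the instructions (the sample content is illustrative):

```python
import threading
from pathlib import Path

filenames = ["file1.txt", "file2.txt", "file3.txt"]

# Create the sample files if they don't already exist
for i, name in enumerate(filenames, start=1):
    if not Path(name).exists():
        Path(name).write_text(f"Sample content for file {i}. " * 5)

contents = {}

def read_file(name):
    contents[name] = Path(name).read_text()

threads = [threading.Thread(target=read_file, args=(n,)) for n in filenames]
for t in threads:
    t.start()
for t in threads:
    t.join()

for name in filenames:
    print(f"{name}: {contents[name][:50]}")   # first 50 characters of each file
```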


Exercise: Web scraping with asyncio

Objective: Practice using asyncio to fetch data from multiple URLs concurrently.

Instructions:

  • You are given a script that scrapes the title of a few websites. The script uses synchronous requests, which can be slow.
  • Modify the script to use asyncio and the aiohttp library to fetch the titles of the same websites concurrently.
  • Run the modified script and compare the execution time with the original synchronous version.

Expected Learning Outcome: You should understand how asyncio can improve the performance of I/O-bound tasks like web scraping by allowing concurrent execution.
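
A sketch of the target structure; to keep it self-contained and runnable offline, asyncio.sleep stands in for the aiohttp request (a real version would fetch and parse the page instead):

```python
import asyncio

async def fetch_title(url):
    # A real version would use:
    #   async with aiohttp.ClientSession() as session:
    #       async with session.get(url) as resp:
    #           html = await resp.text()   # then parse out <title>
    await asyncio.sleep(0.1)               # simulated network latency
    return f"<title of {url}>"

async def main(urls):
    # All requests are in flight at once instead of one after another
    return await asyncio.gather(*(fetch_title(u) for u in urls))

urls = ["https://example.org", "https://example.com", "https://example.net"]
titles = asyncio.run(main(urls))
print(titles)
```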

