Python provides a range of tools and libraries for performing programming, which are designed to cater to various needs and levels of complexity. It is crucial to have an understanding of these tools in order to develop applications that are both efficient and responsive. Now let’s delve into the details of the tools, for this purpose.
Multithreading
Introduction to Multithreading Multithreading allows concurrent execution of multiple threads within a single process. Threads share the same memory space, making it suitable for I/O bound tasks.
Threading Module The threading module facilitates working with threads in Python. Let’s create a simple example to demonstrate multithreading:
import threading
def print_numbers():
for i in range(1, 6):
print(f"Number {i}")
def print_letters():
for letter in 'abcde':
print(f"Letter {letter}")
if __name__ == "__main__":
thread1 = threading.Thread(target=print_numbers)
thread2 = threading.Thread(target=print_letters)
thread1.start() # Start the thread to run the 'print_numbers' function concurrently
thread2.start() # Start another thread to run the 'print_letters' function concurrently
thread1.join() # Wait for thread1 to complete its task before moving on
thread2.join() # Wait for thread2 to complete its task before moving on
# Number 1
# Letter a
# Letter b
# Number 2
# Number 3
# Number 4
# Number 5
# Letter c
# Letter d
# Letter e
It’s important to understand that while you have two threads (thread1 and thread2) that are executing concurrently, the order in which they execute their tasks is not guaranteed. This lack of ordering is due to the nature of threading and how operating systems schedule threads for execution. Here’s why you see the output in a seemingly random order:
Thread Scheduling: The operating system’s thread scheduler determines when and in what order threads run. Threads can be preempted and paused at any time, and the scheduler decides which thread to execute next based on factors like thread priorities and time slices.
Non-Atomic Print Operation: The print operation itself is not atomic, meaning it’s not a single, uninterrupted action. When you print something, it involves multiple steps like acquiring the console output lock, formatting the output, and releasing the lock. Between these steps, other threads can run, which can result in interleaved output.
For example, consider this possible order of execution:
thread1 starts and prints "Number 1".
thread2 starts and prints "Letter a" and "Letter b".
thread1 continues and prints "Number 2", "Number 3", "Number 4", "Number 5".
thread2 continues and prints "Letter c", "Letter d", "Letter e".
Example: Downloading Files Concurrently with Multithreading
In this example, we’ll use multithreading to download files concurrently from different URLs:
import threading
import requests
def download_file(url, filename):
response = requests.get(url)
with open(filename, "wb") as file:
file.write(response.content)
print(f"Downloaded {filename}")
if __name__ == "__main__":
urls = [
"https://github.com/wkhtmltopdf/wkhtmltopdf/archive/refs/tags/0.12.6.zip",
"https://github.com/wkhtmltopdf/wkhtmltopdf/archive/refs/tags/0.12.2.1.zip",
"https://github.com/wkhtmltopdf/wkhtmltopdf/archive/refs/tags/0.12.3.1.zip",
]
threads = []
for i, url in enumerate(urls):
thread = threading.Thread(target=download_file, args=(url, f"file_{i}.txt"))
thread.start()
threads.append(thread)
for thread in threads:
thread.join()
# Downloaded https://github.com/wkhtmltopdf/wkhtmltopdf/archive/refs/tags/0.12.3.1.zip
# Downloaded https://github.com/wkhtmltopdf/wkhtmltopdf/archive/refs/tags/0.12.2.1.zip
# Downloaded https://github.com/wkhtmltopdf/wkhtmltopdf/archive/refs/tags/0.12.6.zip
If you observed that downloading files without using threads (i.e., in a sequential or synchronous manner) took less time than downloading them with threads, there are a few potential reasons for this counterintuitive behavior:
Global Interpreter Lock (GIL): Python has a Global Interpreter Lock (GIL) that allows only one thread to execute Python bytecode at a time, even on multi-core processors. This means that in a multithreaded Python program, threads can be limited by the GIL, especially if the tasks involve CPU bound operations. In the case of downloading files, which is typically I/O bound (waiting for data to be transferred over the network), using threads may not provide a significant advantage, and it could even introduce some overhead.
Network Bound: If the download speed is limited by the network bandwidth, using multiple threads might not lead to a significant improvement because the bottleneck is the network speed, not the CPU. In such cases, the overhead of managing multiple threads can outweigh any potential gains.
Thread Overhead: Creating and managing threads in Python comes with some overhead. If the tasks are relatively simple, such as downloading files, the overhead of creating and managing threads can outweigh the benefits of concurrency.
Resource Contention: When using threads, there can be contention for system resources like CPU and memory. If the system becomes saturated with threads, context switching and resource contention may slow down the overall performance.
Thread Management: The example code provided for multithreading may not be optimized for maximum concurrency. In a real-world scenario, optimizing thread management, such as using thread pools or asyncio for I/O bound operations, can yield better results.
Multiprocessing
Introduction to Multiprocessing Multiprocessing allows parallel execution of multiple processes, each with its own memory space. It’s suitable for CPU-bound tasks.
Multiprocessing Module The multiprocessing module supports multiprocessing in Python. Let’s create an example demonstrating multiprocessing:
import multiprocessing
def worker(number):
result = number * number
print(f"Result: {result}")
if __name__ == "__main__":
processes = []
for i in range(1, 6):
process = multiprocessing.Process(target=worker, args=(i,))
processes.append(process)
process.start()
for process in processes:
process.join()
# Result: 4
# Result: 9
# Result: 1
# Result: 25
# Result: 16
Explanation: In this multiprocessing example, we’re creating multiple processes to perform a CPU-bound task, which is calculating the square of a number. However, the order in which the results are printed may not necessarily match the order of the input values.
This is because the individual processes run concurrently and independently of each other. They may complete their tasks in a different order, depending on factors like the CPU’s availability and scheduling. As a result, the printed results can appear in a random or unordered fashion.
Multiprocessing is ideal for parallelizing CPU bound tasks to leverage multiple CPU cores effectively. However, it doesn’t guarantee a specific order of execution or results, as the processes run in parallel and their completion times can vary.
Event Loop (Asyncio)
Introduction to Asynchronous I/O Asynchronous I/O enables non-blocking concurrency. The event loop manages tasks, making it suitable for I/O-bound operations.
Asyncio Module The asyncio module provides tools for asynchronous programming. Let’s create an example to illustrate asyncio:
import asyncio
async def print_numbers():
for i in range(1, 6):
print(f"Number {i}")
await asyncio.sleep(1)
async def print_letters():
for letter in 'abcde':
print(f"Letter {letter}")
await asyncio.sleep(1)
async def main():
task1 = asyncio.create_task(print_numbers())
task2 = asyncio.create_task(print_letters())
await task1
await task2
if __name__ == "__main__":
asyncio.run(main())
# Number 1
# Letter a
# Number 2
# Letter b
# Number 3
# Letter c
# Number 4
# Letter d
# Number 5
# Letter e
In the code, you are using asyncio to create two asynchronous tasks (print_numbers and print_letters) and running them concurrently. Each task includes an await asyncio.sleep(1) statement, which effectively suspends the execution of the task for 1 second before continuing.
The main coroutine is executed when you run the program.
Inside main, you create two tasks: task1 (for print_numbers) and task2 (for print_letters). These tasks are started concurrently.
task1 starts executing the print_numbers coroutine. It prints "Number 1" and then hits the await asyncio.sleep(1) line. While it sleeps, the event loop continues.
Simultaneously, task2 starts executing the print_letters coroutine. It prints "Letter a" and then awaits for 1 second.
After 1 second, task1 resumes execution, printing "Number 2" and then sleeping again.
task2 also resumes after 1 second, printing "Letter b" and then sleeping.
This pattern continues until both task1 and task2 have completed their respective loops. The await statements within each coroutine introduce pauses, allowing the other task to make progress while the first one sleeps.
As a result, you see an interleaved output where "Number" and "Letter" lines are mixed together because both tasks are running concurrently and asynchronously, each yielding to the event loop during the await asyncio.sleep(1) calls. This allows for a more responsive and non-blocking execution of tasks in an event-driven manner, which is one of the key benefits of asyncio.
Conclusion
In this article we have introduced the principles of programming in Python using three primary methods: multithreading, multiprocessing and asyncio. We have discussed how each approach caters to scenarios, such as improving tasks that involve input/output operations handling intensive tasks or efficiently managing network operations. As you continue your journey into the world of programming in Python these fundamental techniques will serve as tools for creating responsive and high performing applications.
In our articles of the "Mastering Asynchronous Programming, in Python" series we will delve deeper into strategies, best practices and real world applications of these asynchronous Python programming techniques. Whether you want to optimize web scraping processes, develop web services or improve data processing pipelines, our comprehensive series will provide you with the knowledge and skills needed to excel in the field of Python development. Stay tuned for content and practical examples that will help take your Python programming skills to new heights.