Introduction

In this article we will look at threading vs multiprocessing within Python, and when you should use one over the other.

TL;DR

  • What is the GIL (Global Interpreter Lock)? - A lock that prevents more than one thread from executing Python bytecode at a time within a single CPython interpreter/process.
  • Why do we need the GIL? - CPython's memory management is not thread-safe.
  • threading - Use when the workload is IO bound.
  • multiprocessing - Use when the workload is CPU bound.

Processes vs Threads

First of all, let's look at the differences between a thread and a process at a general level.

Multiprocessing | Threading
--- | ---
A new process is started independently from the first process. | A new thread is spawned within the existing process.
Starting a process is slower than starting a thread. | Starting a thread is faster than starting a process.
Memory is not shared between processes. | Memory is shared between all threads.
One GIL for each process. | One GIL for all threads.

The GIL

The GIL (Global Interpreter Lock) ensures that no more than one thread executes Python bytecode at any given moment within a CPython interpreter. The lock is necessary mainly because CPython's memory management is not thread-safe. As a result, a single CPython interpreter cannot run Python code in two threads at once. However, the GIL exists on a per-interpreter basis, so with multiprocessing, multiple Python processes can be created (resulting in multiple GILs, i.e. one per process) in order to perform parallel processing.
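
One way to observe this effect is to time the same CPU-bound work run back to back and then in two threads. The sketch below is only illustrative: count_down, timed, run_sequential and run_threaded are hypothetical helper names, and the exact timings depend on the machine and the Python build (a free-threaded build without the GIL would behave differently). On a standard CPython build the threaded run is usually no faster than the sequential one.

```python
import time
from threading import Thread


def count_down(n):
    # Pure-Python loop: CPU-bound work that needs the GIL to run.
    while n > 0:
        n -= 1


def timed(func):
    # Return how many seconds func() took to run.
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


def run_sequential(n):
    # Do the work twice, one call after the other.
    count_down(n)
    count_down(n)


def run_threaded(n):
    # Do the same two pieces of work in two threads.
    threads = [Thread(target=count_down, args=(n,)) for _ in range(2)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


if __name__ == "__main__":
    n = 2_000_000
    print("sequential: %.2fs" % timed(lambda: run_sequential(n)))
    print("threaded:   %.2fs" % timed(lambda: run_threaded(n)))
```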

Threading

We can perform threading via the threading module. An example is shown below. Each thread runs a function that calculates the square root of 4 million numbers. For the purposes of the example, one thread is created for every core on the system.[1]

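When you run this you will see that only a single Python process is run, even though multiple threads (based on the number of cores you have) are started, because threads live inside the process that created them. Because of the GIL, only one of those threads executes Python code at any moment, so the work is not spread across your cores.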

from threading import Thread
import os
import math

def calc():
    # CPU-bound work: compute 4 million square roots.
    for i in range(0, 4000000):
        math.sqrt(i)

threads = []

# Create one thread per core.
for i in range(os.cpu_count()):
    print('registering thread %d' % i)
    threads.append(Thread(target=calc))

# Start all threads, then wait for each one to finish.
for thread in threads:
    thread.start()

for thread in threads:
    thread.join()
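
To check from within Python that all of the threads belong to one process, each thread can record the process ID it runs under. This is a small sketch rather than part of the example above; report, run_threads and the worker-N thread names are assumptions made for the illustration.

```python
import os
import threading


def report(results):
    # Record which process and which thread ran this call.
    results.append((os.getpid(), threading.current_thread().name))


def run_threads(count):
    # Start `count` named threads and collect what each one reports.
    results = []
    threads = [
        threading.Thread(target=report, args=(results,), name="worker-%d" % i)
        for i in range(count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    # Every line shows the same process ID with a different thread name.
    for pid, name in run_threads(4):
        print(pid, name)
```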

Multiprocessing

To run work in multiple processes, the multiprocessing module is used. As you can see, this module is used in much the same way as the threading module.[2]

When run, you will see that multiple Python processes are created, one per core. Each process has its own GIL, so the calls to calc() execute in parallel.

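from multiprocessing import Process
import os
import math

def calc():
    # CPU-bound work: compute 4 million square roots.
    for i in range(0, 4000000):
        math.sqrt(i)

# The guard stops child processes from re-running this block when they
# import this module under the "spawn" start method (e.g. Windows, macOS).
if __name__ == '__main__':
    processes = []

    # Create one process per core.
    for i in range(os.cpu_count()):
        print('registering process %d' % i)
        processes.append(Process(target=calc))

    # Start all processes, then wait for each one to finish.
    for process in processes:
        process.start()

    for process in processes:
        process.join()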
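
The example above discards its results. Because memory is not shared between processes, getting values back needs an explicit mechanism. One option, sketched here under the assumption that the work can be split into independent ranges, is multiprocessing.Pool.map. The helpers sum_sqrt and split_range are hypothetical names introduced for this illustration.

```python
import math
import os
from multiprocessing import Pool


def sum_sqrt(bounds):
    # Sum the square roots of the integers in [start, stop).
    start, stop = bounds
    return sum(math.sqrt(i) for i in range(start, stop))


def split_range(total, parts):
    # Split range(total) into at most `parts` contiguous (start, stop)
    # chunks. Assumes total > 0 and parts >= 1.
    step = math.ceil(total / parts)
    return [(lo, min(lo + step, total)) for lo in range(0, total, step)]


if __name__ == "__main__":
    chunks = split_range(4000000, os.cpu_count() or 1)
    # Each chunk is handed to a worker process; the return values are
    # pickled and sent back to the parent in order.
    with Pool() as pool:
        partial_sums = pool.map(sum_sqrt, chunks)
    print(sum(partial_sums))
```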

When to Use What?

So when should you use threading over multiprocessing?

To summarize, you would typically use threading when your operations are IO bound. For example, let's say you are making 20 API requests. As we know, the GIL prevents 20 threads from running in parallel; only a single thread can execute Python code at a time. However, as soon as the first thread makes its request, it has to wait on network IO. As this IO is performed outside of the Python interpreter, the thread releases the GIL while it waits, allowing another thread to run. When the first thread's IO returns, it reacquires the GIL before continuing. Threading therefore gives us a benefit here, because the waiting overlaps, without the overhead of creating multiple processes.
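
The 20-request scenario can be sketched without touching a real API. In the sketch below, time.sleep stands in for the network wait (like blocking IO, it releases the GIL), and fake_request and fetch_all are hypothetical names. Run one after another, 20 requests of 0.05 seconds each would take about a second; with one thread per request the waits overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(request_id, delay=0.05):
    # Stand-in for an API call: sleeping releases the GIL, much as
    # waiting on a socket would.
    time.sleep(delay)
    return request_id


def fetch_all(count, delay=0.05):
    # One worker thread per request, so all the waits overlap.
    with ThreadPoolExecutor(max_workers=count) as executor:
        return list(executor.map(lambda i: fake_request(i, delay), range(count)))


if __name__ == "__main__":
    start = time.perf_counter()
    results = fetch_all(20)
    elapsed = time.perf_counter() - start
    print("%d requests in %.2fs" % (len(results), elapsed))
```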

Of course, this does not help with work that is CPU bound, as a thread that is busy computing never waits on IO and so has no reason to give up the GIL; the threads end up taking turns rather than running in parallel. In this case, multiprocessing is beneficial, allowing you to split your workload across multiple CPU cores.

References


  1. "Threading vs Multiprocessing in Python : programming - Reddit." 19 Aug. 2018, https://www.reddit.com/r/programming/comments/98koue/threading_vs_multiprocessing_in_python/. Accessed 25 Apr. 2019.

  2. "Threading vs Multiprocessing in Python : programming - Reddit." 19 Aug. 2018, https://www.reddit.com/r/programming/comments/98koue/threading_vs_multiprocessing_in_python/. Accessed 25 Apr. 2019.