How does Gunicorn work?

I’ve been deploying Python web applications for a while now, and Gunicorn has always been part of the stack. But I realized I didn’t really understand what it does under the hood. This post collects my notes from answering some questions I had about Gunicorn.

# A Bit of History

Way back in the 90s, there was no standard way for a traditional web server to run Python applications. Gregory Trubetskoy wrote NSAPy, which embedded the Python interpreter in Netscape Enterprise Server, and later mod_python, which did the same for the Apache web server.

As mod_python’s development stalled and security vulnerabilities were discovered, the community recognized the need for a consistent way to run Python code for web applications. The result was WSGI (Web Server Gateway Interface), a standard interface between web servers and Python applications, specified in PEP 333 and updated for Python 3 in PEP 3333.

This way, Python frameworks only need to implement the application side of WSGI and don’t have to worry about processing network requests at scale. Accepting requests and passing them to the application is the responsibility of a WSGI server (a web server that implements the server side of the interface).

Gunicorn is a WSGI server. It’s responsible for translating and communicating web requests to Python web application frameworks.
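
To make the interface concrete, here is a minimal sketch of a WSGI application, plus a small helper that plays the server’s role by calling it directly. The names `app` and `call_app` are mine, chosen for illustration; the helper uses the standard library’s `wsgiref` to build a fake request environment and is in no way how Gunicorn itself is implemented.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A minimal WSGI application: a callable taking (environ, start_response)."""
    body = f"Hello from {environ['PATH_INFO']}\n".encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]  # the response body is an iterable of byte strings


def call_app(path="/"):
    """Play the server's role: build an environ, call the app, collect the response."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the keys WSGI requires
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

If this were saved as a hypothetical `hello.py`, running `gunicorn hello:app` would serve it: the argument is the module name and the callable, separated by a colon.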

Now that we understand the need for Gunicorn and what purpose it serves, let’s take a peek at how it actually works.

# The Pre-fork Worker Model

Gunicorn uses a pre-fork worker model. This means that there is a central master process that manages a set of worker processes. At application startup, the master Gunicorn process spawns worker processes.

The master never knows anything about individual clients. All requests and responses are handled completely by worker processes.
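
The pre-fork idea can be sketched in a few lines with `os.fork()`. This is a heavily simplified illustration under my own assumptions, not Gunicorn’s actual code: the function names are made up, there is no HTTP parsing, and there is none of the signal handling or worker supervision a real master does. It only shows the shape: the master opens the listening socket once, forks workers, and the workers do all the accepting and responding. It is Unix-only, since it relies on `fork()`.

```python
import os
import signal
import socket


def worker_loop(listener):
    """Worker: accept connections on the inherited socket and answer them."""
    while True:
        conn, _ = listener.accept()
        with conn:
            conn.recv(1024)  # read the (tiny) request
            conn.sendall(f"handled by worker {os.getpid()}\n".encode())


def start_master(num_workers=2):
    """Master: open the listening socket once, then fork the workers."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(16)
    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:  # child: becomes a worker and never returns to the caller
            try:
                worker_loop(listener)
            finally:
                os._exit(0)
        pids.append(pid)  # parent: remember the worker's PID
    return listener, pids


def stop_workers(pids):
    """Master: terminate the workers and reap them."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
    for pid in pids:
        os.waitpid(pid, 0)
```

Note that the master in this sketch never calls `accept()`. Any client that connects to the port is picked up by whichever worker gets to it first, which mirrors the point that the master knows nothing about individual clients.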

# How Master & Worker Processes Handle Application Memory

Normally, each worker process imports the WSGI application on its own, so every worker holds a separate copy in memory. This means your memory consumption grows roughly in proportion to the number of workers you run.

You can, however, pre-load your application. By using the --preload flag, you’re making this import happen in the master process before workers are spawned. Then, the workers share this memory (more on this in a bit, as it’s a bit more nuanced than that).
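
The same option can be set in a config file instead of on the command line. Gunicorn config files are plain Python, so a minimal sketch might look like the following. The bind address and worker count are arbitrary example values, not recommendations.

```python
# gunicorn.conf.py
bind = "127.0.0.1:8000"  # example address, adjust to your setup
workers = 4              # number of worker processes (example value)
preload_app = True       # import the app in the master before forking, same as --preload
```

You would then start the server with something like `gunicorn -c gunicorn.conf.py myapp:app`, where `myapp:app` is a placeholder for your own module and WSGI callable.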

# How the Master Process Spawns Children

The master process uses the fork() system call to create new child processes. Here are some important points on how fork() works:

- The new process (child process) is an exact copy of the calling parent process at the time of the fork.
- This copy includes the process’s code, the current state of execution, and its memory space (variables, data, and program instructions). The child process inherits a copy of the parent’s memory space.
- This copying happens in a copy-on-write fashion.

What does copy-on-write mean? Initially, the memory is not physically copied. Instead, both the parent and child processes share the same physical memory pages. But the memory is marked in a way that if either process tries to modify any shared page, the OS makes a copy of that page, ensuring that modifications in the child do not affect the parent, and vice versa.

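This way, fork() lets processes share memory efficiently by relying on the operating system’s memory management. The sharing pays off most for data that is read-only or mostly read after the fork, such as a loaded ML model used to serve predictions, where the memory savings can be substantial. One caveat in CPython: even reading an object updates its reference count, which is a write to that object’s page, so some preloaded pages still end up copied over time.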
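
Here is a small sketch of what fork() looks like from Python. It demonstrates the visible effect of copy-on-write, which is isolation: the child starts with the parent’s data, and a change made in the child never shows up in the parent. The page sharing itself happens inside the kernel and can’t be observed from this code. The function name and the data are made up for illustration, and the example is Unix-only.

```python
import os


def fork_and_modify():
    """Fork, let the child modify inherited data, and report what each side sees."""
    data = ["loaded", "before", "fork"]
    read_fd, write_fd = os.pipe()

    pid = os.fork()
    if pid == 0:  # child process
        os.close(read_fd)
        data.append("changed in child")  # this write only affects the child's copy
        os.write(write_fd, ",".join(data).encode())
        os._exit(0)

    # parent process
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        child_view = pipe.read().split(",")  # what the child saw, sent over the pipe
    os.waitpid(pid, 0)
    return data, child_view
```

Calling `fork_and_modify()` returns the parent’s list unchanged, alongside the child’s list with the extra element appended.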

# Does Gunicorn Use Green Threads?

By default, no.

However, you can have green-thread workers (known as greenlets in Python) if you choose one of the async worker types, such as `gevent`, with the `--worker-class` (`-k`) option.

With async workers, each incoming request is handled by a greenlet spawned inside the worker process. The core idea is that when your code reaches an IO-bound operation, the greenlet yields and hands control to another greenlet. So when many requests arrive, a single worker can handle them concurrently: IO-bound calls don’t block the worker, and throughput improves.
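
The yielding behaviour can be illustrated without any third-party library. The sketch below is an analogy built from plain Python generators, not gevent: the handler, the scheduler, and the "IO wait" are all hypothetical stand-ins. Real greenlets switch implicitly when an IO call would block, whereas here each task has to `yield` explicitly.

```python
from collections import deque


def handle_request(name, io_waits, log):
    """A hypothetical request handler that 'waits on IO' a given number of times."""
    for step in range(io_waits):
        log.append(f"{name}: waiting on IO ({step + 1})")
        yield  # hand control back to the scheduler, like a greenlet blocking on IO
    log.append(f"{name}: done")


def run_all(tasks):
    """Round-robin scheduler: resume each task until its next yield."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
            queue.append(task)  # not finished yet: back of the queue
        except StopIteration:
            pass  # finished: drop it


log = []
run_all([handle_request("A", 2, log), handle_request("B", 1, log)])
```

After this runs, `log` shows the two requests interleaved: B makes progress and finishes while A is still waiting, all inside one process and one thread.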

# Why Do You Need Nginx with Gunicorn?

You might be thinking, “If Gunicorn handles requests, why do I need Nginx in front of it?”

A few reasons:

- Static files: Not all requests need to be dynamically generated. There’s no need to execute any Python code for static files like CSS, images, fonts, etc. With Nginx, you can just serve them directly from the file system. If a request needs to be dynamically generated, Nginx forwards it to Gunicorn.
- Slow clients: Gunicorn developers made a specific decision to assume that Gunicorn will never read connections directly from the internet, so that they don’t have to worry about clients that are slow. From a ServerFault answer:

“If we can assume that Gunicorn will never read connections directly from the internet, then we don’t have to worry about clients that are slow.”

Simply put, Gunicorn itself is not designed to be front-facing. Without a proper reverse proxy in front of it, it’s easy to overwhelm with a DoS attack or even just a handful of slow clients.

# What About Uvicorn?

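A quick note: Gunicorn is first and foremost a WSGI server, so its standard worker types can’t serve FastAPI directly, because FastAPI uses the newer ASGI (Asynchronous Server Gateway Interface) standard. For FastAPI and other ASGI frameworks, you’d use an ASGI server such as Uvicorn instead, either on its own or as a Gunicorn worker class (for example `-k uvicorn.workers.UvicornWorker`), so that Gunicorn still manages the processes.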
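
For comparison with WSGI, here is a minimal sketch of an ASGI application, plus a helper that plays the server’s role by calling it with a hand-built scope. The names `asgi_app` and `call_asgi_app` are mine, and the fake `receive`/`send` callables only stand in for what a real server like Uvicorn would provide.

```python
import asyncio


async def asgi_app(scope, receive, send):
    """A minimal ASGI application: an async callable taking (scope, receive, send)."""
    assert scope["type"] == "http"
    body = f"Hello from {scope['path']}\n".encode("utf-8")
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain; charset=utf-8")],
    })
    await send({"type": "http.response.body", "body": body})


def call_asgi_app(path="/"):
    """Play the server's role: build a scope, call the app, collect sent messages."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    asyncio.run(asgi_app(scope, receive, send))
    return sent
```

The key difference from WSGI is that the application is a coroutine and the response is sent as a series of messages rather than returned, which is what allows a single worker to juggle many connections.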

That’s it for this post. I hope it helped clarify what Gunicorn does and why it’s an important piece of the Python web stack. Thanks for reading!