
FFmpeg 101: Understanding the Demux/Decode Pipeline Through Code

A practical walkthrough of FFmpeg's core architecture, showing how ~50 lines of C code can demux and decode video streams once you understand five key structures and their relationships.

Β· software engineering
Summary

β€’ FFmpeg's architecture separates concerns cleanly: libavformat handles containers/muxing, libavcodec handles encoding/decoding, connected through AVFormatContext β†’ AVStream β†’ AVCodec β†’ AVPacket β†’ AVFrame
β€’ The demux/decode pipeline is surprisingly simple once you understand stream index mapping - track which stream index maps to which decoder context to route packets correctly
β€’ Packet/frame relationship is asynchronous using EAGAIN pattern - you may need to send multiple packets before receiving a frame, or receive multiple frames from one packet
β€’ Complete working example with meson build system automatically downloads FFmpeg dependencies and demonstrates the full pipeline from file to decoded frames

This tutorial demystifies FFmpeg by focusing on its core architecture rather than drowning in documentation. The key insight is that FFmpeg's complexity becomes manageable once you understand how five structures relate: AVFormatContext (manages sync/metadata for the entire file), AVStream (represents individual audio/video streams), AVCodec (defines encoding/decoding logic), AVPacket (holds encoded data), and AVFrame (holds decoded raw data). These structures form a clear pipeline where the format context demuxes the file into streams, codecs are matched to each stream's parameters, and packets flow through decoders to produce frames.

The implementation pattern is straightforward: open the file with avformat_open_input, iterate through the streams to find the one you want (tracking its index), find the appropriate decoder with avcodec_find_decoder, allocate a codec context, then loop over av_read_frame to extract packets. The critical detail is matching each packet's stream index to the correct decoder: packets for stream 0 go to, say, the video decoder, and packets for stream 1 to the audio decoder. The decode step uses an asynchronous pattern in which avcodec_send_packet pushes encoded data and avcodec_receive_frame pulls decoded frames, with EAGAIN indicating that more packets must be sent before a frame is ready.

The tutorial includes a complete working example with a Meson build configuration that auto-downloads FFmpeg if needed. Running it against a sample MP4 shows the actual packet flow: video packets arriving with presentation timestamps (pts), being sent to the decoder, and producing frames marked as I-frames (keyframes) or P-frames (predicted frames). This concrete output makes the abstract structures tangible and provides a foundation for more complex FFmpeg operations such as filtering, encoding, or streaming.