FFmpeg Architecture 101: Building a Media Player from Scratch
A practical walkthrough of FFmpeg's core data structures and demux/decode pipeline, with working code that shows how to extract and decode video streams programmatically.
TLDR
• FFmpeg separates tools (ffmpeg, ffplay, ffprobe) from libraries (libavformat for I/O, libavcodec for encoding/decoding) - understanding this split is key to programmatic usage
• The demux/decode pipeline uses a specific structure hierarchy: AVFormatContext (container) → AVStream (individual streams) → AVCodec (decoder) → AVPacket (encoded data) → AVFrame (raw decoded data)
• Decoding follows a send/receive pattern: avcodec_send_packet() pushes encoded packets to the decoder, then avcodec_receive_frame() pulls out decoded frames (may require multiple receives per packet)
• Stream indexing is critical - you must track which stream index corresponds to your target video/audio stream when demuxing packets from the container
• Includes complete working code with meson build setup that automatically downloads FFmpeg dependencies
In Detail
This tutorial breaks down FFmpeg's architecture by building a basic media player that demuxes and decodes video streams. The key insight is understanding FFmpeg's component separation: command-line tools (ffmpeg, ffplay, ffprobe) versus libraries (libavformat for container I/O and muxing/demuxing, libavcodec for encoding/decoding, libavfilter for processing). When building applications, you work directly with these libraries through specific data structures.
The demux/decode pipeline follows a clear hierarchy. AVFormatContext manages the overall container and provides sync/metadata. AVStream represents individual audio or video streams within that container. AVCodec defines the encoding/decoding logic. AVPacket holds encoded data extracted from streams. AVFrame contains the final decoded raw video frames or audio samples. The actual code flow: open the input file with avformat_open_input(), iterate through the streams to find your target video stream, locate the appropriate decoder with avcodec_find_decoder(), create an AVCodecContext, copy the stream's codec parameters into it and open it with avcodec_open2(), then loop through packets using av_read_frame(). For each packet matching your stream index, call avcodec_send_packet() followed by avcodec_receive_frame() in a loop, since one packet may produce zero or more frames.
The tutorial provides complete working code with a meson build system that auto-downloads FFmpeg if needed. Running it against a sample MP4 shows the actual output: stream metadata (time base, framerate, codec), packet arrival with PTS timestamps, and decoded frame information (type I/P/B, format, keyframe status). This concrete example demystifies FFmpeg's API surface and gives you a foundation for building real media processing applications.
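For reference, a fallback dependency in meson is roughly what "auto-downloads FFmpeg if needed" means. This is a hedged sketch, not the tutorial's build file; the wrap and dependency variable names below are assumptions and depend on the FFmpeg wrap being used.

```meson
# meson.build (sketch) - wrap/variable names are assumptions
project('mini_player', 'c', default_options: ['c_std=c11'])

# Prefer system libraries; fall back to building FFmpeg as a subproject
# (e.g. from a WrapDB wrap) when they are not installed.
avformat = dependency('libavformat', fallback: ['ffmpeg', 'libavformat_dep'])
avcodec  = dependency('libavcodec',  fallback: ['ffmpeg', 'libavcodec_dep'])
avutil   = dependency('libavutil',   fallback: ['ffmpeg', 'libavutil_dep'])

executable('mini_player', 'main.c',
  dependencies: [avformat, avcodec, avutil])
```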