โ† Bookmarks ๐Ÿ“„ Article

FFmpeg Architecture 101: Building a Media Player from Scratch

A practical walkthrough of FFmpeg's core data structures and demux/decode pipeline, with working code that shows how to extract and decode video streams programmatically.

· software engineering

• FFmpeg separates tools (ffmpeg, ffplay, ffprobe) from libraries (libavformat for I/O, libavcodec for encoding/decoding); understanding this split is key to programmatic usage
• The demux/decode pipeline uses a specific structure hierarchy: AVFormatContext (container) → AVStream (individual streams) → AVCodec (decoder) → AVPacket (encoded data) → AVFrame (raw decoded data)
• Decoding follows a send/receive pattern: avcodec_send_packet() pushes encoded packets to the decoder, then avcodec_receive_frame() pulls out decoded frames (one packet may require multiple receives)
• Stream indexing is critical: you must track which stream index corresponds to your target video/audio stream when demuxing packets from the container
• Includes complete working code with a meson build setup that automatically downloads FFmpeg dependencies

This tutorial breaks down FFmpeg's architecture by building a basic media player that demuxes and decodes video streams. The key insight is FFmpeg's component separation: the command-line tools (ffmpeg, ffplay, ffprobe) are front-ends over the libraries (libavformat for I/O and muxing, libavcodec for encoding/decoding, libavfilter for processing). When building applications, you work directly with these libraries through specific data structures.

The demux/decode pipeline follows a clear hierarchy. AVFormatContext manages the overall container and provides sync and metadata. AVStream represents an individual audio or video stream within that container. AVCodec defines the encoding/decoding logic. AVPacket holds encoded data read from a stream. AVFrame contains the final decoded raw video frames or audio samples.

The actual code flow: open the input file with avformat_open_input(), iterate through the streams to find your target video stream, locate the appropriate decoder with avcodec_find_decoder(), allocate an AVCodecContext, copy the stream's codec parameters into it, and open it. Then loop through packets with av_read_frame(). For each packet matching your stream index, call avcodec_send_packet() followed by avcodec_receive_frame() in a loop, since one packet may produce multiple frames.

The tutorial provides complete working code with a meson build system that auto-downloads FFmpeg if needed. Running it against a sample MP4 shows the actual output: stream metadata (time base, frame rate, codec), packet arrival with PTS timestamps, and decoded frame information (frame type I/P/B, pixel format, keyframe status). This concrete example demystifies FFmpeg's API surface and gives you a foundation for building real media processing applications.
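A meson setup with that auto-download behavior typically relies on dependency fallbacks to a wrap subproject. The project name, source file, and wrap variable names below are illustrative assumptions, not the tutorial's actual build files:

```meson
project('ffplayer101', 'c')

# dependency() uses the system libraries when present; otherwise meson
# falls back to an ffmpeg wrap subproject and builds it from source,
# which is how the "auto-download" works.
avformat = dependency('libavformat', fallback: ['ffmpeg', 'libavformat_dep'])
avcodec  = dependency('libavcodec',  fallback: ['ffmpeg', 'libavcodec_dep'])
avutil   = dependency('libavutil',   fallback: ['ffmpeg', 'libavutil_dep'])

executable('player', 'player.c',
           dependencies: [avformat, avcodec, avutil])
```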