
Premature Optimization is Fun Sometimes

A delightful deep dive into shrinking a ping-monitoring data structure from 12KB to 4KB, with the underlying struct going from 24 bytes to 8, through progressively clever bit-packing tricks. Completely unnecessary, but intellectually satisfying.

· software engineering

• Walks through optimizing a 24-byte struct down to 8 bytes using tagged unions, bitfields, and creative use of ICMP identifier fields
• Demonstrates how struct padding can sabotage optimizations and how to work around it through careful field ordering
• Shows instruction-level optimization: reordering fields to eliminate shifts, and flipping a bit's meaning so the compiler can elide masks
• The punchline: "completely pointless exercise" since the app isn't memory-constrained, but optimization for its own sake is fun
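
The padding effect mentioned in the bullets can be seen in a small C sketch. The field names and types below are hypothetical, chosen only to expose padding; they are not the article's actual struct. The sizes in the comments assume a typical 64-bit ABI where uint64_t is 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fields in an unlucky order: the compiler must insert
 * padding after `received` so that `sent_ns` is properly aligned, and
 * tail padding after `seq_no` so that arrays of the struct stay aligned. */
struct padded {
    uint8_t  received;  /* 1 byte, then padding (typically 7 bytes)  */
    uint64_t sent_ns;   /* 8 bytes                                    */
    uint16_t seq_no;    /* 2 bytes, then tail padding (typically 6)   */
};                      /* typically 24 bytes in total                */

/* Same fields, largest first: the small fields share one aligned slot. */
struct reordered {
    uint64_t sent_ns;   /* 8 bytes                                    */
    uint16_t seq_no;    /* 2 bytes                                    */
    uint8_t  received;  /* 1 byte, then tail padding (typically 5)    */
};                      /* typically 16 bytes in total                */
```

Both structs hold the same 11 bytes of data; only the declaration order differs. Printing sizeof and offsetof for each field is a quick way to see where the padding lands on a given platform.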

The author walks through optimizing a connectivity monitoring system's data structure, starting with a naive 24-byte struct storing ping timestamps. The first optimization is a tagged union: the sent timestamp is only needed until the response arrives, and after that only the elapsed time matters, so the two values can share the same storage. This saves 8 bytes. Next, they reduce timestamp precision from nanoseconds to 100-microsecond increments, so that 20 years of timestamps fit in 43 bits instead of 64.
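
As a rough illustration of these first two steps, here is a minimal C sketch. The struct names, field names, field types, and the helpers ns_to_ticks and ping_tagged_mark_received are all assumptions made for this example; the summary does not give the article's actual definitions. Sizes in the comments assume a typical 64-bit ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical naive record: two full timestamps plus bookkeeping. */
struct ping_naive {
    uint64_t sent_ns;         /* when the ping was sent            */
    uint64_t received_ns;     /* when the reply arrived            */
    uint32_t source_address;  /* assumed to be an IPv4 address     */
    uint16_t seq_no;
};                            /* 22 bytes of data, padded to 24    */

/* Tagged union: `received` says which member of `t` is live. */
struct ping_tagged {
    union {
        uint64_t sent_ns;     /* valid while received == false     */
        uint64_t elapsed_ns;  /* valid once received == true       */
    } t;
    uint32_t source_address;
    uint16_t seq_no;
    bool     received;        /* the tag                           */
};                            /* 15 bytes of data, padded to 16    */

/* Overwrite the sent timestamp with the elapsed time on reply. */
static inline void ping_tagged_mark_received(struct ping_tagged *p,
                                             uint64_t now_ns)
{
    p->t.elapsed_ns = now_ns - p->t.sent_ns;
    p->received = true;
}

/* Coarser clock: one tick is 100 microseconds, counted from some
 * application-chosen epoch (the epoch is an assumption here). */
#define TICK_NS   100000ULL
#define TICK_BITS 43
#define TICK_MAX  ((1ULL << TICK_BITS) - 1)

static inline uint64_t ns_to_ticks(uint64_t ns_since_epoch)
{
    return ns_since_epoch / TICK_NS;
}
```

The arithmetic behind the 43-bit figure: 20 years is roughly 6.3 × 10^8 seconds, or about 6.3 × 10^12 ticks of 100 microseconds, which lies between 2^42 (about 4.4 × 10^12) and 2^43 (about 8.8 × 10^12).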

The interesting part comes when bitfields alone fail to save space because of struct padding. The fix is to eliminate the source_address field entirely by repurposing 4 bits of the ICMP identifier field as a rolling counter that tracks address changes, which gets the struct down to 8 bytes. The optimization doesn't stop there: field ordering also matters for instruction efficiency. Aligning seq_no on a 16-bit boundary means it can be loaded with a single ldrh (ARM load-halfword) instruction, with no shift needed. The addenda refine this further: first moving the received bit so that reading it needs only a shift, then flipping its meaning to "not_received" so the compiler can elide mask operations when the bit is checked in conditionals.
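
One way to picture an 8-byte record of this kind is the sketch below. It is not the article's layout: the bit positions, the ping_record type, and every function name are assumptions, and it also assumes that the record keeps a copy of the 4-bit rolling counter (called addr_gen here). It uses plain shifts and masks on a uint64_t instead of C bitfields, because bitfield layout is implementation-defined and explicit shifts make the placement visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, least significant bit first:
 *   bits  0..15  seq_no        (16-bit aligned: a halfword load suffices)
 *   bits 16..19  addr_gen      (4-bit rolling address-change counter)
 *   bits 20..62  time          (43 bits: sent time, later elapsed time)
 *   bit  63      not_received  (top bit: one shift, no mask)           */
typedef struct { uint64_t bits; } ping_record;

#define GEN_SHIFT   16
#define TIME_SHIFT  20
#define FLAG_SHIFT  63
#define GEN_MASK    0xFULL
#define TIME_MASK   ((1ULL << 43) - 1)

/* Build a record for a ping that has just been sent. */
static inline ping_record ping_sent(uint16_t seq_no, unsigned addr_gen,
                                    uint64_t sent_ticks)
{
    ping_record r;
    r.bits = (uint64_t)seq_no
           | ((addr_gen & GEN_MASK) << GEN_SHIFT)
           | ((sent_ticks & TIME_MASK) << TIME_SHIFT)
           | (1ULL << FLAG_SHIFT);          /* not yet received */
    return r;
}

static inline uint16_t ping_seq_no(ping_record r)
{
    return (uint16_t)r.bits;                /* truncation only, no shift */
}

static inline unsigned ping_addr_gen(ping_record r)
{
    return (unsigned)((r.bits >> GEN_SHIFT) & GEN_MASK);
}

static inline uint64_t ping_time(ping_record r)
{
    return (r.bits >> TIME_SHIFT) & TIME_MASK;
}

static inline bool ping_not_received(ping_record r)
{
    return (r.bits >> FLAG_SHIFT) != 0;     /* top bit: shift, no mask */
}

/* Replace the sent time with the elapsed time and clear the flag. */
static inline ping_record ping_mark_received(ping_record r, uint64_t now_ticks)
{
    uint64_t elapsed = (now_ticks - ping_time(r)) & TIME_MASK;
    r.bits &= ~((TIME_MASK << TIME_SHIFT) | (1ULL << FLAG_SHIFT));
    r.bits |= elapsed << TIME_SHIFT;
    return r;
}
```

The four fields add up to exactly 64 bits (16 + 4 + 43 + 1), so nothing is left over for padding. Whether a compiler actually emits a single halfword load or drops a mask for these accessors depends on the target and optimization level, so inspecting the generated assembly is the only way to confirm it.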

The author admits upfront this is completely unnecessary—the application isn't memory-constrained. But the exercise demonstrates deep knowledge of struct layout, compiler behavior, and bit-level optimization techniques. It's optimization as intellectual craft, celebrating the joy of making things smaller and faster even when it doesn't matter.