Designing for Low-Latency Trading Systems
The Microsecond Battlefield
In high-frequency trading, a single microsecond can mean millions in lost opportunity. While most systems measure success in seconds or milliseconds, trading systems operate in microseconds—where even a single main-memory access, at roughly 100 nanoseconds, feels like an eternity. This isn’t about making things “fast”; it’s about understanding that at this scale, every instruction, every cache miss, and every system call becomes visible in your P&L.
The Hidden Killers of Latency
False Sharing: The Silent Performance Assassin
Here’s what nobody tells you: two threads writing to different variables can destroy each other’s performance if those variables share a cache line. Modern x86 CPUs have 64-byte cache lines, and when Thread A modifies byte 0 while Thread B modifies byte 32, the cache-coherency protocol invalidates the other core’s copy of the entire line—the line ping-pongs between cores even though the threads never touch the same data. Jane Street reportedly found this cost them 40% throughput on their order router—two “independent” counters were inadvertently sharing a cache line.
The fix? Pad your hot data structures to cache line boundaries. Not just alignment—actual padding with dummy bytes.


