Learning DPDK: Branch Prediction



It is well known that modern CPUs use instruction pipelines, which let them work on multiple instructions in parallel. Conditional branches complicate this, because the CPU does not know which instructions come next until the condition has been evaluated. To keep the pipeline full, CPUs use branch prediction and speculative execution: they guess which branch will be taken and start executing it ahead of time. The problem is that when the guess is wrong, the results of the speculative execution have to be discarded, and the correct instructions have to be fetched and executed from scratch, which costs extra cycles.


An application developer should use the likely and unlikely macros, which are shortcuts for the GCC built-in function __builtin_expect. These macros give the compiler a hint about which path will be taken more often, so that it can arrange the generated code to favor that path and reduce the percentage of branch mispredictions.
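
As a minimal sketch of how these macros are typically used, the example below defines likely and unlikely locally on top of __builtin_expect and marks a rare condition in a loop. DPDK provides equivalent macros in rte_branch_prediction.h; the helper sum_valid_lengths and its "zero-length entries are rare" assumption are hypothetical and serve only as an illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local definitions for illustration only. The !! turns any non-zero
 * value into 1 so it can be compared against the expected value. */
#ifndef likely
#define likely(x)   __builtin_expect(!!(x), 1)
#endif
#ifndef unlikely
#define unlikely(x) __builtin_expect(!!(x), 0)
#endif

/* Hypothetical helper: sums the lengths of the valid packets in a burst.
 * Zero-length entries are assumed to be rare, so that branch is marked
 * unlikely and the common case stays on the straight-line path. */
static uint64_t sum_valid_lengths(const uint16_t *lens, size_t n)
{
    uint64_t total = 0;

    assert(lens != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (unlikely(lens[i] == 0))
            continue; /* rare case: skip the empty slot */
        total += lens[i];
    }
    return total;
}
```

The hint only pays off when the assumption about the data holds. If the "unlikely" branch is in fact taken often, the hint can make things worse, so it is worth measuring before and after.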



Learning DPDK: Make Your Data Cache-Friendly with the pahole Tool



Given the orders-of-magnitude difference in access time between the different cache levels and RAM itself, it is advisable to carefully analyze frequently used C data structures for cache friendliness. The idea is to keep the most often accessed (“hot”) data in a higher-level cache for as long as possible. The following techniques are used.

  1. Group “hot” members together at the beginning and push “cold” members to the end;
  2. Minimize structure size by avoiding padding;
  3. Align data to cache line size.
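
The three techniques above can be sketched with a small example. The structure names, member names and the 64-byte cache line size are assumptions made for illustration, and the aligned attribute is GCC/Clang syntax. The compile-time checks only state what holds on common ABIs, not exact sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64 /* assumption: typical x86-64 cache line size */

/* Naive ordering: small members interleaved with larger ones, so the
 * compiler has to insert padding to satisfy alignment requirements. */
struct flow_naive {
    uint8_t  state;      /* hot */
    uint64_t rx_packets; /* hot */
    uint8_t  flags;      /* hot */
    uint32_t created_at; /* cold */
    char     name[32];   /* cold */
};

/* Reordered: hot members grouped at the beginning, largest first,
 * cold members pushed to the end. Less padding is needed. */
struct flow_reordered {
    uint64_t rx_packets; /* hot */
    uint8_t  state;      /* hot */
    uint8_t  flags;      /* hot */
    uint32_t created_at; /* cold */
    char     name[32];   /* cold */
};

/* Same layout, additionally aligned to a cache line boundary so that
 * the hot members at the start never straddle two cache lines. */
struct flow_aligned {
    uint64_t rx_packets;
    uint8_t  state;
    uint8_t  flags;
    uint32_t created_at;
    char     name[32];
} __attribute__((aligned(CACHE_LINE_SIZE)));

static_assert(sizeof(struct flow_reordered) < sizeof(struct flow_naive),
              "reordering should remove padding");
static_assert(sizeof(struct flow_aligned) % CACHE_LINE_SIZE == 0,
              "aligned structure should occupy whole cache lines");
```

Note that alignment rounds the structure size up to a multiple of the cache line, so it trades some memory for predictable placement. Whether that trade is worthwhile depends on how many instances exist and how often they are accessed.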

You can find a great description of why and how the data structures are laid out by compilers here.

Poke-a-hole (pahole) analyzes an object file and outputs a detailed description of the layout of every structure as created by the compiler. It works from the debug information, so the file has to be compiled with -g.


Analyze the whole file:
pahole a.out
Analyze one structure:
pahole -C structure a.out
Get suggestions for improvements:
pahole --show_reorg_steps --reorganize -C structure a.out