It is convenient to store thread-specific data, for instance, statistics, inside an array of structures. The size of the array is equal to the number of threads.
The only thing that you need to be careful about is to avoid so-called false sharing. It is a performance penalty that you pay when RW-data shares the same cache line and is accessed from multiple threads.
Align a structure accessed by each thread to a cache line size (64 bytes) using macro __rte_cache_aligned that is actually a shortcut for __attribute__(__aligned__((64))).
typedef struct counter_s
Define an array of the structures with one element per thread.
Note that in case if structure size is smaller than cache line size, the padding is required. Otherwise, gcc compiler will complain with the following error.
error: alignment of array elements is greater than element size