Learning DPDK: Huge pages


Intro

Modern CPUs support several page sizes, e.g. 4 KB, 2 MB and 1 GB on x86-64. In Linux, every page size other than 4 KB is called a “huge page”. The name is historical: originally Linux supported only the 4 KB page size.

Larger pages improve performance because the Translation Lookaside Buffer (TLB), the cache that holds recent virtual-to-physical address translations, is a scarce resource. With bigger pages each TLB entry covers more memory, so fewer translations are needed and fewer TLB misses occur.


The TLB sizes can be checked with the cpuid utility:

cpuid | grep -i tlb
cache and TLB information (2):
0x63: data TLB: 1G pages, 4-way, 4 entries
0x03: data TLB: 4K pages, 4-way, 64 entries
0x76: instruction TLB: 2M/4M pages, fully, 8 entries
0xb6: instruction TLB: 4K, 8-way, 128 entries
0xc3: L2 TLB: 4K/2M pages, 6-way, 1536 entries

The number of allocated huge pages can be checked in /proc/meminfo. AnonHugePages reports memory backed by transparent huge pages, which are described below.

cat /proc/meminfo | grep Huge
AnonHugePages: 4409344 kB
HugePages_Total: 32
HugePages_Free: 32
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB

Linux provides two types of huge pages:

  • Transparent (Anonymous) huge pages
  • Persistent huge pages

Transparent huge pages

Transparent huge pages (THP) are an abstraction layer that automates most aspects of creating, managing and using huge pages. Because THP has had performance and stability issues, DPDK does not rely on this mechanism and uses persistent huge pages instead.

Persistent huge pages

Persistent huge pages have to be configured manually, and the Linux kernel never swaps them out.

Linux provides the following interfaces for allocating persistent huge pages:

  • Shared memory using shmget()
  • HugeTLBFS, a RAM-based filesystem whose files can be accessed with mmap(), read() or memfd_create()
  • Anonymous mmap() with the MAP_ANONYMOUS and MAP_HUGETLB flags
  • libhugetlbfs APIs
  • Automatic backing of memory regions

DPDK uses persistent huge pages by default: hugetlbfs mount points are discovered automatically, and pages are released once the application exits. If a user needs to tune this manually, the following EAL command-line parameters can be used:

  • --huge-dir Use the specified hugetlbfs directory instead of the autodetected ones.
  • --huge-unlink Unlink huge page files after creating them (implies no secondary process support).
  • --in-memory Run entirely in memory, without relying on hugetlbfs mount points or shared files (available in recent DPDK versions).

Persistent huge pages can be set up in two ways:

  • At boot time
  • At runtime

At boot time

Add the huge page parameters to the kernel command line in /etc/default/grub, keeping any parameters already present in GRUB_CMDLINE_LINUX. Huge pages reserved at boot are spread equally across all NUMA nodes.
GRUB_CMDLINE_LINUX="default_hugepagesz=1G hugepagesz=1G hugepages=32"

Regenerate the GRUB configuration and reboot. The path below is for RHEL/CentOS-style systems; Debian and Ubuntu use update-grub instead.

grub2-mkconfig -o /boot/grub2/grub.cfg
reboot

Create a directory for a permanent hugetlbfs mount point:

mkdir /mnt/huge

Add the following line to the /etc/fstab file:
nodev /mnt/huge hugetlbfs defaults 0 0

At runtime

Set the number of huge pages for each NUMA node. The default huge page size cannot be changed at runtime.
echo 16 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
echo 16 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages

Create a mount point.

mkdir /mnt/huge

Mount hugetlbfs:
mount -t hugetlbfs nodev /mnt/huge

Memory allocation

Although there are many ways to allocate persistent huge pages, DPDK uses the following:

  • mmap() of a file created in a hugetlbfs mount point
  • mmap() call with MAP_HUGETLB flag
  • memfd_create() call with MFD_HUGETLB flag
