Modern CPUs support different page sizes, e.g. 4K, 2M and 1G. In Linux, all page sizes except 4K are called “huge pages”. This naming convention is historical: originally Linux supported only the 4K page size.
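Larger pages are beneficial for performance because the Translation Lookaside Buffer (TLB), which caches virtual-to-physical address translations, is a scarce resource: each TLB entry maps one page, so with larger pages the same number of entries covers more memory and fewer TLB misses (costly page table walks) occur.
To check the TLB sizes, the cpuid utility can be used: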
cpuid | grep -i tlb
cache and TLB information (2):
0x63: data TLB: 1G pages, 4-way, 4 entries
0x03: data TLB: 4K pages, 4-way, 64 entries
0x76: instruction TLB: 2M/4M pages, fully, 8 entries
0xb6: instruction TLB: 4K, 8-way, 128 entries
0xc3: L2 TLB: 4K/2M pages, 6-way, 1536 entries
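A rough way to quantify this is TLB reach: the amount of memory a TLB can map without a miss, i.e. the number of entries multiplied by the page size. The minimal C sketch below computes it for the entries in the cpuid output above. The helper names tlb_reach and print_tlb_reach are hypothetical, and the figures assume every entry holds a page of the listed size.

```c
#include <stdint.h>
#include <stdio.h>

#define KiB (UINT64_C(1) << 10)
#define MiB (UINT64_C(1) << 20)
#define GiB (UINT64_C(1) << 30)

/* Memory a TLB can map without a miss: number of entries times page size. */
uint64_t tlb_reach(uint64_t entries, uint64_t page_size)
{
    return entries * page_size;
}

/* Reach of the TLBs from the cpuid output, assuming every entry
 * holds a page of the listed size. */
void print_tlb_reach(void)
{
    printf("data TLB, 64 x 4K:   %llu KiB\n",
           (unsigned long long)(tlb_reach(64, 4 * KiB) / KiB));
    printf("data TLB, 4 x 1G:    %llu MiB\n",
           (unsigned long long)(tlb_reach(4, GiB) / MiB));
    printf("L2 TLB, 1536 x 4K:   %llu KiB\n",
           (unsigned long long)(tlb_reach(1536, 4 * KiB) / KiB));
    printf("L2 TLB, 1536 x 2M:   %llu MiB\n",
           (unsigned long long)(tlb_reach(1536, 2 * MiB) / MiB));
}
```

Calling print_tlb_reach() prints 256 KiB, 4096 MiB, 6144 KiB and 3072 MiB: the four 1G data TLB entries cover 16384 times as much memory as the 64 entries for 4K pages.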
To check huge page usage and the default huge page size, the following command can be used:
cat /proc/meminfo | grep Huge
AnonHugePages: 4409344 kB
Hugepagesize: 1048576 kB
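A program can read the same values directly. The sketch below assumes the "Key: value" line format shown above and looks up a single field in /proc/meminfo; meminfo_value and read_meminfo are hypothetical helper names, not a library API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Find a "Key:   value" line in text formatted like /proc/meminfo and
 * store the numeric value in *value. Returns 0 on success, -1 if the
 * key is absent or has no numeric value.
 */
int meminfo_value(const char *text, const char *key, unsigned long *value)
{
    size_t key_len = strlen(key);
    const char *line = text;

    while (line && *line) {
        /* Require the colon so "Huge" does not match "Hugepagesize". */
        if (strncmp(line, key, key_len) == 0 && line[key_len] == ':')
            return sscanf(line + key_len + 1, "%lu", value) == 1 ? 0 : -1;
        line = strchr(line, '\n');
        if (line)
            line++;  /* skip to the start of the next line */
    }
    return -1;
}

/* Read the live /proc/meminfo into buf as a NUL-terminated string. */
int read_meminfo(char *buf, size_t len)
{
    FILE *f;
    size_t n;

    if (len == 0)
        return -1;
    f = fopen("/proc/meminfo", "r");
    if (!f)
        return -1;
    n = fread(buf, 1, len - 1, f);
    buf[n] = '\0';
    fclose(f);
    return 0;
}
```

For example, after read_meminfo(buf, sizeof buf) with an 8 KiB buffer, meminfo_value(buf, "Hugepagesize", &kb) would store 1048576 on the system above.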
There are two types of huge pages available in Linux:
- Transparent (Anonymous) huge pages
- Persistent huge pages
Transparent huge pages
Transparent huge pages (THP) is an abstraction layer that automates most aspects of creating, managing and using huge pages. Because THP has had performance and stability issues, DPDK does not rely on this mechanism and uses persistent huge pages instead.
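To check whether THP is enabled on a given system, one can read /sys/kernel/mm/transparent_hugepage/enabled: it lists the possible modes (always, madvise, never) and puts the active one in brackets, e.g. "always [madvise] never". A minimal C sketch that extracts the active mode from that text; thp_active_mode is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Extract the bracketed (active) mode from the contents of
 * /sys/kernel/mm/transparent_hugepage/enabled, e.g. "madvise" from
 * "always [madvise] never". Returns 0 on success, -1 if no bracketed
 * value is present. The result is truncated to fit mode[len].
 */
int thp_active_mode(const char *contents, char *mode, size_t len)
{
    const char *start = strchr(contents, '[');
    const char *end = start ? strchr(start, ']') : NULL;
    size_t n;

    if (!start || !end || len == 0)
        return -1;
    n = (size_t)(end - start - 1);
    if (n >= len)
        n = len - 1;
    memcpy(mode, start + 1, n);
    mode[n] = '\0';
    return 0;
}
```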
Persistent huge pages
Persistent huge pages have to be configured manually and are never swapped out by the Linux kernel.
Linux provides the following interfaces for allocating persistent huge pages:
- Shared memory using shmget()
- HugeTLBFS, a RAM-based filesystem whose files can be accessed with mmap() or read(); memfd_create() with the MFD_HUGETLB flag creates an anonymous file backed by it
- Anonymous mmap() with the MAP_ANONYMOUS and MAP_HUGETLB flags
- libhugetlbfs APIs
- Automatic backing of memory regions
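As an illustration of the anonymous mmap() interface, the sketch below maps memory with MAP_ANONYMOUS | MAP_HUGETLB. It assumes persistent huge pages of the default size have already been reserved as described in the following sections; alloc_huge_anon and free_huge_anon are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map size bytes of private anonymous memory from the persistent huge page
 * pool of the default huge page size (Hugepagesize in /proc/meminfo).
 * size should be a multiple of that page size, since munmap() requires it.
 * Returns NULL if mmap() fails, e.g. when no free huge pages are left.
 */
void *alloc_huge_anon(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}

/* Release a mapping created by alloc_huge_anon(). */
void free_huge_anon(void *p, size_t size)
{
    munmap(p, size);
}
```

If no free huge pages are available, mmap() fails, typically with ENOMEM, and the helper returns NULL.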
DPDK uses persistent huge pages by default: mount points are discovered automatically, and pages are released once the application exits. If a user needs to tune this behavior manually, the following EAL command-line parameters can be used:
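--huge-dir: Use the specified hugetlbfs directory instead of the autodetected ones.
--huge-unlink: Unlink huge page files after creating them (implies no secondary process support).
--in-memory: Run entirely in memory without creating any shared data structures; added in recent DPDK versions, it makes it possible to run without relying on a hugetlbfs mount point.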
There are two ways to set up persistent huge pages:
- At boot time
- At run time
At boot time
Modify the Linux boot parameters in /etc/default/grub. The huge pages will be spread equally across all NUMA nodes.
GRUB_CMDLINE_LINUX="default_hugepagesz=1G hugepagesz=1G hugepages=32"
Update the grub configuration file and reboot.
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot
Create a directory for a permanent hugetlbfs mount point:
mkdir /mnt/huge
Add the following line to the /etc/fstab file:
nodev /mnt/huge hugetlbfs defaults 0 0
At run time
Update the number of huge pages on each NUMA node. The default huge page size cannot be modified at run time.
echo 16 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
echo 16 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
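The echo commands above can also be issued from a program. The sketch below builds the per-node sysfs path for a given huge page size and writes the requested count; it needs root privileges, and nr_hugepages_path and set_nr_hugepages are hypothetical names. The value is read back because the kernel may reserve fewer pages than requested, for example when memory is fragmented.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build the sysfs path of the per-node huge page counter, e.g. node 0 and
 * 1048576 kB give
 * /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
 * Returns 0 on success, -1 if buf is too small.
 */
int nr_hugepages_path(char *buf, size_t len, int node, unsigned long size_kb)
{
    int n = snprintf(buf, len,
                     "/sys/devices/system/node/node%d/hugepages/"
                     "hugepages-%lukB/nr_hugepages", node, size_kb);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Programmatic version of "echo COUNT > .../nr_hugepages" (requires root).
 * Writes count, then reads the counter back, because the kernel may
 * reserve fewer pages than requested. Returns that count, or -1 on error.
 */
long set_nr_hugepages(int node, unsigned long size_kb, unsigned long count)
{
    char path[128];
    long actual = -1;
    FILE *f;

    if (nr_hugepages_path(path, sizeof path, node, size_kb) != 0)
        return -1;
    f = fopen(path, "w");
    if (!f)
        return -1;
    fprintf(f, "%lu\n", count);
    if (fclose(f) != 0)  /* the buffered write reaches sysfs on close */
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &actual) != 1)
        actual = -1;
    fclose(f);
    return actual;
}
```

For instance, set_nr_hugepages(1, 1048576, 16) mirrors the second echo command and returns the number of 1G pages actually reserved on node 1.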
Mount hugetlbfs:
mount -t hugetlbfs nodev /mnt/huge
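The mount command above corresponds to the mount(2) system call. A minimal sketch, assuming root privileges; mount_hugetlbfs is a hypothetical wrapper. The data string accepts hugetlbfs mount options such as pagesize=2M, which selects a page size other than the default.

```c
#include <sys/mount.h>

/*
 * Equivalent of "mount -t hugetlbfs nodev <target>"; requires root
 * (CAP_SYS_ADMIN). options is the hugetlbfs data string, e.g.
 * "pagesize=2M", or NULL to use the default huge page size.
 * Returns 0 on success, or -1 with errno set.
 */
int mount_hugetlbfs(const char *target, const char *options)
{
    return mount("nodev", target, "hugetlbfs", 0, options);
}
```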
While there are many ways to allocate persistent huge pages, DPDK uses the following:
- mmap() call with hugetlbfs mount point
- mmap() call with MAP_HUGETLB flag
- memfd_create() call with MFD_HUGETLB flag
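The first and third methods differ mainly in where the file descriptor comes from: a file created on a hugetlbfs mount point, or an anonymous memfd that needs no mount point at all. The sketch below assumes Linux 4.14+ and glibc 2.27+ for memfd_create() with MFD_HUGETLB. The helper names, and the choice to unlink the hugetlbfs file right after opening it (in the spirit of --huge-unlink), are illustrative and not DPDK's actual code.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Size the file behind fd and map it. size must be a multiple of the huge
 * page size. The fd is closed in all cases; an established mapping keeps
 * the pages alive until munmap(). Returns NULL on failure.
 */
static void *map_huge_fd(int fd, size_t size)
{
    void *p = MAP_FAILED;

    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)size) == 0)
        p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? NULL : p;
}

/* Method 1: a new file on a hugetlbfs mount point, e.g. "/mnt/huge/map_0". */
void *map_hugetlbfs_file(const char *path, size_t size)
{
    int fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);

    if (fd < 0)
        return NULL;
    unlink(path);  /* drop the name at once; the open fd keeps the file */
    return map_huge_fd(fd, size);
}

/* Method 3: an anonymous memfd backed by huge pages, no mount point needed. */
void *map_memfd_huge(size_t size)
{
    return map_huge_fd(memfd_create("huge", MFD_CLOEXEC | MFD_HUGETLB), size);
}
```

Memory returned by either helper is released with munmap(p, size).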