Learning DPDK: DPI with Hyperscan



To identify which application generates the monitored traffic, knowing the IP address and TCP port is not enough; you also need to look inside the HTTP header.


The HTTP header is matched against a collection of strings. Each string is associated with some protocol or application, such as Facebook or Google Chat.
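As a rough illustration of such a collection, the sketch below ties each string to an application label through a numeric id. The structure, the function name, and the host names used as literals are hypothetical and chosen only for this example; real signatures would come from your own traffic analysis.

```c
#include <stddef.h>

/* Hypothetical signature table: each literal is tied to an application label. */
struct dpi_signature {
    unsigned int id;      /* identifier reported when the literal is found */
    const char *literal;  /* string looked for in the HTTP header */
    const char *app;      /* application the string is associated with */
};

/* Illustrative entries only; the literals are assumptions, not real rules. */
static const struct dpi_signature dpi_signatures[] = {
    { 1, "facebook.com",    "Facebook" },
    { 2, "chat.google.com", "Google Chat" },
};

#define DPI_SIGNATURE_COUNT \
    (sizeof(dpi_signatures) / sizeof(dpi_signatures[0]))

/* Return the application name for a signature id, or NULL if the id is unknown. */
static const char *dpi_app_by_id(unsigned int id)
{
    for (size_t i = 0; i < DPI_SIGNATURE_COUNT; i++) {
        if (dpi_signatures[i].id == id)
            return dpi_signatures[i].app;
    }
    return NULL;
}
```

Keeping the ids starting at 1 leaves 0 free to mean "no match" in the scanning code.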


String search is a slow operation; to make it fast, you can leverage smart algorithms and hardware optimization techniques.


One library that does both is a regex library called Hyperscan. You can listen to an introduction to the library here. The speed of the library was evaluated here.


Install binary prerequisites

yum install ragel libstdc++-static

Download Hyperscan sources

wget https://github.com/intel/hyperscan/archive/v4.7.0.tar.gz
tar -xf v4.7.0.tar.gz

Download boost headers

wget https://dl.bintray.com/boostorg/release/1.67.0/source/boost_1_67_0.tar.gz
tar -xf boost_1_67_0.tar.gz
cp -r boost_1_67_0/boost hyperscan-4.7.0/include

Build and install Hyperscan shared library

Just follow the instructions from here.
cd hyperscan-4.7.0
mkdir build
cd build
cmake -DBUILD_SHARED_LIBS=true ..
make install

Link DPDK app against Hyperscan

Modify the Makefile as follows.
CFLAGS += -I/usr/local/include/hs/
LDFLAGS += -lhs

Build a database from a list of strings

Use hs_compile_multi() with an array of the strings that you need to search for. To escape a string, wrap it in the \Q and \E sequences from PCRE syntax.


Use the hs_scan() API
Check the simplegrep example for more details.