Learning DPDK: DPI with Hyperscan


Why

To know which application generated the monitored traffic, it is not enough to know the IP addresses and TCP ports; a look inside the HTTP header is required.

How

The HTTP header is matched against a collection of strings. Each string is associated with an application protocol, such as Facebook or Google Chat.
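A minimal sketch of this string-to-protocol mapping in C, assuming a hypothetical enum app_id and illustrative signature strings (not a real rule set). The naive classifier calls strstr() once per signature, so every header is scanned once per pattern and the cost grows with the number of signatures. It also assumes a NUL-terminated header, which raw packet payloads usually are not.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical application IDs; a real list depends on your signatures. */
enum app_id { APP_UNKNOWN = 0, APP_FACEBOOK, APP_GOOGLE_CHAT };

struct app_signature {
    const char *pattern; /* literal string expected somewhere in the HTTP header */
    enum app_id app;     /* application this string identifies */
};

/* Illustrative signatures only, not a real DPI rule set. */
static const struct app_signature signatures[] = {
    { "facebook.com",    APP_FACEBOOK },
    { "chat.google.com", APP_GOOGLE_CHAT },
};

/* Naive classifier: one strstr() pass over the header per signature,
 * so the work grows with (number of signatures) x (header length).
 * Requires a NUL-terminated header. */
enum app_id classify_naive(const char *header)
{
    for (size_t i = 0; i < sizeof(signatures) / sizeof(signatures[0]); i++) {
        if (strstr(header, signatures[i].pattern) != NULL)
            return signatures[i].app;
    }
    return APP_UNKNOWN;
}
```

With a handful of signatures this is fine; with thousands, the repeated passes over every header are what make plain string search too slow for line-rate DPI.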

Complications

String search is a slow operation. Making it fast requires smart algorithms and hardware optimization techniques.

Solution

A regex library called Hyperscan. An introduction to the library is available here, and its performance was evaluated here.

Integration

Install binary prerequisites

yum install ragel libstdc++-static

Download Hyperscan sources

wget https://github.com/intel/hyperscan/archive/v4.7.0.tar.gz
tar -xf v4.7.0.tar.gz

Download Boost headers

wget https://dl.bintray.com/boostorg/release/1.67.0/source/boost_1_67_0.tar.gz
tar -xf boost_1_67_0.tar.gz
cp -r boost_1_67_0/boost hyperscan-4.7.0/include

Build and install Hyperscan shared library

Follow the build instructions here:
cd hyperscan-4.7.0
mkdir build
cd build
cmake -DBUILD_SHARED_LIBS=true ..
make
make install

Link DPDK app against Hyperscan

Modify the application's Makefile as follows:
CFLAGS += -I/usr/local/include/hs/
LDFLAGS += -lhs

Build a database from a list of strings

Use hs_compile_multi() with an array of the strings you need to search for. To match a string literally, wrap it in the PCRE quoting sequences \Q and \E.

Search

Use the hs_scan() API.
See the simplegrep example that ships with Hyperscan for more details.

References