Network Traffic Classification using Deep Learning

Amit Bhagat
3 min read · Nov 21, 2017

Deep Learning is everywhere at the moment. It is used in applications like image recognition, image captioning, language translation, and medicine. It is solving real-world problems, and in some cases it is doing better than humans [1].

As a network engineer, I was looking for ways to apply Deep Learning in the field of computer networking. I came across papers [2] and [3], and it seemed like a great application to test out my skills.

Dataset

In this experiment, I decided to capture four classes of network traffic using Wireshark: SFTP, SNMP, VoIP, and HTTP.

Sample packet showing SNMP payload

Notice that the payload is represented in hex and that its length varies from packet to packet, so pre-processing is required. I collected the payloads of all TCP/UDP packets for these applications and stored them in separate TXT files, one per application.

Since (most) neural network models require fixed-length input, I limited the length of each payload to 1000 bytes during pre-processing. If a payload is shorter than that, I padded it with zeros at the end; if it is longer, I clipped it at 1000 bytes. Each byte is then converted to decimal and divided by 255 to normalize the values to between 0 and 1.
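In code, the padding, clipping, and normalization steps might look like the sketch below (the function name and the hex-string input format are my assumptions; see [4] for the actual implementation):

```python
MAX_LEN = 1000  # fixed payload length expected by the network

def preprocess_payload(hex_payload):
    """Convert a hex payload string into a fixed-length list of floats in [0, 1]."""
    # Split the hex string into bytes and convert each pair of hex digits to decimal
    values = [int(hex_payload[i:i + 2], 16) for i in range(0, len(hex_payload), 2)]
    # Clip payloads longer than MAX_LEN bytes
    values = values[:MAX_LEN]
    # Pad shorter payloads with zeros at the end
    values += [0] * (MAX_LEN - len(values))
    # Normalize each byte to [0, 1]
    return [v / 255.0 for v in values]
```

For example, `preprocess_payload("ff00")` yields a 1000-element vector starting with `1.0, 0.0` and padded with zeros.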

Pandas DataFrame after pre-processing

The labels are generated from the names of the files the data was stored in.
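Deriving a label from a file name can be done by mapping each TXT file's base name to a class index (a minimal sketch; the class ordering and file layout here are my assumptions, not necessarily those in [4]):

```python
from pathlib import Path

# One TXT file per traffic class; the label is encoded in the file's base name
CLASSES = ["sftp", "snmp", "voip", "http"]

def label_from_filename(path):
    """Return the integer class label encoded in a capture file's name."""
    return CLASSES.index(Path(path).stem.lower())
```

For example, a file named `VOIP.txt` maps to class index 2 under this ordering.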

Architecture

It turns out the appropriate architecture for this application is a Convolutional Neural Network (CNN), specifically a one-dimensional CNN. If you need a refresher on CNNs, please read [5].

CNN architecture

The model starts with a 1D convolution layer with 512 filters of size 3. The filters convolve over the input data with a stride of 2, so the output of this layer is a 2D tensor of size 500×512, called the activation map. After max pooling (with stride 2) and dropout with probability 0.25, the resulting 250×512 tensor is fed into a second convolution layer with 256 filters of size 3, again with stride 2, which outputs a tensor of shape 125×256. After another max-pooling and dropout layer, a Flatten layer converts the tensor into a vector of size 15872 (62×256). This is then fed into a series of fully-connected layers, and finally a softmax classifier performs the classification task.
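The architecture above can be sketched in Keras roughly as follows (the activation functions and the width of the fully-connected layer are my assumptions; the original code in [4] is the authoritative version):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv1D, Dense, Dropout, Flatten, MaxPooling1D

model = Sequential([
    # 512 filters of size 3, stride 2: (1000, 1) -> (500, 512)
    Conv1D(512, 3, strides=2, padding="same", activation="relu",
           input_shape=(1000, 1)),
    MaxPooling1D(2),                 # (500, 512) -> (250, 512)
    Dropout(0.25),
    # 256 filters of size 3, stride 2: (250, 512) -> (125, 256)
    Conv1D(256, 3, strides=2, padding="same", activation="relu"),
    MaxPooling1D(2),                 # (125, 256) -> (62, 256)
    Dropout(0.25),
    Flatten(),                       # (62, 256) -> vector of 15872
    Dense(128, activation="relu"),   # fully-connected layer (width assumed)
    Dense(4, activation="softmax"),  # one output per traffic class
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Note how the strided convolutions and pooling layers halve the sequence length at each step, arriving at the 15872-element vector described above.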

Here’s the summary of the model:

Model summary

Result

Output of last few epochs

Running the model on a GPU-enabled AWS EC2 instance is unbelievably fast. The model works, but it seems to be overfitting on the training data: accuracy on training data is very high (more than 99%), while accuracy on test data is about 80%. One of the reasons could be lack of data; I have about 10,000 observations, of which 30% are reserved for test data.
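The 70/30 split mentioned above can be reproduced with scikit-learn (a sketch with dummy stand-in data; the random seed and the use of `train_test_split` are my assumptions about the original code in [4]):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy stand-ins: 10,000 payload vectors of 1,000 normalized bytes each,
# with one of 4 class labels per observation
X = np.zeros((10000, 1000, 1), dtype="float32")
y = np.random.randint(0, 4, size=10000)

# Reserve 30% of the observations for testing, as in the experiment above;
# stratify keeps the class proportions similar in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)

print(len(X_train), len(X_test))  # prints: 7000 3000
```

With only ~7,000 training observations for a model of this size, the train/test accuracy gap reported above is not surprising.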

The code is implemented in Keras with a TensorFlow backend and is available at [4].

References

[1] https://edgylabs.com/google-ai-cancer-diagnosis

[2] https://arxiv.org/pdf/1709.02656.pdf

[3] https://www.blackhat.com/docs/us-15/materials/us-15-Wang-The-Applications-Of-Deep-Learning-On-Traffic-Identification-wp.pdf

[4] https://github.com/amit2555/dlfn

[5] https://github.com/mingruimingrui/Convolution-neural-networks-made-easy-with-keras
