echowei / deeptraffic

Deep Learning models for network traffic classification

License: Mozilla Public License 2.0

PowerShell 15.07% Python 77.76% Shell 7.17%
deep-learning cnn-model lstm-model malware-analysis encrypted-traffic traffic-analysis traffic-classification

deeptraffic's Issues

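Where to download the input_data package used in malware traffic classification

Dear author,
Hello!
I had the pleasure of reading your paper, and its reasoning is very clear. I would like to try running your code, but I ran into a problem: train.py in the malware traffic classification part contains "import input_data", and I cannot find this package online. Could you tell me how to obtain it and what it does?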

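The input_data.py file

Hello, and thank you for open-sourcing this. The code package seems to be missing the input_data.py module. Could you please confirm? I look forward to your reply.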

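Do I need to worry about viruses when processing the ISCX2012 dataset?

Hello,
I would like to ask a question: Windows Defender flagged the data as malware. Could this lead to my computer being infected?
I searched on Google and found no answer, and I have also tried asking the dataset authors and the authors of related papers.
I am not an expert in malware or networking. From what I know about computers there should be nothing to worry about, but I am asking just in case. If you know the answer, please let me know. Thank you.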

About the project

Hi,
Thanks for the project.
Can you help me? How do I run it?

Very low accuracy after training

The output of training and testing with 20 classes is as follows:

step 0, train accuracy 0
step 2000, train accuracy 0.9
step 4000, train accuracy 0.96
step 6000, train accuracy 0.94
step 8000, train accuracy 0.92
step 10000, train accuracy 0.98

2021-01-05 01:03:39
DATA_DIR: /PUBLIC/sakura/self_secuity/echowei/reproduction2/USTC-TK2016-ubuntu/5_Mnist
0, aimchat, 0.07194244604316546, 0.01020408163265306
1, AIM_Chat, 0.0, 0.0
2, browsing, 0.5, 0.003875968992248062
3, browsing2-1, 0.020446096654275093, 0.01089108910891089
4, browsing2-2, 0.12269129287598944, 0.09470468431771895
5, browsing2, 0.03184713375796178, 0.07286995515695067
6, browsing_ara, 0.0, 0.0
7, browsing_ara2, 0.0, 0.0
8, browsing_ger, 0.07142857142857142, 0.001026694045174538
9, Email_IMAP_filetransfer, 0.0, 0.0
10, AUDIO_spotifygateway, 0.0, 0.0
11, AUDIO_tor_spotify, 0.0, 0.0
12, AUDIO_tor_spotify2, 0.0, 0.0
13, BROWSING_gate_SSL_Browsing, 0.0, 0.0
14, BROWSING_ssl_browsing_gateway, 0.0, 0.0
15, BROWSING_tor_browsing_ara, 0.0, 0.0
16, BROWSING_tor_browsing_ger, 0.0, 0.0
17, BROWSING_tor_browsing_mam, -1, 0.0
18, BROWSING_tor_browsing_mam2, 0.0, 0.0
19, CHAT_aimchatgateway, 0.0, 0.0
Total accuracy: 0.0184

My environment is Ubuntu 18.04, so I used the Ubuntu branch for processing.
My data processing steps were:

1.	pwsh 1_Pcap2Session.ps1 -f
2.	pwsh 2_ProcessSession.ps1 -a -s
3.	python 3_Session2Png.py 
4.	python 4_Png2Mnist.py

Did I do one of these steps wrong?
Thank you for open-sourcing this. I look forward to your reply!

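A question about the malware paper

In the malware paper, after preprocessing, is the first 784 bytes of every packet in a flow used as one training sample, or is a whole flow used as one training sample? If a whole flow or session is one training sample with one label, are the first three packets selected, or some other number?
train_cnn.py appears to follow the MNIST format, where one 784-byte record corresponds to one label. That is the reason for my question.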
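
One possible reading of the MNIST-style format is that a single 784-byte record stands for a whole flow or session rather than for one packet. The sketch below illustrates that reading only. It assumes the session bytes are already concatenated in order, and session_to_image is a hypothetical helper name, not a function from the repository.

```python
IMAGE_BYTES = 784  # 28 * 28, the MNIST-style record size mentioned above


def session_to_image(payload: bytes, size: int = IMAGE_BYTES, side: int = 28):
    """Trim or zero-pad one session's bytes to `size`, then reshape to side x side.

    Assumption: `payload` is the concatenated bytes of ONE flow or session,
    so one image (and one label) corresponds to one session.
    """
    trimmed = payload[:size]                              # keep only the first `size` bytes
    padded = trimmed + b"\x00" * (size - len(trimmed))    # pad short sessions with 0x00
    return [list(padded[r * side:(r + 1) * side]) for r in range(side)]


short_image = session_to_image(b"\x01\x02\x03")         # shorter than 784 bytes
long_image = session_to_image(bytes(range(256)) * 4)    # 1024 bytes, gets trimmed
```

Under this reading the number of packets is not fixed: however many packets it takes to fill 784 bytes contribute to the record.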

A few bugs in the code

I ran the whole process and found (and fixed) some potential bugs.

1_Pcap2Session.ps1: foreach($f in gci 1_Pcap *.pcap)
This code only handles .pcap files, but the data contains two file types: .pcap and .pcapng.
Fix: first convert the .pcapng files to .pcap using splitcap, then run this script.

2_ProcessSession.ps1: $test = $files | get-random -count ([int]($count/10))
When $count is less than 10, this line raises an error and $test keeps its value from the previous loop iteration, so some data ends up wrongly classified.
Fix: ignore any .pcap file that has fewer than 10 packets.

CNN.py: y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, w_fc2) + b_fc2)
This can produce NaN or zero gradients if tf.matmul(h_fc1_drop, w_fc2) + b_fc2 is all zeros or NaN, so after enough training iterations all weights can suddenly become 0.
Fix: use tf.nn.softmax_cross_entropy_with_logits instead, passing it the raw logits rather than the softmax output. It handles the extreme cases safely.
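
The stale-variable problem can be avoided by computing the split inside a function that always returns a fresh test list. The following is a Python analogue of the one-in-ten split, not the repository's PowerShell code; split_train_test and its parameters are hypothetical names chosen for illustration.

```python
import random


def split_train_test(files, test_divisor=10, rng=None):
    """Return (train, test) with roughly 1/test_divisor of the items in test.

    When there are fewer than `test_divisor` items the test list is simply
    empty, so nothing can leak in from an earlier call.
    """
    rng = rng or random.Random()
    test_count = len(files) // test_divisor
    test = rng.sample(files, test_count) if test_count > 0 else []
    chosen = set(test)
    train = [f for f in files if f not in chosen]
    return train, test


sessions = [f"session{i}.pcap" for i in range(25)]  # made-up file names
train, test = split_train_test(sessions, rng=random.Random(0))
small_train, small_test = split_train_test(["a.pcap", "b.pcap", "c.pcap"])
```

Whether small groups should be skipped entirely, as the fix above suggests, or kept as training-only data is a design choice; this sketch keeps them.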
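
To see why computing the loss from logits is safer than taking the log of a softmax output, the sketch below compares the two in plain Python. It is not TensorFlow's implementation. It only illustrates the standard log-sum-exp rearrangement, and both function names are made up for this example.

```python
import math


def naive_cross_entropy(logits, label):
    """Softmax first, then log: breaks when a probability underflows to 0."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -math.log(probs[label])  # log(0) is undefined


def stable_cross_entropy(logits, label):
    """Same quantity computed as log-sum-exp(logits) - logits[label]."""
    m = max(logits)  # shifting by the max keeps every exponent <= 0
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


extreme = [0.0, -800.0]  # exp(-800) underflows to 0.0 in double precision
try:
    naive_cross_entropy(extreme, 1)
    naive_failed = False
except ValueError:  # Python raises; frameworks typically yield inf or NaN instead
    naive_failed = True

stable_loss = stable_cross_entropy(extreme, 1)
```

For moderate logits the two functions agree. They only diverge in the extreme cases that the fix above is meant to cover.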

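Very poor test results when the training and test sets are produced with the preprocessing tools

The training and test sets I get by running your preprocessing tools on the PCAP data you provided give completely different results from the training and test sets you provide directly. I do not know what went wrong. I suspect a problem with the preprocessing or the data, or that I missed a step, or that the preprocessing was done differently. My training and test sets differ from yours, and this has been troubling me for a long time. Please forgive the bother.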

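About the preprocessed data you provided

Hello, after reading your paper I downloaded your code and the preprocessed data you provided, because I wanted to measure the classification accuracy of your proposed model. When I run encrypt_traffic_cnn_1d.py, it reports that the number of classes is wrong. Do the 6class.zip and 12class.zip files in your preprocessed data really contain 6 and 12 classes? I look forward to your reply, and wish you all the best in life and work.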

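About the HAST-IDS preprocessing step

Hello, I have spent a long time trying to preprocess the data with the method from your paper, without success. The paper says that the CNN model processes each packet in a flow individually, rather than the flow as a whole. I also read your earlier answers, which seem to say that flow files are what gets processed. I tried splitting the traffic into flows with pkt2flow and then running the preprocessing code from the first folder, but this approach is unreliable and the model code raises errors. So I would like to ask: have you resolved this part of the preprocessing?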

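About the handling of MAC and IP addresses

Hello, your paper "Malware Traffic Classification Using Convolutional Neural Network for Representation Learning" mentions that the data processing randomizes the MAC address and IP address in the data link layer.

However, when I looked at the code in your repository (the folder 2.encrypted_traffic_classification/2.PreprocessedTools), I did not find any corresponding handling of IP and MAC addresses. Where do you process these two fields?

I look forward to your reply. Best wishes.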
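
For readers wondering what such a randomization step could look like, here is a minimal sketch that overwrites the address fields of a raw Ethernet II frame carrying IPv4. It is not taken from the repository. It assumes untagged frames (no VLAN header), does not recompute the IPv4 header checksum, and randomize_addresses is a hypothetical helper name.

```python
import random

ETH_HEADER_LEN = 14            # dst MAC (6) + src MAC (6) + EtherType (2)
ETHERTYPE_IPV4 = b"\x08\x00"
IPV4_MIN_HEADER_LEN = 20


def randomize_addresses(frame: bytes, rng=None) -> bytes:
    """Replace MAC addresses, and IPv4 addresses if present, with random bytes."""
    rng = rng or random.Random()
    out = bytearray(frame)
    if len(out) < ETH_HEADER_LEN:
        return bytes(out)  # too short to be an Ethernet frame; leave untouched
    # Bytes 0-11: destination MAC followed by source MAC.
    out[0:12] = bytes(rng.randrange(256) for _ in range(12))
    is_ipv4 = bytes(out[12:14]) == ETHERTYPE_IPV4
    if is_ipv4 and len(out) >= ETH_HEADER_LEN + IPV4_MIN_HEADER_LEN:
        # IPv4 source and destination addresses sit at offsets 12-19 of the
        # IP header, which is bytes 26-33 of the frame.
        out[26:34] = bytes(rng.randrange(256) for _ in range(8))
    return bytes(out)


frame = bytes(12) + ETHERTYPE_IPV4 + bytes(20) + b"payload"  # synthetic frame
scrubbed = randomize_addresses(frame, rng=random.Random(0))
```

Everything outside the address fields, including the payload, is left unchanged, so the frame length and the byte positions of the remaining content are preserved.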

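Thanks for open-sourcing this

I am about to take this up as my research topic and am studying your doctoral thesis. It has been a great help to me. Thank you.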

Folder Structure for Training Data in Encrypted Traffic Classification Task

For the encrypted traffic classification task, the Dataset.txt and Png2Mnist.py files seem to imply that each class (label) should have its own folder with the associated pcap files inside; in other words, the label information is determined by the folder structure. However, the Pcap2Session and ProcessSession files seem to assume that all pcap files sit together in a single folder (for example, gci just looks within the single folder).

Maybe I am missing something about these assumptions?
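
The folder-per-class convention described above can be sketched as follows. This shows one way to derive labels from a directory layout, not how Png2Mnist.py actually does it; labels_from_folders and the class-folder names used in the usage note are assumptions made for illustration.

```python
from pathlib import Path


def labels_from_folders(root):
    """Map each immediate subfolder of `root` to a label index.

    Returns (label_of, samples): `label_of` maps folder name to index, and
    `samples` lists (pcap_path, label) pairs in a deterministic order.
    """
    root = Path(root)
    class_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    label_of = {d.name: i for i, d in enumerate(class_dirs)}
    samples = [
        (pcap, label_of[d.name])
        for d in class_dirs
        for pcap in sorted(d.glob("*.pcap"))
    ]
    return label_of, samples
```

With a layout such as root/Chat/a.pcap and root/Email/b.pcap, the call labels_from_folders("root") would assign label 0 to Chat and label 1 to Email. A flat folder with no subdirectories yields no labels at all, which is the mismatch the question points at.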

Several problems when running the code

I am not sure whether these are version issues.
First error:
File "/DeepTraffic/2.encrypted_traffic_classification/4.TrainAndTest/1d_cnn_25+3/encrypt_traffic_cnn_1d.py", line 133, in
ValueError: Cannot feed value of shape (50, 10) for Tensor u'Placeholder_1:0', which has shape '(?, 2)'

Second error:
File "DeepTraffic/2.encrypted_traffic_classification/4.TrainAndTest/1d_cnn_25+3/encrypt_traffic_cnn_1d.py", line 107, in
ValueError: Only call softmax_cross_entropy_with_logits with named arguments (labels=..., logits=..., ...)

Third question:
Why did the author comment out y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, w_fc2) + b_fc2)?
There are still many references to y_conv further down.
Thanks in advance for your reply.
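
The first error message says that the label batch is 10 columns wide while the placeholder expects 2, which suggests that the one-hot labels and the model were built for different class counts. The sketch below reproduces that kind of mismatch without TensorFlow. one_hot and check_label_width are hypothetical helpers, and the actual cause in the script may differ.

```python
def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot row of length num_classes."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
    row = [0.0] * num_classes
    row[label] = 1.0
    return row


def check_label_width(batch, expected_width):
    """Raise early if the label batch does not match the model's class count."""
    width = len(batch[0])
    if width != expected_width:
        raise ValueError(
            f"label batch has shape ({len(batch)}, {width}) "
            f"but the model expects (?, {expected_width})"
        )


batch = [one_hot(i % 10, 10) for i in range(50)]  # 50 samples, 10 classes
try:
    check_label_width(batch, 2)  # model configured for 2 classes
    mismatch = None
except ValueError as err:
    mismatch = str(err)
```

The second error is spelled out by its own message: the call needs keyword arguments, in the form labels=..., logits=....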

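About the "evaluate the model" section

Hello, I have a question. I ran the code on Windows. The "# evaluate the model" part runs fine for 10-class classification, but for 2-class and 20-class classification only the model is trained and no accuracy or other metrics are printed. I am puzzled about the cause and would appreciate an explanation.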

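About a mix-up in preprocessing

Hello,
Thank you for your paper and the open-source code.
The problem is solved. I had probably mixed up sessions and flows.
Thanks.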

Pre-processing for HAST-IDS

Sir,
Can you provide the preprocessing tool and code to split the raw pcap into flows, as you did for the malware and encrypted traffic classification?
