yanminglai / malware-gan

111.0 111.0 57.0 2.43 MB

Implementation of the paper "Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN" (2017)

Home Page: https://arxiv.org/abs/1702.05983

License: GNU General Public License v3.0

Python 100.00%

malware-gan's People

Stargazers

Watchers

Forkers

chneau axoaxel tttjjjwww s-sabareeswaran walt1998 sanuparaballi qing0991 yuedeji iemos lamdangm returnhere qian-han mingyu-academic wyattyun zoujianfei 24mlight kkkkobeeee zyyrrr abdullah-b lzylucy johannespinger liuzhichenger ywkw1717 rpplayground yanlychen sakuracy wmastersonv stenpiren ccc2876 hyeon-jeong underspace milkigit luozui active2626 s884812 saltfun 973771793 tomkallo yzx-fish alhaol adeeps1 sadaqatali1234 gan-apps-sets jimba86 muzihuole yakaili thisisnitish vishalsharma2000 zhangding222 dagrons bluesoju rafaelmra burkinabe techris45 xuguowong adenrajput tejasakumar

malware-gan's Issues

Generator smooth function

Hello there,
I have read your paper and debugged your code, but I cannot find where you implement the function G(m, z) = max(m, o) from formula 1 in paragraph 2.2 (Generator).

For example, g_loss is computed only from the output of the combined model, which takes xmal_batch plus noise as input. This means the gradient backpropagates from the substitute detector (with fixed weights) into the generator, but I don't see where the maximum between the xmal_batch input (m) and the generator output (o) is implemented.

Cheers.
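
For readers following this question, the function being asked about can be sketched in plain Python. This is a minimal illustration of the element-wise maximum from formula 1, not the repository's code: `smooth_max` and `binarize` are hypothetical helper names, and the 0.5 threshold is an assumption. In a TensorFlow/Keras graph the same operation would typically be an element-wise maximum (for example `tf.maximum`) placed between the generator output and the substitute detector, so that the gradient still reaches the generator.

```python
# Hypothetical helpers illustrating G(m, z) = max(m, o); not taken from the repository.

def smooth_max(m, o):
    """Element-wise maximum of the malware features m and the generator output o."""
    return [max(mi, oi) for mi, oi in zip(m, o)]

def binarize(values, threshold=0.5):
    """Turn real-valued outputs into binary features (the threshold is an assumption)."""
    return [1 if v >= threshold else 0 for v in values]

m = [1, 0, 0, 1]            # original malware features (binary)
o = [0.2, 0.9, 0.1, 0.4]    # generator output in (0, 1), e.g. after a sigmoid

smoothed = smooth_max(m, o)       # features already set in m stay at 1
adversarial = binarize(smoothed)  # same result as m OR binarize(o)
print(smoothed)
print(adversarial)
```

Because the maximum never lowers a value that is already 1 in m, the generator can only add features, which matches the constraint that the malware's existing features must be preserved.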

Hello, I'd like to ask a question.

What happens if the 1.py file is not run? Does it have to be run?

Can't recreate graph

Hi,
I am not able to recreate the graph of TPR against epoch. For me, the TPR stays at 1 for all epochs. I have tried your different implementation versions, but sadly with no luck.

Unclear Output

Running the Python file exp.py gives the output shown below. Could you please explain what it indicates?

Original_Train_TRR: 0.979890310786106, Adver_Train_TRR: 0.013711151736745886
Original_Test_TRR: 0.9854014598540146, Adver_Test_TRR: 0.0072992700729927005
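
One way to read these numbers, assuming `TRR` in the output stands for the true positive rate (TPR) reported in the paper: the detector flags about 98% of the original malware samples but only about 1% of their adversarial versions. A minimal sketch of that metric follows; `true_positive_rate` is a hypothetical helper, and the predictions are made-up illustration data, not results from the repository.

```python
def true_positive_rate(predictions):
    """Fraction of malware samples that the detector labels as malware (1)."""
    return sum(1 for p in predictions if p == 1) / len(predictions)

# Made-up detector outputs for five malware samples (1 = detected, 0 = missed).
original_preds = [1, 1, 1, 1, 0]
adversarial_preds = [0, 0, 1, 0, 0]

original_tpr = true_positive_rate(original_preds)
adversarial_tpr = true_positive_rate(adversarial_preds)
print(original_tpr, adversarial_tpr)
```

Under this reading, a large gap between the two rates means the generated examples evade the detector far more often than the unmodified malware does.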

next step for malware gan

Hello Weiwei,
I have read your paper and used your GAN approach with a real AV engine and it does a pretty good job at improving detection rates.

I have a few questions regarding how this approach could be expanded in a more realistic scenario:

The black-box detector (like any of today's ML-based AV engines) typically uses a mix of continuous and binary features.
For example, in addition to API calls (binary features), it also considers the total number of sections (a numeric ordinal feature) or file entropy (a numeric continuous feature).
As in your paper, only a subset of features can be manipulated by the attacker without breaking the functionality of the sample.

With this in mind:

Consider the case of only binary features m, of which only a subset q can be added (ORed) while the other m-q cannot.
How would you change the Generator to take this into account?
Would you just ignore the q and train it as before on the m subset, assuming that they are somehow uncorrelated?
In the case of continuous features (like the count of an attribute), where one could add arbitrary (possibly unbounded) values, should one follow the approach used for adversarial images?
How would you then combine the binary and numerical generators? Would you keep them independent and train them separately at each iteration?

Thanks and regards.
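
The restriction to a modifiable subset of binary features can be illustrated with a mask applied before the OR step. This is only a sketch of one possible approach, not something the paper or the repository prescribes; `apply_masked_perturbation` and the `modifiable` mask are hypothetical names.

```python
def apply_masked_perturbation(m, o, modifiable):
    """Add generated features only where the attacker is allowed to.

    m          -- original binary malware features
    o          -- binarized generator output
    modifiable -- 1 where a feature may be added, 0 where it must stay unchanged
    """
    return [mi | (oi & ki) for mi, oi, ki in zip(m, o, modifiable)]

m = [1, 0, 0, 0]
o = [0, 1, 1, 0]
modifiable = [1, 1, 0, 0]

result = apply_masked_perturbation(m, o, modifiable)
print(result)  # [1, 1, 0, 0]
```

The third feature is proposed by the generator but discarded because the mask marks it as not modifiable, while features already present in m are never removed.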

implementation -help

Could you please help me run your code step by step?

Where are generated examples located?

I'd like to check on VirusTotal whether the generated examples can bypass traditional antivirus engines as well.

.idea Folder

In the .idea folder you committed, you have settings specific to your development environment. For example, in .idea/mal_gan.iml, it lists your development path.

<orderEntry type="jdk" jdkName="Python 3.6.2 (C:\Users\Kuan\Anaconda2\envs\py3\python.exe)" jdkType="Python SDK" />

You typically want to delete the .idea folder and add it to your .gitignore. If you would like a pull request, I can open one for you.

Dataset Required

I need the full dataset, i.e. the 180K samples. Please tell me how to get it. I am trying to do some research in this area, and the dataset I have is not giving good results with this method: the substitute detector always predicts the generated examples as benign on my dataset.
Thanks.

Dataset

Please upload your dataset.
