yanminglai / malware-gan

Implementation of the paper "Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN" (2017)

Home Page: https://arxiv.org/abs/1702.05983

License: GNU General Public License v3.0

Python 100.00%

malware-gan's Issues

Generator smooth function

Hello there,
I have read your paper and debugged your code, but I cannot find where you implement the function G(m,z) = max(m,o) from formula 1 in Section 2.2 (Generator).

For example, the g_loss you compute is based only on the output of the combined model, which takes xmal_batch + noise as input. This means the gradient backpropagates from the substitute detector (whose weights are fixed) into the generator, but I don't see where the maximum between the xmal_batch input (m) and the generator's output (o) is taken.
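For reference, the formula quoted above combines the generator output o with the original feature vector m by an element-wise maximum, so a feature already present in m can never be removed. The following is a minimal NumPy sketch of that forward pass, not this repository's code: it assumes a one-hidden-layer generator with a sigmoid output and random placeholder weights, and the function name `generator_forward`, the layer sizes and the 0.5 binarization threshold are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generator_forward(m, z, w1, b1, w2, b2):
    """Hypothetical generator: G(m, z) = max(m, o) with o in (0, 1)."""
    x = np.concatenate([m, z], axis=1)   # input is the malware features plus noise
    h = np.maximum(0.0, x @ w1 + b1)     # ReLU hidden layer
    o = sigmoid(h @ w2 + b2)             # per-feature output o in (0, 1)
    return np.maximum(m, o)              # element-wise max: features set in m stay at 1

rng = np.random.default_rng(0)
n_samples, n_feat, n_noise, n_hidden = 3, 8, 4, 16
m = rng.integers(0, 2, size=(n_samples, n_feat)).astype(float)  # binary malware features
z = rng.uniform(0.0, 1.0, size=(n_samples, n_noise))             # noise vector
w1 = rng.normal(size=(n_feat + n_noise, n_hidden))
b1 = np.zeros(n_hidden)
w2 = rng.normal(size=(n_hidden, n_feat))
b2 = np.zeros(n_feat)

g = generator_forward(m, z, w1, b1, w2, b2)
binary = (g > 0.5).astype(float)  # assumed binarization before querying the black-box detector
```

In a Keras model the same step could be expressed with a `keras.layers.Maximum()` layer applied to the malware input and the generator's sigmoid output. Because a maximum passes gradient only to its larger input, the generator would then receive gradient only for features that are 0 in m.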

Cheers.

Can't recreate graph

Hi,
I am not able to recreate the TPR-versus-epoch graph. For me, the TPR stays at 1 for every epoch. I have tried your different versions of the implementation, but unfortunately without success.

Unclear Output

Running exp.py gives the output shown below. Could you please explain what it indicates?

Original_Train_TRR: 0.979890310786106, Adver_Train_TRR: 0.013711151736745886
Original_Test_TRR: 0.9854014598540146, Adver_Test_TRR: 0.0072992700729927005
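Assuming "TRR" in these log lines is the script's label for the true positive rate (the fraction of malware samples the black-box detector flags as malicious), the columns compare detection before and after the attack: about 98% of the original malware is detected, versus about 1% of the adversarial examples. Under that reading, a TPR that stays at 1 means every generated example is still detected. A minimal sketch of the metric, where `detection_rate` and the toy predictions are hypothetical:

```python
import numpy as np

def detection_rate(predictions):
    """Fraction of samples labelled malicious (1).

    Every input is malware, so this equals the true positive rate.
    """
    predictions = np.asarray(predictions)
    return float(np.mean(predictions == 1))

# Hypothetical black-box predictions for the same 8 malware samples
original_preds = [1, 1, 1, 1, 1, 1, 1, 0]
adversarial_preds = [0, 0, 0, 1, 0, 0, 0, 0]

print(f"Original_TPR: {detection_rate(original_preds)}, "
      f"Adver_TPR: {detection_rate(adversarial_preds)}")
```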

next step for malware gan

Hello Weiwei,
I have read your paper and used your GAN approach with a real AV engine, and it does a pretty good job at improving detection rates.

I have a few questions about how this approach could be extended to a more realistic scenario:

  • The black-box detector (like any of today’s ML-based AV engines) typically uses a mix of continuous and binary features.

  • For example, in addition to API calls (binary features), it also considers the total number of sections (a numeric ordinal feature) or the file entropy (a numeric continuous feature).

  • Only a subset of the features can be manipulated by the attacker without breaking the sample's functionality (as in your paper).

With this in mind:

  • In the case of purely binary features m, where only a subset q of them can be added (ORed) while the other m-q cannot:
  • How would you change the Generator to take this into account?
    Would you just ignore the m-q features and train it as before on the q subset, assuming they are somehow uncorrelated?
  • In the case of continuous features (like the count of an attribute), where one could add arbitrary (possibly unbounded) values, should one follow the approach used for adversarial images?
  • How would you then combine the binary and numerical generators? Would you keep them independent and train each of them separately at every iteration?
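For the binary case in the first question, one possible approach is to multiply the generator output by a binary mask before the element-wise maximum, so that only the q modifiable features can be switched on. This is a hypothetical sketch and not the paper's method. The names `constrained_generator_output` and `modifiable_mask`, as well as the 0.5 threshold, are illustrative assumptions.

```python
import numpy as np

def constrained_generator_output(m, o, modifiable_mask):
    """Hypothetical variant of G(m, z) = max(m, o) that only adds modifiable features.

    m: binary features of the original malware (samples x features)
    o: generator sigmoid output in (0, 1), same shape as m
    modifiable_mask: 1 for features the attacker may add, 0 for fixed ones
    """
    # Fixed features get o * 0 = 0, so max(m, 0) leaves them exactly as in m
    return np.maximum(m, o * modifiable_mask)

m = np.array([[1.0, 0.0, 0.0, 0.0]])
o = np.array([[0.2, 0.9, 0.8, 0.3]])
modifiable_mask = np.array([1.0, 1.0, 0.0, 1.0])  # feature 2 must not be changed

g = constrained_generator_output(m, o, modifiable_mask)
adversarial = (g > 0.5).astype(float)  # assumed binarization threshold
print(adversarial)
```

With this masking, the fixed m-q features also receive no gradient from the maximum. Whether that works well when fixed and modifiable features are correlated is the open question raised above.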

Thanks and regards.

.idea Folder

The .idea folder you committed contains settings specific to your development environment. For example, .idea/mal_gan.iml lists your local development path:

<orderEntry type="jdk" jdkName="Python 3.6.2 (C:\Users\Kuan\Anaconda2\envs\py3\python.exe)" jdkType="Python SDK" />

You typically want to delete the .idea folder and add it to your .gitignore. If you would like a pull request, I can do that for you.

Dataset Required

I need the full dataset, i.e. the 180K samples. Please tell me how to obtain it; I am doing some research in this area. The dataset I have does not give good results with this method: the substitute detector always predicts the generated examples as benign.
Thanks.