deeppm's Introduction

Installation Requirements

Required libraries

Tokenizer

  • Used to tokenize a given basic block exactly as it is written.
  • The tokenizer used previously canonicalized each instruction, which supplied extra implicit information, but some information was lost while extracting the parts it needed.

Tokenization process

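  • Add <START> and <END> at the beginning and end of each instruction

    • The first instruction gets <BLOCK_START> instead of <START>
    • The last instruction gets <BLOCK_END> instead of <END>
  • Each of the following special symbols is also a single token: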

    • [
    • ]
    • +
    • -
    • *
    • ,
    • :
  • Constants starting with 0x:

    • constant = 0: replaced with <ZERO_{constant_byte_size}_BYTES>

    • constant ≠ 0: replaced with <NUM_{constant_byte_size}_BYTES>

    • Examples: 0x03 ⇒ <NUM_1_BYTES>

      0x00000012 ⇒ <NUM_4_BYTES>

      0x0000 ⇒ <ZERO_2_BYTES>

      0x00 ⇒ <ZERO_1_BYTES>

  • When the tokenizer is applied, any token it has not seen before is replaced with the <UNK> token
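
The constant rule above can be sketched in Python as follows. This is only an illustration, not the repository's code: the name `constant_token` is hypothetical, and the byte size is assumed to come from the number of hex digits as written (two digits per byte, rounded up), which matches the four examples listed.

```python
import re

# A 0x-prefixed hexadecimal constant, e.g. 0x0e or 0x00000012.
HEX_CONSTANT = re.compile(r"0x([0-9a-fA-F]+)")


def constant_token(text: str) -> str:
    """Map a 0x-prefixed constant to its placeholder token (hypothetical helper)."""
    match = HEX_CONSTANT.fullmatch(text)
    if match is None:
        raise ValueError(f"not a 0x-prefixed constant: {text!r}")
    digits = match.group(1)
    # Assumption: the size reflects the written width, not the numeric value,
    # so 0x00000012 counts as 4 bytes even though 0x12 fits in one.
    byte_size = (len(digits) + 1) // 2
    kind = "ZERO" if int(digits, 16) == 0 else "NUM"
    return f"<{kind}_{byte_size}_BYTES>"


for sample in ("0x03", "0x00000012", "0x0000", "0x00"):
    print(sample, "=>", constant_token(sample))
```

Using the written width keeps the operand size visible to the model while discarding the concrete value.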

Example:

push   rbx
test   byte ptr [rdi+0x0e], 0x01

After tokenization:

<BLOCK_START> push rbx <END>
<START> test byte ptr [ rdi + <NUM_1_BYTES> ] , <NUM_1_BYTES> <BLOCK_END>
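
A minimal Python sketch of the whole process, under stated assumptions, reproduces the output above. The names `tokenize_instruction`, `tokenize_block`, and `replace_unknown` are hypothetical and not taken from the repository. The sketch assumes one instruction per line, whitespace-separated operands, and that a block with a single instruction receives both <BLOCK_START> and <BLOCK_END>.

```python
import re

# Order matters: hex constants first, then the listed special symbols,
# then any other run of characters that is neither whitespace nor a symbol.
TOKEN_PATTERN = re.compile(r"0x[0-9a-fA-F]+|[\[\]+\-*,:]|[^\s\[\]+\-*,:]+")
HEX_PATTERN = re.compile(r"0x[0-9a-fA-F]+")


def tokenize_instruction(line):
    """Split one instruction into tokens, replacing constants with placeholders."""
    tokens = []
    for piece in TOKEN_PATTERN.findall(line):
        if HEX_PATTERN.fullmatch(piece):
            digits = piece[2:]
            # Assumption: byte size is the written width, two hex digits per byte.
            size = (len(digits) + 1) // 2
            kind = "ZERO" if int(digits, 16) == 0 else "NUM"
            tokens.append(f"<{kind}_{size}_BYTES>")
        else:
            tokens.append(piece)
    return tokens


def tokenize_block(block):
    """Tokenize a basic block and wrap every instruction in start/end markers."""
    lines = [ln for ln in block.splitlines() if ln.strip()]
    result = []
    for i, line in enumerate(lines):
        start = "<BLOCK_START>" if i == 0 else "<START>"
        end = "<BLOCK_END>" if i == len(lines) - 1 else "<END>"
        result.append([start, *tokenize_instruction(line), end])
    return result


def replace_unknown(tokens, vocab):
    """Replace every token that is missing from the vocabulary with <UNK>."""
    return [tok if tok in vocab else "<UNK>" for tok in tokens]


block = "push   rbx\ntest   byte ptr [rdi+0x0e], 0x01"
for instruction in tokenize_block(block):
    print(" ".join(instruction))
```

The <UNK> step is kept separate from splitting because it depends on a vocabulary built beforehand, for example from the training set.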
