
Comments (4)

wyx502 commented on July 30, 2024

image
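Hello, I came across the figure above that you published and would like to confirm: is it the 15 GB public dataset? Also, was the 4.3 GB encrypted_traffic_burst.txt file generated from the 30 GB of data? Thank you.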

from et-bert.

linwhitehat commented on July 30, 2024

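1. There are no particular constraints on which data goes into the pre-training dataset, so you can substitute traffic that covers as wide a range of protocols as possible.
2. encrypted_traffic_burst.txt is generated from the pre-training data.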


wyx502 commented on July 30, 2024

image
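Sorry to bother you again. The README on the main page says to use vocab_process/main.py to generate the corpora, while the README in data_process says the pre-training stage uses data_generation to generate bursts. Which of the two actually produces encrypted_traffic_burst.txt? I ask because I found that both of them can generate txt files. Could you explain this in more detail? Thank you.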


linwhitehat commented on July 30, 2024


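Hello. data_process generates the traffic burst data needed for the pre-training corpora, and vocab_process then generates the corresponding corpora from that data. You can think of the two parts as traffic data preprocessing and pre-training data generation, respectively.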
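To make the two-stage split concrete, here is a minimal sketch of the idea and not the repository's actual code. Every function name below is hypothetical, and the tokenization (overlapping 2-byte hex tokens) and the list of special tokens are assumptions chosen for illustration. Stage 1 plays the role described for data_process (raw burst bytes to text lines), and stage 2 plays the role described for vocab_process (text lines to a vocabulary for the corpora).

```python
from collections import Counter


def payload_to_tokens(payload: bytes) -> str:
    """Encode bytes as overlapping 2-byte hex tokens (assumed scheme, for illustration)."""
    hex_str = payload.hex()
    # Slide a 4-hex-character window forward one byte (2 hex characters) at a time.
    return " ".join(hex_str[i:i + 4] for i in range(0, len(hex_str) - 2, 2))


def preprocess_bursts(bursts: list[bytes]) -> list[str]:
    """Stage 1 (hypothetical stand-in for data_process): bursts -> text lines."""
    # Bursts shorter than 2 bytes cannot form a token, so they are skipped.
    return [payload_to_tokens(b) for b in bursts if len(b) >= 2]


def build_corpus_vocab(lines: list[str]) -> list[str]:
    """Stage 2 (hypothetical stand-in for vocab_process): text lines -> vocabulary."""
    counts = Counter(tok for line in lines for tok in line.split())
    # Special tokens are an assumption, following common BERT-style vocabularies.
    specials = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]
    return specials + [tok for tok, _ in counts.most_common()]


# Two toy bursts standing in for real captured traffic.
bursts = [bytes.fromhex("1603010200"), bytes.fromhex("16030300")]
lines = preprocess_bursts(bursts)   # what a burst text file would contain, one burst per line
vocab = build_corpus_vocab(lines)   # derived only from the stage 1 output
```

The point of the sketch is the dependency order: stage 2 only ever reads the output of stage 1, which matches the description above that the burst text is produced first and the corpora are then built from it.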

