
Comments (6)

linwhitehat commented on July 30, 2024

We encountered some problems while reproducing the model.

Can you tell us your hardware configuration? For example, how much memory, how many GPUs, and what type of GPUs? We ask because we ran out of memory during the reproduction process.

Thank you for following our work.
The details of our experimental environment are as follows:
Available memory: 502 GB
Available GPUs: Tesla V100S (32 GB) x 4
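Before starting a reproduction run, it can help to record the host configuration and compare it against the setup above. This is not part of the ET-BERT repository, just a small stdlib-only sketch (the RAM query via `os.sysconf` assumes a Linux host):

```python
import os
import platform

def describe_host():
    """Collect basic host details to compare against the reported setup."""
    info = {
        "python": platform.python_version(),
        "machine": platform.machine(),
        "cpu_count": os.cpu_count(),
    }
    # Total RAM via POSIX sysconf (Linux); unavailable on some platforms.
    try:
        info["ram_gb"] = round(
            os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 2**30, 1
        )
    except (ValueError, OSError, AttributeError):
        info["ram_gb"] = None
    return info
```

Running `describe_host()` on the machine above should report roughly 502 GB of RAM; GPU details would still need `nvidia-smi`.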

from et-bert.

lincgcg commented on July 30, 2024

Thanks for your reply. We have set up a similar configuration to yours:
Available memory: 503 GB
Available GPUs: Tesla V100S (32 GB) x 8
But we have some questions about the pre-process step:

  1. Under your configuration, how long does it take to complete the second step of the pre-process? We have now spent at least 24 hours, but the program still has not finished. Could you also tell us how long the other steps take to run?
  2. We found that while running the program (the second step of the pre-process), it used nearly 502 GB of memory for the first few hours, but after about ten hours the usage dropped to only 10-30 GB. Did you encounter this situation?
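When diagnosing memory behavior like this, it is useful to log the process's peak resident set size over time rather than watching it manually. A minimal stdlib sketch (note that `ru_maxrss` is reported in KiB on Linux but in bytes on macOS; the conversion below assumes Linux):

```python
import resource
import threading

def start_memory_logger(interval_s=60.0):
    """Start a daemon thread that periodically prints this process's
    peak RSS, so long pre-processing runs leave a memory trace."""
    stop = threading.Event()

    def _log():
        while not stop.wait(interval_s):
            peak_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
            print(f"peak RSS so far: {peak_kib / 2**20:.2f} GiB")

    threading.Thread(target=_log, daemon=True).start()
    return stop  # call stop.set() when pre-processing finishes
```

Calling `start_memory_logger()` at the top of the pre-process script would show whether the drop from ~502 GB to 10-30 GB coincides with a phase change in the program or with swapping.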


linwhitehat commented on July 30, 2024

  1. The second step generates the pre-training dataset. Its time cost depends on the size of the corpora; in our experiments it took around 1-2 hours. I have checked and updated the code, and suggest you replace uer/target/bert_target.py and uer/utils/data.py with the new files.
  2. I have encountered a similar situation. If the problem persists after updating the code files, let us know and we will share the generated pre-training dataset.
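Swapping in the updated files by hand is easy to get wrong across a large checkout, so a small helper that backs up the originals first can be safer. This is a sketch, not part of the repository; the `repo` and `updated` directory locations are placeholders for wherever you keep the checkout and the downloaded replacements:

```python
import shutil
from pathlib import Path

def apply_updates(repo, updated,
                  files=("uer/target/bert_target.py", "uer/utils/data.py")):
    """Back up each listed file in the checkout, then overwrite it with
    the updated copy (matched by file name inside `updated`)."""
    repo, updated = Path(repo), Path(updated)
    for rel in files:
        dst = repo / rel
        src = updated / Path(rel).name
        dst.parent.mkdir(parents=True, exist_ok=True)
        if dst.exists():
            # Keep the original next to the new file as <name>.bak.
            shutil.copy2(dst, Path(str(dst) + ".bak"))
        shutil.copy2(src, dst)
```

The `.bak` copies make it easy to diff the old and new versions if the behavior changes unexpectedly.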


lincgcg commented on July 30, 2024


Following your suggestion, we successfully completed the second step of the pre-process and obtained the dataset.pt file, which is about 28 GB.
Unfortunately, we ran into problems in the next task, and we have tried many approaches without success:

  • In the third step of the pre-process, we modified the file paths in datasets/main.py and got the following results:

[Screenshots of the step-3 error output: 2022-04-01 21:01:32 and 20:39:32]

We think the cause may be that there is no packet/splitcap/ directory under the path ../ET-BERT-main/datasets/cstnet-tls1.3/. If possible, please check whether the code expects this directory.
  • In pre-training, we got the following results:

[Screenshots of the pre-training error output: 2022-04-01 21:09:07, 21:09:49, 21:10:04]

We have no solution to this problem. Did you encounter it during your implementation?
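For the missing-directory suspicion above, a short helper can verify or create the expected layout before running the third step. The layout (packet/splitcap/ under the dataset folder) is inferred from the error described here, not from the repository documentation:

```python
from pathlib import Path

def ensure_splitcap_dir(dataset_root):
    """Create the packet/splitcap/ directory under the dataset folder
    (e.g. datasets/cstnet-tls1.3/) if it does not already exist."""
    splitcap = Path(dataset_root) / "packet" / "splitcap"
    splitcap.mkdir(parents=True, exist_ok=True)
    return splitcap
```

Note that creating an empty directory only removes the path error; if the scripts expect per-session pcap files produced by SplitCap inside it, those still have to be generated first.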

Looking forward to your reply!


linwhitehat commented on July 30, 2024


Thanks for your feedback. We have updated the code and the README to resolve these problems.


GuisengLiu commented on July 30, 2024

Can you tell us the rest of your software configuration? For example, which version of Python, and CUDA 10.2 or 11.1?
We ask because we have met some problems during the reproduction process, and we only noticed pytorch=1.8.
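When reporting a version mismatch like this, it helps to attach the exact interpreter and framework versions from the failing machine. A small sketch that degrades gracefully when PyTorch is not installed (the `torch` attributes used here are standard PyTorch API):

```python
import sys

def report_versions():
    """Gather the interpreter version plus, when PyTorch is installed,
    its version and the CUDA/cuDNN builds it was compiled against."""
    report = {"python": sys.version.split()[0]}
    try:
        import torch
        report["torch"] = torch.__version__
        report["cuda"] = torch.version.cuda  # None for CPU-only builds
        report["cudnn"] = torch.backends.cudnn.version()
    except ImportError:
        report["torch"] = None  # PyTorch not installed in this environment
    return report
```

Pasting the resulting dictionary into an issue lets maintainers spot CUDA/PyTorch mismatches immediately.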

