Error in code2seq preprocessing (CLOSED)

tech-srl commented on May 27, 2024

error of preprocess

from code2seq.

Comments (12)

lizhuo-1994 commented on May 27, 2024

It works! It seems that I had lost the *.jar file, and I have now restored it. Thanks for helping!

lizhuo-1994 commented on May 27, 2024

I slightly modified the java-large dataset, but I did not change anything in code2seq itself. Could you please help me with my issue? Thanks a lot!

urialon commented on May 27, 2024

Hi @lizhuo-1994 ,
Thank you for your interest in code2seq!

What is your Java version? Please run "java -version" (the double-dash form, "java --version", is only accepted by Java 9 and later).

urialon commented on May 27, 2024

Additionally - can you try to run the extractor directly, without the python wrapper:

java -cp JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar JavaExtractor.App --max_path_length 8 --max_path_width 2 --dir JavaExtractor/JPredict/src/main

lizhuo-1994 commented on May 27, 2024

$ java -version
openjdk version "1.8.0_252"
OpenJDK Runtime Environment (build 1.8.0_252-8u252-b09-1~16.04-b09)
OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)

$ java -cp JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar JavaExtractor.App --max_path_length 8 --max_path_width 2 --dir JavaExtractor/JPredict/src/main

Error: Could not find or load main class JavaExtractor.App

Here is my result. Thanks for helping!

urialon commented on May 27, 2024

Did you run this from the main code2seq directory? Does the jar file exist?
Can you please run:
ls -lt JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar
?

If the file exists, then please run:

jar tvf JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar | grep JavaExtractor
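The same two checks (does the jar exist, and does it contain the main class) can also be scripted. The following is a minimal sketch, not part of code2seq: the helper name jar_has_class is hypothetical, and it relies only on the fact that a jar is a zip archive in which a class such as JavaExtractor.App is stored as JavaExtractor/App.class.

```python
import zipfile
from pathlib import Path


def jar_has_class(jar_path, class_name):
    """Return True if jar_path exists and contains the compiled class_name.

    Hypothetical helper for illustration; it is not part of code2seq.
    """
    path = Path(jar_path)
    if not path.is_file():
        # Equivalent to the "ls" check: the jar is missing altogether.
        return False
    # A class such as JavaExtractor.App is stored as JavaExtractor/App.class.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(path) as jar:
        # Equivalent to "jar tvf ... | grep": look for the entry in the listing.
        return entry in jar.namelist()


if __name__ == "__main__":
    jar = "JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar"
    print(jar_has_class(jar, "JavaExtractor.App"))
```

If this prints False when run from the main code2seq directory, the "Could not find or load main class" error is expected, and the jar needs to be restored or rebuilt.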

lizhuo-1994 commented on May 27, 2024

But here is another problem:

Extracting paths from validation set...
Finished extracting paths from validation set
Extracting paths from test set...
Finished extracting paths from test set
Extracting paths from training set...
dir: data/train was not completed in time
Finished extracting paths from training set
Creating histograms from the training data
subtoken vocab size: 0
node vocab size: 0
target vocab size: 0
File: 1.test.raw.txt
Traceback (most recent call last):
  File "preprocess.py", line 115, in <module>
    max_contexts=int(args.max_contexts), max_data_contexts=int(args.max_data_contexts))
  File "preprocess.py", line 53, in process_file
    print('Average total contexts: ' + str(float(sum_total) / total))
ZeroDivisionError: float division by zero
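The ZeroDivisionError means that total was 0 when the average was computed, i.e. no examples were counted in the file being processed, which is consistent with the vocabulary sizes of 0 reported just before it. As a minimal sketch (the helper average_total_contexts is hypothetical and is not the code in preprocess.py), a guard like the following would surface the empty input with a clearer message:

```python
def average_total_contexts(sum_total, total, file_name):
    """Return the mean number of contexts per example.

    Hypothetical helper for illustration; preprocess.py computes the
    average inline. Raises ValueError instead of dividing by zero when
    the file held no examples.
    """
    if total == 0:
        raise ValueError(
            file_name + " contains no examples; "
            "check that path extraction finished before preprocessing"
        )
    return float(sum_total) / total
```

For instance, average_total_contexts(5, 2, "example.raw.txt") returns 2.5, while a total of 0 raises the ValueError naming the empty file.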

lizhuo-1994 commented on May 27, 2024

Maybe it is because of the timeout. I will try it again, thanks!

urialon commented on May 27, 2024

Yes, there are timeouts. We originally used a 64-core machine to preprocess the datasets, so using a smaller machine might trigger them.
The exact timeout is defined here:
https://github.com/tech-srl/code2seq/blob/master/JavaExtractor/extract.py#L37

By default, 6 processes run in parallel (see: https://github.com/tech-srl/code2seq/blob/master/JavaExtractor/extract.py#L66), and each of them runs with 64 threads (see: https://github.com/tech-srl/code2seq/blob/master/preprocess.sh#L32).

To verify that preprocessing works on a small dataset, you can try preprocessing the JavaExtractor itself, i.e., point the training, test, and validation paths to JavaExtractor/JPredict/src/ and check that it finishes successfully within a few seconds or so.
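To illustrate why a slow machine ends up with empty output, here is a minimal sketch of the general "run a command with a time limit" pattern using Python's standard subprocess module. It is not the code in extract.py: the function name run_with_timeout and the messages are assumptions made for the example.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run command; return its stdout, or None if it did not finish in time.

    Illustrative sketch only; extract.py has its own implementation.
    """
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_seconds,
            check=False,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired,
        # so whatever it had produced so far is discarded here.
        print("command was not completed in time", file=sys.stderr)
        return None
    return result.stdout.decode()


if __name__ == "__main__":
    # A fast command finishes well within the limit.
    print(run_with_timeout([sys.executable, "-c", "print('ok')"], 30))
    # A slow command is cut off and yields no output.
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 0.5))
```

With this pattern, a directory that exceeds the limit contributes nothing to the output, so either the limit has to be raised or the work per process reduced when running on a smaller machine.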

lizhuo-1994 commented on May 27, 2024

Thanks for the explanation. I reconfigured it, and now it seems to be working well.

BTW, it is really disk- and time-consuming, so I think preprocessing will run for about 2-3 days.

urialon commented on May 27, 2024

Unfortunately, that's right.
The preprocessing pipeline was designed to process millions of examples, and it is disk- and time-consuming.

I'm closing this issue for now. Feel free to re-open it if you have any additional questions.

walt676 commented on May 27, 2024

"Thanks for the explanation. I reconfigured it, and now it seems to be working well.

BTW, it is really disk- and time-consuming, so I think preprocessing will run for about 2-3 days."

Hello, may I ask what the specific configuration of your machine is and which parameters you ended up using?

Thanks a lot!
