Error in preprocessing (code2seq) [CLOSED]

tech-srl commented on May 27, 2024
Error in preprocessing

from code2seq.

Comments (12)

lizhuo-1994 commented on May 27, 2024

It works! It seems I had lost the *.jar file, and I have now found it again. Thanks for helping!

lizhuo-1994 commented on May 27, 2024

I made small modifications to the java-large dataset, but I did not change anything in code2seq itself. Could you please help me with my issue? Thanks a lot!

urialon commented on May 27, 2024

Hi @lizhuo-1994,
Thank you for your interest in code2seq!

What is your Java version? Please run "java -version".

urialon commented on May 27, 2024

Additionally, can you try to run the extractor directly, without the Python wrapper:

java -cp JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar JavaExtractor.App --max_path_length 8 --max_path_width 2 --dir JavaExtractor/JPredict/src/main

lizhuo-1994 commented on May 27, 2024

$ java -version
openjdk version "1.8.0_252"
OpenJDK Runtime Environment (build 1.8.0_252-8u252-b09-1~16.04-b09)
OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)

$ java -cp JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar JavaExtractor.App --max_path_length 8 --max_path_width 2 --dir JavaExtractor/JPredict/src/main

Error: Could not find or load main class JavaExtractor.App

Here is my result. Thanks for helping!

urialon commented on May 27, 2024

Did you run this from the main code2seq directory? Does the jar file exist?
Can you please run:

ls -lt JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar

If the file exists, then please run:

jar tvf JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar | grep JavaExtractor
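
The two checks above (does the jar exist, and does it contain the JavaExtractor classes) can also be scripted. This is a minimal sketch, assuming the jar path from the commands in this thread and assuming the main class is stored as JavaExtractor/App.class inside the archive. The helper name `jar_has_class` is hypothetical and is not part of code2seq:

```python
import os
import zipfile


def jar_has_class(jar_path, class_name):
    """Return True if jar_path exists and contains the compiled class_name.

    A jar is a zip archive, so a class such as JavaExtractor.App is expected
    to be stored under the entry JavaExtractor/App.class.
    """
    if not os.path.isfile(jar_path):
        return False
    entry = class_name.replace(".", "/") + ".class"
    try:
        with zipfile.ZipFile(jar_path) as jar:
            return entry in jar.namelist()
    except zipfile.BadZipFile:
        # The file exists but is not a readable zip/jar archive.
        return False


if __name__ == "__main__":
    # Path taken from the commands in this thread; run from the code2seq root.
    jar = "JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar"
    print(jar_has_class(jar, "JavaExtractor.App"))
```

If this prints False, the "Could not find or load main class" error is most likely caused by a missing or incomplete jar rather than by the Java version.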


lizhuo-1994 commented on May 27, 2024

But now there is another problem:

Extracting paths from validation set...
Finished extracting paths from validation set
Extracting paths from test set...
Finished extracting paths from test set
Extracting paths from training set...
dir: data/train was not completed in time
Finished extracting paths from training set
Creating histograms from the training data
subtoken vocab size: 0
node vocab size: 0
target vocab size: 0
File: 1.test.raw.txt
Traceback (most recent call last):
  File "preprocess.py", line 115, in <module>
    max_contexts=int(args.max_contexts), max_data_contexts=int(args.max_data_contexts))
  File "preprocess.py", line 53, in process_file
    print('Average total contexts: ' + str(float(sum_total) / total))
ZeroDivisionError: float division by zero
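
The ZeroDivisionError above means `total` was 0 when the average was printed, which suggests that no examples were counted (the vocabulary sizes in the log are also 0). This is a minimal sketch of a guard, assuming the variable names shown in the traceback (`sum_total`, `total`). The function `average_contexts` is hypothetical and is not the actual code in preprocess.py:

```python
def average_contexts(sum_total, total):
    """Return the mean number of contexts per example, or None if there are no examples."""
    if total == 0:
        return None
    return float(sum_total) / total


# Example with the situation from the log: nothing was read, so total is 0.
avg = average_contexts(0, 0)
if avg is None:
    print("No examples were read; check that path extraction finished before preprocessing.")
else:
    print("Average total contexts: " + str(avg))
```

A guard like this only makes the failure easier to read; the underlying problem is still the empty extraction output.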


lizhuo-1994 commented on May 27, 2024

Maybe it is because of the timeout. I will try again, thanks!

urialon commented on May 27, 2024

Yes, there are timeouts, and we originally used a 64-core machine to preprocess the datasets,
so using a smaller machine might trigger timeouts.
The exact time limit is defined here:
https://github.com/tech-srl/code2seq/blob/master/JavaExtractor/extract.py#L37

By default, 6 processes run in parallel (see: https://github.com/tech-srl/code2seq/blob/master/JavaExtractor/extract.py#L66), and each of them runs with 64 threads (see: https://github.com/tech-srl/code2seq/blob/master/preprocess.sh#L32).

To verify that preprocessing runs on a small dataset, you can try preprocessing the JavaExtractor itself. I.e., point the training+test+validation paths to JavaExtractor/JPredict/src/ and verify that it runs successfully within a few seconds or so.
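
To illustrate how a per-job time limit can lead to a message like "was not completed in time", here is a minimal sketch using the standard `subprocess` timeout. It is not the code in extract.py; the function `run_with_timeout`, the stand-in job, and the timeout values are assumptions for illustration only:

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run command; return True if it finished within timeout_seconds, else False."""
    try:
        subprocess.run(command, timeout=timeout_seconds, check=False)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired.
        return False


if __name__ == "__main__":
    # A stand-in for a slow extraction job: it just sleeps for 2 seconds.
    slow_job = [sys.executable, "-c", "import time; time.sleep(2)"]
    if not run_with_timeout(slow_job, timeout_seconds=0.5):
        print("job was not completed in time")
```

On a machine with fewer cores than the original 64, the same amount of work takes longer per job, so a fixed time limit is more likely to be hit.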


lizhuo-1994 commented on May 27, 2024

Thanks for the explanation. I reconfigured it, and now it seems to be working well.

BTW, preprocessing is really disk- and time-consuming; I think it will run for about 2-3 days.

urialon commented on May 27, 2024

Unfortunately, that's right.
The preprocessing pipeline was designed to process millions of examples, and it is disk- and time-consuming.

I'm closing this issue for now. Feel free to re-open it if you have any additional questions.

walt676 commented on May 27, 2024

> Thanks for the explanation. I reconfigured it, and now it seems to be working well.
>
> BTW, preprocessing is really disk- and time-consuming; I think it will run for about 2-3 days.

Hello, may I ask what the specific configuration of your machine was, and which parameters you ended up using?

Thanks a lot!
