Hi, thanks for your code, I used your to create a new data set. I first want to

Errors when creating new dataset. (code2seq issue, closed)

tech-srl commented on May 27, 2024

Errors when creating new dataset.

from code2seq.

Comments (6)

JiyangZhang commented on May 27, 2024

Sweet! I found that I had dumped max_contexts rather than max_data_contexts into the dictionary file. Now the data is read successfully.

Thank you very much.
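
For readers hitting the same problem, here is a minimal sketch of the fix described above. It assumes the dictionary file is a sequence of five pickled objects, in the order the loader snippet later in this thread reads them. The helper names dump_dictionaries and load_max_contexts and the toy histograms are hypothetical, not taken from preprocess.py.

```python
import io
import pickle


def dump_dictionaries(file, subtoken_to_count, node_to_count, target_to_count,
                      max_data_contexts, num_training_examples):
    """Pickle five objects in the order the loader is assumed to read them."""
    # The fourth object is what the loader reads back as max_contexts, so it
    # should be the number of contexts the data file was padded to
    # (max_data_contexts), not the number sampled per example at training time.
    for obj in (subtoken_to_count, node_to_count, target_to_count,
                max_data_contexts, num_training_examples):
        pickle.dump(obj, file)


def load_max_contexts(file):
    """Skip the three histograms and return the fourth pickled object."""
    for _ in range(3):
        pickle.load(file)
    return pickle.load(file)


# Round trip through an in-memory buffer with toy (made-up) histograms.
buffer = io.BytesIO()
dump_dictionaries(buffer, {"get": 3}, {"MethodCall": 2}, {"name": 1},
                  max_data_contexts=1000, num_training_examples=10)
buffer.seek(0)
stored_max_contexts = load_max_contexts(buffer)
print("max_contexts stored in dict:", stored_max_contexts)
```

If the value read back differs from the number of contexts each line of the data file is padded to, the reader will expect the wrong number of fields.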

JiyangZhang commented on May 27, 2024

I'm just curious about what causes the error.

urialon commented on May 27, 2024

Hi @JiyangZhang,
Thank you for your interest in code2seq!

I am aware that sometimes the reader tries to read a batch, and 10 examples are not even enough for a single batch.

But I think that in your case, the error is not related to the number of examples at all, but to the number of paths ("contexts") in each example. The reader expects 201 fields: 1 target sequence and 200 contexts. Did you preprocess your data with values other than the defaults for MAX_CONTEXTS and MAX_DATA_CONTEXTS?
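
One way to check this is to count the fields on each line of the data file. The sketch below assumes that each line holds the target followed by its contexts, separated by single spaces, with padding stored as empty fields. The function name count_fields and the sample line are hypothetical.

```python
from collections import Counter


def count_fields(lines, delimiter=" "):
    """Map 'number of fields on a line' to 'number of lines with that count'."""
    counts = Counter()
    for line in lines:
        # Strip only the newline; trailing delimiters are padding (empty
        # fields) and must be counted.
        counts[len(line.rstrip("\n").split(delimiter))] += 1
    return counts


# Toy line: 1 target, 2 real contexts, 2 empty padding fields = 5 fields.
sample_lines = ["get|name a,p1,b c,p2,d  \n"]
field_counts = count_fields(sample_lines)
print(field_counts)

# On a real file (path is a placeholder):
# with open("data.train.c2s") as f:
#     print(count_fields(f))
```

Under these assumptions, a file padded to 1000 contexts would show 1001 fields per line, while a reader configured for 200 contexts would expect 201.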

JiyangZhang commented on May 27, 2024

Hi @urialon,

Thanks for the reply, that makes sense. But I used the default values: MAX_CONTEXTS is 200 and MAX_DATA_CONTEXTS is 1000. I checked my 'data.train.c2s', and every example is padded to 1000 contexts. I am not sure what the reason is.

I still think the cause is the dict, because everything works whenever I use your dict. I create my dict by simply dumping the three histograms and the numbers (the same way preprocess.py does).

Thank you very much!

urialon commented on May 27, 2024

Let's compare the dict files.

Please run the following code separately on the two dictionary files, changing DICT_FILE_PATH every time:

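import pickle

# Placeholder: set this to each of the two dictionary files in turn.
DICT_FILE_PATH = 'path/to/dict/file'

with open(DICT_FILE_PATH, 'rb') as file:
    # Objects must be loaded in the same order they were dumped.
    subtoken_to_count = pickle.load(file)
    node_to_count = pickle.load(file)
    target_to_count = pickle.load(file)
    max_contexts = pickle.load(file)
    num_training_examples = pickle.load(file)
    print('Dictionaries loaded.')
    print('max_contexts:', max_contexts)

What is max_contexts in each case?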

urialon commented on May 27, 2024

Also note that self.config.DATA_NUM_CONTEXTS needs to be 0 so that the value of max_contexts from the dict file is loaded:
https://github.com/tech-srl/code2seq/blob/master/model.py#L42
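
The behaviour described above can be illustrated with a small sketch. It is not the code in model.py: the function resolve_data_num_contexts is hypothetical, and it assumes that a configured value of 0 means "use the value stored in the dict file".

```python
def resolve_data_num_contexts(configured_value, dict_max_contexts):
    """Pick the number of contexts the reader should expect per example.

    Assumption: 0 in the config means 'not set', so the value stored in
    the dictionary file is used; any other value overrides the dict.
    """
    if configured_value == 0:
        return dict_max_contexts
    return configured_value


# With the config left at 0, the dict value wins.
from_dict = resolve_data_num_contexts(0, 1000)
# With a non-zero config value, the dict value is ignored.
from_config = resolve_data_num_contexts(200, 1000)
print(from_dict, from_config)
```

Under this assumption, a non-zero config value that disagrees with the padding in the data file would produce the same field-count mismatch as a wrong value in the dict.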
