Hello Victor. I would like to thank you first for your contribution.

Training data (aggregate_paraphrase_corpus_0) about paraphraser HOT 6 OPEN

vsuthichai commented on June 16, 2024 9

Training data (aggregate_paraphrase_corpus_0)

from paraphraser.

Comments (6)

jasonray716 commented on June 16, 2024 7

I need this training data as well.
Could you share the download link, or explain how to create a dataset in this format?


LBartolini commented on June 16, 2024 4

Hi, I know some time has already passed since you asked for these files.
I'm not @vsuthichai, but I think I understand how to generate the training data.
First, download the data from the internet (just search for para-nmt-50m-demo).
Next, run the file "preprocess_data.py", passing as a parameter the downloaded file called "para-nmt-50m-small.txt".
This will create a bunch of files called "para-nmt-50m-small.txt + ".
Finally, create the sentence embeddings (I still need to find out how to do this) and correct all the import strings wherever these files are used in the code.

You should then be able to train your model. Make sure that the dataset you use is formatted like so: "Source sentence" + "\t" + "final sentence".
I now need to translate the whole dataset into Italian and try to train in Italian...
Wish me luck.
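For anyone checking their own file against the format described above, here is a minimal Python sketch. It is not taken from the paraphraser repository: the function names `parse_pair`, `read_pairs`, and `format_pair` are hypothetical, and it assumes each line holds exactly two tab-separated fields (source sentence, then final sentence). Lines with any other number of fields are skipped.

```python
from typing import Iterator, Optional, Tuple


def parse_pair(line: str) -> Optional[Tuple[str, str]]:
    """Return (source, paraphrase), or None if the line is not a valid pair.

    Assumption: exactly two tab-separated, non-empty fields per line.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 2:
        return None
    source, paraphrase = (field.strip() for field in fields)
    if not source or not paraphrase:
        return None
    return source, paraphrase


def read_pairs(path: str) -> Iterator[Tuple[str, str]]:
    """Yield every well-formed sentence pair from a tab-separated file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            pair = parse_pair(line)
            if pair is not None:
                yield pair


def format_pair(source: str, paraphrase: str) -> str:
    """Build one dataset line: source + "\\t" + paraphrase + newline.

    Whitespace runs (including stray tabs or newlines inside a sentence)
    are collapsed to single spaces so they cannot break the format.
    """
    clean = [" ".join(sentence.split()) for sentence in (source, paraphrase)]
    return "\t".join(clean) + "\n"
```

A translated dataset could be written back out by calling `format_pair` on each translated pair, then re-read with `read_pairs` to confirm that no lines were dropped.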


SeekPoint commented on June 16, 2024

How can I get the training dataset?


tim5go commented on June 16, 2024

@vsuthichai
I would like to have the training data as well. Would it be possible to share it with me privately?


kay312 commented on June 16, 2024

@LBartolini Thanks for your comment. Have you succeeded? I'm doing something similar: translating the data into Chinese.


LBartolini commented on June 16, 2024

Hi, I didn't really succeed. I tried to take the training data and translate it into Italian. The problem is that the translations weren't good and the training dataset wasn't big enough (maybe because I only used para-nmt, whereas the author of the repository used a bunch of datasets). I tried to train anyway, but I didn't get good results.

