Comments (6)
I need this training data as well.
Could you share the download link or explain how to create a dataset in this format?
from paraphraser.
Hi, I know some time has passed since you asked for these files.
I'm not @vsuthichai, but I think I understand how to generate the training data.
First, you need to download the data from the internet (just search for para-nmt-50m-demo).
Next, run the file "preprocess_data.py", passing as a parameter the file you downloaded, called "para-nmt-50m-small.txt".
This will create a bunch of files called "para-nmt-50m-small.txt + ".
The last thing you need to do is create the sentence embeddings (I still need to find out how to do that) and correct all the import strings where these files are used in the code.
Finally, you should be able to train your model. Make sure that the dataset you use is formatted like so: "Source sentence" + "\t" + "final sentence".
I now need to translate the whole dataset to Italian and try to train in Italian...
Wish me luck
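For anyone who wants to check a file against the tab-separated format described above before training, here is a minimal sketch. The names `parse_pairs` and `read_pairs` are made up for illustration and are not part of the repository; the sketch also assumes UTF-8 text and simply ignores any columns after the second one.

```python
from typing import Iterable, Iterator, Tuple


def parse_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (source, paraphrase) tuples from tab-separated lines.

    Hypothetical helper: lines without a tab, or with an empty
    sentence on either side, are skipped rather than raising.
    """
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2:
            continue  # no tab on this line, so it is not a sentence pair
        source, paraphrase = fields[0].strip(), fields[1].strip()
        if source and paraphrase:
            yield source, paraphrase


def read_pairs(path: str) -> Iterator[Tuple[str, str]]:
    """Read sentence pairs from a file (assumed to be UTF-8 encoded)."""
    with open(path, encoding="utf-8") as handle:
        yield from parse_pairs(handle)


if __name__ == "__main__":
    sample = [
        "How are you?\tHow are you doing?\n",
        "broken line without a tab\n",
    ]
    for source, paraphrase in parse_pairs(sample):
        print(f"{source} -> {paraphrase}")
```

Counting how many lines survive `parse_pairs` compared with the raw line count is a quick way to spot a file that was saved with the wrong separator.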
from paraphraser.
How do I get the training dataset?
from paraphraser.
@vsuthichai
I would like to have the training data as well. Is it possible to share it with me privately?
from paraphraser.
Thanks for your comment. Have you succeeded? I'm doing a similar thing, translating these to Chinese.
from paraphraser.
Hi, I didn't really succeed. I tried to use the training data and translate it into Italian. The thing is that the translations weren't good and the training dataset wasn't big enough (maybe because I only used the para-nmt data, whereas the author of the repository used a bunch of datasets). I tried to train anyway, but I didn't get good results.
from paraphraser.