I am trying to work on my own data in a txt file the source and target sentences are s

Custom Text Dataset about pytorch-seq2seq HOT 6 CLOSED

moodhiaj commented on May 24, 2024

Custom Text Dataset

from pytorch-seq2seq.

Comments (6)

moodhiaj commented on May 24, 2024

If someone is looking for the answer, here is what I did, and it worked for me:
`import torch
# torchtext.legacy only exists in torchtext 0.9 through 0.11
from torchtext.legacy.data import Field, BucketIterator, TabularDataset

# Tokenize by splitting on single spaces
tokenize = lambda x: x.split(' ')
SRC = Field(tokenize=tokenize)
TRG = Field(tokenize=tokenize)

# Map each CSV column header to (batch attribute name, Field)
fields = {'Source': ('src', SRC), 'Target': ('trg', TRG)}

train_data, valid_data, test_data = TabularDataset.splits(
    path='',
    train='My_train_Set.csv',
    test='My_test_set.csv',
    validation='My_Validation_Set.csv',
    format='csv',
    fields=fields)

# Build vocabularies from the training set only; drop tokens seen fewer than 2 times
SRC.build_vocab(train_data, min_freq=2)
TRG.build_vocab(train_data, min_freq=2)

BATCH_SIZE = 128

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Bucket examples of similar source length together to reduce padding
train_iterator, valid_iterator, test_iterator = BucketIterator.splits(
    (train_data, valid_data, test_data),
    batch_size=BATCH_SIZE,
    sort_within_batch=True,
    sort_key=lambda x: len(x.src),
    device=device)`
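
To make the `tokenize` and `min_freq=2` settings above more concrete, here is a rough pure-Python sketch of the idea: split on single spaces, count tokens, and keep only those seen at least twice, with everything else falling back to an unknown token. This is an illustration only, not torchtext's implementation; the helper name `build_vocab_sketch`, the sample sentences, and the index ordering are assumptions made for this example.

```python
from collections import Counter

def tokenize(text):
    # Same rule as the lambda in the snippet above: split on single spaces.
    return text.split(' ')

def build_vocab_sketch(sentences, min_freq=2, specials=("<unk>", "<pad>")):
    """Hypothetical helper: keep tokens that occur at least min_freq times."""
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    itos = list(specials)
    # Ordering (most frequent first, ties alphabetical) is this sketch's own choice.
    for tok, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq:
            itos.append(tok)
    stoi = {tok: i for i, tok in enumerate(itos)}
    return itos, stoi

sentences = ["the cat sat", "the dog sat", "a bird flew"]
itos, stoi = build_vocab_sketch(sentences)

# Tokens below the frequency threshold map to the <unk> index.
encoded = [stoi.get(tok, stoi["<unk>"]) for tok in tokenize("the bird sat")]
```

With a small dataset, `min_freq=2` can push many words to `<unk>`, so lowering it to 1 may be worth trying if the vocabulary looks too small.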


wusuhuang commented on May 24, 2024

I also want to ask this question!


wusuhuang commented on May 24, 2024

Thank you very much. I would also like to know how to store the source and target sentences in a CSV file. They are paired sentences.


moodhiaj commented on May 24, 2024

I don't know how your data is structured, but mine was originally in Excel files, so I had no problems converting it to CSV.
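
For paired sentences that are not already in a spreadsheet, a minimal sketch using Python's standard `csv` module might look like the following. The `Source` and `Target` headers match the keys of the `fields` dict in the snippet above; the function name `write_pairs_csv` and the sample sentences are made up for illustration.

```python
import csv
import io

def write_pairs_csv(pairs, out_file):
    """Hypothetical helper: write (source, target) pairs under a Source,Target header."""
    writer = csv.writer(out_file)
    writer.writerow(["Source", "Target"])
    for src, trg in pairs:
        # csv.writer quotes fields that contain commas, so sentences stay intact.
        writer.writerow([src.strip(), trg.strip()])

# Illustrative sentence pairs; real data would come from your own files.
pairs = [
    ("hello , world", "bonjour , le monde"),
    ("how are you ?", "comment vas-tu ?"),
]

# Written to an in-memory buffer here; pass an open file object to write to disk.
buffer = io.StringIO()
write_pairs_csv(pairs, buffer)
csv_text = buffer.getvalue()
```

To write a real file, open it with `open('My_train_Set.csv', 'w', newline='', encoding='utf-8')` and pass the file object in place of `buffer`. One row per sentence pair, with the header in the first row, is the layout the `fields` dict expects.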


wusuhuang commented on May 24, 2024

Can you tell me how to create your own data in CSV format?


tuzeao commented on May 24, 2024

Thanks for this great solution.
Using a model with a custom dataset is always a tedious and irritating problem.
