
[nmt-2.0] about gcm (OPEN)

natalymr commented on June 24, 2024
[nmt-2.0]


Comments (8)

natalymr commented on June 24, 2024
  • add eval
  • add BLEU to eval
  • clean up the code
  • run on the machine
  • if it does not train, send it to Sasha for review
  • 100 epochs
  • 200 epochs
  • a different lr
  • run the baselines
  • read up on how to interpret the wandb plots
  • read the article on the BLEU score
  • build the dataset with 200 tokens
  • add dropout/batch_normalization


natalymr commented on June 24, 2024

h = 2, 1000 epochs, 10 examples, same train/test (screenshot)
h = 2, 1000 epochs, 10 examples, different train/test (screenshot)
h = 256, 1000 epochs, 10 examples, same train/test (screenshot)
h = 256, 10 epochs, full dataset, different train/test (screenshot)
h = 256, 50 epochs, full dataset, different train/test (screenshot)
h = 256, 30 epochs, full dataset, different lr, different train/test (screenshot)
h = 256, 30 epochs, 2000 dataset, lr = 0.001, different train/test (screenshot)
h = 256, 30 epochs, 2000 dataset, lr = 0.0001, different train/test (screenshot)
h = 256, 30 epochs, 100 dataset, lr = 0.0001, different train/test (screenshot)


natalymr commented on June 24, 2024

Runs on the Mac

500 epochs, dataset: 100 train != 100 test, lr = 0.05, step_size=150, gamma=0.1, bs=10*10, test every 10, FIXED grad acc - seemingly no different from the previous run, yet the scores are completely different; BEFORE BIDIRECTIONAL, added clip_grad

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/1s09khnv

500 epochs, dataset: 100 train != 100 test, lr = 0.05, step_size=150, gamma=0.1, bs=10*10, test every 10, FIXED grad acc - seemingly no different from the previous run, yet the scores are completely different; BEFORE BIDIRECTIONAL

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/35uz9xt0

500 epochs, dataset: 100 train != 100 test, lr = 0.1, step_size=100, gamma=0.1, bs=10*10, test every 10, FIXED grad acc

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/cnqca8ml

500 epochs, dataset: 100 train != 100 test, lr = 0.1, step_size=100, gamma=0.1, bs=10*10, test every 10

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/1o31kd14

added grad acc (bs = 1, optimizer step every 5 steps)

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/uaoy1i2r

just the previous run again

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/2iarf6cm

h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with coefficient 0.5, lr starts at 0.001 and is multiplied by 0.2 after 500 epochs, added pack_padded

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ypzmdhj4

h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with coefficient 0.5, lr starts at 0.001 and is multiplied by 0.2 after 500 epochs, removed dropout, clip_grad

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/2bhi9b3z

h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with coefficient 0.5, lr starts at 0.001 and is multiplied by 0.2 after 500 epochs, clip_grad(0.25)

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/3nkuv908

h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with coefficient 0.5, lr starts at 0.001 and is multiplied by 0.2 after 500 epochs, added dropout on the LSTM in the decoder, clip_grad(0.25)

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/3v4h52mv?workspace=user-natalymr

h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with coefficient 0.5, lr starts at 0.001 and is multiplied by 10 after 500 epochs

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ba9k4hxu?workspace=user-natalymr

h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with coefficient 0.5, lr is multiplied by 10 every 500 epochs

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/1612jgvi

h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with coefficient 0.5, lr is divided by 10 every 500 epochs

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/2kdz2ntc

h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with coefficient 0.5, lr is divided by 10 every 100 epochs

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/2ar6y7nu

h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with coefficient 0.5

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/cp5vlq38

h = 2, init = xavier_uniform_, 10 dataset, train=test, added dropout everywhere with coefficient 0.5

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/3msbqrk2

h = 2, init = xavier_normal_, 10 dataset, train=test

here you can see that

h = 2, init = normal, 10 dataset, train=test

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/3nxm5dk6?workspace=user-natalymr

h = 2, no init, 10 dataset, train=test

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/2bjp9eqj?workspace=user-natalymr

Runs on the machine

1000 epochs, 100 dataset, train != test, lr = 0.0001

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/xc58t4gg?workspace=user-natalymr

200 epochs, init = xavier_normal_, 100 dataset, train != test, lr = 0.01

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/zx11267c

2000 epochs, init = xavier_normal_, 100 dataset, train != test, lr = 0.01

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/b12hg64j

200 epochs, init = xavier_normal_, 100 dataset, train != test, lr decayed every 50 epochs

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/4rx2i84m

200 epochs, init = xavier_normal_, 100 dataset, train != test, lr decayed every 500 epochs

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ttyv6fs2

1000 epochs, init = xavier_normal_, 100 dataset, train != test, lr decayed every 500 epochs, batch size = 100 instead of 10; hid_size 400 instead of 300

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/mpr4zj4q

2000 epochs, init = xavier_normal_, 100 dataset, train != test, lr decayed every 500 epochs, batch size = 100 instead of 10; hid_size 400 instead of 300

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ymhus29z

2000 epochs, init = xavier_normal_, 100 dataset, train != test, lr NOT decayed (0.001), batch size = 100 instead of 10; hid_size 400 instead of 300

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ej4s0x8t

2000 epochs, init = xavier_normal_, 100 dataset, train != test, lr INCREASED by a factor of 5 every 500 epochs (starting at 0.001), batch size = 100 instead of 10; hid_size 400 instead of 300

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/wy8r2kfu

2000 epochs, init = xavier_normal_, 100 dataset, train != test, lr decayed by a factor of 2 (instead of 10) every 500 epochs, starting at 0.001, batch size = 100 instead of 10; hid_size 400 instead of 300

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/5gnbvy1d

2000 epochs, init = xavier_normal_, 100 dataset, train != test, lr not decayed (0.001), batch size = 100 instead of 10; hid_size 400 instead of 300, added DROPOUT (encoder: embed, lstm; decoder: embed)

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/dk5czbor

200 epochs, init = xavier_normal_, 2000 dataset, train != test, lr not decayed (0.001), batch size = 100 instead of 10; hid_size 400 instead of 300, added DROPOUT (encoder: embed, lstm; decoder: embed)

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/hysoott0

200 epochs, init = xavier_normal_, 2000 dataset, train != test, batch size = 130 instead of 10; hid_size 400 instead of 300, added DROPOUT (encoder: embed, lstm; decoder: embed), scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/513ukeok

200 epochs, init = xavier_normal_, 2000 dataset, train != test, lr = 0.1, batch size = 130 instead of 10; hid_size 400 instead of 300, added DROPOUT 0.2 (encoder: embed, lstm; decoder: embed), scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/zyvjmits

200 epochs, init = xavier_normal_, 2000 dataset, train != test, lr = 0.1, batch size = 130 instead of 10; hid_size 400 instead of 300, added DROPOUT 0.2 (encoder: embed, lstm; decoder: embed), scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1), sort dataset = True

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/r1ik0206?workspace=user-natalymr

200 epochs, init = xavier_normal_, 2000 dataset, train != test, lr = 0.1, batch size = 130 instead of 10; hid_size 400 instead of 300, added DROPOUT 0.2 (encoder: embed, lstm; decoder: embed), scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1), sort dataset = True, SKIP PADDING

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/anwy0t7z

200 epochs, init = xavier_normal_, 2000 dataset, train != test, lr = 0.1, batch size = 130 instead of 10; hid_size 400 instead of 300, added DROPOUT 0.2 (encoder: embed, lstm; decoder: embed), lr NOT decayed, sort dataset = True, skip padding

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/nobbnkk3

59 epochs, init = xavier_normal_, 2000 dataset, train != test, lr = 0.01, batch size = 130 instead of 10; hid_size 400 instead of 300, added DROPOUT 0.2 (encoder: embed, lstm; decoder: embed), lr not decayed, sort dataset = FALSE, shuffle=TRUE, skip padding

it started generating <sos> and <eos> 😡

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/bdtttitj

200 epochs, init = xavier_normal_, 2000 dataset, train != test, lr = 0.1, batch size = 130 instead of 10; hid_size 400 instead of 300, added DROPOUT 0.2 (encoder: embed, lstm; decoder: embed), lr not decayed, sort dataset = TRUE, shuffle=TRUE, skip padding

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/nlz2wr98

200 epochs, init = xavier_normal_, 2000 dataset, train != test, hid_size 256 instead of 400, added dropout 0.2 (encoder: embed, lstm; decoder: embed), lr not decayed, sort dataset = true, shuffle=true, skip padding; GRAD ACC (bs=100*2) & lr = 0.1, step_size=100, gamma=0.1 & test_every 2

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/45tyzit5

30 epochs, init = xavier_normal_, 2000 dataset, train != test, hid_size 256 instead of 400, added dropout 0.2 (encoder: embed, lstm; decoder: embed), lr not decayed, sort dataset = true, shuffle=true, skip padding; GRAD ACC (bs=100*5(!!!)) & lr = 0.1, step_size=100, gamma=0.1 & test_every 5(!) - my mistake: implemented GRAD ACC incorrectly

generates the same output for almost everything + repeats the same word within a single message

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/q5okz6q1

100 epochs, init = xavier_normal_, 2000 dataset, train != test, hid_size 256 instead of 400, added dropout 0.2 (encoder: embed, lstm; decoder: embed), lr not decayed, sort dataset = true, shuffle=true, skip padding; GRAD ACC (bs=50*5(!!!)) & lr = 0.1, step_size=50, gamma=0.1 & test_every 1 - my mistake again: implemented GRAD ACC incorrectly

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/s7hwma0d

2*100 seems to have worked out better

100 epochs, full dataset, Total number of params = 16167398, src vocab = 26774, tgt vocab = 13795, bs=100*2 & lr = 0.1, step_size=50, gamma=0.1 & test_every 1

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/99lrdx51

100 epochs, full dataset, Total number of params = 16133862, src vocab = 26643, tgt vocab = 13795, bs=100*5 & lr = 0.1, step_size=50, gamma=1 & test_every 1

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/fhva9pab

100 epochs, full dataset, Total number of params = 26839398, src vocab = 26643, tgt vocab = 13795, hid_size 400 instead of 256, bs=50*5(!) & lr = 0.1, step_size=50, gamma=1 & test_every 1

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/4nc1j7lj

100 epochs, full dataset, Total number of params = 19172998, src vocab = 26643, tgt vocab = 13795, hid_size 300 instead of 400, bs=50*10(!) & lr = 0.1, step_size=50, gamma=1 & test_every 1

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/d9gjkoer

100 epochs, full dataset, Total number of params = 19172998, src vocab = 26643, tgt vocab = 13795, hid_size 300 instead of 400, bs=50*10(!) & lr = 0.1, step_size=50, gamma=1 & test_every 1, added clip_grad(1)

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/625igudj

MY MISTAKE - implemented BLEU incorrectly. New BLEU: add 1000 examples every epoch once test_bleu > 1.5

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/z2vjlp74

New BLEU, on the full dataset right away - my mistake, not the full one: implemented the gradual dataset growth incorrectly, so this run got ruined

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/3amsiq35?workspace=user-natalymr

New BLEU, on the full dataset right away

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/2hyvjbwi


natalymr commented on June 24, 2024

If the gradients start diverging (we jumped over the local minimum):

decrease lr, add dropout


If the gradients converge but the loss/acc is not what we want (we are stuck in a local minimum):

increase lr, weaken dropout
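
As a concrete illustration of this heuristic (a minimal sketch of my own, not code from the runs above; the model, loader, and criterion names are assumptions), one can log the total gradient norm every step so divergence is visible in wandb, and let a StepLR scheduler handle the lr decrease:

```python
import torch

def train_epoch(model, loader, optimizer, scheduler, criterion, max_norm=1.0):
    model.train()
    for src, tgt in loader:
        optimizer.zero_grad()
        loss = criterion(model(src, tgt), tgt)
        loss.backward()
        # clip_grad_norm_ returns the total gradient norm before clipping:
        # a convenient signal for the "gradients are diverging" check above
        grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
        optimizer.step()
        # wandb.log({"grad_norm": grad_norm.item(), "train_loss": loss.item()})
    scheduler.step()  # e.g. StepLR: lower the lr on a fixed schedule
```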


Article on Batch Size
There might be critical consequences when using different batch sizes that should be taken into consideration when choosing one. Let's cover two of the main potential consequences of using small or large batch sizes:

  • Generalization: Large batch sizes may cause bad generalization (or even get stuck in
    a local minimum). Generalization means that the neural network will perform quite well on samples outside of the training set. So, bad generalization — which is pretty much overfitting — means that the neural network will perform poorly on samples outside of the training set.
  • Convergence speed: Small batch sizes may lead to slow convergence of the learning algorithm. The variable updates applied in every step, that were calculated using a
    batch of samples, will determine the starting point for the next batch of samples.
    Training samples are randomly drawn from the training set every step and therefore the
    resulting gradients are noisy estimates based on partial data. The fewer samples we use in a single batch, the noisier and less accurate the gradient estimates will be. That is, the smaller the batch, the bigger impact a single sample has on the applied variable updates. In other words, smaller batch sizes may make the learning process noisier and
    fluctuating, essentially extending the time it takes the algorithm to converge.

With all that in mind, we have to choose a batch size that will be neither too small nor too large but somewhere in between. The main idea here is that we should play around with
different batch sizes until we find one that would be optimal for the specific neural
network and dataset we are using.


Solution (survey)

One way to overcome the GPU memory limitations and run large batch sizes is to split the
batch of samples into smaller mini-batches, where each mini-batch requires an amount of
GPU memory that can be satisfied. These mini-batches can run independently, and their
gradients should be averaged or summed before calculating the model variable updates.
There are two main ways to implement this:

  • Data-parallelism — use multiple GPUs to train all mini-batches in parallel, each on a
    single GPU. The gradients from all mini-batches are accumulated and the result is used to
    update the model variables at the end of every step.
  • Gradient accumulation — run the mini-batches sequentially while accumulating the gradients. The accumulated results are used to update the model variables at the end of the last mini-batch.
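
For the data-parallel option, a minimal PyTorch sketch (my own illustration for completeness; the runs above were single-GPU, and the model class name is an assumption):

```python
import torch
import torch.nn as nn

model = Seq2SeqModel()  # hypothetical model class
if torch.cuda.device_count() > 1:
    # nn.DataParallel splits each input batch across the visible GPUs,
    # runs the forward passes in parallel, and sums the resulting
    # gradients back onto the base model's parameters during backward
    model = nn.DataParallel(model)
model = model.cuda()
# the training loop itself is unchanged: forward, backward, optimizer.step()
```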

So what is gradient accumulation, technically?

Gradient accumulation means running a configured number of steps without updating the model variables while accumulating the gradients of those steps and then using the accumulated gradients to compute the variable updates.
Yes, it’s really that simple.
Running some steps without updating any of the model variables is the way we —
logically — split the batch of samples into a few mini-batches. The batch of samples that is used in every step is effectively a mini-batch, and all the samples of those steps combined are effectively the global batch.
By not updating the variables at all those steps, we cause all the mini-batches to use the same model variables for calculating the gradients. This is mandatory to ensure the same gradients and updates are calculated as if we were using the global batch size.
Accumulating the gradients in all of these steps results in the same sum of gradients as if we were using the global batch size.

Iterating through an example

So, let’s say we are accumulating gradients over 5 steps. We want to accumulate the gradients of the first 4 steps, without updating any variable. At the fifth step, we want to use the accumulated gradients of the previous 4 steps combined with the gradients of the fifth step to compute and assign the variable updates. Let’s see it in action:

  1. Starting at the first step, all the samples of the first mini-batch propagate through the forward and backward passes, resulting in computed gradients for each trainable model variable. We don’t want to actually update the variables, so there is no need in computing the updates at this point. What we need, though, is a place to store the gradients of the first step, in order for them to be accessible in the following steps, and we will use another variable for each trainable model variable, to hold the accumulated gradients. So, after computing the gradients of the first step, we will store them in the variables we created for the accumulated gradients.
  2. Now the second step starts, and again, all the samples of the second mini-batch
    propagate through all the layers of the model, computing the gradients of the second step. Just like the step before, we don’t want to update the variables yet, so there is no need in computing the variable updates. What’s different than the first step though, is that instead of just storing the gradients of the second step in our variables, we are going to add them to the values stored in the variables, which currently hold the gradients of the first step.
  3. Steps 3 and 4 are pretty much the same as the second step, as we are not yet updating the variables, and we are accumulating the gradients by adding them to our variables.
  4. Steps 3 and 4 are pretty much the same as the second step, as we are not yet updating the variables, and we are accumulating the gradients by adding them to our variables.
  5. Then, in step 5, we do want to update the variables, as we intended to accumulate the gradients over 5 steps. After computing the gradients of the fifth step, we will add them to the accumulated gradients, resulting in the sum of all the gradients of those 5 steps.

We’ll then take this sum and insert it as a parameter to the optimizer, resulting in the updates computed using all the gradients of those 5 steps, computed over all the samples in the global batch.
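
A minimal PyTorch sketch of exactly this 5-step procedure (my own illustration; model, loader, and criterion names are assumptions). In PyTorch the accumulation variables already exist: backward() adds into the .grad buffers, so accumulating simply means not zeroing them between steps:

```python
accumulation_steps = 5

optimizer.zero_grad()                      # start with clean gradient buffers
for step, (src, tgt) in enumerate(train_loader, start=1):
    loss = criterion(model(src, tgt), tgt)
    loss.backward()                        # adds this mini-batch's gradients to .grad
    if step % accumulation_steps == 0:     # every 5th step: update the variables
        optimizer.step()                   # update from the summed gradients
        optimizer.zero_grad()              # reset the accumulators for the next 5 steps
```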


Solution (implementation)

https://discuss.pytorch.org/t/how-to-implement-accumulated-gradient/3822
(two code screenshots from the linked thread)
AND THREE MORE IMPLEMENTATION VARIANTS:
https://discuss.pytorch.org/t/why-do-we-need-to-set-the-gradients-manually-to-zero-in-pytorch/4903/20?u=alband
(code screenshot from the linked thread)
From the same discussion:
(code screenshot from the linked thread)
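
The screenshots themselves are not recoverable here. As a stand-in, a sketch of the other variant I know is commonly shown (my reconstruction, not necessarily the code from the screenshots): scale each mini-batch loss by the number of accumulation steps, so the accumulated gradient matches the gradient of the mean loss over the large batch rather than the sum. Same assumed names as in the sketch above:

```python
accumulation_steps = 5

optimizer.zero_grad()
for i, (src, tgt) in enumerate(train_loader):
    loss = criterion(model(src, tgt), tgt)
    (loss / accumulation_steps).backward()  # accumulate the *averaged* gradient
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

With this scaling, the update is what a single mean-reduced batch of accumulation_steps * bs samples would produce.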


natalymr commented on June 24, 2024

200 tokens

1000 dataset (500val/test) bs = 250, lr=0.1

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/0yjtnd2s

ALL dataset, Total number of params = 26230872, src vocab = 41607, tgt vocab = 18069, bs = 250, lr=0.1

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/1p2c36bd
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/w3ao2qls

ALL dataset, Total number of params = 10497688, src vocab = 41607, tgt vocab = 18069, bs = 500, hid_size = 128 instead of 300, lr=0.1

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ljc12vrp

Training on 1000 commits, evaluating on the full test/val, hid_size=300, bs=25*10, lr=0.1

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ha8nper1

Training on 1000 commits taken FROM THE END, evaluating on the full test/val, hid_size=300, bs=25*10, lr step_size=10, gamma=0.8, lr=0.1

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ehtguhaa

Training on 5000 commits, evaluating on the full test/val, hid_size=300, bs=25*10, lr step_size=10, gamma=0.8, lr=0.1

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/0ux74572

Sanity-check run: first train on 1000 examples; once test_bleu > 0.5, add 500 examples to the train set every epoch; evaluate on the full test/val, hid_size=300, bs=25*10, lr step_size=10, gamma=0.8, lr=0.1

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ojqkbrjw
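
A minimal sketch of this gradual dataset growth (my own illustration of the procedure described above, not the actual training script; the helpers and data names are hypothetical):

```python
start_size, chunk, bleu_threshold = 1000, 500, 0.5   # values from the run above

train_size, growing = start_size, False
for epoch in range(num_epochs):
    train_subset = full_train_data[:train_size]       # assumes a list-like dataset
    train_one_epoch(model, train_subset, optimizer)   # hypothetical helper
    test_bleu = evaluate_bleu(model, test_data)       # hypothetical helper
    if test_bleu > bleu_threshold:
        growing = True                                 # from now on, grow every epoch
    if growing:
        train_size = min(train_size + chunk, len(full_train_data))
```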

First train on 1000 examples; once test_bleu > 1.5, add 500 examples to the train set every epoch; evaluate on the full test/val, hid_size=300, bs=25*10, lr step_size=10, gamma=0.7, lr=0.1

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/q6jh7kqd

Changed the way BLEU is computed

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/1k6wlhkq

Adding 1000 commits at a time

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/31y4gm86?workspace=user-natalymr

Adding 500 commits at a time

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/2ekge9aq


natalymr commented on June 24, 2024

NMT

100 tokens

on the full dataset
https://app.wandb.ai/natalymr/nmt-1.0-test/runs/6y01tlo1

200 tokens

adding 500 at a time
https://app.wandb.ai/natalymr/nmt-1.0-test/runs/20rquvgm
after adding new data to the train set, wait 4 epochs, then add again
https://app.wandb.ai/natalymr/nmt-1.0-test/runs/iio91nz5
lowered lr (0.01)
https://app.wandb.ai/natalymr/nmt-1.0-test/runs/i7l6s0dg


natalymr commented on June 24, 2024

New Dataset (match with code2seq)

NMT


100 tokens

the old way: add a small batch of commits, then train for 4 epochs

https://app.wandb.ai/natalymr/nmt-1.0-test/runs/8p3nb59w?workspace=user-natalymr

all the data at once

https://app.wandb.ai/natalymr/nmt-1.0-test/runs/d97basok?workspace=user-natalymr

all the data at once, twice as many epochs

https://app.wandb.ai/natalymr/nmt-1.0-test/runs/9kdczc93?workspace=user-natalymr

changed the lr

interrupted run
https://app.wandb.ai/natalymr/nmt-1.0-test/runs/vylsmbo9?workspace=user-natalymr
full run
https://app.wandb.ai/natalymr/nmt-1.0-test/runs/1yinbj6a?workspace=user-natalymr


200 tokens

lr = 0.01, divided by 10 after 150 epochs, 250 epochs total
https://app.wandb.ai/natalymr/nmt-1.0-test/runs/ucukw8kl


natalymr commented on June 24, 2024

New Dataset (match with code2seq)

NMT-2

100 tokens

Full dataset, lr=0.01
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/og66z2nl
400 epochs:
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/6c7f56mo
250 epochs, LR not decayed
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/9k5usrqq
