Comments (8)
- add eval
- add BLEU to eval
- clean up the code
- run on the machine
- if it does not train, send it to Sasha for review
- 100 epochs
- 200 epochs
- a different lr
- run the baselines
- read up on how to interpret the plots on wandb
- read an article about the BLEU score
- build the dataset with 200 tokens
- add dropout/batch_normalization
from gcm.
h = 2, 1000 epochs, 10 examples, train/test identical
h = 2, 1000 epochs, 10 examples, train/test different
h = 256, 1000 epochs, 10 examples, train/test identical
h = 256, 10 epochs, full dataset, train/test different
h = 256, 50 epochs, full dataset, train/test different
h = 256, 30 epochs, full dataset, a different lr, train/test different
h = 256, 30 epochs, 2000 dataset, lr = 0.001, train/test different
h = 256, 30 epochs, 2000 dataset, lr = 0.0001, train/test different
h = 256, 30 epochs, 100 dataset, lr = 0.0001, train/test different
from gcm.
Runs on the Mac
500 epochs, dataset 100 train != 100 test, lr = 0.05, step_size=150, gamma=0.1, bs=10*10, test every 10, FIXED grad acc (seemingly no different from the previous run, yet the scores are completely different); BEFORE BIDIRECTIONAL, added clip_grad
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/1s09khnv
500 epochs, dataset 100 train != 100 test, lr = 0.05, step_size=150, gamma=0.1, bs=10*10, test every 10, FIXED grad acc (seemingly no different from the previous run, yet the scores are completely different); BEFORE BIDIRECTIONAL
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/35uz9xt0
500 epochs, dataset 100 train != 100 test, lr = 0.1, step_size=100, gamma=0.1, bs=10*10, test every 10, FIXED grad acc
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/cnqca8ml
500 epochs, dataset 100 train != 100 test, lr = 0.1, step_size=100, gamma=0.1, bs=10*10, test every 10
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/1o31kd14
added grad acc (bs = 1, optimizer step every 5 steps)
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/uaoy1i2r
just the previous run
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/2iarf6cm
h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with coefficient 0.5, lr started at 0.001 and was scaled by 0.2 after 500 epochs, added pack_padded
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ypzmdhj4
h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with coefficient 0.5, lr started at 0.001 and was scaled by 0.2 after 500 epochs, removed dropout, clip_grad
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/2bhi9b3z
h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with coefficient 0.5, lr started at 0.001 and was scaled by 0.2 after 500 epochs, clip_grad(0.25)
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/3nkuv908
h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with coefficient 0.5, lr started at 0.001 and was scaled by 0.2 after 500 epochs, added dropout on the lstm in the decoder, clip_grad(0.25)
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/3v4h52mv?workspace=user-natalymr
h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with coefficient 0.5, lr started at 0.001 and was increased 10x after 500 epochs
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ba9k4hxu?workspace=user-natalymr
h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with coefficient 0.5, lr increases 10x every 500 epochs
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/1612jgvi
h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with coefficient 0.5, lr decreases 10x every 500 epochs
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/2kdz2ntc
h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with coefficient 0.5, lr decreases 10x every 100 epochs
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/2ar6y7nu
h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with coefficient 0.5
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/cp5vlq38
h = 2, init = xavier_uniform_, 10 dataset, train=test, added dropout everywhere with coefficient 0.5
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/3msbqrk2
h = 2, init = xavier_normal_, 10 dataset, train=test
here you can see that
- the weights become nonzero
- the gradients are also nonzero
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/8a8hs6pg?workspace=user-natalymr
h = 2, init = normal, 10 dataset, train=test
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/3nxm5dk6?workspace=user-natalymr
h = 2, no init, 10 dataset, train=test
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/2bjp9eqj?workspace=user-natalymr
Runs on the machine
1000 epochs, 100 dataset, train != test, lr = 0.0001
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/xc58t4gg?workspace=user-natalymr
200 epochs, init = xavier_normal_, 100 dataset, train != test, lr = 0.01
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/zx11267c
2000 epochs, init = xavier_normal_, 100 dataset, train != test, lr = 0.01
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/b12hg64j
200 epochs, init = xavier_normal_, 100 dataset, train != test, lr decreasing every 50 epochs
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/4rx2i84m
200 epochs, init = xavier_normal_, 100 dataset, train != test, lr decreasing every 500 epochs
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ttyv6fs2
1000 epochs, init = xavier_normal_, 100 dataset, train != test, lr decreasing every 500 epochs, batch size = 100 instead of 10; hid_size 400 instead of 300
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/mpr4zj4q
2000 epochs, init = xavier_normal_, 100 dataset, train != test, lr decreasing every 500 epochs, batch size = 100 instead of 10; hid_size 400 instead of 300
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ymhus29z
2000 epochs, init = xavier_normal_, 100 dataset, train != test, lr NOT REDUCED (0.001), batch size = 100 instead of 10; hid_size 400 instead of 300
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ej4s0x8t
2000 epochs, init = xavier_normal_, 100 dataset, train != test, lr INCREASED 5x every 500 epochs (starting from 0.001), batch size = 100 instead of 10; hid_size 400 instead of 300
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/wy8r2kfu
2000 epochs, init = xavier_normal_, 100 dataset, train != test, lr halved every 500 epochs (instead of reduced 10x), starting from 0.001, batch size = 100 instead of 10; hid_size 400 instead of 300
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/5gnbvy1d
2000 epochs, init = xavier_normal_, 100 dataset, train != test, lr not reduced (0.001), batch size = 100 instead of 10; hid_size 400 instead of 300, added DROPOUT (encoder: embed, lstm, decoder: embed)
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/dk5czbor
200 epochs, init = xavier_normal_, 2000 dataset, train != test, lr not reduced (0.001), batch size = 100 instead of 10; hid_size 400 instead of 300, added DROPOUT (encoder: embed, lstm, decoder: embed)
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/hysoott0
200 epochs, init = xavier_normal_, 2000 dataset, train != test, batch size = 130 instead of 10; hid_size 400 instead of 300, added DROPOUT (encoder: embed, lstm, decoder: embed), scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/513ukeok
200 epochs, init = xavier_normal_, 2000 dataset, train != test, lr = 0.1, batch size = 130 instead of 10; hid_size 400 instead of 300, added DROPOUT 0.2 (encoder: embed, lstm, decoder: embed), scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/zyvjmits
200 epochs, init = xavier_normal_, 2000 dataset, train != test, lr = 0.1, batch size = 130 instead of 10; hid_size 400 instead of 300, added DROPOUT 0.2 (encoder: embed, lstm, decoder: embed), scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1), sort dataset = True
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/r1ik0206?workspace=user-natalymr
200 epochs, init = xavier_normal_, 2000 dataset, train != test, lr = 0.1, batch size = 130 instead of 10; hid_size 400 instead of 300, added DROPOUT 0.2 (encoder: embed, lstm, decoder: embed), scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1), sort dataset = True, SKIP PADDING
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/anwy0t7z
200 epochs, init = xavier_normal_, 2000 dataset, train != test, lr = 0.1, batch size = 130 instead of 10; hid_size 400 instead of 300, added DROPOUT 0.2 (encoder: embed, lstm, decoder: embed), lr NOT REDUCED, sort dataset = True, skip padding
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/nobbnkk3
59 epochs, init = xavier_normal_, 2000 dataset, train != test, lr = 0.01, batch size = 130 instead of 10; hid_size 400 instead of 300, added DROPOUT 0.2 (encoder: embed, lstm, decoder: embed), lr not reduced, sort dataset = FALSE, shuffle=TRUE, skip padding
it started generating
<sos>
и<eos>
😡
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/bdtttitj
200 epochs, init = xavier_normal_, 2000 dataset, train != test, lr = 0.1, batch size = 130 instead of 10; hid_size 400 instead of 300, added DROPOUT 0.2 (encoder: embed, lstm, decoder: embed), lr not reduced, sort dataset = TRUE, shuffle=TRUE, skip padding
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/nlz2wr98
200 epochs, init = xavier_normal_, 2000 dataset, train != test, hid_size 256 instead of 400, added dropout 0.2 (encoder: embed, lstm, decoder: embed), lr not reduced, sort dataset = true, shuffle=true, skip padding; GRAD ACC (bs=100*2) & lr = 0.1, step_size=100, gamma=0.1 & test_every 2
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/45tyzit5
30 epochs, init = xavier_normal_, 2000 dataset, train != test, hid_size 256 instead of 400, added dropout 0.2 (encoder: embed, lstm, decoder: embed), lr not reduced, sort dataset = true, shuffle=true, skip padding; GRAD ACC (bs=100*5(!!!)) & lr = 0.1, step_size=100, gamma=0.1 & test_every 5(!). Silly me: GRAD ACC was implemented incorrectly
generates the same output for almost every input and repeats the same word within a single message
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/q5okz6q1
100 epochs, init = xavier_normal_, 2000 dataset, train != test, hid_size 256 instead of 400, added dropout 0.2 (encoder: embed, lstm, decoder: embed), lr not reduced, sort dataset = true, shuffle=true, skip padding; GRAD ACC (bs=50*5(!!!)) & lr = 0.1, step_size=50, gamma=0.1 & test_every 1. SILLY ME: GRAD ACC was implemented incorrectly
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/s7hwma0d
2*100 seems to have turned out better
100 epochs, entire dataset, Total number of params = 16167398, src vocab = 26774, tgt vocab = 13795, bs=100*2 & lr = 0.1, step_size=50, gamma=0.1 & test_every 1
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/99lrdx51
100 epochs, entire dataset, Total number of params = 16133862, src vocab = 26643, tgt vocab = 13795, bs=100*5 & lr = 0.1, step_size=50, gamma=1 & test_every 1
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/fhva9pab
100 epochs, entire dataset, Total number of params = 26839398, src vocab = 26643, tgt vocab = 13795, hid_size 400 instead of 256, bs=50*5(!) & lr = 0.1, step_size=50, gamma=1 & test_every 1
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/4nc1j7lj
100 epochs, entire dataset, Total number of params = 19172998, src vocab = 26643, tgt vocab = 13795, hid_size 300 instead of 400, bs=50*10(!) & lr = 0.1, step_size=50, gamma=1 & test_every 1
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/d9gjkoer
100 epochs, entire dataset, Total number of params = 19172998, src vocab = 26643, tgt vocab = 13795, hid_size 300 instead of 400, bs=50*10(!) & lr = 0.1, step_size=50, gamma=1 & test_every 1, added clip_grad(1)
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/625igudj
SILLY ME: BLEU was implemented incorrectly. New BLEU; adding 1000 examples each epoch once test_bleu > 1.5
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/z2vjlp74
New BLEU, on the entire dataset right away. SILLY ME: not actually the entire dataset; the gradual dataset growth was implemented incorrectly, so this run is spoiled
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/3amsiq35?workspace=user-natalymr
New BLEU, on the entire dataset right away
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/2hyvjbwi
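Many of the runs above are described by `lr`, `step_size` and `gamma`, which are the parameters of the `torch.optim.lr_scheduler.StepLR` scheduler named in the log. As a reading aid, here is a minimal plain-Python sketch of the schedule that rule produces (the learning rate is multiplied by `gamma` once every `step_size` epochs). The function name `step_lr` is hypothetical and this is not the training code from the runs.

```python
def step_lr(base_lr: float, step_size: int, gamma: float, epoch: int) -> float:
    """Learning rate at a given epoch under a StepLR-style schedule.

    The rate is multiplied by `gamma` once every `step_size` epochs.
    """
    return base_lr * gamma ** (epoch // step_size)


if __name__ == "__main__":
    # Settings from one of the logged runs: lr = 0.1, step_size=50, gamma=0.1
    for epoch in (0, 49, 50, 100, 150):
        print(epoch, step_lr(0.1, 50, 0.1, epoch))
```

With `gamma=1`, as in several of the later runs, the schedule leaves the learning rate constant for the whole run.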
from gcm.
if the gradients start to diverge (we jumped over a local minimum)
decrease lr, add dropout
if the gradients converge but loss/acc is not what it should be (we are stuck in a local minimum)
increase lr, weaken dropout
Article about Batch Size
There might be critical consequences when using different batch sizes that should be taken into consideration when choosing one. Let’s cover two of the main potential consequences of using small or large batch sizes:
- Generalization: Large batch sizes may cause bad generalization (or even get stuck in a local minimum). Generalization means that the neural network will perform quite well on samples outside of the training set. So, bad generalization, which is pretty much overfitting, means that the neural network will perform poorly on samples outside of the training set.
- Convergence speed: Small batch sizes may lead to slow convergence of the learning algorithm. The variable updates applied in every step, that were calculated using a batch of samples, will determine the starting point for the next batch of samples. Training samples are randomly drawn from the training set every step and therefore the resulting gradients are noisy estimates based on partial data. The fewer samples we use in a single batch, the noisier and less accurate the gradient estimates will be. That is, the smaller the batch, the bigger impact a single sample has on the applied variable updates. In other words, smaller batch sizes may make the learning process noisier and fluctuating, essentially extending the time it takes the algorithm to converge.
With all that in mind, we have to choose a batch size that will be neither too small nor too large but somewhere in between. The main idea here is that we should play around with different batch sizes until we find one that would be optimal for the specific neural network and dataset we are using.
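The statement above that smaller batches give noisier gradient estimates can be illustrated with a small simulation. This is only a sketch under stated assumptions: the per-sample "gradients" are synthetic Gaussian draws with true mean 1.0 (not gradients of the NMT model), and the function name `gradient_estimate_spread` is hypothetical.

```python
import random
import statistics


def gradient_estimate_spread(batch_size: int, n_trials: int = 2000, seed: int = 0) -> float:
    """Standard deviation of the batch-mean gradient over many random batches."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        # Assumption: each per-sample gradient is the true value 1.0 plus unit-variance noise
        batch = [rng.gauss(1.0, 1.0) for _ in range(batch_size)]
        estimates.append(sum(batch) / batch_size)
    return statistics.stdev(estimates)


if __name__ == "__main__":
    for bs in (1, 10, 100):
        print(f"batch size {bs:>3}: spread of the gradient estimate = {gradient_estimate_spread(bs):.3f}")
```

The spread shrinks roughly as one over the square root of the batch size, which is the sense in which a larger batch gives a more accurate estimate per step.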
Solution (survey)
One way to overcome the GPU memory limitations and run large batch sizes is to split the batch of samples into smaller mini-batches, where each mini-batch requires an amount of GPU memory that can be satisfied. These mini-batches can run independently, and their gradients should be averaged or summed before calculating the model variable updates.
There are two main ways to implement this:
- Data-parallelism: use multiple GPUs to train all mini-batches in parallel, each on a single GPU. The gradients from all mini-batches are accumulated and the result is used to update the model variables at the end of every step.
- Gradient accumulation: run the mini-batches sequentially while accumulating the gradients. The accumulated results are used to update the model variables at the end of the last mini-batch.
So what is gradient accumulation, technically?
Gradient accumulation means running a configured number of steps without updating the model variables while accumulating the gradients of those steps and then using the accumulated gradients to compute the variable updates.
Yes, it’s really that simple.
Running some steps without updating any of the model variables is the way we, logically, split the batch of samples into a few mini-batches. The batch of samples that is used in every step is effectively a mini-batch, and all the samples of those steps combined are effectively the global batch.
By not updating the variables at all those steps, we cause all the mini-batches to use the same model variables for calculating the gradients. This is mandatory to ensure the same gradients and updates are calculated as if we were using the global batch size.
Accumulating the gradients in all of these steps results in the same sum of gradients as if we were using the global batch size.
Iterating through an example
So, let’s say we are accumulating gradients over 5 steps. We want to accumulate the gradients of the first 4 steps, without updating any variable. At the fifth step, we want to use the accumulated gradients of the previous 4 steps combined with the gradients of the fifth step to compute and assign the variable updates. Let’s see it in action:
- Starting at the first step, all the samples of the first mini-batch propagate through the forward and backward passes, resulting in computed gradients for each trainable model variable. We don’t want to actually update the variables, so there is no need to compute the updates at this point. What we need, though, is a place to store the gradients of the first step, in order for them to be accessible in the following steps, and we will use another variable for each trainable model variable, to hold the accumulated gradients. So, after computing the gradients of the first step, we will store them in the variables we created for the accumulated gradients.
- Now the second step starts, and again, all the samples of the second mini-batch propagate through all the layers of the model, computing the gradients of the second step. Just like the step before, we don’t want to update the variables yet, so there is no need to compute the variable updates. What’s different from the first step, though, is that instead of just storing the gradients of the second step in our variables, we are going to add them to the values stored in the variables, which currently hold the gradients of the first step.
- Steps 3 and 4 are pretty much the same as the second step, as we are not yet updating the variables, and we are accumulating the gradients by adding them to our variables.
- Then, in step 5, we do want to update the variables, as we intended to accumulate the gradients over 5 steps. After computing the gradients of the fifth step, we will add them to the accumulated gradients, resulting in the sum of all the gradients of those 5 steps.
We’ll then take this sum and insert it as a parameter to the optimizer, resulting in the updates computed using all the gradients of those 5 steps, computed over all the samples in the global batch.
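The five-step walkthrough above can be checked on a toy problem. The sketch below is plain Python, not the NMT training code: it assumes a one-parameter model `y = w * x` with a summed squared-error loss, and all function names are hypothetical. It accumulates the gradients of 5 mini-batches without touching `w`, applies a single update, and compares the result with one update on the global batch.

```python
def grad_sum(w, batch):
    """Gradient of sum_i 0.5 * (w * x_i - y_i)**2 with respect to w."""
    return sum((w * x - y) * x for x, y in batch)


def sgd_step_full_batch(w, batch, lr):
    """One plain SGD update computed on the whole (global) batch."""
    return w - lr * grad_sum(w, batch)


def sgd_step_accumulated(w, batch, lr, accum_steps):
    """One SGD update built from `accum_steps` sequential mini-batches."""
    if len(batch) % accum_steps != 0:
        raise ValueError("batch size must be divisible by accum_steps")
    size = len(batch) // accum_steps
    accumulated = 0.0  # the extra variable that holds the running gradient sum
    for step in range(accum_steps):
        mini_batch = batch[step * size:(step + 1) * size]
        # w is deliberately left unchanged here, so every mini-batch sees the same weights
        accumulated += grad_sum(w, mini_batch)
    # Only after the last mini-batch is the update applied
    return w - lr * accumulated


if __name__ == "__main__":
    data = [(float(i), 2.0 * i + 1.0) for i in range(1, 11)]  # toy global batch of 10 samples
    print(sgd_step_full_batch(0.5, data, 0.001))
    print(sgd_step_accumulated(0.5, data, 0.001, accum_steps=5))
```

In PyTorch the `.grad` buffers play the role of `accumulated`: `loss.backward()` adds to them, and they are reset only by `optimizer.zero_grad()`, so calling `optimizer.step()` and `optimizer.zero_grad()` only on every N-th mini-batch gives the same behavior. If the loss is a mean over each mini-batch, dividing it by N before `backward()` makes the accumulated gradient match the mean over the global batch.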
Solution (implementation)
https://discuss.pytorch.org/t/how-to-implement-accumulated-gradient/3822
AND THREE MORE IMPLEMENTATION VARIANTS
https://discuss.pytorch.org/t/why-do-we-need-to-set-the-gradients-manually-to-zero-in-pytorch/4903/20?u=alband
From the same discussion
from gcm.
200 tokens
1000 dataset (500val/test) bs = 250, lr=0.1
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/0yjtnd2s
ALL dataset, Total number of params = 26230872, src vocab = 41607, tgt vocab = 18069, bs = 250, lr=0.1
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/1p2c36bd
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/w3ao2qls
ALL dataset, Total number of params = 10497688, src vocab = 41607, tgt vocab = 18069, bs = 500, hid_size = 128 instead of 300, lr=0.1
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ljc12vrp
Train on 1000 commits, evaluate on the entire test/val, hid_size=300, bs=25*10, lr=0.1
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ha8nper1
Train on 1000 commits FROM THE END, evaluate on the entire test/val, hid_size=300, bs=25*10, lr step_size=10, gamma=0.8, lr=0.1
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ehtguhaa
Train on 5000 commits, evaluate on the entire test/val, hid_size=300, bs=25*10, lr step_size=10, gamma=0.8, lr=0.1
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/0ux74572
Sanity-check run: first train on 1000 examples; once test_bleu > 0.5, start adding 500 examples to train every epoch; evaluate on the entire test/val, hid_size=300, bs=25*10, lr step_size=10, gamma=0.8, lr=0.1
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ojqkbrjw
First train on 1000 examples; once test_bleu > 1.5, start adding 500 examples to train every epoch; evaluate on the entire test/val, hid_size=300, bs=25*10, lr step_size=10, gamma=0.7, lr=0.1
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/q6jh7kqd
Changed the way BLEU is computed
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/1k6wlhkq
Adding 1000 commits at a time
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/31y4gm86?workspace=user-natalymr
Adding 500 commits at a time
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/2ekge9aq
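The gradual dataset growth used in these runs (start from 1000 examples, then add a fixed number of commits per epoch once test BLEU passes a threshold) can be sketched as follows. This is a hypothetical reconstruction from the run descriptions, not the project's code: the function name, the parameter names and the reading that growth continues on every later epoch once the threshold has been crossed are all assumptions, and the BLEU values are passed in as a made-up history rather than computed.

```python
def train_sizes(bleu_per_epoch, start_size=1000, step=500, bleu_threshold=1.5, full_size=None):
    """Training-subset size used in each epoch, given the test BLEU seen after it."""
    sizes = []
    size = start_size
    growing = False
    for bleu in bleu_per_epoch:
        sizes.append(size)  # size actually used during this epoch
        if bleu > bleu_threshold:
            growing = True  # assumption: once crossed, keep growing every epoch
        if growing:
            size += step
            if full_size is not None:
                size = min(size, full_size)  # never exceed the full training set
    return sizes


if __name__ == "__main__":
    history = [0.3, 1.0, 1.6, 1.2, 2.0]  # hypothetical test BLEU after each epoch
    print(train_sizes(history))
```

Logging the subset size next to BLEU on each epoch makes it easier to notice when the growth logic does not behave as intended.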
from gcm.
NMT
100 tokens
on the entire dataset
https://app.wandb.ai/natalymr/nmt-1.0-test/runs/6y01tlo1
200 tokens
adding 500 at a time
https://app.wandb.ai/natalymr/nmt-1.0-test/runs/20rquvgm
after new data is added to train, wait 4 epochs, then add again
https://app.wandb.ai/natalymr/nmt-1.0-test/runs/iio91nz5
reduced lr (0.01)
https://app.wandb.ai/natalymr/nmt-1.0-test/runs/i7l6s0dg
from gcm.
New Dataset (match with code2seq)
NMT
100 tokens
the old way: add a few commits, then train for 4 epochs
https://app.wandb.ai/natalymr/nmt-1.0-test/runs/8p3nb59w?workspace=user-natalymr
all the data at once
https://app.wandb.ai/natalymr/nmt-1.0-test/runs/d97basok?workspace=user-natalymr
all the data at once, twice as many epochs
https://app.wandb.ai/natalymr/nmt-1.0-test/runs/9kdczc93?workspace=user-natalymr
changed lr
the run was interrupted
https://app.wandb.ai/natalymr/nmt-1.0-test/runs/vylsmbo9?workspace=user-natalymr
full run
https://app.wandb.ai/natalymr/nmt-1.0-test/runs/1yinbj6a?workspace=user-natalymr
200 tokens
lr = 0.01, reduced 10x after 150 epochs, 250 epochs
https://app.wandb.ai/natalymr/nmt-1.0-test/runs/ucukw8kl
from gcm.
New Dataset (match with code2seq)
NMT-2
100 tokens
Entire dataset, lr=0.01
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/og66z2nl
400 epochs:
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/6c7f56mo
250 epochs, LR not reduced
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/9k5usrqq
from gcm.