The code for the Subformer, from the EMNLP 2021 Findings paper: "Subformer: Exploring Weight Sharing for Parameter Efficiency in Generative Transformers", by Machel Reid, Edison Marrese-Taylor, and Yutaka Matsuo
I know that you use the share_layers(self, layers) function to make all layers share the same weights. During inference, every layer uses the base layer's weights to perform the prediction. However, how does the gradient update work? Do all layers share the same gradients?
From my understanding, each layer will compute its own gradients. The question is whether the last layer's gradients or the first layer's gradients will be used.
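For reference, here is a minimal PyTorch sketch (not the Subformer code itself, just an assumption about how autograd behaves when one module is reused) showing what happens to gradients of a shared parameter: the contributions from every use are summed into the single parameter's .grad, so neither the first nor the last layer's gradient is picked over the other.

```python
import torch
import torch.nn as nn

# Minimal sketch (not the Subformer code): one Linear module reused as two "layers".
shared = nn.Linear(4, 4)

x = torch.randn(2, 4)
# The same weight tensor is used twice in the forward pass.
out = shared(shared(x))
out.sum().backward()

# Autograd accumulates (sums) the gradient contributions from both uses
# into the single shared parameter, so there is no choice between the
# "first layer" gradient and the "last layer" gradient -- they are added.
print(shared.weight.grad.shape)  # torch.Size([4, 4])
```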
Dear Subformer authors,
Hi! Thanks for sharing your code! I want to reproduce the abstractive summarization results, but I'm confused about how to set the training parameters. I used the same training scripts but the results are poor. Could you kindly provide the scripts for the summarization task? Thank you very much!
Thanks for sharing the code for the interesting Subformer work! I am eager to reproduce your experiments on sandwich weight sharing, but I am a little confused about where to find the core code within the fairseq template. Is it mainly in fairseq/modules/subformer_layer.py? Could you kindly point out the core code for weight sharing? Thanks very much!
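As a point of reference only, a hypothetical sketch of sandwich-style parameter sharing (not the repo's actual implementation) looks like this: the first and last layers keep their own parameters, while all middle layers reuse one shared layer module, and therefore one set of weights.

```python
import torch.nn as nn

# Hypothetical sketch of sandwich-style weight sharing (an assumption, not the
# Subformer source): first and last layers have independent parameters, and the
# middle layers all point to a single shared layer module.
def build_sandwich_encoder(num_layers: int, d_model: int = 512, nhead: int = 8) -> nn.ModuleList:
    first = nn.TransformerEncoderLayer(d_model, nhead)
    shared_middle = nn.TransformerEncoderLayer(d_model, nhead)
    last = nn.TransformerEncoderLayer(d_model, nhead)
    # The shared module appears (num_layers - 2) times in the list, but its
    # parameters are stored (and updated) only once.
    return nn.ModuleList([first] + [shared_middle] * (num_layers - 2) + [last])
```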
After I successfully installed PyYAML-5.4.1 antlr4-python3-runtime-4.8 cffi-1.14.5 cython-0.29.23 fairseq hydra-core-1.0.6 importlib-resources-5.1.2 numpy-1.20.2 omegaconf-2.0.6 portalocker-2.0.0 pycparser-2.20 regex-2021.4.4 sacrebleu-1.5.1 torch-1.8.1 tqdm-4.60.0 typing-extensions-3.7.4.3, I tried to run the training script you provided for machine translation. However, I came across a ModuleNotFoundError for fairseq.data.multilingual_denoising_dataset. Do you know how to solve this issue? Thanks for your help!
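One possible cause (only a guess, not an official fix): if the stock fairseq package from PyPI is installed, it may be imported instead of the modified fairseq bundled in this repository, and the repo-specific modules would then not be found. A quick way to check which copy Python is actually importing:

```python
# Diagnostic sketch (an assumption, not part of the repo): verify whether the
# imported fairseq is the stock PyPI package or the modified copy in this repo.
import fairseq

print(fairseq.__version__)
# If this path points into site-packages rather than the cloned subformer repo,
# the stock fairseq is shadowing the repo's code.
print(fairseq.__file__)
```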