machelreid / subformer

The code for the Subformer, from the EMNLP 2021 Findings paper: "Subformer: Exploring Weight Sharing for Parameter Efficiency in Generative Transformers", by Machel Reid, Edison Marrese-Taylor, and Yutaka Matsuo

Home Page: https://arxiv.org/abs/2101.00234

License: MIT License

Languages: Python 97.43%, C++ 0.60%, Cuda 1.38%, Shell 0.03%, Lua 0.16%, Cython 0.40%
Topics: machine-learning, nlp

subformer's People

Contributors: machelreid

subformer's Issues

Shared weights update during backpropagation

Dear Subformer authors,

I see that you use the def share_layers(self, layers) function to constrain all layers to share the same weights. During inference, every layer uses the weights of the base layer to make predictions. However, how is the gradient update performed? Do all layers share the same gradients?

From my understanding, each layer computes its own gradients. My question is: are the last layer's gradients or the first layer's gradients used?
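
A minimal PyTorch sketch (not the repository's actual share_layers implementation) of how shared weights usually behave during backpropagation: reusing one module object across several layer positions makes autograd sum the gradient contributions from every position into the same .grad tensor, so neither the first layer's nor the last layer's gradient is used alone.

```python
import torch
import torch.nn as nn

# One linear "layer" whose parameters are reused three times,
# mimicking weight sharing across transformer layers.
shared = nn.Linear(4, 4, bias=False)

x = torch.randn(2, 4)
h = x
for _ in range(3):              # three layer positions, one parameter set
    h = torch.tanh(shared(h))

loss = h.sum()
loss.backward()

# shared.weight.grad now holds the SUM of the gradient contributions
# from all three applications of the layer, not just the first or last one.
print(shared.weight.grad)
```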

How to reproduce the results of abstractive summarization?

Dear Subformer authors,
Hi! Thanks for sharing your code! I want to reproduce the results for abstractive summarization, but I'm confused about how to set the training parameters. I used the same scripts as in the Training section, but the results are poor. Could you kindly provide the scripts for the summarization task? Thank you very much!

Core codes for the sandwich weight sharing

Dear Subformer authors,

Thanks for sharing the code for this interesting Subformer work! I am eager to reproduce your experiments on sandwich weight sharing, but I am a little confused about where to find the core code within the fairseq template. Is it mainly in fairseq/modules/subformer_layer.py? Could you kindly point out the core code for weight sharing? Thanks very much!

Bests,
Qian
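
A minimal sketch of sandwich-style parameter sharing as described in the paper (the outermost layers keep their own parameters while all central layers reuse one shared layer); the helper name and layer factory below are illustrative, not the repository's actual API:

```python
import torch.nn as nn

def build_sandwich_stack(num_layers, make_layer):
    """First and last layers get their own parameters; every middle layer
    reuses a single shared module object, which is what ties the weights."""
    first = make_layer()
    last = make_layer()
    shared_middle = make_layer()        # one parameter set for all middle layers
    middle = [shared_middle] * (num_layers - 2)
    return nn.ModuleList([first, *middle, last])

# Example with tiny feed-forward "layers": 6 positions, 3 distinct modules.
layers = build_sandwich_stack(6, lambda: nn.Linear(8, 8))
print(len(list(layers.parameters())))   # 6 tensors: weight + bias for 3 distinct layers
```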

ModuleNotFoundError: No module named 'fairseq.data.multilingual_denoising_dataset'

Dear subformer authors,

After I successfully installed PyYAML-5.4.1, antlr4-python3-runtime-4.8, cffi-1.14.5, cython-0.29.23, fairseq, hydra-core-1.0.6, importlib-resources-5.1.2, numpy-1.20.2, omegaconf-2.0.6, portalocker-2.0.0, pycparser-2.20, regex-2021.4.4, sacrebleu-1.5.1, torch-1.8.1, tqdm-4.60.0, and typing-extensions-3.7.4.3, I tried to run the training script you provided for machine translation. However, I came across a ModuleNotFoundError for fairseq.data.multilingual_denoising_dataset. Do you know how to solve this issue? Thanks for your help!

Bests,
Qian
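
A quick way to check whether the error comes from importing a stock pip-installed fairseq instead of the modified fairseq bundled with this repository (a guess, not a confirmed diagnosis):

```python
import importlib.util
import fairseq

# Which fairseq is Python actually importing? If this points into
# site-packages rather than the subformer checkout, the repo's custom
# modules will not be found.
print(fairseq.__file__)

# Is the custom dataset module visible in that installation?
spec = importlib.util.find_spec("fairseq.data.multilingual_denoising_dataset")
print("module found" if spec is not None else "module missing")
```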
