openai / sparse_attention

Examples of using sparse attention, as in "Generating Long Sequences with Sparse Transformers"
The paper is well written and achieves strong results on various datasets. However, the novelty of the contribution is unclear.
Q1: How is the Sparse Transformer (strided) different from local attention?
Q2: How is the Sparse Transformer (fixed) different from block self-attention (ICLR 2018, https://openreview.net/forum?id=H1cWzoxA-)?
Hi,
I am trying to visualize the attention schemes using this code, basically trying to reproduce Fig. 3 from the paper. I could reproduce the "fixed" attention scheme as shown below:
The problem is that I could not reproduce the "strided" scheme (Fig. 3b from the paper). All I get is the following, no matter what parameters I try:
If I change some code, then I can get the correct "strided" version as shown in the paper. The following is after those code changes:
Has anyone faced the same issue?
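For anyone checking their masks against the figure, here is a minimal NumPy sketch (my own, independent of the repo's attention.py) of the "strided" and "fixed" patterns as defined in Section 4 of the paper; the function names and plotting code are assumptions for illustration only:

```python
# Hypothetical standalone sketch (not the repo's code): build the "strided"
# and "fixed" masks from the paper with NumPy, so the patterns in Fig. 3
# can be checked independently of the blocksparse kernels.
import numpy as np
import matplotlib.pyplot as plt

def strided_mask(n, stride):
    # Row i may attend to column j (j <= i) if either:
    #   - j is within the previous `stride` positions (local component), or
    #   - (i - j) is a multiple of `stride` (strided component).
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    causal = j <= i
    local = (i - j) < stride
    strided = (i - j) % stride == 0
    return causal & (local | strided)

def fixed_mask(n, stride, c=1):
    # Row i may attend to column j (j <= i) if either:
    #   - j is in the same block of width `stride` as i, or
    #   - j is one of the last `c` columns of a block (the "summary" positions).
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    causal = j <= i
    same_block = (i // stride) == (j // stride)
    summary = (j % stride) >= stride - c
    return causal & (same_block | summary)

if __name__ == "__main__":
    n, stride = 64, 8
    fig, axes = plt.subplots(1, 2, figsize=(8, 4))
    axes[0].imshow(strided_mask(n, stride), cmap="gray")
    axes[0].set_title("strided")
    axes[1].imshow(fixed_mask(n, stride), cmap="gray")
    axes[1].set_title("fixed")
    plt.show()
```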
See title. DynamicConv claims state-of-the-art performance on many tasks (e.g., WMT14 En-De), yet it is never mentioned in your paper.
Would your team consider running comparison experiments, as in issue 659 in the pytorch/fairseq repository?
It seems that the code for images is not provided, and in #7 it was mentioned that the strided attention is difficult to reproduce. I am wondering whether anyone has successfully reproduced the results for image generation.
When I tried to run the code, the following error occurred:
Traceback (most recent call last):
  File "attention.py", line 4, in <module>
    from blocksparse import BlocksparseTransformer
  File "/home/user/anaconda3/lib/python3.7/site-packages/blocksparse/__init__.py", line 3, in <module>
    from blocksparse.utils import (
  File "/home/user/anaconda3/lib/python3.7/site-packages/blocksparse/utils.py", line 16, in <module>
    _op_module = tf.load_op_library(os.path.join(data_files_path, 'blocksparse_ops.so'))
  File "/home/en/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/load_library.py", line 61, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: libcudart.so.10.0: cannot open shared object file: No such file or directory
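This error usually means the installed TensorFlow build expects the CUDA 10.0 runtime, but libcudart.so.10.0 is not on the dynamic loader path (e.g., CUDA 10.0 is not installed, or LD_LIBRARY_PATH does not include its lib64 directory). A quick sanity check, assuming a Linux machine:

```python
# Check whether the CUDA 10.0 runtime library is loadable at all
# (assumption: this is the cause of the NotFoundError above).
import ctypes

try:
    ctypes.CDLL("libcudart.so.10.0")
    print("CUDA 10.0 runtime found on the loader path")
except OSError as e:
    print("CUDA 10.0 runtime missing:", e)
```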
Can you provide any insight into expected throughput, relative to a "base" transformer implementation?
I.e., if you consider two models with the same hidden size, number of layers, etc., will the sparse_attention version run significantly slower (and if so, presumably because of recomputation)?
Apologies if this was covered in the paper; I skimmed it and didn't see it addressed.
I am considering getting this up and running (it's extremely interesting), but I would like a sense of whether there is a major throughput hit before doing so.
Thank you; it's very neat to see the successful evolution from https://openai.com/blog/block-sparse-gpu-kernels/.
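For a rough sense of the theoretical gap, here is a back-of-the-envelope FLOP estimate (my own arithmetic, not a benchmark from the paper; real throughput also depends on the blocksparse kernel efficiency and any gradient recomputation):

```python
# Dense attention scales as O(n^2 * d) per head; the paper's factorized
# patterns scale as roughly O(n * sqrt(n) * d) when stride ~ sqrt(n).
n, d = 4096, 64              # sequence length, head dimension (assumed)
dense = n * n * d            # QK^T scores for every pair of positions
stride = int(n ** 0.5)       # ~sqrt(n), as suggested in the paper
sparse = n * 2 * stride * d  # each position attends to ~2*stride others

print(f"dense  ~{dense / 1e9:.2f} GFLOPs per head")
print(f"sparse ~{sparse / 1e9:.2f} GFLOPs per head")
print(f"speedup ~{dense / sparse:.0f}x (ignoring kernel overheads)")
```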
Is it possible to release a PyTorch implementation of the method?
For Ubuntu 18.04 with CUDA 10.0, which versions of Python and TensorFlow are recommended?
See title. The GPT-2 repo was MIT licensed, which was very helpful!