aliutkus / spe
Relative Positional Encoding for Transformers with Linear Complexity
I would like to ask for the pre-trained models from the pop piano experiment used in the paper. Could you please provide them?
Thanks!
For the JAX implementation, on line 210 of spe.py, should the axis summed over be -1 instead of -2? When using -2, the size of the last output dimension is num_realizations rather than the query/key dimension:
return (spe[:, :keys.shape[1]] * keys[..., None]).sum(axis=-1)
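A quick shape check illustrates the difference between the two axes. This is a NumPy stand-in with made-up sizes, not the actual arrays from spe.py:

```python
import numpy as np

# Assumed illustrative shapes: spe is (batch, max_len, keys_dim, num_realizations),
# keys is (batch, length, keys_dim).
batch, length, keys_dim, num_realizations = 2, 10, 32, 64
spe = np.ones((batch, length, keys_dim, num_realizations))
keys = np.ones((batch, length, keys_dim))

# Broadcast product, as in the quoted line: (2, 10, 32, 64)
prod = spe[:, :keys.shape[1]] * keys[..., None]

out_neg2 = prod.sum(axis=-2)  # sums over keys_dim -> last dim is num_realizations
out_neg1 = prod.sum(axis=-1)  # sums over num_realizations -> last dim is keys_dim

print(out_neg2.shape)  # (2, 10, 64)
print(out_neg1.shape)  # (2, 10, 32)
```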
Once the paper is published, we should put the packages on PyPI.
Hello,
I implemented the algorithm in the vision transformer architecture as follows:
# inside __init__()
self.spe = SineSPE(num_heads=head_cnt, in_features=in_dim, num_sines=5, num_realizations=64)
self.filter = SPEFilter(gated=False, code_shape=self.spe.code_shape)
# inside forward()
q, k = self.filter(q, k, self.spe(q.shape[:2]))
qk, kp = performer(...)
out = lin_attention(...)
The model I am using has 4 layers, 6 heads, embedding dimension 384, and patch_size=4.
Training for 100 epochs on CIFAR-100 converges to 42.3% with SPE and 45.3% without. Although the accuracy gap can be expected, training with SPE takes around 6x longer. Is that normal?
Performers + ViT takes 39 minutes.
Performers + ViT + SPE takes around 4 hours.
For both I am using 2 Titan XP GPUs.
This is problematic for me because I was considering scaling these experiments up to ImageNet.
I would also like to know how I can implement the indexing T=N^2 for images, according to Section 2 of the paper (where did you do this in the LRA benchmark?).
Many thanks!
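For reference, the T=N^2 indexing in question amounts to flattening an N x N grid of patch positions into a 1-D sequence. A minimal sketch, with illustrative sizes only:

```python
import numpy as np

# Illustrative only: an image cut into an N x N grid of patches is treated
# as a 1-D sequence of length T = N**2 (row-major flattening).
N = 4
patch_grid = np.arange(N * N).reshape(N, N)  # 2-D patch positions
sequence = patch_grid.reshape(-1)            # flattened to length T = N**2

print(sequence.shape)  # (16,)
```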
Hey, I am a little confused about the scale.
Inside SineSPE() you already handle the scaling (both d^0.25 and num_realizations^0.25).
On the other hand, in the PyTorch example, after applying the filter you divide by sqrt(num_realizations) again. Why is that?
https://github.com/aliutkus/spe/blob/main/src/pytorch/examples/test_spe.ipynb
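As a generic illustration of why some power of num_realizations gets divided out (this is only toy arithmetic, not the notebook's actual reasoning): an inner product summed over the realization axis accumulates one term per realization, so dividing each side by sqrt(R) turns the sum into a mean over realizations.

```python
import numpy as np

rng = np.random.default_rng(0)
R = 64  # stand-in for num_realizations

# Toy query/key features over R realizations (illustrative values only).
q = rng.standard_normal(R)
k = rng.standard_normal(R)

raw = q @ k                        # sums R per-realization products
avg = (q / R**0.5) @ (k / R**0.5)  # 1/sqrt(R) on each side -> mean over R

assert np.isclose(avg, raw / R)
```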