
multidim-positional-encoding's Introduction

1D, 2D, and 3D Sinusoidal Positional Encoding (PyTorch and TensorFlow)


[Figure: a 2D example of the positional encoding]

This is a practical, easy-to-download implementation of 1D, 2D, and 3D sinusoidal positional encodings for PyTorch and TensorFlow.

It can encode tensors of the form (batchsize, x, ch), (batchsize, x, y, ch), and (batchsize, x, y, z, ch), where the positional encodings are calculated along the ch dimension. Attention Is All You Need defined positional encoding in only one dimension; this package extends it to 2 and 3 dimensions.

This also works on tensors of the form (batchsize, ch, x), etc. See the usage section below for more information.

NOTE: The import syntax has changed as of version 6.0.1. See the section "Changes as of version 6.0.1" below for details.

To install, simply run:

pip install positional-encodings[pytorch,tensorflow]

You can also install the PyTorch and TensorFlow encodings individually with the following commands; a quick import check to verify which backend is available is shown after the list.

  • For a PyTorch only installation, run pip install positional-encodings[pytorch]
  • For a TensorFlow only installation, run pip install positional-encodings[tensorflow]
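
A quick way to check which backend is actually available after installing is to try the imports used in this README; this sketch assumes nothing beyond the module names documented below:

# Each submodule imports successfully only if its backend (torch or tensorflow) is installed.
try:
    from positional_encodings import torch_encodings  # noqa: F401
    print("PyTorch encodings available")
except ImportError:
    print("PyTorch encodings not available")

try:
    from positional_encodings import tf_encodings  # noqa: F401
    print("TensorFlow encodings available")
except ImportError:
    print("TensorFlow encodings not available")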

Usage (PyTorch):

The repo comes with the three main positional encoding models, PositionalEncoding{1,2,3}D. In addition, there is a Summer class that adds the positional encodings to the input tensor.

import torch
from positional_encodings.torch_encodings import PositionalEncoding1D, PositionalEncoding2D, PositionalEncoding3D, Summer

# Returns the position encoding only
p_enc_1d_model = PositionalEncoding1D(10)

# Return the inputs with the position encoding added
p_enc_1d_model_sum = Summer(PositionalEncoding1D(10))

x = torch.rand(1,6,10)
penc_no_sum = p_enc_1d_model(x) # penc_no_sum.shape == (1, 6, 10)
penc_sum = p_enc_1d_model_sum(x)
print(penc_no_sum + x == penc_sum) # True
p_enc_2d = PositionalEncoding2D(8)
y = torch.zeros((1,6,2,8))
print(p_enc_2d(y).shape) # (1, 6, 2, 8)

p_enc_3d = PositionalEncoding3D(11)
z = torch.zeros((1,5,6,4,11))
print(p_enc_3d(z).shape) # (1, 5, 6, 4, 11)

And for tensors of the form (batchsize, ch, x) or their 2D and 3D counterparts, include the word Permute before the number in the class; e.g. for a 1D input of size (batchsize, ch, x), do PositionalEncodingPermute1D instead of PositionalEncoding1D.

import torch
from positional_encodings.torch_encodings import PositionalEncodingPermute3D

p_enc_3d = PositionalEncodingPermute3D(11)
z = torch.zeros((1,11,5,6,4))
print(p_enc_3d(z).shape) # (1, 11, 5, 6, 4)

TensorFlow Keras

This also supports TensorFlow. Simply prepend all class names with TF.

import tensorflow as tf
from positional_encodings.tf_encodings import TFPositionalEncoding2D, TFSummer

# Returns the position encoding only
p_enc_2d = TFPositionalEncoding2D(170)
y = tf.zeros((1,8,6,2))
print(p_enc_2d(y).shape) # (1, 8, 6, 2)

# Return the inputs with the position encoding added
add_p_enc_2d = TFSummer(TFPositionalEncoding2D(170))
y = tf.ones((1,8,6,2))
print(add_p_enc_2d(y) - p_enc_2d(y)) # tf.ones((1,8,6,2))

Changes as of version 6.0.1

Before 6.0.1, users had to install both the tensorflow and the torch packages, both of which are quite large. Now each backend can be installed individually, but the import paths have changed:

If using PyTorch:

from positional_encodings import * -> from positional_encodings.torch_encodings import *

If using TensorFlow:

from positional_encodings import * -> from positional_encodings.tf_encodings import *
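
If code needs to run against both pre- and post-6.0.1 installs, one option is a fallback import; this is only a sketch built from the two documented import paths above (PyTorch shown here):

# Try the >= 6.0.1 module layout first, then fall back to the old flat layout.
try:
    from positional_encodings.torch_encodings import PositionalEncoding1D, Summer
except ImportError:
    from positional_encodings import PositionalEncoding1D, Summer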

Formulas

The formulas for the positional encodings are as follows:

1D:

PE(x,2i) = sin(x/10000^(2i/D))
PE(x,2i+1) = cos(x/10000^(2i/D))

Where:
x is a point in 1d space (the position along the sequence)
i is an integer in [0, D/2), where D is the size of the ch dimension
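
As a worked sketch, the 1D formula can be transcribed directly as below. This mirrors the formula above, not necessarily the library's internal channel layout; sinusoidal_1d is an illustrative name, not part of the package:

import torch

def sinusoidal_1d(length, d):
    # PE(x, 2i)   = sin(x / 10000^(2i/D))
    # PE(x, 2i+1) = cos(x / 10000^(2i/D)), assuming d (the ch dimension) is even
    pe = torch.zeros(length, d)
    x = torch.arange(length, dtype=torch.float32).unsqueeze(1)  # positions
    two_i = torch.arange(0, d, 2, dtype=torch.float32)          # 2i = 0, 2, ..., D-2
    denom = torch.pow(10000.0, two_i / d)                       # 10000^(2i/D)
    pe[:, 0::2] = torch.sin(x / denom)
    pe[:, 1::2] = torch.cos(x / denom)
    return pe

print(sinusoidal_1d(6, 10).shape)  # torch.Size([6, 10]): one D-channel encoding per position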

2D:

PE(x,y,2i) = sin(x/10000^(4i/D))
PE(x,y,2i+1) = cos(x/10000^(4i/D))
PE(x,y,2j+D/2) = sin(y/10000^(4j/D))
PE(x,y,2j+1+D/2) = cos(y/10000^(4j/D))

Where:
(x,y) is a point in 2d space
i and j are integers in [0, D/4), where D is the size of the ch dimension
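
Similarly, a sketch of the 2D case: the first D/2 channels encode x and the second D/2 encode y. Again, this follows the formula, not necessarily the library's exact layout; sinusoidal_2d is an illustrative name:

import torch

def sinusoidal_2d(h, w, d):
    # Assumes d is divisible by 4; channels [0, D/2) encode x, channels [D/2, D) encode y.
    pe = torch.zeros(h, w, d)
    pos_x = torch.arange(h, dtype=torch.float32).unsqueeze(1)
    pos_y = torch.arange(w, dtype=torch.float32).unsqueeze(1)
    four_i = torch.arange(0, d, 4, dtype=torch.float32)  # 4i for i in [0, D/4)
    denom = torch.pow(10000.0, four_i / d)                # 10000^(4i/D)
    pe[:, :, 0:d // 2:2]    = torch.sin(pos_x / denom).unsqueeze(1)  # PE(x,y,2i)
    pe[:, :, 1:d // 2:2]    = torch.cos(pos_x / denom).unsqueeze(1)  # PE(x,y,2i+1)
    pe[:, :, d // 2::2]     = torch.sin(pos_y / denom).unsqueeze(0)  # PE(x,y,2j+D/2)
    pe[:, :, d // 2 + 1::2] = torch.cos(pos_y / denom).unsqueeze(0)  # PE(x,y,2j+1+D/2)
    return pe

print(sinusoidal_2d(6, 2, 8).shape)  # torch.Size([6, 2, 8]), matching the 2D example above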

3D:

PE(x,y,z,2i) = sin(x/10000^(6i/D))
PE(x,y,z,2i+1) = cos(x/10000^(6i/D))
PE(x,y,z,2j+D/3) = sin(y/10000^(6j/D))
PE(x,y,z,2j+1+D/3) = cos(y/10000^(6j/D))
PE(x,y,z,2k+2D/3) = sin(z/10000^(6k/D))
PE(x,y,z,2k+1+2D/3) = cos(z/10000^(6k/D))

Where:
(x,y,z) is a point in 3d space
i, j, and k are integers in [0, D/6), where D is the size of the ch dimension

The 3D formula is a natural extension of the 2D positional encoding used in the paper cited below (Wang and Liu, 2019).

Don't worry if the channel size is not divisible by 2 (1D), 4 (2D), or 6 (3D); all the necessary padding is taken care of internally.
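
For example, based on the behavior described above, an odd channel count for the 1D encoding still returns an output of the same shape as the input:

import torch
from positional_encodings.torch_encodings import PositionalEncoding1D

# 7 is not divisible by 2; the required padding is handled internally,
# so the output shape still matches the input shape.
p_enc_1d = PositionalEncoding1D(7)
x = torch.rand(2, 5, 7)
print(p_enc_1d(x).shape)  # torch.Size([2, 5, 7])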

Thank you

Thank you to this repo for the inspiration for this method.

Citations

1D:

@inproceedings{vaswani2017attention,
  title={Attention is all you need},
  author={Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, {\L}ukasz and Polosukhin, Illia},
  booktitle={Advances in neural information processing systems},
  pages={5998--6008},
  year={2017}
}

2D:

@misc{wang2019translating,
    title={Translating Math Formula Images to LaTeX Sequences Using Deep Neural Networks with Sequence-level Training},
    author={Zelun Wang and Jyh-Charn Liu},
    year={2019},
    eprint={1908.11415},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

3D: Coming soon!

multidim-positional-encoding's People

Contributors

dependabot[bot], irivii, petar-iv, psobot, tatp22


multidim-positional-encoding's Issues

Positional Encoding for discrete 3D points

Firstly, thanks for your code sharing.

I am curious whether I can directly use the PE3D for discrete 3D points. In your implementation, you use torch.arange() to generate x, y, and z. I wonder whether PE3D still works if I replace them with my own point coordinates (e.g. (0.1, 0.8, 0.3)).

Use PositionalEncoding1D without batches?

Is there a reason the positional encoding functions require the input data to be in a 3D batch form? I am trying to use this with a model that doesn't do batch processing and was wondering if it could be done, or if I am better off implementing my own positional encoding class.

Consider the default option for the positional encodings

Currently, we have it such that the positional encodings just return the encodings themselves. However, they are most often used by adding them to another tensor. This PR would consist of two things:

  • Making the "pos" and "sum" option available for the pytorch models
  • Considering which ones should be the defaults

tensor & positional encoding on different devices in Summer when using 3D positional encodings with batch size = 1

Hi,

I'm using 3D positional encodings for pytorch in a (shifted window) transformer model. My model was trained with a batch size of 8. In testing, the positional encodings work fine with a batch size > 1. With batch_size = 1, however, the positional encoding is on device 'cpu' while my patch embedding is on device 'cuda:0', so the positional encoding can't be added to the patch embedding in the Summer class.

I naively fixed the issue by replacing return tensor + penc with return tensor + penc.to(tensor.device) in torch_encodings.py (line 213). Is forcing the positional encoding onto the same device as the embedding a valid fix, and should it maybe be added to the code?
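
For reference, the same workaround written as a small wrapper instead of editing torch_encodings.py directly (DeviceSafeSummer is just an illustrative name, not part of the library):

import torch
from positional_encodings.torch_encodings import PositionalEncoding3D

class DeviceSafeSummer(torch.nn.Module):
    """Adds the positional encoding to the input after moving it to the input's device."""
    def __init__(self, penc):
        super().__init__()
        self.penc = penc

    def forward(self, tensor):
        penc = self.penc(tensor)
        return tensor + penc.to(tensor.device)

# Used exactly like Summer:
add_p_enc_3d = DeviceSafeSummer(PositionalEncoding3D(11))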

Best regards,
Tjade

how to transfer from new version to old version

Hi,

I trained a model with the new version, where the import for PositionalEncoding1D is

from positional_encodings.torch_encodings import PositionalEncoding1D

But for some reason, I need to load the trained model with the old version (5.0.0), where the import is

from positional_encodings import PositionalEncoding1D

So when I load my model file

torch.load('model.pth')

it raises an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/lib/python3.9/site-packages/torch/serialization.py", line 607, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/opt/anaconda3/lib/python3.9/site-packages/torch/serialization.py", line 882, in _load
    result = unpickler.load()
  File "/opt/anaconda3/lib/python3.9/site-packages/torch/serialization.py", line 875, in find_class
    return super().find_class(mod_name, name)
ModuleNotFoundError: No module named 'positional_encodings.torch_encodings'

Is there any way to fix this other than updating the version?

the parameter: channels !!!

Hi all,
I am trying to use the function PositionalEncodingPermute2D.
I have an input of size torch.Size([1, 169, 1, 1024]), so "channels" will be 169.
But I get the error: RuntimeError: The expanded size of the tensor (85) must match the existing size (86) at non-singleton dimension 2. Target sizes: [1, 1024, 85]. Tensor sizes: [86]
I also find that the error happens whenever channels % 4 is 1 or 2.
Can you explain what I am missing here?
Thank you so much !!!

Using TFPositionalEncoding2D in model

We are experimenting with the integration of the TFPositionalEncoding2D into a custom model and would like some advice on the topic.

We saw that for the 1D version (TFPositionalEncoding1D), several people have embedded their input vector with an Embedding layer and fed that to TFPositionalEncoding1D, e.g.:

# assumed imports:
#   from tensorflow.keras.layers import Embedding
#   from positional_encodings.tf_encodings import TFPositionalEncoding1D

# declaration (inside a custom layer/model __init__)
self.position_embedding = Embedding(input_dim=vocab_size, output_dim=output_dim)
self.position_encoding = TFPositionalEncoding1D(output_dim)

# usage (inside call())
position_embedding = self.position_embedding(inputs)
embedded_indices = self.position_encoding(position_embedding)
return position_embedding + embedded_indices

If this is the correct approach for the 1D encoding, how could it be translated to feed the 2D version? It would require an Embedding that produces (batch_size, x, y, channels), which is not possible in TF. Any advice would be appreciated.

torch.cat operation of sin() and cos() does not give same result as formula

According to the formula :

PE(x,2i) = sin(x/10000^(2i/D))
PE(x,2i+1) = cos(x/10000^(2i/D))

the final positional encoding vector should have sine values in even dimensions and cosine values in odd dimensions. However, in your code you do a simple concatenation, which results in sine values for the first half of the dimensions and cosine values for the rest.

For example, let's consider a batch size of 1 and a sequence of length 1. If I use inv_freq as a vector of zeros, I should end up with an encoding vector that has 0 in even dimensions ( sin(0)=0 ) and 1 in odd dimensions ( cos(0)=1 ). However, when we use the code provided in the class PositionalEncoding1D, we end up with a vector that has 0 in the first half and 1 in the other half, which is due to the simple concatenation. The code below reproduces this:

import numpy as np
import torch

shape = (1, 1, 10)
channels = 10
channels = int(np.ceil(channels / 2) * 2)
inv_freq = torch.zeros(int(channels / 2))
batch_size, x, orig_ch = shape
pos_x = torch.arange(x).float()  # float dtype so einsum with inv_freq works
sin_inp_x = torch.einsum("i,j->ij", pos_x, inv_freq)
emb_x = torch.cat((sin_inp_x.sin(), sin_inp_x.cos()), dim=-1)
emb = torch.zeros((x, channels))
emb[:, :channels] = emb_x
print(emb)

result :
tensor([[0., 0., 0., 0., 0., 1., 1., 1., 1., 1.]])

The following change corrects this error:

emb_x = torch.stack((sin_inp_x.sin(), sin_inp_x.cos()), dim=-1)
emb_x = torch.flatten(emb_x, -2, -1)

result :
tensor([[0., 1., 0., 1., 0., 1., 0., 1., 0., 1.]])

It creates a stacked tensor with sine and cosine pairs (with the same dimension) and flattens it so that we have sine values at 2i and cosine values at 2i+1.

Create a test suite

Currently, this repo is lacking tests. Let's add some, so that we have some kind of continuous integration system checking that the code is correct.
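
A possible starting point, based only on the behavior documented in the README above (test names are illustrative):

import torch
from positional_encodings.torch_encodings import PositionalEncoding1D, Summer

def test_summer_equals_input_plus_encoding():
    # The README states that Summer(...) returns the input plus the raw encoding.
    penc = PositionalEncoding1D(10)
    summed = Summer(penc)
    x = torch.rand(1, 6, 10)
    assert torch.allclose(summed(x), x + penc(x))

def test_output_shape_matches_input_shape():
    penc = PositionalEncoding1D(10)
    x = torch.rand(1, 6, 10)
    assert penc(x).shape == x.shape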
