
multidim-positional-encoding's Introduction

1D, 2D, and 3D Sinusoidal Positional Encoding (PyTorch and TensorFlow)


[Figure: a 2D example of the positional encoding]

This is a practical, easy-to-download implementation of 1D, 2D, and 3D sinusoidal positional encodings for PyTorch and TensorFlow.

It can encode tensors of the form (batchsize, x, ch), (batchsize, x, y, ch), and (batchsize, x, y, z, ch), where the positional encodings are calculated along the ch dimension. Attention Is All You Need defined positional encoding in only one dimension; this package extends it to 2 and 3 dimensions.

This also works on tensors of the form (batchsize, ch, x), etc. See the usage section below for more information.

NOTE: The import syntax has changed as of version 6.0.1. See the section "Changes as of version 6.0.1" below for details.

To install, simply run:

pip install positional-encodings[pytorch,tensorflow]

You can also install the PyTorch and TensorFlow encodings individually with the following commands; a quick import check to verify which backend is available is shown after the list.

  • For a PyTorch only installation, run pip install positional-encodings[pytorch]
  • For a TensorFlow only installation, run pip install positional-encodings[tensorflow]
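
A quick way to check which backend is actually available after installing is to try the imports used in this README; this sketch assumes nothing beyond the module names documented below:

# Each submodule imports successfully only if its backend (torch or tensorflow) is installed.
try:
    from positional_encodings import torch_encodings  # noqa: F401
    print("PyTorch encodings available")
except ImportError:
    print("PyTorch encodings not available")

try:
    from positional_encodings import tf_encodings  # noqa: F401
    print("TensorFlow encodings available")
except ImportError:
    print("TensorFlow encodings not available")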

Usage (PyTorch):

The repo comes with the three main positional encoding models, PositionalEncoding{1,2,3}D. In addition, there is a Summer class that adds the positional encodings to the input tensor.

import torch
from positional_encodings.torch_encodings import PositionalEncoding1D, PositionalEncoding2D, PositionalEncoding3D, Summer

# Returns the position encoding only
p_enc_1d_model = PositionalEncoding1D(10)

# Return the inputs with the position encoding added
p_enc_1d_model_sum = Summer(PositionalEncoding1D(10))

x = torch.rand(1,6,10)
penc_no_sum = p_enc_1d_model(x) # penc_no_sum.shape == (1, 6, 10)
penc_sum = p_enc_1d_model_sum(x)
print(penc_no_sum + x == penc_sum) # True
p_enc_2d = PositionalEncoding2D(8)
y = torch.zeros((1,6,2,8))
print(p_enc_2d(y).shape) # (1, 6, 2, 8)

p_enc_3d = PositionalEncoding3D(11)
z = torch.zeros((1,5,6,4,11))
print(p_enc_3d(z).shape) # (1, 5, 6, 4, 11)

And for tensors of the form (batchsize, ch, x) or their 2D and 3D counterparts, include the word Permute before the number in the class; e.g. for a 1D input of size (batchsize, ch, x), do PositionalEncodingPermute1D instead of PositionalEncoding1D.

import torch
from positional_encodings.torch_encodings import PositionalEncodingPermute3D

p_enc_3d = PositionalEncodingPermute3D(11)
z = torch.zeros((1,11,5,6,4))
print(p_enc_3d(z).shape) # (1, 11, 5, 6, 4)

TensorFlow Keras

This also supports TensorFlow. Simply prepend all class names with TF.

import tensorflow as tf
from positional_encodings.tf_encodings import TFPositionalEncoding2D, TFSummer

# Returns the position encoding only
p_enc_2d = TFPositionalEncoding2D(170)
y = tf.zeros((1,8,6,2))
print(p_enc_2d(y).shape) # (1, 8, 6, 2)

# Return the inputs with the position encoding added
add_p_enc_2d = TFSummer(TFPositionalEncoding2D(170))
y = tf.ones((1,8,6,2))
print(add_p_enc_2d(y) - p_enc_2d(y)) # tf.ones((1,8,6,2))

Changes as of version 6.0.1

Before 6.0.1, users had to install both the tensorflow and the torch packages, both of which are quite large. Now each backend can be installed individually, but the import paths have changed:

If using PyTorch:

from positional_encodings import * -> from positional_encodings.torch_encodings import *

If using TensorFlow:

from positional_encodings import * -> from positional_encodings.tf_encodings import *
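
If code needs to run against both pre- and post-6.0.1 installs, one option is a fallback import; this is only a sketch built from the two documented import paths above (PyTorch shown here):

# Try the >= 6.0.1 module layout first, then fall back to the old flat layout.
try:
    from positional_encodings.torch_encodings import PositionalEncoding1D, Summer
except ImportError:
    from positional_encodings import PositionalEncoding1D, Summer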

Formulas

The formulas for the positional encodings are as follows:

1D:

PE(x,2i) = sin(x/10000^(2i/D))
PE(x,2i+1) = cos(x/10000^(2i/D))

Where:
x is a point in 1d space (the position along the sequence)
i is an integer in [0, D/2), where D is the size of the ch dimension
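
As a worked sketch, the 1D formula can be transcribed directly as below. This mirrors the formula above, not necessarily the library's internal channel layout; sinusoidal_1d is an illustrative name, not part of the package:

import torch

def sinusoidal_1d(length, d):
    # PE(x, 2i)   = sin(x / 10000^(2i/D))
    # PE(x, 2i+1) = cos(x / 10000^(2i/D)), assuming d (the ch dimension) is even
    pe = torch.zeros(length, d)
    x = torch.arange(length, dtype=torch.float32).unsqueeze(1)  # positions
    two_i = torch.arange(0, d, 2, dtype=torch.float32)          # 2i = 0, 2, ..., D-2
    denom = torch.pow(10000.0, two_i / d)                       # 10000^(2i/D)
    pe[:, 0::2] = torch.sin(x / denom)
    pe[:, 1::2] = torch.cos(x / denom)
    return pe

print(sinusoidal_1d(6, 10).shape)  # torch.Size([6, 10]): one D-channel encoding per position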

2D:

PE(x,y,2i) = sin(x/10000^(4i/D))
PE(x,y,2i+1) = cos(x/10000^(4i/D))
PE(x,y,2j+D/2) = sin(y/10000^(4j/D))
PE(x,y,2j+1+D/2) = cos(y/10000^(4j/D))

Where:
(x,y) is a point in 2d space
i and j are integers in [0, D/4), where D is the size of the ch dimension
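
Similarly, a sketch of the 2D case: the first D/2 channels encode x and the second D/2 encode y. Again, this follows the formula, not necessarily the library's exact layout; sinusoidal_2d is an illustrative name:

import torch

def sinusoidal_2d(h, w, d):
    # Assumes d is divisible by 4; channels [0, D/2) encode x, channels [D/2, D) encode y.
    pe = torch.zeros(h, w, d)
    pos_x = torch.arange(h, dtype=torch.float32).unsqueeze(1)
    pos_y = torch.arange(w, dtype=torch.float32).unsqueeze(1)
    four_i = torch.arange(0, d, 4, dtype=torch.float32)  # 4i for i in [0, D/4)
    denom = torch.pow(10000.0, four_i / d)                # 10000^(4i/D)
    pe[:, :, 0:d // 2:2]    = torch.sin(pos_x / denom).unsqueeze(1)  # PE(x,y,2i)
    pe[:, :, 1:d // 2:2]    = torch.cos(pos_x / denom).unsqueeze(1)  # PE(x,y,2i+1)
    pe[:, :, d // 2::2]     = torch.sin(pos_y / denom).unsqueeze(0)  # PE(x,y,2j+D/2)
    pe[:, :, d // 2 + 1::2] = torch.cos(pos_y / denom).unsqueeze(0)  # PE(x,y,2j+1+D/2)
    return pe

print(sinusoidal_2d(6, 2, 8).shape)  # torch.Size([6, 2, 8]), matching the 2D example above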

3D:

PE(x,y,z,2i) = sin(x/10000^(6i/D))
PE(x,y,z,2i+1) = cos(x/10000^(6i/D))
PE(x,y,z,2j+D/3) = sin(y/10000^(6j/D))
PE(x,y,z,2j+1+D/3) = cos(y/10000^(6j/D))
PE(x,y,z,2k+2D/3) = sin(z/10000^(6k/D))
PE(x,y,z,2k+1+2D/3) = cos(z/10000^(6k/D))

Where:
(x,y,z) is a point in 3d space
i, j, and k are integers in [0, D/6), where D is the size of the ch dimension

The 3D formula is a natural extension of the 2D positional encoding used in the paper cited below (Wang and Liu, 2019).

Don't worry if the channel size is not divisible by 2 (1D), 4 (2D), or 6 (3D); all the necessary padding is taken care of internally.
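
For example, based on the behavior described above, an odd channel count for the 1D encoding still returns an output of the same shape as the input:

import torch
from positional_encodings.torch_encodings import PositionalEncoding1D

# 7 is not divisible by 2; the required padding is handled internally,
# so the output shape still matches the input shape.
p_enc_1d = PositionalEncoding1D(7)
x = torch.rand(2, 5, 7)
print(p_enc_1d(x).shape)  # torch.Size([2, 5, 7])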

Thank you

Thank you to this repo for the inspiration for this method.

Citations

1D:

@inproceedings{vaswani2017attention,
  title={Attention is all you need},
  author={Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, {\L}ukasz and Polosukhin, Illia},
  booktitle={Advances in neural information processing systems},
  pages={5998--6008},
  year={2017}
}

2D:

@misc{wang2019translating,
    title={Translating Math Formula Images to LaTeX Sequences Using Deep Neural Networks with Sequence-level Training},
    author={Zelun Wang and Jyh-Charn Liu},
    year={2019},
    eprint={1908.11415},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

3D: Coming soon!

multidim-positional-encoding's People

Contributors

dependabot[bot], irivii, petar-iv, psobot, tatp22


multidim-positional-encoding's Issues

Positional Encoding for discrete 3D points

Firstly, thanks for your code sharing.

I am curious whether I can directly use the PE3D for discrete 3D points. In your implementation, you use torch.arange() to generate x, y, and z. I wonder whether PE3D still works if I replace them with my own point coordinates (e.g. (0.1, 0.8, 0.3)).

Use PositionalEncoding1D without batches?

Is there a reason the positional encoding functions require the input data to be in a 3D batch form? I am trying to use this with a model that doesn't do batch processing and was wondering if it could be done, or if I am better off implementing my own positional encoding class.

Consider the default option for the positional encodings

Currently, we have it such that the positional encodings just return the encodings themselves. However, they are most often used by adding them to another tensor. This PR would consist of two things:

  • Making the "pos" and "sum" option available for the pytorch models
  • Considering which ones should be the defaults

tensor & positional encoding on different devices in Summer when using 3D positional encodings with batch size = 1

Hi,

I'm using 3D positional encodings for pytorch in a (shifted window) transformer model. My model was trained with a batch size of 8. In testing, the positional encodings work fine with a batch size > 1. With batch_size = 1, however, the positional encoding is on device 'cpu' while my patch embedding is on device 'cuda:0', so the positional encoding can't be added to the patch embedding in the Summer class.

I naively fixed the issue by replacing return tensor + penc with return tensor + penc.to(tensor.device) in torch_encodings.py (line 213). Is forcing the positional encoding onto the same device as the embedding a valid fix, and should it maybe be added to the code?
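
For reference, the same workaround written as a small wrapper instead of editing torch_encodings.py directly (DeviceSafeSummer is just an illustrative name, not part of the library):

import torch
from positional_encodings.torch_encodings import PositionalEncoding3D

class DeviceSafeSummer(torch.nn.Module):
    """Adds the positional encoding to the input after moving it to the input's device."""
    def __init__(self, penc):
        super().__init__()
        self.penc = penc

    def forward(self, tensor):
        penc = self.penc(tensor)
        return tensor + penc.to(tensor.device)

# Used exactly like Summer:
add_p_enc_3d = DeviceSafeSummer(PositionalEncoding3D(11))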

Best regards,
Tjade

how to transfer from new version to old version

Hi,

I trained a model with the new version, where the import for PositionalEncoding1D is

from positional_encodings.torch_encodings import PositionalEncoding1D

But for some reason, I need to load the trained model with the old version (5.0.0), where the import is

from positional_encodings import PositionalEncoding1D

So when I load my model file

torch.load('model.pth')

it raises an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/lib/python3.9/site-packages/torch/serialization.py", line 607, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/opt/anaconda3/lib/python3.9/site-packages/torch/serialization.py", line 882, in _load
    result = unpickler.load()
  File "/opt/anaconda3/lib/python3.9/site-packages/torch/serialization.py", line 875, in find_class
    return super().find_class(mod_name, name)
ModuleNotFoundError: No module named 'positional_encodings.torch_encodings'

Is there any way to fix this other than updating the version?

the parameter: channels !!!

Hi all,
I am trying to use the function PositionalEncodingPermute2D.
I have an input of size torch.Size([1, 169, 1, 1024]), so "channels" will be 169.
But I get the error: RuntimeError: The expanded size of the tensor (85) must match the existing size (86) at non-singleton dimension 2. Target sizes: [1, 1024, 85]. Tensor sizes: [86]
I also find that the error happens whenever channels % 4 is 1 or 2.
Can you explain what I am missing here?
Thank you so much !!!

Using TFPositionalEncoding2D in model

We are experimenting with the integration of the TFPositionalEncoding2D into a custom model and would like some advice on the topic.

We saw that for the 1D version (TFPositionalEncoding1D), several people have embedded their input vector with an Embedding layer and fed that to TFPositionalEncoding1D, e.g.:

# assumed imports:
#   from tensorflow.keras.layers import Embedding
#   from positional_encodings.tf_encodings import TFPositionalEncoding1D

# declaration (inside a custom layer/model __init__)
self.position_embedding = Embedding(input_dim=vocab_size, output_dim=output_dim)
self.position_encoding = TFPositionalEncoding1D(output_dim)

# usage (inside call())
position_embedding = self.position_embedding(inputs)
embedded_indices = self.position_encoding(position_embedding)
return position_embedding + embedded_indices

If this is the correct approach for the 1D encoding, how could it be translated to feed the 2D version? It would require an Embedding that produces (batch_size, x, y, channels), which is not possible in TF. Any advice would be appreciated.

torch.cat operation of sin() and cos() does not give same result as formula

According to the formula :

PE(x,2i) = sin(x/10000^(2i/D))
PE(x,2i+1) = cos(x/10000^(2i/D))

the final positional encoding vector should have sine values in even dimensions and cosine values in odd dimensions. However, in your code you do a simple concatenation, which results in sine values for the first half of the dimensions and cosine values for the rest.

For example, let's consider a batch size of 1 and a sequence of length 1. If I use inv_freq as a vector of zeros, I should end up with an encoding vector that has 0 in even dimensions ( sin(0)=0 ) and 1 in odd dimensions ( cos(0)=1 ). However, when we use the code provided in the class PositionalEncoding1D, we end up with a vector that has 0 in the first half and 1 in the other half, which is due to the simple concatenation. The code below reproduces this:

import numpy as np
import torch

shape = (1, 1, 10)
channels = 10
channels = int(np.ceil(channels / 2) * 2)
inv_freq = torch.zeros(int(channels / 2))
batch_size, x, orig_ch = shape
pos_x = torch.arange(x).float()  # float dtype so einsum with inv_freq works
sin_inp_x = torch.einsum("i,j->ij", pos_x, inv_freq)
emb_x = torch.cat((sin_inp_x.sin(), sin_inp_x.cos()), dim=-1)
emb = torch.zeros((x, channels))
emb[:, :channels] = emb_x
print(emb)

result :
tensor([[0., 0., 0., 0., 0., 1., 1., 1., 1., 1.]])

The following change corrects this error:

emb_x = torch.stack((sin_inp_x.sin(), sin_inp_x.cos()), dim=-1)
emb_x = torch.flatten(emb_x, -2, -1)

result :
tensor([[0., 1., 0., 1., 0., 1., 0., 1., 0., 1.]])

It creates a stacked tensor with sine and cosine pairs (with the same dimension) and flattens it so that we have sine values at 2i and cosine values at 2i+1.

Create a test suite

Currently, this repo is lacking tests. Let's add some, so that we have some kind of continuous integration system checking that the code is correct.
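
A possible starting point, based only on the behavior documented in the README above (test names are illustrative):

import torch
from positional_encodings.torch_encodings import PositionalEncoding1D, Summer

def test_summer_equals_input_plus_encoding():
    # The README states that Summer(...) returns the input plus the raw encoding.
    penc = PositionalEncoding1D(10)
    summed = Summer(penc)
    x = torch.rand(1, 6, 10)
    assert torch.allclose(summed(x), x + penc(x))

def test_output_shape_matches_input_shape():
    penc = PositionalEncoding1D(10)
    x = torch.rand(1, 6, 10)
    assert penc(x).shape == x.shape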
