MultiScaleDeformableAttentionFunction different results on different devices about transformers HOT 5 OPEN

DonggeunYu commented on July 4, 2024

MultiScaleDeformableAttentionFunction different results on different devices

Comments (5)

amyeroberts commented on July 4, 2024

cc @qubvel, if you have time to dig into this.

qubvel commented on July 4, 2024

Hi @DonggeunYu, thanks for reporting the issue!
Unfortunately, I was not able to reproduce it in my environments. I tried:

- latest torch (2.3.0+cu121) + latest transformers (4.41.2)
- specified torch (2.1.0+cu121) + latest transformers (4.41.2)
- latest torch (2.3.0+cu121) + specified transformers (4.39.0)
- specified torch (2.1.0+cu121) + specified transformers (4.39.0)
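
The four combinations above are the cross product of two torch versions and two transformers versions. A minimal sketch of enumerating such a matrix so that no pairing is skipped; the helper name `version_matrix` is hypothetical, and how each environment gets installed is left out on purpose:

```python
from itertools import product

# Versions quoted in this thread: "latest" first, then "specified".
TORCH_VERSIONS = ["2.3.0+cu121", "2.1.0+cu121"]
TRANSFORMERS_VERSIONS = ["4.41.2", "4.39.0"]


def version_matrix(torch_versions, transformers_versions):
    """Return every (torch, transformers) pairing as a list of dicts.

    transformers is the outer loop so the order matches the list above.
    """
    return [
        {"torch": torch_version, "transformers": transformers_version}
        for transformers_version, torch_version in product(
            transformers_versions, torch_versions
        )
    ]


if __name__ == "__main__":
    for combo in version_matrix(TORCH_VERSIONS, TRANSFORMERS_VERSIONS):
        print(f"torch {combo['torch']} + transformers {combo['transformers']}")
```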

My setup has 4 Tesla T4 GPUs. I ran the script on each of them, and the results were always identical:

tensor([[[0.2500, 0.2500, 0.2500,  ..., 0.2500, 0.2500, 0.2500],
         [0.2500, 0.2500, 0.2500,  ..., 0.2500, 0.2500, 0.2500],
         [0.2500, 0.2500, 0.2500,  ..., 0.2500, 0.2500, 0.2500],
         ...,
         [0.2500, 0.2500, 0.2500,  ..., 0.2500, 0.2500, 0.2500],
         [0.2500, 0.2500, 0.2500,  ..., 0.2500, 0.2500, 0.2500],
         [0.2500, 0.2500, 0.2500,  ..., 0.2500, 0.2500, 0.2500]]],
       device='cuda:0')
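
For anyone who wants to repeat this kind of per-device check, here is a rough sketch of how it could be done. It is not the script from the issue: `toy_op` is a hypothetical stand-in for the operation under test (the real `MultiScaleDeformableAttentionFunction` is not called here), and the helper names are made up for illustration. Inputs are generated on the CPU from a fixed seed and then moved to each device, so any difference comes from the device and not from the random inputs.

```python
import torch


def available_devices():
    """List the CPU plus every visible CUDA device."""
    return ["cpu"] + [f"cuda:{i}" for i in range(torch.cuda.device_count())]


def run_on_devices(fn, make_inputs, devices, seed=0):
    """Run fn on identical, CPU-generated inputs moved to each device."""
    outputs = {}
    for device in devices:
        torch.manual_seed(seed)  # regenerate the same inputs every time
        inputs = [tensor.to(device) for tensor in make_inputs()]
        with torch.no_grad():
            outputs[device] = fn(*inputs).cpu()
    return outputs


def max_abs_diffs(outputs):
    """Largest absolute difference of each output against the first one."""
    names = list(outputs)
    reference = outputs[names[0]]
    return {name: (outputs[name] - reference).abs().max().item() for name in names}


def toy_op(value, weights):
    # Hypothetical stand-in: softmax-weighted sum, NOT the real kernel.
    return torch.softmax(weights, dim=-1) @ value


def make_inputs():
    # Shapes are arbitrary and chosen only for the illustration.
    return torch.rand(1, 4, 8), torch.rand(1, 4, 4)


if __name__ == "__main__":
    results = run_on_devices(toy_op, make_inputs, available_devices())
    for device, diff in max_abs_diffs(results).items():
        print(f"{device}: max abs diff vs {next(iter(results))} = {diff:.3e}")
```

Small nonzero differences between CPU and GPU are normal for floating-point code, so it is usually more meaningful to compare GPU against GPU, or to use a tolerance such as `torch.allclose`, than to expect bitwise equality everywhere.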

Env:

- `transformers` version: 4.39.0
- Platform: Linux-6.5.0-1020-aws-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.23.4
- Safetensors version: 0.4.3
- Accelerate version: not installed
- Accelerate config: not found
- PyTorch version (GPU?): 2.1.0+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>

Do you have any ideas why that might happen in your env?

DonggeunYu commented on July 4, 2024

I don't know the cause.
I will also test it in various environments.

DonggeunYu commented on July 4, 2024

@qubvel
If the container image you used is public, can you share it?

qubvel commented on July 4, 2024

@DonggeunYu I was using an Amazon EC2 instance g4dn.12xlarge with Ubuntu 22.04
