Comments (5)

amyeroberts avatar amyeroberts commented on July 4, 2024 2

cc @qubvel If you have time to dig into this

from transformers.

qubvel avatar qubvel commented on July 4, 2024 1

Hi @DonggeunYu, thanks for reporting the issue!
Unfortunately, I was not able to reproduce it in my environments. I tried:

  • latest torch (2.3.0+cu121) + latest transformers (4.41.2)
  • specified torch (2.1.0+cu121) + latest transformers (4.41.2)
  • latest torch (2.3.0+cu121) + specified transformers (4.39.0)
  • specified torch (2.1.0+cu121) + specified transformers (4.39.0)

My setup has 4 Tesla T4 GPUs. I ran the script on each of them, and the results were always identical:

tensor([[[0.2500, 0.2500, 0.2500,  ..., 0.2500, 0.2500, 0.2500],
         [0.2500, 0.2500, 0.2500,  ..., 0.2500, 0.2500, 0.2500],
         [0.2500, 0.2500, 0.2500,  ..., 0.2500, 0.2500, 0.2500],
         ...,
         [0.2500, 0.2500, 0.2500,  ..., 0.2500, 0.2500, 0.2500],
         [0.2500, 0.2500, 0.2500,  ..., 0.2500, 0.2500, 0.2500],
         [0.2500, 0.2500, 0.2500,  ..., 0.2500, 0.2500, 0.2500]]],
       device='cuda:0')

Env:

- `transformers` version: 4.39.0
- Platform: Linux-6.5.0-1020-aws-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.23.4
- Safetensors version: 0.4.3
- Accelerate version: not installed
- Accelerate config: not found
- PyTorch version (GPU?): 2.1.0+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>

Do you have any ideas why that might happen in your env?
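
To make it easier to compare two environments like the one listed above, here is a minimal sketch that collects installed package versions and reports the differences. The helper names `collect_versions` and `diff_versions` and the package list are my own assumptions for illustration; they are not part of `transformers`.

```python
from importlib import metadata

# Packages named in the environment report above (an assumed list; extend as needed).
PACKAGES = ["transformers", "torch", "huggingface_hub", "safetensors", "accelerate"]


def collect_versions(packages=PACKAGES):
    """Return {package: installed version, or 'not installed'}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def diff_versions(mine, theirs):
    """Return {package: (mine, theirs)} for every package whose versions differ."""
    missing = "not installed"
    keys = sorted(set(mine) | set(theirs))
    return {
        key: (mine.get(key, missing), theirs.get(key, missing))
        for key in keys
        if mine.get(key, missing) != theirs.get(key, missing)
    }


if __name__ == "__main__":
    # Print the local environment in the same bullet style as the report above.
    for name, version in collect_versions().items():
        print(f"- {name}: {version}")
```

Running this in both environments and passing the two dictionaries to `diff_versions` should show which packages differ. It only covers Python packages, so driver, CUDA, and GPU model differences would still need to be checked separately.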

DonggeunYu avatar DonggeunYu commented on July 4, 2024

I don't know the cause.
I will also test it in various environments.

DonggeunYu avatar DonggeunYu commented on July 4, 2024

@qubvel
If the container image you used is public, can you share it?

qubvel avatar qubvel commented on July 4, 2024

@DonggeunYu I was using an Amazon EC2 g4dn.12xlarge instance running Ubuntu 22.04.
