Comments (8)

xrbailey commented on August 9, 2024

Hi, I'm seeing the exact same behavior and error, but somewhat randomly and seemingly related to the prompt content. I've tried several of the PyTorch models, 2b and 7b (not quantized), on a CUDA 11.8 device running Ubuntu 20.04. I can replicate it by adding some digits to the default prompt, like prompt="The meaning of life 189 is". Hopefully someone else can verify this.

xrbailey commented on August 9, 2024

Same observation here. Putting an integer into the prompt seems to be the most reliable way to make it happen. Perhaps it's a bug in the tokenizer, or some other state dependency in the way the model is held in memory, as @ShadovvSinger suggests?
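
For what it's worth, one quick way to rule the tokenizer in or out would be to inspect the token ids directly. A minimal sketch, assuming the checkpoint ships the usual sentencepiece tokenizer.model (the path is a placeholder):

import sentencepiece as spm

# Load the sentencepiece model that ships with the checkpoint (placeholder path).
sp = spm.SentencePieceProcessor(model_file='tokenizer.model')

# Compare a clean prompt against one containing digits.
for prompt in ('The meaning of life is', 'The meaning of life 189 is'):
    print(prompt, '->', sp.encode(prompt))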

pengchongjin commented on August 9, 2024

Hi, which model variant did you use? On which platform? Can you please share the command to reproduce this?

michaelmoynihan commented on August 9, 2024

Thanks @vupjing and @xrbailey for opening this issue and giving some details! I was able to reproduce your issue for 7b (I couldn't reproduce it for 2b). I used a machine with 8 A100 GPUs with CUDA 11.8 on Debian 10. My results can be replicated with the following steps:

# 1. Clone the repo
git clone https://github.com/google/gemma_pytorch.git
cd gemma_pytorch

# 2. Install kagglehub
pip install kagglehub

# 3. Copy kaggle.json into ~/.kaggle/kaggle.json

# 4. Download the checkpoint
python3 -c 'import kagglehub; kagglehub.model_download("google/gemma/pyTorch/7b-it")'

# 5. Set environment variables
VARIANT=7b
CKPT_PATH=${HOME}/.cache/kagglehub/models/google/gemma/pyTorch/7b-it/2/gemma-7b-it.ckpt
DOCKER_URI=gemma:${USER}
PROMPT="The meaning of life 189 is"

# 6. Build the container
docker build -f docker/Dockerfile ./ -t ${DOCKER_URI}

# 7. Run the container
docker run -t --rm \
    --gpus all \
    -v ${CKPT_PATH}:/tmp/ckpt \
    ${DOCKER_URI} \
    python scripts/run.py \
    --device=cuda \
    --ckpt=/tmp/ckpt \
    --variant="${VARIANT}" \
    --prompt="${PROMPT}"

I will investigate further to find the root cause. Thanks again for reporting!

vupjing commented on August 9, 2024

@pengchongjin The hardware is a 3080 Ti laptop (16 GB VRAM) running Windows, with torch 2.1.1+cu118 and torchvision 0.16.1+cu118, using the command below with the 7b quantized model:

python run.py --ckpt e:\ai\models\gemma\gemma-7b-it-quant.ckpt --variant 7b --device cuda --quant Yes --output_len 512 --prompt "the self-attention is important for transformer because"

@michaelmoynihan Even a single word like "attention" will trigger this issue; the command is below:

python run.py --ckpt e:\ai\models\gemma\gemma-7b-it-quant.ckpt --variant 7b --device cuda --quant Yes --output_len 102 --prompt "attention"
By the way, it runs longer before failing this time.
===DEBUG=== : hidden_states = tensor([[[-2.9688, 1.7910, 1.4717, ..., 0.7642, -3.4121, 1.3447]]],
device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[ 4.0430, 1.8535, 3.7500, ..., 0.5488, -0.8179, 1.3154]]],
device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[ 3.0566, -0.6260, 4.4648, ..., 0.4519, -0.7573, 0.5947]]],
device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[ 0.7437, 1.0332, 3.4062, ..., -0.0818, -3.3262, 1.1758]]],
device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[-1.1656e+01, 2.8095e-03, -1.0156e+00, ..., 2.7051e-01,
6.9629e-01, 2.0137e+00]]], device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[ 1.5420, -0.2026, -0.0492, ..., 2.1406, -3.2305, 3.0176]]],
device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[-8.5312, -1.7539, -2.5098, ..., 0.6948, -2.7070, 1.1201]]],
device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[-5.8203, 1.6299, -1.3154, ..., 0.5977, -0.5708, 1.4873]]],
device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[-3.3730, 2.0801, -0.2607, ..., 1.4248, -0.9785, 0.3665]]],
device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[-6.0781, 2.2480, 2.2324, ..., 1.4805, -3.2656, -1.2080]]],
device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[-10.0625, 0.2861, 0.3865, ..., 1.9512, -1.8047, 0.7646]]],
device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[-8.7969, 5.1992, -2.8613, ..., 0.9131, -0.2659, -0.9487]]],
device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[-13.9453, -0.3953, 3.3281, ..., 2.0703, -0.7788, -4.4219]]],
device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[-5.3047, -3.4746, 2.3848, ..., 0.7417, 2.1348, -7.4297]]],
device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[-7.5820, 2.7754, -1.3721, ..., 1.1064, 3.9473, -2.9414]]],
device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[-6.1758, 4.1367, -2.4004, ..., 2.0391, 9.1953, -5.2969]]],
device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[-4.4688, -0.9126, 0.8286, ..., 2.5215, 8.0469, -2.2344]]],
device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[-13.1641, 1.5215, -1.8213, ..., 0.4519, 2.0312, 2.4336]]],
device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[-1.1000e+01, 4.4102e+00, -4.2695e+00, ..., 2.1744e-03,
2.9121e+00, -1.8726e-01]]], device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[-10.6484, 0.5142, 0.1533, ..., 2.8203, 1.0557, -0.6807]]],
device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[-10.1953, 4.5781, -2.6484, ..., 1.5723, 5.1250, -2.4746]]],
device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[-13.0469, -0.3462, -3.1387, ..., 2.7539, 4.2305, 0.6416]]],
device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[-10.9922, 4.5391, -2.8750, ..., 2.2422, 2.9844, -3.6445]]],
device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[-8.1172, 3.5684, -2.0059, ..., 3.5410, 6.3711, -1.7500]]],
device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[-3.4609, -1.6553, 1.4521, ..., 2.9609, 5.6836, -2.3965]]],
device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[-14.7031, 0.3909, -1.1123, ..., 1.1816, 1.1211, 1.8271]]],
device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[-3.9023, 3.9316, 0.0205, ..., 0.4788, -0.2610, -0.8560]]],
device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[-6.7812, 3.9023, 1.5674, ..., 0.0728, -2.9688, -2.2754]]],
device='cuda:0', dtype=torch.float16)
===DEBUG=== : hidden_states = tensor([[[nan, nan, nan, ..., nan, nan, nan]]], device='cuda:0',
dtype=torch.float16)
Traceback (most recent call last):
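
For anyone trying to localize this, a forward hook can flag the first layer whose output goes non-finite, instead of printing every hidden state. A minimal sketch for an already-loaded model; the attribute path model.model.layers is an assumption about how the decoder stack is exposed:

import torch

def nan_hook(module, inputs, output):
    # Flag any module whose output contains NaN or Inf.
    hidden = output[0] if isinstance(output, tuple) else output
    if not torch.isfinite(hidden).all():
        print(f'non-finite output after {module.__class__.__name__}')

# model: a loaded GemmaForCausalLM; adjust the attribute path to the actual module tree.
for layer in model.model.layers:
    layer.register_forward_hook(nan_hook)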

yongzhuo commented on August 9, 2024

Could this be due to the dtype (bfloat16)?

ShadovvSinger commented on August 9, 2024

I've encountered a peculiar issue with the Gemma model that I'd like to share. I'm consistently getting a RuntimeError: probability tensor contains either inf, nan or an element < 0 under specific circumstances when using different prompts. Here are the details:

I use 3 different prompts:

prompt 1: 'Gemma is a large language model, please introduce it.'
The model generates a response successfully almost every time.

prompt 2: 'Gemma is an instruction tuned model with seven billion parameter, please introduce it.'
The success rate is variable; sometimes it works, sometimes it doesn't.

prompt 3: 'Gemma is an instruction tuned model with 7 billion parameter, please introduce it.'
It has never successfully generated a response.

I do not reload the model between runs; I only change the prompts and execute model.generate(prompt, device).

Interestingly, if I first run the model with prompt 1 and then with prompt 2, the second prompt almost always succeeds.

This behavior seems to suggest some form of state dependency or sensitivity to the numerical representation in the prompts. I thought this might be of interest to the community, especially if others are experiencing similar issues.
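
For reference, the experiment above amounts to a loop like the following, calling model.generate(prompt, device) exactly as described (model being the already-loaded 7b instance):

# model: the already-loaded model; device: the same cuda device as above.
prompts = [
    'Gemma is a large language model, please introduce it.',
    'Gemma is an instruction tuned model with seven billion parameter, please introduce it.',
    'Gemma is an instruction tuned model with 7 billion parameter, please introduce it.',
]
for prompt in prompts:
    try:
        print(model.generate(prompt, device))
    except RuntimeError as err:  # probability tensor contains either inf, nan ...
        print(f'failed: {err}')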

keltin13 commented on August 9, 2024

Try loading the model with dtype=torch.bfloat16 instead of float16.
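
A minimal sketch of that fix; torch.set_default_dtype is standard PyTorch, while the config field and load_weights call follow the repo's scripts/run.py (treat the exact names as assumptions):

import torch

from gemma.config import get_model_config  # import paths as used by scripts/run.py
from gemma.model import GemmaForCausalLM

model_config = get_model_config('7b')
model_config.dtype = 'bfloat16'  # rather than 'float16'

# Instantiate and load the weights in bfloat16.
torch.set_default_dtype(torch.bfloat16)
model = GemmaForCausalLM(model_config)
model.load_weights('/path/to/gemma-7b-it.ckpt')  # placeholder path
model = model.to('cuda').eval()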
