CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCr


Runtime error: index out of range when two sentences are given as input about guwenbert HOT 5 CLOSED

ethan-yt commented on August 15, 2024

Runtime error: index out of range when two sentences are given as input

from guwenbert.

Comments (5)

Ethan-yt commented on August 15, 2024

This error can actually be due to different reasons. It is recommended to debug CUDA errors by running the code on the CPU, if possible. If that’s not possible, try to execute the script via:

CUDA_LAUNCH_BLOCKING=1 python [YOUR_PROGRAM]
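Both suggestions can also be applied from inside a script or notebook. The following is a minimal sketch, assuming the environment variables are set before CUDA is first initialized (in practice, before `torch` is imported or any tensor is moved to a GPU). The helper name `configure_cuda_debug` is hypothetical.

```python
import os


def configure_cuda_debug(force_cpu: bool = False) -> dict:
    """Set CUDA debugging environment variables and return what was set.

    Hypothetical helper: call it before CUDA is first initialized,
    otherwise the settings may have no effect.
    """
    settings = {"CUDA_LAUNCH_BLOCKING": "1"}  # synchronous kernel launches
    if force_cpu:
        # An empty device list hides every GPU, so the code runs on the CPU.
        settings["CUDA_VISIBLE_DEVICES"] = ""
    os.environ.update(settings)
    return settings


if __name__ == "__main__":
    print(configure_cuda_debug(force_cpu=True))
```

With synchronous launches the error is reported at the call that caused it, which usually makes the traceback point at the real failing line.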


Lirsakura commented on August 15, 2024

This error can actually be due to different reasons. It is recommended to debug CUDA errors by running the code on the CPU, if possible. If that’s not possible, try to execute the script via:

CUDA_LAUNCH_BLOCKING=1 python [YOUR_PROGRAM]

After disabling the GPU, the error looks like this:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-20-ce6b8456fe47> in <module>
     32         time.sleep(0.2)
     33 
---> 34         train_loss, train_acc = train_model(model, train_loader)
     35         val_loss, val_acc = test_model(model, val_loader)
     36 

<ipython-input-19-7219d0502762> in train_model(model, train_loader)
     30 
     31         with autocast():  # train with half precision
---> 32             output = model(input_ids, attention_mask, token_type_ids).logits
     33 
     34             loss = criterion(output, y) / CFG['accum_iter']

/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/opt/conda/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py in forward(self, input_ids, token_type_ids, attention_mask, labels, position_ids, head_mask, inputs_embeds, output_attentions, output_hidden_states, return_dict)
   1243         )
   1244 
-> 1245         outputs = self.roberta(
   1246             flat_input_ids,
   1247             position_ids=flat_position_ids,

/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/opt/conda/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
    803         head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
    804 
--> 805         embedding_output = self.embeddings(
    806             input_ids=input_ids,
    807             position_ids=position_ids,

/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/opt/conda/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py in forward(self, input_ids, token_type_ids, position_ids, inputs_embeds, past_key_values_length)
    116         if inputs_embeds is None:
    117             inputs_embeds = self.word_embeddings(input_ids)
--> 118         token_type_embeddings = self.token_type_embeddings(token_type_ids)
    119 
    120         embeddings = inputs_embeds + token_type_embeddings

/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/opt/conda/lib/python3.8/site-packages/torch/nn/modules/sparse.py in forward(self, input)
    122 
    123     def forward(self, input: Tensor) -> Tensor:
--> 124         return F.embedding(
    125             input, self.weight, self.padding_idx, self.max_norm,
    126             self.norm_type, self.scale_grad_by_freq, self.sparse)

/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   1850         # remove once script supports set_grad_enabled
   1851         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1852     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   1853 
   1854 

IndexError: index out of range in self
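The last frames show that the failure comes from an embedding lookup, which raises this error when some index falls outside the range of the table. A rough way to find the offending input is to compare the indices in each tensor with the size of its table. The sketch below does this in plain Python on nested lists (what `.tolist()` returns). `find_bad_indices` and the table sizes in the example are hypothetical; the real sizes would come from the model config (for example `config.vocab_size` and `config.type_vocab_size`).

```python
from typing import Iterator, List


def flatten(values) -> Iterator[int]:
    """Yield every integer from an arbitrarily nested list or tuple."""
    for v in values:
        if isinstance(v, (list, tuple)):
            yield from flatten(v)
        else:
            yield v


def find_bad_indices(ids, num_embeddings: int) -> List[int]:
    """Return the sorted distinct indices a table of this size cannot look up."""
    return sorted({i for i in flatten(ids) if i < 0 or i >= num_embeddings})


if __name__ == "__main__":
    # Hypothetical batch: one example, one choice, four tokens.
    token_type_ids = [[[0, 0, 1, 1]]]
    print(find_bad_indices(token_type_ids, num_embeddings=1))  # [1]
    print(find_bad_indices(token_type_ids, num_embeddings=2))  # []
```

An empty result for every input means the indices fit their tables, and the cause has to be looked for elsewhere.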

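I have already set token_type_ids to all zeros (after token_type_ids = torch.tensor(token_type_ids)), but I still cannot tell what is wrong. The data-loading code is as follows: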

import torch

# `tokenizer` and `CFG` are defined elsewhere in the notebook.
def collate_fn(data):
    input_ids, attention_mask, token_type_ids = [], [], []
    for x in data:
        # Encode the sentence pair: x[1] is the first segment, x[0] the second.
        text = tokenizer(x[1], text_pair=x[0], padding='max_length', truncation=True, max_length=CFG['max_len'], return_tensors='pt')
        input_ids.append(text['input_ids'].tolist())
        attention_mask.append(text['attention_mask'].tolist())
        token_type_ids.append(text['token_type_ids'].tolist())
    input_ids = torch.tensor(input_ids)
    attention_mask = torch.tensor(attention_mask)
    token_type_ids = torch.tensor(token_type_ids)
    # Force every segment id to 0.
    token_type_ids = torch.zeros_like(token_type_ids)
    # The label is the last field of each sample.
    label = torch.tensor([x[-1] for x in data])
    # print(token_type_ids)
    return input_ids, attention_mask, token_type_ids, label

How should I change this?
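One detail in the traceback may be worth checking. The outer `forward` frame lists its parameters as `input_ids, token_type_ids, attention_mask, ...`, while the training loop calls `model(input_ids, attention_mask, token_type_ids)` positionally. With that order the attention mask would be bound to `token_type_ids`, so zeroing `token_type_ids` in `collate_fn` would not change what reaches the token-type embedding. The sketch below uses a stand-in function with the same parameter order (it is not the transformers implementation, and the values are made up) to show the binding.

```python
def forward(input_ids, token_type_ids=None, attention_mask=None, labels=None):
    """Stand-in with the parameter order shown in the traceback; returns the bindings."""
    return {
        "input_ids": input_ids,
        "token_type_ids": token_type_ids,
        "attention_mask": attention_mask,
    }


input_ids = [101, 7, 8, 102]   # hypothetical token ids
attention_mask = [1, 1, 1, 1]
token_type_ids = [0, 0, 0, 0]

# Positional call, as in the training loop: the mask lands in token_type_ids.
positional = forward(input_ids, attention_mask, token_type_ids)

# Keyword call: each value reaches the parameter of the same name.
by_keyword = forward(
    input_ids=input_ids,
    attention_mask=attention_mask,
    token_type_ids=token_type_ids,
)

if __name__ == "__main__":
    print(positional["token_type_ids"])  # [1, 1, 1, 1]
    print(by_keyword["token_type_ids"])  # [0, 0, 0, 0]
```

In the training loop this would correspond to `model(input_ids=input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)`, assuming the model class is the one shown in the traceback.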


Lirsakura commented on August 15, 2024

This error can actually be due to different reasons. It is recommended to debug CUDA errors by running the code on the CPU, if possible. If that’s not possible, try to execute the script via:

CUDA_LAUNCH_BLOCKING=1 python [YOUR_PROGRAM]

After running this command, the error is:
RuntimeError: CUDA error: device-side assert triggered


Ethan-yt commented on August 15, 2024

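I tested this on Colab and it works without problems. Please take another look at your code. If it still fails, create a minimal example on Colab that reproduces the problem and share it, so the issue can be located and fixed more easily.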

https://colab.research.google.com/drive/1rvMqxEz3ayA6b583dGEXNL0Gu1ZgeL6G?usp=sharing


Ethan-yt commented on August 15, 2024

Duplicate of #11

