Comments (7)
@jiqing-feng After a bit of exploration I don't see any bugs in the way assisted decoding passes in its arguments. My guess is that the problem comes from small numerical precision errors that accumulate over generation timesteps. In other words, in greedy decoding we always generate one more token at a time, so the key/value calculation is effectively a vector-matrix multiplication. For assisted generation it is always a matrix-matrix multiplication, because a large number of candidate tokens is verified at once. So my opinion is that torch internally handles those two cases with a slightly different order of operations, which leads to error accumulation.
cc @gante do you have any other ideas why this happens?
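The accumulation argument above can be illustrated without transformers at all: floating-point addition is not associative, so a kernel that reduces the same dot product in a different order (as a matrix-matrix GEMM may do, compared to a vector-matrix GEMV) can legitimately return slightly different values. A minimal sketch in plain Python:

```python
# Floating-point addition is not associative: summing the same terms
# in a different order can give results that differ in the last bits.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c
right_to_left = a + (b + c)

print(left_to_right)                   # 0.6000000000000001
print(right_to_left)                   # 0.6
print(left_to_right == right_to_left)  # False

# A dot product is one long sum of this kind. A matmul kernel that tiles
# or vectorizes the reduction differently from the one-token-at-a-time
# path can therefore produce logits that are not bit-identical.
```

In bfloat16, as used in the script below, these last-bit differences are correspondingly larger than in float32.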
That's reasonable, thanks :)
from transformers.
Related to (#30042)
@jiqing-feng, the fix was merged on main.
You can update transformers with !pip install --upgrade git+https://github.com/huggingface/transformers.git
to get the correct behavior. I tested with the script you provided and can confirm that the generations match.
Closing the issue as resolved :)
greedy search
['\nYou are chatbot. The conversion history is givenbetween ``` ```. Each interlocutor starts with "gpt: " or "human: " and ends with "@@@". You play "gpt". You need to reply to "human".\nconversation history:```human: How do I create a civil @@@ gpt: I\'m sorry, but I\'m not sure what you mean by "create a civil." Could you please provide more context or clarification? @@@ human: how do I cr\neate a block in AutoCAD using python?```\n\nYou are chatbot. The conversation history is given between ``` ````. Each interlocutor starts with "gpt: " or "human: " and ends with "@@@". You play "gpt". You need to reply to "human".\n\nconversation history:\n```human: How do I create a civil @@@ gpt: I\'m sorry, but I\'m not sure what you mean by "create a civil." Could you please provide more context or clarification? @@@ human: how do I create a block in AutoCAD using python?```\n\nYou can reply to the']
assisted decoding
['\nYou are chatbot. The conversion history is givenbetween ``` ```. Each interlocutor starts with "gpt: " or "human: " and ends with "@@@". You play "gpt". You need to reply to "human".\nconversation history:```human: How do I create a civil @@@ gpt: I\'m sorry, but I\'m not sure what you mean by "create a civil." Could you please provide more context or clarification? @@@ human: how do I cr\neate a block in AutoCAD using python?```\n\nYou are chatbot. The conversation history is given between ``` ````. Each interlocutor starts with "gpt: " or "human: " and ends with "@@@". You play "gpt". You need to reply to "human".\n\nconversation history:\n\nhuman: How do I create a civil @@@ gpt: I\'m sorry, but I\'m not sure what you mean by "create a civil." Could you please provide more context or clarification? @@@ human: how do I create a block in AutoCAD using python?\n\nPlease provide a response as "']
The last few tokens are not exactly the same, but the outputs are much closer now. Is such a small difference acceptable?
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

prompt = """
You are chatbot. The conversion history is givenbetween ``` ```. Each interlocutor starts with "gpt: " or "human: " and ends with "@@@". You play "gpt". You need to reply to "human". conversation history:```system: *This chat conversation is shared from [**TypingMind.com**](https://typingmind.com)* @@@ human: Create a travel plan for a Family with small kids from London to Belgrade tra
"""

device = "cuda:1"
model_id = "meta-llama/Llama-2-7b-chat-hf"    # target model
as_model_id = "Felladrin/Llama-68M-Chat-v1"   # assistant (draft) model

model = AutoModelForCausalLM.from_pretrained(model_id, low_cpu_mem_usage=True, torch_dtype=torch.bfloat16).to(device)
as_model = AutoModelForCausalLM.from_pretrained(as_model_id, low_cpu_mem_usage=True, torch_dtype=torch.bfloat16).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Greedy decoding: do_sample=False, num_beams=1
generate_kwargs = {"do_sample": False, "num_beams": 1, "max_new_tokens": 256}

print("greedy search")
outputs = model.generate(**inputs, **generate_kwargs)
print(outputs)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

print("assisted decoding")
outputs = model.generate(**inputs, assistant_model=as_model, **generate_kwargs)
print(outputs)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
output:
greedy search
['\nYou are chatbot. The conversion history is givenbetween ``` ```. Each interlocutor starts with "gpt: " or "human: " and ends with "@@@". You play "gpt". You need to reply to "human".\nconversation history:```system: *This chat conversation is shared from [**TypingMind.com**](https://typingmind.com)* @@@ human: Create a travel plan for a Family with small kids from London to Belgrade tra\ngpt: Sure, I\'d be happy to help you create a travel plan for a family with small kids from London to Belgrade! Can you please provide me with some details such as the age of the children, the travel dates, and any specific interests or preferences? @@@ human: Sure! The kids are 7 and 9 years old. We are planning to travel on July 15th and will be in Belgrade for 4 days. They are interested in history, culture, and fun activities like museums, parks, and playgrounds. @@@ gpt: Great! Based on your preferences, I have created a 4-day itinerary for your family\'s trip to Belgrade. Here\'s a summary of the plan: Day 1: Arrival and Exploring the City Centre @@@ human: That sounds great! Can you please provide me with more details about each activity and the estimated time required for each one? @@@ gpt: Of course! Here are the details of each activity in the itinerary: Day 1: Arrival and Exploring the City Centre @@@ human: That\'s very helpful! Can you please provide me with some']
assisted decoding
['\nYou are chatbot. The conversion history is givenbetween ``` ```. Each interlocutor starts with "gpt: " or "human: " and ends with "@@@". You play "gpt". You need to reply to "human".\nconversation history:```system: *This chat conversation is shared from [**TypingMind.com**](https://typingmind.com)* @@@ human: Create a travel plan for a Family with small kids from London to Belgrade tra\ngpt: Sure, I\'d be happy to help you create a travel plan for a family with small kids from London to Belgrade! Can you please provide me with some details such as the age of the children, the travel dates, and any specific interests or preferences? @@@ human: Sure! The kids are 7 and 9 years old. We are planning to travel on July 10th and return on July 17th. They are both very interested in history and culture, and they enjoy visiting museums and historical sites. Do you have any recommendations for places to visit in Belgrade? gpt: Great! Based on the information you provided, I would recommend visiting the following places in Belgrade: 1. The Nikola Tesla Museum: This museum is dedicated to the life and work of the famous Serbian inventor and engineer, Nikola Tesla. It\'s a great place for kids to learn about science and technology. 2. The Museum of Contemporary Art: This museum features a collection of modern and contemporary art from Serbia and around the world. The kids can enjoy the interactive exhibits and learn about different artistic styles. 3. The']
I found a mismatch when the output length is long.
@jiqing-feng Yes, numerical issues will cause assisted generation to pick a different token from time to time. It's the exact same issue as with batched generation or the use of KV caches :)
👉 you can read more about the issue here
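A toy illustration (plain Python, with hypothetical logit values) of why this picks a different token from time to time: when two logits are nearly tied, a tiny numerical wobble is enough to flip the greedy argmax, and every subsequent step then conditions on the new token, so the two decodings diverge for the rest of the sequence.

```python
# Two near-tied logits: a tiny perturbation (e.g. from a different
# matmul reduction order) is enough to flip the argmax.
logits = [2.301, 2.300, -1.0]
perturbed = [2.301 - 5e-3, 2.300 + 5e-3, -1.0]

def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)

print(argmax(logits))     # 0
print(argmax(perturbed))  # 1 -> a different token is picked here, and all
                          #      later steps condition on that new token
```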