
Comments (7)

jiqing-feng avatar jiqing-feng commented on May 22, 2024 1

> @jiqing-feng After a bit of exploration I don't see any bugs in the way assisted decoding passes its arguments. My guess is that the problem comes from small numerical precision errors that accumulate over generation timesteps. In other words, greedy decoding always generates one token at a time, so the key/value calculation is a vector-matrix multiplication, while assisted generation is always a matrix-matrix multiplication because a large number of candidate tokens are verified at once. So my opinion is that torch internally handles those with slightly different operation orders, which leads to error accumulation.
>
> cc @gante do you have any other ideas why this happens?

It is reasonable, thanks :)

from transformers.

zucchini-nlp avatar zucchini-nlp commented on May 22, 2024

Related to #30042


zucchini-nlp avatar zucchini-nlp commented on May 22, 2024

@jiqing-feng , the fix was merged on main.

You can update transformers with `!pip install --upgrade git+https://github.com/huggingface/transformers.git` to get the correct behavior. I tested with the script you provided and can confirm that the generations match.

Closing issue as resolved :)


jiqing-feng avatar jiqing-feng commented on May 22, 2024
greedy search

````text
['\nYou are chatbot. The conversion history is givenbetween ``` ```. Each interlocutor starts with "gpt: " or "human: " and ends with "@@@". You play "gpt". You need to reply to "human".\nconversation history:```human: How do I create a civil @@@ gpt: I\'m sorry, but I\'m not sure what you mean by "create a civil." Could you please provide more context or clarification? @@@ human: how do I cr\neate a block in AutoCAD using python?```\n\nYou are chatbot. The conversation history is given between ``` ````. Each interlocutor starts with "gpt: " or "human: " and ends with "@@@". You play "gpt". You need to reply to "human".\n\nconversation history:\n```human: How do I create a civil @@@ gpt: I\'m sorry, but I\'m not sure what you mean by "create a civil." Could you please provide more context or clarification? @@@ human: how do I create a block in AutoCAD using python?```\n\nYou can reply to the']
````

assisted decoding

````text
['\nYou are chatbot. The conversion history is givenbetween ``` ```. Each interlocutor starts with "gpt: " or "human: " and ends with "@@@". You play "gpt". You need to reply to "human".\nconversation history:```human: How do I create a civil @@@ gpt: I\'m sorry, but I\'m not sure what you mean by "create a civil." Could you please provide more context or clarification? @@@ human: how do I cr\neate a block in AutoCAD using python?```\n\nYou are chatbot. The conversation history is given between ``` ````. Each interlocutor starts with "gpt: " or "human: " and ends with "@@@". You play "gpt". You need to reply to "human".\n\nconversation history:\n\nhuman: How do I create a civil @@@ gpt: I\'m sorry, but I\'m not sure what you mean by "create a civil." Could you please provide more context or clarification? @@@ human: how do I create a block in AutoCAD using python?\n\nPlease provide a response as "']
````

The last few tokens still differ, but the outputs are much closer now. Is a small difference like this expected?


jiqing-feng avatar jiqing-feng commented on May 22, 2024
````python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

prompt = """
You are chatbot. The conversion history is givenbetween ``` ```. Each interlocutor starts with "gpt: " or "human: " and ends with "@@@". You play "gpt". You need to reply to "human".                       conversation history:```system: *This chat conversation is shared from [**TypingMind.com**](https://typingmind.com)* @@@ human: Create a travel plan for a Family with small kids from London to Belgrade tra
"""

device = "cuda:1"
model_id = "meta-llama/Llama-2-7b-chat-hf"
as_model_id = "Felladrin/Llama-68M-Chat-v1"
model = AutoModelForCausalLM.from_pretrained(model_id, low_cpu_mem_usage=True, torch_dtype=torch.bfloat16).to(device)
as_model = AutoModelForCausalLM.from_pretrained(as_model_id, low_cpu_mem_usage=True, torch_dtype=torch.bfloat16).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer(prompt, return_tensors="pt").to(device)

generate_kwargs = {"do_sample": False, "num_beams": 1, "max_new_tokens": 256}

print("greedy search")
outputs = model.generate(**inputs, **generate_kwargs)
print(outputs)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

print("assisted decoding")
outputs = model.generate(**inputs, assistant_model=as_model, **generate_kwargs)
print(outputs)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
````

output:

greedy search

````text
['\nYou are chatbot. The conversion history is givenbetween ``` ```. Each interlocutor starts with "gpt: " or "human: " and ends with "@@@". You play "gpt". You need to reply to "human".    conversation history:```system: *This chat conversation is shared from [**TypingMind.com**](https://typingmind.com)* @@@ human: Create a travel plan for a Family with small kids from London to Belgrade tra\ngpt: Sure, I\'d be happy to help you create a travel plan for a family with small kids from London to Belgrade! Can you please provide me with some details such as the age of the children, the travel dates, and any specific interests or preferences? @@@ human: Sure! The kids are 7 and 9 years old. We are planning to travel on July 15th and will be in Belgrade for 4 days. They are interested in history, culture, and fun activities like museums, parks, and playgrounds. @@@ gpt: Great! Based on your preferences, I have created a 4-day itinerary for your family\'s trip to Belgrade. Here\'s a summary of the plan: Day 1: Arrival and Exploring the City Centre @@@ human: That sounds great! Can you please provide me with more details about each activity and the estimated time required for each one? @@@ gpt: Of course! Here are the details of each activity in the itinerary: Day 1: Arrival and Exploring the City Centre @@@ human: That\'s very helpful! Can you please provide me with some']
````

assisted decoding

````text
['\nYou are chatbot. The conversion history is givenbetween ``` ```. Each interlocutor starts with "gpt: " or "human: " and ends with "@@@". You play "gpt". You need to reply to "human".    conversation history:```system: *This chat conversation is shared from [**TypingMind.com**](https://typingmind.com)* @@@ human: Create a travel plan for a Family with small kids from London to Belgrade tra\ngpt: Sure, I\'d be happy to help you create a travel plan for a family with small kids from London to Belgrade! Can you please provide me with some details such as the age of the children, the travel dates, and any specific interests or preferences? @@@ human: Sure! The kids are 7 and 9 years old. We are planning to travel on July 10th and return on July 17th. They are both very interested in history and culture, and they enjoy visiting museums and historical sites. Do you have any recommendations for places to visit in Belgrade? gpt: Great! Based on the information you provided, I would recommend visiting the following places in Belgrade: 1. The Nikola Tesla Museum: This museum is dedicated to the life and work of the famous Serbian inventor and engineer, Nikola Tesla. It\'s a great place for kids to learn about science and technology. 2. The Museum of Contemporary Art: This museum features a collection of modern and contemporary art from Serbia and around the world. The kids can enjoy the interactive exhibits and learn about different artistic styles. 3. The']
````

I found a mismatch when the output length is long.


zucchini-nlp avatar zucchini-nlp commented on May 22, 2024

@jiqing-feng After a bit of exploration I don't see any bugs in the way assisted decoding passes its arguments. My guess is that the problem comes from small numerical precision errors that accumulate over generation timesteps. In other words, greedy decoding always generates one token at a time, so the key/value calculation is a vector-matrix multiplication, while assisted generation is always a matrix-matrix multiplication because a large number of candidate tokens are verified at once. So my opinion is that torch internally handles those with slightly different operation orders, which leads to error accumulation.
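The order-of-operations point can be sketched in pure Python (a toy model, not the actual torch kernels; the seed and vector length here are arbitrary): accumulating the same dot product sequentially versus pairwise typically gives results that are very close but not bitwise identical.

```python
import random

random.seed(0)
n = 4096
w = [random.uniform(-1, 1) for _ in range(n)]
x = [random.uniform(-1, 1) for _ in range(n)]

# Sequential accumulation, like processing one token at a time.
seq = 0.0
for a, b in zip(x, w):
    seq += a * b

# Pairwise (tree) accumulation, the order a blocked/batched kernel might use.
def pairwise(vals):
    if len(vals) == 1:
        return vals[0]
    mid = len(vals) // 2
    return pairwise(vals[:mid]) + pairwise(vals[mid:])

tree = pairwise([a * b for a, b in zip(x, w)])

# Close, but usually not bitwise identical.
print(seq, tree)
```

When such tiny differences land near a tie between the top two logits, greedy and assisted decoding can pick different tokens, and the texts diverge from that point on.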

cc @gante do you have any other ideas why this happens?


gante avatar gante commented on May 22, 2024

@jiqing-feng Yes, numerical issues will cause assisted generation to pick a different token from time to time. It's the exact same issue as with batched generation or the use of KV caches :)
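The root cause is the classic non-associativity of floating-point addition: the same numbers summed in a different order can give different results (the constants below are arbitrary, chosen to expose the rounding):

```python
a, b, c = 1e8, -1e8, 0.1

left = (a + b) + c   # 0.0 + 0.1 -> 0.1 exactly
right = a + (b + c)  # rounding error in (b + c) leaks into the final sum

print(left, right)
print(left == right)  # False
```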

👉 you can read more about the issue here


