Comments (5)
@aFernandezEspinosa No idea 🤷♀️
from transformers.
Hi @aFernandezEspinosa, thanks for raising an issue!
As the two code examples are almost identical, it's very difficult for us to infer the cause or help debug here. If you run the second code example, i.e. without the print statement, with the `T5Config` import, and without any additional code, does it run without issue?
Hi @amyeroberts
```python
import faiss
import pickle
import openai
import numpy as np
import os
import sys
from transformers import T5ForConditionalGeneration, AutoTokenizer

# Set the environment variable to avoid OpenMP conflicts
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

# Load the generation model and tokenizer
generation_model_name = "t5-small"
generation_tokenizer = AutoTokenizer.from_pretrained(generation_model_name)
generation_model = T5ForConditionalGeneration.from_pretrained(generation_model_name)

print("Something is happening")

# Get the parent directory and add it to sys.path
parent_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), os.pardir))
sys.path.append(parent_dir)

import config

try:
    with open(".openaikey", "r") as f:
        openai_api_key = f.read().strip()
except OSError:
    raise ValueError("Could not read OpenAI API key. Make sure there is a file named .openaikey containing the key")

if not openai_api_key:
    raise ValueError("OpenAI API key is missing")

openai.api_key = openai_api_key

# Load the FAISS index and documents list from disk
index = faiss.read_index("faiss_index.bin")
with open("documents.pkl", "rb") as f:
    documents = pickle.load(f)

# Function to get embeddings from OpenAI
def get_openai_embeddings(texts):
    response = openai.embeddings.create(
        input=texts,
        model="text-embedding-ada-002"
    )
    response_dict = response.to_dict()
    embeddings = [item['embedding'] for item in response_dict['data']]
    return np.array(embeddings)

# Function to retrieve the most relevant document
def retrieve_most_relevant(query, index, documents):
    query_embedding = get_openai_embeddings([query])[0]
    D, I = index.search(np.array([query_embedding]), k=1)  # Retrieve the most relevant document
    relevant_doc = documents[I[0][0]]
    return relevant_doc

goal = "Navigate to search"
relevant_doc = retrieve_most_relevant(goal, index, documents)
# print(relevant_doc)

# Function to generate Appium commands based on the relevant document
def generate_appium_commands(goal, relevant_doc):
    print(goal)
    input_text = f"goal: {goal} context: {relevant_doc}"
    inputs = generation_tokenizer(input_text, return_tensors='pt', max_length=512, truncation=True)
    outputs = generation_model.generate(**inputs, max_new_tokens=50)
    commands = generation_tokenizer.decode(outputs[0], skip_special_tokens=True)
    return commands.split('\n')

commands = generate_appium_commands(goal, relevant_doc)
print(commands)
```
This is my script. I have tried moving the initialization of the tokenizer and the `T5ForConditionalGeneration` to different places, and I'm also using `print("Something is happening")` to pinpoint the exact location of the error. It always breaks on the call
`generation_model = T5ForConditionalGeneration.from_pretrained(generation_model_name)`
Hopefully this helps to provide more clarity.
Hi @aFernandezEspinosa, thanks for sharing your script, I was able to replicate the segfault.
I was able to isolate the issue to the `faiss` import. If I remove it, the following lines run without issue:

```python
import pickle
import openai
import numpy as np
import os
import sys
from transformers import T5ForConditionalGeneration, AutoTokenizer

# Set the environment variable to avoid OpenMP conflicts
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

# Load the generation model and tokenizer
generation_model_name = "t5-small"
generation_tokenizer = AutoTokenizer.from_pretrained(generation_model_name)
generation_model = T5ForConditionalGeneration.from_pretrained(generation_model_name)

print("Something is happening")
```
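A quick way to double-check that kind of isolation, sketched here with only the standard library, is to try each suspect import in a fresh interpreter: a segfault kills the child process with a signal, which shows up as a negative return code on POSIX systems (the helper name `import_ok` is mine, not from the thread):

```python
import subprocess
import sys

def import_ok(module: str) -> bool:
    """Try `import <module>` in a fresh interpreter.

    A clean import exits 0; an ImportError exits 1; a segfault
    kills the child with a signal, giving a negative return code.
    """
    proc = subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        capture_output=True,
    )
    return proc.returncode == 0

# Probe each import from the script in isolation:
for mod in ["pickle", "os", "sys", "faiss"]:
    print(mod, import_ok(mod))
```

Because each probe runs in its own process, a crashing import can't take down the probing script itself, so you can test every import in one pass.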
Oh interesting, I'll give it a try. Do you know why this might be happening? Interestingly, moving the faiss import after the transformers import also fixes the issue.