eladlev / autoprompt
A framework for prompt tuning using Intent-based Prompt Calibration
License: Apache License 2.0
Hi, could you provide an example of what config_ranking.yml should look like for generation tasks?
With the default config_ranking.yml (as of now in the main branch), it fails with:
Traceback (most recent call last):
File "path_to_repo/AutoPrompt/run_generation_pipeline.py", line 64, in <module>
best_prompt = ranker_pipeline.run_pipeline(opt.num_ranker_steps)
File "path_to_repo/AutoPrompt/optimization_pipeline.py", line 272, in run_pipeline
stop_criteria = self.step()
File "path_to_repo/AutoPrompt/optimization_pipeline.py", line 252, in step
self.eval.add_history(self.cur_prompt, self.task_description)
File "path_to_repo/AutoPrompt/eval/evaluator.py", line 112, in add_history
conf_matrix = confusion_matrix(self.dataset['annotation'],
File "path_to_env/miniconda3/envs/AutoPrompt/lib/python3.10/site-packages/sklearn/metrics/_classification.py", line 317, in confusion_matrix
y_type, y_true, y_pred = _check_targets(y_true, y_pred)
File "path_to_env/miniconda3/envs/AutoPrompt/lib/python3.10/site-packages/sklearn/metrics/_classification.py", line 95, in _check_targets
raise ValueError(
ValueError: Classification metrics can't handle a mix of unknown and multiclass targets
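For context, this sklearn error is raised when the two label arrays are detected as different target types. A minimal sketch reproducing it (an assumption: the ranker's 'annotation' column is still unannotated, i.e. all None, while the predictions follow the ranking schema):

from sklearn.metrics import confusion_matrix

# Hypothetical repro of the ValueError above: y_true of all None is typed as
# "unknown", while y_pred of string labels is typed as "multiclass".
y_true = [None, None, None]   # e.g. an empty 'annotation' column
y_pred = ["1", "3", "5"]      # ranking-style predictions
confusion_matrix(y_true, y_pred)
# ValueError: Classification metrics can't handle a mix of unknown and multiclass targets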
If I add an eval section to config_ranking.yml with function_name: 'ranking':

eval:
  function_name: 'ranking'
  error_threshold: 4

Then it fails with:
Traceback (most recent call last):
File "path_to_repo/AutoPrompt/run_generation_pipeline.py", line 53, in <module>
ranker_pipeline = OptimizationPipeline(ranker_config_params, output_path=os.path.join(opt.output_dump, 'ranker'))
File "path_to_repo/AutoPrompt/optimization_pipeline.py", line 58, in __init__
self.eval = Eval(config.eval, self.meta_chain.error_analysis, self.dataset.label_schema)
File "path_to_repo/AutoPrompt/eval/evaluator.py", line 19, in __init__
self.score_func = self.get_eval_function(config)
File "path_to_repo/AutoPrompt/eval/evaluator.py", line 39, in get_eval_function
return utils.set_ranking_function(config.function_params)
AttributeError: 'EasyDict' object has no attribute 'function_params'. Did you mean: 'function_name'?
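The AttributeError itself is easy to reproduce: get_eval_function calls utils.set_ranking_function(config.function_params), but the eval section above does not define function_params. A minimal sketch:

from easydict import EasyDict

# The eval config above only defines function_name and error_threshold,
# so accessing function_params fails.
eval_config = EasyDict({'function_name': 'ranking', 'error_threshold': 4})
eval_config.function_params
# AttributeError: 'EasyDict' object has no attribute 'function_params'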
I followed the documentation and used run_generation_pipeline.py to generate an optimized prompt. However, the optimized prompt ends up far from the initial prompt, and many details are dropped. I attached a screenshot of the original prompt (which is about parsing COBOL and writing an analysis report), a screenshot of the optimized prompt (which ignores many of the parsing details), and the related output files:
output.log
config_yaml.txt
Please tell me why this is happening and how I can improve it. Thank you.
Currently, not all output schemas support customized output parsers.
For example, in the classification output schemes only JSON schemas are available, meaning the prompt only works well for models that support JSON-schema output.
To be compatible with other LLMs, a customized output parser is required.
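As an illustration, such a parser can be as simple as a function that maps free-text model output onto the label schema instead of expecting strict JSON (a minimal sketch; parse_label is a hypothetical helper, not part of the repo):

import re

def parse_label(raw_output: str, label_schema=("Yes", "No")) -> str:
    """Map free-text LLM output onto one of the allowed labels."""
    for label in label_schema:
        if re.search(rf"\b{re.escape(label)}\b", raw_output, flags=re.IGNORECASE):
            return label
    return "Unknown"  # fallback when no label can be recovered; handle upstream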
config/config_default.yml
llm:
  type: 'HuggingFacePipeline'
  name: 'Qwen-14B-Chat'
  max_new_tokens: 4096
command:
> python run_pipeline.py
--prompt "Does this movie review contain a spoiler? answer Yes or No"
--task_description "Assistant is an expert classifier that will classify a movie review, and let the user know if it contains a spoiler for the reviewed movie or not."
--num_steps 30
result:
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████| 15/15 [00:07<00:00, 1.96it/s]Using pad_token, but it is not set yet.
The model 'QWenLMHeadModel' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'FalconForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MptForCausalLM', 'MusicgenForCausalLM', 'MvpForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'RwkvForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].
Does this mean Qwen is not supported? And what does "text-generation" mean here?
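For reference, the message above is a transformers warning: QWenLMHeadModel is a custom remote-code class that is not in the library's built-in causal-LM registry, so the text-generation pipeline prints this list even though generation can still proceed. A minimal sketch of loading Qwen directly with that pipeline (assuming the Hugging Face model id Qwen/Qwen-14B-Chat):

from transformers import pipeline

# Qwen ships its own model class, so trust_remote_code is required.
generator = pipeline(
    "text-generation",
    model="Qwen/Qwen-14B-Chat",
    trust_remote_code=True,
    device_map="auto",  # requires the accelerate package
)
print(generator("Hello", max_new_tokens=32)[0]["generated_text"])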
The output of sample_batches doesn't have the 'samples' key:
samples_list = [
element for sublist in samples_batches for element in sublist["samples"]
]
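A defensive variant of that comprehension (a sketch, not the repo's fix) skips batches whose LLM response could not be parsed into a 'samples' field instead of raising a KeyError:

samples_list = [
    element
    for sublist in samples_batches
    if "samples" in sublist          # skip malformed batches
    for element in sublist["samples"]
]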
Although OpenAI currently offers the most cutting-edge LLMs, there is no doubt that supporting Gemini would lower the barrier to entry.
Argilla server version: 1.25.0
I have updated the code repository to the latest version, and I see that the client version is also 1.25.0. The server starts fine, but then the following error is reported:
┌───────────────────── Traceback (most recent call last) ─────────────────────┐
│ C:\code\owncode\AutoPrompt\run_pipeline.py:44 in <module> │
│ │
│ 41 pipeline = OptimizationPipeline(config_params, task_description, initi │
│ 42 if (opt.load_path != ''): │
│ 43 │ pipeline.load_state(opt.load_path) │
│ > 44 best_prompt = pipeline.run_pipeline(opt.num_steps) │
│ 45 print('\033[92m' + 'Calibrated prompt score:', str(best_prompt['score' │
│ 46 print('\033[92m' + 'Calibrated prompt:', best_prompt['prompt'] + '\033 │
│ 47 │
│ │
│ C:\code\owncode\AutoPrompt\optimization_pipeline.py:272 in run_pipeline │
│ │
│ 269 │ │ # Run the optimization pipeline for num_steps │
│ 270 │ │ num_steps_remaining = num_steps - self.batch_id │
│ 271 │ │ for i in range(num_steps_remaining): │
│ > 272 │ │ │ stop_criteria = self.step() │
│ 273 │ │ │ if stop_criteria: │
│ 274 │ │ │ │ break │
│ 275 │ │ final_result = self.extract_best_prompt() │
│ │
│ C:\code\owncode\AutoPrompt\optimization_pipeline.py:240 in step │
│ │
│ 237 │ │ │ │ step=self.batch_id) │
│ 238 │ │ │
│ 239 │ │ logging.info('Running annotator') │
│ > 240 │ │ records = self.annotator.apply(self.dataset, self.batch_id) │
│ 241 │ │ self.dataset.update(records) │
│ 242 │ │ │
│ 243 │ │ self.predictor.cur_instruct = self.cur_prompt │
│ │
│ C:\code\owncode\AutoPrompt\estimator\estimator_argilla.py:106 in apply │
│ │
│ 103 │ │ webbrowser.open(url_link) │
│ 104 │ │ while True: │
│ 105 │ │ │ query = "(status:Validated OR status:Discarded) AND metad │
│ > 106 │ │ │ search_results = current_api.search.search_records( │
│ 107 │ │ │ │ name=dataset.name, │
│ 108 │ │ │ │ task=rg_dataset.task, │
│ 109 │ │ │ │ size=0, │
│ │
│ C:\code\owncode\AutoPrompt\venv\Lib\site-packages\argilla\client\apis\searc │
│ h.py:76 in search_records │
│ │
│ 73 │ │ │ url += f"?limit={size}" │
│ 74 │ │ │
│ 75 │ │ query = self._parse_query(query=query) │
│ > 76 │ │ response = self.http_client.post( │
│ 77 │ │ │ path=url, │
│ 78 │ │ │ json={"query": query} if query else None, │
│ 79 │ │ ) │
│ │
│ C:\code\owncode\AutoPrompt\venv\Lib\site-packages\argilla\client\sdk\client │
│ .py:124 in inner │
│ │
│ 121 │ │ @functools.wraps(func) │
│ 122 │ │ def inner(self, *args, **kwargs): │
│ 123 │ │ │ try: │
│ > 124 │ │ │ │ result = func(self, *args, **kwargs) │
│ 125 │ │ │ │ return result │
│ 126 │ │ │ except httpx.ConnectError as err: │
│ 127 │ │ │ │ err_str = f"Your Api endpoint at {self.base_url} is n │
│ │
│ C:\code\owncode\AutoPrompt\venv\Lib\site-packages\argilla\client\sdk\client │
│ .py:191 in post │
│ │
│ 188 │ │ │ *args, │
│ 189 │ │ │ **kwargs, │
│ 190 │ │ ) │
│ > 191 │ │ return build_raw_response(response).parsed │
│ 192 │ │
│ 193 │ @with_httpx_error_handler │
│ 194 │ def put( │
│ │
│ C:\code\owncode\AutoPrompt\venv\Lib\site-packages\argilla\client\sdk\_helpe │
│ rs.py:25 in build_raw_response │
│ │
│ 22 │
│ 23 │
│ 24 def build_raw_response(response: httpx.Response) -> Response[Union[Dic │
│ > 25 │ return build_typed_response(response) │
│ 26 │
│ 27 │
│ 28 ResponseType = TypeVar("ResponseType") │
│ │
│ C:\code\owncode\AutoPrompt\venv\Lib\site-packages\argilla\client\sdk\_helpe │
│ rs.py:34 in build_typed_response │
│ │
│ 31 def build_typed_response( │
│ 32 │ response: httpx.Response, response_type_class: Optional[Type[Respo │
│ 33 ) -> Response[Union[ResponseType, ErrorMessage, HTTPValidationError]]: │
│ > 34 │ parsed_response = check_response(response, expected_response=respo │
│ 35 │ if response_type_class: │
│ 36 │ │ parsed_response = response_type_class(**parsed_response) │
│ 37 │ return Response( │
│ │
│ C:\code\owncode\AutoPrompt\venv\Lib\site-packages\argilla\client\sdk\_helpe │
│ rs.py:63 in check_response │
│ │
│ 60 │ │ │ message=message, │
│ 61 │ │ │ response=response.content, │
│ 62 │ │ ) │
│ > 63 │ handle_response_error(response, **kwargs) │
│ 64 │
│ │
│ C:\code\owncode\AutoPrompt\venv\Lib\site-packages\argilla\client\sdk\common │
│ s\errors_handler.py:63 in handle_response_error │
│ │
│ 60 │ │ error_type = GenericApiError │
│ 61 │ else: │
│ 62 │ │ raise HttpResponseError(response=response) │
│ > 63 │ raise error_type(**error_args) │
│ 64 │
└─────────────────────────────────────────────────────────────────────────────┘
NotFoundApiError: Argilla server returned an error with http status: 404. Error
details: {'response': 'Not Found'}
Can this project use open-source LLMs, such as XComposer or LLaMA? Have you tested these LLMs in the paper?
Hi, with the latest changes I get a new error when running run_generation_pipeline.py:
Traceback (most recent call last):
File "path_to_repo/AutoPrompt/run_generation_pipeline_alena.py", line 64, in <module>
best_prompt = ranker_pipeline.run_pipeline(opt.num_ranker_steps)
File "path_to_repo/AutoPrompt/optimization_pipeline.py", line 272, in run_pipeline
stop_criteria = self.step()
File "path_to_repo/AutoPrompt/optimization_pipeline.py", line 252, in step
self.eval.add_history(self.cur_prompt, self.task_description)
File "path_to_repo/AutoPrompt/eval/evaluator.py", line 115, in add_history
conf_matrix = confusion_matrix(self.dataset['annotation'],
File "path_to_env/AutoPrompt/lib/python3.10/site-packages/sklearn/utils/_param_validation.py", line 214, in wrapper
return func(*args, **kwargs)
File "path_to_env/AutoPrompt/lib/python3.10/site-packages/sklearn/metrics/_classification.py", line 340, in confusion_matrix
raise ValueError("At least one label specified must be in y_true")
ValueError: At least one label specified must be in y_true
config_ranking.yml and config_generation.yml are not modified. config_default.yml is:
use_wandb: False
dataset:
  name: 'dataset'
  records_path: null
  initial_dataset: ''
  label_schema: ["Yes", "No"]
  max_samples: 5
  semantic_sampling: False # Change to True in case you don't have M1. Currently there is an issue with faiss and M1

# annotator:
#   method: 'argilla'
#   config:
#     api_url: ''
#     api_key: 'admin.apikey'
#     workspace: 'admin'
#     time_interval: 5

annotator:
  method: 'llm'
  config:
    llm:
      type: 'OpenAI'
      name: 'gpt-3.5-turbo-0613'
    instruction: 'Assess whether the text contains a harmful topic.
      Answer Yes if it does and No otherwise.'
    num_workers: 2
    prompt: 'prompts/predictor_completion/prediction.prompt'
    mini_batch_size: 1
    mode: 'annotation'

predictor:
  method: 'llm'
  config:
    llm:
      type: 'OpenAI'
      name: 'gpt-3.5-turbo-0613'
      # async_params:
      #   retry_interval: 10
      #   max_retries: 2
      model_kwargs: {"seed": 220}
    num_workers: 2
    prompt: 'prompts/predictor_completion/prediction.prompt'
    mini_batch_size: 1 # change to >1 if you want to include multiple samples in the one prompt
    mode: 'prediction'

meta_prompts:
  folder: 'prompts/meta_prompts_classification'
  num_err_prompt: 1 # Number of error examples per sample in the prompt generation
  num_err_samples: 2 # Number of error examples per sample in the sample generation
  history_length: 4 # Number of sample in the meta-prompt history
  num_generated_samples: 10 # Number of generated samples at each iteration
  num_initialize_samples: 10 # Number of generated samples at iteration 0, in zero-shot case
  samples_generation_batch: 10 # Number of samples generated in one call to the LLM
  num_workers: 5 # Number of parallel workers
  warmup: 4 # Number of warmup steps

eval:
  function_name: 'accuracy'
  num_large_errors: 4
  num_boundary_predictions: 0
  error_threshold: 0.5

llm:
  type: 'OpenAI'
  name: 'gpt-3.5-turbo-0613'
  temperature: 0.8

stop_criteria:
  max_usage: 2 # In $ in case of OpenAI models, otherwise number of tokens
  patience: 3 # Number of patience steps
  min_delta: 0.05 # Delta for the improvement definition
I run the command:
python run_generation_pipeline.py \
--prompt "Write a good and comprehensive movie review about a specific movie." \
--task_description "Assistant is a large language model that is tasked with writing movie reviews."
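For what it's worth, the error above can be reproduced whenever the labels passed to confusion_matrix never occur in the annotations, e.g. if the ranker's "1"-"5" schema is supplied while the annotations still follow the default Yes/No schema (a hypothetical minimal repro of that mismatch, not necessarily the root cause here):

from sklearn.metrics import confusion_matrix

confusion_matrix(
    ["Yes", "No"],                     # annotations following the default schema
    ["1", "2"],                        # predictions following a ranking schema
    labels=["1", "2", "3", "4", "5"],  # none of these appear in y_true
)
# ValueError: At least one label specified must be in y_true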
Hi! I tried to run the pipeline using Azure OpenAI with an LLM as the annotator, but got this error:
Processing samples: 100%|##########| 1/1 [00:24<00:00, 24.04s/it]
Traceback (most recent call last):
File "prompt_model\AutoPrompt\run_pipeline.py", line 44, in <module>
best_prompt = pipeline.run_pipeline(opt.num_steps)
File "prompt_model\AutoPrompt\optimization_pipeline.py", line 274, in run_pipel
stop_criteria = self.step()
File "prompt_model\AutoPrompt\optimization_pipeline.py", line 233, in step
self.generate_initial_samples()
File "prompt_model\AutoPrompt\optimization_pipeline.py", line 194, in generate_tial_samples
samples_list = [element for sublist in samples_batches for element in sublist['samples']]
File "prompt_model\AutoPrompt\optimization_pipeline.py", line 194, in <listcomp
samples_list = [element for sublist in samples_batches for element in sublist['samples']]
KeyError: 'samples'
I have a problem when I run run_pipeline.py:
C:\ProgramData\Anaconda3\envs\AutoPrompt\python.exe E:\AutoPrompt\run_pipeline.py
C:\ProgramData\Anaconda3\envs\AutoPrompt\lib\site-packages\transformers\utils\generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
Describe the task: I want to write a small web game program
Initial prompt: 0
C:\ProgramData\Anaconda3\envs\AutoPrompt\lib\site-packages\langchain_core_api\deprecation.py:117: LangChainDeprecationWarning: The class langchain_community.chat_models.openai.ChatOpenAI
was deprecated in langchain-community 0.0.10 and will be removed in 0.2.0. An updated version of the class exists in the langchain-openai package and should be used instead. To use it run pip install -U langchain-openai
and import as from langchain_openai import ChatOpenAI
.
warn_deprecated(
Starting step 0
Dataset is empty generating initial samples
Processing samples: 100%|██████████| 1/1 [00:04<00:00, 4.61s/it]
Processing samples: 0it [00:00, ?it/s]
┌───────────────────── Traceback (most recent call last) ─────────────────────┐
│ E:\AutoPrompt\run_pipeline.py:44 in │
│ │
│ 41 pipeline = OptimizationPipeline(config_params, task_description, initi │
│ 42 if (opt.load_path != ''): │
│ 43 │ pipeline.load_state(opt.load_path) │
│ > 44 best_prompt = pipeline.run_pipeline(opt.num_steps) │
│ 45 print('\033[92m' + 'Calibrated prompt score:', str(best_prompt['score' │
│ 46 print('\033[92m' + 'Calibrated prompt:', best_prompt['prompt'] + '\033 │
│ 47 │
│ │
│ E:\AutoPrompt\optimization_pipeline.py:272 in run_pipeline │
│ │
│ 269 │ │ # Run the optimization pipeline for num_steps │
│ 270 │ │ num_steps_remaining = num_steps - self.batch_id │
│ 271 │ │ for i in range(num_steps_remaining): │
│ > 272 │ │ │ stop_criteria = self.step() │
│ 273 │ │ │ if stop_criteria: │
│ 274 │ │ │ │ break │
│ 275 │ │ final_result = self.extract_best_prompt() │
│ │
│ E:\AutoPrompt\optimization_pipeline.py:252 in step │
│ │
│ 249 │ │ self.eval.eval_score() │
│ 250 │ │ logging.info('Calculating Score') │
│ 251 │ │ large_errors = self.eval.extract_errors() │
│ > 252 │ │ self.eval.add_history(self.cur_prompt, self.task_description) │
│ 253 │ │ if self.config.use_wandb: │
│ 254 │ │ │ large_errors = large_errors.sample(n=min(6, len(large_err │
│ 255 │ │ │ correct_samples = self.eval.extract_correct() │
│ │
│ E:\AutoPrompt\eval\evaluator.py:126 in add_history │
│ │
│ 123 │ │ analysis = self.analyzer.invoke(prompt_input) │
│ 124 │ │ │
│ 125 │ │ self.history.append({'prompt': prompt, 'score': self.mean_sco │
│ > 126 │ │ │ │ │ │ │ 'errors': self.errors, 'confusion_matrix │
│ 127 │ │
│ 128 │ def extract_errors(self) -> pd.DataFrame: │
│ 129 │ │ """ │
└─────────────────────────────────────────────────────────────────────────────┘
TypeError: 'NoneType' object is not subscriptable
Process finished with exit code 1
I don't know how to solve this. Can you help me?
While attempting to execute the code, I encountered the following error message: "Process finished with exit code 137 (interrupted by signal 9: SIGKILL)". Prior to this error, the following log was observed:
"Starting step 0
Dataset is empty; generating initial samples
Processing samples: 0%| | 0/1 [00:00<?, ?it/s]
Setting pad_token_id to eos_token_id:50256 for open-end generation.
Processing samples: 100%|██████████| 1/1 [00:13<00:00, 13.33s/it]
Special tokens have been added to the vocabulary; ensure the associated word embeddings are fine-tuned or trained."
The failure occurred at line 53 of estimator_llm.py:
self.chain = ChainWrapper(self.opt.llm, self.opt.prompt, chain_metadata['json_schema'],
chain_metadata['parser_func'])
This code is being executed on my Ubuntu 20.04 system using HuggingFacePipeline, with attempts made using various Large Language Models. Upon researching the error message online, it appears to be related to a memory issue. Could you please provide guidance on how to address this problem?
Thank you.
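Exit code 137 generally means the process was killed by the operating system's out-of-memory killer. A minimal sketch of loading a Hugging Face model with a smaller memory footprint (an illustration only, assuming the OOM happens while the model is being loaded; the model name is a placeholder):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; substitute the model set in config_default.yml
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # roughly halves the memory vs. float32
    device_map="auto",          # requires the accelerate package
    low_cpu_mem_usage=True,
)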
ValidationError: 1 validation error for LLMChain
llm
none is not an allowed value (type=type_error.none.not_allowed)
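This pydantic error typically appears when the configured LLM object fails to build, so None ends up being passed into LLMChain. A minimal sketch reproducing it (hypothetical, outside the repo):

from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

LLMChain(llm=None, prompt=PromptTemplate.from_template("{text}"))
# ValidationError: 1 validation error for LLMChain
# llm
#   none is not an allowed value (type=type_error.none.not_allowed)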
Is there any plan to support local offline models?
I have a prompt that is used to generate an SQL query from input text given by a user.
I am trying to optimize the prompt using run_generation_pipeline.py, but I am getting a completely different calibrated prompt.
Below are the inputs provided:
--task_description:
Assistant is a large language model that is tasked to generate SQL query based on details and examples provided in prompt.
--prompt:
We have 2 tables:
Employee: Employee table have information regarding all the employees in a company.
Below are the attributes of Employee table
empid: empid column contains employee id. empid is a primary key of Employee table.
name: Name column contains name of the employee
salary: salary column contains salary of the employee
department_id: department_id column contains employee's department id. It is a foreign key from Department table.
Department: Department table have information regarding all the department of a company.
Below are the attributes of Department table
department_id: department_id contains the id of the department. department_id is primary key of Department table.
department_name: department_name contains name of the department.
***Below are few examples***:
##Example 1
user query: what is empid of employees in department A?
output: Select Employee.empid
From Employee
Join Department
on Employee.department_id = Department.department_id
Where Department.department_id = 'A';
##Example 2
user query: what is salary of employee with empid=1?
output: Select salary
From Employee
Where empid = 1;
** End of Examples **
Your task is to generate SQL query from natural language input provided by user.
Your task is to understand natural language input and provide SQL query to fetch information asked in natural language input from above tables.
annotator instruction in config_default.yml:
instruction:
'We have two tables Employee and Department.
Employee table have empid, name, salary, department_id as columns
Department table have department_id, department_name as columns
You will be given a query in natural language and its interpreted sql query to fetch data from above table.
Asses interpreted SQL query with respect to natural language input and table provided. Answer 1 if SQL query is relevant
and correct otherwise 0.'
output given by AutoPrompt:
Calibrated prompt score: 1.0
Calibrated prompt: Your task is to generate accurate and context-specific SQL queries based on natural language input provided by the user. Please include specific examples of natural language input and the corresponding expected SQL queries. Additionally, describe the database schema and table structure to provide more context for query generation. Aim for a higher score by improving the model's understanding and accuracy in generating SQL queries.
The output given is not relevant to the task. Am I providing the wrong inputs, or am I missing some inputs that need to be provided?