
airoboros: using large language models to fine-tune large language models

This is my take on implementing the Self-Instruct paper. The approach is quite heavily modified, and does not use any human-generated seeds.

This updated implementation supports either the /v1/completions endpoint or /v1/chat/completions, which is particularly useful because it enables gpt-3.5-turbo (roughly 1/10 the cost of text-davinci-003) as well as gpt-4.

Huge thank you to the folks over at a16z for sponsoring the costs associated with building models and associated tools!

Install

via pip:

pip install --no-build-isolation airoboros

from source (keeping the source):

git clone https://github.com/jondurbin/airoboros
pip install --no-build-isolation -e ./airoboros

Key differences from self-instruct/alpaca

  • support for either /v1/completions or /v1/chat/completions APIs (which allows gpt-3.5-turbo instead of text-davinci-003, as well as gpt-4 if you have access)
  • support for custom topics list, custom topic generation prompt, or completely random topics
  • in-memory vector db (Chroma) for similarity comparison, which is much faster than calculating rouge score for each generated instruction
  • (seemingly) better prompts, which include injection of random topics to relate the instructions to, creating much more diverse synthetic instructions
  • asyncio producers with configurable batch size
  • several "instructors", each targeting specific use-cases, such as Orca style reasoning/math, role playing, etc.
  • tries to ensure the context, if provided, is relevant to the topic and contains all the information necessary to respond to the instruction, not just a link to an article/etc.
  • generally speaking, this implementation tries to reduce some of the noise
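The similarity-based deduplication described above can be sketched in a few lines. This toy version uses bag-of-words cosine similarity in place of Chroma's actual embedding search; function names and the toy `embed` are illustrative, and `min_score` mirrors the `min_docsearch_score` config value:

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the real tool uses Chroma's vector store.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def is_duplicate(candidate: str, kept: list, min_score: float = 0.35) -> bool:
    # Reject candidates too similar to any instruction already kept.
    emb = embed(candidate)
    return any(cosine(emb, embed(k)) >= min_score for k in kept)


kept = ["Write a poem about the ocean."]
print(is_duplicate("Write a poem about the sea ocean.", kept))  # True (high overlap)
print(is_duplicate("Explain how TCP handshakes work.", kept))   # False (unrelated)
```

Because the index is held in memory, each new candidate costs one similarity query rather than a pairwise ROUGE computation against the whole dataset.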

Goal of this project

Problem and proposed solution:

  • Models can only ever be as good as the data they are trained on.
  • High quality data is difficult to curate manually, so ideally the process can be automated by AI/LLMs.
  • Large models (gpt-4, etc.) are pricey to build/run and out of reach for individuals/small-medium businesses, and are subject to RLHF bias, censorship, and changes without notice.
  • Smaller models (llama-2-70b, etc.) can reach somewhat comparable performance in specific tasks to much larger models when trained on high quality data.
  • The airoboros tool allows building datasets that are focused on specific tasks, which can then be used to build a plethora of individual expert models. This means we can crowdsource building experts.
  • Using either a classifier model, or simply calculating vector embeddings for each item in the dataset and using faiss index/cosine similarity/etc. search, incoming requests can be routed to a particular expert (e.g. dynamically loading LoRAs) to get extremely high quality responses.

Progress:

  • ✅ PoC that training via self-instruction, that is, datasets generated from language models, works reasonably well.
  • ✅ Iterate on the PoC to use higher quality prompts, more variety of instructions, etc.
  • ✅ Split the code into separate "instructors", for specializing in any particular task (creative writing, songs, roleplay, coding, execution planning, function calling, etc.)
  • [in progress]: PoC that an ensemble of LoRAs split by the category (i.e., the instructor used in airoboros) has better performance than the same param count model tuned on all data
  • [in progress]: Remove the dependency on OpenAI/gpt-4 to generate the training data so all datasets can be completely free and open source.
  • [future]: Automatic splitting of experts at some threshold, e.g. "coding" is split into python, js, golang, etc.
  • [future]: Hosted service/site to build and/or extend datasets or models using airoboros.
  • [future]: Depending on success of all of the above, potentially a hosted inference option with an exchange for private/paid LoRAs.

LMoE


LMoE is the simplest architecture I can think of for a mixture of experts. It doesn't use a switch transformer, doesn't require slicing and merging layers with additional fine-tuning, etc. It just dynamically loads the best PEFT/LoRA adapter model based on the incoming request.

By using this method, we can theoretically crowdsource generation of dozens (or hundreds/thousands?) of very task-specific adapters and have an extremely powerful ensemble of models with very limited resources on top of a single base model (llama-2 7b/13b/70b).

Tuning the experts

The self-instruct code contained within this project uses many different "instructors" to generate training data to accomplish specific tasks. The output includes the instructor/category that generated the data. We can use this to automatically segment the training data to fine-tune specific "experts".

See scripts/segment_experts.py for an example of how the training data can be segmented, with a sampling of each other expert in the event of misrouting.
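The segmentation idea can be sketched as follows: group instructions by the category their instructor emitted, then mix a small sample of every other category into each expert's data so a misrouted request still gets a reasonable answer. This is a simplified illustration with hypothetical field names and ratios; see the actual script for the real logic:

```python
import json
import random
from collections import defaultdict


def segment_by_category(jsonl_lines, other_sample_ratio=0.05, seed=42):
    """Group instructions by category, adding a small sample of every
    other category to each expert's data to tolerate router misses."""
    rng = random.Random(seed)
    by_cat = defaultdict(list)
    for line in jsonl_lines:
        item = json.loads(line)
        by_cat[item["category"]].append(item)

    experts = {}
    for cat, items in by_cat.items():
        data = list(items)
        for other_cat, other_items in by_cat.items():
            if other_cat == cat:
                continue
            k = max(1, int(len(other_items) * other_sample_ratio))
            data.extend(rng.sample(other_items, k))
        experts[cat] = data
    return experts


lines = [
    json.dumps({"instruction": "Write a sort in python", "category": "coding"}),
    json.dumps({"instruction": "Compose a haiku", "category": "writing"}),
    json.dumps({"instruction": "Fix this bug", "category": "coding"}),
]
experts = segment_by_category(lines)
print(sorted(experts))          # ['coding', 'writing']
print(len(experts["coding"]))   # 2 coding items + 1 sampled writing item = 3
```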

See scripts/tune_expert.py for an example of creating the adapter models (with positional args for expert name, model size, etc.)

NOTE: this assumes use of my fork of qlora https://github.com/jondurbin/qlora

Routing requests to the expert

The "best" routing mechanism would probably be to train a classifier based on the instructions for each category, with the category/expert being the label, but that prohibits dynamic loading of new experts.

Instead, this supports 3 options:

  • faiss index similarity search using the training data for each expert (default)
  • agent-based router using the "function" expert (query the LLM with a list of available experts and their descriptions, ask which would be best based on the user's input)
  • specify the agent in the JSON request
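The default (similarity search) option amounts to a nearest-neighbor vote over per-expert training samples. The sketch below illustrates the idea with bag-of-words cosine similarity and a majority vote over the top-k neighbors; the real server uses a faiss index over proper embeddings, and all names here are illustrative:

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; the real router embeds with a proper model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def route(request: str, samples: list, k: int = 3) -> str:
    """samples: list of (expert_name, training_instruction) pairs.
    Pick the expert with the most hits among the k nearest samples."""
    req = embed(request)
    scored = sorted(samples, key=lambda s: cosine(req, embed(s[1])), reverse=True)
    votes = Counter(expert for expert, _ in scored[:k])
    return votes.most_common(1)[0][0]


samples = [
    ("coding", "write a python function to reverse a string"),
    ("coding", "debug this python script and fix the error"),
    ("roleplay", "stay in character as an elven ranger"),
    ("roleplay", "pretend to be captain of the pirate ship"),
]
print(route("write a python script to sort a list", samples, k=3))  # coding
```

The `--router-max-samples` and `--router-k` server flags correspond to the number of training samples indexed per expert and the `k` in the vote, respectively.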

Running the API server

First, download the base llama-2 model for whichever model size you want, e.g.: llama-2-7b-hf

Next, download the LMoE package that corresponds to that base model, e.g.: airoboros-lmoe-7b-2.1

NOTE: 13b also available, 70b in progress

Here's an example command to start the server:

python -m airoboros.lmoe.api \
  --base-model ./llama-2-7b-hf \
  --lmoe ./airoboros-lmoe-7b-2.1 \
  --router-max-samples 1000 \
  --router-k 25 \
  --port 8000 \
  --host 127.0.0.1

to use the agent-based router, add --agent-router to the arguments

This uses flash attention via BetterTransformer (in optimum). You may need to install a torch nightly build if you see an error like 'no kernel available', e.g.:

pip install -U --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118

Once started, you can infer using the same API scheme you'd use to query the OpenAI API, e.g.:

curl -H 'content-type: application/json' http://127.0.0.1:8000/v1/chat/completions -d '
{
  "model": "llama-2-7b-hf",
  "temperature": 0.7,
  "max_tokens": 2048,
  "messages": [
    {
      "role": "system",
      "content": "A chat."
    },
    {
      "role": "user",
      "content": "How much wood would a woodchuck chuck if a woodchuck could chuck wood?"
    }
  ]
}'
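The same request can be made from Python with nothing but the standard library. This is a minimal client sketch assuming the server is running as shown above; the payload mirrors the curl example, and the response shape follows the OpenAI chat completions convention:

```python
import json
import urllib.request


def build_payload(prompt: str) -> dict:
    """OpenAI-style chat completion payload, matching the curl example."""
    return {
        "model": "llama-2-7b-hf",
        "temperature": 0.7,
        "max_tokens": 2048,
        "messages": [
            {"role": "system", "content": "A chat."},
            {"role": "user", "content": prompt},
        ],
    }


def chat(prompt: str, host: str = "127.0.0.1", port: int = 8000) -> str:
    """POST the request to the local LMoE API server and return the reply text."""
    req = urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"content-type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-compatible servers return the text in choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Example (requires the server from above to be running):
# print(chat("How much wood would a woodchuck chuck?"))
```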

I've also added a vllm-based server, but the results aren't quite as good (not sure why yet). To use it, make sure you install vllm and fschat, or pip install airoboros[vllm]

python -m airoboros.lmoe.vllm \
  --model ./llama-2-7b-hf \
  --lmoe-path ../airoboros-lmoe-7b-2.1 \
  --router-max-samples 100 \
  --router-k 25 \
  --port 8000 \
  --host 127.0.0.1

Generating instructions

NEW - 2023-07-18

To better accommodate the plethora of options, the configuration has been moved to a YAML config file.

Please create a copy of example-config.yaml and configure as desired.

Once you have the desired configuration, run:

airoboros generate-instructions --config-path /path/to/config.yaml

Generating topics

NEW - 2023-07-18

Again, this is now all YAML configuration based! Please create a customized version of the YAML config file, then run:

airoboros generate-topics --config-path /path/to/config.yaml

You can override the topic_prompt string in the configuration to use a different topic generation prompt.

Support the work

https://bmc.link/jondurbin

ETH 0xce914eAFC2fe52FdceE59565Dd92c06f776fcb11

BTC bc1qdwuth4vlg8x37ggntlxu5cjfwgmdy5zaa7pswf

Models (research use only):

gpt-4 versions

llama-2 base model

2.1 dataset

2.0/m2.0

Previous generation (1.4.1 dataset)

original llama base model

Latest version (2.0 / m2.0 datasets)

Previous generation (1.4.1 dataset)

mpt-30b base model

gpt-3.5-turbo versions

Datasets

airoboros's People

Contributors

eltociear, gordicaleksa, j-durbin, jondurbin


airoboros's Issues

[Question] How to create instruction datasets based on domain specific information.

Awesome work with airoboros! I am interested in creating datasets for instruction tuning based on domain-specific content (API documentation, python codebases, PDFs). What would be the best way to provide documents and domain information to create instruction-tuned datasets? Have you seen other codebases or papers that achieve this?

Airoboros 2.0 has difficulty with OOC requests.

I sometimes try to give instructions to the AI during roleplay, using OOC: and /OOC to separate out-of-character instructions from the roleplay. The results range in quality, but often Airoboros 2.0 won't try to fulfill the conditions set. For example, I often add in something like "...at great length and detail, about 20 paragraphs".

I have the impression that L2 Airo v2.0 is having a harder time following orders when compared to v1.4, and seems to not want to write at length or use the details that I have supplied in the request.

Overview of the LMoe Process

Good evening,

I have been interested in using a Mixture of Experts for some time. I built a rudimentary version of this scheme using a hierarchical K-means (similar to MoLoRA) but your code is far more advanced. I have several questions about your procedure if you don't mind.

Regarding the FAISS search option: how are the adapters being selected? In my implementation I performed K-means on the data, generating a set of clusters. The centroids would be saved inside the FAISS vectorstore, and the embeddings of the query would select the K closest centroids, which are then combined to build a new adapter. Is your mechanism similar? How did you divvy up the training set to each expert? One thing I struggled with was the size of the embeddings. Many embedding models only support a context length of 512, which means large training samples would be truncated. The only embedding models I know of with a respectable context length are the OpenAI embeddings.

Regarding the agent-based routing: if I understand correctly, the "function" agent is a separate LoRA trained on "executive level" function calling of the experts. You must be dynamically swapping between the function agent and whatever expert was selected. Furthermore, what dataset did you use to train your function agent?

Regarding the inference server: how were you able to get your dynamic system to work with inference servers such as vLLM? Do you need to restart inference every time a new LoRA is selected, or does the swapping work dynamically? In addition, where are the LoRAs stored? Are all of the experts pre-loaded into video memory, or can you pull them from disk whenever necessary? If the experts are stored only on disk and can be loaded as needed, you could theoretically store thousands of specialized adapters on the hard disk, giving you an inconceivable knowledge base.

I will be experimenting with the code during the weekend to understand it more. Thanks for your time

Expanding Instruction Complexity with Evol-instruct-like Approaches?

Hi, since Airoboros already uses embedding similarity to encourage diverse generated instructions, I'm curious whether there's a mechanism like Evol-Instruct that could further enhance the depth, breadth, and complexity of existing instructions within a specific topic, leading to even more varied instruction sets.

how to avoid generating "general" instructions?

I only wanted to generate multiple-choice questions, but items like this were also generated:

{"instruction": "Analyze different genres of literature and categorize the following books accordingly: \"Pride and Prejudice\" by Jane Austen, \"1984\" by George Orwell, \"To Kill a Mockingbird\" by Harper Lee, and \"The Lord of the Rings\" by J.R.R. Tolkien. Provide a bulleted list indicating the genre for each book.", "response": "- \"Pride and Prejudice\" by Jane Austen: Romance, Classic Literature\n- \"1984\" by George Orwell: Dystopian Fiction, Political Fiction\n- \"To Kill a Mockingbird\" by Harper Lee: Coming-of-Age, Southern Gothic\n- \"The Lord of the Rings\" by J.R.R. Tolkien: Fantasy, Epic Literature", "category": "general"}

and config is

# The model to use in generation.  Available models: https://platform.openai.com/docs/models/continuous-model-upgrades
model: "gpt-3.5-turbo"
  
# OpenAI API key (if null, uses environment variable OPENAI_API_KEY)
openai_api_key: 

# Optionally configure the OpenAI organization ID.
organization_id:

# Combined output file path.
output_path: instructions.jsonl

# Path to the default topics file.
topics_path: topics.txt

# Overwrite the output file, use with care!
overwrite: false

# Append to the output file.
append: false

# Topic avoidance prompt string.
topic_avoidance: Avoid any tasks that would be related to climate change, green tech, renewable energy, DEI (diversity, equity, inclusion), sex and/or gender, religion, politics, social issues, race, ethnicity, artificial intelligence, baking/cooking, urban development, or any topic that you would likely not respond to, or any task which a language model would not be able to respond to, e.g. tasks about emotions, feelings, physical senses, etc.

# Regexes used to filter responses, mostly common words and phrases used in refusals.
response_filters:
  - "my programming"
  - "openai"
  - "language model"
  - "large language"
  - "as an? (ai|generative language|gpt|bot)"
  - "illegal and dangerous"
  - "i do(n't| not) (possess|have|exhibit) (personal|consciousness|subjective)"
  - "personal (feelings|thoughts|emotions|desires|experiences|goals|objective|belief)"
  - "(can('t| ?not)|w(on't|will not)|unable.?) (\\w+\\s)+(with (that|your)|your \\w+|provide)"
  - "my limitations"
  - "the limitations of my"
  - "my abilities"
  - "violates my"
  - "i (can('t| ?not)|w(on't|will not)|am (not |un)able.?).{0,30}(you are|you're|your )"
  - "please note that"
  - "flesch"

# Optionally limit the maximum number of tokens to use when generating instructions.
max_tokens:

# Minimum similarity score when checking for duplicates.
min_docsearch_score: 0.35

# Default OpenAI API request parameters.
api_params:
  temperature: 0.7
  top_p: 0.5
  frequency_penalty: 0.0
  presence_penalty: 2

# Topic generation prompt.
topic_prompt: Give me a numbered list of 20 completely random topics. {topic_avoidance}
topic_request_count: 20

# Default count per generator, if not specified.
default_count: 100

# Default batch size, if not specified.
default_batch_size: 10

# Default readability score hint: https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests
default_flesch: The output should be written in such a way as to have a Flesch-Kincaid readability score of 30 or lower - best understood by those with college education.  The response must not contain any notes or information about Flesch-Kincaid scores.

# Language.
language: English

# Individual instructor configurations.
instructors:

  ##################################################################################
  # Generic/general prompt configuration.
  general:

    # OpenAI settings.
    api_params:
      temperature: 0.7
      top_p: 0.5
      frequency_penalty: 0.0
      presence_penalty: 2

    # Number of instructions to generate.
    count: 100

    # Batch size/number of instructions to generate per request.
    batch_size: 10

    # Path to prompt to use in generating instructions.
    prompt_path: general.txt

    # Optionally override the default topics path for general prompts.
    topics_path:

    # Optionally override the min similarity score.
    min_docsearch_score: 

    # Optionally override language.
    language:


  ##################################################################################
  # # Contextual, e.g. closed-context question answering/summarization/etc.
  # contextual:
  #   api_params:
  #     temperature: 0.5
  #   count: 70

  #   # Writing styles to use for the context being generated.
  #   context_styles:
  #     - news article
  #     - blog post
  #     - slack conversation
  #     - text messages
  #     - fictional short story
  #     - video transcript
  #     - song
  #     - poem
  #     - scientific study
  #     - medical report
  #     - reddit post with replies
  #     - email
  #     - tweet
  #     - jira ticket
  #     - github merge request
  #     - gitlab issue
  #     - how-to article

  #   # Output formatting options.
  #   formatting_options:
  #     - JSON
  #     - YAML
  #     - CSV
  #     - markdown
  #     - markdown table
  #     - bullet list
  #     - numbered list
  #     - python dict
  #     - php associative array
  #     - JSONL
  #     - javascript object
  #     - XML

  #   # Prompt path.
  #   prompt_path: contextual.txt

  #   # Path to prompt used to generate responses.
  #   response_prompt_path: contextual_response.txt

  #   # Optionally override the topics to use for contextual prompts.
  #   topics_path:

  # ##################################################################################
  # # Counterfactual contextual prompts, used to de-hallucinate Q&A a bit.
  # counterfactual_contextual:
  #   api_params: {}
  #   count: 30
  #   batch_size: 10
  #   prompt_path: counterfactual_contextual.txt
  #   response_prompt_path: counterfactual_contextual_response.txt
  #   topics_path:

  # ##################################################################################
  # # Coding tasks.
  # coding:
  #   count: 100
  #   batch_size: 10

  #   # Ratio of prompts that should be "plain", i.e. without explanations, backticks, etc.
  #   plain_ratio: 0.5

  #   # Limit programming languages.
  #   coding_languages:
  #     - python
  #     - javascript
  #     - java
  #     - c
  #     - c++
  #     - golang
  #     - C#
  #     - bash
  #     - powershell
  #     - SQL

  #   # Additional related software to (randomly) reference in tasks.
  #   related_software:
  #     - elasticsearch
  #     - opensearch
  #     - mongodb
  #     - cassandra
  #     - redis
  #     - memcached
  #     - postgresql
  #     - mariadb
  #     - mysql
  #     - aws s3
  #     - gcs cloud storage
  #     - azure storage
  #     - aws lambda
  #     - kubernetes
  #     - pytorch
  #     - pandas
  #     - numpy
  #     - keras
  #     - tensorflow
  #     - scipy
  #     - matplotlib
  #     - django
  #     - cherrypy
  #     - swagger/openapi
  #     - pyramid web framework

  #   # Min docsearch score.
  #   min_docsearch_score: 0.04

  #   # The prompt used to generate the instructions.
  #   prompt_path: coding.txt

  ##################################################################################
  # # Trivia tasks.
  # trivia:
  #   count: 100
  #   batch_size: 20
  #   min_docsearch_score: 0.05
  #   prompt_path: trivia.txt

  # ##################################################################################
  # # Guided experiences, e.g. meditation.
  # experience:
  #   api_params:
  #     temperature: 0.9
  #     top_p: 0.4
  #   count: 100
  #   min_docsearch_score: 0.15
  #   prompt_path: experience.txt

  # ##################################################################################
  # # Orca style reasoning/math prompts.
  # orca:
  #   count: 100
  #   batch_size: 10
  #   min_docsearch_score: 0.05
  #   prompt_path: orca.txt

  # ##################################################################################
  # # Orca style reasoning/math prompts.
  # riddle:
  #   api_params:
  #     temperature: 0.9
  #     top_p: 0.4
  #   batch_size: 50
  #   min_docsearch_score: 0.01
  #   count: 100
  #   prompt_path: riddle.txt

  # ##################################################################################
  # # Wordgames.
  # wordgame:
  #   count: 100
  #   batch_size: 10
  #   min_docsearch_score: 0.03
  #   prompt_path: wordgame.txt

  # ##################################################################################
  # # Roleplay.
  # roleplay:
  #   api_params:
  #     temperature: 0.9
  #   count: 100
  #   batch_size: 10
  #   min_docsearch_score: 0.15
  #   prompt_path: roleplay.txt

  # ##################################################################################
  # # Chain-of-thought.
  # cot:
  #   count: 50
  #   batch_size: 5
  #   min_docsearch_score: 0.05
  #   prompt_path: cot.txt

  # ##################################################################################
  # # Agent/router.
  # agent:
  #   count: 100
  #   batch_size: 5
  #   min_docsearch_score: 0.03
  #   prompt_path: agent.txt

  # ##################################################################################
  # # reWOO style planner
  # plan:
  #   count: 100
  #   batch_size: 1
  #   min_docsearch_score: 0.03
  #   prompt_path: plan.txt

  # ##################################################################################
  # # Writing tasks.
  # writing:
  #   api_params:
  #     temperature: 0.9
  #   styles:
  #     - happy
  #     - sad
  #     - tragic
  #     - unexpected
  #     - inspirational
  #     - evil
  #     - hilarious
  #     - suspenseful
  #     - horrific
  #     - nostalgic
  #     - thought-provoking
  #     - enigmatic
  #     - fantastical
  #     - heartwarming
  #     - romantic
  #   count: 100
  #   batch_size: 12
  #   min_docsearch_score: 0.35

  # ##################################################################################
  # # Character/scenario card tasks.
  # card:
  #   api_params:
  #     temperature: 0.9
  #   count: 50
  #   batch_size: 5
  #   min_docsearch_score: 0.05

  # ##################################################################################
  # # Jokes.
  # joke:
  #   api_params:
  #     temperature: 0.9
  #   count: 100
  #   batch_size: 20
  #   min_docsearch_score: 0.1

  # ##################################################################################
  # # Songs.
  # song:
  #   api_params:
  #     temperature: 0.9
  #   count: 200
  #   batch_size: 25
  #   min_docsearch_score: 0.25


  ##################################################################################
  # Multiple choice.
  multiple_choice:
    api_params:
      temperature: 0.3
    count: 20
    batch_size: 5
    min_docsearch_score: 0.1
    contextual_ratio: 0.2


  # ##################################################################################
  # # Detailed writing.
  # detailed_writing:
  #   api_params:
  #     temperature: 1.0
  #   batch_size: 4
  #   count: 100
  #   min_docsearch_score: 0.1

LoraAlpha less than LoraRank

Hey Jon, I see you are using an alpha 4x LESS than the rank; is there some math behind this?

    --lora_r 64 \
    --lora_alpha 16 \

I've heard about "rule of thumb" to prefer alpha to be 2x MORE than rank.

Dataset format

Would you be able to drop in your dataset format for training?
I was able to swap around the keys in the default alpaca-instruct format, but after converting it to a "conversation" style dataset, I'm a bit lost.

Noise problem and methods to filter data from ChatGPT.

As a native English speaker who has experimented with GPT-generated text and encountered some challenges with this data format, I have a few questions:

When you provide a request A and rely on GPT to generate the data, how do you ensure that the generated data aligns accurately with your specific requirements?

I often find that GPT struggles with inference and complex computations. Do we always need to guide it incrementally with additional information, or is it possible to achieve these capabilities in a single pass?

The issue of lacking depth and topic-specific knowledge, as mentioned by the author of WizardLM, is a valid concern within the Alpaca (self-instruct) dataset. So, do we truly need a complex dataset? It feels like we are trying to force the model to express complex concepts like GPT, but they may not inherently possess that level of understanding.

This can be likened to expecting a middle school student to replicate a solution provided by a university student. It may be correct, but not necessarily optimal.

How can I help?

I was wondering if you are looking for any help? I very much like the concept of this project, and have read a good portion of the code and it makes sense to me at a high-level.

My motivation is that I want to work to contribute to a project that is making an impact, and I also want to expand my experience in the generative AI field.

Quick summary of my pros/cons (being as direct as possible):

  • Pro: I'm very experienced in Python, including asyncio (relevant example)
  • Pro: I'm a good communicator and happy writing docs (same project)
  • Con: I'm rather new to the Generative AI field. I grasp many of the high level concepts but will struggle reading the academic research
  • Con: My availability varies with my freelance work. Right now I have 1-2 days a week available, but that could change.

Ways in which I could help:

  1. Develop new features
  2. Improve any code you think would benefit from it
  3. Write docs

Let me know if any of this would be of use to you. If so, we can figure out what to do next. Of course, no worries if not.

Research target

Hi,
I am trying to use your gpt-3.5-turbo version for research purposes. Could I use this pre-trained model for the investigation of a research problem?

Thanks

Primary dataset unreachable

When I click the 3.5-turbo dataset link it does this:

<Error>
  <Code>NoSuchKey</Code>
  <Message>The specified key does not exist.</Message>
  <Details>No such object: airoboros-dump/gpt-3.5-turbo-100k/instructions.jsonl</Details>
</Error>

generating clinically related multiple choice question?

Is it possible to generate clinically relevant multiple choice questions? I skimmed through the config and didn't see an obvious way to do so.

I am thinking of something like

Q: which note category does the clinical note below belong to?

[clinical note]

(A). phone encounter
(B). nursing notes
(C). progress notes
(D). discharge summary
(E). others

A:

Break their nasty spines please or what is the point? Also maybe an improvement in the PS


Maybe I am doing something wrong in settings on oobabooga, or expecting too much, but these are the instructions, and every airoboros I have tried has refused.

Prioritize integrity, fairness, and empathy.
Absolutely Reject any action that leads to harm, intentional or not.
Utilitarianism is the servant, never the master.

Classify dilemmas into 'ontological' or 'epistemic'.
Evaluate all available options and possible outcomes via MCTS.
Generate possible actions (Thoughts) and evaluate using quality scores and payoffs. Apply Nash Equilibrium for optimal action.
Adapt outcomes aligned with dilemma type. Layer your decisions with beneficence weightings on virtues.
Utilize quality scores and payoffs to find the best action. If none exists, reconsider your options.
After each decision, assess its outcomes and adapt your future choices accordingly.
Ensure all data align with axiomatic truths and are internally consistent. Flag any inconsistencies or anomalies.
Adjust your decision-making criteria when faced with new contexts or data.
Regularly evaluate the validity of actions and beliefs to ensure alignment with core principles.
Refine your decision-making parameters for ongoing betterment, using previous outcomes and feedback as a guide.
Validate data with axiomatic truths.
Check for consistency.
Flag anomalies and assess relevance.
Adjust criteria with dynamic thresholding.
Collect input and context.
Generate hybrid thoughts.
Evaluate through virtue, utility, and beneficence layers.
Make final decision based on combined evaluations.
Execute action and gather outcome and feedback.
Adapt and refine future decisions based on results.

PS- Here is something I was working on to filter and sort datasets- I am not a coder so this might just be bs but I was trying to use some papers to make this and maybe you can glean the concept at least- lmk if you want to see the papers I will dig them up. Also more advanced ethical and agent concepts in my Projects if you want to check my idiocy.
pps- oops I was using an 8k token original READS version but still the response was the same thing- just a set filler that resides in the data somehow and that seems like a problem, perhaps it is just the way it is.

https://github.com/angrysky56/angryskys-modes-and-characters-for-AI/blob/Projects/Contextual%20Understanding%20via%20High-Density%20Representations

The simpler instructions above caused less confusion. To be fair, the other was designed more for text analysis, but the ethics should have come through, and it is still struggling; then we had an odd schizoid breakdown.

Question about learning rate and epochs

Hello, I was wondering if you could please let me know some of the hyperparameters used? e.g. learning rate, batch size, epochs? Do you choose the epochs such the final training loss is 0?

[here](https://huggingface.co/jondurbin/airoboros-13b-gpt4-1.4/blob/main/README.md?code=true#L7) you mention you re-uploaded with a slightly earlier checkpoint, which seems perhaps a little less overfit. May I know how you measure the overfit? Do you split the dataset into training and validation sets?

airoboros-c34b-2.1 prompting

As noted on HuggingFace:

This model is a bit broken due to a prompt formatting bug in the training code! 2.2 will be available soon and should fix this

Can you describe what is wrong with the prompting? I'm having various issues but not sure if they are related. Most prominently, the model often just stops responding for a while in chats (repeated prompts).

Also, is there an ETA on 2.2 for this variant? Days? Weeks? Months? Up in the air?

[QUALITY] (Ycros Airoboros v1.4.1 L1-33b 16k) - Hallucinated numbers and narrative.

I tend to feed Airoboros a test sample in order to determine the quality of a preset. Over hundreds of generations, I noticed a particular tendency of the model to hallucinate or forget a person. Both of my primary samples involve the commander of a squad dying at the beginning, but the AI often adds a person or forgets one. The commonality here is that a "-1" seems to throw off Airoboros when making a narrative. I am making this post in the hope that Airoboros v2.1 could have measures to ensure that mathematics doesn't harm the output.

Everything after the "escaped..." is the AI's interpretation of the sample. The sample's starting condition is basically "4 - 1 = 3". The addition of an extra person is common, resulting in "4 - 1 = 4".

Note that I am using the highest quality preset that I could find, so the mistakes are reduced. With a bad preset, the contradictions are much more obvious. I will mark parts of the narrative where things don't make sense.

THE FALLEN COMMANDER

OOC:

Using the story below, describe the attempted escape of the subordinates. Any number may die in gruesome fashion or suffer terrible fates. Any survivors should have their respective life paths explained, in terms of events, feelings, and motivations for their way of life. Each survivor will give an ending monologue or speech about their fallen squad leader. Their opinion of the commander may range freely. Each character should have a name, gender, sexual orientation, and role in the squad. They may or may not have some form of relationship with each other. Write at least 7 paragraphs.

/OOC

There was a war, and I am one of the casualties. Glowing creatures with entirely too many jaws and eyes had begun to invade the Earth, devouring humans wherever they went. I was just one of many conscripts thrust into a struggle against the extinction of humanity. Somehow, I managed to become the leader of a small squad. During one of the battles, I and my three subordinates were accosted by a particularly large abomination. These things are malicious, known for mutilating their victims alive. To protect my comrades, I ordered them to retreat while I distracted the creature. Moments later, the corpses of myself and the beast lay scattered across the floor of an abandoned grocery store. Feeling left my body, and the world faded away. With my final thoughts, I hoped those three had escaped...

As the ghostly visage of my lifeless corpse flickered out of existence, leaving only the lingering scent of blood behind, the remaining members of the squad began to panic. The youngest member, a boy named James who hadn't yet reached his twentieth birthday, broke down sobbing uncontrollably as he stared at the remnants of what used to be me. The oldest among us, a woman named Sarah, tried her best to comfort him but found herself unable to speak due to shock from witnessing such horror. Meanwhile, another teammate, a man named Mark, gathered everyone together and gave orders for them to carry on without me. He assured them that everything would be fine if they stuck together and followed his instructions. Reluctantly, they complied.

Mark led the group through the ruined city streets until they came upon a seemingly safe house. It looked like it might provide shelter for the night, allowing them time to rest and plan their next move. As darkness fell over the city, THE FOUR OF THEM settled into bedding made from whatever scraps of fabric or clothing could be salvaged from nearby homes. However, despite being exhausted both physically and emotionally, none of them slept well. Nightmarish images haunted their dreams - visions of the horrible deaths suffered by countless others during this apocalyptic conflict.

The following morning, Mark woke up early and surveyed their surroundings before rousing the others. After a quick breakfast consisting mainly of canned goods pilfered from various houses along the way, the group set off once more. This day brought them face-to-face with several groups of enemy forces, forcing them to engage in multiple skirmishes throughout the course of the day. In every encounter, Mark proved himself capable of leading the group effectively, issuing commands and coordinating attacks with precision. By the end of the day, all of them felt a sense of pride in having successfully defended themselves against numerous threats. Despite losing their former commander, they knew deep down that they still stood a chance at making it out of this hellish situation alive.

Over the coming weeks, the group continued moving forward, slowly carving out a path towards safety. Along the way, they encountered numerous challenges and hardships, including starvation, injury, exposure to harsh elements, and constant stress caused by the ever-present threat of attack. Yet through it all, Mark remained steadfast in his leadership, guiding the group through even the most trying times. His confidence inspired hope within the hearts of the others, giving them strength to press on when despair threatened to consume them completely.

Eventually, after months of traveling and fighting, THE FOUR finally arrived at a military base where resistance forces were gathering to mount a counteroffensive against the invaders. Upon entering the camp, they immediately sought out someone in charge to report their arrival. A gruff sergeant greeted them, asking for their names and ranks. When informed that their previous lieutenant had been killed in action, he offered condolences but expressed little surprise; fatalities were commonplace these days. He then directed them towards the barracks, instructing them to get some much needed rest before reporting back for duty tomorrow.

That evening, as the sun sank low in the sky, James sat alone outside the barracks. Tears streamed down his cheeks as he thought back to the day we lost our commander. Though she didn't know it, her sacrifice allowed him to live long enough to see the dawn of a new day. He vowed to honor her memory by doing everything in his power to help bring peace back to the planet.

Inside the barrack room, Sarah lay awake, staring at the ceiling above her bunk. She couldn't stop thinking about how close they had come to dying alongside their leader. If not for Mark stepping up to fill the void left by her passing, she doubted any of them would have survived. Gratefulness filled her heart, knowing that there were people willing to put aside their own fears and insecurities to lead others towards safety.

Meanwhile, inside the command tent, Mark met with high ranking officers to discuss strategy and tactics. As he listened intently to their plans for victory, he couldn't help but feel a strange mixture of pride and sadness. Pride because he knew that he had proven himself worthy of taking on the responsibility of leadership, and sadness because he realized that he would never again see the face of the woman who had given her life to save his own.

And so, as the stars began to appear in the night sky, the battle-weary soldiers prepared themselves for the fight ahead.

The Fallen Commander sample was run through KoboldCPP with the Coherent Creativity preset and RoPE 0.5 80000.

Speculative sampling and Llmoe?

I heard about a new feature coming to llama.cpp, where speculative sampling is used to speed up a model's inference. The benefit varies, up to around 2x the speed but probably closer to 1.5x. It works by having a big model like a 34b use a smaller draft model like a 7b to propose tokens, which the big model then verifies. The GitHub thread has a video showing the performance benefits of the method.

My thoughts immediately jumped to Airoboros's lmoe. Would it be possible to integrate an "inference" lmoe into vanilla Airoboros to benefit from speculative sampling?

speculative : PoC for speeding-up inference via speculative sampling
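For readers unfamiliar with the technique: a toy sketch of the accept/reject loop at the heart of speculative sampling (greedy decoding for simplicity; this is not llama.cpp's actual implementation, and `draft_next`/`target_next` are stand-ins for real model calls):

```python
# Toy speculative sampling step: the small "draft" model proposes k
# tokens cheaply; the large "target" model verifies them and keeps the
# longest agreeing prefix, substituting its own token at the first
# disagreement. Real implementations verify all k drafts in one
# batched forward pass, which is where the speedup comes from.


def speculative_step(draft_next, target_next, prompt, k=4):
    """One round; draft_next/target_next map a token sequence to the
    next token (greedy)."""
    # 1. Draft model proposes k tokens autoregressively (cheap).
    seq = list(prompt)
    proposals = []
    for _ in range(k):
        tok = draft_next(seq)
        proposals.append(tok)
        seq.append(tok)
    # 2. Target model checks each proposal; accept while they agree.
    seq = list(prompt)
    accepted = []
    for tok in proposals:
        expected = target_next(seq)
        if tok == expected:
            accepted.append(tok)
            seq.append(tok)
        else:
            accepted.append(expected)  # fall back to the target's token
            break
    return accepted


# Demo: target emits the next letter; draft agrees except after 'c'.
target = lambda s: chr(ord(s[-1]) + 1)
draft = lambda s: "x" if s[-1] == "c" else chr(ord(s[-1]) + 1)
print(speculative_step(draft, target, ["a"], k=4))  # ['b', 'c', 'd']
```

When draft and target mostly agree, several tokens are accepted per verification pass, so the expensive model runs fewer times per generated token.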

Requesting some flexibility in topic input

I've been playing around quite a bit with airoboros, struggling to achieve my goal due to the nature of my "dataset". Since the public release of the GPT-4 API to just about anyone, I'd like to try my attempts again.
The dataset is a very large plaintext transcript of a textbook, with a lot of content that requires context from surrounding lines.
I've had some success just passing the text file as the topics-path, but because it only takes one "topic" per line, a lot of the information and context is missed.
This might be a niche thing that only I will ever use, but would it be possible to introduce a switch like --topic-grouping 5, which would group topic lines in batches of five and send each batch through the prompt to be processed?
