
ftutils

Minimalist toolkit for managing conversational datasets & fine-tuning LLMs.

Installation

pip install git+https://github.com/louislva/ftutils.git

Why use ftutils?

Creating language datasets is time-consuming! Organizing them is hard, especially if you try to build bespoke tooling for it.

So why not just use VSCode? With VSCode, you get to borrow:

  • File system: the file system is a really easy way to organize things into hierarchies
  • Text editor: edit files, have multiple tabs open, etc.
  • Version control: keep track of your dataset over time & don't lose your changes; make branches if you want to test different ways of labeling
  • GitHub Copilot: speed up your writing; it even works well for natural-language files!

ftutils provides utilities for converting datasets to and from an easy-to-edit .txt format.

Documentation

Reading and writing Conversations

The core of ftutils is the Conversation class. It's basically a list of messages that you can save to a file or send to the OpenAI API.

Here's how to make one with code:

from ftutils.conversation import Conversation, Message, Dataset  # the core of the library
from ftutils.utils import str_to_filename, random_hex  # utils for naming files
from ftutils.openai import estimate_tokens, openai_start_finetune, openai_create_dataset  # helpers for fine-tuning with the OpenAI API

conv = Conversation.from_json([
    {"role": "system", "content": "You are ChatGPT, a helpful assistant."},
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing well, thank you! How can I assist you today?"}
])

If you then want to save it, you can:

conv.to_file("cool-convo.txt")

If you open the file, you'll see:

cool-convo.txt



system: You are ChatGPT, a helpful assistant.

user: Hello, how are you?

assistant: I'm doing well, thank you! How can I assist you today?
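To make the layout concrete, here's a rough sketch of how a list of message dicts could be rendered in this format. It assumes one "role: content" block per message with a blank line between messages. messages_to_text is a hypothetical helper for illustration, not ftutils' own code, which may differ in details such as leading blank lines.

```python
def messages_to_text(messages: list[dict[str, str]]) -> str:
    """Render OpenAI-style message dicts as "role: content" blocks.

    Hypothetical helper for illustration; ftutils' own to_text() may
    differ (for example in leading blank lines).
    """
    blocks = [f"{m['role']}: {m['content']}" for m in messages]
    # Assumption: one blank line separates consecutive messages.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    conversation = [
        {"role": "system", "content": "You are ChatGPT, a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"},
    ]
    print(messages_to_text(conversation), end="")
```

Because each message is just a labeled paragraph, diffs in version control stay readable line by line.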

Let's say we're building a dataset for fine-tuning ChatGPT. We might edit our conversation like this:

cool-convo.txt



system: You are ChatGPT, a helpful assistant. You speak in uppercase.

user: Hello, how are you?

assistant: I'M DOING WELL, THANK YOU! HOW CAN I ASSIST YOU TODAY?

When we load it in another Python session, we see:

conv = Conversation.from_file("cool-convo.txt")
print(conv.to_json())
# [
#     {'role': 'system', 'content': 'You are ChatGPT, a helpful assistant. You speak in uppercase.'},
#     {'role': 'user', 'content': 'Hello, how are you?'},
#     {'role': 'assistant', 'content': "I'M DOING WELL, THANK YOU! HOW CAN I ASSIST YOU TODAY?"}
# ]

print(conv.to_text())
#
#
# system: You are ChatGPT, a helpful assistant. You speak in uppercase.
#
# user: Hello, how are you?
#
# assistant: I'M DOING WELL, THANK YOU! HOW CAN I ASSIST YOU TODAY?
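Going the other way, here's a minimal sketch of parsing this text back into message dicts. It assumes a new message starts on any line beginning with system:, user: or assistant:, and treats every other line as a continuation of the previous message, so multi-line replies survive. text_to_messages is a hypothetical name; ftutils' real from_file may handle more cases.

```python
import re

# Assumption: a message starts on a line beginning with a known role and a colon.
ROLE_LINE = re.compile(r"^(system|user|assistant): ?(.*)$")


def text_to_messages(text: str) -> list[dict[str, str]]:
    """Parse "role: content" text into OpenAI-style message dicts (hypothetical helper)."""
    messages: list[dict[str, str]] = []
    for line in text.splitlines():
        match = ROLE_LINE.match(line)
        if match:
            # A role prefix opens a new message.
            messages.append({"role": match.group(1), "content": match.group(2)})
        elif messages:
            # Any other line (including blank ones) continues the current message.
            messages[-1]["content"] += "\n" + line
        # Otherwise: text before the first role prefix (e.g. leading blank lines) is skipped.
    for message in messages:
        # Drop the blank separator lines picked up between messages.
        message["content"] = message["content"].strip()
    return messages


if __name__ == "__main__":
    sample = "\n\nsystem: You speak in uppercase.\n\nuser: Hello, how are you?\n"
    print(text_to_messages(sample))
```

One limitation of this approach: a content line that itself starts with, say, "user: " would be read as the start of a new message.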

Building a dataset for fine-tuning

Okay, let's say we've made a folder (datasets/case-sensitive-assistant) full of .txt files in the ftutils format. We now want to bundle them into a .jsonl dataset, which we can then send to OpenAI via the web interface.

We can do this with the Dataset class:

dataset = Dataset.from_dir("datasets/case-sensitive-assistant")

And then we can save it to a .jsonl file:

dataset.to_file("case-sensitive-assistant.jsonl")
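Conceptually, bundling a folder like this means parsing each .txt file and writing one JSON object per line. Here's a rough, self-contained sketch, assuming the {"messages": [...]} line shape used by OpenAI's chat fine-tuning files and the "role: content" layout shown earlier. bundle_dir_to_jsonl and parse_txt are hypothetical names, not ftutils' Dataset internals, and walking subfolders is this sketch's own assumption.

```python
import json
import re
import tempfile
from pathlib import Path

# Assumption: the .txt files use the "role: content" layout shown above.
ROLE_LINE = re.compile(r"^(system|user|assistant): ?(.*)$")


def parse_txt(text: str) -> list[dict[str, str]]:
    """Minimal "role: content" parser (hypothetical, for illustration)."""
    messages: list[dict[str, str]] = []
    for line in text.splitlines():
        match = ROLE_LINE.match(line)
        if match:
            messages.append({"role": match.group(1), "content": match.group(2)})
        elif messages:
            messages[-1]["content"] += "\n" + line  # continuation line
    for message in messages:
        message["content"] = message["content"].strip()
    return messages


def bundle_dir_to_jsonl(src_dir: str, out_path: str) -> int:
    """Write one {"messages": [...]} line per .txt file; return how many were written.

    Hypothetical stand-in for Dataset.from_dir(...).to_file(...).
    """
    # Assumption: include .txt files in subfolders too, in a stable (sorted) order.
    files = sorted(Path(src_dir).rglob("*.txt"))
    with open(out_path, "w", encoding="utf-8") as out:
        for path in files:
            messages = parse_txt(path.read_text(encoding="utf-8"))
            out.write(json.dumps({"messages": messages}, ensure_ascii=False) + "\n")
    return len(files)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "hello.txt").write_text(
            "system: You speak in uppercase.\n\nuser: Hi\n\nassistant: HELLO!\n",
            encoding="utf-8",
        )
        out_file = Path(tmp, "dataset.jsonl")
        count = bundle_dir_to_jsonl(tmp, str(out_file))
        print(count, out_file.read_text(encoding="utf-8"))
```

Sorting the file list keeps the output order stable, so regenerating the .jsonl after an unrelated edit doesn't reshuffle every line.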

Now you can drag and drop case-sensitive-assistant.jsonl into OpenAI's fine-tuning web interface!
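Before uploading, it can be worth sanity-checking the file locally. Here's a minimal sketch, assuming each line should be a JSON object with a non-empty messages list of role/content dicts. check_jsonl_text is a hypothetical helper, and both the allowed-roles list and the "needs an assistant message" rule are this sketch's own heuristics, not OpenAI's full validation.

```python
import json

# Assumption: plain chat datasets only use these three roles.
# A tuple (not a set) so that unhashable JSON values don't raise on "in".
ALLOWED_ROLES = ("system", "user", "assistant")


def check_jsonl_text(text: str) -> list[str]:
    """Return problems found in chat-format JSONL text; an empty list means none were found."""
    problems: list[str] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            problems.append(f"line {lineno}: blank line")
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append(f"line {lineno}: missing or empty 'messages' list")
            continue
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict) or msg.get("role") not in ALLOWED_ROLES:
                problems.append(f"line {lineno}, message {i}: unexpected role")
            elif not isinstance(msg.get("content"), str):
                problems.append(f"line {lineno}, message {i}: content is not a string")
        # Heuristic: an example with no assistant turn gives the model nothing to learn.
        if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
            problems.append(f"line {lineno}: no assistant message")
    return problems
```

For example, you could run print(check_jsonl_text(open("case-sensitive-assistant.jsonl", encoding="utf-8").read())) and expect an empty list before dragging the file in.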

Using the OpenAI API with Conversation

The Conversation class converts between the .txt format and the JSON format: a list of {"role": ..., "content": ...} dicts. The JSON format is the same structure you pass as messages to the OpenAI Python API.

This means you can generate a new message with OpenAI and add it to the conversation, for example:

from openai import OpenAI

client = OpenAI()  # reads your API key from the OPENAI_API_KEY environment variable
completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=conv.to_json()
)
conv.messages.append(completion.choices[0].message)
print(conv.to_text())
#
#
# system: You are ChatGPT, a helpful assistant. You speak in uppercase.
#
# user: Hello, how are you?
#
# assistant: I'M DOING WELL, THANK YOU! HOW CAN I ASSIST YOU TODAY?
# 
# assistant: WHAT CAN I DO FOR YOU?

Notice the extra assistant message at the end, generated by GPT-3.5.

If we like it, we can save it again:

conv.to_file("cool-convo-with-extra-message.txt")


Contributors

louislva
