falcon-llm's Introduction

Falcon-LLM

Helper scripts and examples for exploring the Falcon LLM models

Overview of the model and use-cases: https://www.youtube.com/watch?v=-IV1NTGy6Mg

Files:

  • api_server.py - Runs the model server, locally or in the cloud. It uses the basic Flask demo web server; if you intend to host on a public IP, set up a proper production web server instead.
  • api_client.py - Makes requests to the server. Loading and accessing the model separately makes R&D a lot easier, even if everything is on the same machine, because you are not reloading the model every time you change your script. You can also use a notebook, but depending on the complexity of your project, that might not be good enough.
  • Falcon-40B-demo.ipynb - A short notebook example of loading Falcon 40B with options for various datatypes (4-, 8-, and 16-bit).
  • setup.sh - A quick shell script that installs the requirements; I used it on Lambda H100 machines. (chmod +x setup.sh && ./setup.sh)
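
The server/client split described above can be illustrated with a minimal sketch. This is not the code in api_server.py or api_client.py: it uses only the Python standard library instead of Flask so it stays dependency-free, and the /generate path, the JSON payload shape, and the load_model stub are all hypothetical names chosen for illustration. The point it shows is that the expensive load happens once in the server process, while the client can be edited and re-run freely.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def load_model():
    """Hypothetical stand-in for an expensive model load (not the repo's code)."""
    def generate(prompt: str) -> str:
        return prompt.upper()  # placeholder for real text generation
    return generate


MODEL = load_model()  # runs once, when the server process starts


class GenerateHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body: {"prompt": "..."} (assumed payload shape).
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"text": MODEL(payload["prompt"])}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def start_server(port: int = 0) -> HTTPServer:
    """Serve on localhost in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), GenerateHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query(port: int, prompt: str) -> str:
    """Client side: send a prompt to the already-running server."""
    data = json.dumps({"prompt": prompt}).encode("utf-8")
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}/generate",  # hypothetical endpoint path
        data=data,
        headers={"Content-Type": "application/json"},
    )
    # Empty ProxyHandler so a system-wide proxy is not used for localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return json.loads(response.read())["text"]
```

Calling start_server() once and then query(server.server_address[1], "some prompt") repeatedly mirrors the workflow: the client script changes, the loaded model stays in memory.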

falcon-llm's People

Contributors

sentdex


falcon-llm's Issues

Best Practice for Handling Variable-Length Sequences in Training an LLM Model on a Chatbot Dataset

I am training Falcon (LLM) on a chatbot dataset and would appreciate some guidance on handling variable-length sequences within it. The dataset consists of around 500 examples of chat messages exchanged between user 1 and user 2. Each example contains a different number of messages, so the sequence lengths differ.
Here are two representative data points from the dataset:

Datapoint 1 = """user 1 : How are you ?\n user 2 : I am good. \n user 1 : What do you like ? \n user 2 : Apples"""

Datapoint 2 = """user 1 : How are you ?\nuser 2 : I am good.\n user 1 : What do you like in fruits?\n user 2 : Oranges \nuser 1 : Great me too\n user 2 : But sometimes I like mangoes \nuser 1 : seems intresting \n user 2 : Yeah"""

To prepare the data for training, I tokenized the dataset with the maximum length of input_ids set to 4 tokens, and handled overflowing tokens by padding them accordingly.

Now, my question is: in cases where a chat message contains fewer than 4 tokens, what is considered a best practice? Should I pad these shorter sequences to match the maximum length, or would it be more suitable to keep them as they are?

I would appreciate any insights or suggestions on the most appropriate approach for handling variable-length sequences in this context.
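
One common approach to the question above, sketched here under stated assumptions rather than as a definitive answer, is to pad each batch only to its longest sequence and to carry an attention mask so that padded positions can be ignored. The function names, the PAD_ID value, and the token ids are hypothetical; a real tokenizer defines its own padding token. The -100 label value matches the default ignore_index of PyTorch's cross-entropy loss.

```python
from typing import Dict, List

PAD_ID = 0  # hypothetical padding token id; real tokenizers define their own


def pad_batch(sequences: List[List[int]], pad_id: int = PAD_ID) -> Dict[str, List[List[int]]]:
    """Right-pad every sequence to the longest one in the batch and build the mask."""
    longest = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = longest - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


def mask_labels(batch: Dict[str, List[List[int]]], ignore_index: int = -100) -> List[List[int]]:
    """Copy input_ids as labels, replacing padded positions with ignore_index."""
    return [
        [tok if keep else ignore_index for tok, keep in zip(ids, mask)]
        for ids, mask in zip(batch["input_ids"], batch["attention_mask"])
    ]
```

For example, pad_batch([[5, 6, 7], [8]]) pads the second sequence to length 3, and mask_labels on the result marks its two padded positions so they would not contribute to the loss.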
