Small Language Models

This repo contains the code to train a simple language model on the TinyStories dataset.

The purpose of the repo is for me to experiment with creating small language models (SLMs) for very specific tasks, in this case writing short stories. The idea comes from a combination of the LIMA and TinyStories research papers, which describe how to improve the performance of language models by training on small but high-quality curated datasets.

Hosting

This model is not officially hosted. You can, however, download and use the model through the Hugging Face AutoModel classes. Here is the link.
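As a rough sketch of what loading through the AutoModel classes could look like, assuming the checkpoint loads as a causal language model with `AutoModelForCausalLM` and ships its tokenizer alongside it. The `model_id` argument is a placeholder for the repository name behind the link above, and the prompt and sampling settings are illustrative choices, not the ones used in this repo. If the checkpoint relies on custom model code, loading may also need `trust_remote_code=True`.

```python
def generate_story(model_id: str,
                   prompt: str = "Once upon a time",
                   max_new_tokens: int = 200) -> str:
    """Download a checkpoint from the Hugging Face Hub and sample a story.

    model_id is a placeholder: pass the repository name from the link above.
    """
    # Imported here so the function can be defined without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Encode the prompt as PyTorch tensors and sample a continuation.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,  # sampling instead of greedy decoding (assumed setting)
        top_k=50,        # illustrative value, not taken from this repo
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Usage (needs transformers, torch, and network access):
# print(generate_story("<model id from the link>"))
```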

Training

All code needed to train and run the model is provided, including code to train a custom tokenizer.

Using the hyperparameters specified in hyperparameters.json, I was able to train a 1.456M parameter model that achieves a loss of 0.493 on the validation dataset. If you are up to it, please feel free to attempt to beat this number (I do not suspect it will be very hard).
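To put the validation loss in more familiar terms, it can be converted to perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats (what a standard cross-entropy loss computes), which the README does not state explicitly. The function name is made up for illustration.

```python
import math

def loss_to_perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy)

# Validation loss reported above; assumed to be mean cross-entropy in nats.
validation_loss = 0.493
print(f"perplexity: {loss_to_perplexity(validation_loss):.3f}")
```

Under that assumption the perplexity comes out at roughly 1.64. Since perplexity is measured per token, it is only comparable between models that share the same tokenizer.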

Training was done on a very shitty Nvidia 1650 Ti GPU with 4 GB of VRAM and took about 40 minutes to complete.

Sample

Here is a short sample generated by the trained model:

Once upon a time, in a big tree, there lived a little bird. Fred was very rough, and he always enjoyed looking for new nest. One day, Brown saw a little bird flying by. He thought it would be a friend and it was a h passion, too far. It was so long that it was a big with long beak. The little bird thought of sound could see it square apples, so it started to feel smoke on a branch. He hopped and flew, feeling so safe that the wind was getting ready to go home. Suddenly, she felt embarrassed. Just then, a big bird realized there was s happening next to its nest. She decided to care of the nest and not take it away from the bird down and seemed to hurt him. Some nest was also nest, growing tall by the nest and the p ripped its wings around them. The little bird was defied. They had to catch the butterfly. The bird learned a r own miner and the lesson: it's important to ask for help ring. Browned at ue, t we are because you can make it feel better." The little bird on the ground shook its head and tried to catch the leak . Now, I have an eow!

No, it's not completely coherent, but we are getting close!
