NAI-chonky (in development)

chunking algorithm for NAI modules
conversation that made me start this project: https://discord.com/channels/836774308772446268/837402685824565278/1066498372132950087
the basic idea: create module training data in such a way that the module learns that each context starts with metadata.
- this simulates having metadata in Memory in a NovelAI story.
- the hard part is creating chunks whose length is an exact multiple of 256 tokens, to have precise control over how each chunk looks.
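
a minimal sketch of that idea, not the project's actual code. `toy_tokenize` is a hypothetical stand-in that splits on whitespace (the real tokenizer would go there), and `build_context`, `CHUNK_SIZE` and the metadata format are names made up for illustration. it assumes the story is long enough to fill the context and simply cuts off whatever does not fit.

```python
CHUNK_SIZE = 256  # tokens per chunk, the multiple mentioned above


def toy_tokenize(text):
    """hypothetical stand-in tokenizer: one token per whitespace-separated word."""
    return text.split()


def build_context(metadata_tokens, story_tokens, n_chunks=1):
    """return exactly n_chunks * CHUNK_SIZE tokens that start with the metadata.

    returns None when the story is too short to fill the context; padding or
    packing several stories together is deliberately left out of this sketch.
    """
    size = n_chunks * CHUNK_SIZE
    room = size - len(metadata_tokens)
    if room <= 0:
        raise ValueError("metadata does not fit in the context")
    if len(story_tokens) < room:
        return None
    # keep only the part of the story that fits, discard the rest
    return metadata_tokens + story_tokens[:room]


# assumed metadata layout: a bracketed block at the very start
metadata = toy_tokenize("[ title: example ; genre: fantasy ]")
story = toy_tokenize("word " * 1000)
context = build_context(metadata, story, n_chunks=2)
```

with these inputs `context` is two chunks long and its first token is the opening bracket of the metadata. a real tokenizer will not split on whitespace, so the counts only carry over if the metadata and story are counted with the same tokenizer that is used for training.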

progress updates

from jan 27th, when thinking about the app:
jan 31st:
- uploaded main.py, which is the beginning of a refactor.
- "why on earth would you upload the beginning of a refactor?"
  - "idk."
- i had a solution built on the simplifying assumptions that every story is bigger than the total context size, and that you only put the part of the story that fits inside the context and discard the rest.
  - then i tried to expand on that solution by removing those assumptions and making the context size bigger than the biggest story (so that the program has to account for multiple stories in a context), but the code was too messy and very frustrating to work with, so i started a refactor.
- state of classes right now:
current problem:
- the start of each chunk drifts. each metadata block starts with a bracket, and that bracket should be the first token of each chunk. but when i tokenize the final training data file and check it, the bracket does not land exactly on the chunk boundaries.
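
a rough sketch of how that check could be automated, assuming the training file has already been tokenized into a flat list. `find_drift`, `context_size` and `start_token` are hypothetical names, and the short token lists are toy data, not real tokenizer output. it only locates the drift, it does not explain it.

```python
def find_drift(tokens, context_size, start_token="["):
    """return the start index of every context whose first token is not start_token."""
    return [
        i
        for i in range(0, len(tokens), context_size)
        if tokens[i] != start_token
    ]


# toy data with a context size of 4 tokens
aligned = ["[", "a", "b", "c", "[", "d", "e", "f"]
drifted = ["[", "a", "b", "c", "x", "[", "d", "e"]  # one extra token before the second bracket

aligned_result = find_drift(aligned, context_size=4)
drifted_result = find_drift(drifted, context_size=4)
```

the first index reported is where the drift begins, so the tokens just before it are the place to look for an extra or missing token.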
