the basic idea is to create module training data in such a way that the module learns that each context starts with metadata,
simulating having metadata in Memory, in a NovelAI story.
the hard part is creating chunks that are exact 256-token multiples, so there's precise control over how each chunk looks.
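a minimal sketch of the chunking idea, using plain lists of token ids instead of a real tokenizer; `build_chunk`, `metadata_tokens`, `story_tokens`, and `pad_id` are all hypothetical names for illustration, not from the actual app:

```python
CHUNK = 256  # the target chunk size in tokens

def build_chunk(metadata_tokens, story_tokens, pad_id=0):
    """Pack metadata plus as much story as fits into an exact 256-token chunk."""
    room = CHUNK - len(metadata_tokens)
    body = story_tokens[:room]
    chunk = metadata_tokens + body
    # pad to an exact 256-token multiple so every chunk boundary is predictable
    chunk += [pad_id] * (CHUNK - len(chunk))
    return chunk, story_tokens[room:]  # leftover story goes to the next chunk

chunk, rest = build_chunk(list(range(10, 20)), list(range(1000)))
```

because every chunk is exactly 256 tokens, the metadata always lands at a known offset (position 0 of each chunk), which is the property the training data needs.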
progress updates
from jan 27th, when thinking about the app:
jan 31st:
uploaded main.py, which is the beginning of a refactor.
"why on earth would you upload the beginning of a refactor?"
"idk."
i had a solution built on a simplifying assumption: "every story is bigger than the total context size, so you only put the part of the story that fits inside the context, and discard the rest".
then i tried to expand on that solution by dropping the assumption, making the context size bigger than the biggest story (so the program has to account for multiple stories in a context), but the code was too messy and very frustrating to work with, so i started a refactor.
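a rough sketch of what multi-story packing could look like; this is my guess at the shape of the problem, not the code from main.py, and `CONTEXT`, `pack_stories`, and the greedy strategy are all assumptions:

```python
CONTEXT = 1024  # assumed tokens per context window

def pack_stories(stories):
    """Greedily pack whole stories (lists of token ids) into contexts;
    split a story only when it is bigger than a whole context by itself."""
    contexts, current = [], []
    for story in stories:
        while len(story) > CONTEXT:      # oversized story: flush, then split it
            if current:
                contexts.append(current)
                current = []
            contexts.append(story[:CONTEXT])
            story = story[CONTEXT:]
        if len(current) + len(story) > CONTEXT:  # doesn't fit: start a new context
            contexts.append(current)
            current = []
        current.extend(story)
    if current:
        contexts.append(current)
    return contexts
```

this is where the bookkeeping explodes compared to the one-story-per-context version: partial stories, flush conditions, and the oversized-story case all interact, which matches the "too messy" experience above.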
state of classes right now:
current problem:
the start of each chunk drifts. every metadata block starts with a bracket, so a bracket should be the first token of every chunk. but when i tokenize and check the final training data file, the brackets don't land exactly on the chunk boundaries.
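a small debug check for the drift, again as a sketch over token ids: walk the final token stream in 256-token steps and report every chunk whose first token is not the bracket. `BRACKET_ID` is a placeholder; the real id is whatever "[" tokenizes to with the actual tokenizer.

```python
CHUNK = 256
BRACKET_ID = 58  # assumed id for "[" -- look it up in the real tokenizer

def find_drifted_chunks(token_ids):
    """Return the start offsets of chunks whose first token is not the bracket."""
    return [i for i in range(0, len(token_ids), CHUNK)
            if token_ids[i] != BRACKET_ID]
```

running this over the tokenized training data file should point at the first chunk where the alignment breaks, which narrows down whether the drift comes from the padding math or from the tokenizer merging the bracket with neighboring text.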