chenjieen / redpajama-data
This project is forked from togethercomputer/redpajama-data
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
License: Apache License 2.0