cisnlp / glotcc
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages (under review)
Home Page: https://huggingface.co/datasets/cis-lmu/GlotCC-V1
License: Creative Commons Zero v1.0 Universal