- dbpedia:
We use a subset of the DBpedia validation dataset, curated by randomly sampling 200 examples. This is the same sample file used in OpenAI's embedding blog post.
- kaggle-chatbot-3k:
This dataset is taken from Kaggle. The original CSV file has been converted to a text file in which each question is prefixed with q: and each answer with a:.
- tiny-shakespeare:
This dataset is used by Andrej Karpathy in many of his articles and in his video series. The file is taken from the following website.
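The DBpedia subset above is described as a random sample of 200 examples from the validation set. A minimal sketch of drawing such a subset reproducibly, assuming the validation rows are already loaded into a Python list; the function name `sample_rows` and the seed value are hypothetical, and this is not the script that produced OpenAI's sample file:

```python
import random


def sample_rows(rows, k=200, seed=42):
    """Return k rows drawn uniformly at random without replacement.

    The default seed is a hypothetical choice that makes the subset
    reproducible; the seed behind the original sample is not documented.
    """
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    # random.Random.sample raises ValueError if k > len(rows)
    return rng.sample(list(rows), k)


if __name__ == "__main__":
    # Stand-in data: 1,000 fake rows instead of the real DBpedia validation file.
    rows = [f"row-{i}" for i in range(1000)]
    subset = sample_rows(rows)
    print(len(subset))
```

Using a dedicated `random.Random` instance with a fixed seed means the same 200 rows come back on every run, which keeps embedding experiments comparable.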
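The kaggle-chatbot-3k conversion from CSV to prefixed text can be sketched as follows. This assumes the CSV header has columns named `Question` and `Answer` (hypothetical; check the actual Kaggle file) and that answers carry an `a:` prefix; the helper `csv_to_qa_text` is illustrative, not the script that produced the dataset file:

```python
import csv
import io


def csv_to_qa_text(csv_file, question_col="Question", answer_col="Answer"):
    """Convert question/answer CSV rows into alternating 'q:' / 'a:' lines.

    csv_file is any text file object; the column names are hypothetical
    defaults and should match the real header of the Kaggle CSV.
    """
    lines = []
    for row in csv.DictReader(csv_file):
        # Missing fields come back as None, so fall back to an empty string.
        # Collapse internal whitespace (including newlines in quoted fields)
        # so every question and answer stays on a single line.
        question = " ".join((row[question_col] or "").split())
        answer = " ".join((row[answer_col] or "").split())
        lines.append(f"q: {question}")
        lines.append(f"a: {answer}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = 'Question,Answer\nHi,Hello there\nHow are you?,"Fine,\nthanks"\n'
    print(csv_to_qa_text(io.StringIO(sample)), end="")
```

To convert a file on disk, open it with `newline=""` (as the `csv` module documentation recommends) and pass the file object to `csv_to_qa_text`. Flattening multi-line answers is a design choice that keeps one line per `q:` or `a:` entry, which makes the text file easy to split back into pairs.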
Copyright 2023 Weavers @ Eternal Loom. All rights reserved.