We have created BanglaCLM, a Bangla language modeling dataset built from a 26.24 GB Bangla corpus scraped from several public websites.
Trained models:
https://huggingface.co/shahidul034/Bangla_text_generation
https://huggingface.co/shahidul034/BanglaGPT_512
https://huggingface.co/shahidul034/BanglaGPT
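The checkpoints listed above can probably be tried out with the Hugging Face `transformers` text-generation pipeline. The following is a minimal sketch, not a confirmed recipe: it assumes the repositories ship a tokenizer and weights in a format that `pipeline` can load directly, which this README does not state. The `generate` helper, its default model choice, and `max_new_tokens=50` are illustrative assumptions.

```python
from typing import List

# Model repositories listed in this README.
MODEL_IDS: List[str] = [
    "shahidul034/Bangla_text_generation",
    "shahidul034/BanglaGPT_512",
    "shahidul034/BanglaGPT",
]


def generate(prompt: str, model_id: str = MODEL_IDS[2], max_new_tokens: int = 50) -> str:
    """Generate a continuation for a Bangla prompt (hypothetical helper).

    Assumption: the checkpoint loads through the standard pipeline API.
    """
    # Imported lazily so the module can be inspected without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_id)
    outputs = generator(prompt, max_new_tokens=max_new_tokens)
    # The pipeline returns a list of dicts, one per generated sequence.
    return outputs[0]["generated_text"]


if __name__ == "__main__":
    # Downloads the model on first use.
    print(generate("বাংলাদেশ একটি"))
```

If loading fails, the weights may be stored for a different framework than the one installed locally; check the files in the model repository before changing the code.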
Text generation dataset chunks:
Chunk 1: https://huggingface.co/datasets/shahidul034/text_generation_model_data
Chunk 2: https://huggingface.co/datasets/shahidul034/text_generation_model_data2
Chunk 3: https://huggingface.co/datasets/shahidul034/text_generation_model_data3
Chunk 4: https://huggingface.co/datasets/shahidul034/text_generation_model_data4
Chunk 5: https://huggingface.co/datasets/shahidul034/text_generation_model_data5
Chunk 6: https://huggingface.co/datasets/shahidul034/text_generation_model_data6
Chunk 7: https://huggingface.co/datasets/shahidul034/text_generation_model_data7
Chunk 8: https://huggingface.co/datasets/shahidul034/text_generation_model_data8
Chunk 9: https://huggingface.co/datasets/shahidul034/text_generation_model_data9
Chunk 10: https://huggingface.co/datasets/shahidul034/text_generation_model_data10
Chunk 11: https://huggingface.co/datasets/shahidul034/text_generation_model_data11
Chunk 12: https://huggingface.co/datasets/shahidul034/text_generation_model_data12
Chunk 13: https://huggingface.co/datasets/shahidul034/text_generation_model_data13
Chunk 14: https://huggingface.co/datasets/shahidul034/text_generation_model_data14
Chunk 15: https://huggingface.co/datasets/shahidul034/text_generation_model_data15
Text summarization dataset chunks:
Chunk 1: https://huggingface.co/datasets/shahidul034/text_summarization_dataset1
Chunk 2: https://huggingface.co/datasets/shahidul034/text_summarization_dataset2
Chunk 3: https://huggingface.co/datasets/shahidul034/text_summarization_dataset3
Chunk 4: https://huggingface.co/datasets/shahidul034/text_summarization_dataset4
Chunk 5: https://huggingface.co/datasets/shahidul034/text_summarization_dataset5
Chunk 6: https://huggingface.co/datasets/shahidul034/text_summarization_dataset6
Chunk 7: https://huggingface.co/datasets/shahidul034/text_summarization_dataset7
Chunk 8: https://huggingface.co/datasets/shahidul034/text_summarization_dataset8
Chunk 9: https://huggingface.co/datasets/shahidul034/text_summarization_dataset9
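Because the data is split across many repositories, it can help to build the repository IDs programmatically. The sketch below reproduces the naming pattern visible in the two chunk lists above: the first text generation chunk has no numeric suffix, while every summarization chunk does. The helper names are hypothetical, and the `"train"` split name and the use of streaming are assumptions, since the README does not describe the dataset layout.

```python
from typing import Iterator, List, Tuple

OWNER = "shahidul034"


def generation_chunks() -> List[str]:
    """Repo IDs of the 15 text generation chunks."""
    base = f"{OWNER}/text_generation_model_data"
    # Chunk 1 has no suffix; chunks 2..15 carry their number.
    return [base] + [f"{base}{i}" for i in range(2, 16)]


def summarization_chunks() -> List[str]:
    """Repo IDs of the 9 text summarization chunks."""
    base = f"{OWNER}/text_summarization_dataset"
    return [f"{base}{i}" for i in range(1, 10)]


def iter_chunks(repo_ids: List[str], split: str = "train") -> Iterator[Tuple[str, object]]:
    """Lazily open each chunk (assumes a split named `split` exists)."""
    # Imported lazily so the ID helpers work without `datasets` installed.
    from datasets import load_dataset

    for repo_id in repo_ids:
        # streaming=True avoids downloading a whole chunk up front.
        yield repo_id, load_dataset(repo_id, split=split, streaming=True)


if __name__ == "__main__":
    for repo_id, dataset in iter_chunks(generation_chunks()[:1]):
        print(repo_id, next(iter(dataset)))
```

Streaming is used here only because the full corpus is large; dropping `streaming=True` downloads each chunk to the local cache instead.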
If you use any resources from this repository in your work, please cite the following paper:
M. S. Salim, H. Murad, D. Das and F. Ahmed,
"BanglaGPT: A Generative Pretrained Transformer-Based Model for Bangla Language,"
2023 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD), Dhaka, Bangladesh, 2023, pp. 56-59, doi: 10.1109/ICICT4SD59951.2023.10303383.
Keywords: Transformers; Tokenization; Encoding; Information and communication technology; Task analysis; Sustainable development; Bangla NLP; BanglaGPT; Bangla Text Generation Model