BanglaGPT: A Generative Pretrained Transformer-Based Model for Bangla Language

We built BanglaCLM, a Bangla language-modeling dataset, from a 26.24 GB Bangla corpus scraped from several public websites.

Trained models:

https://huggingface.co/shahidul034/BanglaGPT_512

https://huggingface.co/shahidul034/BanglaGPT

https://huggingface.co/shahidul034/Bangla_text_generation
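
The snippet below is a minimal sketch of how one of these checkpoints might be tried out with the Hugging Face `transformers` library. It assumes the repositories load through the standard `text-generation` pipeline, which this README does not confirm; the helper name `generate` and the `max_new_tokens` default are illustrative choices, not part of the project.

```python
# Model repository IDs taken from the links above.
MODEL_IDS = [
    "shahidul034/BanglaGPT_512",
    "shahidul034/BanglaGPT",
    "shahidul034/Bangla_text_generation",
]


def generate(prompt, model_id=MODEL_IDS[2], max_new_tokens=50):
    """Generate a continuation of `prompt` (hypothetical helper).

    Assumption: the checkpoint is compatible with the generic
    text-generation pipeline. If a repository ships weights for a
    different framework, extra arguments may be needed.
    """
    # Imported lazily so that listing the model IDs does not require transformers.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_id)
    outputs = generator(prompt, max_new_tokens=max_new_tokens)
    # The pipeline returns a list of dicts, one per generated sequence.
    return outputs[0]["generated_text"]
```

Calling `generate("বাংলাদেশ")` would download the selected checkpoint on first use, so it needs network access and enough disk space for the weights.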

Raw dataset links for text generation (Bangla)

Chunk 1: https://huggingface.co/datasets/shahidul034/text_generation_model_data

Chunk 2: https://huggingface.co/datasets/shahidul034/text_generation_model_data2

Chunk 3: https://huggingface.co/datasets/shahidul034/text_generation_model_data3

Chunk 4: https://huggingface.co/datasets/shahidul034/text_generation_model_data4

Chunk 5: https://huggingface.co/datasets/shahidul034/text_generation_model_data5

Chunk 6: https://huggingface.co/datasets/shahidul034/text_generation_model_data6

Chunk 7: https://huggingface.co/datasets/shahidul034/text_generation_model_data7

Chunk 8: https://huggingface.co/datasets/shahidul034/text_generation_model_data8

Chunk 9: https://huggingface.co/datasets/shahidul034/text_generation_model_data9

Chunk 10: https://huggingface.co/datasets/shahidul034/text_generation_model_data10

Chunk 11: https://huggingface.co/datasets/shahidul034/text_generation_model_data11

Chunk 12: https://huggingface.co/datasets/shahidul034/text_generation_model_data12

Chunk 13: https://huggingface.co/datasets/shahidul034/text_generation_model_data13

Chunk 14: https://huggingface.co/datasets/shahidul034/text_generation_model_data14

Chunk 15: https://huggingface.co/datasets/shahidul034/text_generation_model_data15

Summarization dataset (Bangla)

Chunk 1: https://huggingface.co/datasets/shahidul034/text_summarization_dataset1

Chunk 2: https://huggingface.co/datasets/shahidul034/text_summarization_dataset2

Chunk 3: https://huggingface.co/datasets/shahidul034/text_summarization_dataset3

Chunk 4: https://huggingface.co/datasets/shahidul034/text_summarization_dataset4

Chunk 5: https://huggingface.co/datasets/shahidul034/text_summarization_dataset5

Chunk 6: https://huggingface.co/datasets/shahidul034/text_summarization_dataset6

Chunk 7: https://huggingface.co/datasets/shahidul034/text_summarization_dataset7

Chunk 8: https://huggingface.co/datasets/shahidul034/text_summarization_dataset8

Chunk 9: https://huggingface.co/datasets/shahidul034/text_summarization_dataset9
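
Both chunk lists above follow a regular naming pattern, so the repository IDs can be generated rather than copied by hand. The sketch below builds them and shows one possible way to open a chunk with the Hugging Face `datasets` library. The helper names are hypothetical, and the split and column names inside each dataset are not documented here, so inspect the returned object before relying on them.

```python
from typing import List

GENERATION_PREFIX = "shahidul034/text_generation_model_data"
SUMMARIZATION_PREFIX = "shahidul034/text_summarization_dataset"


def generation_chunk_ids(count: int = 15) -> List[str]:
    """Repo IDs for the text-generation chunks.

    Chunk 1 has no numeric suffix; chunks 2..count end in their number.
    """
    return [
        GENERATION_PREFIX + ("" if i == 1 else str(i))
        for i in range(1, count + 1)
    ]


def summarization_chunk_ids(count: int = 9) -> List[str]:
    """Repo IDs for the summarization chunks, which are all numbered."""
    return [f"{SUMMARIZATION_PREFIX}{i}" for i in range(1, count + 1)]


def stream_chunk(repo_id: str):
    """Open one chunk lazily (hypothetical helper).

    Assumption: the repository can be read by `load_dataset` without
    extra configuration. Streaming avoids downloading a whole chunk
    of the 26.24 GB corpus up front.
    """
    # Imported lazily so the ID helpers work without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id, streaming=True)
```

For example, `stream_chunk(generation_chunk_ids()[0])` would open the first text-generation chunk, provided the machine has network access.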

Citation

If you use any resources from this repository in your work, please cite the following paper:

M. S. Salim, H. Murad, D. Das and F. Ahmed, "BanglaGPT: A Generative Pretrained Transformer-Based Model for Bangla Language," 2023 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD), Dhaka, Bangladesh, 2023, pp. 56-59, doi: 10.1109/ICICT4SD59951.2023.10303383.

Keywords: Transformers; Tokenization; Encoding; Information and communication technology; Task analysis; Sustainable development; Bangla NLP; BanglaGPT; Bangla Text Generation Model
