Shahrukh Khan's Projects
Build and train state-of-the-art natural language processing models using BERT
The git-scm.com website. Note that this repository is only for the website; issues with git itself should go to https://git-scm.com/community.
Handwritten text recognition on the IAM handwriting dataset
End-to-end Python framework for building natural language search interfaces to data. Leverages Transformers and the State-of-the-Art of NLP. Supports DPR, Elasticsearch, Hugging Face’s Hub, and much more!
A simple static page built with HTML5 and CSS3.
A Hydra-based natural language processing (NLP) pipeline boilerplate with embedded loggers. It streamlines deep learning models and accommodates experimentation while still allowing modular, scalable code. The best feature is that the code is completely parametrized via a config file, which minimizes code changes when the data changes.
BERT + Image Captioning
Project repository for Interactive systems.
E-crime file management web application built with servlets
A comprehensive PyTorch-based toolkit for weight sharing in text classification settings.
This repo contains the Jupyter notebooks for all the Kaggle competition datasets that I have used for practicing different predictive modeling and ML techniques.
LingFeat - A Comprehensive Linguistic Features Extraction ToolKit for Readability Assessment
Scrape public job postings from LinkedIn in native Python, without Selenium or any headless browser.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Exploring mixup strategies for text classification
This repository contains the code for deploying PyTorch models using the Streamlit framework as the frontend and FastAPI as the backend. It also uses Docker containers to make reproducibility and deployment seamless.
This repo contains my assignments from data science and machine learning MOOCs I took on platforms like Coursera, Udemy, and Udacity.
A Python library for extracting text from PDFs without losing the formatting of the PDF content.
A simple recipe for training a Transformer architecture and running inference with it for multi-task learning on custom datasets. The repo includes two approaches for achieving this.
Code repo for training a multitask QA Transformer and running inference with it for extractive QA and Boolean QA.
Saarland University NNTI WS2021 NLP Final Project
Our code for training Word2Vec word embeddings on the Hindi HASOC dataset. We then use a BiLSTM with self-attention in a joint dual-input learning setting, where a single neural network is trained on the Hindi and Bengali datasets simultaneously using their respective embeddings, and an LSTM in a transfer learning setting.
Obsei is intended to be an automation tool for text analysis needs.
E-commerce site in PHP and MySQL
Prefect is a workflow orchestration tool empowering developers to build, observe, and react to data pipelines