Unraveling mysteries hidden within datasets, a relentless data detective, transforming chaos into knowledge.

Introduction

👋 Hi, I’m @TatjanaChernenko
👀 I’m interested in Data Science, ML/DL, and NLP.
📫 How to reach me: [email protected]
📁 New Public Repository: This new public GitHub profile contains both older projects (from approx. 2015 onward) and newer ones, uploaded now after years of working in a private capacity due to my employers' privacy policies.
📁 Project Uploads: All projects uploaded here are from my personal endeavors and university research. Due to privacy policies at SAP SE, where I am employed, I am unable to share work-related projects publicly. These repositories exclusively feature my private projects and are newly uploaded to this fresh GitHub profile. Thank you for your understanding.

My Projects

Research Repositories

CHERTOY: Word Sense Induction for Web Search Result Clustering
- GitHub: CHERTOY System
Data-to-text Generation
- GitHub: Data-to-text Generation
Text Summarization with LexRank
- GitHub: Text Summarization with LexRank

Predictive Maintenance (RUL, failure prediction, maintenance)

LSTM for predictive maintenance of aircraft machines

GitHub: LSTM for predictive maintenance of aircraft machines: failure and RUL (remaining useful life) prediction

Anomaly Detection for Time Series with IBM API (SVR), K-Means clustering, statsmodels decomposition and Fourier analysis

GitHub: IBM API for anomaly detection, univariate data

Game AI

Reinforcement Learning Agent for Bomberman

GitHub: RL Agent for Bomberman

Speech Recognition

Speech-to-text with Transfer Learning

GitHub: Speech-to-text with Transfer Learning

Data Augmentation

Data Augmentation Techniques for Classification

GitHub: Data Augmentation Techniques

[My Playground (smaller projects / samples)](#playground):

EDA (Explorative Data Analysis)
Basic NLP Examples
Text Categorisation Task with ML
Dialogue Systems
Recommendation Systems
Sentiment Analysis
Voice technologies (speech-to-text, speech-to-speech, text-to-speech)
Various ML tasks
Apps with ChatGPT and OpenAI
Databases, SQL, NoSQL, web scraping, email notifications
NMT

Inspiration

Different

My Projects

Research Repositories

NLP / ML

2017/2018 CHERTOY: Word Sense Induction for better web search result clustering - An approach to improve word sense induction systems (WSI) for web search result clustering. Exploring the boundaries of vector space models for the WSI Task. CHERTOY system. Authors: Tatjana Chernenko, Utaemon Toyota.

Whitepaper - link

Key words: word sense induction, web search results clustering, ML, NLP, word2vec, sent2vec, data science, data processing.

2018 Data-to-text: Natural Language Generation from structured inputs - This project investigates generating image descriptions that focus on the spatial relationships between objects and on a sufficient set of attributes for each object. Using an encoder-decoder architecture with LSTM cells (based on Dong et al. (2017)), the system transforms normalized vector representations of attributes into fixed-length vectors. These vectors serve as initial states for a decoder generating target sentences from sequences in description sentences.

Whitepaper - link

Key words: natural language generation, encoder-decoder, ML, NLP, data science, feed-forward neural network, LSTMs.

2018 Text Summarization research: Optimizing LexRank system with ECNU features - enhancing the LexRank-based text summarization system by incorporating semantic similarity measures from the ECNU system. The LexRank-based text summarization system employs a stochastic graph-based method to compute the relative importance of textual units for extractive multi-document text summarization. This implementation initially utilizes cosine similarity between sentences as a key metric. In this model, a connectivity matrix based on intra-sentence cosine similarity is used as the adjacency matrix of the graph representation of sentences. The objective is to explore the impact of replacing cosine similarity with a combination of features from the ECNU system, known for its semantic similarity measure. This modification aims to improve the summarization effectiveness of the LexRank approach.

Whitepaper - link

Key words: natural language processing, text summarization, ML, NLP, data science, LexRank, ECNU, semantic similarity metrics, multi-document text summarization, cosine similarity, connectivity matrix, optimization.
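The graph-based ranking described above can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: it assumes plain term-frequency bags of words (no IDF weighting), and the `threshold`, `damping`, and `iterations` values are illustrative assumptions. The `similarity` parameter marks the place where a different sentence similarity measure, such as one built from ECNU-style features, could be swapped in for cosine similarity.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank(sentences, similarity=cosine, threshold=0.1, damping=0.85, iterations=50):
    """Score sentences by power iteration over a thresholded similarity graph."""
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    if n == 0:
        return []
    # Connectivity matrix: an edge wherever similarity reaches the threshold.
    adj = [[1.0 if similarity(bags[i], bags[j]) >= threshold else 0.0
            for j in range(n)] for i in range(n)]
    # Row-normalize so each row is a probability distribution.
    for row in adj:
        total = sum(row)
        for j in range(n):
            row[j] = row[j] / total if total else 1.0 / n
    # Damped power iteration, starting from a uniform distribution.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [(1 - damping) / n
                  + damping * sum(adj[j][i] * scores[j] for j in range(n))
                  for i in range(n)]
    return scores
```

Sentences with the highest scores would be selected for the extractive summary; passing another callable as `similarity` changes the graph without touching the ranking step.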

2019, Reinforcement Learning agent for Bomberman game - Training an RL agent for the multi-player game Bomberman using deep Q-learning with a dueling network architecture, separate decision and target networks, and prioritized experience replay.

Whitepaper - link

Key words: reinforcement learning, q-learning.
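Two of the ingredients named above can be illustrated without a neural network. The sketch below shows the usual dueling aggregation, Q(s, a) = V(s) + A(s, a) - mean(A), and proportional prioritized sampling. It is an assumption-laden illustration rather than the agent's actual code: the function names are hypothetical, `alpha` and `eps` are assumed values, and importance-sampling correction weights are left out.

```python
import random


def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a), subtracting the mean advantage."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization: p_i = (|td_error_i| + eps) ** alpha, normalized."""
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_batch(transitions, td_errors, batch_size, rng=random):
    """Draw a replay batch (with replacement), favoring large TD errors."""
    probs = sampling_probabilities(td_errors)
    return rng.choices(transitions, weights=probs, k=batch_size)
```

Subtracting the mean advantage keeps the value and advantage streams identifiable; the small `eps` ensures that transitions with zero TD error can still be replayed.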

2018, Speech-to-text: Transfer Learning for Automatic Speech Translation (playground) - Playground for Automated Speech Translation (AST) with transfer learning vs. AST trained from scratch; hyperparameter tuning and evaluation.

Report - link

Key words: transfer learning, automated speech translation

2018, Data Augmentation techniques for binary- and multi-label classification - Exploring Data Augmentation techniques (Thesaurus and Backtranslation, a winning Kaggle technique) to expand existing datasets, evaluated on binary- and multi-label classification tasks (spam/not spam and news article classification). Data augmentation matters when training data is limited, especially in Machine Learning (ML) or Deep Learning (DL) applications. The core idea is to alter the text while retaining its meaning, which increases the diversity of the dataset.

Key words: data augmentation, data science, ML, DL, binary and multi-class classification
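The thesaurus idea can be sketched as simple synonym replacement. The example below is only an illustration: `TOY_THESAURUS`, `augment`, and `replace_prob` are hypothetical names and values chosen here, and a real setup would draw synonyms from a proper lexical resource rather than a hand-written dictionary.

```python
import random

# Hypothetical toy thesaurus, for illustration only.
TOY_THESAURUS = {
    "cheap": ["inexpensive", "affordable"],
    "buy": ["purchase"],
    "great": ["excellent", "superb"],
}


def augment(text, thesaurus, replace_prob=0.5, seed=None):
    """Return a copy of text with some tokens replaced by synonyms."""
    rng = random.Random(seed)
    out = []
    for token in text.split():
        synonyms = thesaurus.get(token.lower())
        if synonyms and rng.random() < replace_prob:
            # Swap the token for a randomly chosen synonym.
            out.append(rng.choice(synonyms))
        else:
            out.append(token)
    return " ".join(out)
```

Each augmented copy keeps the label of the original example, so calling `augment` several times with different seeds yields additional training samples for the same class.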

LSTM for predictive maintenance of aircraft machines

LSTM for predictive maintenance of aircraft machines: failure and RUL (remaining useful life) prediction - Predictive Maintenance: Use LSTM to predict failure (binary classification) and RUL (remaining useful life or time to failure with regression) of aircraft engines.
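Both prediction targets can be derived from run-to-failure data before any model is trained. The sketch below assumes that the last recorded cycle of an engine is its failure point; the function name `label_engine` and the `window` value are hypothetical choices for illustration, not taken from the repository.

```python
def label_engine(cycles, window=30):
    """Derive regression and classification targets for one engine.

    cycles: cycle numbers observed for a single engine up to its failure.
    window: horizon (in cycles) for the binary "fails soon" label.
    """
    last_cycle = max(cycles)
    labels = []
    for c in cycles:
        rul = last_cycle - c  # remaining useful life at this cycle
        labels.append({
            "cycle": c,
            "rul": rul,
            "fails_within_window": int(rul <= window),
        })
    return labels
```

The `rul` field is the regression target, and `fails_within_window` is the binary classification target; an LSTM would then be trained on sliding windows of sensor readings paired with these labels.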

Anomaly Detection for Time Series with IBM API (SVR), K-Means clustering, statsmodels decomposition and Fourier analysis

IBM API for anomaly detection, univariate data - Jupyter Notebook

Text Categorisation Task with ML (Reuters)

Categorization task with ML Algorithms for Reuters text categorization benchmark dataset - LinearSVC (Linear Support Vector Classifier), Decision Tree, Random Forest, Logistic Regression, k-Nearest Neighbors (k-NN), Naive Bayes, AdaBoost, LDA (Linear Discriminant Analysis), RBM (Restricted Boltzmann Machine), MLP (Multilayer Perceptron).

Collection of chatbots, dialogue systems

(coming soon)

Playground

EDA (Explorative Data Analysis)

Explorative Data Analysis of Airbnb rental prices in New York, 2019 - Jupyter Notebook

(further projects coming soon)

Basic NLP Examples

NLP examples - Jupyter Notebook with data preprocessing, top words, word cloud, frequencies, AgglomerativeClustering, PCA, Sentiment analysis, Topic Detection
REGEX examples - simple summary of regex examples in Jupyter Notebook.

Databases, SQL, NoSQL, web scraping, email notifications

LinkedIn web scraping, saving data to local MongoDB and CSV, filtering and notifying the user via email - automatically extracts job postings from LinkedIn according to predefined settings, stores them in a local MongoDB database and a CSV file, searches for relevant positions based on keywords, and notifies the user via email (Gmail API) about relevant job opportunities.
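The filtering and notification steps could look roughly like the sketch below. It covers only keyword matching and building the message; scraping, MongoDB storage, and sending through the Gmail API are left out. The posting fields (`title`, `description`, `url`) and both function names are hypothetical assumptions made for this example.

```python
from email.message import EmailMessage


def filter_postings(postings, keywords):
    """Keep postings whose title or description mentions any keyword."""
    wanted = [k.lower() for k in keywords]
    return [
        p for p in postings
        if any(k in (p["title"] + " " + p["description"]).lower() for k in wanted)
    ]


def build_notification(matches, recipient):
    """Build (but do not send) an email summarizing the matching postings."""
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = f"{len(matches)} relevant job posting(s) found"
    msg.set_content("\n".join(f"{p['title']}: {p['url']}" for p in matches))
    return msg
```

Keeping the filter separate from the notification makes it easy to reuse the same matches for the CSV export and the database update.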

Various ML tasks

https://github.com/TatjanaChernenko/ml_playground
Regression Task: Predicting Airbnb rental prices in New York - Regression task to predict rental prices in New York, playground. Models used: Linear Regression, Decision Trees, NNs.

(coming soon)

Apps with ChatGPT and OpenAI

OpenAI basic app - an update of the basic OpenAI sample app for generating pet names, adapted to OpenAI's code changes (January 2024)
[fork: GPT Chatbot - customizable](https://github.com/TatjanaChernenko/customizable-gpt-chatbot) - A dynamic, scalable AI chatbot built with Django REST framework, supporting custom training from PDFs, documents, websites, and YouTube videos. Leveraging OpenAI's GPT-3.5, Pinecone, FAISS, and Celery for seamless integration and performance.

(coming soon)

Dialogue Systems

Question answering with DistilBERT - Question answering with DistilBERT, HuggingFace
Document Question Answering with LayoutLM - This is a fine-tuned version of the multi-modal LayoutLM model for the task of question answering on documents. It has been fine-tuned using both the SQuAD2.0 and DocVQA datasets.

Recommendation Systems

Own projects: (to be uploaded soon)

Forks:

Recommendation System with TensorFlow, approx. 2020 - TensorFlow Recommenders is a library for building recommender system models using TensorFlow. Fork from smellslikeml
TF-Recommenders with Kubernetes - Example of kubernetes deployment for tf-recommenders model

Sentiment Analysis

(to be uploaded soon)

Forks:

Tweet Analysis - Analyzing ChatGPT-related tweets to observe technology interest trends over time

Voice technologies (speech-to-text, speech-to-speech, text-to-speech)

Own projects: (to be uploaded soon)

Forks:

Whisper OpenAI - Robust Speech Recognition via Large-Scale Weak Supervision
WhisperX Timestamps (& Diarization) - Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Whisper real-time - real-time speech-to-text conversion with Whisper
SpeechGPT - detects microphone input and converts it to text using Google's Speech Recognition API. It then opens ChatGPT and inputs the recognized text using selenium. It can be used with a wake word, and it can also use text to speech to repeat ChatGPT's answer to the query.
Speaker Diarization Whisper - Whisper with Speaker Diarization based on OpenAI Whisper
Speech-to-Text-WaveNet: End-to-end sentence level English speech recognition based on DeepMind's WaveNet and tensorflow (forked from buriburisuri)
Speech-to-text via Whisper and GPT-4 - transcribe dictations to text using whisper, and then fixing the resulting transcriptions into usable text using gpt-4 (forked from MNoichl)
TensorFlow Speech Recognition - audio processing and speech classification with Tensorflow - convolution neural networks (forked from harshel)
Watson_STT_CustomModel - a custom speech model using IBM Watson Speech to Text; an old one (approx. 2018)
Simple Speech Recognition with Python - very simple setup using SpeechRecognition Python module
CTTS - Controllable Text-to-speech system, based on Microsoft's FastSpeech2
Google Sheets to Speech - Excel-to-speech, forked from Renoncio: A Python script for generating audio from a list of sentences in Google Sheets.
StreamlitTTS - Streamlit app allows you to convert text to audio files using the Microsoft Edge's online text-to-speech service.
Dolla Llama: Real-Time Co-Pilot for Closing the Deal - forked from smellslikeml; power a real-time speech-to-text agent with retrieval augmented generation based on webscraped customer use-cases, implements speech-to-text (STT) and retrieval-augmented generation (RAG) to assist live sales calls.
Text-to-Speech on AWS - forked from codets1989; using AWS Lambda and Polly to convert text to speech and create an automated pipeline
Whisper speech-to-text Telegram bot - forked from loyal-pelmen; Speech-to-Text Telegram bot
DeepSpeech on devices - embedded (offline, on-device) speech-to-text engine which can run in real time ranging from a Raspberry Pi 4 to high power GPU servers
Bash Whisper - using a Digital Voice Recorder (DVR) - Bash function to ease the transcription of audio files with OpenAI's whisper.
Awesome Whisper - model variants and playgrounds
TikTok Analyzer - Video Scraping and Content Analysis Tool. Search & download Tiktok videos by username and/or video tag, and analyze video contents. Transcribe video speech to text and perform NLP analysis tasks (e.g., keyword and topic discovery; emotion/sentiment analysis). Isolate audio signal and perform signal processing analysis tasks (e.g., pitch, prosody and sentiment analysis). Isolate visual stream and perform image tasks (e.g., object detection; face detection).
SpeechBrain - an open-source PyTorch toolkit that accelerates Conversational AI development; spans speech recognition, speaker recognition, speech enhancement, speech separation, language modeling, dialogue, and beyond. Over 200 competitive training recipes on more than 40 datasets supporting 20 speech and text processing tasks. Supports both training from scratch and fine-tuning pretrained models such as Whisper, Wav2Vec2, WavLM, Hubert, GPT2, Llama2, and beyond. The models on HuggingFace can be easily plugged in and fine-tuned.
Speech Synthesis Markup - SSML - XML-based markup language that you can use to fine-tune your text to speech output attributes (tutorial from Microsoft).

(further projects coming soon)

NMT

(coming soon)

Computer Vision

Own projects: (to be uploaded soon)

Forks/Inspiration:

Laundry Sorting with a robotic arm - Sorting laundry autonomously with a robotic arm and Computer Vision

Other

AutoXlicker - A lightweight and customizable autoclicker tool for automating repetitive clicking tasks.
[Hand Gesture Computer Interface](https://github.com/TatjanaChernenko/Hand-Gesture-Computer-Interface) - Software that allows you to interact with your computer through hand gestures.

My Projects

| Category | Project Title | GitHub |
| --- | --- | --- |
| Research Repositories | CHERTOY: Word Sense Induction for better web search result clustering | CHERTOY System |
| Research Repositories | Data-to-text: Natural Language Generation from structured inputs | Data-to-text Generation |
| Research Repositories | Text Summarization research: Optimizing LexRank system with ECNU features | Text Summarization with LexRank |
| Research Repositories | Reinforcement Learning agent for Bomberman game | RL Agent for Bomberman |
| Research Repositories | Speech-to-text: Transfer Learning for Automatic Speech Translation (playground) | Speech-to-text with Transfer Learning |
| Research Repositories | Data Augmentation techniques for binary- and multi-label classification | Data Augmentation Techniques |
| Predictive Maintenance | LSTM for predictive maintenance of aircraft machines: failure and RUL (remaining useful life) prediction | Predictive Maintenance with LSTM |
| Anomaly Detection | Anomaly Detection for Time Series with IBM API (SVR), K-Means clustering, statsmodels decomposition and Fourier analysis | IBM API for anomaly detection, univariate data |
| Text Categorisation | Text Categorisation Task with ML (Reuters) | Categorization task with ML Algorithms for Reuters text categorization benchmark dataset |
| Playground | Explorative Data Analysis of Airbnb rental prices in New York, 2019 | EDA of Airbnb Prices in New York |
| Playground | Basic NLP Examples | NLP examples |
| Databases, SQL, NoSQL, web scraping, email notifications | LinkedIn web scraping, saving data to local MongoDB and CSV, filtering and notifying the user via email | LinkedIn Web Scraping and Email Notifications |
| Various ML tasks | Regression Task: Predicting Airbnb rental prices in New York | Regression Task with Airbnb Data |
| Dialogue Systems | Question answering with DistilBERT | DistilBERT Question Answering |
| Dialogue Systems | Document Question Answering with LayoutLM | LayoutLM Document QA |
| Recommendation Systems | Recommendation System with TensorFlow | TensorFlow Recommenders |
| Sentiment Analysis | Sentiment Analysis | (to be uploaded soon) |
| Voice Technologies | Speech-to-Text-WaveNet | Speech-to-Text-WaveNet |
| Voice Technologies | Speech-to-text via Whisper and GPT-4 | Speech-to-text with Whisper to GPT |
| Voice Technologies | TensorFlow Speech Recognition | TensorFlow Speech Recognition |
| Voice Technologies | Watson_STT_CustomModel | Watson STT Custom Model |
| Voice Technologies | Simple Speech Recognition with Python | Simple Speech Recognition |
| Voice Technologies | CTTS | CTTS |
| Voice Technologies | Google Sheets to Speech | Google Sheets to Speech |
| Voice Technologies | StreamlitTTS | StreamlitTTS |
| Voice Technologies | Dolla Llama: Real-Time Co-Pilot for Closing the Deal | Dolla Llama |
| Voice Technologies | Text-to-Speech on AWS | Text-to-Speech on AWS |
| Voice Technologies | Whisper speech-to-text Telegram bot | Whisper Speech-to-Text Telegram Bot |
| NMT | NMT (Neural Machine Translation) | (coming soon) |

Inspiration

Different

Prediction, Time Series, Anomaly Detection

AI for Time Series - papers, tutorials, surveys (!)
IBM Hub API Tutorial - Anomaly Detection - use IBM API for anomaly detection
IBM API for Anomaly Detection - playing around with the Anomaly Detection service to be made available on IBM API Hub
AWS Forecast - end-to-end guide - Prediction with AWS
Amazon Monitron Guidance for Predictive Maintenance - predictive maintenance management in industrial environments using Amazon Monitron and other AWS services.
Azure Predictive Maintenance Template - Regression: predict the Remaining Useful Life (RUL), or Time to Failure (TTF); binary classification: predict if an asset will fail within certain time frame (e.g. days). Multi-class classification: Predict if an asset will fail in different time windows: E.g., fails in window [1, w0] days; fails in the window [w0+1,w1] days; not fail within w1 days.
AI for Time Series - Tutorials
PredictionIO - Apache; a machine learning server for developers and ML engineers.
Conformal Prediction Tutorials - A professionally curated list of awesome Conformal Prediction videos, tutorials, books, papers, PhD and MSc theses, articles and open-source libraries.
Time Series Prediction - LSTM Neural Network for Time Series Prediction
Stock Prediction Models - gathers machine learning and deep learning models for Stock forecasting including trading bots and simulations
Lime: Explaining the predictions of any machine learning classifier
Time Series Prediction - TensorFlow Tutorial for Time Series Prediction
Awesome-time-series - A comprehensive survey on the time series domains
Predictive Maintenance Datasets - Datasets for predictive maintenance

Data Science Resources

Data Science Resources - learning - The open-source curriculum for learning to be a Data Scientist (quite basic, but nice links to books, etc.)
Data Science Resources - A Data Science repository to learn from and apply to real-world problems.
Data Science Cheatsheets - List of Data Science Cheatsheets to rule the world
Python Data Science Handbook - full text in Jupyter Notebooks
Data science Python notebooks - Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Datasets from Huggingface - collection of Huggingface datasets
Huggingface - web API for visualizing and exploring of datasets - Lightweight web API for visualizing and exploring all types of datasets - computer vision, speech, text, and tabular - stored on the Hugging Face Hub
Huggingface - analyse datasets - EDA from Huggingface (Developing tools to automatically analyze datasets)
Jupyter Notebooks for Big Data - with Spark and Hadoop - A guide on how to use Jupyter Notebook with big data frameworks like Apache Spark and Hadoop, including recommended libraries and tools.

NLP Resources

NLP state-of-the-art - Tracking Progress in Natural Language Processing
NMT Tutorial - Neural Machine Translation (NMT) tutorial. Data preprocessing, model training, evaluation, and deployment.
NMT - An educational tool to train, inspect, evaluate and translate using neural engines
FasterNMT - NMT incl. data preprocessing, model training, evaluation, and deployment with great performance.
DeepLearningForNLPInPytorch - an IPython Notebook tutorial on deep learning for natural language processing, including structure prediction.
AllenNLP - An open-source NLP research library, built on PyTorch.
Natural Language Processing Tutorial for Deep Learning Researchers
Oxford Deep NLP 2017 course
awesome-nlp - A curated list of resources dedicated to Natural Language Processing (NLP)
German-NLP Datasets
Scrapy - a fast high-level web crawling & scraping framework for Python.
Resources for redacting personally identifiable information - resources for programmatically redacting personally identifiable information
Simple ML baselines - Jupyter Notebooks - simple ML baselines
Huggingface - Transformer - Huggingface - Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX; various tasks
Named Entity Recognition with Electra - Named Entity Recognition with Electra, Huggingface
Text Generation with GPT-2 - Text Generation with GPT-2, Huggingface
Natural Language Inference with RoBERTa - Natural Language Inference with RoBERTa, Huggingface
Summarization with BART - Text Summarization with BART, Huggingface
Data processing pipelines - data processing pipelines from Huggingface
Tokenizers from Huggingface - Fast State-of-the-Art Tokenizers optimized for Research and Production
text-generation-inference from Huggingface - Large Language Model Text Generation Inference. Text Generation Inference (TGI) is a toolkit for deploying and serving Large Language Models (LLMs). TGI enables high-performance text generation for the most popular open-source LLMs, including Llama, Falcon, StarCoder, BLOOM, GPT-NeoX, and more.
Open-source AI cookbook - Huggingface - Open-source AI cookbook (Fine_tuning_Code_LLM_on_single_GPU.ipynb, etc.)
Styleformer - A neural language style transfer framework to transfer text smoothly between styles.
conv-emotion - Implementation of different architectures for emotion recognition in conversations.

Evaluation Tasks

Evaluate from Huggingface - Evaluate is a library that makes evaluating and comparing models and reporting their performance easier and more standardized. Implementations of dozens of popular metrics: the existing metrics cover a variety of tasks spanning from NLP to Computer Vision
NMT Evaluation framework - A useful framework to evaluate and compare different Machine Translation engines between each other on variety datasets.
FastChat - LLM chatbots evaluation platform - FastChat is an open platform for training, serving, and evaluating large language model based chatbots.
ParlAI - a framework for training and evaluating AI models on a variety of openly available dialogue datasets.
AutoGluon - if you prefer more control over the forecasting model exploration, training, and evaluation processes.
tune from Huggingface - A benchmark for comparing Transformer-based models.

Image / Video Technologies

Activity detection - Real-Time Spatio-Temporally Localized Activity Detection by Tracking Body Keypoints
Dance transfer - acquire pose estimates from a participant, train a pix2pix model, transfer source dance video, and generate a dance gif; Motion transfer booth for a 1 hour everybody dance now video generation using EdgeTPU and Tensorflow 2.0
Video embeddings and similarity - Training CNN model to generate image embeddings
Deep Fakes Detection - (2019) Repository to detect deepfakes, an opensource project as part of AI Geeks effort.
Diffusers from Huggingface - Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch

Voice Technologies

Speech Cognitive Service - A Jupyter Notebook that details how to use Azure's Speech Cognitive Service to Translate speech
Audio-Speech Tutorial, 2022 - an introduction on the topic of audio and speech processing - from basics to applications (approx. 2022)
espnet - End-to-End Speech Processing Toolkit
TTS - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Speech-to-text benchmark - speech-to-text benchmarking framework
Speech-to-text - with Whisper and Python, March 2023
Multilingual Text-to-Speech - Tomáš Nekvinda and Ondřej Dušek, One Model, Many Languages: Meta-Learning for Multilingual Text-to-Speech, 2020, Proc. Interspeech 2020
Unified Speech Tokenizer for Speech Language Models - SpeechTokenizer; SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models, Xin Zhang and Dong Zhang and Shimin Li and Yaqian Zhou and Xipeng Qiu, 2023
FunASR - a Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models; hopes to build a bridge between academic research and industrial applications on speech recognition. By supporting the training & finetuning of the industrial-grade speech recognition model, researchers and developers can conduct research and production of speech recognition models more conveniently, and promote the development of speech recognition ecology
Whisper model - OpenAI Whisper
Wenet - Production First and Production Ready End-to-End Speech Recognition Toolkit
Distilled variant of Whisper - Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
Fine-tune Whisper - Fine-Tune Whisper For Multilingual ASR with Transformers

Different ML Resources

Applied ML - (not really up-to-date, but good) Papers & tech blogs by companies sharing their work on data science & machine learning in production.
500 AI projects - 500 AI Machine learning, Deep learning, Computer vision, NLP Projects with code
Parameter-Efficient Fine-Tuning from Huggingface - PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Huggingface notebooks for various(!) tasks - Notebooks using the Hugging Face libraries
Huggingface educational resources - Educational materials
Huggingface: notifications - Knock Knock: Get notified when your training ends with only two additional lines of code
Huggingface: No-code training and deployment of state-of-the-art machine learning models - AutoTrain is a no-code tool for training state-of-the-art models for Natural Language Processing (NLP), Computer Vision (CV), Speech, and even Tabular tasks.
Huggingface: Transformer Tutorials - transformers-tutorials (by @nielsrogge) - Tutorials for applying multiple models on real-world datasets.

Industrial research

OpenAI

OpenAI - simple app - My note: a model used and several functions are already deprecated; my version above has things updated.
Retrieval-Augmented Generation in Azure using Azure AI search - A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
A collection of custom OpenAI WebApps
Real time speech2text - Build real time speech2text web apps using OpenAI's Whisper
OpenAI cookbook
OpenAI WhatsApp Chatbot
GPT-engineer - Specify what you want it to build, the AI asks for clarification, and then builds it.
Prompt-engineering Guide
PDF search app with OpenAI - an AI-app that allows you to upload a PDF and ask questions about it. It uses OpenAI's LLMs to generate a response.
OpenAI Code Automation - Fully coded Apps by GPT-4 and ChatGPT. Power of AI coding automation and new way of developing.
Semantic Search - Tutorial and template for a semantic search app powered by the Atlas Embedding Database, Langchain, OpenAI and FastAPI

Microsoft

OptiGuide - Large Language Models for Supply Chain Optimization
Generative AI lessons - 12 Lessons, Get Started Building with Generative AI
LLMOps Workshop - Learn how to build solutions with Large Language Models.
Data Science Lessons
AI Lessons
unilm - Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities.
Old Photo Restoration via Deep Latent Space Translation - Bringing Old Photo Back to Life (CVPR 2020 oral)
NNI - An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.

Meta Research

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
Seamless: Speech-to-speech translation (S2ST), Speech-to-text translation (S2TT), Text-to-speech translation (T2ST), Text-to-text translation (T2TT), Automatic speech recognition (ASR)
Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling and other text generation tasks.
Faiss is a library for efficient similarity search and clustering of dense vectors.
PyTorch-BigGraph (PBG) is a distributed system for learning graph embeddings for large graphs, particularly big web interaction graphs with up to billions of entities and trillions of edges.
Llama 2 Fine-tuning - examples to quickly get started with fine-tuning for domain adaptation and how to run inference for the fine-tuned models. For ease of use, the examples use Hugging Face converted versions of the models.
Pearl - A Production-ready Reinforcement Learning AI Agent Library
TorchRecipes - Recipes are a standard, well supported set of blueprints for machine learning engineers to rapidly train models using the latest research techniques without significant engineering overhead.
fastText is a library for efficient learning of word representations and sentence classification.
ParlAI - a framework for training and evaluating AI models on a variety of openly available dialogue datasets.

AWS samples

Image Generator with Stable Diffusion on Amazon Bedrock using Streamlit - A quick demonstration of deploying a Stable Diffusion web application with containers running on Amazon ECS. The model is provided by Amazon Bedrock in this example.
Transactional Data Lake using Apache Iceberg with AWS Glue Streaming and MSK Connect (Debezium) - Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming using Amazon MSK Serverless and MSK Connect (Debezium)
MLOps using Amazon SageMaker and GitHub Actions - MLOps example using Amazon SageMaker Pipeline and GitHub Actions
Near-Real Time Usage Anomaly Detection using OpenSearch - Detect AWS usage anomalies in near-real time using OpenSearch Anomaly Detection and CloudTrail for improved cost management and security
Amazon DocumentDB (with MongoDB compatibility) samples - Code samples that demonstrate how to use Amazon DocumentDB
Marketing Content Generator - CDK Deployment for a sample marketing portal using generative AI for content generation and distribution; Marketing Content Generation and Distribution powered by Generative AI
Amazon SageMaker and AWS Trainium Examples - Text classification using Transformers, Pretrain BERT using Wiki Data, Pretrain/Fine tune Llama using Wiki Data.
AWS SageMaker Local Mode - Amazon SageMaker Local Mode Examples
End-to-end AIoT w/ SageMaker and Greengrass 2.0 on NVIDIA Jetson Nano - Hands-on lab from ML model training to model compilation to edge device model deployment on the AWS Cloud. It covers the detailed method of compiling SageMaker Neo for the target device, including cloud instance and edge device, and how to write and deploy Greengrass-v2 components from scratch.
InsuranceLake ETL with CDK Pipeline - This solution helps you deploy ETL processes and data storage resources to create an Insurance Lake using Amazon S3 buckets for storage, AWS Glue for data transformation, and AWS CDK Pipelines. It is originally based on the AWS blog Deploy data lake ETL jobs using CDK Pipelines, and complements the InsuranceLake Infrastructure project.
Amazon Forecast - for a low-code/no-code fully managed time series AI/ML forecasting service.
AutoGluon - if you prefer more control over the forecasting model exploration, training, and evaluation processes.
Retrieval Augmented Generation with Streaming LLM - leverage LLMs for RAG(Retrieval Augmented Generation).
Build generative AI agents with Amazon Bedrock, Amazon DynamoDB, Amazon Kendra, Amazon Lex, and LangChain

NVIDIA

Deep Learning Examples - State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
NeMo: a toolkit for conversational AI
