Yueh-Po Peng's Projects
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences.
Contrastive Language-Audio Pretraining
Lab 2 project - Computer Networking Laboratory course at NTU CSIE.
a state-of-the-art-level open visual language model | multimodal pretrained model
Official implementation of compound word transformer (AAAI'21)
Computational Methods for Data Science
Training General-Purpose Audio Tagging Networks with Noisy Labels and Iterative Self-Verification
Code for the paper Hybrid Spectrogram and Waveform Source Separation
Digital Image Processing @NTU
Simple demonstration of utilising OpenCV.js for face recognition
final-project-b06902136 created by GitHub Classroom
Tensor library for machine learning
GRiT: A Generative Region-to-text Transformer for Object Understanding (https://arxiv.org/abs/2212.00280)
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
FinTech Homework 2
[A toolbox for fun.] Transform Image into Unique Paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, ControlNet.
LAVIS - A One-stop Library for Language-Vision Intelligence
Collection of LeetCode questions to ace the coding interview! - Created using [LeetHub](https://github.com/QasimWani/LeetHub)
Python bindings for llama.cpp
Port of Facebook's LLaMA model in C/C++
"Wheel click" with three-finger click/tap for Trackpad and Magic Mouse.
Midi event transformer for music generation
Code base for MinD-Vis
MU-LLaMA: Music Understanding Large Language Model