yiakwy-xpu-ml-framework-team's Projects
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
AMD's Machine Intelligence Library
Dockerfiles for the various software layers defined in the ROCm software platform
AMD ROCm™ Software - GitHub Home
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
A library for efficient similarity search and clustering of dense vectors.
Fast Segment Anything
A distributed deep learning framework.
Gaussian Belief Propagation for bundle adjustment and pose graph estimation.
Poplar implementation of "Bundle Adjustment on a Graph Processor" (CVPR 2020)
Model parallel transformers in JAX and Haiku
GroqFlow provides an automated tool flow for compiling machine learning and linear algebra workloads into Groq programs and executing those programs on GroqChip™ processors.
Useful tutorials and recipes for developers doing low-level work with the Graphcore IPU
Best practices for HPC with an IPU backend for scientific and AI (deep learning framework) algorithm and software development
DOOM (1993) on IPU 👿
IPU programming in Julia
Experimental JAX for Graphcore IPUs
NVIDIA NCCL Tests for Distributed Training
libavif - Library for encoding and decoding .avif files
Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers
Python bindings for llama.cpp
Port of Facebook's LLaMA model in C/C++
Fork of LLVM Project containing a Colossus IPU backend implementation
A framework for few-shot evaluation of autoregressive language models.
Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"
Ongoing research training transformer models at scale