Marko Kabić's Projects
My accepted submissions to ACM ICPC training and related sites
Auto parallelization for large-scale neural networks
A PyTorch extension: tools for easy mixed precision and distributed training in PyTorch
Prototype for the Arbor multi-compartment neural network simulator
A PyTorch implementation of the Transformer model in "Attention is All You Need".
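The scaled dot-product attention at the heart of the Transformer fits in a few lines. The following is a minimal NumPy illustration of the formula from the paper, not code from the repository:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # (n_q, n_k) similarity logits
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ V                             # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, 4, 8))           # 4 queries/keys/values, dim 8
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)                                   # (4, 8)
```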
Communication Avoiding Numerical Dense Matrix Computations
Communication-Avoiding Parallel Strassen
Colossal-AI: A Unified Deep Learning System for Big Model Era
Distributed Communication-Optimal LU-factorization Algorithm
Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm
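The communication cost of distributed matrix multiplication is governed by how the iteration space is tiled across ranks. As a single-process analogy (a sketch of the tiling idea, not the distributed algorithm itself), a blocked multiply makes that tile structure explicit:

```python
import numpy as np

def blocked_matmul(A, B, tile=2):
    """Compute C = A @ B tile by tile. In the distributed setting each
    tile-product corresponds to one rank's local work, and the tiling
    pattern determines how much of A and B each rank must receive."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m))
    for i in range(0, n, tile):            # rows of C
        for j in range(0, m, tile):        # columns of C
            for p in range(0, k, tile):    # reduction dimension
                C[i:i+tile, j:j+tile] += (
                    A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
                )
    return C
```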
Distributed Communication-Optimal Shuffle and Transpose Algorithm
Quantum chemistry and solid-state physics software package
Cyclops Tensor Framework: parallel arithmetic on multidimensional arrays
cuDF - GPU DataFrame Library
Cylon is a fast, scalable, distributed-memory parallel runtime with a Pandas-like DataFrame
Interface for Distributed Linear Algebra
Transformer related optimization, including BERT, GPT
FBGEMM (Facebook General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
Fast and memory-efficient exact attention
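FlashAttention's key idea is that exact softmax attention can be computed over key/value blocks with a running (online) softmax, so the full n×n score matrix is never materialized. Below is a NumPy sketch of that recurrence; the actual library implements it as fused CUDA kernels:

```python
import numpy as np

def tiled_attention(Q, K, V, block=2):
    """Exact attention via blockwise online softmax: process K/V in
    chunks, keeping only a running max and normalizer per query row."""
    n, d = Q.shape
    out = np.zeros((n, d))
    m = np.full(n, -np.inf)   # running row-wise max of the logits
    l = np.zeros(n)           # running softmax normalizer
    for s in range(0, K.shape[0], block):
        Kb, Vb = K[s:s+block], V[s:s+block]
        scores = Q @ Kb.T / np.sqrt(d)              # (n, block) logits
        m_new = np.maximum(m, scores.max(axis=1))
        scale = np.exp(m - m_new)                   # rescale old accumulators
        p = np.exp(scores - m_new[:, None])
        l = l * scale + p.sum(axis=1)
        out = out * scale[:, None] + p @ Vb
        m = m_new
    return out / l[:, None]
```

Because each block's contribution is rescaled when a larger logit appears, the result matches the naive softmax attention exactly (up to floating-point rounding) while using O(n·block) working memory for the scores.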
Flax is a neural network library for JAX that is designed for flexibility.
Code for "Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads", which appeared at OSDI 2020
Google Research
A library for transforming between two arbitrary grid-like matrix data layouts over MPI ranks
An extension to Googletest with MPI
Function interposition for Linux and Mac OS