shayebuhui01
Type: User
A list of awesome compiler projects and papers for tensor computation and deep learning.
8-bit CUDA functions for PyTorch
calflops is designed to calculate FLOPs, MACs, and parameters in a wide range of neural networks, such as Linear, CNN, RNN, GCN, and Transformer models (BERT, LLaMA, and other large language models).
Making large AI models cheaper, faster and more accessible
A list of tutorials, papers, talks, and open-source projects for emerging compilers and architectures
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
Blog post: https://zhuanlan.zhihu.com/p/518857175
Blog post: https://zhuanlan.zhihu.com/p/555339335
A simple high performance CUDA GEMM implementation.
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions. Blog post: https://zhuanlan.zhihu.com/p/639297098
CUDA Templates for Linear Algebra Subroutines
Deep learning for image processing, including classification, object detection, etc.
An introduction to the core principles of deep learning systems.
Fast and memory-efficient exact attention
Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
FLOPs counter for convolutional networks in the PyTorch framework
Dissecting NVIDIA GPU Architecture
How to learn PyTorch and OneFlow
How to optimize some algorithms in CUDA.
Row-major matmul optimization
A series of GPU optimization topics that explains in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is basically at or near the theoretical limit.
Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation).
An easy-to-understand TensorOp matmul tutorial. Blog post: https://zhuanlan.zhihu.com/p/631227862
Ongoing research training transformer language models at scale, including: BERT & GPT-2
C++ Simulator for Neural Processing Unit