TransformerEngine is a library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
Although it is new to PyPI (May 2023), it has existed as an open-source project on GitHub since 28 September 2022.
How large is each release?
One wheel each for aarch64 and x86_64, roughly 350 MB per wheel, for a total of about 700 MB.