zhuhaozhe / pytorch

This project forked from pytorch/pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Home Page: https://pytorch.org
License: Other
Naming options for the frontend flags (contrasted in the sketch below):

- Option 1: `set_float32_precision` + a level (`highest`, `high`, `medium`). Con: a level does not say whether it means `tf32` or `bf16`; there is no direct association between the dtypes and `highest`/`high`/`medium`.
- Option 2: `allow_` + data type, e.g., `allow_tf32`. Pro: the name maps directly to a dtype, and it is easier to extend with newer dtypes.
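To make the contrast concrete, here is an illustrative sketch; the names below are hypothetical stand-ins, not the final API:

```python
# Option 1 style: a single level-based setter; the mapping from a level
# to a dtype (TF32? BF16?) is not visible in the name.
def set_float32_precision(level: str) -> None:
    assert level in ("highest", "high", "medium")

# Option 2 style: one boolean switch per dtype; each name says exactly
# which dtype is allowed, and a new dtype just means a new flag.
allow_tf32 = True
allow_bf16 = False
```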
Options for structuring the flags:

- Option 1: only one flag, `set_fp32_precision`, under `torch.*` for all operators (conv, matmul, lstm) and all backends (CUDA, CUDNN, MKLDNN). Do not allow backend-specific flags.
- Option 2: a layering structure. All flags have an `Optional` type (a bool, or an enum value from `highest`, `high`, `medium`) such that whenever a level is not specified (`None`), it falls back to its parent's value (see the fallback sketch below).
- Option 3: three flags, `set_fp32_conv/rnn/matmul_precision`, under `torch.*` for all backends. Allow backends to have backend-specific flags, and setting a backend-specific flag also changes the backend-irrelevant flag.
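A minimal sketch of the Option 2 fallback semantics; `effective_precision` is a hypothetical helper, not PyTorch code:

```python
from typing import Optional

# Option 2: a backend-specific value of None falls back to the parent
# torch.* value, so unspecified layers inherit from the layer above.
def effective_precision(backend_value: Optional[str], parent_value: str) -> str:
    return parent_value if backend_value is None else backend_value

assert effective_precision(None, "high") == "high"        # inherit parent
assert effective_precision("medium", "high") == "medium"  # explicit wins
```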
(With coupled flags, when TF32 is disabled, low precision for mkldnn matmul is also disabled.)

This RFC proposes to use BFloat16 for GEMM/CONV/RNN internal computations on the CPU device, with a user-controlled frontend API. Currently, we have `torch.set_float32_matmul_precision`, which allows float32 matrix multiplications to run at lower precision.
To allow CONV/RNN to also have a configurable internal computation data type for float32, and to integrate mkldnn BF16 as an internal computation data type for GEMM/CONV/RNN on the CPU device, we propose the high-level code changes below.

New frontend APIs:

- `torch.set_float32_conv_precision`, `torch.get_float32_conv_precision`
- `torch.set_float32_rnn_precision`, `torch.get_float32_rnn_precision`

These frontend APIs should behave the same as `torch.set_float32_matmul_precision` and `torch.get_float32_matmul_precision`. Users can set the precision to `highest`, `high`, or `medium`. When the precision is `high`, the CUDA/CUDNN backend is allowed to use TF32 as the internal computation data type. When the precision is `medium`, the MKLDNN backend is allowed to use BF16 as the internal computation data type.
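A usage sketch of the proposed APIs; note that only the matmul setter exists in PyTorch today, the conv/rnn setters are the proposal:

```python
import torch

# Existing API: allow lower-precision internal compute for fp32 matmul.
torch.set_float32_matmul_precision("medium")

# Proposed APIs (not yet available), mirroring the matmul behavior:
torch.set_float32_conv_precision("medium")  # MKLDNN conv may use BF16
torch.set_float32_rnn_precision("high")     # CUDNN RNN may use TF32
print(torch.get_float32_conv_precision())   # "medium"
```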
For matmul: currently we only dispatch `at::matmul` to `mkldnn_matmul` when the input tensors are BFloat16. We propose to further dispatch `at::matmul` to `mkldnn_matmul` when the input tensors are Float32 and `float32_matmul_precision` is `medium`. We will then use BF16 as the internal computation data type; a PR has already been created.
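For example, with the existing flag, the proposed dispatch would trigger on a plain fp32 matmul (a usage sketch; the routing to `mkldnn_matmul` would happen inside ATen):

```python
import torch

torch.set_float32_matmul_precision("medium")
a = torch.randn(256, 256)  # float32 inputs
b = torch.randn(256, 256)
# Under this proposal, at::matmul routes to mkldnn_matmul here and
# computes internally in BF16 on CPU; numerics may differ slightly
# from the "highest" setting.
c = a @ b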
For Conv: we will check `float32_conv_precision` in `mkldnn_conv` and use BF16 as the internal computation data type.

For RNN: we will check `float32_rnn_precision` in `mkldnn_rnn_layer` and use BF16 as the internal computation data type.

For Linear: we will dispatch `addmm`/`mm` to `mkldnn_linear` when `float32_matmul_precision` is `medium`. (The common pattern across these cases is sketched below.)
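The common check can be sketched as follows; this is Python pseudocode for the ATen-side rule, and `mkldnn_bf16_allowed` is a hypothetical helper:

```python
import torch

# Hypothetical helper mirroring the proposed kernel-side rule: use the
# MKLDNN BF16 path only for fp32 inputs whose op-specific precision
# flag is set to "medium".
def mkldnn_bf16_allowed(inputs_dtype: torch.dtype, precision: str) -> bool:
    return inputs_dtype == torch.float32 and precision == "medium"

x = torch.randn(8, 8)
# matmul uses the existing flag; conv/rnn would use the proposed ones.
print(mkldnn_bf16_allowed(x.dtype, torch.get_float32_matmul_precision()))
```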
The new BF16 TMUL instruction set on Intel Xeon server products can improve user application performance. With these frontend APIs, users can control the internal computation data types for GEMM/CONV/RNN even when the model's data type is Float32. This differs from the Autocast feature: only GEMM/CONV/RNN can have BF16 internal computation data types, while under Autocast more ops might be computed at the BF16 level.

We will provide `float32_conv_precision` and `float32_rnn_precision`, and enable the bfloat16 data type for internal computations in the MKLDNN backend when the precision is set to `medium`.
Frontend API:

- `get/set_float32_conv/rnn_precision`, like `float32_matmul_precision`.
- `allow_bf32` in the `mkldnn` backend, like `allow_tf32` in the `cudnn` backend.

Inductor linear packing rules:

- Pack linear into `mkldnn_linear` when the precision is `medium`.
- Packing into `mkldnn_linear` will introduce more fusion opportunities.

This RFC proposes the addition of a user-controlled frontend API to configure the internal precision of `float32` operations in convolutional (CONV) and recurrent neural network (RNN) operations within PyTorch. Currently, PyTorch offers `torch.set_float32_matmul_precision` to configure the internal precision of `float32` matrix multiplication. This RFC suggests extending this functionality to convolution and recurrent neural network operations, providing `torch.set_float32_conv_precision` and `torch.set_float32_rnn_precision`. The proposed APIs will mimic the behavior of `torch.set_float32_matmul_precision`.
Frontend changes involve introducing new APIs:

- `torch.set_float32_conv_precision`, `torch.get_float32_conv_precision`
- `torch.set_float32_rnn_precision`, `torch.get_float32_rnn_precision`

These APIs will function similarly to `torch.set_float32_matmul_precision` and `torch.get_float32_matmul_precision`. Users can set the precision to `highest`, `high`, or `medium`, each with corresponding backend behavior:
- `highest`: use the highest available precision, avoiding lower precision.
- `high`: allow backends to use TensorFloat32 (TF32) or to treat each `float32` number as the sum of two `bfloat16` numbers.
- `medium`: allow backends to use BFloat16 (BF16).

Global flags `float32_conv/rnn_precision` will be introduced in the PyTorch repository. These flags can be accessed and modified by the frontend APIs `torch.get/set_float32_conv/rnn_precision`. Backend-related operators will read the flags to control their internal computation data types. For example:
- CUDA/CuDNN backend: we should check `float32_conv_precision` in the CuDNN Conv kernel and `float32_rnn_precision` in the CuDNN RNN kernel. If not set to `highest`, the internal computation data type will be TF32.
- MKLDNN backend: we should check `float32_conv_precision` in the OneDNN Conv kernel and `float32_rnn_precision` in the OneDNN RNN kernel. If set to `medium`, the internal computation data type will be BF16. (Both mappings are sketched below.)
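A hedged sketch of this mapping; the real checks live in the C++ kernels, and `internal_dtype` is a hypothetical helper:

```python
# Hypothetical mapping from the precision flag to each backend's
# internal compute type, as described above: CuDNN uses TF32 unless the
# flag is "highest"; OneDNN uses BF16 only when the flag is "medium".
def internal_dtype(backend: str, precision: str) -> str:
    if backend == "cudnn":
        return "fp32" if precision == "highest" else "tf32"
    if backend == "onednn":
        return "bf16" if precision == "medium" else "fp32"
    return "fp32"

assert internal_dtype("cudnn", "high") == "tf32"
assert internal_dtype("onednn", "high") == "fp32"
```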
The existing CUDNN backend-specific flag `torch.backends.cudnn.allow_tf32` will interact with the proposed backend-irrelevant flags `torch.set_float32_conv/rnn_precision`. These flags override each other (we follow the same behavior as between `torch.backends.cuda.matmul.allow_tf32` and `float32_matmul_precision`):

Setting `torch.backends.cudnn.allow_tf32 = True` will set `float32_rnn/conv_precision` to `high` (TF32 enabled), and setting it to `False` will set them to `highest` (TF32 disabled):

```python
torch.backends.cudnn.allow_tf32 = True
print("float32_conv_precision", torch.get_float32_conv_precision())
print("float32_rnn_precision", torch.get_float32_rnn_precision())
# output:
# float32_conv_precision high
# float32_rnn_precision high

torch.backends.cudnn.allow_tf32 = False
print("float32_conv_precision", torch.get_float32_conv_precision())
print("float32_rnn_precision", torch.get_float32_rnn_precision())
# output:
# float32_conv_precision highest
# float32_rnn_precision highest
```
Setting `float32_rnn/conv_precision` to `high` or `medium` will enable `torch.backends.cudnn.allow_tf32`, while setting either of them to `highest` will disable it:

```python
torch.backends.cudnn.allow_tf32 = True
torch.set_float32_conv_precision("highest")
print("torch.backends.cudnn.allow_tf32", torch.backends.cudnn.allow_tf32)
# output:
# torch.backends.cudnn.allow_tf32 False

torch.set_float32_rnn_precision("highest")
print("torch.backends.cudnn.allow_tf32", torch.backends.cudnn.allow_tf32)
# output:
# torch.backends.cudnn.allow_tf32 False

torch.set_float32_conv_precision("high")
torch.set_float32_rnn_precision("high")
print("torch.backends.cudnn.allow_tf32", torch.backends.cudnn.allow_tf32)
# output:
# torch.backends.cudnn.allow_tf32 True
```
We discussed how the existing CuDNN flag, `torch.backends.cudnn.allow_tf32`, interacts with `torch.set_float32_conv/rnn_precision`. However, we believe it is cleaner to use separate flags in CuDNN. We suggest deprecating `torch.backends.cudnn.allow_tf32` in favor of `torch.backends.cudnn.conv.allow_tf32` and `torch.backends.cudnn.rnn.allow_tf32`. Then, the CuDNN backend-specific flags and the backend-irrelevant flags can have a one-to-one correspondence, such as `torch.backends.cuda.matmul.allow_tf32` and `torch.float32_matmul_precision`:

```
torch.backends.cudnn.conv.allow_tf32  <-> torch.float32_conv_precision
torch.backends.cudnn.rnn.allow_tf32   <-> torch.float32_rnn_precision
# the flags below already exist today
torch.backends.cuda.matmul.allow_tf32 <-> torch.float32_matmul_precision
```
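As an illustration of the intended one-to-one correspondence, here is a hedged sketch of the two-way synchronization; this is a hypothetical class, not PyTorch internals:

```python
# Hypothetical two-way sync between a backend-specific boolean and its
# backend-irrelevant precision counterpart, mirroring the table above.
class ConvPrecisionFlags:
    def __init__(self) -> None:
        self.float32_conv_precision = "highest"

    @property
    def cudnn_conv_allow_tf32(self) -> bool:
        # TF32 is enabled whenever the precision is below "highest".
        return self.float32_conv_precision != "highest"

    @cudnn_conv_allow_tf32.setter
    def cudnn_conv_allow_tf32(self, allow: bool) -> None:
        self.float32_conv_precision = "high" if allow else "highest"

flags = ConvPrecisionFlags()
flags.cudnn_conv_allow_tf32 = True
assert flags.float32_conv_precision == "high"
```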
Lower-precision computation from different backends can significantly improve performance for deep learning workloads with minimal impact on precision, for example TF32 from CUDA/CUDNN or the implicit reduced-precision arithmetic feature from oneDNN. By providing a user-controlled frontend API, users can easily configure the internal computation data type of convolutional and recurrent neural networks without knowing the details of the different backends. This allows them to leverage the performance benefits of lower precision while keeping the precision loss acceptable. Compared to Autocast, the proposed flags offer finer-grained control: only GEMM/CONV/RNN can use BF16 internal computation data types, whereas Autocast may compute more ops in BF16.

Introduce `float32_conv/rnn_precision` and enable users to control the internal data type of convolutional and recurrent neural networks by configuring the value of `float32_conv/rnn_precision`.