zhuhaozhe / pytorch

This project forked from pytorch/pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Home Page: https://pytorch.org
License: Other
Naming options for the frontend flags (contrasted in the sketch below):

- Option 1: `set_float32_precision` + a level (`highest`, `high`, `medium`). Con: a level does not say whether it means `tf32` or `bf16`; there is no direct association between the dtypes and `highest`/`high`/`medium`.
- Option 2: `allow_` + data type, e.g., `allow_tf32`. Pro: the name maps directly to a dtype, and it is easier to extend with newer dtypes.
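To make the contrast concrete, here is an illustrative sketch; the names below are hypothetical stand-ins, not the final API:

```python
# Option 1 style: a single level-based setter; the mapping from a level
# to a dtype (TF32? BF16?) is not visible in the name.
def set_float32_precision(level: str) -> None:
    assert level in ("highest", "high", "medium")

# Option 2 style: one boolean switch per dtype; each name says exactly
# which dtype is allowed, and a new dtype just means a new flag.
allow_tf32 = True
allow_bf16 = False
```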
Options for structuring the flags:

- Option 1: only one flag, `set_fp32_precision`, under `torch.*` for all operators (conv, matmul, lstm) and all backends (CUDA, CUDNN, MKLDNN). Do not allow backend-specific flags.
- Option 2: a layering structure. All flags have an `Optional` type (a bool, or an enum value from `highest`, `high`, `medium`) such that whenever a level is not specified (`None`), it falls back to its parent's value (see the fallback sketch below).
- Option 3: three flags, `set_fp32_conv/rnn/matmul_precision`, under `torch.*` for all backends. Allow backends to have backend-specific flags, and setting a backend-specific flag also changes the backend-irrelevant flag.
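A minimal sketch of the Option 2 fallback semantics; `effective_precision` is a hypothetical helper, not PyTorch code:

```python
from typing import Optional

# Option 2: a backend-specific value of None falls back to the parent
# torch.* value, so unspecified layers inherit from the layer above.
def effective_precision(backend_value: Optional[str], parent_value: str) -> str:
    return parent_value if backend_value is None else backend_value

assert effective_precision(None, "high") == "high"        # inherit parent
assert effective_precision("medium", "high") == "medium"  # explicit wins
```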
(With coupled flags, when TF32 is disabled, low precision for mkldnn matmul is also disabled.)

This RFC proposes to use BFloat16 for GEMM/CONV/RNN internal computations on the CPU device, with a user-controlled frontend API. Currently, we have `torch.set_float32_matmul_precision`, which allows float32 matrix multiplications to run at lower precision.
To allow CONV/RNN to also have a configurable internal computation data type for float32, and to integrate mkldnn BF16 as an internal computation data type for GEMM/CONV/RNN on the CPU device, we propose the high-level code changes below.

New frontend APIs:

- `torch.set_float32_conv_precision`, `torch.get_float32_conv_precision`
- `torch.set_float32_rnn_precision`, `torch.get_float32_rnn_precision`

These frontend APIs should behave the same as `torch.set_float32_matmul_precision` and `torch.get_float32_matmul_precision`. Users can set the precision to `highest`, `high`, or `medium`. When the precision is `high`, the CUDA/CUDNN backend is allowed to use TF32 as the internal computation data type. When the precision is `medium`, the MKLDNN backend is allowed to use BF16 as the internal computation data type.
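A usage sketch of the proposed APIs; note that only the matmul setter exists in PyTorch today, the conv/rnn setters are the proposal:

```python
import torch

# Existing API: allow lower-precision internal compute for fp32 matmul.
torch.set_float32_matmul_precision("medium")

# Proposed APIs (not yet available), mirroring the matmul behavior:
torch.set_float32_conv_precision("medium")  # MKLDNN conv may use BF16
torch.set_float32_rnn_precision("high")     # CUDNN RNN may use TF32
print(torch.get_float32_conv_precision())   # "medium"
```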
For matmul: currently we only dispatch `at::matmul` to `mkldnn_matmul` when the input tensors are BFloat16. We propose to further dispatch `at::matmul` to `mkldnn_matmul` when the input tensors are Float32 and `float32_matmul_precision` is `medium`. We will then use BF16 as the internal computation data type; a PR has already been created.
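For example, with the existing flag, the proposed dispatch would trigger on a plain fp32 matmul (a usage sketch; the routing to `mkldnn_matmul` would happen inside ATen):

```python
import torch

torch.set_float32_matmul_precision("medium")
a = torch.randn(256, 256)  # float32 inputs
b = torch.randn(256, 256)
# Under this proposal, at::matmul routes to mkldnn_matmul here and
# computes internally in BF16 on CPU; numerics may differ slightly
# from the "highest" setting.
c = a @ b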
For Conv: we will check `float32_conv_precision` in `mkldnn_conv` and use BF16 as the internal computation data type.

For RNN: we will check `float32_rnn_precision` in `mkldnn_rnn_layer` and use BF16 as the internal computation data type.

For Linear: we will dispatch `addmm`/`mm` to `mkldnn_linear` when `float32_matmul_precision` is `medium`. (The common pattern across these cases is sketched below.)
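The common check can be sketched as follows; this is Python pseudocode for the ATen-side rule, and `mkldnn_bf16_allowed` is a hypothetical helper:

```python
import torch

# Hypothetical helper mirroring the proposed kernel-side rule: use the
# MKLDNN BF16 path only for fp32 inputs whose op-specific precision
# flag is set to "medium".
def mkldnn_bf16_allowed(inputs_dtype: torch.dtype, precision: str) -> bool:
    return inputs_dtype == torch.float32 and precision == "medium"

x = torch.randn(8, 8)
# matmul uses the existing flag; conv/rnn would use the proposed ones.
print(mkldnn_bf16_allowed(x.dtype, torch.get_float32_matmul_precision()))
```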
The new BF16 TMUL instruction set on Intel Xeon server products can improve user application performance. With these frontend APIs, users can control the internal computation data types for GEMM/CONV/RNN even when the model's data type is Float32. This differs from the Autocast feature: only GEMM/CONV/RNN can have BF16 internal computation data types, while under Autocast more ops might be computed at the BF16 level.

We will provide `float32_conv_precision` and `float32_rnn_precision`, and enable the bfloat16 data type for internal computations in the MKLDNN backend when the precision is set to `medium`.
Frontend API:

- `get/set_float32_conv/rnn_precision`, like `float32_matmul_precision`.
- `allow_bf32` in the `mkldnn` backend, like `allow_tf32` in the `cudnn` backend.

Inductor linear packing rules:

- Pack linear into `mkldnn_linear` when the precision is `medium`.
- Packing into `mkldnn_linear` will introduce more fusion opportunities.

This RFC proposes the addition of a user-controlled frontend API to configure the internal precision of `float32` operations in convolutional (CONV) and recurrent neural network (RNN) operations within PyTorch. Currently, PyTorch offers `torch.set_float32_matmul_precision` to configure the internal precision of `float32` matrix multiplication. This RFC suggests extending this functionality to convolution and recurrent neural network operations, providing `torch.set_float32_conv_precision` and `torch.set_float32_rnn_precision`. The proposed APIs will mimic the behavior of `torch.set_float32_matmul_precision`.
Frontend changes involve introducing new APIs:

- `torch.set_float32_conv_precision`, `torch.get_float32_conv_precision`
- `torch.set_float32_rnn_precision`, `torch.get_float32_rnn_precision`

These APIs will function similarly to `torch.set_float32_matmul_precision` and `torch.get_float32_matmul_precision`. Users can set the precision to `highest`, `high`, or `medium`, each with corresponding backend behavior:
- `highest`: use the highest available precision, avoiding lower precision.
- `high`: allow backends to use TensorFloat32 (TF32) or to treat each `float32` number as the sum of two `bfloat16` numbers.
- `medium`: allow backends to use BFloat16 (BF16).

Global flags `float32_conv/rnn_precision` will be introduced in the PyTorch repository. These flags can be accessed and modified by the frontend APIs `torch.get/set_float32_conv/rnn_precision`. Backend-related operators will read the flags to control their internal computation data types. For example:
- CUDA/CuDNN backend: we should check `float32_conv_precision` in the CuDNN Conv kernel and `float32_rnn_precision` in the CuDNN RNN kernel. If not set to `highest`, the internal computation data type will be TF32.
- MKLDNN backend: we should check `float32_conv_precision` in the OneDNN Conv kernel and `float32_rnn_precision` in the OneDNN RNN kernel. If set to `medium`, the internal computation data type will be BF16. (Both mappings are sketched below.)
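A hedged sketch of this mapping; the real checks live in the C++ kernels, and `internal_dtype` is a hypothetical helper:

```python
# Hypothetical mapping from the precision flag to each backend's
# internal compute type, as described above: CuDNN uses TF32 unless the
# flag is "highest"; OneDNN uses BF16 only when the flag is "medium".
def internal_dtype(backend: str, precision: str) -> str:
    if backend == "cudnn":
        return "fp32" if precision == "highest" else "tf32"
    if backend == "onednn":
        return "bf16" if precision == "medium" else "fp32"
    return "fp32"

assert internal_dtype("cudnn", "high") == "tf32"
assert internal_dtype("onednn", "high") == "fp32"
```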
The existing CUDNN backend-specific flag `torch.backends.cudnn.allow_tf32` will interact with the proposed backend-irrelevant flags `torch.set_float32_conv/rnn_precision`. These flags override each other (we follow the same behavior as between `torch.backends.cuda.matmul.allow_tf32` and `float32_matmul_precision`):

Setting `torch.backends.cudnn.allow_tf32 = True` will set `float32_rnn/conv_precision` to `high` (TF32 enabled), and setting it to `False` will set them to `highest` (TF32 disabled):

```python
torch.backends.cudnn.allow_tf32 = True
print("float32_conv_precision", torch.get_float32_conv_precision())
print("float32_rnn_precision", torch.get_float32_rnn_precision())
# output:
# float32_conv_precision high
# float32_rnn_precision high

torch.backends.cudnn.allow_tf32 = False
print("float32_conv_precision", torch.get_float32_conv_precision())
print("float32_rnn_precision", torch.get_float32_rnn_precision())
# output:
# float32_conv_precision highest
# float32_rnn_precision highest
```
Setting `float32_rnn/conv_precision` to `high` or `medium` will enable `torch.backends.cudnn.allow_tf32`, while setting either of them to `highest` will disable it:

```python
torch.backends.cudnn.allow_tf32 = True
torch.set_float32_conv_precision("highest")
print("torch.backends.cudnn.allow_tf32", torch.backends.cudnn.allow_tf32)
# output:
# torch.backends.cudnn.allow_tf32 False

torch.set_float32_rnn_precision("highest")
print("torch.backends.cudnn.allow_tf32", torch.backends.cudnn.allow_tf32)
# output:
# torch.backends.cudnn.allow_tf32 False

torch.set_float32_conv_precision("high")
torch.set_float32_rnn_precision("high")
print("torch.backends.cudnn.allow_tf32", torch.backends.cudnn.allow_tf32)
# output:
# torch.backends.cudnn.allow_tf32 True
```
We discussed how the existing CuDNN flag, `torch.backends.cudnn.allow_tf32`, interacts with `torch.set_float32_conv/rnn_precision`. However, we believe it is cleaner to use separate flags in CuDNN. We suggest deprecating `torch.backends.cudnn.allow_tf32` in favor of `torch.backends.cudnn.conv.allow_tf32` and `torch.backends.cudnn.rnn.allow_tf32`. Then, the CuDNN backend-specific flags and the backend-irrelevant flags can have a one-to-one correspondence, such as `torch.backends.cuda.matmul.allow_tf32` and `torch.float32_matmul_precision`:

```
torch.backends.cudnn.conv.allow_tf32  <-> torch.float32_conv_precision
torch.backends.cudnn.rnn.allow_tf32   <-> torch.float32_rnn_precision
# the flags below already exist today
torch.backends.cuda.matmul.allow_tf32 <-> torch.float32_matmul_precision
```
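As an illustration of the intended one-to-one correspondence, here is a hedged sketch of the two-way synchronization; this is a hypothetical class, not PyTorch internals:

```python
# Hypothetical two-way sync between a backend-specific boolean and its
# backend-irrelevant precision counterpart, mirroring the table above.
class ConvPrecisionFlags:
    def __init__(self) -> None:
        self.float32_conv_precision = "highest"

    @property
    def cudnn_conv_allow_tf32(self) -> bool:
        # TF32 is enabled whenever the precision is below "highest".
        return self.float32_conv_precision != "highest"

    @cudnn_conv_allow_tf32.setter
    def cudnn_conv_allow_tf32(self, allow: bool) -> None:
        self.float32_conv_precision = "high" if allow else "highest"

flags = ConvPrecisionFlags()
flags.cudnn_conv_allow_tf32 = True
assert flags.float32_conv_precision == "high"
```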
Lower-precision computation from different backends can significantly improve performance for deep learning workloads with minimal impact on precision, for example TF32 from CUDA/CUDNN or the implicit reduced-precision arithmetic feature from oneDNN. By providing a user-controlled frontend API, users can easily configure the internal computation data type of convolutional and recurrent neural networks without knowing the details of the different backends. This allows them to leverage the performance benefits of lower precision while keeping the precision loss acceptable. Compared to Autocast, the proposed flags offer finer-grained control: only GEMM/CONV/RNN can use BF16 internal computation data types, whereas Autocast may compute more ops in BF16.

Introduce `float32_conv/rnn_precision` and enable users to control the internal data type of convolutional and recurrent neural networks by configuring the value of `float32_conv/rnn_precision`.