Comments (9)
I ran the previous experiment with batch_first=False; when I set batch_first=True, the output is indeed zero.
The reason lies in the PyTorch internals: with batch_first=True, the MultiheadAttention.forward method is not actually used (an optimized CUDA kernel is called directly instead), so ptflops can't trace it. Therefore this bug with batch_first=True and param.requires_grad = False is not fixable.
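This is easy to confirm with a forward hook (a minimal sketch of my own, not ptflops code). ptflops counts MACs via forward hooks, and when the fused kernel is taken, TransformerEncoderLayer.forward never calls its self_attn submodule, so the hook stays silent. The exact fast-path conditions vary between PyTorch versions.

import torch
from torch.nn import TransformerEncoder, TransformerEncoderLayer

layer = TransformerEncoderLayer(d_model=512, nhead=4, dim_feedforward=512,
                                batch_first=True)
model = TransformerEncoder(layer, 1).eval()
for p in model.parameters():
    p.requires_grad = False  # same freezing as in the report below

fired = []
# ptflops relies on forward hooks like this one; on the fused fast path
# the attention submodule's forward is bypassed and the hook never fires.
model.layers[0].self_attn.register_forward_hook(lambda m, i, o: fired.append(True))

with torch.no_grad():
    model(torch.randn(1, 64, 512))
print("self_attn hook fired:", bool(fired))  # expected: False on the fast path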
@ssk1997 please provide a code snippet to reproduce
Test case with requires_grad=False:
import torch
import torch.nn as nn
from torch.nn import TransformerEncoder, TransformerEncoderLayer
from ptflops import get_model_complexity_info

def prepare_input(resolution):
    input1 = torch.randn(1, 64, 512)
    return dict(src=input1)

layer = TransformerEncoderLayer(d_model=512, nhead=4, dim_feedforward=512, batch_first=True)
model = TransformerEncoder(layer, 1)
for param in model.parameters():
    param.requires_grad = False

flop1, params = get_model_complexity_info(model, input_res=(1, 1, 1),
                                          input_constructor=prepare_input,
                                          as_strings=True, print_per_layer_stat=True)
print(flop1, params)
Output:
TransformerEncoder(
  0, 0.000% Params, 0.0 Mac, 0.000% MACs,
  (layers): ModuleList(
    0, 0.000% Params, 0.0 Mac, 0.000% MACs,
    (0): TransformerEncoderLayer(
      0, 0.000% Params, 0.0 Mac, 0.000% MACs,
      (self_attn): MultiheadAttention(
        0, 0.000% Params, 0.0 Mac, 0.000% MACs,
        (out_proj): NonDynamicallyQuantizableLinear(0, 0.000% Params, 0.0 Mac, 0.000% MACs, in_features=512, out_features=512, bias=True)
      )
      (linear1): Linear(0, 0.000% Params, 0.0 Mac, 0.000% MACs, in_features=512, out_features=512, bias=True)
      (dropout): Dropout(0, 0.000% Params, 0.0 Mac, 0.000% MACs, p=0.1, inplace=False)
      (linear2): Linear(0, 0.000% Params, 0.0 Mac, 0.000% MACs, in_features=512, out_features=512, bias=True)
      (norm1): LayerNorm(0, 0.000% Params, 0.0 Mac, 0.000% MACs, (512,), eps=1e-05, elementwise_affine=True)
      (norm2): LayerNorm(0, 0.000% Params, 0.0 Mac, 0.000% MACs, (512,), eps=1e-05, elementwise_affine=True)
      (dropout1): Dropout(0, 0.000% Params, 0.0 Mac, 0.000% MACs, p=0.1, inplace=False)
      (dropout2): Dropout(0, 0.000% Params, 0.0 Mac, 0.000% MACs, p=0.1, inplace=False)
    )
  )
)
0.0 Mac 0
Test case with requires_grad=True:
import torch
import torch.nn as nn
from torch.nn import TransformerEncoder, TransformerEncoderLayer
from ptflops import get_model_complexity_info

def prepare_input(resolution):
    input1 = torch.randn(1, 64, 512)
    return dict(src=input1)

layer = TransformerEncoderLayer(d_model=512, nhead=4, dim_feedforward=512, batch_first=True)
model = TransformerEncoder(layer, 1)

flop1, params = get_model_complexity_info(model, input_res=(1, 1, 1),
                                          input_constructor=prepare_input,
                                          as_strings=True, print_per_layer_stat=True)
print(flop1, params)
Output:
TransformerEncoder(
  1.58 M, 99.870% Params, 105.04 MMac, 100.000% MACs,
  (layers): ModuleList(
    1.58 M, 99.870% Params, 105.04 MMac, 100.000% MACs,
    (0): TransformerEncoderLayer(
      1.58 M, 99.870% Params, 105.04 MMac, 100.000% MACs,
      (self_attn): MultiheadAttention(
        1.05 M, 66.580% Params, 71.48 MMac, 68.054% MACs,
        (out_proj): NonDynamicallyQuantizableLinear(0, 0.000% Params, 0.0 Mac, 0.000% MACs, in_features=512, out_features=512, bias=True)
      )
      (linear1): Linear(262.66 k, 16.645% Params, 16.78 MMac, 15.973% MACs, in_features=512, out_features=512, bias=True)
      (dropout): Dropout(0, 0.000% Params, 0.0 Mac, 0.000% MACs, p=0.1, inplace=False)
      (linear2): Linear(262.66 k, 16.645% Params, 16.78 MMac, 15.973% MACs, in_features=512, out_features=512, bias=True)
      (norm1): LayerNorm(0, 0.000% Params, 0.0 Mac, 0.000% MACs, (512,), eps=1e-05, elementwise_affine=True)
      (norm2): LayerNorm(0, 0.000% Params, 0.0 Mac, 0.000% MACs, (512,), eps=1e-05, elementwise_affine=True)
      (dropout1): Dropout(0, 0.000% Params, 0.0 Mac, 0.000% MACs, p=0.1, inplace=False)
      (dropout2): Dropout(0, 0.000% Params, 0.0 Mac, 0.000% MACs, p=0.1, inplace=False)
    )
  )
)
105.04 MMac 1.58 M
Thanks for your response, @sovrasov.
If I launch this snippet with param.requires_grad = False, my output is 100.89 MMac 0. ptflops returns 0 params in that case because it counts only the parameters that have gradients (a natural definition of learnable parameters). Which version of ptflops do you use?
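For what it's worth, that definition is straightforward to reproduce by hand (a sketch of my own, not ptflops code):

import torch
from torch.nn import TransformerEncoder, TransformerEncoderLayer

model = TransformerEncoder(
    TransformerEncoderLayer(d_model=512, nhead=4, dim_feedforward=512,
                            batch_first=True), 1)
for p in model.parameters():
    p.requires_grad = False

# "Learnable" parameters in the sense ptflops uses: requires_grad == True.
learnable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(learnable, total)  # 0 vs. the full count once everything is frozen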
For reference, you can grep for torch._transformer_encoder_layer_fwd in the PyTorch source code.
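On recent PyTorch builds the op is also visible from Python, so a quick check (a sketch; the attribute is private and may move between versions) is:

import torch

# Private op behind the TransformerEncoderLayer fast path (PyTorch >= 1.12);
# being private, it may be renamed or removed in future releases.
print(hasattr(torch, "_transformer_encoder_layer_fwd"))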
Thanks a lot. It works fine with batch_first=False.
This bug may occur in other cases as well because of PyTorch's inference optimizations: https://pytorch.org/tutorials/beginner/bettertransformer_tutorial.html and https://pytorch.org/blog/a-better-transformer-for-fast-transformer-encoder-inference/
Encountered the same bug. Is there any hook to fix this?
@quancs as I already wrote, this is a wontfix problem
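A practical workaround, judging by the two test cases above (a sketch of my own, not an official ptflops recipe): measure complexity before freezing the weights, so the hook-based counting still sees every submodule, and freeze afterwards.

import torch
from torch.nn import TransformerEncoder, TransformerEncoderLayer
from ptflops import get_model_complexity_info

def prepare_input(resolution):
    return dict(src=torch.randn(1, 64, 512))

layer = TransformerEncoderLayer(d_model=512, nhead=4, dim_feedforward=512,
                                batch_first=True)
model = TransformerEncoder(layer, 1)

# Measure while the parameters still require grad: the fused fast path is
# skipped, so the per-module hooks see the real workload.
macs, params = get_model_complexity_info(model, input_res=(1, 1, 1),
                                         input_constructor=prepare_input,
                                         as_strings=True,
                                         print_per_layer_stat=False)
print(macs, params)

# Freeze only after measuring.
for p in model.parameters():
    p.requires_grad = False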