import time

import torch
from torchvision import models
from tqdm import tqdm
from simplify import simplify

model = models.resnet18(pretrained=True)
dummy_input = torch.zeros(64, 3, 224, 224)  # batch of 64 standard ImageNet-sized inputs
simplified_model = simplify(model, dummy_input)
def run_model(model, dummy_input):
    start = time.time()
    for _ in tqdm(range(100)):
        model(dummy_input)
    end = time.time()
    print("Time taken: ", end - start)
print("Original model")
run_model(model, dummy_input)
print("Simplified model")
run_model(simplified_model, dummy_input)
Original model
100%|██████████| 100/100 [01:37<00:00, 1.03it/s]
Time taken: 97.2786557674408
Simplified model
100%|██████████| 100/100 [01:39<00:00, 1.01it/s]
Time taken: 99.11441326141357
It seems there is no inference acceleration. Maybe the pre-trained model has no all-zero channels that could be pruned away.
My question is how to speed up inference.
Should I first use `prune.ln_structured` from `torch.nn.utils.prune` to prune the pre-trained model?
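To make the question concrete, here is a sketch of what I mean by pruning first, shown on a single Conv2d (I am assuming `ln_structured` with `dim=0` zeroes whole output channels):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(3, 64, kernel_size=3)

# Zero the 50% of output channels (dim=0) with the smallest L2 norm (n=2).
prune.ln_structured(conv, name="weight", amount=0.5, n=2, dim=0)
prune.remove(conv, "weight")  # bake the pruning mask into the weight tensor

# Whole output channels are now exactly zero -- candidates for removal.
zeroed = int((conv.weight.abs().sum(dim=(1, 2, 3)) == 0).sum())
print(f"{zeroed} / {conv.out_channels} output channels zeroed")
```

After this, the zeroed channels still participate in every forward pass, so timing alone would not improve; the simplification step would then have to physically remove them.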
I think this project is a good fit for the follow-up work after torch's pruning, i.e. actually removing the zeroed channels.
Could you provide a complete example that achieves accelerated inference?