Code for Learned Thresholds Token Merging and Pruning for Vision Transformers (LTMP). A technique that reduces Vision Transformers to any desired computational budget with minimal loss of accuracy.
Hello, may I ask which part of the code implements the paper's core idea of "learning thresholds for merging and pruning"?
Hi, I have a question about implementation details regarding learned threshold merging.
In this line of code, you detach the generated mask, which still receives gradients via the straight-through trick.
In my understanding, the threshold can still be learned through the FLOPs loss. Is there another reason for applying a stop-gradient to the mask before multiplying it into the features? Does training become harder if no stop-gradient is applied?
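For context, the straight-through pattern being discussed can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function name `straight_through_mask` and the sigmoid relaxation with a `temperature` parameter are assumptions for the example. The forward pass uses a hard binary mask, while gradients flow to the learned threshold through a soft sigmoid surrogate:

```python
import torch

def straight_through_mask(scores: torch.Tensor,
                          threshold: torch.Tensor,
                          temperature: float = 0.1) -> torch.Tensor:
    # Soft mask: differentiable, so the learned threshold receives gradients
    soft = torch.sigmoid((scores - threshold) / temperature)
    # Hard mask: the binary keep/drop decision used in the forward pass
    hard = (scores > threshold).float()
    # Straight-through estimator: forward value equals `hard`,
    # backward gradient flows through `soft`
    return hard.detach() - soft.detach() + soft

# Hypothetical usage: keep tokens whose importance score exceeds the threshold
scores = torch.tensor([0.2, 0.8])
threshold = torch.tensor(0.5, requires_grad=True)
mask = straight_through_mask(scores, threshold)
```

The question above then concerns whether one should compute `features * mask` (task-loss gradients also reach the threshold through the features) or `features * mask.detach()` (the threshold is updated only by a regularizer such as a FLOPs loss).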