jpfeil / dilated-attention-pytorch
This project is forked from fkodom/dilated-attention-pytorch.
(Unofficial) Implementation of dilated attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens" (https://arxiv.org/abs/2307.02486)
License: MIT