The implementation of "X-TF-GridNet: A Time-Frequency Domain Target Speaker Extraction Network with Adaptive Speaker Embedding Fusion", which has been submitted to Information Fusion.
Hi, dear author, thank you very much for sharing the results of this paper; it is exciting work. I check this GitHub page many times every day to see whether the code has been updated... I am confused about the Adaptive Embedding Aggregation Fusion module. In formula 13, as I understand it, Q is the fixed embedding and is one-dimensional, while K is reshaped into two dimensions. How can they be multiplied together?
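One common way such a product is defined in attention layers is sketched below. It assumes, without confirmation from the paper, that Q is a single D-dimensional vector and that K is an N x D matrix whose rows are the reshaped features. Under that assumption the 1-D query is treated as a 1 x D row vector, so Q K^T yields N scores (one per row of K), and a softmax over those scores weights the rows of V. The function names attention_weights and aggregate, and the scaling by sqrt(D), are hypothetical illustrations. They are not taken from formula 13.

```python
import math


def attention_weights(q, K):
    """Scaled dot-product attention weights for ONE query vector.

    Hypothetical sketch: q has length D, K is a list of N rows of length D.
    Treating q as a 1 x D row vector, q @ K^T gives N scores.
    """
    d = len(q)
    if any(len(row) != d for row in K):
        raise ValueError("every row of K must have the same length as q")
    # One dot product per row of K, scaled by sqrt(D) (an assumed choice).
    scores = [sum(qi * ki for qi, ki in zip(q, row)) / math.sqrt(d) for row in K]
    # Numerically stable softmax over the N scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(q, K, V):
    """Weighted sum of the rows of V (N rows) using the weights above."""
    w = attention_weights(q, K)
    return [sum(wt * row[j] for wt, row in zip(w, V)) for j in range(len(V[0]))]


if __name__ == "__main__":
    q = [1.0, 0.0]                          # 1-D query, D = 2
    K = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # N = 3 rows, each of length D
    V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(attention_weights(q, K))          # three weights summing to 1
    print(aggregate(q, K, V))               # one D-dimensional output vector
```

The point of the sketch is only the shapes: a vector of length D multiplied with an N x D matrix (via its transpose) gives N values. The output of the aggregation is again a single D-dimensional vector. Whether the paper actually uses this convention would need to be confirmed by the authors.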