Comments (10)

zhongkaifu commented on May 19, 2024

Hi @piedralaves ,

It's a bug related to releasing weights in some operators (used only by LSTM-type models) during training. I've already fixed it; you can check out this file from the repo: https://github.com/zhongkaifu/Seq2SeqSharp/blob/master/Seq2SeqSharp/Tools/ComputeGraphTensor.cs

Thanks
Zhongkai Fu

from seq2seqsharp.

piedralaves commented on May 19, 2024

Dear Zhongkai,
Is it a big change?
I cannot see what you changed.
I tried replacing ComputeGraphTensor.cs, but many errors arise.
Sorry.

zhongkaifu commented on May 19, 2024

It's a minor change. Here are the diffs: e3de9c6#diff-4101e3779b113596c6c988619cf726c9a04d15f795e3f7ec886ae9ad96d4ec89

You can check these diffs and modify your local file.

piedralaves commented on May 19, 2024

Are the changes only in ComputeGraphTensor.cs?

zhongkaifu commented on May 19, 2024

Yes, only in ComputeGraphTensor.cs

piedralaves commented on May 19, 2024

Thanks.

clm33 commented on May 19, 2024

The code that you posted solved the problem except in the case where the encoder is BiLSTM and the decoder is Transformer. All other combinations work fine, so we were wondering whether you know what the issue might be with the BiLSTM encoder plus Transformer decoder combination. The error that arises is the following:

Exception: 'Output tensor must have the same number of elements as the input. Size = 3720 300 , New Size = 186 20 600 '

I've attached the log in case you want to explore it.

I searched through the posts to see whether anyone had the same issue and found one in which the problem was that "HiddenSize" must be divisible by "MultiHeadNum", but checking that has not solved the issue.
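For context, the divisibility constraint comes from multi-head attention in general: the hidden vector is split evenly across the heads, so the hidden size must be a multiple of the head count. The Python sketch below illustrates that check; `head_dim` is a hypothetical helper, not part of Seq2SeqSharp.

```python
def head_dim(hidden_size: int, num_heads: int) -> int:
    """Return the per-head dimension for multi-head attention.

    Hypothetical helper (not Seq2SeqSharp code): the hidden vector is split
    evenly across heads, so hidden_size must be a multiple of num_heads.
    """
    if num_heads <= 0:
        raise ValueError("num_heads must be positive")
    if hidden_size % num_heads != 0:
        raise ValueError(
            f"hidden_size ({hidden_size}) is not divisible by num_heads ({num_heads})"
        )
    return hidden_size // num_heads


print(head_dim(300, 6))  # 50
```

Passing this check only guarantees that the heads split evenly; it says nothing about whether the encoder output width matches the width the decoder expects.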

Again, we would greatly appreciate it if you could shed some light on this issue.

Seq2SeqConsole_Train_2023_02_08_12h_04m_26s.log

zhongkaifu commented on May 19, 2024

Hi @clm33 ,

The reason is that the BiLSTM encoder concatenates the forward and backward hidden states at the top of the network, so its output dimension becomes "2 * hidden_dim", which differs from the dimension the decoder expects, and the decoder therefore fails.

I just checked in a change that switches the BiLSTM output from "concatenate" mode to "add" mode, so "BiLSTM + Transformer" now works. Let me know if you have any questions.
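To make the dimension argument concrete, here is a small Python sketch. The names `bilstm_merge` and `check_reshape` are hypothetical and are not Seq2SeqSharp APIs, and `hidden_dim = 300` is an assumption suggested by the shapes in the reported error. It shows that concatenating the forward and backward states doubles the per-token width while adding them keeps it at `hidden_dim`, and it mimics the element-count check behind the reported exception. In those shapes, 3720 = 186 × 20 and 600 = 2 × 300, so the two sizes differ by exactly the factor of two that concatenation introduces.

```python
from math import prod
from typing import List, Sequence


def bilstm_merge(forward: List[float], backward: List[float], mode: str) -> List[float]:
    """Combine one token's forward and backward LSTM states.

    "concat" doubles the width (2 * hidden_dim); "add" keeps it at hidden_dim.
    Illustrative only; not Seq2SeqSharp's implementation.
    """
    if len(forward) != len(backward):
        raise ValueError("forward and backward states must have the same width")
    if mode == "concat":
        return forward + backward
    if mode == "add":
        return [f + b for f, b in zip(forward, backward)]
    raise ValueError(f"unknown mode: {mode}")


def check_reshape(old_shape: Sequence[int], new_shape: Sequence[int]) -> None:
    """Mimic the element-count check: a reshape must preserve the total size."""
    if prod(old_shape) != prod(new_shape):
        raise ValueError(
            "Output tensor must have the same number of elements as the input. "
            f"Size = {old_shape}, New Size = {new_shape}"
        )


hidden_dim = 300  # assumed from the shapes in the reported error
fwd, bwd = [0.1] * hidden_dim, [0.2] * hidden_dim
print(len(bilstm_merge(fwd, bwd, "concat")))  # 600
print(len(bilstm_merge(fwd, bwd, "add")))     # 300

# 3720 rows = 186 * 20; the reshape succeeds only if the widths agree.
check_reshape((186 * 20, 300), (186, 20, 300))  # passes silently
try:
    check_reshape((3720, 300), (186, 20, 600))  # width 300 vs 600
except ValueError as err:
    print(err)
```

Switching to "add" mode keeps the encoder output at `hidden_dim`, so it lines up with a Transformer decoder configured with the same hidden size; the trade-off is that forward and backward features are summed instead of being kept in separate halves of the vector.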

Thanks
Zhongkai Fu

piedralaves commented on May 19, 2024

Hi Zhongkai,
Could you point out exactly what changed, please?

zhongkaifu commented on May 19, 2024

It's all in this commit: 7723cc1
