Comments (10)
Hi @piedralaves,
This is a bug in how weights are released by some operators (used only by LSTM-type models) during training. I have already fixed it; you can get the updated file from the repo: https://github.com/zhongkaifu/Seq2SeqSharp/blob/master/Seq2SeqSharp/Tools/ComputeGraphTensor.cs
Thanks
Zhongkai Fu
from seq2seqsharp.
Dear Zhongkai,
Is it a big change?
I cannot see what you changed.
I tried replacing ComputeGraphTensor.cs with the new version, but many errors arise.
Sorry.
from seq2seqsharp.
It's a minor change. Here are the diffs: e3de9c6#diff-4101e3779b113596c6c988619cf726c9a04d15f795e3f7ec886ae9ad96d4ec89
You can check these diffs and apply them to your local file.
from seq2seqsharp.
Are the changes only in ComputeGraphTensor.cs?
from seq2seqsharp.
Yes, only in ComputeGraphTensor.cs
from seq2seqsharp.
Thanks.
from seq2seqsharp.
The code you posted solved the problem, except when the encoder is BiLSTM and the decoder is Transformer. All other combinations work fine, so we were wondering whether you know what the issue might be with the BiLSTM encoder and Transformer decoder combination. The following error arises:
Exception: 'Output tensor must have the same number of elements as the input. Size = 3720 300 , New Size = 186 20 600 '
I have attached the log in case you want to look into it.
I searched through the posts to see whether anyone had the same issue and found one in which the problem was that "HiddenSize" must be divisible by "MultiHeadNum", but checking that has not solved the issue.
Again, we would appreciate it a lot if you could shed some light on this issue.
Seq2SeqConsole_Train_2023_02_08_12h_04m_26s.log
from seq2seqsharp.
Hi @clm33,
The reason is that the BiLSTM output concatenates the forward and backward hidden layers at the top of the network, so its dimension becomes "2 * hidden_dim", which differs from the dimension the decoder expects, and so the decoder fails.
I just made a check-in that changes the BiLSTM output from "concatenate" to "add" mode, so "BiLSTM + Transformer" works now. Let me know if you have any questions.
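To make the shape arithmetic concrete, here is a minimal sketch in plain Python. It is not Seq2SeqSharp code: the helper names `element_count`, `can_reshape`, and `bilstm_output_dim` are hypothetical, and the only values taken from this thread are the two shapes in the exception message. It assumes hidden_dim is 300, as the "Size" value suggests, and shows that the two shapes differ by exactly a factor of two, which is consistent with a "2 * hidden_dim" output meeting a "hidden_dim" decoder.

```python
from math import prod


def element_count(shape):
    """Number of elements in a tensor of the given shape."""
    return prod(shape)


def can_reshape(src_shape, dst_shape):
    """A reshape/view is only valid when both shapes hold the same number of elements."""
    return element_count(src_shape) == element_count(dst_shape)


def bilstm_output_dim(hidden_dim, mode):
    """Last dimension of a bidirectional LSTM output (hypothetical helper).

    "concat" joins forward and backward states, doubling the width;
    "add" sums them element-wise, keeping the width at hidden_dim.
    """
    if mode == "concat":
        return 2 * hidden_dim
    if mode == "add":
        return hidden_dim
    raise ValueError(f"unknown merge mode: {mode}")


hidden_dim = 300                  # assumption, inferred from "Size = 3720 300"
actual_shape = (3720, 300)        # "Size" in the exception
requested_shape = (186, 20, 600)  # "New Size" in the exception

print(element_count(actual_shape))     # 1116000
print(element_count(requested_shape))  # 2232000

# Concatenated width (600) cannot be viewed over a 300-wide tensor.
concat_shape = (186, 20, bilstm_output_dim(hidden_dim, "concat"))
print(can_reshape(actual_shape, concat_shape))  # False

# With "add" mode the width stays at 300 and the element counts agree.
add_shape = (186, 20, bilstm_output_dim(hidden_dim, "add"))
print(can_reshape(actual_shape, add_shape))     # True
```

Note that 186 * 20 = 3720, so only the last dimension disagrees between the two shapes.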
Thanks
Zhongkai Fu
from seq2seqsharp.
Hi Zhongkai:
Could you specify the change, please?
from seq2seqsharp.
It's all in this commit: 7723cc1
from seq2seqsharp.