Comments (4)
Hi @yajiez,
Thanks for pointing this out.
I tried to remember why I used this initialization. I remember carefully reading one of the cited papers and trying to implement its scheme, but I can't find it again at the moment.
From the tests I've done, initialization does not matter that much in terms of results but I agree that the current implementation differs from the original paper.
About the gain values: from the (unknown) paper, I remember that I was aiming for a standard deviation of sqrt(4/fan_in) for GLU layers and sqrt(1/fan_in) for other layers.
I will try to track down the paper where I got that, but you are right: we probably need to discuss it and simply go with xavier_uniform with gain=1 to get closer to the original paper.
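For anyone comparing the two schemes, here is a rough sketch of the arithmetic (not the library's actual code). It relies on the fact that Xavier initialization with a given gain has standard deviation gain * sqrt(2 / (fan_in + fan_out)), and that the uniform variant samples from U(-a, a) with a = sqrt(3) * std. The function names below are hypothetical and exist only for illustration.

```python
import math


def target_std(fan_in: int, feeds_glu: bool) -> float:
    """Std aimed for in this thread: sqrt(4/fan_in) before a GLU, sqrt(1/fan_in) otherwise."""
    numerator = 4.0 if feeds_glu else 1.0
    return math.sqrt(numerator / fan_in)


def xavier_std(fan_in: int, fan_out: int, gain: float = 1.0) -> float:
    """Std of Xavier (Glorot) initialization for a given gain."""
    return gain * math.sqrt(2.0 / (fan_in + fan_out))


def xavier_uniform_bound(fan_in: int, fan_out: int, gain: float = 1.0) -> float:
    """Bound a of the U(-a, a) distribution used by the uniform variant."""
    return math.sqrt(3.0) * xavier_std(fan_in, fan_out, gain)


def gain_for_target_std(fan_in: int, fan_out: int, feeds_glu: bool) -> float:
    """Gain that makes Xavier initialization reproduce the target std above."""
    return target_std(fan_in, feeds_glu) / xavier_std(fan_in, fan_out, gain=1.0)


if __name__ == "__main__":
    fan_in, fan_out = 64, 128  # illustrative layer sizes, not taken from the library
    for feeds_glu in (True, False):
        gain = gain_for_target_std(fan_in, fan_out, feeds_glu)
        print(feeds_glu, round(gain, 4), round(xavier_std(fan_in, fan_out, gain), 4))
```

With gain=1 the standard deviation depends on fan_in + fan_out rather than on fan_in alone, which is why the two schemes only coincide for particular layer shapes.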
Thanks for your feedback!
from tabnet.
Hi @Optimox,
Thanks for your reply and explanation.
It all makes sense, and it would be great if you could post the reference here later if you can recall it.
Thanks for sharing your work.
Hey @yajiez,
I know it's been a while but I found the paper!
It's from Convolutional Sequence to Sequence Learning (cited in the TabNet paper):
https://arxiv.org/pdf/1705.03122.pdf
If you go to Section 3.5 (Initialization), I think that is what I tried to implement. If you see that it does not match, please let me know!
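As I understand it, the factor of 4 for layers feeding a GLU comes from the gate roughly quartering the variance: when the gate input is small, sigmoid(b) is close to 1/2, so a * sigmoid(b) has about 1/4 of the variance of a. The simulation below is only a sanity check of that approximation under assumed Gaussian inputs; the function name and the gate_std value are illustrative choices, not something taken from the paper or the library.

```python
import math
import random


def glu_output_variance(n_samples: int = 100_000, gate_std: float = 0.1, seed: int = 0) -> float:
    """Estimate Var[a * sigmoid(b)] for a ~ N(0, 1) and b ~ N(0, gate_std^2)."""
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n_samples):
        a = rng.gauss(0.0, 1.0)        # linear half of the GLU input, unit variance
        b = rng.gauss(0.0, gate_std)   # gate half, kept small so sigmoid(b) stays near 1/2
        y = a / (1.0 + math.exp(-b))   # GLU output: a * sigmoid(b)
        total += y
        total_sq += y * y
    mean = total / n_samples
    return total_sq / n_samples - mean * mean


if __name__ == "__main__":
    # Should land near 0.25, i.e. about a quarter of the unit input variance.
    print(glu_output_variance())
```

The approximation degrades as the gate input grows, since sigmoid(b) then moves away from 1/2.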
Thanks
Aha, this was one of my favourite papers but I did not remember their initialization details. Seems a good time to read this paper again :)
Thank you @Optimox for pointing out the reference!