Comments (10)

taolei87 avatar taolei87 commented on August 23, 2024

Hi @gailysun

Do you have the log of the pretraining run? I think it would be helpful to see the exact running options and the training information.

from rcnn.

gailysun avatar gailysun commented on August 23, 2024

Hi @taolei87,
The following is my pre-training log:

Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 4007)
Namespace(activation='tanh', batch_size=256, corpus='/data1/gailsun/qa/data/text_tokenized.txt', cut_off=1, depth=1, dev='', dropout=0.1, embeddings='/data1/gailsun/qa/data/vector/vectors_pruned.200.txt', heldout='/data1/gailsun/qa/data/train_random.txt', hidden_dim=200, l2_reg=1e-05, layer='rcnn', learning='adam', learning_rate=0.001, max_epoch=50, max_seq_len=100, mode=1, model='model.pkl.gz', normalize=1, order=2, outgate=0, reweight=1, test='', train='/data1/gailsun/qa/data/train_random.txt', use_anno=1, use_body=1, use_title=1)

0 empty titles ignored.
100406 pre-trained embeddings loaded.
vocab size=100410, corpus size=167765
/usr/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2652: VisibleDeprecationWarning: rank is deprecated; use the ndim attribute or function instead. To find the rank of a matrix see numpy.linalg.matrix_rank.
VisibleDeprecationWarning)
heldout examples=139570
2.94957613945 to create batches
num of parameters: 20503210
p_norm: ['5.773', '5.777', '8.155', '0.402', '0.393', '5.777', '5.771', '8.166', '0.415', '0.428', '0.000', '9.131']
0/111 ... 110/111 model saved.

gailysun avatar gailysun commented on August 23, 2024

hi @taolei87 ,
Another difference is that I set THEANO_FLAGS='device=gpu,floatX=float64'. Will this affect the result?
The following is the current fine-tuning log. p_norm is always "nan", and the metrics do not change.
Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 4007)
Namespace(activation='tanh', average=0, batch_size=40, corpus='/data1/gailsun/qa/data/text_tokenized.txt', cut_off=1, depth=1, dev='/data1/gailsun/qa/data/dev.txt', dropout=0.1, embeddings='/data1/gailsun/qa/data/vector/vectors_pruned.200.txt', hidden_dim=200, l2_reg=1e-05, layer='rcnn', learning='adam', learning_rate=0.001, load_pretrain='/data1/gailsun/qa/code/pt/model.pkl.gz.pkl.gz', max_epoch=50, max_seq_len=100, mode=1, normalize=1, order=2, outgate=0, reweight=1, save_model='model_d200_qa', test='/data1/gailsun/qa/data/test.txt', train='/data1/gailsun/qa/data/train_random.txt')

0 empty titles ignored.
100406 pre-trained embeddings loaded.
vocab size=100408, corpus size=167765
/usr/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2652: VisibleDeprecationWarning: rank is deprecated; use the ndim attribute or function instead. To find the rank of a matrix see numpy.linalg.matrix_rank.
VisibleDeprecationWarning)
23.4045739174 to create batches
315 batches, 35312679 tokens in total, 360602 triples in total
h_title dtype: float64
h_avg_title dtype: float64
h_final dtype: float64
num of parameters: 160400
p_norm: ['nan', 'nan', 'nan', 'nan', 'nan']
0/315 ... 310/315

Epoch 0 cost=nan loss=nan MRR=63.39,63.39 |g|=nan [58.735m]
p_norm: ['nan', 'nan', 'nan', 'nan', 'nan']

+-------+---------+---------+---------+---------+---------+---------+---------+---------+
| Epoch | dev MAP | dev MRR | dev P@1 | dev P@5 | tst MAP | tst MRR | tst P@1 | tst P@5 |
+-------+---------+---------+---------+---------+---------+---------+---------+---------+
| 0 | 44.87 | 63.39 | 51.85 | 31.01 | 42.81 | 62.98 | 53.76 | 26.99 |
+-------+---------+---------+---------+---------+---------+---------+---------+---------+
0/315 ... 310/315

Epoch 1 cost=nan loss=nan MRR=63.39,63.39 |g|=nan [58.200m]
p_norm: ['nan', 'nan', 'nan', 'nan', 'nan']

+-------+---------+---------+---------+---------+---------+---------+---------+---------+
| Epoch | dev MAP | dev MRR | dev P@1 | dev P@5 | tst MAP | tst MRR | tst P@1 | tst P@5 |
+-------+---------+---------+---------+---------+---------+---------+---------+---------+
| 0 | 44.87 | 63.39 | 51.85 | 31.01 | 42.81 | 62.98 | 53.76 | 26.99 |
+-------+---------+---------+---------+---------+---------+---------+---------+---------+
0/315 ... 220/315

taolei87 avatar taolei87 commented on August 23, 2024

p_norm is the L2 norm of the parameters. In the fine-tuning log, it is NaN right after loading the model:
num of parameters: 160400 p_norm: ['nan', 'nan', 'nan', 'nan', 'nan']

This means the pre-training did not run correctly or hit an error somewhere. During pre-training, I also print diagnostic information such as the p_norms (here), which seems to be missing from the log you showed me.
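As a rough illustration (not the repo's actual code), the p_norm values correspond to per-parameter L2 norms, and a single NaN anywhere in a loaded parameter array propagates into its norm:

```python
import numpy as np

def param_norms(params):
    """Return the L2 norm of each parameter array, formatted like the log."""
    return ["%.3f" % np.sqrt((p.astype(np.float64) ** 2).sum()) for p in params]

# A NaN anywhere in a parameter shows up as 'nan' in its norm,
# which is how a bad checkpoint produces p_norm: ['nan', 'nan', ...].
params = [np.ones((2, 2)), np.array([[1.0, np.nan]])]
print(param_norms(params))  # → ['2.000', 'nan']
```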

Could you attach or send me the full log of the pre-training run? I see that the dev set is empty (--dev option). Note that the model-saving logic is inside the dev evaluation part (here).

taolei87 avatar taolei87 commented on August 23, 2024

It's better to use "float32" by default. Most GPUs only support float32, and it seems Theano doesn't support float64 in GPU mode. Here's what I found on this webpage:

You will also need to set floatX to be float32, along with your path to CUDA. Theano does not yet support float64 (it will soon), so float32 must, for now, be assigned to floatX.
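In practice, that means launching with float32 in the flags, along these lines (the script name here is only a placeholder):

```shell
# floatX=float32 keeps shared variables GPU-compatible;
# float64 ops would otherwise fall back to CPU or fail outright.
THEANO_FLAGS='device=gpu,floatX=float32' python main.py
```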

gailysun avatar gailysun commented on August 23, 2024

hi, @taolei87 ,
I really appreciate your timely answers, thank you very much. The following is the current pre-training log, with the arguments set as you suggested. During pre-training, p_norm becomes "nan" after the first epoch. Hope you can help.

Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 4007)
Namespace(activation='tanh', batch_size=256, corpus='/data1/gailsun/qa/data/text_tokenized.txt', cut_off=1, depth=1, dev='/data1/gailsun/qa/data/dev.txt', dropout=0.1, embeddings='/data1/gailsun/qa/data/vector/vectors_pruned.200.txt', heldout='/data1/gailsun/qa/data/heldout.txt', hidden_dim=400, l2_reg=1e-05, layer='rcnn', learning='adam', learning_rate=0.001, max_epoch=50, max_seq_len=100, mode=1, model='model_pt_d400', normalize=1, order=2, outgate=0, reweight=1, test='/data1/gailsun/qa/data/test.txt', train='/data1/gailsun/qa/data/train_random.txt', use_anno=1, use_body=1, use_title=1)

0 empty titles ignored.
WARNING: n_d (400) != init word vector size (200). Use 200 instead.
100406 pre-trained embeddings loaded.
vocab size=100410, corpus size=167765
/usr/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2652: VisibleDeprecationWarning: rank is deprecated; use the ndim attribute or function instead. To find the rank of a matrix see numpy.linalg.matrix_rank.
VisibleDeprecationWarning)
heldout examples=1989
3.02918314934 to create batches
num of parameters: 41066010
p_norm: ['8.165', '8.170', '14.155', '0.553', '0.562', '8.160', '8.164', '14.104', '0.602', '0.598', '0.000', '9.128']
0/732 ... 730/732 model saved.

Epoch 0 cost=nan loss=nan nan MRR=63.39,63.39 PPL=nan |g|=nan [39.961m]
p_norm: ['nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan']

+-------+---------+---------+---------+---------+---------+---------+---------+---------+
| Epoch | dev MAP | dev MRR | dev P@1 | dev P@5 | tst MAP | tst MRR | tst P@1 | tst P@5 |
+-------+---------+---------+---------+---------+---------+---------+---------+---------+
| 0 | 44.87 | 63.39 | 51.85 | 31.01 | 42.81 | 62.98 | 53.76 | 26.99 |
+-------+---------+---------+---------+---------+---------+---------+---------+---------+
0/732 ... 730/732

Epoch 1 cost=nan loss=nan nan MRR=63.39,63.39 PPL=nan |g|=nan [43.745m]
p_norm: ['nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan']

+-------+---------+---------+---------+---------+---------+---------+---------+---------+
| Epoch | dev MAP | dev MRR | dev P@1 | dev P@5 | tst MAP | tst MRR | tst P@1 | tst P@5 |
+-------+---------+---------+---------+---------+---------+---------+---------+---------+
| 0 | 44.87 | 63.39 | 51.85 | 31.01 | 42.81 | 62.98 | 53.76 | 26.99 |
+-------+---------+---------+---------+---------+---------+---------+---------+---------+

taolei87 avatar taolei87 commented on August 23, 2024

Hi @gailysun

The training options look fine to me. I used to see a NaN issue at some point, but it disappeared after switching the Theano version.

The version on my machine is: 0.7.0.dev-8d3a67b73fda49350d9944c9a24fc9660131861c; but I think 0.8.0 should also work.

What's your Theano version? It's a bit late in Boston time now. I can try your version on my machine later.

gailysun avatar gailysun commented on August 23, 2024

Hi, @taolei87 ,
My Theano version is 0.8.2. Thank you very much.

taolei87 avatar taolei87 commented on August 23, 2024

@gailysun The error seems to come from a later commit I did on parameter initialization. See here.

Could you try changing "0.00" to "0.001"? The NaN issue disappeared on my machine after this fix.
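A hedged sketch of the kind of change being discussed (the function and variable names here are hypothetical, not the repo's exact code; the point is that the initialization bound must be non-zero):

```python
import numpy as np

def random_init(shape, bound=0.001, rng=None):
    """Uniform weight init in [-bound, bound). With bound=0.0 every
    weight is exactly zero, which can produce NaN downstream
    (e.g. when normalizing a zero-norm hidden vector)."""
    rng = rng or np.random.RandomState(1234)
    return rng.uniform(low=-bound, high=bound, size=shape)

W = random_init((200, 200))          # non-degenerate weights
assert 0.0 < np.abs(W).max() < 0.001
```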

gailysun avatar gailysun commented on August 23, 2024

Hi @taolei87 ,
Yes, after revising W_val to 0.001, the code runs successfully. Thank you very much.
