problem with tensorflow 0.10 (chatbot-retrieval, CLOSED)

dennybritz commented on June 3, 2024

problem with tensorflow 0.10

from chatbot-retrieval.

Comments (5)

dennybritz commented on June 3, 2024

"Couldn't find trained model at /home/admin1/exp/projects/chatbot-retrieval/runs/1471797260" is a strange error. I'm not sure why it would look for a trained model.

Try removing the EvaluationMonitor, or try giving a first_n_steps=-1 argument to the monitor and see if that solves it.
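A rough sketch of why first_n_steps=-1 might matter. This is a simplified, hypothetical model of an "every N steps" monitor, not the tf.contrib.learn source; the function names and the exact firing rule are assumptions made for illustration. In this model the monitor fires on each of the first first_n_steps steps and then once every every_n_steps steps. With a positive first_n_steps it would fire at step 1, before any checkpoint has been written, which would be consistent with the "Couldn't find trained model" error.

```python
# Simplified, hypothetical model of an "every N steps" monitor trigger.
# This is NOT the tf.contrib.learn implementation; the names and the rule
# below are assumptions used only to illustrate the idea.

def should_fire(step, every_n_steps, first_n_steps, last_fired_step):
    """Return True if the modeled monitor would run at this step."""
    return step <= first_n_steps or step >= last_fired_step + every_n_steps


def firing_steps(total_steps, every_n_steps, first_n_steps):
    """List the 1-based steps at which the modeled monitor fires."""
    fired = []
    last_fired_step = 0
    for step in range(1, total_steps + 1):
        if should_fire(step, every_n_steps, first_n_steps, last_fired_step):
            fired.append(step)
            last_fired_step = step
    return fired


if __name__ == "__main__":
    # Positive first_n_steps: the monitor already fires at step 1.
    print(firing_steps(10, every_n_steps=4, first_n_steps=1))
    # first_n_steps=-1: no early firing, only the periodic ones.
    print(firing_steps(10, every_n_steps=4, first_n_steps=-1))
```

Under these assumptions, first_n_steps=-1 simply disables the early evaluations, so the first evaluation happens only after every_n_steps training steps.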

amirj commented on June 3, 2024

Removing the EvaluationMonitor and calling the estimator as follows:
estimator.fit(input_fn=input_fn_train, steps=None, monitors=[])
solves the problem:

$ python udc_train.py
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
WARNING:tensorflow:Setting feature info to {'utterance': TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(128), Dimension(160)]), is_sparse=False), 'utterance_len': TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(128), Dimension(1)]), is_sparse=False), 'context': TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(128), Dimension(160)]), is_sparse=False), 'context_len': TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(128), Dimension(1)]), is_sparse=False)}
WARNING:tensorflow:Setting targets info to TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(128), Dimension(1)]), is_sparse=False)
INFO:tensorflow:No glove/vocab path specificed, starting with random embeddings.
INFO:tensorflow:Create CheckpointSaver
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:02:00.0
Total memory: 12.00GiB
Free memory: 11.87GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:839] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 5858 get requests, put_count=3526 evicted_count=1000 eviction_rate=0.283607 and unsatisfied allocation rate=0.585865
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 100 to 110
INFO:tensorflow:Step 1: loss = 4.48393
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1019 get requests, put_count=2032 evicted_count=1000 eviction_rate=0.492126 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1020 get requests, put_count=2037 evicted_count=1000 eviction_rate=0.490918 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1020 get requests, put_count=2043 evicted_count=1000 eviction_rate=0.489476 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1910 get requests, put_count=3940 evicted_count=2000 eviction_rate=0.507614 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 7516 get requests, put_count=7598 evicted_count=3000 eviction_rate=0.394841 and unsatisfied allocation rate=0.39356
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 449 to 493
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1059 get requests, put_count=2118 evicted_count=1000 eviction_rate=0.472144 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 7976 get requests, put_count=8510 evicted_count=3000 eviction_rate=0.352526 and unsatisfied allocation rate=0.319082
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 871 to 958
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 621 get requests, put_count=1748 evicted_count=1000 eviction_rate=0.572082 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 7592 get requests, put_count=8161 evicted_count=1000 eviction_rate=0.122534 and unsatisfied allocation rate=0.0893045
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 2725 to 2997
INFO:tensorflow:Step 101: loss = 0.95188
INFO:tensorflow:Step 201: loss = 0.731247
INFO:tensorflow:Saving checkpoints for 300 into /home/admin1/exp/projects/chatbot-retrieval/runs/1471802620/model.ckpt.
INFO:tensorflow:Step 301: loss = 0.674725
INFO:tensorflow:Step 401: loss = 0.71134
INFO:tensorflow:Step 501: loss = 0.690085
INFO:tensorflow:Saving checkpoints for 600 into /home/admin1/exp/projects/chatbot-retrieval/runs/1471802620/model.ckpt.
INFO:tensorflow:Step 601: loss = 0.685092
INFO:tensorflow:Step 701: loss = 0.686206
INFO:tensorflow:Step 801: loss = 0.685859
INFO:tensorflow:Saving checkpoints for 900 into /home/admin1/exp/projects/chatbot-retrieval/runs/1471802620/model.ckpt.
INFO:tensorflow:Step 901: loss = 0.708556
INFO:tensorflow:Step 1001: loss = 0.670678
INFO:tensorflow:Step 1101: loss = 0.658413
INFO:tensorflow:Saving checkpoints for 1200 into /home/admin1/exp/projects/chatbot-retrieval/runs/1471802620/model.ckpt.
INFO:tensorflow:Step 1201: loss = 0.652786
INFO:tensorflow:Step 1301: loss = 0.650005

EvaluationMonitor has a comment in the code, so I think this may be a problem with the new version of TensorFlow. The Logging and Monitoring Basics tutorial may be helpful.

dennybritz commented on June 3, 2024

Try passing first_n_steps=-1

amirj commented on June 3, 2024

Passing first_n_steps=-1 as follows:
eval_monitor = EvaluationMonitor(every_n_steps=FLAGS.eval_every, first_n_steps=-1)
also solves the problem:

$ python udc_train.py
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
WARNING:tensorflow:Setting feature info to {'context_len': TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(128), Dimension(1)]), is_sparse=False), 'utterance_len': TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(128), Dimension(1)]), is_sparse=False), 'context': TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(128), Dimension(160)]), is_sparse=False), 'utterance': TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(128), Dimension(160)]), is_sparse=False)}
WARNING:tensorflow:Setting targets info to TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(128), Dimension(1)]), is_sparse=False)
INFO:tensorflow:No glove/vocab path specificed, starting with random embeddings.
INFO:tensorflow:Create CheckpointSaver
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:02:00.0
Total memory: 12.00GiB
Free memory: 11.87GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:839] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 5878 get requests, put_count=3567 evicted_count=1000 eviction_rate=0.280348 and unsatisfied allocation rate=0.580299
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 100 to 110
INFO:tensorflow:Step 1: loss = 4.48393
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1016 get requests, put_count=2029 evicted_count=1000 eviction_rate=0.492854 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1016 get requests, put_count=2033 evicted_count=1000 eviction_rate=0.491884 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1020 get requests, put_count=2043 evicted_count=1000 eviction_rate=0.489476 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1765 get requests, put_count=3795 evicted_count=2000 eviction_rate=0.527009 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 7500 get requests, put_count=7573 evicted_count=3000 eviction_rate=0.396144 and unsatisfied allocation rate=0.3956
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 449 to 493
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1060 get requests, put_count=2119 evicted_count=1000 eviction_rate=0.471921 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1085 get requests, put_count=2172 evicted_count=1000 eviction_rate=0.460405 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 458 get requests, put_count=1585 evicted_count=1000 eviction_rate=0.630915 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 14841 get requests, put_count=14728 evicted_count=1000 eviction_rate=0.0678979 and unsatisfied allocation rate=0.091638
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 2725 to 2997
INFO:tensorflow:Step 101: loss = 1.27205
INFO:tensorflow:Step 201: loss = 0.845954
INFO:tensorflow:Saving checkpoints for 300 into /home/admin1/exp/projects/chatbot-retrieval/runs/1471804077/model.ckpt.
INFO:tensorflow:Step 301: loss = 0.728196
INFO:tensorflow:Step 401: loss = 0.725865
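To check that training actually progresses in a run like the one above, the step and loss values can be pulled out of the log. A minimal sketch, assuming the log lines keep the "INFO:tensorflow:Step N: loss = X" shape seen here and that losses are printed in plain decimal notation; parse_losses and STEP_LOSS are hypothetical helper names, not part of the project.

```python
import re

# Matches lines such as "INFO:tensorflow:Step 101: loss = 1.27205".
# Assumes plain decimal losses; scientific notation would not match.
STEP_LOSS = re.compile(r"^INFO:tensorflow:Step (\d+): loss = ([0-9.]+)\s*$")


def parse_losses(log_text):
    """Return a list of (step, loss) pairs found in the log text."""
    pairs = []
    for line in log_text.splitlines():
        match = STEP_LOSS.match(line)
        if match:
            pairs.append((int(match.group(1)), float(match.group(2))))
    return pairs


# A few lines copied from the run above.
sample = """INFO:tensorflow:Step 101: loss = 1.27205
INFO:tensorflow:Step 201: loss = 0.845954
INFO:tensorflow:Saving checkpoints for 300 into /home/admin1/exp/projects/chatbot-retrieval/runs/1471804077/model.ckpt.
INFO:tensorflow:Step 301: loss = 0.728196"""

if __name__ == "__main__":
    for step, loss in parse_losses(sample):
        print(step, loss)
```

Lines that do not report a loss, such as the checkpoint and PoolAllocator messages, are skipped.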

dennybritz commented on June 3, 2024

Fixed this in the code, thanks for reporting it!
