Comments (5)
Couldn't find trained model at /home/admin1/exp/projects/chatbot-retrieval/runs/1471797260
is a strange error. I'm not sure why it would look for a trained model.
Try removing the EvaluationMonitor, or try giving a first_n_steps=-1
argument to the monitor and see if that solves it.
from chatbot-retrieval.
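One possible reading of why `first_n_steps=-1` helps: if the monitor fires during the first `first_n_steps` steps as well as every `every_n_steps` steps, it would trigger an evaluation at step 1, before any checkpoint has been written. The sketch below is a simplified pure-Python model of that schedule, not the TensorFlow implementation; the function name `fired_steps` and the firing rule are assumptions made for illustration.

```python
def fired_steps(total_steps, every_n_steps, first_n_steps):
    """Return the steps at which a hypothetical every-N monitor would fire.

    Assumed rule (a simplification, not taken from the library source):
    fire if the step is within the first `first_n_steps` steps, or if at
    least `every_n_steps` steps have passed since the last firing.
    """
    fired = []
    last_fired = 0
    for step in range(1, total_steps + 1):
        in_warmup = step <= first_n_steps
        interval_elapsed = step >= last_fired + every_n_steps
        if in_warmup or interval_elapsed:
            fired.append(step)
            last_fired = step
    return fired


# With first_n_steps=1 the monitor fires at step 1, when nothing is saved yet.
print(fired_steps(300, 100, 1))   # [1, 101, 201]
# With first_n_steps=-1 the warm-up condition is never true.
print(fired_steps(300, 100, -1))  # [100, 200, 300]
```

Under this model, a negative `first_n_steps` only removes the early firings; the regular interval is unchanged.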
Removing the EvaluationMonitor and calling the estimator as follows:
estimator.fit(input_fn=input_fn_train, steps=None, monitors=[])
solves the problem:
$ python udc_train.py
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
WARNING:tensorflow:Setting feature info to {'utterance': TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(128), Dimension(160)]), is_sparse=False), 'utterance_len': TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(128), Dimension(1)]), is_sparse=False), 'context': TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(128), Dimension(160)]), is_sparse=False), 'context_len': TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(128), Dimension(1)]), is_sparse=False)}
WARNING:tensorflow:Setting targets info to TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(128), Dimension(1)]), is_sparse=False)
INFO:tensorflow:No glove/vocab path specificed, starting with random embeddings.
INFO:tensorflow:Create CheckpointSaver
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:02:00.0
Total memory: 12.00GiB
Free memory: 11.87GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:839] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 5858 get requests, put_count=3526 evicted_count=1000 eviction_rate=0.283607 and unsatisfied allocation rate=0.585865
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 100 to 110
INFO:tensorflow:Step 1: loss = 4.48393
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1019 get requests, put_count=2032 evicted_count=1000 eviction_rate=0.492126 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1020 get requests, put_count=2037 evicted_count=1000 eviction_rate=0.490918 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1020 get requests, put_count=2043 evicted_count=1000 eviction_rate=0.489476 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1910 get requests, put_count=3940 evicted_count=2000 eviction_rate=0.507614 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 7516 get requests, put_count=7598 evicted_count=3000 eviction_rate=0.394841 and unsatisfied allocation rate=0.39356
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 449 to 493
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1059 get requests, put_count=2118 evicted_count=1000 eviction_rate=0.472144 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 7976 get requests, put_count=8510 evicted_count=3000 eviction_rate=0.352526 and unsatisfied allocation rate=0.319082
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 871 to 958
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 621 get requests, put_count=1748 evicted_count=1000 eviction_rate=0.572082 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 7592 get requests, put_count=8161 evicted_count=1000 eviction_rate=0.122534 and unsatisfied allocation rate=0.0893045
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 2725 to 2997
INFO:tensorflow:Step 101: loss = 0.95188
INFO:tensorflow:Step 201: loss = 0.731247
INFO:tensorflow:Saving checkpoints for 300 into /home/admin1/exp/projects/chatbot-retrieval/runs/1471802620/model.ckpt.
INFO:tensorflow:Step 301: loss = 0.674725
INFO:tensorflow:Step 401: loss = 0.71134
INFO:tensorflow:Step 501: loss = 0.690085
INFO:tensorflow:Saving checkpoints for 600 into /home/admin1/exp/projects/chatbot-retrieval/runs/1471802620/model.ckpt.
INFO:tensorflow:Step 601: loss = 0.685092
INFO:tensorflow:Step 701: loss = 0.686206
INFO:tensorflow:Step 801: loss = 0.685859
INFO:tensorflow:Saving checkpoints for 900 into /home/admin1/exp/projects/chatbot-retrieval/runs/1471802620/model.ckpt.
INFO:tensorflow:Step 901: loss = 0.708556
INFO:tensorflow:Step 1001: loss = 0.670678
INFO:tensorflow:Step 1101: loss = 0.658413
INFO:tensorflow:Saving checkpoints for 1200 into /home/admin1/exp/projects/chatbot-retrieval/runs/1471802620/model.ckpt.
INFO:tensorflow:Step 1201: loss = 0.652786
INFO:tensorflow:Step 1301: loss = 0.650005
EvaluationMonitor
has a comment on this; I think it may be a problem with the new version of TensorFlow. "Logging and Monitoring Basics" may be helpful.
Try passing first_n_steps=-1
Passing first_n_steps=-1
as follows:
eval_monitor = EvaluationMonitor(every_n_steps=FLAGS.eval_every, first_n_steps=-1)
also solves the problem:
$ python udc_train.py
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
WARNING:tensorflow:Setting feature info to {'context_len': TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(128), Dimension(1)]), is_sparse=False), 'utterance_len': TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(128), Dimension(1)]), is_sparse=False), 'context': TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(128), Dimension(160)]), is_sparse=False), 'utterance': TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(128), Dimension(160)]), is_sparse=False)}
WARNING:tensorflow:Setting targets info to TensorSignature(dtype=tf.int64, shape=TensorShape([Dimension(128), Dimension(1)]), is_sparse=False)
INFO:tensorflow:No glove/vocab path specificed, starting with random embeddings.
INFO:tensorflow:Create CheckpointSaver
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:02:00.0
Total memory: 12.00GiB
Free memory: 11.87GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:839] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 5878 get requests, put_count=3567 evicted_count=1000 eviction_rate=0.280348 and unsatisfied allocation rate=0.580299
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 100 to 110
INFO:tensorflow:Step 1: loss = 4.48393
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1016 get requests, put_count=2029 evicted_count=1000 eviction_rate=0.492854 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1016 get requests, put_count=2033 evicted_count=1000 eviction_rate=0.491884 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1020 get requests, put_count=2043 evicted_count=1000 eviction_rate=0.489476 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1765 get requests, put_count=3795 evicted_count=2000 eviction_rate=0.527009 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 7500 get requests, put_count=7573 evicted_count=3000 eviction_rate=0.396144 and unsatisfied allocation rate=0.3956
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 449 to 493
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1060 get requests, put_count=2119 evicted_count=1000 eviction_rate=0.471921 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1085 get requests, put_count=2172 evicted_count=1000 eviction_rate=0.460405 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 458 get requests, put_count=1585 evicted_count=1000 eviction_rate=0.630915 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 14841 get requests, put_count=14728 evicted_count=1000 eviction_rate=0.0678979 and unsatisfied allocation rate=0.091638
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 2725 to 2997
INFO:tensorflow:Step 101: loss = 1.27205
INFO:tensorflow:Step 201: loss = 0.845954
INFO:tensorflow:Saving checkpoints for 300 into /home/admin1/exp/projects/chatbot-retrieval/runs/1471804077/model.ckpt.
INFO:tensorflow:Step 301: loss = 0.728196
INFO:tensorflow:Step 401: loss = 0.725865
Fixed this in the code, thanks for reporting it!