Thanks for developing the new CUT architecture.
I'm running on a remote server where the visdom setup doesn't work. That would be OK, except the code appears to rely on visdom in some way to save the trained models.
Is there a way to break that link?
Training runs and I can see the losses moving along nicely, but no model is ever saved, so the training is wasted.
`dataset [UnalignedDataset] was created
model [CUTModel] was created
The number of training images = 352
Setting up a new session...
Exception in user code:
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.7/site-packages/urllib3/connection.py", line 141, in _new_conn
(self.host, self.port), self.timeout, **extra_kw)
File "/home/ubuntu/.local/lib/python3.7/site-packages/urllib3/util/connection.py", line 83, in create_connection
raise err
File "/home/ubuntu/.local/lib/python3.7/site-packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 601, in urlopen
chunked=chunked)
File "/home/ubuntu/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 357, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/home/ubuntu/anaconda3/lib/python3.7/http/client.py", line 1252, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/home/ubuntu/anaconda3/lib/python3.7/http/client.py", line 1298, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/home/ubuntu/anaconda3/lib/python3.7/http/client.py", line 1247, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/home/ubuntu/anaconda3/lib/python3.7/http/client.py", line 1026, in _send_output
self.send(msg)
File "/home/ubuntu/anaconda3/lib/python3.7/http/client.py", line 966, in send
self.connect()
File "/home/ubuntu/.local/lib/python3.7/site-packages/urllib3/connection.py", line 166, in connect
conn = self._new_conn()
File "/home/ubuntu/.local/lib/python3.7/site-packages/urllib3/connection.py", line 150, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fc2c1ecfe50>: Failed to establish a new connection: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/home/ubuntu/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 639, in urlopen
_stacktrace=sys.exc_info()[2])
File "/home/ubuntu/.local/lib/python3.7/site-packages/urllib3/util/retry.py", line 388, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=8097): Max retries exceeded with url: /env/main (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc2c1ecfe50>: Failed to establish a new connection: [Errno 111] Connection refused'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/visdom/__init__.py", line 711, in _send
data=json.dumps(msg),
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/visdom/__init__.py", line 677, in _handle_post
r = self.session.post(url, data=data)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 578, in post
return self.request('POST', url, data=data, json=json, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 530, in request
resp = self.send(prep, **send_kwargs)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 643, in send
r = adapter.send(request, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8097): Max retries exceeded with url: /env/main (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc2c1ecfe50>: Failed to establish a new connection: [Errno 111] Connection refused'))
[Errno 111] Connection refused
Could not connect to Visdom server.
Trying to start a server....
Command: /home/ubuntu/anaconda3/bin/python -m visdom.server -p 8097 &>/dev/null &
create web directory ./checkpoints/nhspos_CUT/web...
[W TensorIterator.cpp:924] Warning: Mixed memory format inputs detected while calling the operator. The operator will output channels_last tensor even if some of the inputs are not in channels_last format. (function operator())
---------- Networks initialized -------------
[Network G] Total number of parameters : 11.378 M
[Network F] Total number of parameters : 0.560 M
[Network D] Total number of parameters : 2.765 M
saving the latest model (epoch 1, total_iters 50)
nhspos_CUT
(epoch: 1, iters: 100, time: 0.124, data: 0.116) G_GAN: 0.405 D_real: 0.169 D_fake: 0.213 G: 4.913 NCE: 4.447 NCE_Y: 4.570
saving the latest model (epoch 1, total_iters 100)
`
In the run above I forced it to save frequently rather than waiting. It creates the ./checkpoints directory, but no model is ever written, and even after letting it run for a while no .html files or images are saved either.
Is there a quick way to make visdom optional for those who don't have permission to run it on remote servers?
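To illustrate what I mean by "optional", here is a minimal sketch of the behaviour I have in mind. It is not taken from the CUT code: `server_reachable` and `OptionalVisualizer` are hypothetical names of mine, and I'm assuming the visdom server would be at localhost:8097 as in the log. The idea is to check the port first and only import and use visdom when something is actually listening there.

```python
import socket


def server_reachable(host="localhost", port=8097, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and unresolvable hosts.
        return False


class OptionalVisualizer:
    """Hypothetical wrapper: plots to visdom only when a server is reachable."""

    def __init__(self, host="localhost", port=8097, enabled=True):
        self.vis = None
        if enabled and server_reachable(host, port):
            try:
                import visdom  # imported lazily so the package stays optional
                self.vis = visdom.Visdom(server="http://" + host, port=port)
            except ImportError:
                self.vis = None  # visdom not installed: console output only

    def report(self, epoch, iters, losses):
        """Format the losses for the console; also send them to visdom if connected."""
        line = "(epoch: %d, iters: %d) " % (epoch, iters)
        line += " ".join("%s: %.3f" % (k, v) for k, v in losses.items())
        if self.vis is not None:
            self.vis.text(line, win="losses")
        return line


if __name__ == "__main__":
    viz = OptionalVisualizer(enabled=False)  # never touches visdom
    print(viz.report(1, 100, {"G_GAN": 0.405, "NCE": 4.447}))
```

With something like this, a refused connection would only skip the plots and the training loop would carry on. Whether that alone would also fix the missing checkpoints I can't tell, since I don't know where the saving actually fails.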
Thanks!