ch3njust1n / smpl

Simultaneous Multi-Party Learning Framework

Python 97.98% Shell 2.02%

distributed-deep-learning gradient-descent hypergraph deep-learning deep-neural-networks artificial-neural-networks artificial-intelligence distributed-systems hgsgd hypergraph-sgd

smpl's Introduction

senior software engineer

In my free time, I like reading research papers and comic books and working on coding projects.

smpl's Issues

KeyError: u'sess961626070984585'

Process Process-2:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 204, in __receive
    resp = self.__route({"addr": addr, "length": expected, "content": json.loads(data)})
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 255, in __route
    return self.__share_grad(*args)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 442, in __share_grad
    self.gradients[sess_id]['peers'].append(sender)
  File "<string>", line 2, in __getitem__
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/managers.py", line 774, in _callmethod
    raise convert_to_error(kind, result)
KeyError: u'sess961626070984585'

Implement inference/holdout test

DataServer should distribute the test set, and peers should have an option for an inference mode.

Redis Publish/Subscribe

Python Redis Publish/Subscribe mechanism to make updates async
Redis pub/sub docs
Redis docs

Implement DataServer

Implement DataServer (DS) on smpl-super. DS will control experiments, whether the data should be shuffled, track dataset partitions being distributed, and initiate all 32 nodes.
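A minimal sketch of the bookkeeping described above, assuming the dataset can be addressed by integer sample index. The class name `DataServerSketch`, the round-robin split, and the `pending()` helper are illustrative assumptions, not part of the repository.

```python
import random


class DataServerSketch(object):
    """Hypothetical stand-in for the DataServer described in this issue."""

    def __init__(self, num_samples, num_nodes=32, shuffle=False, seed=0):
        indices = list(range(num_samples))
        if shuffle:
            # A seeded generator keeps the shuffle reproducible across experiments.
            random.Random(seed).shuffle(indices)
        # Round-robin split: node i receives indices i, i + n, i + 2n, ...
        self.partitions = {node: indices[node::num_nodes]
                           for node in range(num_nodes)}
        self.distributed = set()

    def next_partition(self, node):
        """Hand a node its partition and record that it was sent."""
        self.distributed.add(node)
        return self.partitions[node]

    def pending(self):
        """Nodes that have not yet received their partition."""
        return sorted(set(self.partitions) - self.distributed)
```

Tracking which partitions have gone out makes it possible to tell, before starting an experiment, whether all 32 nodes have data.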

pc.send() type conversion bug

pc.send() invalid literal for int() with base 10: ''
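`int('')` raises exactly this message, so one plausible cause (not confirmed by the issue) is an empty or truncated length header. The sketch below assumes the `<length>::<json>` framing that appears in the `__format_msg` tracebacks elsewhere on this page; `parse_frame` is a hypothetical helper, not a function from the repository.

```python
import json

DELIMITER = '::'  # assumed separator, taken from the __format_msg traceback


def parse_frame(raw):
    """Decode '<length>::<json>'; return None when the header is unusable."""
    header, sep, body = raw.partition(DELIMITER)
    if not sep or not header.isdigit():
        # An empty read gives header == '', which is what makes
        # int(header) fail with "invalid literal for int()".
        return None
    expected = int(header)
    if len(body) < expected:
        return None  # incomplete frame; the caller should keep reading
    return json.loads(body[:expected])
```

Returning `None` instead of raising lets the caller decide whether to retry the read or drop the connection.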

Update dataproc.py to use ujson instead of json

Process Process-3:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 305, in __train_hyperedge
    self.__train(sess_id)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 390, in __train
    self.__allreduce(sess_id, multistep, share[sess_id]['train_size'])
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 427, in __allreduce
    share_count = ujson.loads(self.cache.get(sess_id))['share_count']
KeyError: 'share_count'

TypeError: int() argument must be a string or a number, not 'NoneType'

Process Process-2:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 201, in __receive
    resp = self.__route({"addr": addr, "length": expected, "content": ujson.loads(data)})
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 243, in __route
    return self.__establish_session(*args)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 688, in __establish_session
    if int(self.cache.get('edge')) == self.max:
TypeError: int() argument must be a string or a number, not 'NoneType'
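The traceback shows `self.cache.get('edge')` returning `None`, which is what a Redis client returns for a key that has not been set. A minimal guard might look like the following; `cached_int` and its `default` are assumptions for illustration.

```python
def cached_int(cache, key, default=0):
    """Read an integer from a Redis-like cache, treating a missing key as `default`."""
    value = cache.get(key)
    return default if value is None else int(value)
```

Any object with a `get` method works here, so a plain `dict` can stand in for the cache in tests. Whether `0` is the right default for `'edge'` depends on how the counter is initialized, which the issue does not say.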

AttributeError: 'NoneType' object has no attribute 'debug'

Process Process-2:1:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 331, in __train
    log.debug('ps.train() sess_id:{}'.format(sess_id))
AttributeError: 'NoneType' object has no attribute 'debug'

AttributeError: 'ParameterChannel' object has no attribute 'remove'

Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 177, in receive
    self.pc.remove('{}:{}'.format(addr[0], addr[1]))
AttributeError: 'ParameterChannel' object has no attribute 'remove'

AttributeError: 'NoneType' object has no attribute 'dim'

Process Process-4:9:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/train.py", line 68, in train
    loss = self.network.loss(self.network(data), target)
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/torch/nn/functional.py", line 1048, in nll_loss
    dim = input.dim()
AttributeError: 'NoneType' object has no attribute 'dim'

Clean logs

Remove all logging statements except ones identifying function name at top of function

RuntimeError: size mismatch, m1: [16 x 784], m2: [2 x 1]

Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/train.py", line 69, in train
    loss = self.network.loss(self.network(data), target)
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/smpl/distributed/model/network.py", line 228, in forward
    x = F.sigmoid(self.fc1(x))
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/torch/nn/modules/linear.py", line 55, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/torch/nn/functional.py", line 835, in linear
    return torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [16 x 784], m2: [2 x 1] at /opt/conda/conda-bld/pytorch-cpu_1518280788456/work/torch/lib/TH/generic/THTensorMath.c:1434

RuntimeError: sub() received an invalid combination of arguments

Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 277, in train_hyperedge
    self.train(sess_id)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 344, in train
    multistep = nn.multistep_grad(sess['parameters'])
  File "/home/ubuntu/smpl/distributed/model/network.py", line 119, in multistep_grad
    return [b-FloatTensor(a) for (b, a) in zip(self.parameters(), network)]
RuntimeError: sub() received an invalid combination of arguments - got (torch.FloatTensor), but expected one of:
 * (float other, float alpha)
 * (Variable other, float alpha)

Refactor parameter server

Make a super Server class and have ParameterServer extend it
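One possible shape for this refactor, sketched under assumptions: the base class owns message dispatch, and `ParameterServer` only registers training-related handlers. The `register`/`route` methods and the simplified `share_grad` signature are hypothetical; the `{'api': ..., 'args': [...]}` message layout follows the `pc.send()` payload quoted elsewhere on this page.

```python
class Server(object):
    """Hypothetical base class: routes messages, knows nothing about training."""

    def __init__(self, host, port):
        self.host = host
        self.port = port
        self.routes = {}

    def register(self, api, handler):
        self.routes[api] = handler

    def route(self, message):
        """Dispatch {'api': name, 'args': [...]} to the registered handler."""
        handler = self.routes.get(message['api'])
        if handler is None:
            return {'error': 'unknown api: {}'.format(message['api'])}
        return handler(*message.get('args', []))


class ParameterServer(Server):
    def __init__(self, host, port):
        super(ParameterServer, self).__init__(host, port)
        self.gradients = {}
        self.register('share_grad', self.share_grad)

    def share_grad(self, sess_id, sender, grads):
        # setdefault avoids a KeyError when a session id is seen for the first time.
        entry = self.gradients.setdefault(sess_id, {'peers': [], 'grads': []})
        entry['peers'].append(sender)
        entry['grads'].append(grads)
        return len(entry['peers'])
```

Other server types could reuse `Server.route` by registering their own handlers.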

ValueError: Expecting property name: line 1 column 2 (char 1)

Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 277, in train_hyperedge
    self.train(sess_id)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 318, in train
    sess = json.loads(getSess)#self.cache.get(sess_id))
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/json/decoder.py", line 380, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting property name: line 1 column 2 (char 1)

KeyError: '192.168.0.11:9888'

[network.py:154 -       multistep_grad()] 2018-04-22 01:39:45,547 - root - DEBUG - MSG 2
[parameter_server.py:428 -          __allreduce()] 2018-04-22 01:39:45,548 - root - DEBUG - ps.__allreduce sess_id:sess144468
[parameter_channel.py:120 -                 send()] 2018-04-22 01:39:45,549 - root - ERROR - '192.168.0.11:9888'
Traceback (most recent call last):
  File "/home/ubuntu/smpl/distributed/parameter_channel.py", line 94, in send
    sock = self.connections[addr]
KeyError: '192.168.0.11:9888'

ParameterChannel Connections Dict KeyError

ERROR:root:'192.168.0.12:9888'
Traceback (most recent call last):
  File "/home/ubuntu/smpl/distributed/parameter_channel.py", line 107, in send
    sock = self.connections[addr]
KeyError: '192.168.0.12:9888'

Error 111 connecting to localhost:6379. Connection refused.

Traceback (most recent call last):
  File "./smpl.py", line 130, in <module>
    main()
  File "./smpl.py", line 69, in main
    ps = ParameterServer(args)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 94, in __init__
    self.flush()
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 775, in flush
    self.cache.flushall()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/redis/client.py", line 743, in flushall
    return self.execute_command('FLUSHALL')
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/redis/client.py", line 673, in execute_command
    connection.send_command(*args)
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/redis/connection.py", line 610, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/redis/connection.py", line 585, in send_packed_command
    self.connect()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/redis/connection.py", line 489, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 111 connecting to localhost:6379. Connection refused.

Implement Redis distributed locking

Distributed locking

Pass log to Train

Pass log to train
Test code to perturb parameters in Train() by adding 1
- Difference between previous and current should be all 1s
- After share_grad(), summed grads should be all 3s
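The check described above can be sketched with plain nested lists standing in for parameter tensors. The helper names are hypothetical, and the use of three peers is an assumption inferred from the expected sum of 3.

```python
def perturb(parameters):
    """Stand-in for a training step: add 1 to every parameter."""
    return [[w + 1 for w in layer] for layer in parameters]


def difference(current, previous):
    """Elementwise current - previous, layer by layer."""
    return [[c - p for c, p in zip(cl, pl)]
            for cl, pl in zip(current, previous)]


def sum_grads(peer_grads):
    """Elementwise sum of per-peer updates (what share_grad() is assumed to accumulate)."""
    total = peer_grads[0]
    for grads in peer_grads[1:]:
        total = [[a + b for a, b in zip(tl, gl)]
                 for tl, gl in zip(total, grads)]
    return total


previous = [[0.5, -2.0], [3.0]]
step = difference(perturb(previous), previous)   # expected: all 1s
summed = sum_grads([step, step, step])           # three peers (assumed): all 3s
```

If `summed` is anything other than all 3s, either a peer's update was dropped or it was counted more than once.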

KeyError: 'log'

Process Process-2:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 201, in __receive
    resp = self.__route({"addr": addr, "length": expected, "content": ujson.loads(data)})
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 251, in __route
    return self.__share_grad(*args)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 458, in __share_grad
    log_name = sess["log"]
KeyError: 'log'

RuntimeError: manual_seed expected a long, but got bool

Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 303, in __train_hyperedge
    self.__train(sess_id, log)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 402, in __train
    Train(conf).validate()
  File "/home/ubuntu/smpl/distributed/train.py", line 32, in __init__
    super(Train, self).__init__(*config)
  File "/home/ubuntu/smpl/distributed/trainer.py", line 125, in __init__
    super(DevTrainer, self).__init__(data, network, sess_id, share, batch_size, cuda, drop_last, seed, shuffle)
  File "/home/ubuntu/smpl/distributed/trainer.py", line 103, in __init__
    super(DistributedTrainer, self).__init__(batch_size, cuda, data, drop_last, network, shuffle, seed)
  File "/home/ubuntu/smpl/distributed/trainer.py", line 39, in __init__
    manual_seed(self.seed)
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/torch/random.py", line 34, in manual_seed
    return default_generator.manual_seed(seed)
RuntimeError: manual_seed expected a long, but got bool
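The traceback passes `seed` and `shuffle` positionally through three constructors, and the order differs between the calls, so one possible cause (an inference from the traceback, not confirmed) is that the two values trade places. The sketch below uses hypothetical, simplified signatures to show keyword passing plus an explicit type check.

```python
class BaseTrainer(object):
    """Hypothetical trainer base; names and defaults are illustrative only."""

    def __init__(self, batch_size, shuffle=False, seed=0):
        # bool is a subclass of int in Python, so it must be rejected explicitly.
        if isinstance(seed, bool) or not isinstance(seed, int):
            raise TypeError('seed must be an int, got {!r}'.format(seed))
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.seed = seed


class DistributedTrainer(BaseTrainer):
    def __init__(self, sess_id, batch_size, seed=0, shuffle=False):
        # Keyword arguments keep seed and shuffle from swapping, even though
        # this signature lists them in a different order from the base class.
        super(DistributedTrainer, self).__init__(
            batch_size, shuffle=shuffle, seed=seed)
        self.sess_id = sess_id
```

Failing early in the constructor gives a clearer message than the one raised later inside `manual_seed`.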

AttributeError: 'NoneType' object has no attribute 'data'

Process Process-1:1:9:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/model/network.py", line 215, in add_coordinates
    params[index].grad.data += grads
AttributeError: 'NoneType' object has no attribute 'data'

AttributeError: 'module' object has no attribute 'SMPLData'

AttributeError: 'module' object has no attribute 'SMPLData'

TypeError: string indices must be integers, not str

Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 243, in train_hyperedge
    connected, sess_id = self.init_session()
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 449, in init_session
    self.cache.set(sess_id, json.dumps({"parameters": model['parameters'], "accuracy": model['accuracy'],
TypeError: string indices must be integers, not str
time (second):  11.4726130962
Process Process-6:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 243, in train_hyperedge
    connected, sess_id = self.init_session()
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 449, in init_session
    self.cache.set(sess_id, json.dumps({"parameters": model['parameters'], "accuracy": model['accuracy'],
TypeError: string indices must be integers, not str
Process Process-6:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 243, in train_hyperedge
    connected, sess_id = self.init_session()
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 449, in init_session
    self.cache.set(sess_id, json.dumps({"parameters": model['parameters'], "accuracy": model['accuracy'],
TypeError: string indices must be integers, not str

RuntimeError: manual_seed expected a long, but got bool

Process Process-3:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 266, in train_hyperedge
    self.train(sess_id)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 316, in train
    p = Process(target=Train(conf).train)
  File "/home/ubuntu/smpl/distributed/train.py", line 36, in __init__
    super(Train, self).__init__(*config)
  File "/home/ubuntu/smpl/distributed/trainer.py", line 129, in __init__
    shuffle, seed)
  File "/home/ubuntu/smpl/distributed/trainer.py", line 115, in __init__
    network, shuffle, seed)
  File "/home/ubuntu/smpl/distributed/trainer.py", line 50, in __init__
    manual_seed(self.seed)
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/torch/random.py", line 34, in manual_seed
    return default_generator.manual_seed(seed)
RuntimeError: manual_seed expected a long, but got bool

Redis Pipelining

Look into Redis Pipelines to improve performance
Redis pipelining
Redis docs

Implement AlexNet

AlexNet without pretrained weights

Implement SGD mode

Baseline SGD mode on local

NameError: global name 'logger' is not defined

Process Process-3:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 272, in train_hyperedge
    self.train(sess_id)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 322, in train
    p = Process(target=Train(conf).train)
  File "/home/ubuntu/smpl/distributed/train.py", line 36, in __init__
    super(Train, self).__init__(*config)
  File "/home/ubuntu/smpl/distributed/trainer.py", line 129, in __init__
    shuffle, seed)
  File "/home/ubuntu/smpl/distributed/trainer.py", line 114, in __init__
    super(DistributedTrainer, self).__init__(batch_size, cuda, data, drop_last, logger,
NameError: global name 'logger' is not defined

undefined symbol: nvrtcGetProgramLogSize

Traceback (most recent call last):
  File "./smpl.py", line 13, in <module>
    from distributed.parameter_server import ParameterServer
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 16, in <module>
    from train import Train
  File "/home/ubuntu/smpl/distributed/train.py", line 11, in <module>
    import os, utils
  File "/home/ubuntu/smpl/distributed/utils.py", line 7, in <module>
    import os, pickle, datetime, ujson, codecs, glob, torch, math, logging
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/torch/__init__.py", line 56, in <module>
    from torch._C import *
ImportError: /home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/torch/_C.so: undefined symbol: nvrtcGetProgramLogSize

KeyError: 'parameters'

Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 356, in __train
    nn.update_parameters(sess['parameters'])
KeyError: 'parameters'

Wrong network in ps.train()

Should be using DevNeuron instead. Need to change in ps.train().

Redis parsing

Look into HiRedis Parser for more efficient parsing of redis replies
Redis docs

Implement Adagrad manually

Refer to page 5 of Large Scale Distributed Deep Networks
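As a rough sketch of what a manual implementation could look like, here is the usual Adagrad rule on flat lists of floats: each parameter's step is scaled by the inverse square root of its accumulated squared gradients. The function name, the list-based representation, and the `eps` stability term are assumptions for illustration and are not taken from the repository.

```python
import math


def adagrad_step(weights, grads, accum, lr=0.01, eps=1e-8):
    """Apply one in-place Adagrad update; all three lists have the same length."""
    for i, g in enumerate(grads):
        accum[i] += g * g  # running sum of squared gradients for parameter i
        weights[i] -= lr * g / (math.sqrt(accum[i]) + eps)
    return weights, accum
```

For example, starting from `weights=[1.0]`, `accum=[0.0]` and a gradient of `2.0` with `lr=0.1`, the accumulator becomes `4.0` and the weight moves by about `0.1 * 2 / 2 = 0.1`. Because the accumulator only grows, the effective learning rate of each parameter shrinks over time.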

ps.AllReduce deadlock

Need to send gradients to peers at the very beginning of ps.allreduce(). Currently, all peers wait without notifying other peers, which causes deadlock.
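The send-before-wait ordering can be illustrated with threads and in-memory queues standing in for peers and sockets. This is a toy model under those assumptions, not the repository's networking code; `allreduce` and `run` here are hypothetical names.

```python
import threading

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2.7


def allreduce(rank, grad, inboxes, results):
    """Publish this peer's gradient to every other peer, then block on receives."""
    for peer, inbox in enumerate(inboxes):
        if peer != rank:
            inbox.put(grad)  # send first, so no peer waits on a silent neighbour
    total = grad
    for _ in range(len(inboxes) - 1):
        # The timeout makes a lost peer fail loudly instead of hanging forever.
        total += inboxes[rank].get(timeout=5)
    results[rank] = total


def run(grads):
    inboxes = [queue.Queue() for _ in grads]
    results = [None] * len(grads)
    threads = [threading.Thread(target=allreduce, args=(r, g, inboxes, results))
               for r, g in enumerate(grads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

If the `put` loop were moved after the `get` loop, every thread would block on an empty inbox, which is the deadlock this issue describes.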

Clean up self.logger logs

Too messy; can't read anything.

Set appropriate logging levels. Be conservative. Only log what is necessary.
Remove unnecessary log lines

AttributeError: cannot assign module before Module.__init__() call

Traceback (most recent call last):
  File "./smpl.py", line 125, in <module>
    main()
  File "./smpl.py", line 64, in main
    ps = ParameterServer(args)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 75, in __init__
    "parameters": [x.data.tolist() for x in net.DevNet().parameters()]}))
  File "/home/ubuntu/smpl/distributed/model/network.py", line 221, in __init__
    self.fc1 = nn.Linear(784, 10)
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/torch/nn/modules/module.py", line 424, in __setattr__
    "cannot assign module before Module.__init__() call")
AttributeError: cannot assign module before Module.__init__() call

AttributeError: 'ParameterServer' object has no attribute 'pc'

Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 183, in __receive
    self.pc.remove('{}:{}'.format(addr[0], addr[1]))
AttributeError: 'ParameterServer' object has no attribute 'pc'

Optimize Redis memory usage

Memory optimization documentation

Optimize SMPL

Enumerate all debugging tasks and make a schedule for finishing them

Place timers everywhere
Log timers

TypeError: object of type 'bool' has no len()

Process Process-2:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 200, in __receive
    conn.sendall(self.__format_msg(resp))
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 218, in __format_msg
    return ''.join([str(len(msg)), '::', ujson.dumps(msg)])
TypeError: object of type 'bool' has no len()
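The failing line calls `len(msg)` on the response object itself, which breaks when the response is a bare `True` or `False`. A possible fix, assuming the prefix is meant to be the length of the serialized payload, is to serialize first and measure the resulting string. This sketch uses the standard `json` module in place of `ujson`.

```python
import json


def format_msg(msg):
    """Frame any JSON-serializable value as '<length>::<payload>'."""
    payload = json.dumps(msg)  # serialize first ...
    # ... then measure the string, not the original object.
    return '::'.join([str(len(payload)), payload])
```

With the default `ensure_ascii=True`, `json.dumps` emits ASCII only, so the character count matches the byte count on the wire.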

TypeError: 'NoneType' object is not iterable

Process Process-1:1:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 389, in __train
    sess["accuracy"] = Train(conf).validate()
  File "/home/ubuntu/smpl/distributed/trainer.py", line 67, in validate
    for data, target in self.val_loader:
TypeError: 'NoneType' object is not iterable

TypeError: object of type 'bool' has no len()

Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 206, in __receive
    conn.sendall(self.__format_msg(resp))
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 224, in __format_msg
    return ''.join([str(len(msg)), '::', ujson.dumps(msg)])
TypeError: object of type 'bool' has no len()

ps.__synchronize_parameters() pc.send() error

Error: pc.send() "{'api': 'synchronize_parameters', 'args': ['sess246', {u'id': 2, u'alias': u'smpl-2', u'host': u'192.168.0.11', u'port': 9888, u'accuracy': 0.0}, [{u'id': 2, u'alias': u'smpl-2', u'host': u'192.168.0.11', u'port': 9888, u'accuracy': 0.0}]]}:192.168.0.11"

TypeError: unhashable type: 'list'

Process Process-2:1:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 402, in __train
    nn.add_batched_coordinates(sess['gradients'], sess['samples'])
  File "/home/ubuntu/smpl/distributed/model/network.py", line 241, in add_batched_coordinates
    if layer in params_coords:
TypeError: unhashable type: 'list'
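`layer in params_coords` hashes `layer`, and Python lists cannot be hashed, so a coordinate that arrives as a list (for example after JSON decoding) triggers this error. One common workaround is to convert it to a tuple before using it as a key. The function below is a hypothetical illustration and does not reproduce `add_batched_coordinates`.

```python
def group_by_coordinate(updates):
    """Sum gradient values that share a coordinate; coordinates arrive as lists."""
    totals = {}
    for coord, value in updates:
        key = tuple(coord)  # lists are unhashable; tuples can be dict keys
        totals[key] = totals.get(key, 0.0) + value
    return totals
```

For instance, `group_by_coordinate([([0, 1], 0.5), ([0, 1], 0.25)])` merges both entries under the key `(0, 1)`.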

matplotlib backend import error

This call to matplotlib.use() has no effect because the backend has already
been chosen; matplotlib.use() must be called *before* pylab, matplotlib.pyplot,
or matplotlib.backends is imported for the first time.

The backend was *originally* set to 'Qt5Agg' by the following code:
  File "smpl.py", line 17, in <module>
    from distributed.parameter_server import ParameterServer
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 24, in <module>
    import os, torch, json, redis, socket, logging, utils, test, train, session, data
  File "data/data.py", line 13, in <module>
    import matplotlib.pyplot as plt
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/matplotlib/pyplot.py", line 71, in <module>
    from matplotlib.backends import pylab_setup
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/matplotlib/backends/__init__.py", line 16, in <module>
    line for line in traceback.format_stack()

ImportError: libGL.so.1: cannot open shared object file: No such file or directory

Traceback (most recent call last):
  File "./smpl.py", line 13, in <module>
    from distributed.parameter_server import ParameterServer
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 21, in <module>
    import os, torch, ujson, redis, socket, logging, utils, test, train, data
  File "data/data.py", line 6, in <module>
    import matplotlib.pyplot as plt
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/matplotlib/pyplot.py", line 115, in <module>
    _backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/matplotlib/backends/__init__.py", line 32, in pylab_setup
    globals(),locals(),[backend_name],0)
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/matplotlib/backends/backend_qt5agg.py", line 16, in <module>
    from .backend_qt5 import QtCore
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/matplotlib/backends/backend_qt5.py", line 26, in <module>
    import matplotlib.backends.qt_editor.figureoptions as figureoptions
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/matplotlib/backends/qt_editor/figureoptions.py", line 20, in <module>
    import matplotlib.backends.qt_editor.formlayout as formlayout
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/matplotlib/backends/qt_editor/formlayout.py", line 56, in <module>
    from matplotlib.backends.qt_compat import QtGui, QtWidgets, QtCore
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/matplotlib/backends/qt_compat.py", line 128, in <module>
    from PyQt5 import QtCore, QtGui, QtWidgets
ImportError: libGL.so.1: cannot open shared object file: No such file or directory