ch3njust1n / smpl

Simultaneous Multi-Party Learning Framework

Python 97.98% Shell 2.02%
distributed-deep-learning gradient-descent hypergraph deep-learning deep-neural-networks artificial-neural-networks artificial-intelligence distributed-systems hgsgd hypergraph-sgd

smpl's Introduction


senior software engineer

  • In my free time, I like reading research papers and comic books and working on coding projects.

smpl's Issues

KeyError: u'sess961626070984585'

Process Process-2:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 204, in __receive
    resp = self.__route({"addr": addr, "length": expected, "content": json.loads(data)})
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 255, in __route
    return self.__share_grad(*args)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 442, in __share_grad
    self.gradients[sess_id]['peers'].append(sender)
  File "<string>", line 2, in __getitem__
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/managers.py", line 774, in _callmethod
    raise convert_to_error(kind, result)
KeyError: u'sess961626070984585'
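
A possible angle on this one, offered as an assumption rather than a diagnosis: the `managers.py` frame suggests `self.gradients` is a `multiprocessing.Manager` dict proxy. Apart from the missing key, a nested value read from such a proxy is a copy, so an in-place `append` is not propagated unless the value is assigned back. The sketch below is written for Python 3, uses a hypothetical `add_peer` helper, and lets a plain dict stand in for the proxy. Whether a missing session should be created here at all is also an assumption.

```python
def add_peer(gradients, sess_id, sender):
    """Record `sender` under gradients[sess_id]['peers'] via read-modify-write."""
    # .get() avoids the KeyError when the session has not been registered yet
    entry = dict(gradients.get(sess_id) or {})
    # Build a new list instead of appending to the copy in place
    entry["peers"] = list(entry.get("peers", [])) + [sender]
    # Reassigning the key is what pushes the change through a Manager proxy
    gradients[sess_id] = entry
    return entry


gradients = {}  # stand-in for the shared dict
add_peer(gradients, "sess1", "192.168.0.11:9888")
add_peer(gradients, "sess1", "192.168.0.12:9888")
```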

Implement DataServer

Implement DataServer (DS) on smpl-super. DS will control experiments: it decides whether the data should be shuffled, tracks which dataset partitions are being distributed, and initiates all 32 nodes.
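
A minimal sketch of the partition-tracking part, assuming the DS hands each node a disjoint set of sample indices. The function name `partition_indices`, the round-robin split, and the seeded shuffle are illustrative choices, not the planned design.

```python
import random


def partition_indices(num_samples, num_nodes=32, shuffle=True, seed=0):
    """Map each node id to a disjoint list of sample indices."""
    indices = list(range(num_samples))
    if shuffle:
        # A seeded generator keeps the shuffle reproducible across experiments
        random.Random(seed).shuffle(indices)
    # Round-robin slicing keeps partition sizes within one sample of each other
    return {node: indices[node::num_nodes] for node in range(num_nodes)}


partitions = partition_indices(100)
```

Keeping the returned mapping on the DS gives it a record of which partition went to which node.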

KeyError: 'share_count'

Process Process-3:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 305, in __train_hyperedge
    self.__train(sess_id)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 390, in __train
    self.__allreduce(sess_id, multistep, share[sess_id]['train_size'])
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 427, in __allreduce
    share_count = ujson.loads(self.cache.get(sess_id))['share_count']
KeyError: 'share_count'

TypeError: int() argument must be a string or a number, not 'NoneType'

Process Process-2:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 201, in __receive
    resp = self.__route({"addr": addr, "length": expected, "content": ujson.loads(data)})
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 243, in __route
    return self.__establish_session(*args)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 688, in __establish_session
    if int(self.cache.get('edge')) == self.max:
TypeError: int() argument must be a string or a number, not 'NoneType'
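
For context, redis-py's `get` returns `None` when a key is absent, which is what `int()` is rejecting here. A hedged sketch of a guard follows. `get_int` is a hypothetical helper, a plain dict stands in for the cache, and treating a missing `edge` counter as 0 is an assumption about the intended behaviour.

```python
def get_int(cache, key, default=0):
    """Read an integer from a cache whose get() returns None for missing keys."""
    raw = cache.get(key)
    if raw is None:
        # Key not set yet (or already flushed): fall back instead of raising
        return default
    return int(raw)


cache = {"edge": "3"}  # stand-in for the Redis client
edge = get_int(cache, "edge")
missing = get_int(cache, "not-set")
```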

AttributeError: 'NoneType' object has no attribute 'debug'

Process Process-2:1:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 331, in __train
    log.debug('ps.train() sess_id:{}'.format(sess_id))
AttributeError: 'NoneType' object has no attribute 'debug'

AttributeError: 'ParameterChannel' object has no attribute 'remove'

Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 177, in receive
    self.pc.remove('{}:{}'.format(addr[0], addr[1]))
AttributeError: 'ParameterChannel' object has no attribute 'remove'

AttributeError: 'NoneType' object has no attribute 'dim'

Process Process-4:9:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/train.py", line 68, in train
    loss = self.network.loss(self.network(data), target)
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/torch/nn/functional.py", line 1048, in nll_loss
    dim = input.dim()
AttributeError: 'NoneType' object has no attribute 'dim'

Clean logs

Remove all logging statements except the ones that identify the function name at the top of each function.
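
One way this could be done without a hand-written log line in every function is a small decorator. This is a sketch only: the decorator name `log_entry`, the logger name `smpl`, and the example function are assumptions.

```python
import functools
import logging

log = logging.getLogger("smpl")


def log_entry(func):
    """Emit a single DEBUG line naming the function each time it is called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.debug("%s()", func.__name__)
        return func(*args, **kwargs)
    return wrapper


@log_entry
def share_grad(sess_id):
    # Placeholder body; only the entry log matters for this example
    return sess_id
```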

RuntimeError: size mismatch, m1: [16 x 784], m2: [2 x 1]

Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/train.py", line 69, in train
    loss = self.network.loss(self.network(data), target)
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/smpl/distributed/model/network.py", line 228, in forward
    x = F.sigmoid(self.fc1(x))
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/torch/nn/modules/linear.py", line 55, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/torch/nn/functional.py", line 835, in linear
    return torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [16 x 784], m2: [2 x 1] at /opt/conda/conda-bld/pytorch-cpu_1518280788456/work/torch/lib/TH/generic/THTensorMath.c:1434

RuntimeError: sub() received an invalid combination of arguments

Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 277, in train_hyperedge
    self.train(sess_id)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 344, in train
    multistep = nn.multistep_grad(sess['parameters'])
  File "/home/ubuntu/smpl/distributed/model/network.py", line 119, in multistep_grad
    return [b-FloatTensor(a) for (b, a) in zip(self.parameters(), network)]
RuntimeError: sub() received an invalid combination of arguments - got (torch.FloatTensor), but expected one of:
 * (float other, float alpha)
 * (Variable other, float alpha)

ValueError: Expecting property name: line 1 column 2 (char 1)

Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 277, in train_hyperedge
    self.train(sess_id)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 318, in train
    sess = json.loads(getSess)#self.cache.get(sess_id))
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/json/decoder.py", line 380, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting property name: line 1 column 2 (char 1)
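
A common cause of this exact message is a string that holds the `str()` form of a dict (single-quoted keys) rather than JSON. Whether that is what the cache held here is an assumption. The sketch below (Python 3, with a hypothetical `try_load` helper and made-up session values) contrasts the two forms.

```python
import json

session = {"parameters": [0.1, 0.2], "accuracy": 0.0}

as_repr = str(session)         # single-quoted keys: not valid JSON
as_json = json.dumps(session)  # double-quoted keys: valid JSON


def try_load(text):
    """Return the decoded object, or None when the text is not valid JSON."""
    try:
        return json.loads(text)
    except ValueError:
        return None
```

If this is the cause, storing the session with `json.dumps` before `cache.set` would make the later `json.loads` succeed.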

KeyError: '192.168.0.11:9888'

[network.py:154 -       multistep_grad()] 2018-04-22 01:39:45,547 - root - DEBUG - MSG 2
[parameter_server.py:428 -          __allreduce()] 2018-04-22 01:39:45,548 - root - DEBUG - ps.__allreduce sess_id:sess144468
[parameter_channel.py:120 -                 send()] 2018-04-22 01:39:45,549 - root - ERROR - '192.168.0.11:9888'
Traceback (most recent call last):
  File "/home/ubuntu/smpl/distributed/parameter_channel.py", line 94, in send
    sock = self.connections[addr]
KeyError: '192.168.0.11:9888'
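
A hedged sketch of a lookup that opens the connection on first use instead of raising. `get_connection` and the `connect` callback are hypothetical names, and reconnecting lazily is only one possible policy: the right fix may instead be to make sure the peer is registered before `send()` is called.

```python
def get_connection(connections, addr, connect):
    """Return the cached connection for addr, creating it on first use."""
    sock = connections.get(addr)
    if sock is None:
        # No entry yet: open a connection and remember it for later sends
        sock = connect(addr)
        connections[addr] = sock
    return sock


opened = []


def fake_connect(addr):
    # Stand-in for real socket setup so the example runs without a network
    opened.append(addr)
    return "socket-to-" + addr


connections = {}
first = get_connection(connections, "192.168.0.11:9888", fake_connect)
second = get_connection(connections, "192.168.0.11:9888", fake_connect)
```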

ParameterChannel Connections Dict KeyError

ERROR:root:'192.168.0.12:9888'
Traceback (most recent call last):
  File "/home/ubuntu/smpl/distributed/parameter_channel.py", line 107, in send
    sock = self.connections[addr]
KeyError: '192.168.0.12:9888'

Error 111 connecting to localhost:6379. Connection refused.

Traceback (most recent call last):
  File "./smpl.py", line 130, in <module>
    main()
  File "./smpl.py", line 69, in main
    ps = ParameterServer(args)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 94, in __init__
    self.flush()
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 775, in flush
    self.cache.flushall()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/redis/client.py", line 743, in flushall
    return self.execute_command('FLUSHALL')
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/redis/client.py", line 673, in execute_command
    connection.send_command(*args)
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/redis/connection.py", line 610, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/redis/connection.py", line 585, in send_packed_command
    self.connect()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/redis/connection.py", line 489, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 111 connecting to localhost:6379. Connection refused.
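
Error 111 (connection refused) usually means nothing is listening on that port, so the Redis server was probably not running when `flush()` was called. A small pre-flight check, sketched with only the standard library, is shown below. `can_connect` is a hypothetical helper, and the host and port defaults mirror the error message.

```python
import socket


def can_connect(host="localhost", port=6379, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts
        return False
    sock.close()
    return True
```

Calling this before constructing `ParameterServer` would allow a clearer message such as "start redis-server first" in place of the traceback.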

Pass log to Train

  1. Pass log to train
  2. Test code to perturb parameters in Train() by adding 1
    • Difference between previous and current should be all 1s
    • After share_grad(), summed grads should be all 3s
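
The checks in step 2 can be sketched with plain nested lists standing in for the parameter tensors. The helper names are hypothetical, and the "all 3s" expectation assumes three peers that each contribute a difference of 1.

```python
def perturb(params, delta=1.0):
    """Return a copy of params with delta added to every weight."""
    return [[w + delta for w in layer] for layer in params]


def difference(current, previous):
    """Element-wise current - previous, layer by layer."""
    return [[c - p for c, p in zip(cl, pl)] for cl, pl in zip(current, previous)]


def sum_grads(grads_per_peer):
    """Element-wise sum of the gradient lists contributed by each peer."""
    return [[sum(vals) for vals in zip(*layers)] for layers in zip(*grads_per_peer)]


previous = [[0.0, 2.0], [-1.0]]
current = perturb(previous)
diff = difference(current, previous)
summed = sum_grads([diff, diff, diff])  # three peers assumed
```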

KeyError: 'log'

Process Process-2:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 201, in __receive
    resp = self.__route({"addr": addr, "length": expected, "content": ujson.loads(data)})
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 251, in __route
    return self.__share_grad(*args)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 458, in __share_grad
    log_name = sess["log"]
KeyError: 'log'

RuntimeError: manual_seed expected a long, but got bool

Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 303, in __train_hyperedge
    self.__train(sess_id, log)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 402, in __train
    Train(conf).validate()
  File "/home/ubuntu/smpl/distributed/train.py", line 32, in __init__
    super(Train, self).__init__(*config)
  File "/home/ubuntu/smpl/distributed/trainer.py", line 125, in __init__
    super(DevTrainer, self).__init__(data, network, sess_id, share, batch_size, cuda, drop_last, seed, shuffle)
  File "/home/ubuntu/smpl/distributed/trainer.py", line 103, in __init__
    super(DistributedTrainer, self).__init__(batch_size, cuda, data, drop_last, network, shuffle, seed)
  File "/home/ubuntu/smpl/distributed/trainer.py", line 39, in __init__
    manual_seed(self.seed)
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/torch/random.py", line 34, in manual_seed
    return default_generator.manual_seed(seed)
RuntimeError: manual_seed expected a long, but got bool
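
In the frames above, `seed` and `shuffle` are passed positionally in different orders along the `__init__` chain, so a swapped pair is one possible source of the bool. That is a guess, since the signatures are not shown. A defensive check, sketched with a hypothetical `validate_seed` helper, would surface the problem earlier.

```python
def validate_seed(seed):
    """Return seed unchanged if it is a real integer, otherwise raise TypeError."""
    # bool is a subclass of int in Python, so it must be rejected explicitly
    if isinstance(seed, bool) or not isinstance(seed, int):
        raise TypeError("seed must be an int, got {}".format(type(seed).__name__))
    return seed


seed = validate_seed(42)
```

Passing `shuffle=` and `seed=` as keyword arguments in the `super().__init__` calls would also rule out ordering mistakes.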

AttributeError: 'NoneType' object has no attribute 'data'

Process Process-1:1:9:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/model/network.py", line 215, in add_coordinates
    params[index].grad.data += grads
AttributeError: 'NoneType' object has no attribute 'data'

TypeError: string indices must be integers, not str

Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 243, in train_hyperedge
    connected, sess_id = self.init_session()
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 449, in init_session
    self.cache.set(sess_id, json.dumps({"parameters": model['parameters'], "accuracy": model['accuracy'],
TypeError: string indices must be integers, not str
time (second):  11.4726130962
Process Process-6:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 243, in train_hyperedge
    connected, sess_id = self.init_session()
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 449, in init_session
    self.cache.set(sess_id, json.dumps({"parameters": model['parameters'], "accuracy": model['accuracy'],
TypeError: string indices must be integers, not str

RuntimeError: manual_seed expected a long, but got bool

Process Process-3:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 266, in train_hyperedge
    self.train(sess_id)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 316, in train
    p = Process(target=Train(conf).train)
  File "/home/ubuntu/smpl/distributed/train.py", line 36, in __init__
    super(Train, self).__init__(*config)
  File "/home/ubuntu/smpl/distributed/trainer.py", line 129, in __init__
    shuffle, seed)
  File "/home/ubuntu/smpl/distributed/trainer.py", line 115, in __init__
    network, shuffle, seed)
  File "/home/ubuntu/smpl/distributed/trainer.py", line 50, in __init__
    manual_seed(self.seed)
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/torch/random.py", line 34, in manual_seed
    return default_generator.manual_seed(seed)
RuntimeError: manual_seed expected a long, but got bool

NameError: global name 'logger' is not defined

Process Process-3:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 272, in train_hyperedge
    self.train(sess_id)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 322, in train
    p = Process(target=Train(conf).train)
  File "/home/ubuntu/smpl/distributed/train.py", line 36, in __init__
    super(Train, self).__init__(*config)
  File "/home/ubuntu/smpl/distributed/trainer.py", line 129, in __init__
    shuffle, seed)
  File "/home/ubuntu/smpl/distributed/trainer.py", line 114, in __init__
    super(DistributedTrainer, self).__init__(batch_size, cuda, data, drop_last, logger,
NameError: global name 'logger' is not defined

undefined symbol: nvrtcGetProgramLogSize

Traceback (most recent call last):
  File "./smpl.py", line 13, in <module>
    from distributed.parameter_server import ParameterServer
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 16, in <module>
    from train import Train
  File "/home/ubuntu/smpl/distributed/train.py", line 11, in <module>
    import os, utils
  File "/home/ubuntu/smpl/distributed/utils.py", line 7, in <module>
    import os, pickle, datetime, ujson, codecs, glob, torch, math, logging
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/torch/__init__.py", line 56, in <module>
    from torch._C import *
ImportError: /home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/torch/_C.so: undefined symbol: nvrtcGetProgramLogSize

KeyError: 'parameters'

Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 356, in __train
    nn.update_parameters(sess['parameters'])
KeyError: 'parameters'

ps.AllReduce deadlock

Gradients need to be sent to peers at the very beginning of ps.allreduce(). Currently, every peer waits without notifying the others, which causes a deadlock.
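
The send-before-wait ordering can be sketched with threads and queues standing in for the peers and their sockets. This is an illustration in Python 3 with hypothetical names (`allreduce`, `run`, `inboxes`), not the framework's implementation. Each peer first posts its gradient to every other peer and only then blocks on its own inbox.

```python
import queue
import threading


def allreduce(rank, grad, inboxes, results):
    """Sum gradients across peers, sending before waiting."""
    # Step 1: notify every other peer first, so nobody waits on a silent peer
    for peer, inbox in enumerate(inboxes):
        if peer != rank:
            inbox.put(grad)
    # Step 2: only now block until the other peers' gradients arrive
    total = grad
    for _ in range(len(inboxes) - 1):
        total += inboxes[rank].get(timeout=5)
    results[rank] = total


def run(grads):
    """Run one allreduce round with one thread per peer."""
    inboxes = [queue.Queue() for _ in grads]
    results = [None] * len(grads)
    threads = [
        threading.Thread(target=allreduce, args=(rank, grad, inboxes, results))
        for rank, grad in enumerate(grads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


totals = run([1.0, 2.0, 3.0])
```

If the two steps were swapped, every thread would block in `get()` with nothing ever posted, which is the deadlock described above.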

Clean up self.logger logs

The logs are too messy to read.

  1. Set appropriate logging levels. Be conservative. Only log what is necessary.
  2. Remove unnecessary log lines
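
A minimal sketch of a conservative default, assuming a single named logger called `smpl` (the name, format string, and `configure_logging` helper are illustrative). Routine detail stays at DEBUG and is hidden unless the level is lowered explicitly.

```python
import logging


def configure_logging(level=logging.WARNING):
    """Configure the 'smpl' logger so only `level` and above is emitted."""
    logger = logging.getLogger("smpl")
    logger.setLevel(level)
    if not logger.handlers:
        # Attach one handler only, so repeated calls do not duplicate output
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(funcName)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = configure_logging()
```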

AttributeError: cannot assign module before Module.__init__() call

Traceback (most recent call last):
  File "./smpl.py", line 125, in <module>
    main()
  File "./smpl.py", line 64, in main
    ps = ParameterServer(args)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 75, in __init__
    "parameters": [x.data.tolist() for x in net.DevNet().parameters()]}))
  File "/home/ubuntu/smpl/distributed/model/network.py", line 221, in __init__
    self.fc1 = nn.Linear(784, 10)
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/torch/nn/modules/module.py", line 424, in __setattr__
    "cannot assign module before Module.__init__() call")
AttributeError: cannot assign module before Module.__init__() call

AttributeError: 'ParameterServer' object has no attribute 'pc'

Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 183, in __receive
    self.pc.remove('{}:{}'.format(addr[0], addr[1]))
AttributeError: 'ParameterServer' object has no attribute 'pc'

Optimize SMPL

Enumerate all debugging tasks and make a schedule for finishing them.

  1. Place timers everywhere
  2. Log timers
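
Both steps could be covered by one context manager, sketched here under assumptions: the name `timer`, the logger name `smpl.timing`, and the optional `sink` dict for collecting results are all hypothetical.

```python
import logging
import time
from contextlib import contextmanager

log = logging.getLogger("smpl.timing")


@contextmanager
def timer(label, sink=None):
    """Time the enclosed block, log the duration, and optionally record it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        log.info("%s took %.6f s", label, elapsed)
        if sink is not None:
            sink[label] = elapsed


timings = {}
with timer("sum", timings):
    total = sum(range(1000))
```

Wrapping calls such as training or allreduce in `with timer(...)` keeps the timing code out of the function bodies.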

TypeError: object of type 'bool' has no len()

Process Process-2:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 200, in __receive
    conn.sendall(self.__format_msg(resp))
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 218, in __format_msg
    return ''.join([str(len(msg)), '::', ujson.dumps(msg)])
TypeError: object of type 'bool' has no len()
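
The failing line takes `len()` of the message object before serializing it, which breaks when the response is a bare `True` or `False`. A hedged sketch that serializes first and measures the serialized text is shown below. It uses the standard `json` module in place of `ujson`, and it assumes the prefix is meant to be the length of the serialized payload.

```python
import json


def format_msg(msg):
    """Frame a message as '<length>::<json>' using the serialized length."""
    body = json.dumps(msg)
    # len() is taken on the serialized text, so any JSON-serializable value works
    return ''.join([str(len(body)), '::', body])


framed_bool = format_msg(True)
framed_dict = format_msg({"ok": 1})
```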

TypeError: 'NoneType' object is not iterable

Process Process-1:1:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 389, in __train
    sess["accuracy"] = Train(conf).validate()
  File "/home/ubuntu/smpl/distributed/trainer.py", line 67, in validate
    for data, target in self.val_loader:
TypeError: 'NoneType' object is not iterable

TypeError: object of type 'bool' has no len()

Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 206, in __receive
    conn.sendall(self.__format_msg(resp))
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 224, in __format_msg
    return ''.join([str(len(msg)), '::', ujson.dumps(msg)])
TypeError: object of type 'bool' has no len()

ps.__synchronize_parameters() pc.send() error

Error: pc.send() "{'api': 'synchronize_parameters', 'args': ['sess246', {u'id': 2, u'alias': u'smpl-2', u'host': u'192.168.0.11', u'port': 9888, u'accuracy': 0.0}, [{u'id': 2, u'alias': u'smpl-2', u'host': u'192.168.0.11', u'port': 9888, u'accuracy': 0.0}]]}:192.168.0.11"

TypeError: unhashable type: 'list'

Process Process-2:1:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 402, in __train
    nn.add_batched_coordinates(sess['gradients'], sess['samples'])
  File "/home/ubuntu/smpl/distributed/model/network.py", line 241, in add_batched_coordinates
    if layer in params_coords:
TypeError: unhashable type: 'list'

matplotlib backend import error

This call to matplotlib.use() has no effect because the backend has already
been chosen; matplotlib.use() must be called *before* pylab, matplotlib.pyplot,
or matplotlib.backends is imported for the first time.

The backend was *originally* set to 'Qt5Agg' by the following code:
  File "smpl.py", line 17, in <module>
    from distributed.parameter_server import ParameterServer
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 24, in <module>
    import os, torch, json, redis, socket, logging, utils, test, train, session, data
  File "data/data.py", line 13, in <module>
    import matplotlib.pyplot as plt
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/matplotlib/pyplot.py", line 71, in <module>
    from matplotlib.backends import pylab_setup
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/matplotlib/backends/__init__.py", line 16, in <module>
    line for line in traceback.format_stack()

ImportError: libGL.so.1: cannot open shared object file: No such file or directory

Traceback (most recent call last):
  File "./smpl.py", line 13, in <module>
    from distributed.parameter_server import ParameterServer
  File "/home/ubuntu/smpl/distributed/parameter_server.py", line 21, in <module>
    import os, torch, ujson, redis, socket, logging, utils, test, train, data
  File "data/data.py", line 6, in <module>
    import matplotlib.pyplot as plt
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/matplotlib/pyplot.py", line 115, in <module>
    _backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup()
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/matplotlib/backends/__init__.py", line 32, in pylab_setup
    globals(),locals(),[backend_name],0)
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/matplotlib/backends/backend_qt5agg.py", line 16, in <module>
    from .backend_qt5 import QtCore
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/matplotlib/backends/backend_qt5.py", line 26, in <module>
    import matplotlib.backends.qt_editor.figureoptions as figureoptions
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/matplotlib/backends/qt_editor/figureoptions.py", line 20, in <module>
    import matplotlib.backends.qt_editor.formlayout as formlayout
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/matplotlib/backends/qt_editor/formlayout.py", line 56, in <module>
    from matplotlib.backends.qt_compat import QtGui, QtWidgets, QtCore
  File "/home/ubuntu/anaconda2/envs/smpl/lib/python2.7/site-packages/matplotlib/backends/qt_compat.py", line 128, in <module>
    from PyQt5 import QtCore, QtGui, QtWidgets
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
