aetros / aetros-cli

AETROS CLI + SDK. Command line application to manage/monitor machine learning training in AETROS Trainer

Home Page: http://aetros.com/trainer

License: MIT License

Makefile 0.05% Python 99.95%

theano deep-learning machine-learning tensorflow

aetros-cli's Introduction

AETROS CLI + Python SDK

This package is a Python application that you need if you:

want to start jobs from your local PC on a remote server
want to start jobs from your local PC on your local machine
want to connect your server as a cluster computing server in AETROS Trainer
want to use certain features of the Python SDK (e.g. job actions)

How to use AETROS CLI

Please see our documentation AETROS CLI: Getting started.

How to use AETROS Python SDK

Please see our documentation Python SDK: Getting started.

Installation

$ sudo pip install aetros

# update
$ sudo pip install aetros --upgrade

Requirement

For simple models (where we generate the Keras code for you), you need to install Keras 2 (<=2.1.2), TensorFlow, and Python 2.7 or 3.

For custom models (where you start any command and might integrate our Python SDK), you only need Python 2/3.

Installation development version

If you want to install the current master (which is recommended during the closed beta), execute:

$ git clone https://github.com/aetros/aetros-cli.git
$ cd aetros-cli
$ make dev-install
$ aetros --help
$ # you may have to run aetros-cli commands through python directly
$ python -m aetros --help

To debug issues, you can enable debug mode by setting the DEBUG=1 environment variable in front of the command, for example:

$ DEBUG=1 python -m aetros start owner/model-name/cd877e3f91e137394d644f4b61d97e6ab47fdfde
2017-09-04 17:18:52 osx.fritz.box aetros-job[11153] DEBUG Home config loaded from /Users/marc/.aetros.yml
...

As an alternative to git clone, you can download the zip at https://github.com/aetros/aetros-cli/archive/master.zip.

aetros-cli's Issues

README, Documentation link broken

The Documentation link in the README file is broken.

https://github.com/aetros/aetros-cli/blob/master/docu/python-sdk/getting-started

Starter does not terminate child processes properly

Currently, when aetros run is executed in non-Docker mode, the following group (tree) of three processes is created:

aetros run
- sh -c [command]
  - [command] (for example pip install -r requirements.txt && python script.py)

The issue is that when the job is aborted, child processes (sh and [command]) are not terminated properly (with SIGINT).

To replicate it, consider the following script terminate.py:

import signal
from time import sleep

import aetros.backend
from aetros.utils import prepend_signal_handler


def on_sigint(sig, frame):
    print("on_sigint called: %s" % sig)


if __name__ == '__main__':
    job = aetros.backend.context()
    prepend_signal_handler(signal.SIGINT, on_sigint)
    try:
        print("Starting..")
        for i in range(1000):
            sleep(2)
            print("i: %d" % i)
    except (KeyboardInterrupt, SystemExit) as e:
        print("Process interrupted!", e)

If we run python terminate.py and then terminate the job (i.e. press Ctrl+C, stop the job in the Trainer app, or send kill -SIGINT), we will notice that on_sigint was called and a SystemExit exception was caught. (P.S. The SystemExit exception is caught instead of KeyboardInterrupt because the aetros backend terminates with sys.exit, see

aetros-cli/aetros/backend.py

Line 921 in 4281392

sys.exit(0 if self.in_early_stop else 1)

)

However, when we run the same script with the aetros starter (i.e. aetros run --local) and abort/interrupt the job by any of the above methods, we will notice that the python process has not been terminated properly (in particular, the counter keeps printing after the job status has changed to aborted and stopped). Moreover, this behaviour is even more unpleasant if our python script has child processes of its own: in that case, the child processes are not terminated at all and keep running.
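One common way to make sure that sh, [command], and any grandchildren all receive the signal is to start the command in its own process group and signal the whole group. The following is only a sketch of that general technique, not the starter's actual code: the helper names start_in_own_group and stop_group are hypothetical, it is POSIX only, and it uses Python 3's start_new_session (on Python 2 the equivalent would be preexec_fn=os.setsid).

```python
import os
import signal
import subprocess
import sys


def start_in_own_group(argv):
    """Start argv as the leader of a new session and process group (POSIX only)."""
    return subprocess.Popen(argv, start_new_session=True)


def stop_group(proc, timeout=5.0):
    """Signal the whole group with SIGINT, escalating to SIGTERM and SIGKILL."""
    for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGKILL):
        try:
            # The child is the group leader, so its pid is also the group id.
            os.killpg(proc.pid, sig)
        except (ProcessLookupError, PermissionError):
            break  # nothing left in the group that we can signal
        try:
            proc.wait(timeout=timeout)
            break  # the leader exited after this signal
        except subprocess.TimeoutExpired:
            continue  # still alive, try the next, stronger signal
    return proc.wait()


if __name__ == "__main__":
    child = start_in_own_group([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit code:", stop_group(child))
```

Because the signal goes to the group rather than to a single pid, processes spawned by the script itself are reached as well, as long as they have not moved to a different group.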

Invalid param name --secure-key when running model locally

When a new job is defined through the GUI, the following message appears:

aetros start 1WPX06QP0 --secure-key=YOUR_KEY

but if we try to run this command, we receive the following error:

aetros start: error: unrecognized arguments: --secure-key=my_key_id

Reading the --help message for start, I understood that it should be called with the --api-key= param, like:

aetros start 1WPX06QP0 --api-key=my_key_id

I think the param name inside the GUI or inside the backend should be changed.
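If changing the GUI text is not an option, the CLI side could also accept both spellings. The sketch below shows how argparse lets one option carry two names; build_start_parser is a hypothetical stand-in and is not the parser that aetros start actually uses.

```python
import argparse


def build_start_parser():
    """Hypothetical parser: --secure-key is accepted as an alias of --api-key."""
    parser = argparse.ArgumentParser(prog="aetros start")
    parser.add_argument("name", help="model or job identifier")
    parser.add_argument(
        "--api-key", "--secure-key",
        dest="api_key",
        default=None,
        help="key used to authenticate against the server",
    )
    return parser


if __name__ == "__main__":
    args = build_start_parser().parse_args(["1WPX06QP0", "--secure-key=my_key_id"])
    print(args.api_key)
```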

ValueError: No JSON object could be decoded

ValueError: No JSON object could be decoded when running API_KEY='fc605a322ceea8bab59b2acde592d793' aetros start weiyu322/digit-convolution --insights

Connection times out

During training, the connection times out and aetros.com is not reachable from the network anymore. After 10 minutes or so, aetros.com is again reachable. I tested this on multiple different networks.

Epoch 13: loss=0.039863, acc=0.988683, val_loss=0.054563, val_acc=0.987000
Epoch 14: loss=0.039226, acc=0.988967, val_loss=0.055365, val_acc=0.986900
Epoch 15: loss=0.038914, acc=0.989067, val_loss=0.057667, val_acc=0.986500
Sync weights ...
Crashed ...
ERROR:root:Traceback (most recent call last):
  File "/home/d/AETROS/local/lib/python2.7/site-packages/aetros/starter.py", line 94, in start
    job_model.sync_weights()
  File "/home/d/AETROS/local/lib/python2.7/site-packages/aetros/JobModel.py", line 223, in sync_weights
    self.backend.upload_weights('best.hdf5', self.get_weights_filepath_best(), with_status=True)
  File "/home/d/AETROS/local/lib/python2.7/site-packages/aetros/AetrosBackend.py", line 227, in upload_weights
    response = requests.post(url, data=body, headers=headers)
  File "/home/d/AETROS/local/lib/python2.7/site-packages/requests/api.py", line 110, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/home/d/AETROS/local/lib/python2.7/site-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/d/AETROS/local/lib/python2.7/site-packages/requests/sessions.py", line 475, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/d/AETROS/local/lib/python2.7/site-packages/requests/sessions.py", line 596, in send
    r = adapter.send(request, **kwargs)
  File "/home/d/AETROS/local/lib/python2.7/site-packages/requests/adapters.py", line 487, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='aetros.com', port=80): Max retries exceeded with url: /api/job/weights?id=JmvGARpLb&accuracy=-1.000000.2&token=94dc9107aa68bc7c9307257df82928c6 (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fbfbf2a5990>: Failed to establish a new connection: [Errno 110] Connection timed out',))
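The traceback shows a single requests.post call failing once and taking the whole job down. A client can usually survive short outages by retrying with a growing delay. The helper below is a generic sketch of that idea, not code from aetros: call_with_retries and its parameters are hypothetical, and it relies on the fact that requests' ConnectionError derives from IOError.

```python
import time


def call_with_retries(func, attempts=5, base_delay=1.0,
                      retry_on=(IOError,), sleep=time.sleep):
    """Call func(); on a listed exception wait base_delay * 2**attempt and retry."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts, let the caller see the error
            sleep(base_delay * (2 ** attempt))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_upload():
        # Stand-in for the real upload: fails twice, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise IOError("Connection timed out")
        return "uploaded"

    print(call_with_retries(flaky_upload, base_delay=0.01))
```

With the defaults this covers roughly 15 seconds of downtime, so an outage of ten minutes as described above would additionally need a larger attempts value or a cap on the delay.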

Running local code

@marcj First, thanks for the great effort in making this GUI. Please excuse the questions if they sound very simple.

- I think my issue is the lack of documented examples (examples in general) in the repository. I'm trying to run the simple Keras mnist_cnn code. I have a very powerful local machine, and ssh is already set up. After I define the model I added this:

job = aetros.backend.start_job('<username>/mnist_cnn')
for i in range(0, 100):
     model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
     job.progress(epoch=i, total=100)
job.done()

When the model starts training, it shows in trainer that it is alive, but everything is blank. What should I do to output the progress plots, insights, channels, etc. ?

- For the mnist_cnn example, mnist_data already comes with the package. If I have my own local data files, is there anything special I have to do to access them during training? Currently, before I run the code, I push it to my account at Aetros. Then I run the code locally with python my_code.py. This makes me unsure of how access to the data is handled: is it read locally, or do we have to upload it somewhere on Aetros? Please clarify.
- If I stop the job and then restart it, I get a code to use. When I run the code, I get the following error: Exception: Server python script is not configured. Aborted. What does this mean, and how should I go about fixing it?
- The documentation says that hyper-parameter optimization is easy with Aetros. Is it possible to have a complete example in the repository to show how one can do that?

Thanks again,
-M

Aetros not running on Windows Server

When starting the aetros cli it produces:

aetros-server[3912] ERROR Connection error during connecting to trainer.aetros.com: preexec_fn is not supported on Windows platforms
Exception in thread Thread-3:
Traceback (most recent call last):
  File "c:\anaconda\lib\threading.py", line 801, in __bootstrap_inner
    self.run()
  File "c:\anaconda\lib\threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "c:\anaconda\lib\site-packages\aetros\backend.py", line 347, in thread_write
    if not self.connect():
  File "c:\anaconda\lib\site-packages\aetros\backend.py", line 210, in connect
    stdout=subprocess.PIPE, stdin=subprocess.PIPE, stderr=subprocess.PIPE)
  File "c:\anaconda\lib\subprocess.py", line 343, in __init__
    raise ValueError("preexec_fn is not supported on Windows "
ValueError: preexec_fn is not supported on Windows platforms
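subprocess.Popen rejects preexec_fn on Windows, so the argument has to be passed only on POSIX systems. The sketch below shows one way to build the keyword arguments per platform. It is an illustration under assumptions: the report does not show which callable backend.py passes, so os.setsid is used as a stand-in, and group_popen_kwargs is a hypothetical helper name.

```python
import os
import subprocess


def group_popen_kwargs():
    """Build Popen keyword arguments that are valid on the current platform."""
    kwargs = {
        "stdout": subprocess.PIPE,
        "stdin": subprocess.PIPE,
        "stderr": subprocess.PIPE,
    }
    if os.name == "nt":
        # Windows has no preexec_fn; a new process group is requested by flag.
        kwargs["creationflags"] = getattr(subprocess, "CREATE_NEW_PROCESS_GROUP", 0)
    else:
        # Assumption: the original call wants the child in its own session.
        kwargs["preexec_fn"] = os.setsid
    return kwargs


if __name__ == "__main__":
    print(sorted(group_popen_kwargs()))
```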

TypeError: "integer" is not JSON serializable

Here is the stacktrace for my simple script:

Using Theano backend.
Using gpu device 0: GeForce GTX 970 (CNMeM is enabled with initial size: 80.0% of memory, cuDNN 5105)
X_train shape: (50000, 3, 32, 32)
50000 train samples
10000 test samples
Using real-time data augmentation.
Training status changed to TRAINING
Traceback (most recent call last):
  File "cifar10_cnn.py", line 116, in <module>
    validation_data=(X_test, Y_test))
  File "C:\Program Files\Anaconda3\lib\site-packages\aetros-0.3.5-py3.5.egg\aetros\KerasIntegration.py", line 65, in overwritten_fit_generator
    callback = self.setup(generator, nb_epoch)
  File "C:\Program Files\Anaconda3\lib\site-packages\aetros-0.3.5-py3.5.egg\aetros\KerasIntegration.py", line 94, in setup
    network_type=self.network_type, graph=graph)
  File "C:\Program Files\Anaconda3\lib\site-packages\aetros-0.3.5-py3.5.egg\aetros\AetrosBackend.py", line 275, in ensure_network
    'graph': json.dumps(graph, allow_nan=False),
  File "C:\Program Files\Anaconda3\lib\json\__init__.py", line 237, in dumps
    **kw).encode(obj)
  File "C:\Program Files\Anaconda3\lib\json\encoder.py", line 198, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "C:\Program Files\Anaconda3\lib\json\encoder.py", line 256, in iterencode
    return _iterencode(o, 0)
  File "C:\Program Files\Anaconda3\lib\json\encoder.py", line 179, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: 2304 is not JSON serializable

here is my script (taken from the keras examples folder):

from __future__ import print_function
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.optimizers import SGD
from keras.utils import np_utils
import os
os.environ['API_KEY'] = "----------"

batch_size = 32
nb_classes = 10
nb_epoch = 200
data_augmentation = True

# input image dimensions
img_rows, img_cols = 32, 32
# the CIFAR10 images are RGB
img_channels = 3

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

model = Sequential()

model.add(Convolution2D(32, 3, 3, border_mode='same',
                        input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Convolution2D(64, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

# let's train the model using SGD + momentum (how original).
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])
			  
from aetros.KerasIntegration import KerasIntegration
KerasIntegration('gabrieldemarmiesse/test2', model, insights=True)


X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

if not data_augmentation:
    print('Not using data augmentation.')
    model.fit(X_train, Y_train,
              batch_size=batch_size,
              nb_epoch=nb_epoch,
              validation_data=(X_test, Y_test),
              shuffle=True)
else:
    print('Using real-time data augmentation.')

    # this will do preprocessing and realtime data augmentation
    datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=True,  # randomly flip images
        vertical_flip=False)  # randomly flip images

    # compute quantities required for featurewise normalization
    # (std, mean, and principal components if ZCA whitening is applied)
    datagen.fit(X_train)

    # fit the model on the batches generated by datagen.flow()
    model.fit_generator(datagen.flow(X_train, Y_train,
                        batch_size=batch_size),
                        samples_per_epoch=X_train.shape[0],
                        nb_epoch=nb_epoch,
                        validation_data=(X_test, Y_test))

here is my setup:

windows 10
Anaconda with python 3 (latest version right now)
keras (latest version right now)
Theano (latest version right now)
I'm working on a CPU. Cuda is not installed.

Here is the content of the .theanorc file:

[global]
floatX = float32
device = gpu

[nvcc]
compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin

and here is the content of the .keras/keras.json file:

{
    "floatx": "float32",
    "epsilon": 1e-07,
    "image_dim_ordering": "th",
    "backend": "theano"
}

Thank you.
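A message such as "TypeError: 2304 is not JSON serializable" typically appears when the value looks like a plain integer but is in fact a NumPy scalar (for example numpy.int64), which the json module does not know how to encode. That this is the cause here is a plausible guess and is not confirmed by the report. As a sketch, json.dumps can be given a default hook that converts such scalars through their item() method; to_builtin and dumps_graph are hypothetical names.

```python
import json


def to_builtin(obj):
    """Convert NumPy-style scalars (anything with .item()) to Python builtins."""
    item = getattr(obj, "item", None)
    if callable(item):
        return item()
    raise TypeError(repr(obj) + " is not JSON serializable")


def dumps_graph(graph):
    """Serialize a graph description, tolerating NumPy-style scalar values."""
    return json.dumps(graph, allow_nan=False, default=to_builtin)


if __name__ == "__main__":
    class FakeInt64(object):
        """Stand-in for numpy.int64 so the example runs without NumPy."""
        def __init__(self, value):
            self.value = value

        def item(self):
            return self.value

    print(dumps_graph({"size": FakeInt64(2304)}))
```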

Aetros using exclusively TensorFlow as backend ?

Are you planning to use TensorFlow as a backend exclusively?

Because when I run a training with aetros on my server, using Keras with the TensorFlow backend, Theano AND TensorFlow seem to run in parallel, which leads to this crash:

Found 21 classes, 3963 images (3168 in training [augmented], 795 in validation). Read all images into memory from /home/ubuntu/aetros-cli-data/datasets/arnauddelaunay/dataset/fashion/datasets_downloads
trainer.input_shape = []
trainer.classes = ["Class 15", "Class 1", "Class 14", "Class 18", "Class 10", "Class 19", "Class 4", "Class 17", "Class 5", "Class 6", "Class 11", "Class 0", "Class 3", "Class 9", "Class 13", "Class 20", "Class 16", "Class 12", "Class 2", "Class 8", "Class 7"]
Possible data keys 'arnauddelaunay/dataset/fashion'
Training status changed to CONSTRUCT   
F tensorflow/stream_executor/cuda/cuda_driver.cc:316] current context was not created by the StreamExecutor cuda_driver API: 0x3e13e20; a CUDA runtime call was likely performed without using a StreamExecutor context
Aborted (core dumped)

On this issue, they say it may come from running both Theano and Tensorflow on GPU.

By the way, it works fine when I run with Keras using Theano as backend.

OSError: [Errno 2] No such file or directory

I try to run aetros on windows and get the following error. I did not have any issues when running it from linux. I have keras==1.2.2 installed. I do not know if the error is aetros related or not. Can you provide any feedback ?

Exception in thread Thread-1:
Traceback (most recent call last):
  File "c:\python27\lib\threading.py", line 801, in __bootstrap_inner
    self.run()
  File "c:\python27\lib\threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "c:\python27\lib\site-packages\aetros\commands\ServerCommand.py", line 131, in thread
    if not self.connect():
  File "c:\python27\lib\site-packages\aetros\commands\ServerCommand.py", line 82, in connect
    self.event_listener.fire('registration', server_id)
  File "c:\python27\lib\site-packages\aetros\backend.py", line 87, in fire
    callback(parameter)
  File "c:\python27\lib\site-packages\aetros\commands\ServerCommand.py", line 388, in registration_complete
    self.server.send_message({'type': 'system', 'values': self.collect_system_information()})
  File "c:\python27\lib\site-packages\aetros\commands\ServerCommand.py", line 405, in collect_system_information
    values['disks'][name] = psutil.disk_usage(disk[1]).total
  File "c:\python27\lib\site-packages\psutil\__init__.py", line 2017, in disk_usage
    return _psplatform.disk_usage(path)
  File "c:\python27\lib\site-packages\psutil\_pswindows.py", line 249, in disk_usage
    raise OSError(errno.ENOENT, msg)
OSError: [Errno 2] No such file or directory: 'X:'

Traceback (most recent call last):
  File "c:\python27\lib\runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "c:\python27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\Python27\Scripts\aetros.exe\__main__.py", line 9, in <module>
  File "c:\python27\lib\site-packages\aetros\__init__.py", line 58, in main
    code = command.main(cmd_args)
  File "c:\python27\lib\site-packages\aetros\commands\ServerCommand.py", line 301, in main
    self.server.send_message({'type': 'utilization', 'values': self.collect_system_utilization()})
  File "c:\python27\lib\site-packages\aetros\commands\ServerCommand.py", line 434, in collect_system_utilization
    values['disks'][name] = psutil.disk_usage(disk[1]).used
  File "c:\python27\lib\site-packages\psutil\__init__.py", line 2017, in disk_usage
    return _psplatform.disk_usage(path)
  File "c:\python27\lib\site-packages\psutil\_pswindows.py", line 249, in disk_usage
    raise OSError(errno.ENOENT, msg)
OSError: [Errno 2] No such file or directory: 'X:'
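Both tracebacks end in disk_usage raising for a drive letter ('X:') that is listed but not reachable, for example a disconnected network drive. A tolerant collector can skip such entries instead of crashing. The sketch below illustrates this with the standard library's shutil.disk_usage as the default; collect_disk_totals is a hypothetical helper, and the (name, path) input format is an assumption, not the structure used in ServerCommand.py.

```python
import shutil


def collect_disk_totals(mount_points, usage_fn=shutil.disk_usage):
    """Return {name: total_bytes}, skipping entries whose path cannot be read.

    mount_points is an iterable of (name, path) pairs; usage_fn can be swapped
    for another function that returns an object with a .total attribute.
    """
    totals = {}
    for name, path in mount_points:
        try:
            totals[name] = usage_fn(path).total
        except OSError:
            continue  # e.g. a mapped drive that is currently not connected
    return totals


if __name__ == "__main__":
    print(collect_disk_totals([("root", "."), ("missing", "/no/such/mount")]))
```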

Error in sending job information

I am getting this error when running my code.

Training status changed to TRAINING Error in sending job information: {"error":"InvalidArgumentException","message":"id is required"}

Error when running script from a branch.

In Python 3, the branches variable in utils/git.py needs to be decoded from a bytes object into strings.
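On Python 3, subprocess output arrives as bytes, so comparing it with str branch names silently fails until it is decoded. The report does not show the code in utils/git.py, so the following is only a sketch of the decoding step with a hypothetical parse_branches helper that accepts the raw output of git branch.

```python
def parse_branches(raw):
    """Turn the raw output of `git branch` (bytes or str) into a list of names."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    branches = []
    for line in raw.splitlines():
        # The current branch is prefixed with "* ", the others with spaces.
        name = line.lstrip("*").strip()
        if name:
            branches.append(name)
    return branches


if __name__ == "__main__":
    print(parse_branches(b"* master\n  feature/insights\n"))
```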

AttributeError: module 'signal' has no attribute 'SIGUSR1' under Windows when registering server

Trying to register a server with aetros-0.10.3.tar.gz in an Anaconda environment on Python 3.6:

(phd) C:\Users\floar>aetros server floAr/Pandora
Traceback (most recent call last):
  File "d:\programs\anaconda3\envs\phd\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "d:\programs\anaconda3\envs\phd\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Programs\Anaconda3\envs\phd\Scripts\aetros.exe\__main__.py", line 9, in <module>
  File "d:\programs\anaconda3\envs\phd\lib\site-packages\aetros\__init__.py", line 91, in main
    code = command.main(cmd_args)
  File "d:\programs\anaconda3\envs\phd\lib\site-packages\aetros\commands\ServerCommand.py", line 148, in main
    signal.signal(signal.SIGUSR1, self.on_signusr1)
AttributeError: module 'signal' has no attribute 'SIGUSR1'
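signal.SIGUSR1 does not exist on Windows, so the handler registration has to be conditional there. A minimal sketch of such a guard follows; register_if_available is a hypothetical helper and not part of aetros.

```python
import signal


def register_if_available(signal_name, handler):
    """Install handler for the named signal if this platform defines it.

    Returns True when the handler was installed, False when the signal
    does not exist on the current platform.
    """
    signum = getattr(signal, signal_name, None)
    if signum is None:
        return False
    signal.signal(signum, handler)
    return True


if __name__ == "__main__":
    def on_sigusr1(signum, frame):
        print("received signal %s" % signum)

    print(register_if_available("SIGUSR1", on_sigusr1))
```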

CLI crashes when running with --insights flag

Hi Marc,

Trying to train an old model of mine. Everything works fine, however, when I try to turn on the insights flag I get the following dump:

Running

aetros start --local USERNAME/MODEL --insights

Trace

Traceback (most recent call last):
  File "/usr/local/Cellar/python@2/2.7.14_3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/local/Cellar/python@2/2.7.14_3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/Users/i820491/.virtualenvs/aetros/lib/python2.7/site-packages/aetros/__main__.py", line 5, in <module>
    sys.exit(aetros.main())
  File "/Users/i820491/.virtualenvs/aetros/lib/python2.7/site-packages/aetros/__init__.py", line 121, in main
    code = command.main(cmd_args)
  File "/Users/i820491/.virtualenvs/aetros/lib/python2.7/site-packages/aetros/commands/StartSimpleCommand.py", line 45, in main
    start_keras(self.logger, job_backend)
  File "/Users/i820491/.virtualenvs/aetros/lib/python2.7/site-packages/aetros/starter.py", line 702, in start_keras
    keras_model_utils.job_start(job_backend, trainer, keras_logger)
  File "/Users/i820491/.virtualenvs/aetros/lib/python2.7/site-packages/aetros/keras_model_utils.py", line 93, in job_start
    model_provider.train(trainer, model, data_train, data_validation)
  File "model.py", line 132, in train
    callbacks=trainer.callbacks
  File "/Users/i820491/.virtualenvs/aetros/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/Users/i820491/.virtualenvs/aetros/lib/python2.7/site-packages/keras/engine/training.py", line 2141, in fit_generator
    callbacks.on_train_begin()
  File "/Users/i820491/.virtualenvs/aetros/lib/python2.7/site-packages/keras/callbacks.py", line 130, in on_train_begin
    callback.on_train_begin(logs)
  File "/Users/i820491/.virtualenvs/aetros/lib/python2.7/site-packages/aetros/KerasCallback.py", line 195, in on_train_begin
    images = self.build_insight_images()
  File "/Users/i820491/.virtualenvs/aetros/lib/python2.7/site-packages/aetros/KerasCallback.py", line 375, in build_insight_images
    Y = result[0]
IndexError: list index out of range

Cheers

Starter stuck when job is failed/done too quickly

Consider the case where a job fails very quickly (for instance, due to an invalid parameter value). In that case the job status is changed to FAILED and the starter is stuck forever (with running processes). It is similarly stuck when the job finishes successfully.

To replicate, use the following script and run it with the starter, aetros run --local:

import aetros.backend

if __name__ == '__main__':
    job = aetros.backend.context()
    raise IOError("File not found")

However, if you run it with python quick.py, it finishes properly (i.e. the status is changed to STOPPED), which suggests that the problem is in the aetros starter.
Interestingly, when you run it in debug mode (DEBUG=1 aetros run --local) or add a delay (sleep(1)) after job initialisation, it also finishes properly. This might be related to race conditions / async issues (i.e. the job finishes before it has been initialised properly).

Batch predict

The current predict functionality allows for one (or a few) test samples at a time. A more comprehensive functionality that allows batch prediction, together with the associated KPIs, would be desirable.

TypeError: Too many parameter passed to theano function

I'm getting the following error immediately after the first epoch completes. Any idea what the problem might be?

The shape of my inputs is (3, 256, 256).

Epoch 1/5
2976/3000 [============================>.] - ETA: 0s - loss: 0.1243 - acc: 0.9458Traceback (most recent call last):
  File "learn.py", line 36, in <module>
    batch_size=32)
  File "/home/nweninge/.local/lib/python2.7/site-packages/aetros/KerasIntegration.py", line 55, in overwritten_fit
    class_weight, sample_weight, **kwargs)
  File "/home/nweninge/.local/lib/python2.7/site-packages/keras/engine/training.py", line 1124, in fit
    callback_metrics=callback_metrics)
  File "/home/nweninge/.local/lib/python2.7/site-packages/keras/engine/training.py", line 862, in _fit_loop
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/home/nweninge/.local/lib/python2.7/site-packages/keras/callbacks.py", line 42, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/home/nweninge/.local/lib/python2.7/site-packages/aetros/KerasLogger.py", line 218, in on_epoch_end
    images = self.build_insight_images()
  File "/home/nweninge/.local/lib/python2.7/site-packages/aetros/KerasLogger.py", line 300, in build_insight_images
    Y = fn(input_data_x_sample)[0]
  File "/home/nweninge/.local/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 792, in __call__
    return self.function(*inputs)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 770, in __call__
    raise TypeError("Too many parameter passed to theano function")
TypeError: Too many parameter passed to theano function

Exception for KerasCallback on multiple losses

My model has multiple losses and I get the following error using the KerasCallback. It seems that the callback sends a list which is not supported by the backend.

Traceback (most recent call last):
  File "dsb_train.py", line 69, in <module>
    pipeline=pipeline)
  File "/Users/wouter/.aetros/wouterdewinter/dsb-2018/27546ea456ec6e659a5811e5cadd39eff5e4da26/dsb/model.py", line 325, in train
    verbose=0
  File "/Users/wouter/anaconda2/envs/deep-matting/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/Users/wouter/anaconda2/envs/deep-matting/lib/python3.6/site-packages/keras/engine/training.py", line 2213, in fit_generator
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/Users/wouter/anaconda2/envs/deep-matting/lib/python3.6/site-packages/keras/callbacks.py", line 76, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/Users/wouter/anaconda2/envs/deep-matting/lib/python3.6/site-packages/aetros/KerasCallback.py", line 271, in on_epoch_end
    self.all_losses.send(log['epoch'], losses)
  File "/Users/wouter/anaconda2/envs/deep-matting/lib/python3.6/site-packages/aetros/backend.py", line 292, in send
    raise Exception('Could not send channel value for ' + self.name+' since type ' + type(y).__name__+' is not supported. Use int, float or string values.')
Exception: Could not send channel value for All loss since type list is not supported. Use int, float or string values.
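Since the channel accepts only int, float, or string values, a list of losses has to be turned into scalars before it is sent. One option, sketched below under assumptions, is to expand the list into one named scalar per entry. expand_metric and the "name_index" naming scheme are hypothetical; how the callback should really label the individual losses is not specified in the report.

```python
def expand_metric(name, value):
    """Map a scalar or a list of scalars to a dict of named float values."""
    if isinstance(value, (list, tuple)):
        return {"%s_%d" % (name, i): float(v) for i, v in enumerate(value)}
    return {name: float(value)}


if __name__ == "__main__":
    # A model with two losses reports a list; a simple model reports a scalar.
    print(expand_metric("loss", [0.5, 0.25]))
    print(expand_metric("loss", 0.75))
```

Each resulting entry could then be sent as its own scalar value instead of passing the list through unchanged.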

Install CLI issue / SyntaxError: invalid syntax

$ python setup.py install

Traceback (most recent call last):
  File "setup.py", line 3, in <module>
    import aetros
  File "/Users/kaushik/aetros-cli/aetros/__init__.py", line 6, in <module>
    from aetros.commands.UploadWeightsCommand import UploadWeightsCommand
  File "/Users/kaushik/aetros-cli/aetros/commands/UploadWeightsCommand.py", line 70
    print "Uploading weights to %s of %s ..." %(job_id, job['networkId'])
                                            ^
SyntaxError: invalid syntax
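The syntax error comes from the Python 2 print statement being parsed by Python 3, where print is a function. The snippet below shows the same message written so that it runs on Python 3; on Python 2 the call form works too, and adding from __future__ import print_function at the top of the module makes the behaviour identical. upload_message is a hypothetical helper used only to keep the example testable.

```python
def upload_message(job_id, job):
    """Build the status line that the failing print statement tried to emit."""
    return "Uploading weights to %s of %s ..." % (job_id, job['networkId'])


if __name__ == "__main__":
    # print(...) with parentheses is valid on both Python 2 and Python 3.
    print(upload_message("JOB_ID", {"networkId": "NETWORK_ID"}))
```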

KeyError: 'id'

Hi,

I am running aetros 0.9.9 on a Windows 7 machine. I managed to successfully run the mnist-digits classifier with no errors. (Thanks by the way for fixing #17 ).
I created my own digits dataset which I uploaded to aetros. I duplicated the mnist model and updated it to load my new dataset. I also updated the input dimensions. When I run the job I get the following error:

File "F:\Development\Python35-keras2-tf-gpu\lib\site-packages\aetros\auto_dataset.py", line 544, in get_images
    if image['id'] in images and os.path.isfile(images[image['id']]):
KeyError: 'id'
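The KeyError means that at least one image entry of the dataset has no 'id' field. Whether such entries are valid is a question for the maintainers, but the lookup itself can be made tolerant. The sketch below assumes that entries is a list of dicts and images maps ids to local file paths; that structure and the helper name existing_image_paths are guesses based on the single line shown in the traceback.

```python
import os


def existing_image_paths(entries, images, exists=os.path.isfile):
    """Return local paths for entries that have an id and a file on disk."""
    paths = []
    for image in entries:
        image_id = image.get('id')  # None instead of KeyError when missing
        if image_id is None:
            continue
        path = images.get(image_id)
        if path is not None and exists(path):
            paths.append(path)
    return paths


if __name__ == "__main__":
    entries = [{"id": "a"}, {"src": "no-id.png"}]
    print(existing_image_paths(entries, {"a": __file__}))
```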

about windows system

$ API_KEY='MY_API_KEY' aetros start my/my-network --insights
This command cannot be run on a Windows system; it says "API_KEY" is not an internal command.
Does this mean that Windows does not support aetros?
Thank you very much!
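The VAR=value command prefix is a POSIX shell feature, so cmd.exe tries to run API_KEY=... as a program. On Windows the variable has to be set beforehand (set API_KEY=MY_API_KEY in cmd.exe) or passed by the launching process. The sketch below shows the second, platform-independent option from Python; run_with_api_key is a hypothetical helper, and a small Python child process stands in for the aetros command so that the example runs anywhere.

```python
import os
import subprocess
import sys


def run_with_api_key(argv, api_key):
    """Run argv with API_KEY added to a copy of the current environment."""
    env = dict(os.environ, API_KEY=api_key)
    return subprocess.check_output(argv, env=env)


if __name__ == "__main__":
    # Stand-in child: prints the variable it received. In real use, argv would
    # be something like ["aetros", "start", "my/my-network", "--insights"].
    child = [sys.executable, "-c", "import os; print(os.environ['API_KEY'])"]
    print(run_with_api_key(child, "MY_API_KEY").decode().strip())
```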

bash: aetros: command not found in Win10

I'm using Windows 10 and Python 3.5.2, I did python -m pip install aetros but I can't find the aetros command in normal cmd or Git Bash. I can however import aetros as a python module, but that doesn't help me to run a simple model locally on my machine.

Failed to find libhdf5.h during installation

Hi!
I tried to build your client on my local computer inside a clean virtualenv. On Ubuntu 16.04 it failed with the following error:

In file included from /tmp/easy_install-bwpD2f/h5py-2.7.0rc2/h5py/defs.c:470:0:
/tmp/easy_install-bwpD2f/h5py-2.7.0rc2/h5py/api_compat.h:27:18: fatal error: hdf5.h: No such file or directory
compilation terminated.
error: Setup script exited with error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

After a few tries, I fixed this error by manually installing the h5py package with pip. Something seems strange here: h5py is in the install_requires list of setup.py, but it was not installed correctly.
Maybe this info will be helpful for future development.

keras conv vae example not working

https://github.com/fchollet/keras/blob/master/examples/variational_autoencoder_deconv.py

Theano, CNMeM is disabled, cuDNN 4007. Ubuntu 14.04.

Training status changed to TRAINING 
Epoch 1/5
59900/60000 [============================>.] - ETA: 0s - loss: 0.2128
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-25-6d04987fe1fd> in <module>()
      3         nb_epoch=nb_epoch,
      4         batch_size=batch_size,
----> 5         validation_data=(x_test, x_test))

/opt/conda/lib/python2.7/site-packages/aetros-0.3.4-py2.7.egg/aetros/KerasIntegration.pyc in overwritten_fit(x, y, batch_size, nb_epoch, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, **kwargs)
     52             callbacks.append(callback)
     53             copy['fit'](x, y, batch_size, nb_epoch, verbose, callbacks, validation_split, validation_data, True,
---> 54                         class_weight, sample_weight, **kwargs)
     55 
     56             self.end()

/opt/conda/lib/python2.7/site-packages/keras/engine/training.pyc in fit(self, x, y, batch_size, nb_epoch, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight)
   1106                               verbose=verbose, callbacks=callbacks,
   1107                               val_f=val_f, val_ins=val_ins, shuffle=shuffle,
-> 1108                               callback_metrics=callback_metrics)
   1109 
   1110     def evaluate(self, x, y, batch_size=32, verbose=1, sample_weight=None):

/opt/conda/lib/python2.7/site-packages/keras/engine/training.pyc in _fit_loop(self, f, ins, out_labels, batch_size, nb_epoch, verbose, callbacks, val_f, val_ins, shuffle, callback_metrics)
    844                         for l, o in zip(out_labels, val_outs):
    845                             epoch_logs['val_' + l] = o
--> 846             callbacks.on_epoch_end(epoch, epoch_logs)
    847             if callback_model.stop_training:
    848                 break

/opt/conda/lib/python2.7/site-packages/keras/callbacks.pyc in on_epoch_end(self, epoch, logs)
     38     def on_epoch_end(self, epoch, logs={}):
     39         for callback in self.callbacks:
---> 40             callback.on_epoch_end(epoch, logs)
     41 
     42     def on_batch_begin(self, batch, logs={}):

/opt/conda/lib/python2.7/site-packages/aetros-0.3.4-py2.7.egg/aetros/KerasLogger.pyc in on_epoch_end(self, epoch, logs)
    160             #todo, this is not very generic
    161             log['validation_loss'][layer.name] = log['val_loss'] #outs[0]
--> 162             log['validation_accuracy'][layer.name] = log['val_acc'] #outs[1]
    163 
    164             log['training_loss'][layer.name] = log['loss'] #outs[0]

KeyError: 'val_acc'

TypeError: expected a string or other character buffer object

C:\Users\Vinay\Documents\keras>aetros start vinaysawant/kerasCNN
...
Training '1WPX0q0P0' created and started. Open http://aetros.com/trainer/app?training=1WPX0q0P0 to monitor the training.
start network ...
Using Theano backend.
Setup training
Crashed ...
ERROR:root:Traceback (most recent call last):
  File "c:\anaconda2\lib\site-packages\aetros\starter.py", line 88, in start
    network.job_prepare(job)
  File "c:\anaconda2\lib\site-packages\aetros\network.py", line 130, in job_prepare
    f.write(config['code'])
TypeError: expected a string or other character buffer object

Sending last (13) monitoring information to server ...
out.
Traceback (most recent call last):
  File "c:\anaconda2\lib\runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "c:\anaconda2\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\Anaconda2\Scripts\aetros.exe\__main__.py", line 9, in <module>
  File "c:\anaconda2\lib\site-packages\aetros\__init__.py", line 75, in main
    return command.main(cmd_args)
  File "c:\anaconda2\lib\site-packages\aetros\commands\StartCommand.py", line 45, in main
    start(parsed_args.network_name, dataset_id=parsed_args.dataset, insights=parsed_args.insights, insights_sample_path=parsed_args.insights_sample)
  File "c:\anaconda2\lib\site-packages\aetros\starter.py", line 129, in start
    raise e
TypeError: expected a string or other character buffer object
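
The trace ends in f.write(config['code']), so the value stored under 'code' was apparently not a string when it reached the file write. The log does not show what it actually was. As a rough sketch of how such a case could be caught earlier with a clearer message, the guard below checks the value before writing. The helper name write_model_code and the file name model.py are hypothetical and not part of aetros, and the check is written for Python 3 (on Python 2 it would test against basestring instead of str).

```python
import os
import tempfile


def write_model_code(config, path):
    """Hypothetical guard: refuse to write when the job config has no source code."""
    code = config.get('code')
    if not isinstance(code, str):
        # Fail with a message that names the real problem instead of a bare TypeError.
        raise ValueError(
            "job config has no usable 'code' entry (got %s)" % type(code).__name__)
    with open(path, 'w') as f:
        f.write(code)


# Assumed failure case: the config arrives with 'code' set to None.
target = os.path.join(tempfile.mkdtemp(), 'model.py')
try:
    write_model_code({'code': None}, target)
    message = 'written'
except ValueError as exc:
    message = str(exc)
print(message)
```

With a check like this the run would stop before the file is opened, and the message would point at the job configuration rather than at the write call.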

Crash when running on regression task

Aetros integration makes my regression DNN crash when performing the fit operation.

Here is the stack trace:

  File "downloads.py", line 40, in <module>
    model.fit(X, Y, validation_split=0.2, nb_epoch=1000, batch_size=10, callbacks=callbacks_list, verbose=1)
  File "/usr/local/lib/python2.7/dist-packages/aetros/KerasIntegration.py", line 54, in overwritten_fit
    class_weight, sample_weight, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 597, in fit
    sample_weight=sample_weight)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1107, in fit
    callback_metrics=callback_metrics)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 845, in _fit_loop
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/usr/local/lib/python2.7/dist-packages/keras/callbacks.py", line 40, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/usr/local/lib/python2.7/dist-packages/aetros/KerasLogger.py", line 162, in on_epoch_end
    log['validation_accuracy'][layer.name] = log['val_acc'] #outs[1]
KeyError: 'val_acc'

Here is the script:

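# imports (Keras 1.x API, as used below)
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.callbacks import ModelCheckpoint

# create model
model = Sequential()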
model.add(Dense(34, input_dim=47, init='normal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(23, init='normal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(12, init='normal', activation='relu'))
model.add(Dense(1, init='normal', activation='linear'))
# Compile model
model.compile(loss='mean_absolute_percentage_error', optimizer='adam')

# Save best weights
filepath="weights.best.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=0, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

# Add Aetros integration
from aetros.KerasIntegration import KerasIntegration
KerasIntegration('almathie/downloads', model, insights=True)

model.fit(X, Y, validation_split=0.2, nb_epoch=1000, batch_size=10, callbacks=callbacks_list, verbose=1)

Note that running the script without Aetros works just fine.
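
The trace stops in KerasLogger.on_epoch_end at the read of log['val_acc']. Keras only puts acc and val_acc into the epoch logs when an accuracy metric is passed to compile, and the script compiles with a loss only, so those keys never exist for this regression model. The sketch below uses a plain dict in place of the Keras epoch logs to show the failure mode and one defensive way to handle it. The helper copy_epoch_metrics and the layer name dense_4 are hypothetical; this is not the code aetros uses.

```python
def copy_epoch_metrics(logs, layer_name):
    """Hypothetical helper: copy per-layer metrics, skipping any the model did not report."""
    # Target names and source keys taken from the lines visible in the traceback.
    wanted = {
        'validation_loss': 'val_loss',
        'validation_accuracy': 'val_acc',
        'training_loss': 'loss',
    }
    result = {}
    for target, source in wanted.items():
        if source in logs:  # a loss-only model has no 'val_acc' entry
            result[target] = {layer_name: logs[source]}
    return result


# Epoch logs as a loss-only (regression) model would report them.
regression_logs = {'loss': 12.5, 'val_loss': 14.0}

# Direct indexing reproduces the error from the report.
try:
    regression_logs['val_acc']
    direct_lookup = 'ok'
except KeyError as exc:
    direct_lookup = 'KeyError: %s' % exc

metrics = copy_epoch_metrics(regression_logs, 'dense_4')
print(direct_lookup)
print(metrics)
```

Passing metrics=['accuracy'] to model.compile would also make Keras report val_acc and might serve as a workaround, although accuracy says little about a regression target.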