This repository contains examples of custom models for PowerAI Vision.
It provides sample models, explanations of how to use them, and test files for testing models locally.
There are currently two examples: image classification and object detection. Both are implemented in Keras and were tested in PowerAI Vision 1.1.4.0.
I'm Maxime Deloche and I work as a deep learning engineer in the Cognitive Systems Lab, at the IBM Systems Center of Montpellier, France. Our team provides pre-sales technical support to IBM teams, partners and customers in Europe who are interested in Power Systems based infrastructure for AI, including PowerAI Vision, WMLCE and WMLA.
For any question or assistance, feel free to contact me or open an issue, and we'll get back to you.
Source: IBM Knowledge Center - Working with custom models
To import a custom model in PowerAI Vision, you must build a zip file containing at least 2 files: train.py and deploy.py.
- train.py: must define a MyTrain class implementing TrainCallback
  - template:

        from train_interface import TrainCallback

        class MyTrain(TrainCallback):
            def __init__(self):
                pass

            def onPreprocessing(self, labels, images, workspace_path, params):
                pass

            def onTraining(self, monitor_handler):
                pass

            def onCompleted(self, model_path):
                pass

            def onFailed(self, train_status, e, tb_message):
                pass

  - procedure:
    - onPreprocessing is called first
    - then onTraining
    - if an error is raised: onFailed is called
    - else: onCompleted is called
- deploy.py: must define a MyDeploy class implementing DeployCallback
  - template:

        from deploy_interface import DeployCallback

        class MyDeploy(DeployCallback):
            def __init__(self):
                pass

            def onModelLoading(self, model_path, labels, workspace_path):
                pass

            def onTest(self):
                pass

            def onInference(self, image_url, params):
                pass

            def onFailed(self, deploy_status, e, tb_message):
                pass

  - procedure:
    - onModelLoading is called first
    - then onInference
    - if an error is raised: onFailed is called
Note: these procedures are implemented in the tests/test_My{Train,Deploy}.py files, which emulate PowerAI Vision behavior so that models can be tested locally.
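As an illustration of the call order described above, here is a minimal sketch of a driver that invokes the callbacks in sequence. It is not the code from tests/ nor PowerAI Vision's own implementation: the `TrainCallback` and `DeployCallback` stand-ins, the `run_training` and `run_inference` helpers, the `"failed"` status string, the `Recording*` classes and all argument values are assumptions made for the example.

```python
import traceback


class TrainCallback:
    """Stand-in for train_interface.TrainCallback (assumed to be an empty base class)."""


class DeployCallback:
    """Stand-in for deploy_interface.DeployCallback (assumed to be an empty base class)."""


def run_training(trainer, labels, images, workspace_path, params,
                 monitor_handler=None, model_path="model"):
    """Call onPreprocessing, then onTraining; onFailed on error, else onCompleted."""
    try:
        trainer.onPreprocessing(labels, images, workspace_path, params)
        trainer.onTraining(monitor_handler)
    except Exception as e:
        # "failed" is a placeholder status, not a value defined by PowerAI Vision.
        trainer.onFailed("failed", e, traceback.format_exc())
        return False
    trainer.onCompleted(model_path)
    return True


def run_inference(deployer, model_path, labels, workspace_path, image_url, params):
    """Call onModelLoading, then onInference; onFailed if either raises."""
    try:
        deployer.onModelLoading(model_path, labels, workspace_path)
        return deployer.onInference(image_url, params)
    except Exception as e:
        deployer.onFailed("failed", e, traceback.format_exc())
        return None


class RecordingTrain(TrainCallback):
    """Hypothetical trainer that only records which callbacks were called."""

    def __init__(self, fail=False):
        self.calls = []
        self.fail = fail

    def onPreprocessing(self, labels, images, workspace_path, params):
        self.calls.append("onPreprocessing")

    def onTraining(self, monitor_handler):
        self.calls.append("onTraining")
        if self.fail:
            raise RuntimeError("simulated training error")

    def onCompleted(self, model_path):
        self.calls.append("onCompleted")

    def onFailed(self, train_status, e, tb_message):
        self.calls.append("onFailed")


class RecordingDeploy(DeployCallback):
    """Hypothetical deployer that only records which callbacks were called."""

    def __init__(self):
        self.calls = []

    def onModelLoading(self, model_path, labels, workspace_path):
        self.calls.append("onModelLoading")

    def onInference(self, image_url, params):
        self.calls.append("onInference")
        return {"image": image_url}

    def onFailed(self, deploy_status, e, tb_message):
        self.calls.append("onFailed")
```

Running `run_training(RecordingTrain(fail=True), {}, [], "workspace", {})` records onPreprocessing, onTraining and onFailed, in that order, and never reaches onCompleted.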
To import a model in PAIV, put train.py and deploy.py in a zip file using:

    zip -j model.zip src/train.py src/deploy.py

and upload model.zip in the Custom Models section of PAIV.
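If the zip command is not available, the same archive can be produced with Python's standard zipfile module. This is only a sketch: the helper names `build_model_zip` and `check_model_zip` are made up for the example, and the check only verifies that both files sit at the root of the archive (which is what `zip -j` does by dropping directory names).

```python
import os
import zipfile


def build_model_zip(zip_path, files):
    """Store each file at the archive root, like `zip -j` (directory names dropped)."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            archive.write(path, arcname=os.path.basename(path))
    return zip_path


def check_model_zip(zip_path, required=("train.py", "deploy.py")):
    """Return the required file names that are missing from the archive root."""
    with zipfile.ZipFile(zip_path) as archive:
        names = set(archive.namelist())
    return [name for name in required if name not in names]
```

For example, `build_model_zip("model.zip", ["src/train.py", "src/deploy.py"])` followed by `check_model_zip("model.zip")` should return an empty list before uploading.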
Both folders (image_classification/ and object_detection/) have the same structure:
- src/train.py: class implementing model training
- src/deploy.py: class implementing model inference
- src/train_interface.py and src/deploy_interface.py: empty classes that emulate the PAIV source code, so that the tests can run outside PAIV
- tests/: tests that emulate PAIV behavior, i.e. call the callbacks in the appropriate order and handle errors
All datasets are available in two versions:
- original version (zip file containing images and labels)
- paiv version (zip file that can be imported directly into the datasets of PowerAI Vision)
If you want to perform local tests (outside PAIV), download the original dataset and unzip it in datasets/monkeys-dataset/ or datasets/bears-dataset (or set a different path in the test files).
Dataset source: 10-monkey-species on Kaggle
Dataset source: extract from Open Images Dataset
These scripts simulate the behavior of PowerAI Vision, so that you can test your code locally before uploading it to PowerAI Vision.
- tests/test_MyTrain.py: tests the training of the model using the src.train.MyTrain class: calls onPreprocessing, onTraining, and then onCompleted or onFailed depending on the training result

      ./tests/test_MyTrain.py
- tests/test_MyDeploy.py: tests the inference using the src.deploy.MyDeploy class: calls onModelLoading, then onInference on a dataset image, and onFailed if an error is raised during deployment or inference

      ./tests/test_MyDeploy.py
If the logs of the PAIV instance are not recorded (in Kibana for example), they can be found in the Docker logs. Because the training container is
removed as soon as the training ends, in case of failure you do not have enough time to find the container ID and read its log. A 'dirty'
workaround is to add a sleep(60) in the onFailed callback, which keeps the container alive for a minute and gives you time to
run the docker logs command below (it can be removed in production or if you record the logs).
Note: PAIV somehow sets the Python log level to logging.INFO, no matter what you set in your script, so you need to write your logs at INFO level or higher.
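The workaround and the logging note can be combined in the failure callback. The sketch below is one possible way to write it, not the repository's implementation: the `TrainCallback` stand-in and the `FAILURE_GRACE_SECONDS` attribute are assumptions introduced for the example, and the other callbacks are omitted.

```python
import logging
import time

# Messages are logged at INFO level, since lower levels would not show up in PAIV.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class TrainCallback:
    """Stand-in for train_interface.TrainCallback (assumed to be an empty base class)."""


class MyTrain(TrainCallback):
    # Hypothetical setting: seconds to keep the container alive after a failure.
    FAILURE_GRACE_SECONDS = 60

    def onFailed(self, train_status, e, tb_message):
        # Write the error and its traceback to the container log.
        logger.info("Training failed with status %s: %s", train_status, e)
        logger.info(tb_message)
        # Debugging aid only: leave time to run `docker logs` before the
        # container is removed. Set to 0 once logs are recorded elsewhere.
        time.sleep(self.FAILURE_GRACE_SECONDS)
```

Keeping the delay in a class attribute makes it easy to disable without touching the callback body.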
To display the logs of the most recent PowerAI Vision classification training:

    sudo docker logs -f `sudo docker ps -qf "name=k8s_powerai-vision-cic-train" | head -n 1`

or

    kubectl logs -f `kubectl get pods --sort-by=.status.startTime | tail -n 1 | awk '{print $1}'`
The docker ps command gets the IDs of the containers whose name contains 'k8s_powerai-vision-cic-train' (indicating classification trainings), sorted by date with the most recent first. head keeps the most recent one, and docker logs displays its log output.