ucbrise / clipper
A low-latency prediction-serving system
Home Page: http://clipper.ai
License: Apache License 2.0
There's no real reason to enforce that model versions be numeric. We should allow them to be strings, which for example would allow versions to be SHAs (e.g. git hashes).
I just realized that forcing redis_persistence_path to not already exist will be problematic if you want to restart the Redis container and have it read from an existing Redis snapshot.
@Corey-Zumar Can you check if you can replicate the Docker container issue I was having when setting redis_persistence_path to /tmp?
cc @shaneknapp
Do you have plans to support the model release cycle? What I mean is a workflow where:
#1 can be handled by the calling application but #2 is where I wanted to know if it's possible to programmatically upload a new version.
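If #2 just means publishing a new version programmatically, that is already expressible with the existing deploy_model call; here is a minimal sketch reusing the CIFAR demo arguments that appear later on this page (the version-tracking variable is my own assumption, since versions are currently plain ints):

import os
import clipper_manager as cm

clipper = cm.Clipper("localhost", "", "")
current_version = 1  # assumed: the caller tracks the latest deployed version

def release_new_version(model_dir):
    # Publish a new version by redeploying under an incremented version number.
    global current_version
    current_version += 1
    return clipper.deploy_model(
        "tf_cifar",
        current_version,
        os.path.abspath(model_dir),
        "clipper/tf_cifar_container:latest",
        ["cifar", "tf"],
        "doubles",
        num_containers=1)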
The labels_to_str function dereferences labels.end() - 1, where labels is a vector. If labels is an empty vector, then *(labels.end() - 1) is undefined behavior (I think).
Full error message:
[ 13%] Built target redox
[ 23%] Built target gmock_main
[ 28%] Built target gtest
Scanning dependencies of target clipper
[ 31%] Building CXX object src/libclipper/CMakeFiles/clipper.dir/src/query_processor.cpp.o
[ 34%] Building CXX object src/libclipper/CMakeFiles/clipper.dir/src/metrics.cpp.o
[ 36%] Building CXX object src/libclipper/CMakeFiles/clipper.dir/src/selection_policies.cpp.o
[ 39%] Building CXX object src/libclipper/CMakeFiles/clipper.dir/src/task_executor.cpp.o
[ 42%] Building CXX object src/libclipper/CMakeFiles/clipper.dir/src/rpc_service.cpp.o
[ 44%] Linking CXX static library libclipper.a
[ 57%] Built target clipper
Scanning dependencies of target frontendtests
Scanning dependencies of target libclippertests
[ 60%] Building CXX object src/frontends/CMakeFiles/frontendtests.dir/src/query_frontend_tests.cpp.o
[ 63%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/test_main.cpp.o
[ 65%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/metrics_test.cpp.o
[ 68%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/rpc_service_test.cpp.o
[ 71%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/timers_test.cpp.o
[ 73%] Linking CXX executable frontendtests
Undefined symbols for architecture x86_64:
"std::__1::shared_timed_mutex::lock_shared()", referenced from:
clipper::metrics::RatioCounter::increment(unsigned int, unsigned int) in libclipper.a(metrics.cpp.o)
clipper::metrics::RatioCounter::clear() in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::get_rate_seconds() in libclipper.a(metrics.cpp.o)
clipper::metrics::Meter::get_rate_micros() in libclipper.a(metrics.cpp.o)
clipper::metrics::Histogram::compute_stats() in libclipper.a(metrics.cpp.o)
"std::__1::shared_timed_mutex::lock()", referenced from:
clipper::metrics::RatioCounter::get_ratio() in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::tick() in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::reset() in libclipper.a(metrics.cpp.o)
clipper::metrics::Meter::clear() in libclipper.a(metrics.cpp.o)
clipper::metrics::Histogram::insert(long long) in libclipper.a(metrics.cpp.o)
clipper::metrics::Histogram::clear() in libclipper.a(metrics.cpp.o)
"std::__1::shared_timed_mutex::unlock()", referenced from:
clipper::metrics::RatioCounter::increment(unsigned int, unsigned int) in libclipper.a(metrics.cpp.o)
clipper::metrics::RatioCounter::get_ratio() in libclipper.a(metrics.cpp.o)
clipper::metrics::RatioCounter::clear() in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::tick() in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::reset() in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::get_rate_seconds() in libclipper.a(metrics.cpp.o)
clipper::metrics::Meter::get_rate_micros() in libclipper.a(metrics.cpp.o)
...
"std::__1::shared_timed_mutex::shared_timed_mutex()", referenced from:
clipper::metrics::RatioCounter::RatioCounter(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, unsigned int, unsigned int) in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::EWMA(long, clipper::metrics::LoadAverage) in libclipper.a(metrics.cpp.o)
clipper::metrics::Meter::Meter(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<clipper::metrics::MeterClock>) in libclipper.a(metrics.cpp.o)
clipper::metrics::Histogram::Histogram(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, unsigned long) in libclipper.a(metrics.cpp.o)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[3]: *** [src/frontends/frontendtests] Error 1
make[2]: *** [src/frontends/CMakeFiles/frontendtests.dir/all] Error 2
make[2]: *** Waiting for unfinished jobs....
[ 76%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/serialization_test.cpp.o
[ 78%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/persistent_state_test.cpp.o
[ 81%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/redis_test.cpp.o
[ 84%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/config_test.cpp.o
[ 86%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/selection_policies_test.cpp.o
[ 89%] Linking CXX executable libclippertests
Undefined symbols for architecture x86_64:
"std::__1::shared_timed_mutex::lock_shared()", referenced from:
clipper::metrics::RatioCounter::increment(unsigned int, unsigned int) in libclipper.a(metrics.cpp.o)
clipper::metrics::RatioCounter::clear() in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::get_rate_seconds() in libclipper.a(metrics.cpp.o)
clipper::metrics::Meter::get_rate_micros() in libclipper.a(metrics.cpp.o)
clipper::metrics::Histogram::compute_stats() in libclipper.a(metrics.cpp.o)
"std::__1::shared_timed_mutex::lock()", referenced from:
clipper::metrics::RatioCounter::get_ratio() in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::tick() in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::reset() in libclipper.a(metrics.cpp.o)
clipper::metrics::Meter::clear() in libclipper.a(metrics.cpp.o)
clipper::metrics::Histogram::insert(long long) in libclipper.a(metrics.cpp.o)
clipper::metrics::Histogram::clear() in libclipper.a(metrics.cpp.o)
"std::__1::shared_timed_mutex::unlock()", referenced from:
clipper::metrics::RatioCounter::increment(unsigned int, unsigned int) in libclipper.a(metrics.cpp.o)
clipper::metrics::RatioCounter::get_ratio() in libclipper.a(metrics.cpp.o)
clipper::metrics::RatioCounter::clear() in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::tick() in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::reset() in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::get_rate_seconds() in libclipper.a(metrics.cpp.o)
clipper::metrics::Meter::get_rate_micros() in libclipper.a(metrics.cpp.o)
...
"std::__1::shared_timed_mutex::shared_timed_mutex()", referenced from:
clipper::metrics::RatioCounter::RatioCounter(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, unsigned int, unsigned int) in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::EWMA(long, clipper::metrics::LoadAverage) in libclipper.a(metrics.cpp.o)
clipper::metrics::Meter::Meter(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<clipper::metrics::MeterClock>) in libclipper.a(metrics.cpp.o)
clipper::metrics::Histogram::Histogram(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, unsigned long) in libclipper.a(metrics.cpp.o)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[3]: *** [src/libclipper/libclippertests] Error 1
make[2]: *** [src/libclipper/CMakeFiles/libclippertests.dir/all] Error 2
make[1]: *** [CMakeFiles/unittests.dir/rule] Error 2
make: *** [unittests] Error 2
The project builds successfully when the std::shared_timed_mutex instances are replaced with boost::shared_timed_mutex (which turns out to be the same thing as boost::shared_mutex), but it produces the following error when I run the unit tests using run_unittests.sh:
[ 13%] Built target redox
[ 23%] Built target gmock_main
[ 28%] Built target gtest
[ 57%] Built target clipper
[ 65%] Built target managementtests
[ 92%] Built target libclippertests
Scanning dependencies of target frontendtests
[ 94%] Building CXX object src/frontends/CMakeFiles/frontendtests.dir/src/query_frontend_tests.cpp.o
[ 97%] Linking CXX executable frontendtests
[100%] Built target frontendtests
Scanning dependencies of target unittests
[100%] Built target unittests
[==========] Running 41 tests from 8 test cases.
[----------] Global test environment set-up.
[----------] 8 tests from MetricsTests
[ RUN ] MetricsTests.CounterCorrectness
[ OK ] MetricsTests.CounterCorrectness (0 ms)
[ RUN ] MetricsTests.RatioCounterCorrectness
Ratio Test Ratio Counter has denominator zero!
Assertion failed: (exclusive), function assert_locked, file /usr/local/include/boost/thread/pthread/shared_mutex.hpp, line 51.
./bin/run_unittests.sh: line 43: 18339 Abort trap: 6 ./src/libclipper/libclippertests
@dcrankshaw @Corey-Zumar
@atumanov Whenever you can, see if the current Clipper project builds for you on OS X 10.11 (I believe you mentioned you had El Capitan installed).
I could successfully make the old clipper source code (cloned in March), but I came across the following boost error when I ran make after updating to the new clipper source code today.
../libclipper/libclipper.a(query_processor.cpp.o): In function `boost::detail::shared_state_base::set_exception_at_thread_exit(boost::exception_ptr)':
/home/guest/tools/boost-1.60.0/include/boost-1_60/boost/thread/future.hpp:434: undefined reference to `boost::detail::make_ready_at_thread_exit(boost::shared_ptr<boost::detail::shared_state_base>)'
../libclipper/libclipper.a(query_processor.cpp.o): In function `boost::detail::shared_state<void>::set_value_at_thread_exit()':
/home/guest/tools/boost-1.60.0/include/boost-1_60/boost/thread/future.hpp:791: undefined reference to `boost::detail::make_ready_at_thread_exit(boost::shared_ptr<boost::detail::shared_state_base>)'
collect2: error: ld returned 1 exit status
make[2]: *** [src/frontends/query_frontend] Error 1
make[1]: *** [src/frontends/CMakeFiles/query_frontend.dir/all] Error 2
make: *** [all] Error 2
I could run the tutorial successfully with the clipper docker version, but it always reports "No containers found for model tf_cifar:1" when I run the clipper source code version.
The source code clipper was built on Ubuntu 14.04. (Can clipper work on Ubuntu 14.04?)
My working steps for the clipper source code version are the same as for the clipper docker version, marked as follows:
################################################################
cifar_loc="./data"
import sys
import os
sys.path.append(os.path.abspath('../../management/'))
import clipper_manager as cm
import cifar_utils
test_x, test_y = cifar_utils.filter_data(*cifar_utils.load_cifar(cifar_loc, cifar_filename="cifar_test.data", norm=True))
user = ""
key = ""
host = "localhost"
clipper = cm.Clipper(host, user, key)
## The clipper has been stared in source code , so not to start the docker version here
##clipper.start()
app_name = "cifar_demo"
candidate_models = [
{"model_name": "tf_cifar", "model_version": 1},
]
clipper.register_application(
app_name,
candidate_models,
"doubles",
"EXP4",
slo_micros=20000)
model_added = clipper.deploy_model(
"tf_cifar",
1,
os.path.abspath("tf_cifar_model"),
"clipper/tf_cifar_container:latest",
["cifar", "tf"],
"doubles",
num_containers=1
)
print("Model deploy successful? {success}".format(success=model_added))
###############################################################
After executing this code, it outputs:
Found clipper/tf_cifar_container:latest in Docker hub
Copied model data to host
Published model to Clipper
Model deploy successful? True
So the model container was deployed to clipper successfully.
##################################################################
However, the output of ./bin/start_clipper.sh is:
[22:38:54.493][info] [REDIS] Successfully issued command "SELECT 2"
[22:38:54.493][info] [REDIS] Successfully issued command "HMSET tf_cifar:1 model_name tf_cifar model_version 1 load 0.000000 input_type doubles labels cifar,tf container_name clipper/tf_cifar_container:latest model_data_path /tmp/clipper-models/tf_cifar/1/tf_cifar_model"
[22:39:24.963][error] [REDIS] Error with command "GET cifar_demo:0:0":
[22:39:24.964][info] [QUERYPR...] Found 1 tasks
[22:39:24.964][info] [TASKEXE...] No active containers found for model tf_cifar:1
And if I send a prediction request, I get a random result; obviously, the new model container isn't being used.
In addition, the following RPC-related information appeared when I deployed the new container in the clipper docker version, but it can't be found in the clipper source code version. So I guess there may be some problem with the RPC interface. What is the exact problem?
[14:33:04.732][info] [RPC] Found message to receive
[14:33:04.732][info] [RPC] New container connected
[14:33:04.732][info] [RPC] Container added
[14:33:04.733][info] [REDIS] Successfully issued command "SELECT 3"
[14:33:04.733][info] [REDIS] Successfully issued command "HMSET tf_cifar:1:0 model_id tf_cifar:1 model_name tf_cifar model_version 1 model_replica_id 0 zmq_connection_id 0 batch_size 1 input_type doubles"
Model version is currently stored as an int (see https://github.com/ucbrise/clipper/blob/develop/src/libclipper/include/clipper/datatypes.hpp#L14 and https://github.com/ucbrise/clipper/blob/develop/clipper_admin/clipper_manager.py#L453), which makes it difficult to distinguish whether a version bump is a bug fix or a feature, and whether it is backwards compatible.
SemVer could be adopted for managing model versions to better facilitate this.
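To make the benefit concrete, here is a tiny illustration (standard library only, with deliberately naive parsing; nothing like this exists in Clipper today) of how SemVer strings encode what a bare int cannot:

def parse_semver(version):
    # "2.0.0" -> (2, 0, 0); naive: no pre-release or build metadata handling.
    return tuple(int(x) for x in version.split("."))

old, new = parse_semver("1.4.2"), parse_semver("2.0.0")
if new[0] > old[0]:
    print("major bump: potentially backwards-incompatible")
elif new[1] > old[1]:
    print("minor bump: new feature, backwards compatible")
else:
    print("patch bump: bug fix")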
Inside of datatypes.hpp, the Output class has a std::vector<VersionedModelId> models_used_ field. The Response class includes an Output output_ field, but it also includes a std::vector<VersionedModelId> models_used_ field! I'm guessing the repeated models_used_ was an accident?
Is the "exit 0" in line 221 for testing only? Should it be removed? Thanks!
# cd into the extracted directory and install
cd cmake-*
exit 0
Hi, I filed an issue titled "make can't find the shared_mutex". @dcrankshaw proposed that I compile with GCC 5.2 or later. Now I compile with GCC 5.4.0 and run into other problems with boost.
My PC information is
OS: Ubuntu 14.04.3 LTS
gcc: 5.4.0
boost: boost-1.63
[ 73%] Linking CXX executable bench
../libclipper/libclipper.a(selection_policies.cpp.o): In function `void boost::serialization::throw_exception<boost::archive::archive_exception>(boost::archive::archive_exception const&)':
/home/test/tools/boost-1.63/include/boost-1_63/boost/serialization/throw_exception.hpp:36: undefined reference to `boost::archive::archive_exception::archive_exception(boost::archive::archive_exception const&)'
../libclipper/libclipper.a(selection_policies.cpp.o): In function `void boost::archive::save_access::save_primitive<boost::archive::binary_oarchive, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(boost::archive::binary_oarchive&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)':
/home/test/tools/boost-1.63/include/boost-1_63/boost/archive/detail/oserializer.hpp:89: undefined reference to `boost::archive::basic_binary_oprimitive<boost::archive::binary_oarchive, char, std::char_traits<char> >::save(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
../libclipper/libclipper.a(selection_policies.cpp.o): In function `void boost::archive::load_access::load_primitive<boost::archive::binary_iarchive, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(boost::archive::binary_iarchive&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)':
/home/test/tools/boost-1.63/include/boost-1_63/boost/archive/detail/iserializer.hpp:107: undefined reference to `boost::archive::basic_binary_iprimitive<boost::archive::binary_iarchive, char, std::char_traits<char> >::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)'
../libclipper/libclipper.a(selection_policies.cpp.o): In function `void boost::archive::binary_iarchive_impl<boost::archive::binary_iarchive, char, std::char_traits<char> >::load_override<boost::archive::class_name_type>(boost::archive::class_name_type&)':
/home/test/tools/boost-1.63/include/boost-1_63/boost/archive/binary_iarchive_impl.hpp:58: undefined reference to `boost::archive::basic_binary_iarchive<boost::archive::binary_iarchive>::load_override(boost::archive::class_name_type&)'
collect2: error: ld returned 1 exit status
make[2]: *** [src/frontends/bench] Error 1
make[1]: *** [src/frontends/CMakeFiles/bench.dir/all] Error 2
make: *** [all] Error 2
Correct me if I am wrong: currently query_frontend.hpp returns a string output all the time.
Line 145:
ss << "qid:" << r.query_id_ << ", predict:" << r.output_.y_hat_;
I want to return a JSON response from the predict_strings handler in my docker container. Is this currently possible?
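One workaround, assuming the RPC layer passes the container's output string back verbatim (the class and method names below are assumptions about the python container API, not verified): serialize JSON yourself inside the container and have the client parse the y_hat string.

import json

class JsonContainer(object):  # hypothetical container class name
    def predict_strings(self, inputs):
        # Return one JSON document per input, encoded as a plain string;
        # the client then calls json.loads() on the returned prediction.
        return [json.dumps({"label": "noop", "confidence": 1.0})
                for _ in inputs]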
When a new model is deployed, can clipper unload previous versions of that model?
We deploy at least 10 versions of a model a day, and all the old docker containers hog the machine. Right now we manually stop the old ones. Is it possible to have an API to undeploy?
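Until an undeploy API exists, here is a sketch of automating that manual cleanup with docker-py (the image naming convention and the "keep only the newest tag" policy are assumptions about your deployment):

import docker

client = docker.from_env()

def stop_old_model_containers(image_repo, keep_tag):
    # e.g. stop_old_model_containers("my/captcha_predict_model", "6")
    for container in client.containers.list():
        for tag in container.image.tags:
            if tag.startswith(image_repo + ":") and not tag.endswith(":" + keep_tag):
                container.stop()
                container.remove()
                break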
clipper.stop_all() only removes the management_frontend container. It should remove all clipper containers.
To reproduce this, use the /predict api to do a simple call/predict; the query_frontend container exits with code 139:
clipper/query_frontend:latest "/clipper/release/..." 38 minutes ago Exited (139) 26 seconds ago
This is the container log before exiting
[15:17:44.212][info] [CLIPPER] Adding new container - model: captcha_predict_model, version: 6, ID: 1, input_type: strings
[15:17:44.212][info] [TASKEXE...] Created queue for new model: captcha_predict_model : 6
[15:17:44.224][info] [REDIS] Successfully issued command "HMSET captcha_predict_model,6,0 model_id captcha_predict_model:6 model_name captcha_predict_model model_version 6 model_replica_id 0 zmq_connection_id 1 batch_size 1 input_type strings"
[15:17:46.600][info] [RPC] Found message to receive
[15:17:49.217][info] [RPC] Found message to receive
[15:17:51.609][info] [RPC] Found message to receive
[15:17:52.517][info] [REDIS] Successfully issued command "GET captcha_predict:0:0"
[15:17:52.517][info] [QUERYPR...] Found 1 tasks
[15:17:52.525][info] [RPC] Found message to receive
Is there any other log I can look at to see what's happening? I am using the NoopContainer.
Q: What happens if you try to register the same application twice (same name)?
A: The update will overwrite the old application with the new one. I think the place this will cause problems is that you will have multiple query handlers registered to the same REST endpoint, and I'm not sure which one will get called. We should probably prevent this from happening (which is a 2 line code change).
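A sketch of that guard (get_all_apps is a hypothetical helper that would read the registered application names out of Redis):

def register_application(self, name, candidate_models, input_type,
                         selection_policy, slo_micros):
    # Hypothetical duplicate check -- reject re-registration instead of
    # silently overwriting the existing application.
    if name in self.get_all_apps():
        raise ValueError("application '%s' is already registered" % name)
    # ... existing registration logic continues here ...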
We should return a user-readable explanation in the event that Clipper returns a JSON response containing the default prediction (e.g. "no active containers available" when no container is available to field the query).
When a user updates a model version, Clipper is informed that the model version has changed and it immediately starts to query the new version. However, it can take several minutes for the container for that new version to initialize and connect to Clipper. In that intervening period, Clipper attempts to query the latest version, cannot find any containers for that model version, and instead returns the default prediction.
Instead, it may be desirable for Clipper to wait until the new container has finished initializing and connects to Clipper before switching to the new version.
size_t state_key_hash(const StateKey& key) is declared inside of persistent_state.hpp, but it is never implemented and never called.
In clipper_manager, all the Docker containers use the latest tag when they should be using the 0.1 tag.
To deploy a model, clipper currently:
1. In clipper_manager, serializes the prediction function to disk (https://github.com/ucbrise/clipper/blob/develop/clipper_admin/clipper_manager.py#L666) and copies the pickled function to the clipper host (https://github.com/ucbrise/clipper/blob/develop/clipper_admin/clipper_manager.py#L539) at a magical /tmp/{model_repo}/{name}/{version} file path (https://github.com/ucbrise/clipper/blob/develop/clipper_admin/clipper_manager.py#L497)
2. Launches the model_metadata[container_name] container and read-only volume mounts the serialized model from the magical file path (https://github.com/ucbrise/clipper/blob/develop/clipper_admin/clipper_manager.py#L955)
Reading models from the host's filesystem is not durable, as all models are lost when a single machine fails. Furthermore, it prevents distributed deployment, since which physical host a container is scheduled to is not readily guaranteed. Under the current design, a model can only be deployed if its host machine has the model in its filesystem.
Kubernetes provides a volumes concept, but these are backed either by the host filesystem or a cloud provider volume (e.g. AWS EBS), both of which k8s restricts to be accessible from a single pod (EBS can only be mounted to one EC2 instance at a time).
There are a few possible solutions to the problem:
- Build the model into the Docker image instead of copying it to the /tmp/{model_repo}/... file path. This will make builds significantly slower, but it improves durability (if the docker registry is durable) and scalability (each replica pulls the image and deploys it with no sharing of global state).
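A rough sketch of that option with docker-py (the base image name and paths are placeholders; the point is that the registry, not the host filesystem, becomes the source of truth):

import os
import docker

client = docker.from_env()

DOCKERFILE = """\
# Placeholder base image -- substitute the real model container image.
FROM clipper/python-container:latest
# Bake the serialized model into an image layer instead of /tmp on the host.
COPY model/ /model/
"""

def build_and_push_model_image(build_dir, repo, version):
    # build_dir is expected to contain the pickled model under model/.
    with open(os.path.join(build_dir, "Dockerfile"), "w") as f:
        f.write(DOCKERFILE)
    tag = "%s:%s" % (repo, version)
    client.images.build(path=build_dir, tag=tag)
    client.images.push(repo, tag=version)  # durable as long as the registry is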
Hi, I ran into two issues when I tried the clipper project:
Exception:
Traceback (most recent call last):
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/basecommand.py", line 215, in main
status = self.run(options, args)
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/commands/install.py", line 342, in run
prefix=options.prefix_path,
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_set.py", line 784, in install
**kwargs
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_install.py", line 851, in install
self.move_wheel_files(self.source_dir, root=root, prefix=prefix)
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_install.py", line 1064, in move_wheel_files
isolated=self.isolated,
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/wheel.py", line 377, in move_wheel_files
clobber(source, dest, False, fixer=fixer, filter=filter)
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/wheel.py", line 316, in clobber
ensure_dir(destdir)
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/utils/init.py", line 83, in ensure_dir
os.makedirs(path)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py", line 150, in makedirs
makedirs(head, mode)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py", line 150, in makedirs
makedirs(head, mode)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py", line 157, in makedirs
mkdir(name, mode)
OSError: [Errno 1] Operation not permitted: '/System/Library/Frameworks/Python.framework/Versions/2.7/share'
This error happened after I skipped installing "six"; in that case the error was "OSError: [Errno 1] Operation not permitted: '/var/folders/wj/wkz200q12ssg05sqrxzqb7sr0000gp/T/pip-JbPkys-uninstall/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/six-1.4.1-py2.7.egg-info'". Is there anything I need to do to install all of these? This error might have something to do with the OS; my OS is macOS Sierra.
2. When I tried to run example_client.py, I got the following error message:
Traceback (most recent call last):
File "example_client.py", line 29, in
"-1.0", 40000)
File "/Users/jidai/Downloads/clipper/management/clipper_manager.py", line 294, in register_application
r = requests.post(url, headers=headers, data=req_json)
File "/Library/Python/2.7/site-packages/requests/api.py", line 112, in post
return request('post', url, data=data, json=json, **kwargs)
File "/Library/Python/2.7/site-packages/requests/api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "/Library/Python/2.7/site-packages/requests/sessions.py", line 518, in request
resp = self.send(prep, **send_kwargs)
File "/Library/Python/2.7/site-packages/requests/sessions.py", line 639, in send
r = adapter.send(request, **kwargs)
File "/Library/Python/2.7/site-packages/requests/adapters.py", line 502, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=1338): Max retries exceeded with url: /admin/add_app (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x10d916450>: Failed to establish a new connection: [Errno 61] Connection refused',))
Should I try another host and port, and if yes, any suggestions for the host and port? (These docker containers are running:
clipper_redis_1,
clipper_mgmt_frontend_1,
clipper_query_frontend_1)
Looking forward to hearing from you.
We're getting mostly 1's for predictions coming from clipper. The model we deployed was giving 0's on some examples, but that is no longer true when querying through clipper.
One hypothesis:
The latency requirement for the application is set at 20ms and the /metrics endpoint shows a mean latency of 100ms, so maybe it's not serving real predictions in order to meet the latency goal? Is there a way to check that?
Are there other things we could check?
Also some ideas for making debugging easier:
@Corey-Zumar can you submit a PR for the RPC service implementation (whatever you have) before you go out of town?
Hi, I configured the clipper environment following the instructions on GitHub.
I could successfully execute the configure command, but when I executed the make command, I came across the following error:
clipper-develop/src/libclipper/include/clipper/persistent_state.hpp:6:35: fatal error: shared_mutex: No such file or directory
The code in persistent_state.hpp is as follows:
4 #include
5 #include
6 #include <shared_mutex>
A shared_mutex.hpp file is found in the path boost-1.63/include/boost-1_63/boost/thread/pthread/, so I modified the include line to "#include <boost/thread/pthread/shared_mutex.hpp>". But that led to another error related to boost.
So could anyone give me some suggestions?
Thanks.
Upon server reboot, can we restart Clipper automatically?
1. Add restart: always to the docker-compose.yaml file.
2. Can we also move redis storage to a persistent volume in docker-compose? I think I saw a discussion about this on your Jira, but the issue was closed.
services:
  redis:
    volumes:
      - redisdata:/data
Clipper manager should support running Docker commands that require sudo in both local and remote mode. It currently only supports this in remote mode. The relevant code is here and was written this way because Fabric doesn't support running local commands with sudo. We should fix this, but for now one workaround is to add your user to the docker group.
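For reference, a sketch of what local sudo could look like without Fabric (this assumes sudo can prompt on the controlling terminal, and the naive cmd.split() won't handle quoted arguments):

import subprocess

def local_sudo(cmd):
    # Rough local analogue of fabric.api.sudo, e.g. local_sudo("docker ps -a").
    return subprocess.check_output(["sudo"] + cmd.split())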
Clipper returns JSON without a proper Content-Type header. Can we set it to application/json?
The following is a list of peculiarities I found in the python admin API:
1. The clipper_admin python module contains clipper_manager. Is the second "clipper" needed?
2. The register_application function takes a model argument, but must the model exist? It seems (from the demo) that the model does not need to be present when registering it in an application. This should probably raise an error.
3. Are selection_policies active in this release? (See inspect_selection_policy.)
4. external_models (see register external models).
)Q: Is there a concept of a “paused model / application” in Clipper?
Can you “turn off” a model (container) and turn it back on without deleting & re-deploying?
When I connect to clipper, even though I am providing the SSH key, clipper still asks for the password. This only happens once.
cm.Clipper(host, user, key)
Maybe it's something to do with the Fabric api? I am not logging in as root, but the user has sudo permission.
Do you plan to allow for Kubernetes integration instead of using docker-compose / swarm?
Some tests appear to fail when executing the run_unittests.sh script, but in the end it appears that all the tests were executed without any errors (see the screenshot in the issue).
This is also happening in the project build. Is this the expected behavior, or should these tests break the current build?
We should have some documentation on how to implement a model container.
It now takes ~10 minutes to get a container up and running, ready to serve predictions. We should improve that.
Currently, users must supply a uid as part of the prediction request body, but it is never used and must always be 0. We should remove the uid field entirely. The simplest way to do this is to just stop checking for uid when we parse the prediction request JSON and hardcode the uid as 0.
Replace this line with
long uid = 0;
and update the corresponding schema here.
https://github.com/ucbrise/clipper/blob/develop/clipper_admin/clipper_manager.py#L513 requires dpkg-query, a tool only available in Debian-based linux distros. As a result, dockerized deployments of clipper require the use of an ubuntu/debian base, and lighter-weight container distros (e.g. alpine) cannot be used. This definitely impacts container image size and may also introduce performance overhead due to unneeded OS subsystems.
Instead of using fabric to run aws s3, we could use boto to remove this dependence on the OS distro.
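A sketch of the boto-based replacement (boto3 shown; the bucket and key names are placeholders), which drops the aws CLI and with it the dpkg-query dependency:

import boto3

def upload_model_to_s3(local_path, bucket, key):
    # Pure-python upload: no aws CLI, no dpkg-query, works in alpine images.
    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)

# e.g. upload_model_to_s3("/tmp/clipper-models/tf_cifar/1/model.pkl",
#                         "my-clipper-models", "tf_cifar/1/model.pkl")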
I am getting the following error when deploying models
Fatal error: run() received nonzero return code 1 while executing!
Requested: mkdir -p /tmp/clipper-models/mom_predict_model/3
Executed: /bin/bash -l -c "mkdir -p /tmp/clipper-models/mom_predict_model/3"
=============================== Standard output ===============================
mkdir: cannot create directory ‘/tmp/clipper-models/mom_predict_model/3’: Permission denied
================================================================================
Aborting.
How to replicate: sudo reboot the server.
When a model container connects to Clipper, we should check the Redis model table to ensure that the container represents a registered Clipper model. At a minimum, we should log an error message if this is not the case.
When I send a query request with uid=0 to the PREDICT port of the registered application, it works well. But if I assign uid 1 or some other value (i.e. uid=1 or uid=2), it reports the following error. What is the variable uid used for?
No selection state found for query with user_id: 1
The code is similar to cifar_utils.py at L122:
uid = 1
url = "http://%s:1337/%s/predict" % (host, app)
req_json = json.dumps({'uid': uid, 'input': list(x)})  # renamed from 'json' to avoid shadowing the json module
Can you support a different sshd port when initializing clipper?
clipper = cm.Clipper(host, PORT_NUMBER, user, key)
When I specify the port within the IP, redis attempts to connect to an incorrect address:
Could not connect to Redis at XXX:XX:XX:XX:2201:6379: nodename nor servname provided, or not known
If you call deploy_predict_function while running outside of an Anaconda environment, we should still serialize the function and try to run it in the python container. Basically, we should wrap this block in an if-statement that saves and checks the anaconda environment if present, and otherwise prints a warning to the user and just serializes the function.
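The if-statement could look roughly like this (a sketch only: in_anaconda_env is a heuristic, and save_conda_env / serialize_function are hypothetical stand-ins for the existing blocks):

import os
import sys

def in_anaconda_env():
    # Heuristic: conda sets CONDA_DEFAULT_ENV and ships a conda-meta
    # directory next to the interpreter.
    return ("CONDA_DEFAULT_ENV" in os.environ
            or os.path.isdir(os.path.join(sys.prefix, "conda-meta")))

def deploy_predict_function(predict_fn):
    if in_anaconda_env():
        save_conda_env()  # hypothetical: the existing environment export step
    else:
        print("Warning: not running in an Anaconda environment; "
              "deploying the function without an environment snapshot.")
    serialize_function(predict_fn)  # hypothetical: the existing pickling step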
Because of how urllib is being used, the download_cifar script does not work on Python 3, since URLopener is not available since Python 3.3.
Should we add Python 3 support to the files related to the tutorials, or just inform the user that they should run them with Python 2?
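A low-effort fix for the tutorials is the usual dual import, which works on both interpreters (a sketch; the URL below is a placeholder, not the real CIFAR location):

try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2

urlretrieve("https://example.com/cifar_test.data", "cifar_test.data")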
While running the tutorials, I realized that Clipper does not support Python 3. It will be necessary to update the python files to add Python 3 support while also maintaining support for Python 2.
It will also be necessary to check the dependencies of the project. For example, fabric is only available for Python 2.
Clipper works well when one application is deployed. When two applications are deployed, it also works well for requests related to the first deployed application, but it reports the following error for requests related to the second deployed application:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc7fff700 (LWP 59250)]
__GI___pthread_mutex_lock (mutex=0x0) at ../nptl/pthread_mutex_lock.c:66
66 ../nptl/pthread_mutex_lock.c: No such file or directory.
The hard-coded port number will cause collisions when the PRB and master build simultaneously.
I suggest something like this in your run_unittests.sh script:
set +e # turn off exit on command fail
REDIS_PORT=$((34256 + RANDOM % 1000))
lsof -i :$REDIS_PORT &> /dev/null
if [ $? -eq 0 ]; then # existing port in use found
while true; do
REDIS_PORT=$(($REDIS_PORT + RANDOM % 1000))
lsof -i :$REDIS_PORT &> /dev/null
if [ $? -eq 1 ]; then # port not in use
break
fi
done
fi
export REDIS_PORT # if you want to slurp this in to your test_constants before compilation or something
set -e # turn exit on fail back on
Now you need to update your test_constants.hpp file to respect this new port. A couple of ideas: