ucbrise / clipper
A low-latency prediction-serving system
Home Page: http://clipper.ai
License: Apache License 2.0
There's no real reason to enforce that model versions be numeric. We should allow them to be strings, which for example would allow versions to be SHAs (e.g. git hashes).
I just realized that forcing redis_persistence_path to not already exist will be problematic if you want to restart the Redis container and have it read from an existing Redis snapshot.
@Corey-Zumar Can you check if you can replicate the Docker container issue I was having when setting redis_persistence_path to /tmp?
cc @shaneknapp
Do you have plans to support the model release cycle? What I mean is a workflow where:
#1 can be handled by the calling application but #2 is where I wanted to know if it's possible to programmatically upload a new version.
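If #2 just means publishing a new version programmatically, that is already expressible with the existing deploy_model call; here is a minimal sketch reusing the CIFAR demo arguments that appear later on this page (the version-tracking variable is my own assumption, since versions are currently plain ints):

import os
import clipper_manager as cm

clipper = cm.Clipper("localhost", "", "")
current_version = 1  # assumed: the caller tracks the latest deployed version

def release_new_version(model_dir):
    # Publish a new version by redeploying under an incremented version number.
    global current_version
    current_version += 1
    return clipper.deploy_model(
        "tf_cifar",
        current_version,
        os.path.abspath(model_dir),
        "clipper/tf_cifar_container:latest",
        ["cifar", "tf"],
        "doubles",
        num_containers=1)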
The labels_to_str function dereferences labels.end() - 1, where labels is a vector. If labels is an empty vector, then *(labels.end() - 1) is undefined behavior (I think).
Full error message:
[ 13%] Built target redox
[ 23%] Built target gmock_main
[ 28%] Built target gtest
Scanning dependencies of target clipper
[ 31%] Building CXX object src/libclipper/CMakeFiles/clipper.dir/src/query_processor.cpp.o
[ 34%] Building CXX object src/libclipper/CMakeFiles/clipper.dir/src/metrics.cpp.o
[ 36%] Building CXX object src/libclipper/CMakeFiles/clipper.dir/src/selection_policies.cpp.o
[ 39%] Building CXX object src/libclipper/CMakeFiles/clipper.dir/src/task_executor.cpp.o
[ 42%] Building CXX object src/libclipper/CMakeFiles/clipper.dir/src/rpc_service.cpp.o
[ 44%] Linking CXX static library libclipper.a
[ 57%] Built target clipper
Scanning dependencies of target frontendtests
Scanning dependencies of target libclippertests
[ 60%] Building CXX object src/frontends/CMakeFiles/frontendtests.dir/src/query_frontend_tests.cpp.o
[ 63%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/test_main.cpp.o
[ 65%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/metrics_test.cpp.o
[ 68%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/rpc_service_test.cpp.o
[ 71%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/timers_test.cpp.o
[ 73%] Linking CXX executable frontendtests
Undefined symbols for architecture x86_64:
"std::__1::shared_timed_mutex::lock_shared()", referenced from:
clipper::metrics::RatioCounter::increment(unsigned int, unsigned int) in libclipper.a(metrics.cpp.o)
clipper::metrics::RatioCounter::clear() in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::get_rate_seconds() in libclipper.a(metrics.cpp.o)
clipper::metrics::Meter::get_rate_micros() in libclipper.a(metrics.cpp.o)
clipper::metrics::Histogram::compute_stats() in libclipper.a(metrics.cpp.o)
"std::__1::shared_timed_mutex::lock()", referenced from:
clipper::metrics::RatioCounter::get_ratio() in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::tick() in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::reset() in libclipper.a(metrics.cpp.o)
clipper::metrics::Meter::clear() in libclipper.a(metrics.cpp.o)
clipper::metrics::Histogram::insert(long long) in libclipper.a(metrics.cpp.o)
clipper::metrics::Histogram::clear() in libclipper.a(metrics.cpp.o)
"std::__1::shared_timed_mutex::unlock()", referenced from:
clipper::metrics::RatioCounter::increment(unsigned int, unsigned int) in libclipper.a(metrics.cpp.o)
clipper::metrics::RatioCounter::get_ratio() in libclipper.a(metrics.cpp.o)
clipper::metrics::RatioCounter::clear() in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::tick() in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::reset() in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::get_rate_seconds() in libclipper.a(metrics.cpp.o)
clipper::metrics::Meter::get_rate_micros() in libclipper.a(metrics.cpp.o)
...
"std::__1::shared_timed_mutex::shared_timed_mutex()", referenced from:
clipper::metrics::RatioCounter::RatioCounter(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, unsigned int, unsigned int) in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::EWMA(long, clipper::metrics::LoadAverage) in libclipper.a(metrics.cpp.o)
clipper::metrics::Meter::Meter(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<clipper::metrics::MeterClock>) in libclipper.a(metrics.cpp.o)
clipper::metrics::Histogram::Histogram(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, unsigned long) in libclipper.a(metrics.cpp.o)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[3]: *** [src/frontends/frontendtests] Error 1
make[2]: *** [src/frontends/CMakeFiles/frontendtests.dir/all] Error 2
make[2]: *** Waiting for unfinished jobs....
[ 76%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/serialization_test.cpp.o
[ 78%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/persistent_state_test.cpp.o
[ 81%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/redis_test.cpp.o
[ 84%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/config_test.cpp.o
[ 86%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/selection_policies_test.cpp.o
[ 89%] Linking CXX executable libclippertests
Undefined symbols for architecture x86_64:
"std::__1::shared_timed_mutex::lock_shared()", referenced from:
clipper::metrics::RatioCounter::increment(unsigned int, unsigned int) in libclipper.a(metrics.cpp.o)
clipper::metrics::RatioCounter::clear() in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::get_rate_seconds() in libclipper.a(metrics.cpp.o)
clipper::metrics::Meter::get_rate_micros() in libclipper.a(metrics.cpp.o)
clipper::metrics::Histogram::compute_stats() in libclipper.a(metrics.cpp.o)
"std::__1::shared_timed_mutex::lock()", referenced from:
clipper::metrics::RatioCounter::get_ratio() in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::tick() in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::reset() in libclipper.a(metrics.cpp.o)
clipper::metrics::Meter::clear() in libclipper.a(metrics.cpp.o)
clipper::metrics::Histogram::insert(long long) in libclipper.a(metrics.cpp.o)
clipper::metrics::Histogram::clear() in libclipper.a(metrics.cpp.o)
"std::__1::shared_timed_mutex::unlock()", referenced from:
clipper::metrics::RatioCounter::increment(unsigned int, unsigned int) in libclipper.a(metrics.cpp.o)
clipper::metrics::RatioCounter::get_ratio() in libclipper.a(metrics.cpp.o)
clipper::metrics::RatioCounter::clear() in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::tick() in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::reset() in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::get_rate_seconds() in libclipper.a(metrics.cpp.o)
clipper::metrics::Meter::get_rate_micros() in libclipper.a(metrics.cpp.o)
...
"std::__1::shared_timed_mutex::shared_timed_mutex()", referenced from:
clipper::metrics::RatioCounter::RatioCounter(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, unsigned int, unsigned int) in libclipper.a(metrics.cpp.o)
clipper::metrics::EWMA::EWMA(long, clipper::metrics::LoadAverage) in libclipper.a(metrics.cpp.o)
clipper::metrics::Meter::Meter(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<clipper::metrics::MeterClock>) in libclipper.a(metrics.cpp.o)
clipper::metrics::Histogram::Histogram(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, unsigned long) in libclipper.a(metrics.cpp.o)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[3]: *** [src/libclipper/libclippertests] Error 1
make[2]: *** [src/libclipper/CMakeFiles/libclippertests.dir/all] Error 2
make[1]: *** [CMakeFiles/unittests.dir/rule] Error 2
make: *** [unittests] Error 2
The project builds successfully when the std::shared_timed_mutex instances are replaced with boost::shared_timed_mutex (which turns out to be the same thing as boost::shared_mutex), but it produces the following error when I run the unit tests using run_unittests.sh:
[ 13%] Built target redox
[ 23%] Built target gmock_main
[ 28%] Built target gtest
[ 57%] Built target clipper
[ 65%] Built target managementtests
[ 92%] Built target libclippertests
Scanning dependencies of target frontendtests
[ 94%] Building CXX object src/frontends/CMakeFiles/frontendtests.dir/src/query_frontend_tests.cpp.o
[ 97%] Linking CXX executable frontendtests
[100%] Built target frontendtests
Scanning dependencies of target unittests
[100%] Built target unittests
[==========] Running 41 tests from 8 test cases.
[----------] Global test environment set-up.
[----------] 8 tests from MetricsTests
[ RUN ] MetricsTests.CounterCorrectness
[ OK ] MetricsTests.CounterCorrectness (0 ms)
[ RUN ] MetricsTests.RatioCounterCorrectness
Ratio Test Ratio Counter has denominator zero!
Assertion failed: (exclusive), function assert_locked, file /usr/local/include/boost/thread/pthread/shared_mutex.hpp, line 51.
./bin/run_unittests.sh: line 43: 18339 Abort trap: 6 ./src/libclipper/libclippertests
@dcrankshaw @Corey-Zumar
@atumanov Whenever you can, see if the current Clipper project builds for you on OS X 10.11 (I believe you mentioned you had El Capitan installed).
I could successfully make the old clipper source code (cloned in March), but I came across the following boost error when I ran make after updating to the new clipper source code today.
../libclipper/libclipper.a(query_processor.cpp.o): In function `boost::detail::shared_state_base::set_exception_at_thread_exit(boost::exception_ptr)':
/home/guest/tools/boost-1.60.0/include/boost-1_60/boost/thread/future.hpp:434: undefined reference to `boost::detail::make_ready_at_thread_exit(boost::shared_ptr<boost::detail::shared_state_base>)'
../libclipper/libclipper.a(query_processor.cpp.o): In function `boost::detail::shared_state<void>::set_value_at_thread_exit()':
/home/guest/tools/boost-1.60.0/include/boost-1_60/boost/thread/future.hpp:791: undefined reference to `boost::detail::make_ready_at_thread_exit(boost::shared_ptr<boost::detail::shared_state_base>)'
collect2: error: ld returned 1 exit status
make[2]: *** [src/frontends/query_frontend] Error 1
make[1]: *** [src/frontends/CMakeFiles/query_frontend.dir/all] Error 2
make: *** [all] Error 2
I could run the tutorial successfully with the clipper docker version, but it always reports "No containers found for model tf_cifar:1" when I run the clipper source code version.
The source code clipper was built on Ubuntu 14.04. (Can clipper work on Ubuntu 14.04?)
My working steps for the clipper source code version are the same as for the clipper docker version, marked as follows:
################################################################
cifar_loc="./data"
import sys
import os
sys.path.append(os.path.abspath('../../management/'))
import clipper_manager as cm
import cifar_utils
test_x, test_y = cifar_utils.filter_data(*cifar_utils.load_cifar(cifar_loc, cifar_filename="cifar_test.data", norm=True))
user = ""
key = ""
host = "localhost"
clipper = cm.Clipper(host, user, key)
## The clipper has been stared in source code , so not to start the docker version here
##clipper.start()
app_name = "cifar_demo"
candidate_models = [
{"model_name": "tf_cifar", "model_version": 1},
]
clipper.register_application(
app_name,
candidate_models,
"doubles",
"EXP4",
slo_micros=20000)
model_added = clipper.deploy_model(
"tf_cifar",
1,
os.path.abspath("tf_cifar_model"),
"clipper/tf_cifar_container:latest",
["cifar", "tf"],
"doubles",
num_containers=1
)
print("Model deploy successful? {success}".format(success=model_added))
###############################################################
After executing this code, it outputs:
Found clipper/tf_cifar_container:latest in Docker hub
Copied model data to host
Published model to Clipper
Model deploy successful? True
So the model container was deployed to clipper successfully.
##################################################################
However, the output of ./bin/start_clipper.sh is:
[22:38:54.493][info] [REDIS] Successfully issued command "SELECT 2"
[22:38:54.493][info] [REDIS] Successfully issued command "HMSET tf_cifar:1 model_name tf_cifar model_version 1 load 0.000000 input_type doubles labels cifar,tf container_name clipper/tf_cifar_container:latest model_data_path /tmp/clipper-models/tf_cifar/1/tf_cifar_model"
[22:39:24.963][error] [REDIS] Error with command "GET cifar_demo:0:0":
[22:39:24.964][info] [QUERYPR...] Found 1 tasks
[22:39:24.964][info] [TASKEXE...] No active containers found for model tf_cifar:1
And if I send a prediction request, I get a random result; obviously, the new model container isn't being used.
In addition, the following RPC-related information appeared when I deployed the new container in the clipper docker version, but it can't be found in the clipper source code version. So I guess there may be some problem with the RPC interface. What is the exact problem?
[14:33:04.732][info] [RPC] Found message to receive
[14:33:04.732][info] [RPC] New container connected
[14:33:04.732][info] [RPC] Container added
[14:33:04.733][info] [REDIS] Successfully issued command "SELECT 3"
[14:33:04.733][info] [REDIS] Successfully issued command "HMSET tf_cifar:1:0 model_id tf_cifar:1 model_name tf_cifar model_version 1 model_replica_id 0 zmq_connection_id 0 batch_size 1 input_type doubles"
Model version is currently stored as an int (see https://github.com/ucbrise/clipper/blob/develop/src/libclipper/include/clipper/datatypes.hpp#L14 and https://github.com/ucbrise/clipper/blob/develop/clipper_admin/clipper_manager.py#L453), which makes it difficult to distinguish whether a version bump is a bug fix or a feature, and whether it is backwards compatible.
SemVer could be adopted for managing model versions to better facilitate this.
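To make the benefit concrete, here is a tiny illustration (standard library only, with deliberately naive parsing; nothing like this exists in Clipper today) of how SemVer strings encode what a bare int cannot:

def parse_semver(version):
    # "2.0.0" -> (2, 0, 0); naive: no pre-release or build metadata handling.
    return tuple(int(x) for x in version.split("."))

old, new = parse_semver("1.4.2"), parse_semver("2.0.0")
if new[0] > old[0]:
    print("major bump: potentially backwards-incompatible")
elif new[1] > old[1]:
    print("minor bump: new feature, backwards compatible")
else:
    print("patch bump: bug fix")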
Inside of datatypes.hpp, the Output class has a std::vector<VersionedModelId> models_used_ field. The Response class includes an Output output_ field, but it also includes a std::vector<VersionedModelId> models_used_ field! I'm guessing the repeated models_used_ was an accident?
Is the "exit 0" in line 221 for testing only? Should it be removed? Thanks!
# cd into the extracted directory and install
cd cmake-*
exit 0
Hi, I filed an issue titled "make can't find the shared_mutex". @dcrankshaw proposed that I compile with GCC 5.2 or later. Now I compile with GCC 5.4.0 and run into other problems with boost.
My PC information is
OS: Ubuntu 14.04.3 LTS
gcc: 5.4.0
boost: boost-1.63
[ 73%] Linking CXX executable bench
../libclipper/libclipper.a(selection_policies.cpp.o): In function `void boost::serialization::throw_exception<boost::archive::archive_exception>(boost::archive::archive_exception const&)':
/home/test/tools/boost-1.63/include/boost-1_63/boost/serialization/throw_exception.hpp:36: undefined reference to `boost::archive::archive_exception::archive_exception(boost::archive::archive_exception const&)'
../libclipper/libclipper.a(selection_policies.cpp.o): In function `void boost::archive::save_access::save_primitive<boost::archive::binary_oarchive, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(boost::archive::binary_oarchive&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)':
/home/test/tools/boost-1.63/include/boost-1_63/boost/archive/detail/oserializer.hpp:89: undefined reference to `boost::archive::basic_binary_oprimitive<boost::archive::binary_oarchive, char, std::char_traits<char> >::save(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
../libclipper/libclipper.a(selection_policies.cpp.o): In function `void boost::archive::load_access::load_primitive<boost::archive::binary_iarchive, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(boost::archive::binary_iarchive&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)':
/home/test/tools/boost-1.63/include/boost-1_63/boost/archive/detail/iserializer.hpp:107: undefined reference to `boost::archive::basic_binary_iprimitive<boost::archive::binary_iarchive, char, std::char_traits<char> >::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)'
../libclipper/libclipper.a(selection_policies.cpp.o): In function `void boost::archive::binary_iarchive_impl<boost::archive::binary_iarchive, char, std::char_traits<char> >::load_override<boost::archive::class_name_type>(boost::archive::class_name_type&)':
/home/test/tools/boost-1.63/include/boost-1_63/boost/archive/binary_iarchive_impl.hpp:58: undefined reference to `boost::archive::basic_binary_iarchive<boost::archive::binary_iarchive>::load_override(boost::archive::class_name_type&)'
collect2: error: ld returned 1 exit status
make[2]: *** [src/frontends/bench] Error 1
make[1]: *** [src/frontends/CMakeFiles/bench.dir/all] Error 2
make: *** [all] Error 2
Correct me if I am wrong: currently query_frontend.hpp returns a string output all the time.
Line 145:
ss << "qid:" << r.query_id_ << ", predict:" << r.output_.y_hat_;
I want to return a JSON response from the predict_strings handler in my docker container. Is this currently possible?
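One workaround, assuming the RPC layer passes the container's output string back verbatim (the class and method names below are assumptions about the python container API, not verified): serialize JSON yourself inside the container and have the client parse the y_hat string.

import json

class JsonContainer(object):  # hypothetical container class name
    def predict_strings(self, inputs):
        # Return one JSON document per input, encoded as a plain string;
        # the client then calls json.loads() on the returned prediction.
        return [json.dumps({"label": "noop", "confidence": 1.0})
                for _ in inputs]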
When a new model is deployed, can clipper unload previous versions of that model?
We deploy at least 10 versions of a model a day, and all the old docker containers hog the machine. Right now we manually stop the old ones. Is it possible to have an API to undeploy?
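Until an undeploy API exists, here is a sketch of automating that manual cleanup with docker-py (the image naming convention and the "keep only the newest tag" policy are assumptions about your deployment):

import docker

client = docker.from_env()

def stop_old_model_containers(image_repo, keep_tag):
    # e.g. stop_old_model_containers("my/captcha_predict_model", "6")
    for container in client.containers.list():
        for tag in container.image.tags:
            if tag.startswith(image_repo + ":") and not tag.endswith(":" + keep_tag):
                container.stop()
                container.remove()
                break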
clipper.stop_all() only removes the management_frontend container. It should remove all clipper containers.
To reproduce this, use the /predict api to do a simple call/predict; the query_frontend container exits with code 139:
clipper/query_frontend:latest "/clipper/release/..." 38 minutes ago Exited (139) 26 seconds ago
This is the container log before exiting
[15:17:44.212][info] [CLIPPER] Adding new container - model: captcha_predict_model, version: 6, ID: 1, input_type: strings
[15:17:44.212][info] [TASKEXE...] Created queue for new model: captcha_predict_model : 6
[15:17:44.224][info] [REDIS] Successfully issued command "HMSET captcha_predict_model,6,0 model_id captcha_predict_model:6 model_name captcha_predict_model model_version 6 model_replica_id 0 zmq_connection_id 1 batch_size 1 input_type strings"
[15:17:46.600][info] [RPC] Found message to receive
[15:17:49.217][info] [RPC] Found message to receive
[15:17:51.609][info] [RPC] Found message to receive
[15:17:52.517][info] [REDIS] Successfully issued command "GET captcha_predict:0:0"
[15:17:52.517][info] [QUERYPR...] Found 1 tasks
[15:17:52.525][info] [RPC] Found message to receive
Is there any other log I can look at to see what's happening? I am using the NoopContainer.
Q: What happens if you try to register the same application twice (same name)?
A: The update will overwrite the old application with the new one. I think the place this will cause problems is that you will have multiple query handlers registered to the same REST endpoint, and I'm not sure which one will get called. We should probably prevent this from happening (which is a 2 line code change).
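A sketch of that guard (get_all_apps is a hypothetical helper that would read the registered application names out of Redis):

def register_application(self, name, candidate_models, input_type,
                         selection_policy, slo_micros):
    # Hypothetical duplicate check -- reject re-registration instead of
    # silently overwriting the existing application.
    if name in self.get_all_apps():
        raise ValueError("application '%s' is already registered" % name)
    # ... existing registration logic continues here ...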
We should return a user-readable explanation in the event that Clipper returns a JSON response containing the default prediction (e.g. "no active containers available" when no container is available to field the query).
When a user updates a model version, Clipper is informed that the model version has changed and it immediately starts to query the new version. However, it can take several minutes for the container for that new version to initialize and connect to Clipper. In that intervening period, Clipper attempts to query the latest version, cannot find any containers for that model version, and instead returns the default prediction.
Instead, it may be desirable for Clipper to wait until the new container has finished initializing and connects to Clipper before switching to the new version.
size_t state_key_hash(const StateKey& key) is declared inside of persistent_state.hpp, but it is never implemented and never called.
In clipper_manager, all the Docker containers use the latest tag when they should be using the 0.1 tag.
To deploy a model, clipper currently:
1. In clipper_manager, serializes the prediction function to disk (https://github.com/ucbrise/clipper/blob/develop/clipper_admin/clipper_manager.py#L666) and copies the pickled function to the clipper host (https://github.com/ucbrise/clipper/blob/develop/clipper_admin/clipper_manager.py#L539) at a magical /tmp/{model_repo}/{name}/{version} file path (https://github.com/ucbrise/clipper/blob/develop/clipper_admin/clipper_manager.py#L497)
2. Launches the model_metadata[container_name] container and read-only volume mounts the serialized model from the magical file path (https://github.com/ucbrise/clipper/blob/develop/clipper_admin/clipper_manager.py#L955)
Reading models from the host's filesystem is not durable, as all models are lost when a single machine fails. Furthermore, it prevents distributed deployment, since which physical host a container is scheduled to is not readily guaranteed. Under the current design, a model can only be deployed if its host machine has the model in its filesystem.
Kubernetes provides a volumes concept, but these are backed either by the host filesystem or a cloud provider volume (e.g. AWS EBS), both of which k8s restricts to be accessible from a single pod (EBS can only be mounted to one EC2 instance at a time).
There are a few possible solutions to the problem:
- Build the model into the Docker image instead of copying it to the /tmp/{model_repo}/... file path. This will make builds significantly slower, but it improves durability (if the docker registry is durable) and scalability (each replica pulls the image and deploys it with no sharing of global state).
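A rough sketch of that option with docker-py (the base image name and paths are placeholders; the point is that the registry, not the host filesystem, becomes the source of truth):

import os
import docker

client = docker.from_env()

DOCKERFILE = """\
# Placeholder base image -- substitute the real model container image.
FROM clipper/python-container:latest
# Bake the serialized model into an image layer instead of /tmp on the host.
COPY model/ /model/
"""

def build_and_push_model_image(build_dir, repo, version):
    # build_dir is expected to contain the pickled model under model/.
    with open(os.path.join(build_dir, "Dockerfile"), "w") as f:
        f.write(DOCKERFILE)
    tag = "%s:%s" % (repo, version)
    client.images.build(path=build_dir, tag=tag)
    client.images.push(repo, tag=version)  # durable as long as the registry is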
Hi, I ran into two issues when I tried the clipper project:
Exception:
Traceback (most recent call last):
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/basecommand.py", line 215, in main
status = self.run(options, args)
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/commands/install.py", line 342, in run
prefix=options.prefix_path,
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_set.py", line 784, in install
**kwargs
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_install.py", line 851, in install
self.move_wheel_files(self.source_dir, root=root, prefix=prefix)
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_install.py", line 1064, in move_wheel_files
isolated=self.isolated,
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/wheel.py", line 377, in move_wheel_files
clobber(source, dest, False, fixer=fixer, filter=filter)
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/wheel.py", line 316, in clobber
ensure_dir(destdir)
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/utils/init.py", line 83, in ensure_dir
os.makedirs(path)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py", line 150, in makedirs
makedirs(head, mode)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py", line 150, in makedirs
makedirs(head, mode)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py", line 157, in makedirs
mkdir(name, mode)
OSError: [Errno 1] Operation not permitted: '/System/Library/Frameworks/Python.framework/Versions/2.7/share'
This error happened after I skipped installing "six"; in that case the error was "OSError: [Errno 1] Operation not permitted: '/var/folders/wj/wkz200q12ssg05sqrxzqb7sr0000gp/T/pip-JbPkys-uninstall/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/six-1.4.1-py2.7.egg-info'". Is there anything I need to do to install all of these? This error might have something to do with the OS; my OS is macOS Sierra.
2. When I tried to run example_client.py, I got the following error message:
Traceback (most recent call last):
File "example_client.py", line 29, in
"-1.0", 40000)
File "/Users/jidai/Downloads/clipper/management/clipper_manager.py", line 294, in register_application
r = requests.post(url, headers=headers, data=req_json)
File "/Library/Python/2.7/site-packages/requests/api.py", line 112, in post
return request('post', url, data=data, json=json, **kwargs)
File "/Library/Python/2.7/site-packages/requests/api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "/Library/Python/2.7/site-packages/requests/sessions.py", line 518, in request
resp = self.send(prep, **send_kwargs)
File "/Library/Python/2.7/site-packages/requests/sessions.py", line 639, in send
r = adapter.send(request, **kwargs)
File "/Library/Python/2.7/site-packages/requests/adapters.py", line 502, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=1338): Max retries exceeded with url: /admin/add_app (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x10d916450>: Failed to establish a new connection: [Errno 61] Connection refused',))
Should I try another host and port, and if yes, any suggestions for the host and port? (These docker containers are running:
clipper_redis_1,
clipper_mgmt_frontend_1,
clipper_query_frontend_1)
Looking forward to hearing from you.
We're getting mostly 1's for predictions coming from clipper. The model we deployed was giving 0's on some examples, but that is no longer true when querying through clipper.
One hypothesis:
The latency requirement for the application is set at 20ms and the /metrics endpoint shows a mean latency of 100ms, so maybe it's not serving real predictions in order to meet the latency goal? Is there a way to check that?
Are there other things we could check?
Also some ideas for making debugging easier:
@Corey-Zumar can you submit a PR for the RPC service implementation (whatever you have) before you go out of town?
Hi, I configured the clipper environment following the instructions on GitHub.
I could successfully execute the configure command, but when I executed the make command, I came across the following error:
clipper-develop/src/libclipper/include/clipper/persistent_state.hpp:6:35: fatal error: shared_mutex: No such file or directory
The code in persistent_state.hpp is as follows:
4 #include
5 #include
6 #include <shared_mutex>
A shared_mutex.hpp file is found in the path boost-1.63/include/boost-1_63/boost/thread/pthread/, so I modified the include line to "#include <boost/thread/pthread/shared_mutex.hpp>". But that led to another error related to boost.
So could anyone give me some suggestions?
Thanks.
Upon server reboot, can we restart Clipper automatically?
1. Add restart: always to the docker-compose.yaml file.
2. Can we also move redis storage to a persistent volume in docker-compose? I think I saw a discussion about this on your Jira, but the issue was closed.
services:
  redis:
    volumes:
      - redisdata:/data
Clipper manager should support running Docker commands that require sudo in both local and remote mode. It currently only supports this in remote mode. The relevant code is here and was written this way because Fabric doesn't support running local commands with sudo. We should fix this, but for now one workaround is to add your user to the docker group.
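For reference, a sketch of what local sudo could look like without Fabric (this assumes sudo can prompt on the controlling terminal, and the naive cmd.split() won't handle quoted arguments):

import subprocess

def local_sudo(cmd):
    # Rough local analogue of fabric.api.sudo, e.g. local_sudo("docker ps -a").
    return subprocess.check_output(["sudo"] + cmd.split())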
Clipper returns JSON without a proper Content-Type header. Can we set it to application/json?
The following is a list of peculiarities I found in the python admin API:
1. The clipper_admin python module contains clipper_manager. Is the second "clipper" needed?
2. The register_application function takes a model argument, but must the model exist? It seems (from the demo) that the model does not need to be present when registering it in an application. This should probably raise an error.
3. Are selection_policies active in this release? (See inspect_selection_policy.)
4. external_models (see register external models).
)Q: Is there a concept of a “paused model / application” in Clipper?
Can you “turn off” a model (container) and turn it back on without deleting & re-deploying?
When I connect to clipper, even though I am providing the SSH key, clipper still asks for the password. This only happens once.
cm.Clipper(host, user, key)
Maybe it's something to do with the Fabric api? I am not logging in as root, but the user has sudo permission.
Do you plan to allow for Kubernetes integration instead of using docker-compose / swarm?
Some tests appear to fail when executing the run_unittests.sh script, but in the end it appears that all the tests were executed without any errors (see the screenshot in the issue).
This is also happening in the project build. Is this the expected behavior, or should these tests break the current build?
We should have some documentation on how to implement a model container.
It now takes ~10 minutes to get a container up and running, ready to serve predictions. We should improve that.
Currently, users must supply a uid as part of the prediction request body, but it is never used and must always be 0. We should remove the uid field entirely. The simplest way to do this is to just stop checking for uid when we parse the prediction request JSON and hardcode the uid as 0.
Replace this line with
long uid = 0;
and update the corresponding schema here.
https://github.com/ucbrise/clipper/blob/develop/clipper_admin/clipper_manager.py#L513 requires dpkg-query, a tool only available in Debian-based linux distros. As a result, dockerized deployments of clipper require the use of an ubuntu/debian base, and lighter-weight container distros (e.g. alpine) cannot be used. This definitely impacts container image size and may also introduce performance overhead due to unneeded OS subsystems.
Instead of using fabric to run aws s3, we could use boto to remove this dependence on the OS distro.
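A sketch of the boto-based replacement (boto3 shown; the bucket and key names are placeholders), which drops the aws CLI and with it the dpkg-query dependency:

import boto3

def upload_model_to_s3(local_path, bucket, key):
    # Pure-python upload: no aws CLI, no dpkg-query, works in alpine images.
    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)

# e.g. upload_model_to_s3("/tmp/clipper-models/tf_cifar/1/model.pkl",
#                         "my-clipper-models", "tf_cifar/1/model.pkl")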
I am getting the following error when deploying models
Fatal error: run() received nonzero return code 1 while executing!
Requested: mkdir -p /tmp/clipper-models/mom_predict_model/3
Executed: /bin/bash -l -c "mkdir -p /tmp/clipper-models/mom_predict_model/3"
=============================== Standard output ===============================
mkdir: cannot create directory ‘/tmp/clipper-models/mom_predict_model/3’: Permission denied
================================================================================
Aborting.
How to replicate: sudo reboot the server.
When a model container connects to Clipper, we should check the Redis model table to ensure that the container represents a registered Clipper model. At a minimum, we should log an error message if this is not the case.
When I send a query request with uid=0 to the PREDICT port of the registered application, it works well. But if I assign uid 1 or some other value (i.e. uid=1 or uid=2), it reports the following error. What is the variable uid used for?
No selection state found for query with user_id: 1
The code is similar to cifar_utils.py at L122:
uid = 1
url = "http://%s:1337/%s/predict" % (host, app)
req_json = json.dumps({'uid': uid, 'input': list(x)})  # renamed from 'json' to avoid shadowing the json module
Can you support a different sshd port when initializing clipper?
clipper = cm.Clipper(host, PORT_NUMBER, user, key)
When I specify the port within the IP, redis attempts to connect to an incorrect address:
Could not connect to Redis at XXX:XX:XX:XX:2201:6379: nodename nor servname provided, or not known
If you call deploy_predict_function while running outside of an Anaconda environment, we should still serialize the function and try to run it in the python container. Basically, we should wrap this block in an if-statement that saves and checks the anaconda environment if present, and otherwise prints a warning to the user and just serializes the function.
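The if-statement could look roughly like this (a sketch only: in_anaconda_env is a heuristic, and save_conda_env / serialize_function are hypothetical stand-ins for the existing blocks):

import os
import sys

def in_anaconda_env():
    # Heuristic: conda sets CONDA_DEFAULT_ENV and ships a conda-meta
    # directory next to the interpreter.
    return ("CONDA_DEFAULT_ENV" in os.environ
            or os.path.isdir(os.path.join(sys.prefix, "conda-meta")))

def deploy_predict_function(predict_fn):
    if in_anaconda_env():
        save_conda_env()  # hypothetical: the existing environment export step
    else:
        print("Warning: not running in an Anaconda environment; "
              "deploying the function without an environment snapshot.")
    serialize_function(predict_fn)  # hypothetical: the existing pickling step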
Because of how urllib is being used, the download_cifar script does not work on Python 3, since URLopener is not available since Python 3.3.
Should we add Python 3 support to the files related to the tutorials, or just inform the user that they should run them with Python 2?
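A low-effort fix for the tutorials is the usual dual import, which works on both interpreters (a sketch; the URL below is a placeholder, not the real CIFAR location):

try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2

urlretrieve("https://example.com/cifar_test.data", "cifar_test.data")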
While running the tutorials, I realized that Clipper does not support Python 3. It will be necessary to update the python files to add Python 3 support while also maintaining support for Python 2.
It will also be necessary to check the dependencies of the project. For example, fabric is only available for Python 2.
Clipper works well when one application is deployed. When two applications are deployed, it also works well for requests related to the first deployed application, but it reports the following error for requests related to the second deployed application:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc7fff700 (LWP 59250)]
__GI___pthread_mutex_lock (mutex=0x0) at ../nptl/pthread_mutex_lock.c:66
66 ../nptl/pthread_mutex_lock.c: No such file or directory.
The hard-coded port number will cause collisions when the PRB and master build simultaneously.
I suggest something like this in your run_unittests.sh script:
set +e # turn off exit on command fail
REDIS_PORT=$((34256 + RANDOM % 1000))
lsof -i :$REDIS_PORT &> /dev/null
if [ $? -eq 0 ]; then # existing port in use found
while true; do
REDIS_PORT=$(($REDIS_PORT + RANDOM % 1000))
lsof -i :$REDIS_PORT &> /dev/null
if [ $? -eq 1 ]; then # port not in use
break
fi
done
fi
export REDIS_PORT # if you want to slurp this in to your test_constants before compilation or something
set -e # turn exit on fail back on
Now you need to update your test_constants.hpp file to respect this new port. A couple of ideas: