gremlin-docker's Issues
Resulting image is huge
The resulting image is huge (~2.7GB) because of how layers accumulate. Even though a cleanup is done at the end of the Dockerfile, all files are still present in the layers where they were introduced/downloaded; deleting them in a later layer does not reclaim the space.
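Because image layers are additive, one common fix is to download, extract, and clean up inside a single RUN instruction so the intermediate archive never lands in any layer. This is only a sketch mirroring the URL from the build log below; the unpack destination and exact steps are assumptions:

```dockerfile
# Download, unpack, and clean up in ONE layer so the archive
# never persists in the final image.
RUN curl -o /tmp/titan.zip https://s3.amazonaws.com/bayesian-titan110/titan-1.1.0-SNAPSHOT-hadoop2.zip \
    && unzip /tmp/titan.zip -d /opt \
    && rm -f /tmp/titan.zip
```

A multi-stage build (downloading in one stage and COPYing only the unpacked tree into the final stage) would achieve the same result.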
Docker image fails to build
The following links seem to be dead:
https://s3.amazonaws.com/bayesian-titan110/titan-1.1.0-SNAPSHOT-hadoop2.zip
https://s3.amazonaws.com/bayesian-titan110/titan-all.tgz
Step 5 : RUN curl -o /opt/titan-1.1.0-SNAPSHOT-hadoop2.zip https://s3.amazonaws.com/bayesian-titan110/titan-1.1.0-SNAPSHOT-hadoop2.zip
---> Running in 62346995ad28
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   307    0   307    0     0    535      0 --:--:-- --:--:-- --:--:--   535
 ---> 346f85945912
Removing intermediate container 62346995ad28
Step 6 : RUN curl -O https://s3.amazonaws.com/bayesian-titan110/titan-all.tgz && tar xzf titan-all.tgz
---> Running in 6c4b477eff81
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   307    0   307    0     0    181      0 --:--:--  0:00:01 --:--:--   181

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
The command '/bin/sh -c curl -O https://s3.amazonaws.com/bayesian-titan110/titan-all.tgz && tar xzf titan-all.tgz' returned a non-zero code: 2
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
Setting status of 3dcbb600fcf3f52e35f53996b6b8d05e13358522 to FAILURE with url https://cucos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/gremlin-docker-PRs/10/ and message: 'Build finished. '
ERROR: script returned exit code 1
Finished: FAILURE
Improve Graph Sync
This issue collates all the points that could help improve graph writes. There are three ways to approach the problem.
- Improve the current data_importer service
  - Run the data_importer gunicorn (HTTP server) process in multi-worker mode
  - Run more than one replica of data_importer in OpenShift
- Split the actual graph sync process so that other workers are not kept waiting
  - Call the data_importer/<ingest_to_graph> API asynchronously. (Cons: we may lose logging ability. Pros: very simple to implement, and no worker is kept waiting.)
  - Update the workflow to not wait for the Graph Sync task. (Not sure whether this is possible. cc @fridex)
  - Implement data_importer/<ingest_to_graph> as a Selinon task. (Cons: implementation might require more time. Pros: all the Selinon-related advantages.)
- Improve graph writes
  - Use a single-model instead of a multi-model schema, which allows faster writes. (Cons: requires rewriting the full graph. Pros: writes will be much faster; a small load test is needed to measure how much faster.)
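To illustrate the asynchronous-call option above, a minimal fire-and-forget wrapper can hand the HTTP call to a thread pool so the caller returns immediately. This is a sketch, not the actual data_importer code; `post_to_graph` is a hypothetical callable (e.g. wrapping `requests.post` against the ingest endpoint):

```python
from concurrent.futures import ThreadPoolExecutor

# Shared pool; callers submit work and return without waiting.
_executor = ThreadPoolExecutor(max_workers=4)

def ingest_async(payload, post_to_graph):
    """Fire-and-forget call to the graph ingestion endpoint.

    `post_to_graph` is a hypothetical callable standing in for the
    real HTTP call to data_importer/<ingest_to_graph>.  The returned
    Future can be inspected later, which partially mitigates the
    logging downside noted above (e.g. via add_done_callback).
    """
    return _executor.submit(post_to_graph, payload)
```

Attaching a done-callback that logs failures would recover most of the logging ability listed as a con.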
Split Titan and Gremlin - impossible to scale
The current stack looks like the following:
DynamoDB - Titan - Gremlin
We would like to split Titan and Gremlin into two standalone containers which would allow us to scale Gremlin independently. Note that creating multiple Titan instances talking to the same DynamoDB tables results in data inconsistencies and data corruptions as stated in "Titan Limitations":
Running multiple Titan instances on one machine backed by the same storage backend (distributed or local) requires that each of these instances has a unique configuration for storage.machine-id-appendix. Otherwise, these instances might overwrite each other leading to data corruption. See Graph Configuration for more information.
Source: http://titan.thinkaurelius.com/wikidoc/0.3.1/Titan-Limitations.html
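If multiple Titan instances against the same backend ever become necessary despite the limitation above, the quoted documentation requires a distinct storage.machine-id-appendix per instance. A sketch of per-instance properties (the backend class and values here are illustrative, not taken from our deployment):

```properties
# titan-instance-1.properties (illustrative values)
storage.backend=com.amazon.titan.diskstorage.dynamodb.DynamoDBStoreManager
# Must be unique for every instance sharing the same storage backend,
# otherwise instances may overwrite each other and corrupt data.
storage.machine-id-appendix=1
```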
Gremlin Server cannot be started on a local machine (in Docker)
bayesian-gremlin-http | 21457 [gremlin-server-boss-1] INFO org.apache.tinkerpop.gremlin.server.GremlinServer - Gremlin Server configured with worker thread pool of 1, gremlin pool of 8 and boss thread pool of 1.
bayesian-gremlin-http | 21458 [gremlin-server-boss-1] INFO org.apache.tinkerpop.gremlin.server.GremlinServer - Channel started at port 8182.
data-model-importer_1 | [2017-06-08 15:22:29 +0000] [7] [INFO] Starting gunicorn 19.7.1
data-model-importer_1 | [2017-06-08 15:22:29 +0000] [7] [INFO] Listening at: http://0.0.0.0:9192 (7)
data-model-importer_1 | [2017-06-08 15:22:29 +0000] [7] [INFO] Using worker: sync
data-model-importer_1 | [2017-06-08 15:22:29 +0000] [12] [INFO] Booting worker with pid: 12
data-model-importer_1 | [2017-06-08 15:22:30 +0000] [12] [ERROR] Exception in worker process
data-model-importer_1 | Traceback (most recent call last):
data-model-importer_1 | File "/usr/lib/python2.7/site-packages/gunicorn/arbiter.py", line 578, in spawn_worker
data-model-importer_1 | worker.init_process()
data-model-importer_1 | File "/usr/lib/python2.7/site-packages/gunicorn/workers/base.py", line 126, in init_process
data-model-importer_1 | self.load_wsgi()
data-model-importer_1 | File "/usr/lib/python2.7/site-packages/gunicorn/workers/base.py", line 135, in load_wsgi
data-model-importer_1 | self.wsgi = self.app.wsgi()
data-model-importer_1 | File "/usr/lib/python2.7/site-packages/gunicorn/app/base.py", line 67, in wsgi
data-model-importer_1 | self.callable = self.load()
data-model-importer_1 | File "/usr/lib/python2.7/site-packages/gunicorn/app/wsgiapp.py", line 65, in load
data-model-importer_1 | return self.load_wsgiapp()
data-model-importer_1 | File "/usr/lib/python2.7/site-packages/gunicorn/app/wsgiapp.py", line 52, in load_wsgiapp
data-model-importer_1 | return util.import_app(self.app_uri)
data-model-importer_1 | File "/usr/lib/python2.7/site-packages/gunicorn/util.py", line 352, in import_app
data-model-importer_1 | __import__(module)
data-model-importer_1 | File "/src/rest_api.py", line 23, in <module>
data-model-importer_1 | if not BayesianGraph.is_index_created():
data-model-importer_1 | File "/src/graph_manager.py", line 76, in is_index_created
data-model-importer_1 | status, json_result = cls.execute(str_gremlin_dsl)
data-model-importer_1 | File "/src/graph_manager.py", line 48, in execute
data-model-importer_1 | data=json.dumps(payload))
data-model-importer_1 | File "/usr/lib/python2.7/site-packages/requests/api.py", line 112, in post
data-model-importer_1 | return request('post', url, data=data, json=json, **kwargs)
data-model-importer_1 | File "/usr/lib/python2.7/site-packages/requests/api.py", line 58, in request
data-model-importer_1 | return session.request(method=method, url=url, **kwargs)
data-model-importer_1 | File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 513, in request
data-model-importer_1 | resp = self.send(prep, **send_kwargs)
data-model-importer_1 | File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 623, in send
data-model-importer_1 | r = adapter.send(request, **kwargs)
data-model-importer_1 | File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 504, in send
data-model-importer_1 | raise ConnectionError(e, request=request)
data-model-importer_1 | ConnectionError: HTTPConnectionPool(host='bayesian-gremlin-http', port=8182): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x38ae510>: Failed to establish a new connection: [Errno 111] Connection refused',))
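The traceback shows data_importer probing Gremlin at import time and dying on the first connection refusal: a startup-ordering race, since gunicorn boots before Gremlin Server is listening on 8182. A minimal retry loop, sketched here with a generic `probe` callable (the real probe would POST a trivial query to the bayesian-gremlin-http endpoint), would tolerate the race:

```python
import time

def wait_for(probe, attempts=30, delay=2.0):
    """Poll `probe` until it returns True or attempts run out.

    `probe` is any zero-argument callable returning bool; here it
    stands in for a health check against bayesian-gremlin-http:8182.
    Connection errors are treated as "not ready yet".
    """
    for _ in range(attempts):
        try:
            if probe():
                return True
        except Exception:
            pass  # backend not up yet; retry after the delay
        time.sleep(delay)
    return False
```

Calling something like `wait_for(gremlin_ready)` before `BayesianGraph.is_index_created()` in rest_api.py would let the worker survive a slow Gremlin startup instead of crashing.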
gremlin-http fails with the following error when assigned the same instance id as one previously in use
Caused by: com.thinkaurelius.titan.core.TitanException: A Titan graph with the same instance id [0a8102e51-bayesian-gremlin-http-1-xsjni1] is already open. Might required forced shutdown.
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.<init>(StandardTitanGraph.java:146)
at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:94)
at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:74)
... 13 more
We have seen this issue happen only very rarely. The root cause is that two Titan graph instances with the same instance id cannot coexist. Fixing that is out of our scope; it is tracked upstream.
Issue Upstream - amazon-archives/dynamodb-janusgraph-storage-backend#198
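One possible mitigation, should this recur, is to give each pod an explicit, unique instance id. The key name below is taken from Titan's configuration reference, but whether this build honors it, and how the value is templated per pod (e.g. from the pod name at container start), are assumptions to verify:

```properties
# conf/titan.properties (sketch; the value must be unique per pod,
# e.g. derived from the OpenShift pod name shown in the error above)
graph.unique-instance-id=bayesian-gremlin-http-1-xsjni1
```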