gremlin-docker's People

Contributors

arajkumar, concaf, fridex, humaton, jmelis, jparsai, jpopelka, miteshvp, msrb, rajusem, sawood14012, shaded-enmity, tuxdna, vpavlin, yzainee, yzainee-zz

gremlin-docker's Issues

Resulting image is huge

The resulting image is huge (~2.7GB) because of how Docker layers work: even though the Dockerfile performs a cleanup at the end, every file is still present in the layer in which it was introduced or downloaded, so the deleted files continue to contribute to the image size.
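A minimal sketch of the usual remedy, assuming the download and the cleanup currently live in separate Dockerfile steps (the commands mirror Step 6 of the build log in the next issue; the rm is the addition): performing download, extraction, and removal in a single RUN keeps the archive out of every committed layer.

    # Hypothetical single-layer variant: download, extract, and delete the
    # archive in one RUN so the tarball never persists in any image layer.
    RUN curl -O https://s3.amazonaws.com/bayesian-titan110/titan-all.tgz \
        && tar xzf titan-all.tgz \
        && rm -f titan-all.tgz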

Docker image fails to build

The following links seem to be dead:
https://s3.amazonaws.com/bayesian-titan110/titan-1.1.0-SNAPSHOT-hadoop2.zip
https://s3.amazonaws.com/bayesian-titan110/titan-all.tgz

Step 5 : RUN curl -o /opt/titan-1.1.0-SNAPSHOT-hadoop2.zip https://s3.amazonaws.com/bayesian-titan110/titan-1.1.0-SNAPSHOT-hadoop2.zip
 ---> Running in 62346995ad28
[curl progress meter omitted; only 307 bytes were received]
 ---> 346f85945912
Removing intermediate container 62346995ad28
Step 6 : RUN curl -O https://s3.amazonaws.com/bayesian-titan110/titan-all.tgz &&    tar xzf titan-all.tgz
 ---> Running in 6c4b477eff81
[curl progress meter omitted; only 307 bytes were received]

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
The command '/bin/sh -c curl -O https://s3.amazonaws.com/bayesian-titan110/titan-all.tgz &&    tar xzf titan-all.tgz' returned a non-zero code: 2
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
Setting status of 3dcbb600fcf3f52e35f53996b6b8d05e13358522 to FAILURE with url https://cucos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/gremlin-docker-PRs/10/ and message: 'Build finished. '
ERROR: script returned exit code 1
Finished: FAILURE
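A hedged observation, not from the issue itself: the 307 bytes curl reports receiving are almost certainly S3's XML error document, which curl saves because it does not treat HTTP errors as failures by default. Adding -f (--fail) would make the build stop at the download step instead of at the later tar step:

    # Suggested hardening: let curl exit non-zero on an HTTP error response,
    # so a dead link fails the download step rather than the tar step.
    RUN curl -fSL -O https://s3.amazonaws.com/bayesian-titan110/titan-all.tgz \
        && tar xzf titan-all.tgz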

Improve Graph Sync

This issue collects the points that could help improve graph writes. There are three broad approaches:

  • Improve the current data_importer service
  1. Run the data_importer gunicorn (HTTP server) process in multi-worker mode (see the sketch after this list)
  2. Run more than one replica of data_importer in OpenShift
  • Split the actual graph-sync process so that other workers are not kept waiting
  1. Call the data_importer/<ingest_to_graph> API asynchronously. (Cons: we may lose the ability to log. Pros: very simple to implement, and no worker is kept waiting.)
  2. Update the workflow so that it does not wait for the Graph Sync task. (Not sure whether this is possible. cc @fridex)
  3. Implement data_importer/<ingest_to_graph> as part of the Selinon tasks. (Cons: implementation might require more time. Pros: all the Selinon-related advantages.)
  • Improve graph writes
  1. Use a single-model instead of a multi-model layout, which helps achieve faster writes. (Cons: the full graph must be rewritten. Pros: writes will be much faster; a small load test is needed to see how much faster.)

cc @msrb @krishnapaparaju @samuzzal-choudhury
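As a hedged illustration of option 1 above, a multi-worker gunicorn invocation: the worker count is an arbitrary choice, the bind port is taken from the gunicorn log in the issue below, and rest_api:app is inferred from the /src/rest_api.py module seen in the same traceback.

    # Hypothetical multi-worker start-up; 4 workers is an arbitrary choice,
    # and "rest_api:app" is inferred from /src/rest_api.py in the traceback.
    gunicorn --workers 4 --bind 0.0.0.0:9192 rest_api:app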

Split Titan and Gremlin - impossible to scale

The current stack looks like this:

DynamoDB - Titan - Gremlin

We would like to split Titan and Gremlin into two standalone containers, which would allow us to scale Gremlin independently. Note that creating multiple Titan instances that talk to the same DynamoDB tables results in data inconsistency and corruption, as stated in "Titan Limitations":

Running multiple Titan instances on one machine backed by the same storage backend (distributed or local) requires that each of these instances has a unique configuration for storage.machine-id-appendix. Otherwise, these instances might overwrite each other leading to data corruption. See Graph Configuration for more information.

Source: http://titan.thinkaurelius.com/wikidoc/0.3.1/Titan-Limitations.html
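A minimal sketch of the per-instance setting the quoted limitation refers to, assuming a properties-file configuration (the values are illustrative):

    # Hypothetical Titan configuration: each instance sharing the same storage
    # backend needs a unique storage.machine-id-appendix value.
    # instance A:
    storage.machine-id-appendix=1
    # instance B:
    storage.machine-id-appendix=2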

Gremlin server cannot be started on the local machine (in Docker)

bayesian-gremlin-http   | 21457 [gremlin-server-boss-1] INFO  org.apache.tinkerpop.gremlin.server.GremlinServer  - Gremlin Server configured with worker thread pool of 1, gremlin pool of 8 and boss thread pool of 1.
bayesian-gremlin-http   | 21458 [gremlin-server-boss-1] INFO  org.apache.tinkerpop.gremlin.server.GremlinServer  - Channel started at port 8182.
data-model-importer_1   | [2017-06-08 15:22:29 +0000] [7] [INFO] Starting gunicorn 19.7.1
data-model-importer_1   | [2017-06-08 15:22:29 +0000] [7] [INFO] Listening at: http://0.0.0.0:9192 (7)
data-model-importer_1   | [2017-06-08 15:22:29 +0000] [7] [INFO] Using worker: sync
data-model-importer_1   | [2017-06-08 15:22:29 +0000] [12] [INFO] Booting worker with pid: 12
data-model-importer_1   | [2017-06-08 15:22:30 +0000] [12] [ERROR] Exception in worker process
data-model-importer_1   | Traceback (most recent call last):
data-model-importer_1   |   File "/usr/lib/python2.7/site-packages/gunicorn/arbiter.py", line 578, in spawn_worker
data-model-importer_1   |     worker.init_process()
data-model-importer_1   |   File "/usr/lib/python2.7/site-packages/gunicorn/workers/base.py", line 126, in init_process
data-model-importer_1   |     self.load_wsgi()
data-model-importer_1   |   File "/usr/lib/python2.7/site-packages/gunicorn/workers/base.py", line 135, in load_wsgi
data-model-importer_1   |     self.wsgi = self.app.wsgi()
data-model-importer_1   |   File "/usr/lib/python2.7/site-packages/gunicorn/app/base.py", line 67, in wsgi
data-model-importer_1   |     self.callable = self.load()
data-model-importer_1   |   File "/usr/lib/python2.7/site-packages/gunicorn/app/wsgiapp.py", line 65, in load
data-model-importer_1   |     return self.load_wsgiapp()
data-model-importer_1   |   File "/usr/lib/python2.7/site-packages/gunicorn/app/wsgiapp.py", line 52, in load_wsgiapp
data-model-importer_1   |     return util.import_app(self.app_uri)
data-model-importer_1   |   File "/usr/lib/python2.7/site-packages/gunicorn/util.py", line 352, in import_app
data-model-importer_1   |     __import__(module)
data-model-importer_1   |   File "/src/rest_api.py", line 23, in <module>
data-model-importer_1   |     if not BayesianGraph.is_index_created():
data-model-importer_1   |   File "/src/graph_manager.py", line 76, in is_index_created
data-model-importer_1   |     status, json_result = cls.execute(str_gremlin_dsl)
data-model-importer_1   |   File "/src/graph_manager.py", line 48, in execute
data-model-importer_1   |     data=json.dumps(payload))
data-model-importer_1   |   File "/usr/lib/python2.7/site-packages/requests/api.py", line 112, in post
data-model-importer_1   |     return request('post', url, data=data, json=json, **kwargs)
data-model-importer_1   |   File "/usr/lib/python2.7/site-packages/requests/api.py", line 58, in request
data-model-importer_1   |     return session.request(method=method, url=url, **kwargs)
data-model-importer_1   |   File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 513, in request
data-model-importer_1   |     resp = self.send(prep, **send_kwargs)
data-model-importer_1   |   File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 623, in send
data-model-importer_1   |     r = adapter.send(request, **kwargs)
data-model-importer_1   |   File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 504, in send
data-model-importer_1   |     raise ConnectionError(e, request=request)
data-model-importer_1   | ConnectionError: HTTPConnectionPool(host='bayesian-gremlin-http', port=8182): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x38ae510>: Failed to establish a new connection: [Errno 111] Connection refused',))
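The traceback suggests data-model-importer probes the Gremlin endpoint before it is reachable and lets the first refused connection kill the worker. A minimal retry sketch, assuming the host, port, and payload shape implied by graph_manager.py (the function name and retry counts are hypothetical):

    # Hypothetical startup guard: keep retrying until the Gremlin HTTP
    # endpoint accepts connections instead of crashing on the first refusal.
    import time
    import requests

    def wait_for_gremlin(url="http://bayesian-gremlin-http:8182",
                         attempts=30, delay=2):
        for _ in range(attempts):
            try:
                # any well-formed query will do; we only care that it connects
                requests.post(url, json={"gremlin": "1+1"}, timeout=5)
                return True
            except requests.exceptions.ConnectionError:
                time.sleep(delay)
        return False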

gremlin-http fails with the following error when assigned the same instance id as one already in use

Caused by: com.thinkaurelius.titan.core.TitanException: A Titan graph with the same instance id [0a8102e51-bayesian-gremlin-http-1-xsjni1] is already open. Might required forced shutdown.
    at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.<init>(StandardTitanGraph.java:146)
    at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:94)
    at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:74)
... 13 more

We have seen this issue happen only very rarely. The root cause is that two Titan graphs with the same instance id cannot be open at the same time; fixing that underlying behaviour is out of our scope (see the workaround sketch below).

Upstream issue: amazon-archives/dynamodb-janusgraph-storage-backend#198
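The exception message itself hints at the workaround: Titan's management API can force-close the stale registration. A hedged sketch for the Gremlin console, using the instance id from the log above:

    // Workaround sketch: evict the stale instance registration via Titan's
    // management API, then commit the management transaction.
    mgmt = graph.openManagement()
    mgmt.getOpenInstances()   // inspect the registered instance ids
    mgmt.forceCloseInstance('0a8102e51-bayesian-gremlin-http-1-xsjni1')
    mgmt.commit()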
