amazon-neptune-samples's Introduction

Amazon Neptune Samples

Samples and documentation for using the Amazon Neptune graph database service

Tools and Utilities

You may also be interested in the Amazon Neptune Tools GitHub repository, which includes tools for data export, conversion, Gremlin client load balancing, and more.

Amazon Neptune Graphs and Jupyter Notebooks

[August 2021] The Neptune-SageMaker examples in this repository have been deprecated in favour of the Amazon Neptune Workbench. We recommend that you use the Workbench for all new Neptune notebook development. Alternatively, you can create your own notebooks using the neptune-python-utils library. Note that neptune-python-utils supports Gremlin Python 3.5.x. As such, it is not compatible with the Neptune Workbench, which currently supports 3.4.x.

Whether you’re creating a new graph data model and queries, or exploring an existing graph dataset, it can be useful to have an interactive query environment that allows you to visualize the results. This directory has samples from two blog posts to show you how to achieve this by connecting an Amazon SageMaker notebook to an Amazon Neptune database. Using the notebook, you load data into the database, query it and visualize the results.
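
As a rough orientation (not the exact code from the blog posts), connecting to Neptune and running a quick query from such a notebook with the gremlinpython driver might look like this; the endpoint is a placeholder.

# Minimal sketch, not the blog posts' exact code: query Neptune from a notebook
# using gremlinpython. The endpoint below is a placeholder.
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.traversal import T

conn = DriverRemoteConnection('wss://your-neptune-endpoint:8182/gremlin', 'g')  # placeholder endpoint
g = traversal().withRemote(conn)

# Quick sanity check after loading data: count vertices by label
print(g.V().groupCount().by(T.label).next())

conn.close()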

Writing to Amazon Neptune from Amazon Kinesis Data Streams

This example demonstrates using an Amazon Kinesis data stream and AWS Lambda to issue batch writes to Amazon Neptune. The code samples use the Gremlin API, but they can be readily adapted for RDF graphs as well.
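
As a rough sketch of the idea (not the repository's actual implementation), a Python Lambda handler that folds a Kinesis batch into a single Gremlin write might look like the following; the endpoint, record format, and property names are assumptions.

# Hypothetical sketch only: fold a Kinesis batch into one Gremlin write.
# Endpoint, record format, and property names are assumptions, not the sample's schema.
import base64
import json

from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

NEPTUNE_ENDPOINT = 'wss://your-neptune-endpoint:8182/gremlin'  # placeholder

def handler(event, context):
    records = event.get('Records', [])
    if not records:
        return {'batchSize': 0}
    conn = DriverRemoteConnection(NEPTUNE_ENDPOINT, 'g')
    g = traversal().withRemote(conn)
    try:
        # Chain one addV per record so the whole batch is a single round trip
        t = g.inject(0)
        for record in records:
            payload = json.loads(base64.b64decode(record['kinesis']['data']))
            t = t.addV(payload['label']).property('id', payload['id'])
        t.iterate()
    finally:
        conn.close()
    return {'batchSize': len(records)}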

ETL Process for Transforming and Loading Data Into Amazon Neptune

The following lab uses the open IMDB dataset, a small subset of the full IMDB.com data. With this dataset, we want to build an application that determines whether an actor or actress is separated from Kevin Bacon by no more than six degrees. In this example, AWS Glue and Amazon Athena are used to discover and transform the relational model used by IMDB into a graph model that can be loaded into Amazon Neptune. This pattern can be used to transform other relational models into graph models for similar purposes.
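
For intuition only, a "degrees of separation" check in Gremlin might look like the sketch below; the labels, property names, example actor, and endpoint are assumptions and may not match the graph model the lab's ETL actually produces.

# Hypothetical sketch: is an actor within six degrees of Kevin Bacon?
# Labels ('Person'), the 'name' property, the second actor, and the endpoint
# are illustrative assumptions, not the lab's actual model.
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __

conn = DriverRemoteConnection('wss://your-neptune-endpoint:8182/gremlin', 'g')  # placeholder
g = traversal().withRemote(conn)

# Six degrees of separation is at most 12 hops when paths alternate actor -> movie -> actor
paths = (g.V().has('Person', 'name', 'Kevin Bacon')
          .repeat(__.both().simplePath())
          .times(12)
          .emit()
          .has('Person', 'name', 'Laurence Fishburne')
          .path()
          .limit(1)
          .toList())
print(paths if paths else 'Not within six degrees')
conn.close()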

Gremlin

Chatbot Full Stack Application

This is an example of using Amazon Neptune in combination with Amazon Comprehend and Amazon Lex to build a full-stack knowledge graph application with natural language processing (NLP) and natural language search capabilities.
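
Purely to illustrate the flow (this is not the application's actual code), entities detected by Amazon Comprehend could be used to drive a graph lookup roughly as follows; the 'Entity' label, 'name' property key, and endpoint are assumptions.

# Hypothetical sketch: use Amazon Comprehend entities to drive a Neptune lookup.
# The 'Entity' label, 'name' property key, and endpoint are assumptions.
import boto3
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

comprehend = boto3.client('comprehend')
utterance = 'Show me everything related to Amazon Neptune'
entities = comprehend.detect_entities(Text=utterance, LanguageCode='en')['Entities']

conn = DriverRemoteConnection('wss://your-neptune-endpoint:8182/gremlin', 'g')  # placeholder
g = traversal().withRemote(conn)

# Look up the neighbours of each detected entity in the knowledge graph
for entity in entities:
    neighbours = g.V().has('Entity', 'name', entity['Text']).both().limit(10).valueMap(True).toList()
    print(entity['Text'], neighbours)

conn.close()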

Collaborative Filtering

This is an example of making recommendations with Gremlin using collaborative filtering. It includes examples of loading data and of the Gremlin traversals involved.
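
To give a flavour of the approach, a "people who like what you like also like…" traversal, closely following the standard TinkerPop recommendation recipe, might look like the sketch below. The 'GamerAlias' property and 'skywalker123' alias are taken from a query quoted in the issues further down; the 'likes' edge label and the endpoint are assumptions.

# Simplified sketch of a collaborative-filtering recommendation, based on the
# standard TinkerPop recipe. 'GamerAlias' and 'skywalker123' come from a query
# quoted in the issues below; the 'likes' edge label and endpoint are assumptions.
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.traversal import Column, Order, P, Scope

conn = DriverRemoteConnection('wss://your-neptune-endpoint:8182/gremlin', 'g')  # placeholder
g = traversal().withRemote(conn)

recommendations = (g.V().has('GamerAlias', 'skywalker123').as_('me')
                    .out('likes').aggregate('myLikes')         # games I already like
                    .in_('likes').where(P.neq('me'))           # other gamers with overlapping taste
                    .out('likes').where(P.without('myLikes'))  # their likes that I don't have yet
                    .groupCount()
                    .order(Scope.local).by(Column.values, Order.desc)
                    .next())
print(recommendations)
conn.close()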

Visualize data in Amazon Neptune using the VIS.js library

This GitHub lab takes you through a hands-on exercise of visualizing graph data in Amazon Neptune using the VIS.js library. Amazon Neptune is a fast, reliable, fully managed graph database service available from AWS. With Amazon Neptune you can use popular open-source graph query languages such as Apache TinkerPop Gremlin for property graph databases or SPARQL for W3C RDF graph databases.

Property Graph Data Models

These examples demonstrate a "working backwards" approach to designing and implementing an application graph data model and queries based on a backlog of use cases.

The design process here is shown in "slow motion", with each use case triggering a revision to the data model, the introduction of new queries, and updates to existing queries. In many real-world design sessions you would compress these activities and address several use cases at once, converging on a model more quickly. The intention here is to unpack the iterative and evolutionary nature of a modelling process that often completes in the "blink of an eye".

RDF & SPARQL

PG-Schema for RDF

An experiment to extend the PG-Schema proposal to work with RDF as well.

More coming soon!

Related samples

Serverless Application with AWS AppSync and Amazon Neptune

You may also be interested in this example serverless application that uses AWS AppSync GraphQL and Amazon Neptune.

License Summary

This sample code is made available under a modified MIT license. See the LICENSE file.

amazon-neptune-samples's People

Contributors

abhishekpradeepmishra, angad02, bechbd, beebs-systap, chriscoombs, colefichter, dependabot[bot], don-simpson, ejazsayyed, ejazsyd, eksoward, iansrobinson, jaypeeig, jpeddicord, krlawrence, kwokmeli, maishsk, michaelnchin, mikelabib, nmoutschen, oranmoshai, rdfguy, triggan, vinodabh, w8r


amazon-neptune-samples's Issues

Gremlin 3.5.4 gives an error from Java code but not from the Gremlin console (connecting to Neptune)

Hi team, I have followed this link https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-gremlin-java.html to create a sample application, and it works fine. But when I add a traversal that adds a vertex, it fails with {"requestId":"xyz","code":"ConstraintViolationException","detailedMessage":"Vertex with id already exists: "} even though that id has not been ingested. The traversal I added is:

g.inject(1).union(__.addV('label').property(T.id, 'uniqueId1').property('prop1','val1')).valueMap(true)

What is surprising is that the same query runs from the Gremlin console without complaining that the vertex already exists.

Is there something in the query that must be changed for it to run from code?

Noticed two Parameters Keys

Hey, I'm working on enabling streams in my Neptune template and checked here for an example of how to implement it. I noticed in the JSON that there are two "Parameters" keys (lines 110 and 115), which seems like a mistake.

Reference:

"NeptuneDBClusterParameterGroup": {
"Type": "AWS::Neptune::DBClusterParameterGroup",
"Properties": {
"Family": "neptune1",
"Description": {
"Fn::Sub": "${ApplicationID} DB cluster parameter group"
},
"Name": {
"Fn::Sub": "${ApplicationID}-cluster-parameter-group"
},
"Parameters": {
"neptune_enable_audit_log": {
"Ref": "NeptuneEnableAuditLog"
}
},
"Parameters": {
"neptune_lab_mode": "Streams=enabled"
},
"Tags": [
{
"Key": "Name",
"Value": "Neptune DB cluster parameter group"
}
]
}
},
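
For reference, if the duplication is unintentional, the likely intent (a guess, not a confirmed fix) would be a single Parameters map containing both settings:

"Parameters": {
    "neptune_enable_audit_log": {
        "Ref": "NeptuneEnableAuditLog"
    },
    "neptune_lab_mode": "Streams=enabled"
},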

[amazon-neptune-and-aws-cdk-for-amundsen] In Customization

Hi Team,

We have an existing AWS account in which we don't have the option to create a new VPC or public subnet. How can we customize this repository for our environment?

Customizations like:

1) Use an existing VPC and its respective prerequisites.
2) How to do all configuration in a public subnet.
3) ECS cluster creation and using ECR for the Docker image, etc.

visualize-graph line 58

I believe that this line should be this:
var PROXY_API_URL = API_GATEWAY_ENDPOINT;

Otherwise when trying to do Step 6, it cannot find anything to replace.

AWS Neptune - Submit Query with empty string crashes

A script with a single space (" ") works fine, but one with an empty string ("") crashes.

gremlin> g.addV("Test").property("title", "Test node 1").property("a", "")
{"requestId":"111xxxx-xxx-xxx-xxx-xxx","code":"MalformedQueryException","detailedMessage":"Query parsing failed at line 1, character position at 62, error message : no viable alternative at input 'g.addV(\"Test\").property(\"title\",\"Test node 1\").property(\"a\",\"\"'"}
Type ':help' or ':h' for help.
Display stack trace? [yN]


gremlin> g.addV("Test").property("title", "Test node 1").property("a", " ")
==>v[98b22f0f-6be0-fb11-38cc-066bf7e17051]

This works fine with Neo4j Gremlin, so I doubt this is a Gremlin issue. Is this a Neptune bug or a feature?

Not able to do bulk load in AWS Neptune from AWS S3

I am not able to do a Neptune bulk load from an AWS S3 bucket. The load fails when I load data from the S3 bucket into Neptune.
Command:
awscurl -X POST --service neptune-db -H 'Content-Type: application/json' --region us-east-2 \
    https://:8182/loader -d '
    {
        "source" : "s3:///Unsaved/2022/12/13/4a873928-9910-47b0-85ca-de593ace4f4a.csv",
        "format" : "csv",
        "iamRoleArn" : "arn:aws:iam::959061167427:role/NeptuneLoadFroms3",
        "region" : "us-east-2",
        "failOnError" : "FALSE"
    }'
Output:
{
    "status" : "200 OK",
    "payload" : {
        "loadId" : "37aeb194-677a-4cdf-a577-dae8684a6681"
    }
}
Command: awscurl --service neptune-db 'https://:8182/loader/37aeb194-677a-4cdf-a577-dae8684a6681' --region us-east-2
Output:
{
    "status" : "200 OK",
    "payload" : {
        "feedCount" : [
            {
                "LOAD_COMPLETED" : 1
            },
            {
                "LOAD_FAILED" : 1
            }
        ],
        "overallStatus" : {
            "fullUri" : "s3://**/Unsaved/2022/12/13/4a873928-9910-47b0-85ca-de593ace4f4a.csv",
            "runNumber" : 1,
            "retryNumber" : 12,
            "status" : "LOAD_FAILED",
            "totalTimeSpent" : 3,
            "startTime" : 1672193300,
            "totalRecords" : 1,
            "totalDuplicates" : 0,
            "parsingErrors" : 1,
            "datatypeMismatchErrors" : 0,
            "insertErrors" : 0
        }
    }
}

Can anyone help me with this?

PySpark Code?

Hello Team,

Is there a way I can use PySpark to extract data from a Neptune database and write it into S3? I know we can write into S3, but my problem is with PySpark. Assistance with this is much appreciated.

Thanks

Add CDK example for setting up a Neptune cluster with a SageMaker Notebook workbench associated to it

General Information
A CDK example of how to set up Neptune with the Notebook workbench would be useful for anyone working with Amazon Neptune.

Proposed Solution
The CDK example would set up the following components to achieve this solution (a rough sketch follows the list):

  • VPC with subnet and security group
  • Neptune Cluster with cluster and db parameters
  • Notebook with an IAM role, custom lifecycle and network association
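
A very rough, hypothetical sketch of these pieces in CDK v2 (Python) is shown below; names and the instance class are placeholders, and the notebook, IAM role, and lifecycle configuration are omitted for brevity.

# Hypothetical CDK v2 (Python) sketch of a Neptune cluster in a new VPC.
# Resource names and the instance class are placeholders; the SageMaker notebook,
# IAM role, and lifecycle configuration pieces are omitted for brevity.
from aws_cdk import App, Stack
from aws_cdk import aws_ec2 as ec2
from aws_cdk import aws_neptune as neptune
from constructs import Construct

class NeptuneWorkbenchStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        vpc = ec2.Vpc(self, 'Vpc', max_azs=2)
        sg = ec2.SecurityGroup(self, 'NeptuneSG', vpc=vpc)
        sg.add_ingress_rule(sg, ec2.Port.tcp(8182), 'Gremlin/SPARQL within the security group')

        subnet_group = neptune.CfnDBSubnetGroup(
            self, 'SubnetGroup',
            db_subnet_group_description='Neptune subnets',
            subnet_ids=[s.subnet_id for s in vpc.private_subnets])

        cluster = neptune.CfnDBCluster(
            self, 'Cluster',
            db_subnet_group_name=subnet_group.ref,
            vpc_security_group_ids=[sg.security_group_id])

        neptune.CfnDBInstance(
            self, 'Instance',
            db_instance_class='db.r5.large',
            db_cluster_identifier=cluster.ref)

app = App()
NeptuneWorkbenchStack(app, 'NeptuneWorkbenchStack')
app.synth()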

Acknowledge

  • I may be able to implement this feature request

Remove instructions for public read access to S3 buckets

We should remove the public read access instructions for the S3 bucket.

--create Amazon S3 bucket with public read access
aws s3api create-bucket --bucket --acl public-read --region --create-bucket-configuration LocationConstraint=

This should instead be done using a CloudFront distribution with a policy that restricts access to the CloudFrontOriginAccessIdentity.

SDK reference in CloudFormation template not available anymore

Hello,

I was trying to connect a notebook to an existing Neptune instance, following the Analyze Amazon Neptune Graphs using Amazon SageMaker Jupyter Notebooks tutorial (the part "What if I want to reuse an existing Neptune cluster with SageMaker?").

The CloudFormation template references an Eclipse RDF4J SDK version that is no longer available, so the stack fails when trying to download it.

" wget https://ftp.osuosl.org/pub/eclipse/rdf4j/eclipse-rdf4j-2.3.2-sdk.zip\n",

I used version 2.4.6 instead and everything went fine, but you may want to consider 2.5.x or 3.x versions.

Thanks for the great tutorial.

Cannot run the event loop while another loop is running

I had to build neptune_python_utils on Python 3.6.

I am running this on a SageMaker Jupyter notebook.

neptune_endpoint = 'neptunecluster.cluster-cghdntee9kjh.us-east-1.neptune.amazonaws.com'
neptune_port = 8182
neptune.clear(neptune_endpoint=neptune_endpoint, neptune_port=neptune_port)

clearing data...
clearing property graph data [edge_batch_size=200, edge_count=Unknown]...

RuntimeError Traceback (most recent call last)
in ()
----> 1 neptune.clear(neptune_endpoint=neptune_endpoint, neptune_port=neptune_port)

~/SageMaker/util/neptune.py in clear(self, neptune_endpoint, neptune_port, batch_size, edge_batch_size, vertex_batch_size)
60 def clear(self, neptune_endpoint=None, neptune_port=None, batch_size=200, edge_batch_size=None, vertex_batch_size=None):
61 print('clearing data...')
---> 62 self.clearGremlin(neptune_endpoint, neptune_port, batch_size, edge_batch_size, vertex_batch_size)
63 self.clearSparql(neptune_endpoint, neptune_port)
64 print('done')

~/SageMaker/util/neptune.py in clearGremlin(self, neptune_endpoint, neptune_port, batch_size, edge_batch_size, vertex_batch_size)
77 else:
78 print('clearing property graph data [edge_batch_size={}, edge_count={}]...'.format(edge_batch_size, edge_count))
---> 79 g.E().limit(edge_batch_size).drop().toList()
80 edge_count = g.E().count().next()
81 has_edges = (edge_count > 0)

~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/process/traversal.py in toList(self)
56
57 def toList(self):
---> 58 return list(iter(self))
59
60 def toSet(self):

~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/process/traversal.py in next(self)
46 def next(self):
47 if self.traversers is None:
---> 48 self.traversal_strategies.apply_strategies(self)
49 if self.last_traverser is None:
50 self.last_traverser = next(self.traversers)

~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/process/traversal.py in apply_strategies(self, traversal)
571 def apply_strategies(self, traversal):
572 for traversal_strategy in self.traversal_strategies:
--> 573 traversal_strategy.apply(traversal)
574
575 def apply_async_strategies(self, traversal):

~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/driver/remote_connection.py in apply(self, traversal)
147 def apply(self, traversal):
148 if traversal.traversers is None:
--> 149 remote_traversal = self.remote_connection.submit(traversal.bytecode)
150 traversal.remote_results = remote_traversal
151 traversal.side_effects = remote_traversal.side_effects

~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/driver/driver_remote_connection.py in submit(self, bytecode)
53
54 def submit(self, bytecode):
---> 55 result_set = self._client.submit(bytecode)
56 results = result_set.all().result()
57 side_effects = RemoteTraversalSideEffects(result_set.request_id, self._client,

~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/driver/client.py in submit(self, message, bindings)
109
110 def submit(self, message, bindings=None):
--> 111 return self.submitAsync(message, bindings=bindings).result()
112
113 def submitAsync(self, message, bindings=None):

~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/driver/client.py in submitAsync(self, message, bindings)
125 message.args.update({'bindings': bindings})
126 conn = self._pool.get(True)
--> 127 return conn.write(message)

~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/driver/connection.py in write(self, request_message)
53 def write(self, request_message):
54 if not self._inited:
---> 55 self.connect()
56 request_id = str(uuid.uuid4())
57 result_set = resultset.ResultSet(queue.Queue(), request_id)

~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/driver/connection.py in connect(self)
43 self._transport.close()
44 self._transport = self._transport_factory()
---> 45 self._transport.connect(self._url, self._headers)
46 self._protocol.connection_made(self._transport)
47 self._inited = True

~/anaconda3/envs/python3/lib/python3.6/site-packages/gremlin_python/driver/tornado/transport.py in connect(self, url, headers)
34 url = httpclient.HTTPRequest(url, headers=headers)
35 self._ws = self._loop.run_sync(
---> 36 lambda: websocket.websocket_connect(url))
37
38 def write(self, message):

~/anaconda3/envs/python3/lib/python3.6/site-packages/tornado/ioloop.py in run_sync(self, func, timeout)
569 self.stop()
570 timeout_handle = self.add_timeout(self.time() + timeout, timeout_callback)
--> 571 self.start()
572 if timeout is not None:
573 self.remove_timeout(timeout_handle)

~/anaconda3/envs/python3/lib/python3.6/site-packages/tornado/platform/asyncio.py in start(self)
130 self._setup_logging()
131 asyncio.set_event_loop(self.asyncio_loop)
--> 132 self.asyncio_loop.run_forever()
133 finally:
134 asyncio.set_event_loop(old_loop)

~/anaconda3/envs/python3/lib/python3.6/asyncio/base_events.py in run_forever(self)
410 if events._get_running_loop() is not None:
411 raise RuntimeError(
--> 412 'Cannot run the event loop while another loop is running')
413 self._set_coroutine_wrapper(self._debug)
414 self._thread_id = threading.get_ident()

RuntimeError: Cannot run the event loop while another loop is running

Not able to get "status" : "200 OK" for Bulk Load API Call

I am not able to do a Neptune bulk load from AWS S3 using a curl Bulk Load API call.
Command:
curl -X POST \
    -H 'Content-Type: application/json' \
    https://*.cluster-c4brigvg3m9m.us-east-2.neptune.amazonaws.com:8182/loader -d '
    {
        "source" : "s3:///Unsaved/2022/12/13/4a873928-9910-47b0-85ca-de593ace4f4a.csv",
        "format" : "csv",
        "iamRoleArn" : "arn:aws:iam::959061167427:role/NeptuneLoadFroms3",
        "region" : "us-east-2",
        "failOnError" : "FALSE",
    }'

This is the error I am getting:
{"code":"AccessDeniedException","requestId":"f6243cd3-2a4f-48a2-9d91-13803c199ef1","detailedMessage":"Missing Authentication Token"}

Can you please help me understand why I am getting this error and how I can resolve it?

It is not clear how to make collaborative filtering example for many users

The collaborative filtering example is good, but for a real app we would have to make millions of requests, spending a lot of time going back and forth between the app and Neptune. How can we change this example to handle many users in a single query?

g.V().has('GamerAlias','skywalker123')

How can we make a query that gets the top N for many users, not only for skywalker123?
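
Not an authoritative answer, but one possible direction (a simplified sketch that reuses the assumed 'likes' schema and does not yet exclude each user's own likes) is to start from several user vertices at once and group the counts per user:

# Hypothetical sketch only: per-user counts for a batch of users in one traversal.
# 'GamerAlias' comes from the query above; the 'likes' edge label and the endpoint
# are assumptions, and the traversal does not yet filter out each user's own likes.
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.traversal import P

conn = DriverRemoteConnection('wss://your-neptune-endpoint:8182/gremlin', 'g')  # placeholder
g = traversal().withRemote(conn)

aliases = ['skywalker123', 'another_alias', 'yet_another_alias']  # batch of users

# One count map per user: alias -> {game vertex: how often similar users like it}
per_user = (g.V().has('GamerAlias', P.within(*aliases))
             .group()
             .by('GamerAlias')
             .by(__.out('likes').in_('likes').out('likes').groupCount())
             .next())

# Sort client-side to get a top-N list per user
for alias, counts in per_user.items():
    top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:5]
    print(alias, top)

conn.close()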

Is this dataset still available and in which zones?

I'm testing the integration of our application with Amazon Neptune and I would like to load this dump onto my cluster.

My Neptune cluster is in eu-west-1.
NOTE: I'm not an AWS expert.

I've got an EC2 instance from which I'm trying to load the dump into my cluster.
I ran this command:

curl -X POST \
    -H 'Content-Type: application/json' \
    http://mycluster.end.point:8182/loader -d '
    {
      "source" : "s3://neptune-data-ml/recommendation/",
      "accessKey" : "my arn",
      "secretKey" : "my secret key",
      "format" : "csv",
      "region" : "us-east-1", 
      "failOnError" : "FALSE"
    }'

It fails.

Is the dump available in eu-west-1 too?
Is the dump still available?

Thank you in advance.

Not able to load data using Neptune bulk load

The submit API call initiates successfully with status 200 OK.
But when I monitor the loading process using this command, I get this error:
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://*****:8182/loader/33aaf4f9-54b5-4f7b-8b39-b00cfb379397
