
z5labs / megamind



An eventually-consistent Knowledge Graph construction system

License: Apache License 2.0

Starlark 2.81% Go 12.31% HCL 84.89%

go golang knative knowledge-graph kubernetes grpc

megamind's Introduction

An eventually consistent Knowledge Graph construction service.

Megamind is a system for constructing Knowledge Graphs through subgraphs. The main objective of Megamind is to provide a modern, resilient, cloud agnostic service for ingesting data into your Knowledge Graph.

Features

Modern

Megamind implements multiple API endpoints, allowing you to quickly and easily integrate it into your current stack. The following endpoints are currently supported:

RESTful endpoint
gRPC endpoint

Cloud Agnostic

Megamind is built solely on Kubernetes and Knative, meaning you can deploy Megamind anywhere Kubernetes is supported, such as:

AWS EKS
Google Cloud GKE
Azure AKS
and, of course, your own private infrastructure

Resilient

Megamind leverages Kubernetes and Knative to implement a resilient, event-driven architecture.

Install with Terraform

TBD

Install with Helm

TBD

Developers

Please see Contributing to Megamind for guidelines on contributions.


megamind's Issues

build(service/subgraph-ingester): add bazel target for container image

Use bazel to build a container image for the subgraph-ingester service, which can later be published to a public image registry.

infra(knative/serving): config sslip.io dns for local testing support

feat(terraform): create megamind module and k8s namespace

story(infra): add module for deploying neo4j

Description

As megamind, I want to support Neo4j as the Knowledge Graph datastore, so that Neo4j customers can easily adopt megamind.

Acceptance Criteria

A k8s yaml is created for deploying it
A terraform module is created for deploying it

Related Issues

No response

story(knative/eventing): use kafka broker and channels

Description

As megamind, I want to rely on a performant, scalable, and reliable event broker and channels, so that the risk of eventing failing is low.

Acceptance Criteria

Kafka Broker is deployable via k8s yaml
Kafka Broker is deployable via terraform
Kafka Channel is deployable via k8s yaml
Kafka Channel is deployable via terraform

Related Issues

No response

docs(megamind): forgot terraform as a prereq for contributing

deploy(service/entity-registry): use terraform to deploy to a k8s cluster

OCI image is deployed to ghcr.io
rbac policies are configured for accessing appropriate resources
entity-registry is registered with Knative serving

story(megamind): implement ingest service direct to dgraph

Description

As a dev, I want to simply implement an ingest service which goes directly to Dgraph, so that I can quickly begin using megamind and gain some insights.

Acceptance Criteria

it works

Related Issues

No response

story(service/subgraph-ingester): add missing license headers

Description

As the open source maintainer, I want to make sure every file has a license header, so that anyone trying to use this work can clearly identify copyrighted material.

Acceptance Criteria

License headers are added to files where they're missing

Related Issues

No response

story(service/graph-mutator): neo4j support

Description

As the graph-mutator, I want to support Neo4j as a datastore, so that existing Neo4j customers can quickly adopt megamind for their Knowledge Graph projects.

Acceptance Criteria

Should map labelled subgraph into Neo4j format
Should receive Neo4j mutation response with node ids for any newly created nodes

Related Issues

depends: #62

docs(arch/megamind): make repairer service non-cron-job based

By leveraging etcd as the KV store, the Repairer service can use etcd watch streams to fix disjointedness in real time.

story(build): migrate to bazel modules

Description

As a maintainer, I want to migrate from bazel workspaces to bazel modules, so that greater reproducibility can be achieved.

Acceptance Criteria

build via bazel modules

Related Issues

No response

idea(contributing): introduce agile practices

Description

Let's better formalize our workflow process by using "story" issues. A "story" issue is basically just an issue template that follows the "standards" of an agile user story. For example, the template would at minimum require the following:

brief user story description
acceptance criteria for recognizing when it's complete
link(s) to issue(s) which caused the creation of this story
story points (probably best to do this through a label)

The flow for creating a "story" issue would be to start with a bug or feature issue; upon review of that issue, a "story" issue would be created to encompass the work required to address the bug or feature.

My thoughts

I believe this will help us better track work, since all work items will now be standardized into stories.
I believe this will help new contributors onboard quickly, since they won't have to read random issues. Instead, they can filter for stories which have already been reviewed and should contain all the information necessary to complete them.
Story points can also help new contributors determine whether they're able to handle the work.

feat(infra): use terraform to config knative on k8s

story(infra): deploy etcd into megamind namespace

Description

As megamind, I want to use a distributed cache to store the database ids for known nodes, so that disjointedness in the Knowledge Graph can be minimized.

Acceptance Criteria

etcd is deployable using megamind k8s yaml
etcd is deployable using megamind terraform

Related Issues

No response

chore(git): ignore tools directory used in some editors to support bazel and go

story(tool/megamind): fmt subgraphs command

Description

As an end user, I want a cli for reformatting subgraphs, so that I can use it in quick ad hoc scripts or real scripts to do things like convert from JSON (human readable) to Proto (not human readable).

Acceptance Criteria

Subgraphs can be read from either a file or stdin
Subgraphs are expected to be newline separated
Multiple input/output formats should be supported: JSON, Proto
Newly formatted subgraphs should be written to stdout

Related Issues

No response

story(tool/megamind): read subgraphs for ingestion

Description

As a consumer, I want a cli for ingesting subgraphs to Megamind, so that I can leverage it in scripts or while demoing/poc-ing.

Acceptance Criteria

Subgraphs can be read from either a file or stdin
Multiple encodings should be supported: JSON, Proto

Related Issues

No response

chore(codeowners): setup codeowners

Create CODEOWNERS file with at least the following assignments:

@erictg should be responsible for any k8s yaml config/terraform
@Zaba505 should be responsible for any bazel config

story(github-pages): initialize site

Description

As an end user, I want to interact with a modern product/documentation site instead of reading through a GitHub repo, so that I (possibly a non-technical person) can find details and answers to my questions more quickly.

Acceptance Criteria

Landing page
Docs page roughed out with "Under construction"
Link to github repo
Build and deploy using a GitHub Actions workflow

Related Issues

No response

story(service/subgraph-ingester): publish subgraphs to knative

Description

As the subgraph-ingester, I want to publish client subgraphs to Knative, so that downstream event-driven components can resiliently and performantly handle the ingestion of the data into the Knowledge Graph.

Acceptance Criteria

Max event size is identified
Subgraphs are broken down into smaller subgraphs if they're greater than the max event size
Subgraphs are successfully published

Related Issues

depends: #

story(tool/megamind): topologically sort subgraphs for ingestion

Description

As a consumer, I want a cli for ingesting subgraphs to Megamind, so that I can leverage it in scripts or while demoing/poc-ing.

Acceptance Criteria

Subgraphs should be merged into one subgraph
Line graph should be constructed from subgraph
Line graph should be topologically sorted
Triples should be ingested in reverse topological order

Related Issues

No response

feat(proto/subgraph): define a "labeled" subgraph

Create a new protobuf message which is a subgraph but also allows database ids to be attached to Subjects.

story(cicd): migrate away from deprecated set-output function

Description

As a maintainer, I want to update the GitHub Actions workflow to no longer use set-output, so that the workflow won't break once GitHub completely removes it.

Acceptance Criteria

set-output is replaced with appending a key-value pair to $GITHUB_OUTPUT

Related Issues

No response

story(deps): enable renovate bot for automated dependency updates

Description

As a maintainer, I want to enable renovate bot, so that dependencies can be automatically updated.

Acceptance Criteria

it works

Related Issues

No response

story(codeowners): assign various teams to certain file types

Description

As a project maintainer, I want to be properly assigned to PRs for review, so that they can hopefully be approved sooner rather than later.

Acceptance Criteria

Each file type in the repo has a team corresponding to it

Related Issues

implements: #48

story(service/graph-mutator): base implementation

Description

As megamind, I want an event-driven service to ingest labelled subgraphs into the Knowledge Graph, so that the overall system resiliency and performance can scale effortlessly.

Acceptance Criteria

Trigger on labelled subgraph event
Handle cloudevents with labelled subgraph as event data
Log out event details
DO NOT DO CACHING
DO NOT DO ACTUAL DUMPING INTO A DB

Related Issues

No response

story(service/entity-registry): base implementation

Description

As megamind, I want to implement an event-driven service for auto-labeling known nodes with their database ids, so that disjointedness within the knowledge graph can be minimized.

Acceptance Criteria

Trigger on unlabelled subgraph event
Handle cloudevent with subgraph as event data
Publish labelled subgraph event
Do not implement cache lookup

Related Issues

No response

story(prototype): define event names

Description

As megamind, I want to standardize and define the events emitted/consumed by each service, so that each service can be developed independently without the risk of having to be changed in the near future because event names were chosen willy-nilly in the beginning.

Acceptance Criteria

A "standard" naming convention is defined
At minimum, all events for prototype milestone should be defined

Related Issues

No response

docs(megamind): contributing instructions

chore(license): add an open source license

docs(contributing): create story issue template

Based on discussion #54, a more agile development approach is being taken for managing Megamind. Part of that will be using specific issue types, called stories, to track work. Reiterating from discussion #54, the story creation process will flow as follows:

1. A user creates a generic issue (ideally following the conventional commit styling)
   a. Project members who are experienced in our agile flow may skip this and instead head straight to step 2
2. A project member reviews the generic issue and, if it's determined to add value, creates a story issue to replace it.
3. The new story will be assigned to the Megamind Refinement project.
   a. Project members will then "regularly" meet to refine the stories until they believe anyone can read the story and complete the work
4. Once the story is ready, it will be transitioned to the main project board, Megamind, and assigned to one of the milestones
5. At this point, the story will remain in the project backlog until someone decides to pick it up in a sprint

infra(knative): deploy eventing and serving in megamind namespace

Synopsis

Currently, the Knative eventing and serving components which act as the backbone for megamind are deployed in their own namespaces. My proposal is to deploy all Knative dependencies under the megamind namespace with the megamind helm chart/terraform module.

Reason for separate namespace

Keeping the Knative dependencies out of megamind allows for better resource sharing on clusters that already have Knative installed. It also allows users to customize Knative for broader use cases.

Reason for including in megamind namespace

Bundling the Knative dependencies into megamind gives us better control over tweaking Knative to optimize megamind's performance. It also greatly simplifies deploying megamind, since users won't have to configure Knative themselves.

feat(service/subgraph-ingester): implement a RESTful api

deploy(service/subgraph-ingester): use terraform to deploy to a k8s cluster

Deploy using knative serving

deploy(service/subgraph-ingester): publish container image to ghcr

Publish container image to GitHub Container Registry

infra(knative): add terraform module for deploying knative serving

story(proto): generate go code for protobuf

Description

As a contributor, I want the proto files to be generated into go code, so that code completion works in my dev environment.

Acceptance Criteria

All proto files are generated

Related Issues

No response

build(megamind): setup github actions workflow using bazel

feat(service/subgraph-ingester): implement a grpc api for clients to stream subgraphs to

deploy(service/graph-mutator): use terraform to deploy to a k8s cluster

story(build): enable codeql

Description

As a maintainer, I want to enable CodeQL for code scanning, so that code quality improves and easy mistakes can hopefully be avoided.

Acceptance Criteria

it works

Related Issues

No response

story(roadmap): add roadmap file

Description

As an end user, I want to know what the future of Megamind is, so that I can appropriately evaluate if it fits my use case or will fit in a future release.

Acceptance Criteria

All milestones up to and including V2 are documented

Related Issues

No response

story(service/graph-mutator): publish node ids for caching

Description

As megamind, I want to cache node ids from the Knowledge Graph datastore to help minimize disjointedness.

Acceptance Criteria

Event is successfully published to channel

Related Issues

depends: #59

story(tool/megamind): rough out cli

Description

As a consumer, I want a cli tool for working with Megamind, so that I can leverage it in scripts or while demoing/poc-ing.

Acceptance Criteria

The root command should be defined
Global logging flags should be defined

Related Issues

No response

story(service/entity-registry): check etcd for any known nodes

Description

As megamind, I want to read database ids of known nodes from a cache, so that disjointedness in the Knowledge Graph downstream can be minimized.

Acceptance Criteria

Database id can be retrieved from cache using the TUID (type unique id) from the unlabelled subgraph
Database id is added to labelled subgraph

Related Issues

depends: #60

docs(arch/simple): create simple arch to explain issues with it
