

Open Saves

GoPkg Widget Go Report Card License GitHub release

Open Saves is an open-source, purpose-built single interface for multiple storage backends on Google Cloud.

With Open Saves, game developers can run a cloud-native storage system that is:

  • Simple: Open Saves provides a unified, well-defined gRPC endpoint for all operations for metadata, structured, and unstructured objects.
  • Fast: With a built-in caching system, Open Saves optimizes data placements based on access frequency and data size, all to achieve both low latency for smaller binary objects and high throughput for big objects.
  • Scalable: The Open Saves API server can run on either Google Kubernetes Engine, or Cloud Run. Both platforms can scale out to handle hundreds of thousands of requests per second. Open Saves also stores data in Google Datastore and Cloud Storage, and can handle hundreds of gigabytes of data.


Disclaimer

This software is currently beta, and subject to change. It is not yet ready to serve production workloads.

Code of Conduct

Participation in this project comes under the Contributor Covenant Code of Conduct.

License

Apache 2.0

Contributors

ano12ak, dependabot[bot], hongalex, irataxy, karenarialin, lorangf, piotr-mpg, thisisnotapril, vasconcelosvcd, yuryu, zaratsian, zurvarian


Issues

Architecture and terminology

Here's the overall architecture of Triton:

[Architecture diagram]

In scope:

  • Triton server
  • Client SDKs for Java and Go

Out of scope:

  • Game servers and/or gateway servers for each game service
  • Game clients and direct access from game clients

Terminology

  • User: developers who want to build services with Triton
  • Server: the Triton server running on Cloud Run
  • Client: the client SDKs that users use to integrate their services with Triton
  • SDKs: client code to call Triton services
  • Game client: the actual game client on PC or game consoles
  • Game server: game or gateway servers that consume the Triton APIs to store/load data

I'll add this to the docs directory once I get sign-offs. I'll update this issue as I get feedback.

Use user supplied primary key for store, and owner_id for external IDs

Is your feature request related to a problem? Please describe.
Feature request:

(1) Align the Store and Record messages to both use a "User provided primary key" called key instead of an internal ID called id.
(2) Rename both the external id to be called owner_id.

Describe the solution you'd like
Proposed proto messages for Store and Record:

message Store {
  string key = 1;   // MODIFIED, user-defined primary key (instead of id)
  string name = 2;
  StoreOptions options = 3;
  string owner_id = 4;  // MODIFIED, used to assign store ownership
}

message Record {
  string key = 1;       // User-defined primary key
  bytes blob = 2;       // opaque to the server
  int64 blob_size = 3;  // defined by the server
  map<string, StructuredData> structured_data = 4;
  string owner_id = 5;  // MODIFIED, used to record ownership
  repeated string tags = 6;
  // e.g. "player:1", "system", "inventory:xxx"
}

Describe alternatives you've considered
The current solution can be used as is, but it requires an extra lookup by the Frontend API server to find the internal Store ID used to query the records.

Using a user defined primary key in the Store message as well would align the implementation of stores and records, and remove the need for an extra lookup (or caching) to get the store ID.
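To illustrate the benefit, here is a hypothetical Go sketch of how a user-supplied key lets the frontend construct the record lookup path directly, with no internal-ID lookup; the path scheme and helper name are invented for illustration and are not part of the actual API.

```go
package main

import "fmt"

// recordPath builds the lookup path for a record directly from
// user-supplied keys, so no internal-ID lookup (or cache) is needed.
// (Hypothetical helper; the real key scheme may differ.)
func recordPath(storeKey, recordKey string) string {
	return fmt.Sprintf("stores/%s/records/%s", storeKey, recordKey)
}

func main() {
	fmt.Println(recordPath("game1-saves", "player42"))
	// prints "stores/game1-saves/records/player42"
}
```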


Use Google style in proto files

Is your feature request related to a problem? Please describe.
Use clang-format --style=Google for proto files

Describe alternatives you've considered
Other styles like LLVM

Support large blobs bigger than a single gRPC message

Is your feature request related to a problem? Please describe.
Blobs stored in Triton could be several hundred megabytes or larger. We need to support streaming gRPC calls or other measures to handle this.

Describe the solution you'd like
One possible way of doing this is to use streaming gRPC endpoints. grpc-gateway doesn't support client-side streaming, so we would need to either

  • Drop the REST endpoint for now
  • Find another way

Describe alternatives you've considered

  • Manual chunking would not be as robust as the proven standard gRPC method.

Define storage backend interfaces

Is your feature request related to a problem? Please describe.
As shown in the architecture diagram, we need three interfaces, one representing each of the backend storage classes.

Describe the solution you'd like
Each interface should include operations like CRUD and basic query if necessary. We can then implement interfaces for backing service providers, such as Cloud Datastore and Redis.

How we define the interfaces is TBD. We could use Go interfaces to define common operations, for example. Each backing service will be statically linked.

Describe alternatives you've considered

  • Directly implementing providers without defining interfaces: while we prioritize support for GCP backends, defining clear interfaces would help in the long run without introducing much complexity.
  • Using the standard Go plugin package: it is not meant to support dynamic loading of third-party plugins, and it only works on Linux, FreeBSD, and macOS. Not suitable for our use case.
  • HashiCorp's go-plugin: a proven open-source plugin system for Go that uses a multi-process architecture and gRPC calls. This could work well for our use case; however, we want to minimize complexity and time to ship for the initial release. We might switch to this tool in the future when we implement additional backend providers.
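A minimal Go sketch of the interface-based approach, with an in-memory implementation standing in for a real provider such as Cloud Datastore. All names here are illustrative, not the actual Open Saves types.

```go
package main

import (
	"errors"
	"fmt"
)

// Record is a minimal stand-in for the metadata record type.
type Record struct {
	Key  string
	Blob []byte
}

// MetadataBackend sketches the kind of Go interface proposed here:
// common CRUD operations that concrete providers (Cloud Datastore,
// Redis, ...) implement and link in statically.
type MetadataBackend interface {
	Get(key string) (*Record, error)
	Put(r *Record) error
	Delete(key string) error
}

var errNotFound = errors.New("not found")

// memBackend is an in-memory implementation, useful for tests.
type memBackend struct{ m map[string]*Record }

func (b *memBackend) Get(key string) (*Record, error) {
	r, ok := b.m[key]
	if !ok {
		return nil, errNotFound
	}
	return r, nil
}

func (b *memBackend) Put(r *Record) error { b.m[r.Key] = r; return nil }

func (b *memBackend) Delete(key string) error {
	if _, ok := b.m[key]; !ok {
		return errNotFound
	}
	delete(b.m, key)
	return nil
}

func main() {
	var backend MetadataBackend = &memBackend{m: map[string]*Record{}}
	backend.Put(&Record{Key: "r1", Blob: []byte("hello")})
	r, _ := backend.Get("r1")
	fmt.Println(string(r.Blob)) // prints "hello"
}
```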


API: Delete methods should fail when resources are not found

What happened:
Currently DeleteStore and DeleteRecord succeed even if the specified resources are not found. This is contrary to the AIP recommendation, which states:

The Delete method should succeed if and only if a resource was present and was successfully deleted. If the resource did not exist, the method should send a NOT_FOUND error.

What you expected to happen:
DeleteStore and DeleteRecord should return the error when the specified resource is not present.

Migrate to standard go build

Describe the solution you'd like
Currently we use Bazel as a build system because it's cross-platform and language-agnostic; however, there have been some difficulties using the tool on Windows, and we've had reports of users who face challenges using Bazel with VSCode.
After some research, I now think it makes sense to use the standard Go toolchain (go build) for Go binaries, and CMake, one of the most widely used build systems, for C++.

Describe alternatives you've considered

  • Continue using Bazel - after gathering feedback from people using both Bazel and CMake, it makes sense to migrate to CMake before we write a lot of code

Remove the C++ code

Is your feature request related to a problem? Please describe.
As discussed offline, we are targeting Java and Go as Triton clients. We are removing the C++ code.

Describe the solution you'd like
Remove the C++ client code.

Support for tags and store types in StoreOptions (.proto file)

Is your feature request related to a problem? Please describe.
Support optional tags (string[]) and store type (enum) in the StoreOptions.

Describe the solution you'd like
Add 2 properties in the StoreOptions proto message so that we can set optional tags and a store type (e.g. UserStore, GameStore, StudioStore, and PublisherStore).

Describe alternatives you've considered
Tags: No other alternatives, the StoreOptions does not contain anything at the moment.
Store Type: Having an implicit knowledge of which store IDs are of type user, game, etc., or using the generic tags array to store that information.

Additional context
Without storing an explicit store type or having a way to capture this information in a tag, it will be more difficult to implement the authorization layer, since there won't be a way to tell which stores are for what except by having implicit knowledge of their IDs.

feat: Add timestamps and UUID to metadata for race detection

Is your feature request related to a problem? Please describe.
A unique attribute that is changed every time is necessary to make sure there is no race when updating a metadata entity. For example,

  • Record R has an external blob X attached to it
  • A: starts updating Record R with a large blob; reads from MetaDB and detects the associated blob X
  • B: starts updating Record R with a large blob; reads from MetaDB and detects the associated blob X
  • B: finishes uploading, commits changes to the blob store and MetaDB, deletes the old blob X
  • A: finishes uploading, commits changes to the blob store and MetaDB, deletes the old blob X

In this case, A should detect the conflicting change and delete the new blob uploaded by B. A unique attribute makes such conflicts easy to detect because it is the only field a writer needs to pay attention to.
Timestamps will not be used to maintain consistency but they will be useful to the clients.

Describe the solution you'd like
Add a unique signature attribute (e.g. a UUID regenerated on every update) plus created/updated timestamps to the metadata entity.

Describe alternatives you've considered

  • Datastore transactions: We can't use Datastore transactions because the timeout limit is 60 seconds, and it would take more than 60 seconds to upload large blobs.
  • You could compare each attribute manually every time, but this is error-prone.

Release: Alpha0 release tracker

First alpha release is scheduled in the week of July 13. There are a few outstanding issues (mostly documentation) but we're still on track to drop it by the end of the week.

API: Prioritize gRPC endpoints and freeze HTTP REST support for now

Is your feature request related to a problem? Please describe.
Currently the triton.proto API definition includes both gRPC and REST. During our discussion, it became clear that the current priority is the gRPC endpoint and the REST implementation is not well tested.
gRPC supports a well-defined streaming implementation, however, client streaming support with grpc-gateway is limited. In order to ship the best gRPC endpoint as soon as possible, we may want to focus on gRPC and freeze the REST client for now.

Describe the solution you'd like
Freeze the REST support for now. We can revisit how we can better support REST endpoints in the future.

Describe alternatives you've considered

  • Continue supporting both: it's a trade off between implementation speed and feature coverage. We can continue supporting REST if that's absolutely necessary, but I think we want to focus on gRPC for now.

Implement C++ Client Library

Describe the solution you'd like

Implement a C++ client library to wrap the gRPC/REST APIs. Since protobuf already generates a wrapper, I think that's sufficient for most use cases. If there are specific method signatures or classes that would be more convenient for a particular use case, we can add them.

  • Use gRPC for the standard client library

Describe alternatives you've considered

  • Use REST instead of gRPC. It's just less efficient, so I don't think there's a benefit.

Backend: Implement Cloud Datastore provider

Describe the solution you'd like
Implement a backend storage provider for Cloud Datastore. This will also cover schema work and related GCP configurations (split into sub-issues appropriately).

Depends on Issue #9 .

Create artifacts and deployment for Knative / Cloud Run

Is your feature request related to a problem? Please describe.
Triton is designed to be deployed to Knative running on Kubernetes or the various flavors of Cloud Run (including Serverless).

Describe the solution you'd like
Create deployment for Triton to Knative. Test on Kubernetes, GKE, and Cloud Run Serverless.

Splitting client and server depots

Rationale behind the request:

  1. Separation of concerns. The client is not part of the server and should live as a separate entity.

  2. Since there is no real dependency between the two pieces, code- or design-wise, having separate depots will also help with client maintainability and the release process.

  3. The client will be smaller than the server, but I don't expect it to be trivially small. For example, the C++ version alone can have multiple web request adapters that make web requests and receive responses, to support different console compilers (detailed in the client architecture diagram).

Enforce a branch protection rule for master

Enforce a branch protection rule for the master branch.

We are enabling the following protections:

  • Require pull request reviews before merging
    • Required approving reviews: 1
  • Require status checks to pass before merging
    • Required status checks: Build and cla/google
  • Restrict who can push to matching branches: Only admins can push to master

Include pre-built dependencies in Cloud Build base image

Is your feature request related to a problem? Please describe.
Cloud Build sometimes times out because builds take too long. About 2 minutes are spent downloading dependencies for the C++ builds with CMake, and another 2 minutes just building those dependencies. We can change the Docker image to include this part.

Describe the solution you'd like
CMake will try to update the build directory instead of rebuilding everything from scratch, which is faster. There is a chance that stale dependencies could accidentally break full builds, but I think faster builds are more beneficial.

Describe alternatives you've considered

  • Move the build machine to a 32-core instance: this won't speed up the CMake configure step, though it does speed up compilation.
  • Just include pre-downloaded dependencies: it's safe to prebuild dependencies, as they don't change as often as the main code.

Add a sample C++ code for the gRPC endpoint

Describe the solution you'd like
Add sample C++ code for the gRPC endpoint that explains:

  • How to link the library
  • Basic operations that include create store, get, set, update

Design: cache behavior

A quick discussion note to clarify the cache behavior according to an offline meeting with @hongalex today.

Cache behavior

The cache server will contain:

  • key: <store id>/<record id> (concatenated strings)
  • value: a struct that contains (idea: we could either reuse the Record message, or use the gob package to serialize a Go proto struct):
    • blob size
    • blob binary, or the location of the blob
    • structured data (map of key → data type + value)
    • other metadata to cache, like user_id and tags

The Triton server should decide whether to cache a value based on its size.

When the server updates the metadata, it invalidates the corresponding cache entry.

Use the standard Redis LRU cache.
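A minimal Go sketch of this cache policy; the 10 MiB threshold and the function names are assumptions for illustration, not tuned values from the project.

```go
package main

import "fmt"

// maxCacheable is an assumed size threshold; the real server would
// tune this based on access frequency and data size.
const maxCacheable = 10 * 1024 * 1024 // 10 MiB

// cacheKey concatenates the store and record IDs, as described above.
func cacheKey(storeID, recordID string) string {
	return storeID + "/" + recordID
}

// shouldCache decides whether a blob is small enough to cache inline;
// larger blobs would only have their location cached.
func shouldCache(blobSize int64) bool {
	return blobSize <= maxCacheable
}

func main() {
	fmt.Println(cacheKey("store1", "rec1"))            // prints "store1/rec1"
	fmt.Println(shouldCache(4096), shouldCache(1<<30)) // prints "true false"
}
```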

Create Load Tests

Is your feature request related to a problem? Please describe.
Demonstrate the scalability of the Triton solution by building load tests that issue requests to the service at scale.

Describe the solution you'd like
Load testing framework that can demonstrate thousands or tens of thousands of concurrent requests per second.

Backend: Implement Cloud Storage provider

Describe the solution you'd like
Implement the Cloud Storage provider. This issue covers related GCP configurations, etc. (split into multiple issues if necessary).

Depends on #9.

API: Change IDs to uuid string

Is your feature request related to a problem? Please describe.
int64 is not a recommended ID type for distributed systems, as it is difficult to guarantee incrementally unique IDs. Prefer UUIDv4.

Describe the solution you'd like
Change the IDs to text-represented UUIDv4.

Describe alternatives you've considered

  • Store UUIDs as binary: saves bytes, but harder to read and debug on the server side
  • Create a new ID field and deprecate the current field: the product is not shipped yet so no need to maintain compatibility.
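For illustration, here is a stdlib-only Go sketch of generating a text-represented UUIDv4; production code would more likely use an established library such as github.com/google/uuid.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUIDv4 returns a random (version 4) UUID in its canonical
// 36-character text form, e.g. "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".
func newUUIDv4() string {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err) // crypto/rand failure is unrecoverable here
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[:4], b[4:6], b[6:8], b[8:10], b[10:])
}

func main() {
	fmt.Println(len(newUUIDv4())) // prints 36
}
```

The text form costs a few extra bytes over a binary encoding, but, as noted above, it is much easier to read and debug on the server side.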

Add Identity and Access Management

Is your feature request related to a problem? Please describe.
The API must be protected by a service account so that it is not open to anonymous users on the internet.

Describe the solution you'd like
Implement authentication so that the service requires an authenticated user in order to invoke API calls.

Describe alternatives you've considered
Consider utilizing Cloud Identity Aware Proxy, or native capabilities of Kubernetes / GKE.

API: Add users and user stores

Is your feature request related to a problem? Please describe.
Currently the API doesn't have a way to handle users. This is a proposal to support user specific stores as well as the ability to associate each record to a user.

Describe the solution you'd like
Define a 1:1 relationship between User and Store. User 0 is a system user that is reserved for generic stores. Positive user IDs are automatically assigned by the system for new users. Negative user IDs are reserved for future use.

Add a new field to Store and Record to represent the owner user ID. The owner must match the owner of the store for user-owned stores.

Describe alternatives you've considered

  • Don't add users, and let the tagging system handle it: may be slower to query. The owner is likely a frequently used criterion for filtering, so special-case handling is useful.


Add an external ID property to Stores and Records as a first-class citizen for fast lookup

Is your feature request related to a problem? Please describe.
When creating new Stores and Records in Triton, the back-end assigns a UUID (RFC 4122) to the "id" property of the data.

In order to query data in Triton, an external system must keep a mapping between its own internal IDs and Triton IDs. A way to leverage Triton to keep this mapping without the need to use any other DB externally could be:

(1) Save the external ID of the data in the tags for Stores and Records, and use the tags to query the data with the external IDs directly. This is already supported, but potentially slow, as there is no guarantee that a primary index is created for tags.

(2) Add an "externalId" property to Stores and Records as a first-class citizen to save the ID of the external system as a string (<255 chars), and ensure fast retrieval by indexing the property.

Describe the solution you'd like
The solution that I would like to see implemented is #2 above.

Describe alternatives you've considered
An alternative is #1 above, but it would be (potentially) significantly slower if the tags are not indexed.

Additional context
The idea would be to use the GET endpoints to retrieve the Store or Record data using the externalId in the query parameters like this:

GET /triton/v1/stores?externalId=my-own-store-external-id-free-format-less-255-characters
GET /triton/v1/stores/{storeId}/records?externalId=my-own-record-external-id-free-format-less-255-characters
