

Open Saves

GoPkg Widget Go Report Card License GitHub release

Open Saves is an open-source, purpose-built single interface for multiple storage backends on Google Cloud.

With Open Saves, game developers can run a cloud-native storage system that is:

  • Simple: Open Saves provides a unified, well-defined gRPC endpoint for all operations for metadata, structured, and unstructured objects.
  • Fast: With a built-in caching system, Open Saves optimizes data placements based on access frequency and data size, all to achieve both low latency for smaller binary objects and high throughput for big objects.
  • Scalable: The Open Saves API server can run on either Google Kubernetes Engine, or Cloud Run. Both platforms can scale out to handle hundreds of thousands of requests per second. Open Saves also stores data in Google Datastore and Cloud Storage, and can handle hundreds of gigabytes of data.


Disclaimer

This software is currently beta, and subject to change. It is not yet ready to serve production workloads.

Code of Conduct

Participation in this project comes under the Contributor Covenant Code of Conduct.

License

Apache 2.0

Contributors

ano12ak, dependabot[bot], hongalex, irataxy, karenarialin, lorangf, piotr-mpg, thisisnotapril, vasconcelosvcd, yuryu, zaratsian, zurvarian


Issues

Architecture and terminology

Here's the overall architecture of Triton:

[Architecture diagram]

In scope:

  • Triton server
  • Client SDKs for Java and Go

Out of scope:

  • Game servers and/or gateway servers for each game service
  • Game clients and direct access from game clients

Terminology

  • User: developers who want to build services with Triton
  • Server: the Triton server running on Cloud Run
  • Client: the client SDKs that users use to integrate their services with Triton
  • SDKs: client code to call Triton services
  • Game client: the actual game client on PC or game consoles
  • Game server: game or gateway servers that consume the Triton APIs to store/load data

I'll add this to the docs directory once I get sign-offs. I'll update this issue as I get feedback.

Use user supplied primary key for store, and owner_id for external IDs

Is your feature request related to a problem? Please describe.
Feature request:

(1) Align the Store and Record messages to both use a "User provided primary key" called key instead of an internal ID called id.
(2) Rename both the external id to be called owner_id.

Describe the solution you'd like
Proposed proto messages for Store and Record:

message Store {
  string key = 1;   // MODIFIED, user-defined primary key (instead of id)
  string name = 2;
  StoreOptions options = 3;
  string owner_id = 4;  // MODIFIED, used to assign store ownership
}

message Record {
  string key = 1;       // User-defined primary key
  bytes blob = 2;       // opaque to the server
  int64 blob_size = 3;  // defined by the server
  map<string, StructuredData> structured_data = 4;
  string owner_id = 5;  // MODIFIED, used to record ownership
  repeated string tags = 6;
  // e.g. "player:1", "system", "inventory:xxx"
}

Describe alternatives you've considered
The current solution can be used as is, but it requires an extra lookup by the Frontend API server to find the internal Store ID used to query the records.

Using a user defined primary key in the Store message as well would align the implementation of stores and records, and remove the need for an extra lookup (or caching) to get the store ID.
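To illustrate the benefit, here is a hypothetical Go sketch of how a user-supplied key lets the frontend construct the record lookup path directly, with no internal-ID lookup; the path scheme and helper name are invented for illustration and are not part of the actual API.

```go
package main

import "fmt"

// recordPath builds the lookup path for a record directly from
// user-supplied keys, so no internal-ID lookup (or cache) is needed.
// (Hypothetical helper; the real key scheme may differ.)
func recordPath(storeKey, recordKey string) string {
	return fmt.Sprintf("stores/%s/records/%s", storeKey, recordKey)
}

func main() {
	fmt.Println(recordPath("game1-saves", "player42"))
	// prints "stores/game1-saves/records/player42"
}
```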


Use Google style in proto files

Is your feature request related to a problem? Please describe.
Use clang-format --style=Google for proto files

Describe alternatives you've considered
Other styles like LLVM

Support large blobs bigger than a single gRPC message

Is your feature request related to a problem? Please describe.
Blobs stored in Triton could be several hundred megabytes or larger. We need to support streaming gRPC calls or other measures to handle this.

Describe the solution you'd like
One possible way of doing this is to use streaming gRPC endpoints. grpc-gateway doesn't support client-side streaming, so we would need to either

  • Drop the REST endpoint for now
  • Find another way

Describe alternatives you've considered

  • Manual chunking would not be as robust as the proven standard gRPC method.

Define storage backend interfaces

Is your feature request related to a problem? Please describe.
As shown in the architecture diagram, we need three interfaces, one representing each of the backend storage classes.

Describe the solution you'd like
Each interface should include operations like CRUD and basic query if necessary. We can then implement interfaces for backing service providers, such as Cloud Datastore and Redis.

How we define the interfaces is TBD. We could use Go interfaces to define common operations, for example. Each backing service will be statically linked.

Describe alternatives you've considered

  • Directly implementing providers without defining interfaces: while we prioritize support for GCP backends, defining clear interfaces would help in the long run without introducing much complexity.
  • Using the standard Go plugin package: it is not meant to support dynamic loading of third-party plugins, and it only works on Linux, FreeBSD, and macOS. Not suitable for our use case.
  • HashiCorp's go-plugin: a proven open-source plugin system for Go that uses a multi-process architecture and gRPC calls. This could work well for our use case; however, we want to minimize complexity and time to ship for the initial release. We might switch to this tool in the future when we implement additional backend providers.
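A minimal Go sketch of the interface-based approach, with an in-memory implementation standing in for a real provider such as Cloud Datastore. All names here are illustrative, not the actual Open Saves types.

```go
package main

import (
	"errors"
	"fmt"
)

// Record is a minimal stand-in for the metadata record type.
type Record struct {
	Key  string
	Blob []byte
}

// MetadataBackend sketches the kind of Go interface proposed here:
// common CRUD operations that concrete providers (Cloud Datastore,
// Redis, ...) implement and link in statically.
type MetadataBackend interface {
	Get(key string) (*Record, error)
	Put(r *Record) error
	Delete(key string) error
}

var errNotFound = errors.New("not found")

// memBackend is an in-memory implementation, useful for tests.
type memBackend struct{ m map[string]*Record }

func (b *memBackend) Get(key string) (*Record, error) {
	r, ok := b.m[key]
	if !ok {
		return nil, errNotFound
	}
	return r, nil
}

func (b *memBackend) Put(r *Record) error { b.m[r.Key] = r; return nil }

func (b *memBackend) Delete(key string) error {
	if _, ok := b.m[key]; !ok {
		return errNotFound
	}
	delete(b.m, key)
	return nil
}

func main() {
	var backend MetadataBackend = &memBackend{m: map[string]*Record{}}
	backend.Put(&Record{Key: "r1", Blob: []byte("hello")})
	r, _ := backend.Get("r1")
	fmt.Println(string(r.Blob)) // prints "hello"
}
```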


API: Delete methods should fail when resources are not found

What happened:
Currently DeleteStore and DeleteRecord succeed even if the specified resources are not found. This is contrary to the AIP recommendation, which states:

The Delete method should succeed if and only if a resource was present and was successfully deleted. If the resource did not exist, the method should send a NOT_FOUND error.

What you expected to happen:
DeleteStore and DeleteRecord should return the error when the specified resource is not present.

Migrate to standard go build

Describe the solution you'd like
Currently we use Bazel as a build system because it's cross-platform and language-agnostic; however, there have been some difficulties using the tool on Windows, and we've had reports of users who face challenges using Bazel with VSCode.
After some research, I now think it makes sense to use the standard Go toolchain (go build) for Go binaries, and CMake, one of the most widely used build systems, for C++.

Describe alternatives you've considered

  • Continue using Bazel - after gathering feedback from people using both Bazel and CMake, it makes sense to migrate to CMake before we write a lot of code

Remove the C++ code

Is your feature request related to a problem? Please describe.
As discussed offline, we are targeting Java and Go as Triton clients. We are removing the C++ code.

Describe the solution you'd like
Remove the C++ client code.

Support for tags and store types in StoreOptions (.proto file)

Is your feature request related to a problem? Please describe.
Support optional tags (string[]) and store type (enum) in the StoreOptions.

Describe the solution you'd like
Add 2 properties in the StoreOptions proto message so that we can set optional tags and a store type (e.g. UserStore, GameStore, StudioStore, and PublisherStore).

Describe alternatives you've considered
Tags: No other alternatives, the StoreOptions does not contain anything at the moment.
Store Type: Having an implicit knowledge of which store IDs are of type user, game, etc., or using the generic tags array to store that information.

Additional context
Without storing an explicit store type or having a way to capture this information in a tag, it will be more difficult to implement the authorization layer, since there won't be a way to tell which stores are for what except by having implicit knowledge of their IDs.

feat: Add timestamps and UUID to metadata for race detection

Is your feature request related to a problem? Please describe.
A unique attribute that is changed every time is necessary to make sure there is no race when updating a metadata entity. For example,

  • Record R has an external blob X attached to it
  • A: starts updating Record R with a large blob; reads from MetaDB and detects the associated blob X
  • B: starts updating Record R with a large blob; reads from MetaDB and detects the associated blob X
  • B: finishes uploading, commits changes to the blob store and MetaDB, deletes the old blob X
  • A: finishes uploading, commits changes to the blob store and MetaDB, deletes the old blob X

In this case, A should detect the conflicting change and delete the new blob uploaded by B. A unique attribute makes such conflicts easy to detect because it is the only field a writer needs to pay attention to.
Timestamps will not be used to maintain consistency but they will be useful to the clients.

Describe the solution you'd like
Add a unique signature attribute (e.g. a UUID regenerated on every update) plus created/updated timestamps to the metadata entity.

Describe alternatives you've considered

  • Datastore transactions: We can't use Datastore transactions because the timeout limit is 60 seconds, and it would take more than 60 seconds to upload large blobs.
  • You could compare each attribute manually every time, but this is error-prone.

Release: Alpha0 release tracker

First alpha release is scheduled in the week of July 13. There are a few outstanding issues (mostly documentation) but we're still on track to drop it by the end of the week.

API: Prioritize gRPC endpoints and freeze HTTP REST support for now

Is your feature request related to a problem? Please describe.
Currently the triton.proto API definition includes both gRPC and REST. During our discussion, it became clear that the current priority is the gRPC endpoint and the REST implementation is not well tested.
gRPC supports a well-defined streaming implementation, however, client streaming support with grpc-gateway is limited. In order to ship the best gRPC endpoint as soon as possible, we may want to focus on gRPC and freeze the REST client for now.

Describe the solution you'd like
Freeze the REST support for now. We can revisit how we can better support REST endpoints in the future.

Describe alternatives you've considered

  • Continue supporting both: it's a trade off between implementation speed and feature coverage. We can continue supporting REST if that's absolutely necessary, but I think we want to focus on gRPC for now.

Implement C++ Client Library

Describe the solution you'd like

Implement a C++ client library to wrap the gRPC/REST APIs. Since protobuf already generates a wrapper, I think that's sufficient for most use cases. If there are specific method signatures or classes that would be more convenient for a particular use case, we can add them.

  • Use gRPC for the standard client library

Describe alternatives you've considered

  • Use REST instead of gRPC. It's just less efficient, so I don't think there's a benefit.

Backend: Implement Cloud Datastore provider

Describe the solution you'd like
Implement a backend storage provider for Cloud Datastore. This will also cover schema work and related GCP configurations (split into sub-issues appropriately).

Depends on Issue #9 .

Create artifacts and deployment for Knative / Cloud Run

Is your feature request related to a problem? Please describe.
Triton is designed to be deployed to Knative running on Kubernetes or the various flavors of Cloud Run (including Serverless).

Describe the solution you'd like
Create deployment for Triton to Knative. Test on Kubernetes, GKE, and Cloud Run Serverless.

Splitting client and server depots

Rationale behind the request:

  1. Separation of concerns. The client is not part of the server and should live as a separate entity.

  2. Since there is no real dependency between the two pieces, code- or design-wise, having separate depots will also help with client maintainability and the release process.

  3. The client will be smaller than the server, but I don't expect it to be trivially small. For example, the C++ version alone can have multiple web request adapters that make web requests and receive responses, to support different console compilers (detailed in the client architecture diagram).

Enforce a branch protection rule for master

Enforce a branch protection rule for the master branch.

We are enabling the following protections:

  • Require pull request reviews before merging
    • Required approving reviews: 1
  • Require status checks to pass before merging
    • Required status checks: Build and cla/google
  • Restrict who can push to matching branches: Only admins can push to master

Include pre-built dependencies in Cloud Build base image

Is your feature request related to a problem? Please describe.
Cloud Build sometimes times out because builds take too long. About 2 minutes are spent downloading dependencies for the C++ builds with CMake, and another 2 minutes just building those dependencies. We can change the Docker image to include this part.

Describe the solution you'd like
CMake will try to update the build directory instead of rebuilding everything from scratch, which is faster. There is a chance that stale dependencies could accidentally break full builds, but I think faster builds are more beneficial.

Describe alternatives you've considered

  • Move the build machine to a 32-core instance: this won't speed up the CMake configure step, though it does speed up compilation.
  • Just include pre-downloaded dependencies: it's safe to prebuild dependencies, as they don't change as often as the main code.

Add a sample C++ code for the gRPC endpoint

Describe the solution you'd like
Add sample C++ code for the gRPC endpoint that explains:

  • How to link the library
  • Basic operations that include create store, get, set, update

Design: cache behavior

A quick discussion note to clarify the cache behavior according to an offline meeting with @hongalex today.

Cache behavior

The cache server will contain:

  • key: <store id>/<record id> (concatenated strings)
  • value: a struct that contains (idea: we could either reuse the Record message, or use the gob package to serialize a Go proto struct):
    • blob size
    • blob binary, or the location of the blob
    • structured data (map of key → data type + value)
    • other metadata to cache, like user_id and tags

The Triton server should decide whether to cache a value based on its size.

When the server updates the metadata, it invalidates the corresponding cache entry.

Use the standard Redis LRU cache.
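A minimal Go sketch of this cache policy; the 10 MiB threshold and the function names are assumptions for illustration, not tuned values from the project.

```go
package main

import "fmt"

// maxCacheable is an assumed size threshold; the real server would
// tune this based on access frequency and data size.
const maxCacheable = 10 * 1024 * 1024 // 10 MiB

// cacheKey concatenates the store and record IDs, as described above.
func cacheKey(storeID, recordID string) string {
	return storeID + "/" + recordID
}

// shouldCache decides whether a blob is small enough to cache inline;
// larger blobs would only have their location cached.
func shouldCache(blobSize int64) bool {
	return blobSize <= maxCacheable
}

func main() {
	fmt.Println(cacheKey("store1", "rec1"))            // prints "store1/rec1"
	fmt.Println(shouldCache(4096), shouldCache(1<<30)) // prints "true false"
}
```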

Create Load Tests

Is your feature request related to a problem? Please describe.
Demonstrate the scalability of the Triton solution by building load tests that issue requests to the service at scale.

Describe the solution you'd like
Load testing framework that can demonstrate thousands or tens of thousands of concurrent requests per second.

Backend: Implement Cloud Storage provider

Describe the solution you'd like
Implement the Cloud Storage provider. This issue covers related GCP configurations, etc. (split into multiple issues if necessary).

Depends on #9.

API: Change IDs to uuid string

Is your feature request related to a problem? Please describe.
int64 is not a recommended ID type for distributed systems, as it is difficult to guarantee incrementally unique IDs. Prefer UUIDv4.

Describe the solution you'd like
Change the IDs to text-represented UUIDv4.

Describe alternatives you've considered

  • Store UUIDs as binary: saves bytes, but harder to read and debug on the server side
  • Create a new ID field and deprecate the current field: the product is not shipped yet so no need to maintain compatibility.
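For illustration, here is a stdlib-only Go sketch of generating a text-represented UUIDv4; production code would more likely use an established library such as github.com/google/uuid.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUIDv4 returns a random (version 4) UUID in its canonical
// 36-character text form, e.g. "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".
func newUUIDv4() string {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err) // crypto/rand failure is unrecoverable here
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[:4], b[4:6], b[6:8], b[8:10], b[10:])
}

func main() {
	fmt.Println(len(newUUIDv4())) // prints 36
}
```

The text form costs a few extra bytes over a binary encoding, but, as noted above, it is much easier to read and debug on the server side.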

Add Identity and Access Management

Is your feature request related to a problem? Please describe.
The API must be protected by a service account so that it is not open to anonymous users on the internet.

Describe the solution you'd like
Implement authentication so that the service requires an authenticated user in order to invoke API calls.

Describe alternatives you've considered
Consider utilizing Cloud Identity Aware Proxy, or native capabilities of Kubernetes / GKE.

API: Add users and user stores

Is your feature request related to a problem? Please describe.
Currently the API doesn't have a way to handle users. This is a proposal to support user specific stores as well as the ability to associate each record to a user.

Describe the solution you'd like
Define a 1:1 relationship between User and Store. User 0 is a system user that is reserved for generic stores. Positive user IDs are automatically assigned by the system for new users. Negative user IDs are reserved for future use.

Add a new field to Store and Record to represent the owner user ID. The owner must match the owner of the store for user-owned stores.

Describe alternatives you've considered

  • Don't add users, and let the tagging system handle it: may be slower to query. The owner is likely a frequently used criterion for filtering, so special-case handling is useful.


Add an external ID property to Stores and Records as a first-class citizen for fast lookup

Is your feature request related to a problem? Please describe.
When creating new Stores and Records in Triton, the back-end assigns a UUID (RFC 4122) to the "id" property of the data.

In order to query data in Triton, an external system must keep a mapping between its own internal IDs and Triton IDs. A way to leverage Triton to keep this mapping without the need to use any other DB externally could be:

(1) Save the external ID of the data in the tags for Stores and Records, and use the tags to query the data with the external IDs directly. This is already supported, but potentially slow, as there is no guarantee that a primary index is created for tags.

(2) Add an "externalId" property to Stores and Records as a first-class citizen to save the ID of the external system as a string (<255 chars), and ensure fast retrieval by indexing the property.

Describe the solution you'd like
The solution that I would like to see implemented is #2 above.

Describe alternatives you've considered
An alternative is #1 above, but it would be (potentially) significantly slower if the tags are not indexed.

Additional context
The idea would be to use the GET endpoints to retrieve the Store or Record data using the externalId in the query parameters like this:

GET /triton/v1/stores?externalId=my-own-store-external-id-free-format-less-255-characters
GET /triton/v1/stores/{storeId}/records?externalId=my-own-record-external-id-free-format-less-255-characters
