joystream / hydra Goto Github PK

A Substrate indexing framework

Shell 0.77% TypeScript 97.16% JavaScript 0.45% Batchfile 0.02% Dockerfile 0.80% Handlebars 0.65% Ruby 0.14%

substrate

hydra's Introduction

Joystream

This is the main code repository for all Joystream software. In this mono-repo you will find all the software required to run a Joystream network: The Joystream full node, runtime and all reusable substrate runtime modules that make up the Joystream runtime. In addition to all front-end apps and infrastructure servers necessary for operating the network.

Overview

The Joystream network builds on the substrate blockchain framework, and adds additional functionality to support the various roles that can be entered into on the platform.

Development

For best results use GNU/Linux with minimum glibc version 2.28 for nodejs v18 to work. So Ubuntu 20.04 or newer.

You can check your version of glibc with ldd --version

The following tools are required for building, testing and contributing to this repo:

Rust toolchain - required
nodejs >= v14.18.x - required (However volta will try to use v18.6)
yarn classic package manager v1.22.x- required
docker and docker-compose v2.20.x or higher - required
ansible - optional

If you use VSCode as your code editor we recommend using the workspace settings for recommend eslint plugin to function properly.

After cloning the repo run the following to get started:

Install development tools

./setup.sh

If you prefer your own node version manager

Install development tools without Volta version manager.

./setup.sh --no-volta

For older operating systems which don't support node 18

Modify the root package.json and change volta section to use node version 16.20.1 instead of 18.6.0

"volta": {
    "node": "16.20.1",
    "yarn": "1.22.19"
}

Run local development network

# Build local npm packages
yarn build

# Build joystream/node docker testing image
RUNTIME_PROFILE=TESTING yarn build:node:docker

# Start a local development network
yarn start

Software

Substrate blockchain

Server Applications - infrastructure

Storage Node - Media Storage Infrastructure
Query Node
Distributor Node

Front-end Applications

Pioneer v2 - Main UI for accessing Joystream community and governance features
Atlas - Media Player

Tools and CLI

joystream-cli - CLI for community and governance activities

Testing infrastructure

Network integration - Joystream network integration testing framework

Running a local full node

git checkout master
WASM_BUILD_TOOLCHAIN=nightly-2022-11-15 cargo build --release
./target/release/joystream-node -- --pruning archive --chain joy-mainnet.json

Learn more about joystream-node.

A step by step guide to setup a full node and validator on the Joystream main network, can be found here.

Pre-built joystream-node binaries

Look under the 'Assets' section:

Ephesus release v8.3.0

Mainnet chainspec file

joy-mainnet.json

Integration tests

# Make sure yarn packages are built
yarn build

# Build the test joystream-node
RUNTIME_PROFILE=TESTING yarn build:node:docker

# Run tests
yarn test

Contributing

We have lots of good first issues open to help you get started on contributing code. If you are not a developer you can still make valuable contributions by testing our software and providing feedback and opening new issues.

A description of our branching model will help you to understand where work on different software components happens, and consequently where to direct your pull requests.

We rely on eslint for code quality of our JavaScript and TypeScript code and prettier for consistent formatting. For Rust we rely on rustfmt and clippy.

The husky npm package is used to manage the project git-hooks. This is automatically installed and setup when you run yarn install.

When you git commit and git push some scripts will run automatically to ensure committed code passes lint, tests, and code-style checks.

During a rebase/merge you may want to skip all hooks, you can use HUSKY_SKIP_HOOKS environment variable.

HUSKY_SKIP_HOOKS=1 git rebase ...

RLS Extension in VScode or Atom Editors

If you use RLS extension in your IDE, start your editor with the BUILD_DUMMY_WASM_BINARY=1 environment set to workaround a build issue that occurs in the IDE only.

BUILD_DUMMY_WASM_BINARY=1 code ./joystream

Authors

See the list of contributors who participated in this project.

License

All software under this project is licensed as GPLv3 unless otherwise indicated.

Acknowledgments

Thanks to the whole Parity Tech team for making substrate and helping in chat with tips, suggestions, tutorials and answering all our questions during development.

hydra's People

Contributors

Stargazers

Watchers

Forkers

dzhelezov metmirr mnaamani sulejman ondratra slimuss sbnair ledgerfoundation spatsochi ezaruba benjellouno nakata5321 eldargab samelamin netobjex-inc oracle58 wintexpro jonalvarezz zoheb391 vuatbrothersinarm slixon-technologies ajand cpurta sekmet tmcgroul lezek123 bdmason singulart florian-stoica-tlabs dappforce samchuk-vlad ma-shulgin zeeshanakram3 nathanwhit mkbeefcake sedoley ylmz34 traderozy p-xyz ladaalekz extsunset goldstarhigher lokialice thesan

hydra's Issues

Indexer should allow additional TypeORM entities to be registered with the default connection

Currently, the only way to extend the default set of TypeORM entities is to provide a glob pattern to the environment variable TYPEORM_ENTITIES. This is not sufficiently flexible, as relative glob path patterns will likely fail. A more flexible approach is to allow type references in the (dynamic) config passed at createDBConnection() in QueryNodeManager.

Hydra v2 Progress Issue

`Mon, Aug 24th`

Agenda

What are our priorities now that Metin is back and Hydra has been submitted?
I had suggested a set of focus areas that needed to be reviewed, and work had to be split up

a) faster processing/synching
b) decoupling blockchain synching and processing, so that one can easily rerun processing when altering a schema or mapping.
c) static type safety in all mappings <== this latter point needs feasibility input from Mokhtar, but I am quite sure its possible. If its possible, I think the upside is substantial enough.
d) more integration tests on Hydra

I think Arsen can start as soon as he is ready, even if these are not done, but let me know if you think that would be counter-productive.

Present

Metin
Dmitrii
Bedeho

Topics covered

What is Dmitrii currently working on?
What Metin is working on, and details of how to address those bugs.
Do we really need to fix the mappings for Kusama treasury now that we have submitted, its not the highest priority?
Should we continue to try to fix the out of memory issue now?
Perhaps we need a better solution for handling naming conflicts, using a manifest or some other more explicit approach.

Conclusions

Dmitrii will focus on a+b, and mix in d for the next week or so.
Metin will focus on writing mappings, with tests for some part of our runtime, and will try to identify bugs and rough edges of the developer workflow. Its very important here to get to a place where we find out how to give mapping author confidence that they are doing things correctly.
We will delay and see what to do about c, hopefully we can settle next meeting.
We will delay any work on manifest solution for now, Dmitrii will make issue.
The out of memory bug will either implicitly get resolved by Dmitriis work, or it will pop up again in our own node, and then we will have better shot at local reproduction.

Pre mappings

Currently, inside the mappings, we do a lot of decoding to get data from the events and extrinsics. In last Hydra meeting, we decided to have pre mappings to do data decoding.

The mapping author will need to define:

The type that will be returned by pre handler
Pre handler function that takes Substrate event and return the type defined in step 1
The actual mapping which takes a DB instance and the type defined in step 1

Let's look at an example for Joystream's MemberRegistered event handler:

// Type defination
export interface JoystreamMember extends BaseEventHandlerParameter {
	memberId: BN;
	handle: string;
	avatarUri: string;
	about: string;
	registeredAtBlock: number;
	rootAccount: Buffer;
	controllerAccount: Buffer;
}


// Pre mapping
function pre_members_MemberRegistered(event: SubstrateEvent): JoystreamMember {
	const { 0: memberId, 1: accountId } = event.params;

	debug(`Substrate event: ${JSON.stringify(event)}`);
	assert(event.extrinsic, "No extrinsic data");

	const extrinsicArgs = event.extrinsic.args.map((arg) => arg.value.toString());

	return {
		registeredAtBlock: event.blockNumber,
		memberId: new BN(memberId.value.toString()),
		rootAccount: Buffer.from(accountId.value.toString()),
		controllerAccount: Buffer.from(accountId.value.toString()),
		handle: extrinsicArgs[1],
		avatarUri: extrinsicArgs[2],
		about: extrinsicArgs[3],
	};
}

// Actual mapping
export async function members_MemberRegistered(db: DB, member: JoystreamMember) {
	let m = new Member({ ...member });
	db.save<Member>(m);
}

Misc: isolated databases/schemas for processors

Abstract mappings

Background

Currently, the only way to trigger a mapper is in response to an event. We already have plans to extend this to also cover transactions, and moveover, we have plans to have mappers with static signatures. This is an excellent start, but these solutions are temporary solutions to the more general issue that the level of abstractions desired for mappers to key off can be totally arbitrary, and unrelated to blockchain level concepts. The most salient examples are the following modules

These are used in plenty of chains, such as Acala, Edgewere, Moonbeam, and in the future also Joystream.

In these cases, the mapping author wants to key off type safe smart contract initiations, which do not exist as concepts in the native Runtime metadata. It would be unworkable to require the mapping author to decode this by hand, as they would have to understand implementation details of the modules above, and it would be hard to reuse cleanly across mappings for different chains or even different input schemas for a single chain.

Proposal

Hydra should allow some user provided middleware code to run during the generation step, and this code can understand whatever specific abstraction the mapping author was targeting. Then there should be standardized middle ware, like hydera-evm or hydra-contract that authors could take off the shelf, and then write mappings and manifest files cleanly.

Add more tests for invalid schemas

Let's make a full list later.

Misc: move indexer to redis-mq

Redis-based message queues is a more robust and effective way to manage indexer workers and the indexer state. At the moment the inter-process communication is via querying the DB, which is ineffective.

Processors subscribes to the indexer and listen for new heads

Depends on #25

Indexer API: extrinsics

Substask of #24

Hydra indexer API

Basic indexer state queries:

This issue outlines the REST API methods to be exposed by the Hydra Indexer API.

GET /api/indexer: Returns the current state of the indexer:

current indexer head
current substrate chain head

GET /api/processor/:id: Returns info about the mappings processor with given id:

Processor name
Processor DB schema
Last processed event id
Total number of processed events so far
Last scanned block
List of events the processor handles

Nice to have:

POST /api/subsrcribe with body: { events: [events_to_subscribe], from_event: <event_id>, chunk_size: <num> }
Returns { cursor: <cursor_id>, total: <total number of events>}
Creates a subscription to given events, retrievable by cursor_id.

POST /api/events/:cursor_id: Retrieve next <chunk_size> events with a given filter.
Returns:
{ events: [], ... }

GET /api/event/:event_name: Retrieve documentation for the given event
GET /api/extrinsic/:exrtrinsic_name: Retrieve documentation for the given extrinsic, for each runtime upgrade
GET /api/module/:module: Get all versions of the runtime, and for each runtime version list all events and extrinsics

Even nicer to have:

Deploy a new mapping processor by uploading a tgz archive

Algebraic types II

Background

As already explained in the background section here

https://github.com/Joystream/joystream/issues/554

We have a problem domain which has algebraic types all over the place, and it would be a big benefit if they could be reflected neatly in our query infrastructure. As explained in that first post, doing it deeply (approach 4), is ideal, however a later post points out

https://github.com/Joystream/joystream/issues/554#issuecomment-640254649

The resulting GraphQL API must generate OpenCRUD like capabilities that tie into matching these algebraic types, and this is a non-trivial task.

This issue attempts to clarify how this can be done.

Proposal

Input Schema

Algebraic Types

The new algebraic types respect the GraphQL standard, specifically they are informally defined as

union <name> = T_1 | ... | T_N where name is a GraphQL NamedType, and T_i are algebraic types, and referred to as cases of the type. This algebraic type is called an algebraic union.
type <name> @variant { f_1: T_1 ... f_N: T_N } where name is a GraphQL NamedType, f_i is a GraphQL Name, T_i is a non-ID GraphQL Type or an algebraic union. This algebraic type is called an algebraic variant, and such a type without any algebraic union members is called called a flat variant.

An algebraic union can be a member field of one or more normal @entity types, and can use non-null requirement. An algebraic variant cannot be a member of an @entity type.

Here is an example

type Miserable @variant {
  hates: String!
}

type HappyPoor @variant {
  isMale: boolean
}

union Poor = HappyPoor | Miserable

type MiddleClass @variant {
  father: Poor
  mother: Poor
}

type Rich @variant {
  bank: EntityC
}

union Status = Poor | MiddleClass | Rich

type EntityA @entity {
  id: ID!
  status*: Status!
}

type EntityB @entity {
  id: ID!
  status_b: Status!
  status_b: Status
}

type EntityC @entity {
  id: ID!
}

Relationships

When an algebraic union is a member in an entity type, there are cross-entity constraints around how entity member fields, or list thereof, because they model relationship semantics. This is also the case for normal entities.

In order to validate these requirements, just proceed as if every entity has any field in occurring in a member algebraic type as a top level field. If the resulting set of entities have valid relationship references, then the original usage of entity member fields in algebraic types is valid.

Concepts

An algebraic type can be represented as a labelled tree as follows

tree(union type <name> = T_1 | ... | T_N) = T[UNION||name, (ε, tree(T_1)), ..., (ε, tree(T_N))]
tree(type <name> @variant { f_1: T_1 ... f_N: T_N }) =
- T[VARIANT||name, (f_{g_1}, tree(T_{g_1})), ... , (f_{g_M}, tree(T_{g_M}))] where g_i are distinct indexes of one or more algebraic member fields.
- Node[VARIANT||name] when all fields f_i are non-algebraic.

where

for labelled non-empty trees c_1,...,c_n, and string labels l_1,...,l_n, T[x, (l_1, c_1), ..., (l_N, c_N)] is the tree with the root labelled with x and the roots of c_1,...,c_n as children, each with an edge labelled l_1,...,l_n.
Node[name] is a labelled node without any children.
|| is a string concatenation operator.
ε is the empty string.

Any node in such a tree that corresponds to a algebraic union type, is called a union node, an any node corresponding to a variant type is called a variant node.

Given such a tree we can define the idea of coherent union, which is a subset of union nodes in the tree such that if you mark all edges from a node in the set all the way to the root, then this marked tree should have no union node which has more than one marked edge with a child. From this it should be clear that for any two nodes in such a set, the way that their paths to the root avoid violating this constraint is that the join up in some variant node because it has at least two union members. Here is the set of all coherent unions for Status above

{Status}
{Status/Poor}
{Status/MiddleClass.father}
{Status/MiddleClass.mother}
{Status/MiddleClass.father, Status/MiddleClass.mother}
{Status/Rich}

where each node is represented by its path from the root. Computing the set of of all such matches is trivially done recursively by coherent_unions(ε, tree(Status))

coherent_unions(s, T[UNION||name, (ε, T_1), ..., (ε, T_N)]) = {s||name} U coherent_unions(s||name/,T_1) U ... U coherent_unions(s||name/,T_N)
coherent_unions(s, T[VARIANT||name, (f_1, T_1), ... , (f_N, T_N)]) = coherent_unions(s||name||.||f_1,T_1) U ... U coherent_unions(s||name||.||f_N,T_N) U all_combined_coherent_unions(s, name, (f_1, T_1), ... , (f_N, T_N))
coherent_unions(Node[VARIANT||name]) = Ø

where all_combined_coherent_unions(s, name, (f_1, T_1), ... , (f_N, T_N)) will for be the union of

for each non-empty subset of inputs T_{g_1},...,T_{g_M} where M>1 do the next step
compute C_{g_i} = coherent_unions(s||name||.||f_{g_i},T_{g_i})
return flattened version of C_{g_1} X ... X C_{g_M}, i.e. where tuples are turned into sets.

GraphQL API

The key goal of the generated API for entities that have one or more algebraic member types is to allow for safe, expressive and practical queries that are sensitive to the algebraic structure. The key observation in the resulting API is that such queries each correspond to the concept of a coherent union, defined prior, as follows.

Pick an entity type which has algebraic union member fields f_1: T_1,...,f_N:T_N. Notice that we allow for the entity type to have multiple union member fields.
Compute the set of coherent unions C_i for each union type T_i.
For each subset of fields f_{g_1}, ..., f_{g_M} do the next steps.
For each (c_1,...,c_M) in C_{g_1}x...xC_{g_M} do the next steps. Recall that c_i is a set of union nodes.
For each (n_1,...,n_M) in c_1,...,c_M do the next steps.
For each (v_1,...,v_M) where v_i is a child node of n_i, generate the following query:

accepts where and order-by OpenCrud input types for each v_i if type is suitable (i.e. has comparable fields), and also pagination inputs.
returns a type which is the result of taking the initial entity type, taking each v_i and replacing n_i in the type tree to which it corresponds.

Database

Here the idea is very simple, simply take the table for any entity type and fully flatten, in the natural way, the type tree of any algebraic type. Union case indicators should be encoded as database level enumerated types. There should also

The TypeORM embedded entities approach may be a natural way to do this at the query node level, as it allows for the same class for an algebraic type to be reused across multiple fields in an entity, or multiple entities, as that is possible.

The most important thing is to capture as many constraints as possible at both the ORM and database level. The former will give the mapping author an ergonomic and safe interface to work with. It should indeed be possible to offer them a fully statically type safe interface for working with a given generated schema. The latter will protect against inadvertent representational corruption, for example when writing migration or initialisation code in the future.

cli: `warthog codegen` fails silently

After running yarn codegen:server I expect to have generated/graphql-server/generated folder which holds the schema for the GraphQL API. Running yarn warthog codegen inside the graphql-server directory fails due to an error and I can see details. But hydra-cli fails silently at the warthog codegen stage.

Deploy the Indexer API server for Kusama network

Reindex Kusama and spin up Indexer API

Missing import for `variant` types

For the BigInt scalar type BN.js is used. If a variant type has a field of type BigInt then BN must be import. Adding import statement to templates/variants/variants.mst should be enough.

The Graph and the Query Node

The Graph

Here is a summary of my understanding of The Graph, please correct any possible misunderstandings on my part:

The Graph is

a standard for specifying a GraphQL API, and associated WASM blockchain data processing routines called mappings, for maintaining the underlying data for this API.
A particular instantiation of this two concepts is called a subgraph, hence there would for example eventually be a Joystream subgraph. Currently this standard
only covers Ethereum.
a set of tools, the centrepiece of which is a Rust based API serving node, which can load a subgraph dynamically.
Currently, this tooling only works for Ethereum.
a future network of node operators which will operate infrastructure for different subgraphs. The key goal here
is to incentivise these operators to provide quality service at scale, and also to provide honest query results.
How this is to happen is yet to be resolved. All current uses of The Graph rely on a trusted operator, e.g. such as the
DApp developer.

Using The Graph for our query node

There is a good chance that The Graph, both as a standard and the tools, is coming to Substrate.
The timeline for when anything production ready would be available is however very uncertain.

There are a number of plausible benefits of relying on The Graph, rather than rolling our own full stack bespoke solution

Better tooling: They are writing a high performance query node, and have a large team (15+) working on improving and maintaining it, as well as substantial community buying, even at an early stage. Our own solution is entirely bespoke,
and written largely in Python and Typescript, and has much less surrounding tooling and documentation.
Outsourcing unresolved hard problems: There are some important hard problems that need to be resolved, such as how to deal with in-flight runtime upgrades, or how to authenticate the responses of the query node. There is a much greater chance
that The Graph will solve this problem better than us, and even if not, we have other areas of focus which are worth trading off against investing in the query node.
Follow a standard: It will be easier for new developers in the Substrate ecosystem to contribute and improve our query infrastructure, if it follows some familiar standard. If The Graph comes to Substrate, many will adopt it, and thus there
will be a larger pool of trained developers who can improve the query node at a lower barrier to entry.
Free features: Things like filtering, sorting, pagination and in the future aggregate functions with grouping, are part of the well designed framework, you get them for free without any extra coding. We would have to replicate this in each query by hand, or
at least replicate some reusable abstraction we can inject in our manually written query resolvers, such as The Graph has already done.

Impositions of The Graph

This is the current main design constraints we must respect in order to have our API and blockchain data processors maximally transferrable to a future Substrate The Graph.

Join free queries: The Graph requires that each query exactly one entity type at the data layer, and accepts no user defined type arguments, or allows the developer to write query resolvers.
There is an automatic query resolver supplied which simply looks up across instances of the single entity type in the data layer. This means that if we have a desired query which needs to do an implicit join operation access to multiple different entities in the Substrate
storage layer, then the entity type in The Graph be this join product itself. Critically, even with this, we cannot replicate any conceivable join query at this stage, because aggregators are not currently ready.
Pure mappings: It appears that The Graph allows you to write mappings that key off one of the following: contract calls, block arrival, contract event. This means that each one of these must contain all relevant information to perform the
required mapping. E.g. if a particular event occurs, the event parameters defined by the contract author must have included all information that is needed for the query node mapping author to figure out what side-effect this event will
have on the set of entities in the API. This is not the case for many events that we have currently defined in the Substrate runtime. This has so far not been a problem in our own bespoke node, because Substrate events exposed by the Harvester
will include information about the initial call that was part of triggering it, and together this has always been sufficient.
No filtering, sorting, pagination: This is not really a requirement per say, its just that, if we try to add this by hand, we will be duplicating work we get for free. So perhaps the best approach is to only add this by hand if we
absolutely need it for our UX in the interim.
Write mappings with Assemblyscript in mind: The Graph has tooling for compiling a subset of Typescript down to WASM. We should write our data processors in a way which has this in mind, by sticking as close as possible to the subset of Typescript
available in Assemblyscript.

Risks

The Graph may never arrive for Substrate, and some of the constraints may have had some costs which will not then in the end made up for.
The Graph for Substrat may end up being materially different from the existing The Graph for Ethereum, in which case some of the listed constraints may be false, or there may be other new constraints we have not taken into account, which all conspire to raise the cost of the transition.

Misc: processors should use Indexer API for event sourcing

Fail-fast UX for mapping developers

We should be able to detect as many mapping errors as possible without fully deploying to the indexer. This may include:

unit tests against prefetched substrate data
Type definition provided by the user

WIP: Field type spec

Write proper spec for @field type references here

Joystream/joystream#1378 (comment)

Has to describe

What are allowed member types? can one do unions, other field types, reverse lookups, relationships?
How to model in database? should optimize for safety and Warthog compaitiblity.
How to expose in AIP? Should filter, ordering be possible?

Integration testing: run a basic scenario

Submit a tx
Wait for the indexer to process the block
Wait for the processor
Query the GraphQL server

Define mappings in a manifest file

Currently, the indexer looks up the mappings for an event solely based on the function name ( as per Joystream/joystream#1073 the expected format is )

A more intuitive and flexible approach would be to define the mappings in a manifest file, similar to TheGraph

Here is the possible metadata to be defined in the manifest:

Fine-grained definitions for handlers. A mapping can be defined per each (event, extrinsic) pair with an arbitrary path and name
Provide type definition for extrinsic and event parameters
Starting block height

Later on, the manifest may be used to implement the following features:

Possibly address the type-safety issue. The manifest file can be used by Hydra CLI to generate type-safe mappings stubs by looking up in the database type definitions
Pre-flight checks: Hydra may warn the user if the event has been emitted by an extrinsic not explicitly defined in the manifest

Hydra indexer fails with OutOfMemory

index-builder: 0.0.7-alpha
event: treasury.tipClosing (2417683-3)
block no: # 2417683

The indexer consumes all the memory while trying to update the tipper in the following mapping snippet:

// A tip suggestion has reached threshold and is closing.
export async function handleTipClosing(db: DB, event: SubstrateEvent) {
  const { Hash } = event.event_params;
  const { extrinsic } = event;
  const tip = await db.get(Tip, { where: { reason: Buffer.from(Hash.toString()) } });

  assert(tip, 'Invalid reason hash!');
  if (tip && extrinsic) { 
    const t = new Tipper();
    t.tipper = Buffer.from(extrinsic?.signer.toString());
    t.tipValue = new BN(extrinsic.args[1].toString(), 10);
    t.tip = tip;
    await db.save<Tipper>(t);
    console.log(`Tip: ${JSON.stringify(tip, null, 2)}`);
    tip.closes = new BN(event.block_number.toString());
    await db.save<Tip>(tip);
  }
}

Here is a full stacktrace together with the debug logging:

indexer_1                  | 2020-08-11T10:04:41.944Z index-builder:indexer Yay, block producer at height: #2417683
indexer_1                  | 2020-08-11T10:04:41.945Z index-builder:indexer Processing event treasury.TipClosing, index: 0
indexer_1                  | 2020-08-11T10:04:41.945Z index-builder:indexer 			Parameters:
indexer_1                  | 2020-08-11T10:04:41.946Z index-builder:indexer 				[object Object]: 0xf07a241d567dd16481ac72bcccac163b06dc63002c9606eea9f6adb9250c450d
indexer_1                  | 2020-08-11T10:04:41.947Z index-builder:indexer 			Extrinsic: treasury.tip
indexer_1                  | 2020-08-11T10:04:41.947Z index-builder:indexer 				Phase: {"ApplyExtrinsic":3}
indexer_1                  | 2020-08-11T10:04:41.947Z index-builder:indexer 				Parameters:
indexer_1                  | 2020-08-11T10:04:41.947Z index-builder:indexer 					H256: 0xf07a241d567dd16481ac72bcccac163b06dc63002c9606eea9f6adb9250c450d
indexer_1                  | 2020-08-11T10:04:41.947Z index-builder:indexer 					Balance: 100000000000000
indexer_1                  | query: START TRANSACTION
indexer_1                  | 2020-08-11T10:04:41.951Z index-builder:indexer Recognized: treasury.TipClosing
indexer_1                  | query: SELECT "Tip"."id" AS "Tip_id", "Tip"."created_at" AS "Tip_created_at", "Tip"."created_by_id" AS "Tip_created_by_id", "Tip"."updated_at" AS "Tip_updated_at", "Tip"."updated_by_id" AS "Tip_updated_by_id", "Tip"."deleted_at" AS "Tip_deleted_at", "Tip"."deleted_by_id" AS "Tip_deleted_by_id", "Tip"."version" AS "Tip_version", "Tip"."reason" AS "Tip_reason", "Tip"."who" AS "Tip_who", "Tip"."finder" AS "Tip_finder", "Tip"."deposit" AS "Tip_deposit", "Tip"."closes" AS "Tip_closes", "Tip"."finders_fee" AS "Tip_finders_fee", "Tip"."retracted" AS "Tip_retracted" FROM "tip" "Tip" WHERE "Tip"."reason" = $1 LIMIT 1 -- PARAMETERS: [{"type":"Buffer","data":[48,120,102,48,55,97,50,52,49,100,53,54,55,100,100,49,54,52,56,49,97,99,55,50,98,99,99,99,97,99,49,54,51,98,48,54,100,99,54,51,48,48,50,99,57,54,48,54,101,101,97,57,102,54,97,100,98,57,50,53,48,99,52,53,48,100]}]
indexer_1                  | [DEBUG] 10:04:41 structured-stack added to watcher
indexer_1                  | Tip created
indexer_1                  | Tipper created
indexer_1                  | query: SELECT "Tipper"."id" AS "Tipper_id", "Tipper"."created_at" AS "Tipper_created_at", "Tipper"."created_by_id" AS "Tipper_created_by_id", "Tipper"."updated_at" AS "Tipper_updated_at", "Tipper"."updated_by_id" AS "Tipper_updated_by_id", "Tipper"."deleted_at" AS "Tipper_deleted_at", "Tipper"."deleted_by_id" AS "Tipper_deleted_by_id", "Tipper"."version" AS "Tipper_version", "Tipper"."tip_id" AS "Tipper_tip_id", "Tipper"."tipper" AS "Tipper_tipper", "Tipper"."tip_value" AS "Tipper_tip_value" FROM "tipper" "Tipper" WHERE "Tipper"."id" IN ($1) -- PARAMETERS: ["iZe2Tnba6"]
indexer_1                  | query: INSERT INTO "tipper"("id", "created_at", "created_by_id", "updated_at", "updated_by_id", "deleted_at", "deleted_by_id", "version", "tip_id", "tipper", "tip_value") VALUES ($1, DEFAULT, $2, DEFAULT, DEFAULT, DEFAULT, DEFAULT, $3, $4, $5, $6) RETURNING "created_at", "updated_at", "version" -- PARAMETERS: ["iZe2Tnba6","TkShhLYuL5",1,"8jPSZUKpM",{"type":"Buffer","data":[68,84,76,99,85,117,57,50,78,111,81,119,52,103,103,54,86,109,78,103,88,101,89,81,105,78,121,119,68,104,102,89,77,81,66,80,89,103,50,89,49,87,54,65,107,74,70]},"100000000000000"]
indexer_1                  | Tip: {
indexer_1                  |   "id": "8jPSZUKpM",
indexer_1                  |   "createdAt": "2020-08-06T14:38:15.138Z",
indexer_1                  |   "createdById": "hY-lk33TB2",
indexer_1                  |   "updatedAt": "2020-08-06T14:38:15.138Z",
indexer_1                  |   "updatedById": null,
indexer_1                  |   "deletedAt": null,
indexer_1                  |   "deletedById": null,
indexer_1                  |   "version": 1,
indexer_1                  |   "reason": "0x307866303761323431643536376464313634383161633732626363636163313633623036646336333030326339363036656561396636616462393235306334353064",
indexer_1                  |   "who": "0x44d033080eff366766ceee9defe975cc92a07a9e9815cc6b58bcb2b9cc5a6341",
indexer_1                  |   "finder": "0x4839655376576533347651444a4157636b6554485753715343685261743862674b48473339474331666a76456d3779",
indexer_1                  |   "deposit": "",
indexer_1                  |   "closes": "",
indexer_1                  |   "findersFee": false,
indexer_1                  |   "retracted": false
indexer_1                  | }
indexer_1                  | query: SELECT "Tip"."id" AS "Tip_id", "Tip"."created_at" AS "Tip_created_at", "Tip"."created_by_id" AS "Tip_created_by_id", "Tip"."updated_at" AS "Tip_updated_at", "Tip"."updated_by_id" AS "Tip_updated_by_id", "Tip"."deleted_at" AS "Tip_deleted_at", "Tip"."deleted_by_id" AS "Tip_deleted_by_id", "Tip"."version" AS "Tip_version", "Tip"."reason" AS "Tip_reason", "Tip"."who" AS "Tip_who", "Tip"."finder" AS "Tip_finder", "Tip"."deposit" AS "Tip_deposit", "Tip"."closes" AS "Tip_closes", "Tip"."finders_fee" AS "Tip_finders_fee", "Tip"."retracted" AS "Tip_retracted" FROM "tip" "Tip" WHERE "Tip"."id" IN ($1) -- PARAMETERS: ["8jPSZUKpM"]
indexer_1                  | Tip created
<--- Last few GCs --->

[2067232:0xabad60]    41858 ms: Mark-sweep 1388.2 (1423.8) -> 1388.2 (1424.3) MB, 1666.1 / 0.0 ms  (average mu = 0.182, current mu = 0.019) allocation failure scavenge might not succeed
[2067232:0xabad60]    43436 ms: Mark-sweep 1389.2 (1424.3) -> 1389.2 (1425.8) MB, 1576.5 / 0.0 ms  (average mu = 0.104, current mu = 0.001) allocation failure scavenge might not succeed


<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x2e4f9c75452b]
    1: StubFrame [pc: 0x2e4f9c7556f3]
Security context: 0x310d184aee11 <JSObject>
    2: toString [0x30dae90bd031] [/home/hakusama/kusama/generated/graphql-server/node_modules/bn.js/lib/bn.js:~459] [pc=0x2e4f9cb35e6c](this=0x2e5b1a042e51 <BN map = 0x2dd3d46302a9>,base=10,padding=1)
    3: arguments adaptor frame: 0->2
    4: /* anonymous */ [0x3357c465b609] [/home/hakusama/kusama/generated/graphql-ser...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

UPD Temporarily commenting BN fields in the Tip entity class allowed db.save<Tip>(tip) to go through. This indicates that the problem is somehow related to how the BN fields are treated by TypeORM, even though I was not able to reproduce this behavior locally.

Indexer API: Filtering & Pagination

Subtask of #24

Processor API

All the key processor real-time stats should be exposed via API. This is needed for health-checking and integration testing.
The stats include:

Most recent processed event
Most recent indexer head the processor is aware of
Time since the last indexer poll

Design: Substrate Query Infrastructure Framework

Background

This issue describes what is meant by query infrastructure, and also why its needed

Joystream/joystream#17

Ignore the proposal itself.

Goal

Develop a software framework for making query infrastructure for Substrate runtimes, and use it to start the implementation for the Joystream runtime specifically. Moreover, the framework is designed to be as compatible as possible with The Graph protocol, in the sense that, it should minimise the cost of translate an instance based on this framework into an instance based on a possible future The Graph Substrate compatible standard.

Architecture

The query infrastructure consists of the following three servers operating in concert:

A GraphQL server serving the API. The GraphQL API only has queries, no subscriptions or mutations, and these are resolved into a standalone relational database server (2). Critically, each query must correspond to exactly one table in the database, in essence meaning that queries map to a single SELECT lookup, without any joins.
A relational database server which holds the current query state. The database must also hold some state which represents exactly how much the blockchain has been processed to result in the current state of the database. This information must be atomically updated with processing of each mapping, as it allows the whole query infrastructure to continue if it is halts during operation for any reason (power outage, lost connection to full node, etc.).
A block ingestion server which processes blocks and corresponding emitted events originating from a given Substrate archival node (4), and updates the database based correspondingly.
A Substrate archival node is a full node which stores, for all blocks, the set of events that were emitted. A normal full node will not do, as it only emits events being generated by ongoing validation, and this is not sufficient for our purposes, since the ingestion processes may need access to other events, for example during initial synchronisation or catchup.

It should be the case that 1 and 2-3 should be able to run on separate hosts.

Developer Workflow

To instantiate query infrastructure for a given runtime using the framework, the developer as to provide the following:

API description: A description of all the types and queries which will be in the API, and also have a corresponding table in the database. The query that is exposed in the actual GraphQL API will also include OpenCrud arguments for filtering, pagination and ordering, but this is not included in this description. There should also be documentations in this description, and it should be propagated all the way to the database and GraphQL schemas.
Event processors: An event processor is a (Typescript) function which corresponds to a specific event name in a specific module, and updates the query database based on the semantics of the event, along with information about the originating transaction and transaction parameters, if it applies. Some events may originate from block finalisation code, e.g. on_finalize, or from genesis builder logic, this is why there isn't always an extrinsic. The developer must write such processors for each event that must be detected in order to properly manage the query state. These processors will often just update the table for a single query, but not always. They correspond quite closely to the concept of mappings in The Graph.

Once a developer has these ready, there should be some simple CLI tool for generating the database schemas, ORM library for talking to the database and the GraphQL server schema. The CLI tool should probably also help with setting up a workspace for developing, packaging and deploying your own infrastructure.

Framework Implementation Requirements

Must use Typescript.
Must use Warthog GraphQL API framework. Provides autogenerated, database, GraphQL schema with OpenCrud support and client side ORM.
Must have have tests and CI.
Documentation written in API description should propagate to become autogenerated documentation for the GraphQL API and database schemas.
Locate in new root directory substrate-query-node in repo https://github.com/Joystream/substrate-runtime-joystream.

Joystream Node Implementation Requirements

Must use Typescript.
Should use Joystream/types library.
Must have have tests and CI.
Located in new root directory joystream-query-node in repo https://github.com/Joystream/substrate-runtime-joystream.
Reliable and automated deployment, e.g. through dockerization of some sort.
Targets Constantinople runtime, with key queries for membership and proposal modules.

Questions

How should the types found in the runtime be encoded in the database and the GraphQL schema. For example, u128 that may be part of runtime written in Rust, how should that be encoded? Keep in mind that we want an encoding which allows us to
(Hard Problem): How should we deal with runtime upgrades? Runtime upgrades may often also involve on-chain migrations of the stored state, and totally new types. Locally in the query infrastructure, how should it attempt to deal with this, and also how should a query node synch up with a chain which may have multiple upgrade since genesis? A full local migration may involve
- updating database schema
- migrating database tables
- updating graphql schema & event processors

Indexer API: subscriptions

Subtask of #24

The indexer should publish updates to the subscribers, including the following info:

Current indexer head
Events in the given block

Decode extrinsic args for sub call

Context: Runtime events don't provide all the data that we need in order to update the underlying database so we need more data like extrinsic args etc. For example, Kusama/Polkadot treasury module has TipClosing event and this event is dispatched by the treasury module's tip function.

If an extrinsic call is direct to this function then the extrinsic args are available. But in a sub call the situation different. The arguments for the sub call are encoded in bytes.

Look at those two examples the first one is direct call and the second one is sub call:

when we read extrinsic args in mappings we have the following:

#Extrinsic Args:

{
	threshold: 2,
	other_signatories: [
		FG78iuAYrn43g8b3DFjroC6mTyMDqc5xk6cbETsH1MFGCKa, Fa3N98oETbFcjTX3pCVdJ7gFCs5NgxDYPVw1gJiexLnF6rM
	],
	maybe_timepoint: {"height":3398237,"index":3},
	call: 0x12064fd51ac10d122719a67621607ff10a1fd58a04dae0e714010530380d408edb7e00e057eb481b00000000000000000000,
	store_call: false,
	max_weight: 252000000
}

as you can see that the call argument is encoded (Bytes - (@polkadot/types/primitives)

We can decode call data like this:

const c = api.createType('Call', extrinsic.method.args[3]);
console.log(c.toHuman());

//OUTPUT
{
  args: [
    '0x4fd51ac10d122719a67621607ff10a1fd58a04dae0e714010530380d408edb7e',
    '30.000 KSM'
  ],
  callIndex: '0x1206',
  method: 'tip',
  section: 'treasury'
}

Note that this requires ApiPromise instance in the present.

Allow developer to specify ID

Currently, this is not under the control of the mapping author, but it's important for creating rational APIs, with good semantics and enforce consistency across query nodes.

Indexer crashes at blocks containing large number of events.

Version: [email protected]
Issue: The indexer crashes at block 2307545 of Kusama, containing 5000+ events.

Improve observability: integrate Prometheus or any other metrics collector

For stable operation, key metrics of the indexer and the mappers should be exported. Prometheus is a popular way to expose system- and application-level metrics for an external monitoring system.

Indexer API: Runtime

subtask #24

Remedies for an Insecure Query Node

Background

We have concluded that we are going to have a query node as an intermediate data layer between full nodes and end-user applications.

Security

An application such as Pioneer enables many sensitive user actions, for example associated with
transferring funds, staking, slashing, etc. These high stakes operations depend on the application
being served accurate information about how and when such actions may be undertaken. If the data
layer for Pioneer is not supposed to be a trusted third party, then it becomes an issue
how we can secure the integrity of such information being served to end-user applications.

The standard approach of having a working group with staked and rewarded query nodes is not going
to be sufficient. This is because the operations in question are so high value, and therefore the payoff of abuse will correspondingly be very high in many instances,
as a result, the amount of stake required would be prohibitive to make the scheme incentive compatible,
and the unavoidable imperfection of discretionary slashing would generate a huge risk for even honest query node providers.

A beneficial asymmetry

Upon further thought, its becomes clear that the problem has a built in asymmetry which is of great benefit.

The content consumption, and perhaps seven publishing, use case for accessing the system is almost entirely insulated from this problem. This is because the chain read operations and transactions that are relevant are not as high value, and there are far fewer of them. At the same time, this mode of access will likely constitute the overwhelming majority of read operations on the system. In contrast, all the governance and operational use cases are very exposed to this problem, however, at any given time, the volume of operation, and the number of people involved in this, will be an order of magnitude lower.

This asymmetry allows us to treat the two cases separately, and this separate treatment will probably fall within entirely separate products as well.

Remedies

In light of this, we can do as follows.

For apps that are less, or perhaps entirely not, exposed to the security issue at hand, one among an untrusted set of query nodes is used, which then preserves many of the desirable benefits of the query node, while side stepping the security issue.

For apps that are exposed, we have to pick one among the following approaches.

Back to full node

We switch back to using full nodes as the data source, in which case all read operations will have accompanying light client proofs that can be verified.

It should be said that this is not entirely without cost, as the sensitive apps will have all the downsides we wanted to avoid. They will be complex to write, as each query has to be broken down into N separate basic key+val reads, the latency will be high, and there have to be some set of full nodes that run in archival mode, at least for the last M blocks. This latter constraint is because an app may needs to make all N state reads at the same, recent, block height, even as new blocks are possibly coming in and updating the state.

Back to full node with runtime index

This is similar to the former approach, however we take the extra step of hard coding in all the query state we would expect a query node to hold, into the runtime itself. This has the benefit of resolving some, or maybe even all, the original downsides of the full node approach, but has many obvious new costs, such as

state size bloat
harder to migrate state during upgrades
more complex runtime code
runtime tied to one specific app/UX
unlikely to support arbitrary rich query indexes

but for simpler modules, these could possibly be acceptable.

Signed queries + magic

We could also just rely on the query node alone, but require query nodes to sign their query responses in a way which allows the client to prove bad queries. Leaving a side the probably performance penalty of having to sign all queries at scale, this is hard problem to solve fully, for the following reasons:

The client application cannot automatically detect misconduct, it can simply retain the signed queries, and some manual forensic effort would be required to detect the effort, and match it to one or more false query replies.
Validating the proof cannot be done by the runtime, as that would require maintaining the query logic in the runtime, it would have to be done by some role which runs a query node and is incentives to police correctly.

Its unlikely that step 1 could be replaced by some sort of probobalistic challenge response protocol, which in principle could resolve the detection problem.

Bundle App + Query node + Full node

We are already quite certain that, on balance, the best platform for distributing an experience for governance purposes would be some sort of native client application on the desktop. In this case, one could bundle a full node and a query node in this package, which retains all benefits we could want, and only has the cost a of

a larger binary
greater client side processing costs
excludes other distribution platforms, such as a pure browser

This seems like the most ideal solution.

Note: Point 3 is actually not entirely correct. We could, at least in theory, bundle a javascript implementation of the Substrate node and query node, all running client side in the browser. There is already work being done to build a browser based full node for Substrate.

https://github.com/polkadot-js/client

Support algebraic types in schema

Background

In our runtime, we have many types, used in both storage, events and parameters, that that have a rich algebraic structure.
Here is an example

enum SingleStatus {
	Happy,
	Unhappy,
	Looking(/* years */ u32)
}

enum MarriageStatus {
	Single(SingleStatus),
	MarriedTo(/*Spouce */ Person, /* years married*/ u32),
}

struct PersonStatus {
	is_eduated: bool,
	marriage_status: MarriageStatus
}

struct Person {
	account: T::AccounId,
	status: PersonStatus
}

These types need to be reflected in the GraphQL API of our query node, and there are roughly three ways to do it

1. Flatten

Here we just encode like this

type Person @entity {
	account: Bytes!
	status: Bytes!
}

Where status is just a serialized version of the Rust PersonStatus type, for example using the native serialisatin in our joystream/types library. Now on the client side, there would have to be a corresponding deserialization when reading from the API, and serialization when wanting to query for specific values.

The main downside here is that

Semantic content and valid values for status is not self-documenting and endogenous.
Invalid values for status are easily generated inadvertendly by both API user and mapping author.
Creates extra burden on both sides to do the serialisation and desearialialisation.
Masks field selection inside encoded field.

2. Sloppy `The Graph` way

Here we just break the type safety entirely,

type Person @entity {
	account: Bytes!
	marriage_status: MarriageStatus,
	is_eduated: bool,
	single_status: SingleStatus
	...
}

enum MarriageStatus {
	Single,
	Married
}

enum SingleStatus {
	Happy,
	Unhappy,
	Looking
}

...

This creates a nightmare in almost every respect. It becomes hard to

write mappers correctly
understand the runtime invariants that exist all fields in Person for API consumer
write queries that are well formed properly, because lots of invalid types are representable, e.g. no education is define but marriage satus is.

3. Less sloppy `The Graph` way

Here we make our schemas much safer by introducing , currently undocumented, interfaces for each union, as in this slightly different example

type Person @entity {
  id: ID!
  name: String
  status: Status!
}

interface Status {
  id: ID!
}

type Single implements Status @entity {
  id: ID!
  last_partner: Person
}

type Married implements Status @entity {
  id: ID!
  current_partner: Person!
  isHappy: Boolean!
}

This results in a table being made for Single and Married, but not for Status, and Person holds a single non-foreign key value which is a string representation of either a Single or Married row ID.

The main issues here are that

Status becomes a queriable type in the API
There is no database integrity check tht enforces that a valid ID is held for the status. This means the ID may point to neither table, or it holds an ID that exists in both!

4. Introduce algebraic types

Here we introduce non-entity types and unions at the schema level, and enforce the suitable representation in Postgres to make sure all integrity constraints are enfroced. In this case the schema will be isomorphic to the Rust code above.

The main downside here

We lose full compaitiblity with The Graph at least for some time, possibly permanently.

Proposal

The cost of 1-3 above seem more substantial than the expected cost of losing full compatibility with The Graph, the cost of which is that we do not get to outsource the maintenance of the framework component of our codebase. Given how well its already working, and how much of the code lives on the user side of that boundry (schemas, mappings, documentaion, tests, client code), it seems like option 4 is the best approach on balance.

There may be many ways to execute on 4., one option is to

make every non-entity type into a separate table, with a full row unique constraint and an artificial primary key, but which is not directly queryable.
make every union type into a Postgres union table, but which is not directly queryable

Subscriptions

Introduce a subscription per entity, which announces when an entity is created, removed or updated, with suitable extra information.

Cli: Explicit `id: ID` field support

In every entity definition (schema.graphql) we want to have id: ID! field defined explicitly and add a description where does the value comes from or how it is calculated ie:

type Class @entity {
  "Randomly generated unique string"
  id: ID!

  classId: BigInt!
}

type Property @entity {
  "The field value comes from 'classId:propertyId'" 
  id: ID!

  propertyId: Int!
}

Integration testing: spin processor

Request pipelining and concurrent block fetching

The performance of fetching raw block data can be significantly improved by pipelining API requests and fetching different blocks concurrently. Note that in contrast to applying the mappings, the blocks don't have to be fetched sequentially.

GraphQL support for Indexer API

This is a separate issue for adding GraphQL wrapper to Indexer API

Integration testing: create basic mapping

Major Directions

Babylon Release

Given the Hackathon submission experience of using Hydra with Kusama, we are now laser-focused on solving some salient last-mile obstacles to deploying the first version solely for Joystream purposes.

Here is the issue currently used to summarise Hydra team meeting summaries

#10

✅ Faster Blockchain Synchronization

Due to a number of factors, it took some 10-14 days to synch up to 3M blocks on Kusama. This is a very long time when combined with the fact that changing your schema and/or mappings in any way currently would require resynching from scratch. There are many obvious ways of radically reducing this time.

✅ One-Time Blockchain Synchronization

As mentioned in the prior issue, when doing development, you are likely to change your schema and/or mappings. If you did not have to resynch the chain, but instead build a local database of all relevant events and transactions, then reprocessing them when you make changes would be much faster, and the development experience would be radically improved. It would also significantly help with reproducing failures during processing due to bugs.

✅ Separate Databases for Blockchain and API Indexes

Currently, the local database mentioned in the prior point is the same database that holds the search index. This prevents a single such database from servicing distinct APIs, or even just API nodes with the same API, from the same database. This means every developer working against the same chain has to rebuild this database locally, or in the cloud when deploying, from scratch every time.

Post Babylon Release

Statically Type-Safe Mappings

Input to mappings, which include the event parameter values and possible originating extrinsic, is dynamically typed. This means that the developer has to do lots of manual type conversions and checks, any one of which could easily have a mistake. By moving to statically typed mapping signatures, it will be easier for developers to write, maintain, test and document mappings.

Mapping Manifest

Currently, we are just using a naming convention to capture what mappings should run for what event/transaction. This is very brittle, a standalone manifest makes more sense.

Transaction Processing

Only event processing is possible at the moment. This is often very inconvenient, because of two reasons.

Unless the module developer has followed the convention of replicating all extrinsic parameters in the corresponding event, which is rarely the case, then just about all mappings require that one recovers the originating extrinsic associated with an event. A single event could in principle originate from more than one extrinsic, or even from a "dynamic" extrinsic like that found in SUDO or Utility pallets. Being forced to write mappings that deal with this heterogeneity is difficult, and the Hydra node deployed for Kusama failed a number of times because of this. Tracking down the exact details of each oversight is also time-intensive.
This heterogeneity makes it next to impossible to autogenerate the types required for statically type-safe mappings (see above) because there is no way to automatically determine what extrinsics could generate each event. A better compromise would be to have type-safe mappings, where event mappings only depend on event parameters, and extrinsic mappings only depend on input parameters. In both cases, all required static types can be easily autogenerated from runtime metadata.

Improved Hosted Deployment

A starter pack to quickly deploy a Hydra stack to Heroku
Sample CircleCI scripts

Long-Term

There are a few major directions we are evaluating after this short term faze.

FRAME Paletts Support

Joystream uses some important palletts that are part of FRAME, such as balances, staking, etc. These are likely also of great interest to other runtime developers. Beyond this, it could also make sense to implement support for many of the other prominent FRAME palletts in order to encourage the adoption of Hydra.

Reusable & Stitchable Schemas and Mappings

Schemas and mappings written for one module or runtime should ideally be conveniently reusable in another. For example, if a module developer writes a new module, it would be nice if a corresponding set of schemas and mappings for Hydra could be shipped and reused as easily as the module itself.

Graceful Handling of Runtime Upgrades

Currently, we have no good ideas for how the query infrastructure could gracefully keep working across a runtime upgrade. This could involve a range of changes, including on-chain storage migrations, changes to types repertoire, etc. Some best practices, and possibly explicit tools and functionality, are required to handle this in Hydra.

Attributable Malicious Query Results

A Hydra node serving a DApp instance can compromise user funds if replying with malicious results. For example, the user may be provided an incorrect account to send their funds to for some purpose.

One way to attempt to deal with this is to write the Dapp in a way where it attempts to explicitly double-check certain key state variables by talking to a full node during critical steps with a light client. However, this can get quite complex for the DApp developer, and may not always be feasible.

Another model, which appears to be the most popular one, is effectively to nearly fully trust the DApp developer that owns and maintains the app or website that serves the client-side application the user runs to interact with the chain. Under this model, it's not a lot of extra risk to also trust the developer to run an honest Hydra node.

A third model is to require all query responses to include a signature from the originating Hydra node, and also to have a commitment in the result to a specific block and the initial query. This signature would serve as proof that a given node operator returned a given result, at a given block height, to a given query. If the node is staked, then such a signature of a malicious query could be submitted a DAO, or some incentive aligned actor set, which can then individually confirm the validity of the result, and vote to slash or not slash the node operator. This model is closest to what we would like to have in Joystream.

Improved Integration Tests

There is currently very little integration testing of Hydra.

Improved Documentation and Tutorials

We have something now, but having more non-Hydra developers start to use it internally is likely to reveal much room for improvement.

Separate fetching blocks and processing the mappers

Currently fetching the block data and applying the mappings is done in a single transaction. This becomes a major performance bottleneck, as the blocks can be fetched only sequentially. Further, any mapping failure aborts the data ingestion. We need to store all the event information in the database and process the mappings separately.

Address indexer stability issues

If the webservices endpoint is closed by the remote server, the indexer does not reconnect and is stalled indefinitely.

Add block arrival processing hooks

Indexer fail to fetch metadata for Joystream dev chain

Ensuring metadata is not working for @polkadot/[email protected]. Because of the error api uses the latest metadata for encoding/decoding.

Failed to get Metadata for block "0x41145e4919f7696b5c12a74da913f03387b5b21dc6a7f28d1a843ce43604907a", using latest.
TypeError: Cannot destructure property 'knownTypes' of 'undefined' as it is undefined.

After doing some debugging I see that api.registry is undefined somehow https://github.com/Joystream/joystream/blob/0753bbc913e6073fffeddc344a70473ccf10a76d/query-node/substrate-query-framework/index-builder/src/QueryService.ts#L49

Support multiple mappings

Currently, the mappings are tracked in a single table SavedEntityEvents and the entities share the same table space as the indexer. This is undesirable b/c of the following reasons:

It is hard to reset a mapping (since it shares the same table space with the indexer)
It is hard to run multiple mappings simultaneously (as the progress is tracked by a single table)

One of the possible solutions is to store mapping-specific information (progress and the entities) in isolated postgres schemas.

Indexer API: events

Subtask of #24

Type safe mappings in query node

Here is an example of how our mappings typically will look

export async function handleProposed(db: DB, event: SubstrateEvent) {
  const { ProposalIndex } = event.event_params;
  if (event.extrinsic) {
    const proposal = new Proposal();
    proposal.proposalIndex = ProposalIndex.toString();
	
    proposal.value = event.extrinsic?.args[0].toString();
    proposal.bond = event.extrinsic?.args[0].toString();
    proposal.beneficiary = Buffer.from(event.extrinsic?.args[1].toString());
    proposal.proposer = Buffer.from(event.extrinsic?.signer.toString());
    proposal.status = ProposalStatus.NONE;

    await db.save<Proposal>(proposal);
  }
}

One salient issue here is the amount of work that has to be done on decoding information in the event. This will have to occur in each handler, and likewise if the handler needs to look into the underlying transaction, it will have to decode those paramaters as well, and the logic for that will possibly be repeatedly invoked across multiple handlers. A minor mistake here, may generate an error which is very hard to track down.

Contrast this with a mapping in the graph protocol

import { NewGravatar, UpdatedGravatar } from '../generated/Gravity/Gravity'
import { Gravatar } from '../generated/schema'

export function handleNewGravatar(event: NewGravatar): void {
  let gravatar = new Gravatar(event.params.id.toHex())
  gravatar.owner = event.params.owner
  gravatar.displayName = event.params.displayName
  gravatar.imageUrl = event.params.imageUrl
  gravatar.save()
}

This is perfectly safe, as the event has a specific type. How could we replicate this sort type safety.

joystream / hydra Goto Github PK

hydra's Introduction

Joystream

Overview

Development

Install development tools

If you prefer your own node version manager

For older operating systems which don't support node 18

Run local development network

Software

Running a local full node

Pre-built joystream-node binaries

Mainnet chainspec file

Integration tests

Contributing

RLS Extension in VScode or Atom Editors

Authors

License

Acknowledgments

hydra's People

Contributors

Stargazers

Watchers

Forkers

hydra's Issues

Mon, Aug 24th

Agenda

Present

Topics covered

Conclusions

Background

Proposal

Basic indexer state queries:

Nice to have:

Even nicer to have:

Background

Proposal

Input Schema

Algebraic Types

Relationships

Concepts

GraphQL API

Database

The Graph

Using The Graph for our query node

Impositions of The Graph

Risks

Background

Goal

Architecture

Developer Workflow

Framework Implementation Requirements

Joystream Node Implementation Requirements

Questions

Remedies for an Insecure Query Node

Background

Security

A beneficial asymmetry

Remedies

Back to full node

Back to full node with runtime index

Signed queries + magic

Bundle App + Query node + Full node

Background

1. Flatten

2. Sloppy The Graph way

3. Less sloppy The Graph way

4. Introduce algebraic types

Proposal

Babylon Release

✅ Faster Blockchain Synchronization

✅ One-Time Blockchain Synchronization

✅ Separate Databases for Blockchain and API Indexes

Post Babylon Release

Statically Type-Safe Mappings

Mapping Manifest

Transaction Processing

Improved Hosted Deployment

Long-Term

FRAME Paletts Support

`Mon, Aug 24th`

2. Sloppy `The Graph` way

3. Less sloppy `The Graph` way