graphops / subgraph-radio

Gossip about Subgraphs with other Graph Protocol Indexers

Home Page: docs.graphops.xyz/graphcast/radios/subgraph-radio/intro

License: Apache License 2.0

Dockerfile 0.56% Shell 0.28% Rust 99.16%
graph-protocol graphcast indexers radio the-graph

subgraph-radio's People

Contributors

axiomatic-aardvark, calinah, chriswessels, hopeyen, stake-machine

subgraph-radio's Issues

[Feat.Req] update to indexer-agent multi-network support

Problem statement
Indexer-agent now handles multi-network management, and indexer-cli requires protocolNetwork in the indexing rules' schema. This required field is currently missing from the offchain requests the radio operator sends to the indexer management server.

Expectation proposal

  • Add user configurations for protocolNetwork
  • network-subgraph gateway endpoints follow the graph-network-xxx suffix convention
    • if the user doesn't specify a protocol network, it will be automatically derived from the network-subgraph endpoint (see the sketch below).
  • Add protocolNetwork to the offchain request
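
A minimal sketch of that derivation, assuming the endpoint ends in a graph-network-<network> path segment (the URLs here are illustrative, not the radio's actual configuration values):

  // Hypothetical helper: derive the protocol network from a gateway-style
  // network-subgraph endpoint that follows the graph-network-<network> suffix
  // convention; returns None so the caller can fall back to an explicit config.
  fn derive_protocol_network(network_subgraph: &str) -> Option<String> {
      network_subgraph
          .rsplit('/')
          .next() // last path segment of the endpoint
          .and_then(|segment| segment.strip_prefix("graph-network-"))
          .map(|network| network.to_string())
  }

  fn main() {
      let gateway = "https://example.com/network-subgraphs/graph-network-arbitrum";
      assert_eq!(derive_protocol_network(gateway).as_deref(), Some("arbitrum"));
      // A local network subgraph deployment doesn't follow the convention, so this
      // yields None and would require an explicit protocolNetwork config instead.
      assert_eq!(derive_protocol_network("http://graph-node:8000/subgraphs/name/local"), None);
  }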

Alternative considerations

If network-subgraph is an indexer's local network subgraph deployment, then the endpoint doesn't follow the suffix convention. We could optionally add an IPFS client that queries for the subgraph deployment's manifest file and reads the network name from there. Since we are already adding a specific user config, this extra step can be done later if there is a more solid reason to add IPFS dependencies.

Add ProtocolNetwork enum to the SDK for strict validations

Refactor Slack notifications to use a webhook URL

Problem statement
The existing Slack integration requires a bot token, but all we really need is a webhook URL integration, which is easier for a user to set up.

Expectation proposal
Refactor the Slack integration to require only a webhook URL

Message validation mechanism shouldn't be shared between features

Problem statement
Currently, the inbound message validation mechanism is set using the ID_VALIDATION configuration variable. That validation level is used for all features (POI cross-checking and Subgraph Upgrade pre-sync).

The default validation level is indexer, which is great for the POI cross-checking feature, but it means that all Upgrade intent messages will be discarded, because they're coming from Subgraph Developers (Upgrade intent messages will be accepted only if the level is graph-network-account, or lower). It's worth mentioning that Upgrade pre-sync messages are being tested for whether the sender is the owner of the Subgraph, but that happens after initial validation.

Users can of course set the validation level to graph-network-account, so that they accept both POI cross-checking messages from Indexers as usual and Upgrade pre-sync messages from Subgraph Developers. However, that would compromise security for the POI cross-checking messages.

Expectation proposal
Each feature should have a validation level, adequately corresponding to its function.

Possible solutions

  1. We fix the validation level for Upgrade pre-sync messages to graph-network-account, while keeping ID_VALIDATION for all other features (currently just POI cross-checking).
  2. We allow granular control of the validation level for each feature (see the sketch below).
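
A rough sketch of option 2, under the assumption that per-feature fields fall back to the existing ID_VALIDATION value (field and variant names are illustrative, not the radio's actual configuration keys):

  // Validation levels as described above; variant names are illustrative.
  #[derive(Clone, Copy, Debug, PartialEq)]
  enum IdentityValidation {
      NoCheck,
      ValidAddress,
      GraphcastRegistered,
      GraphNetworkAccount,
      RegisteredIndexer,
      Indexer,
  }

  struct ValidationConfig {
      // Global default, i.e. today's ID_VALIDATION.
      id_validation: IdentityValidation,
      // Optional per-feature overrides (option 2).
      poi_cross_checking: Option<IdentityValidation>,
      upgrade_presync: Option<IdentityValidation>,
  }

  impl ValidationConfig {
      // POI cross-checking keeps the strict global level unless overridden.
      fn poi_level(&self) -> IdentityValidation {
          self.poi_cross_checking.unwrap_or(self.id_validation)
      }

      // Upgrade pre-sync defaults to graph-network-account (option 1), so that
      // messages from Subgraph Developers are not discarded by the indexer level.
      fn upgrade_level(&self) -> IdentityValidation {
          self.upgrade_presync
              .unwrap_or(IdentityValidation::GraphNetworkAccount)
      }
  }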

[Bug] Proper control flow

Problem statement

Currently the program has a separate graceful-shutdown listener and hardcoded event durations and timeouts.
Timeout durations for async radio operations no longer seem effective.

Expectation proposal

Utilize ControlFlow or something else that

  • Listens for unix signals
  • Gracefully shuts down the HTTP server, metrics port, async radio operations, and state persistence
  • Provides overall control of event intervals and schedules

Alternative considerations

  • review shutdown_signal
  • use running: Arc in more threads
  • join separate threads before exiting so they can complete their tasks and release resources properly (close open files, network connections, etc.)

Additional context

https://tokio.rs/tokio/topics/shutdown
Example, though not effective given the complexity of our context: https://github.com/tokio-rs/axum/blob/025144be7e500e498b036bee8ca8c0489c235622/examples/graceful-shutdown/src/main.rs#L31

ctrl+c hangs indefinitely when http server is running

Describe the bug
When Subgraph Radio is started with a SERVER_PORT provided and I try to stop the Radio with ctrl+c, it hangs indefinitely at Shutting down server...

To Reproduce
Steps to reproduce the behavior:

  1. Run Subgraph Radio (latest dev branch) locally with cargo run -p subgraph-radio
  2. Try to stop the Radio with ctrl+c

Expected behavior
The server should shut down gracefully, the server port should be freed, and the Radio should exit.

Desktop (please complete the following information):

  • OS: macOS Ventura 13
  • Rust version: rustup 1.26.0

Additional context
More of a question than context: would it be an issue if we forcefully shut down the server?

Radio crashes randomly

Describe the bug
The instance of Subgraph Radio running as a Docker container against our mainnet Indexer crashes randomly with no errors; after a restart it runs fine. The last logs we see are:

  2023-09-07T16:36:01.984928Z DEBUG graphcast_sdk: Signer is not registered at Graphcast Registry. Check Graph Network, e: ParseResponseError("No indexer data queried from registry for GraphcastID: 0x0f2840ec21b3a4af515358deb250f16d3bca3a7e"), account: Account { agent: "0x0f2840ec21b3a4af515358deb250f16d3bca3a7e", account: "0xb4b4570df6f7fe320f10fdfb702dba7e35244550" }
    at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/graphcast-sdk-0.4.2/src/lib.rs:284

  2023-09-07T16:36:02.239575Z DEBUG subgraph_radio::operator: Message processor timed out, error: "deadline has elapsed"
    at subgraph-radio/src/operator/mod.rs:383

This might get resolved with the recent update of the timeout limit for message processing (added as part of #65).

Sensible default of id_validation and topics_coverage, remove old metrics

Problem statement

For usability

  • id_validation default updated from registered-indexer to indexer, to permit more accounts by default
  • update the log level from warn to debug for the local sender validation check against its own id_validation
  • coverage default changed from on-chain to comprehensive, for higher topic coverage on the network
  • remove indexer_count_by_ppoi metrics in favor of the ratio strings

Expectation proposal

PR for the changes

[Feat.Req] Regex message format during validation

Problem statement

Protobuf decoding can successfully decode a message into a format different from the one it was originally sent in.

Expectation proposal

At the message type level, add regex validation for the specialized inputs (see the sketch below):

  • Qm hash validation
  • bytes/hex validation
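
A minimal sketch of such checks using the regex crate, assuming the two formats are a CIDv0 deployment hash (Qm plus 44 base58 characters) and a 0x-prefixed hex string:

  use regex::Regex; // the regex crate is assumed as a dependency

  // In the real radio these patterns would be compiled once and reused.
  fn is_deployment_hash(s: &str) -> bool {
      // CIDv0: "Qm" followed by 44 base58 characters.
      Regex::new(r"^Qm[1-9A-HJ-NP-Za-km-z]{44}$").unwrap().is_match(s)
  }

  fn is_hex_bytes(s: &str) -> bool {
      // 0x-prefixed hex string of arbitrary length.
      Regex::new(r"^0x[0-9a-fA-F]+$").unwrap().is_match(s)
  }

  fn main() {
      assert!(is_deployment_hash("QmdemKB9KFeuDcCxRn2iBuRE35LSZ63vDBCdKaBtaw2Qm9"));
      assert!(is_hex_bytes("0x0f2840ec21b3a4af515358deb250f16d3bca3a7e"));
      assert!(!is_deployment_hash("0xdeadbeef"));
  }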

[Feat.Req] Add toggle for functionalities

Problem statement
Currently we have the POI cross-checking and Subgraph Versioning functionalities enabled by default for Subgraph Radio users, but ideally Indexers should be able to choose which functionalities to enable/disable.

Expectation proposal
We should add a new config variable, something like functionalities / features

Improve UpgradeIntentMessage

  • ratelimit such that a given deployment only takes 1 upgrade (Issue link)
  • allowlist, blocklist (Issue link)
  • Check for signal - configurable minimal check

Document the potential attack where an owner sending malicious messages can abuse indexers' resources, and outline the capital the owner would need to perform such attacks

[Feat.Req] Store, serve, and clean version upgrade messages

Problem statement

Currently, version upgrade messages are ephemeral. For debugging and transparency of the radio's operations, a user might find it useful to store version upgrade messages while an upgrade is actively in progress.

Expectation proposal

  • Add persistence of VersionUpgradeMessages as part of persisted_state.
  • Add a get_version_upgrades API function to the HTTP server for serving stored messages.
  • Add cleanup of VersionUpgradeMessages when graph node is no longer syncing the old deployment.
  • Add E2E tests

Additional context
Related to #6

Add more e2e tests

Problem statement

We are currently missing E2E tests for

  • API service
  • Validation mechanisms
  • Other message types

Expectation proposal

  • test that persisted_state and API responses match
    • matching remote_ppoi_messages
    • matching local_attestations
    • matching comparison_results
    • comparison_results should have just 1 per deployment
    • remote_ppoi_messages should not have messages earlier than comparison_result's attested block
  • test various receiver validation and sender identity setup
    • mismatched sender by a strict validation configured in the test_runner (graphcast_id to GraphNetworkAccount not okay, graphcast_id to GraphcastRegistered okay, graph account to Indexer not okay, ... perhaps make test-runner send a message under each identity and check against the runner config)
    • unverifiable sender by a loose validation configured in the test_runner (no-check okay, ValidAddress not okay)
    • loose validation with good sender should be okay
  • decoding other messages does not affect PublicPoiMessage operations
    • test-sender to send a VersionUpgradeMessage, does not affect the basic tests for test-runner
    • test-sender to send a message of random type, does not affect the basic tests
    • ...

[Feat.Req] Message pipelining with persisted operations summary

Problem statement

Message pipelining is a crucial point for the success and efficiency of Subgraph Radio's POI cross-checking feature. It ensures that public POIs are seamlessly created, sent, received, processed, and cross-checked. This issue goes over the current state and missing pieces of the pipeline.

Triggering Creation of POI Messages

When a message block number is reached (calculated from a fixed block interval and the synced indexing status), the Subgraph Radio operator initiates the process to create a public POI message.

Sending the POI Message

Once a public POI message is created, the operator immediately passes it to the Graphcast agent for signing and broadcast to the Graphcast Network, ensuring other Indexers are informed about the latest POIs.

Receiving and Processing

Other Indexers' Radios constantly listen for incoming public POI messages. Once a message has been sent to the network, they start a local timer for the message collection window and accept messages for that message block. When a message is received, it is authenticated and cached for further processing.

  • TODO: message arrival time varies depending on network connectivity, but make sure we can accept remote messages sent before our local message if the time difference is reasonable.

Cross-checking Calculation

After the collection window for a given block closes, the radio compares the local POIs against the received public ones. The goal is to establish consensus across all Indexers (radios are limited to computing consensus from the messages they receive). If there is any new divergence, alerts are triggered.

Persisted State of Summary for POI Messages

To provide an overview of and insight into the behavior and performance of Subgraph Radio's handling of POI messages, a persisted summary state is maintained. This summary should enable users to quickly gauge the health, performance, and trends related to POI messages without diving deep into individual message details.

  • We currently have summaries logged for sending, comparing, and network updates, but there are a few improvements to make for better quality and comprehensiveness

Expectation proposal

Summary Specifications (can be extended)

  1. Total topics tracked
    • List of IPFS hashes for the tracked deployments
  2. Total Message Count
    • A running total of all the POI messages that have been processed. The count is incrementally updated by adding the number of new messages. Potentially labeled with the deployment hash in metrics, and read from metrics.
  3. Average message frequency
    • The mean rate of messages received from the network, labeled with the deployment hash; this gives insight into message traffic. Update it each message block interval, or use a simple moving average $SMA=\frac{x}{N}$ where $x$ is the sum of message counts over the last $N$ periods (see the sketch after this list).
  4. Average Processing Time
    • The mean time taken to process each POI message, giving insight into the efficiency and speed of the Radio. Use the equation Σ(processing times of new messages) / (number of new messages) for each block interval. In combination with the average message frequency, this helps us understand the radio's performance in relation to the network.
    • Explore ways to utilize autometrics for the functions
  5. Total Divergence Detected
    • A count of all instances where a divergence was identified during the cross-checking calculation. A consecutive diverged range for a deployment should be counted only once, as the metric is incrementally updated by adding any new diverged state. Optionally labeled with the deployment hash.
  6. Frequent senders
    • List of Indexers sorted descending by the number of POI messages received, useful to identify active participants or potential spamming nodes. Update using a mode calculation based on the source of the latest set of messages within a comparison interval.
  7. Divergence Rate
    • The percentage of messages that resulted in a discrepancy. Calculated as (Total Divergence Detected / Total Message Count) * 100. Update by recalculating the rate using the updated total divergence and total message count for a given timeframe.
  8. Latest Message Timestamp
    • Timestamp of the last received POI message to indicate the most recent activity. Capture the timestamp from the latest message. Can be labeled with deployment hash.
  9. POI stakes for Deployment
    • A reference to the stake for the POI that has the maximum on-chain stake backing it for a deployment.
  10. Network connectivity
    • Aside from the active senders, record the average number of gossip peers to reflect the connection with the network
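
A small sketch of the running-summary arithmetic from items 3 and 7 above (names are illustrative, not the radio's actual state fields):

  // Illustrative names only, not the radio's actual state fields.
  #[derive(Default)]
  struct PoiSummary {
      message_counts: Vec<u64>, // messages per block interval, newest last
      total_messages: u64,
      total_divergences: u64,
  }

  impl PoiSummary {
      // Item 3: simple moving average SMA = x / N, where x is the sum of
      // message counts over the last N block intervals.
      fn average_message_frequency(&self, n: usize) -> f64 {
          let window: Vec<u64> = self.message_counts.iter().rev().take(n).copied().collect();
          if window.is_empty() {
              return 0.0;
          }
          window.iter().sum::<u64>() as f64 / window.len() as f64
      }

      // Item 7: divergence rate = (total divergences / total messages) * 100.
      fn divergence_rate(&self) -> f64 {
          if self.total_messages == 0 {
              return 0.0;
          }
          self.total_divergences as f64 / self.total_messages as f64 * 100.0
      }
  }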

Alternative considerations

More metrics

[Feat.Req] Auto-upgrade by version upgrade message

Problem statement

Indexers can get notifications for version upgrades. To automate the process of deploying the new subgraph version on graph-node, we should add this workflow to the radio.

Expectation proposal

  • In the VersionUpgradeMessage handler, send a POST request to offchain-sync the upgraded subgraph hash (see the sketch below).
    • conditional on auto_upgrade, the availability of indexer_management_server_url, and the validity of the message
  • Check indexingStatus from graph node to make sure the deployment is being synced
  • Add E2E test
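
A minimal sketch of the offchain-sync request using reqwest (json feature) and serde_json, assuming the indexer management server accepts a GraphQL mutation along the lines of indexer-agent's indexing rules with decisionBasis "offchain"; the exact mutation and field names must come from the indexer-agent schema, not from this sketch:

  use serde_json::json; // reqwest (json feature) and serde_json are assumed dependencies

  // Hypothetical request shape for triggering offchain syncing of new_hash.
  async fn request_offchain_sync(
      management_server_url: &str,
      new_hash: &str,
  ) -> Result<(), reqwest::Error> {
      let body = json!({
          "query": "mutation setIndexingRule($rule: IndexingRuleInput!) { setIndexingRule(rule: $rule) { identifier } }",
          "variables": { "rule": {
              "identifier": new_hash,
              "identifierType": "deployment",
              "decisionBasis": "offchain",
          }},
      });
      reqwest::Client::new()
          .post(management_server_url)
          .json(&body)
          .send()
          .await?
          .error_for_status()?;
      Ok(())
  }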

Related to #6

Subgraph Versioning operational features

Problem statement

Motivated by forum post, Initial design doc

Radios can already send and receive subgraph versioning messages. Indexers running the radio with notifications enabled should receive version upgrade messages from verifiable subgraph owners. Currently we expect them to handle the message manually.

We can consider further handling of the messages and interactions between indexers and subgraph developers. This should close the gap to achieve seamless subgraph version upgrades. For developers, the subgraph can be published and upgraded with deterministic data availability and service quality. For indexers, the subgraph traffic becomes predictable during the upgrade and subgraphs can be pre-synced so query services do not get disrupted.

General workflow

sequenceDiagram
    actor DR as Subgraph A Owner
    participant GN as Graphcast Network
    participant SIR as Subscribed Indexer Radios
    participant IMS as Indexer Management Server
    SIR-->>GN: Periodic Public PoI messages (topic Subgraph A-0)
    Note over DR: Deploy new Subgraph A version 1
    DR-->>GN: Send version upgrade message (A-0)
    GN-->>SIR: Broadcast version upgrade message (A-0)
    activate SIR
    SIR->>SIR: Trusted source identity verification
    deactivate SIR
    opt Sender identity as Subgraph owner verified
        opt Auto sync management
            SIR->>IMS: POST request to initiate off-chain syncing A-1
            IMS->>SIR: Response from Graph Node (Success/Error)
            alt Success
                SIR->>SIR: Update topics
                SIR-->>GN: Subscribe to A-1
                SIR-->>GN: Broadcast public status endpoint (A-1)
            else Error
                SIR-->>GN: Broadcast error message (A-0)
            end
        end
        opt Notifications
            activate SIR
            SIR->>SIR: Notify events to human
            deactivate SIR
        end
    end
    opt Continuous Radio
        activate DR
        DR-->GN: Collect Public Status messages (A-1)
        Note over DR: Monitors for updatedness threshold 
        deactivate DR
    end
    DR-->SIR: Switch service from A-0 to A-1, deprecate A-0
    SIR-->>GN: Unsubscribe from A-0

Expectation proposal

For indexers:

  • Automatically manage the indexer to start syncing the new subgraph deployment.
    • Optional, adds dependency to indexer management server
    • If the topic coverage is set to comprehensive, the radio will subscribe to the new deployment hash during the next content topic refresh iteration.
    • If the topic coverage is set to on-chain, the radio will neither publish nor subscribe to this topic until the indexer operator allocates resources to the new version.
  • While syncing the new subgraph, indexers could send indexing_statuses messages to the new hash channel. This could create a competitive race to chainhead, even though it is unverified and carries no rewards.

For subgraph developers:

  • Developers are not expected to run the radio continuously, but there could be a benefit to doing so anyway
    • Developers can subscribe to the subgraph deployments they own and monitor the public PoI consensus
    • If the indexers send indexing_statuses messages, then subgraph owners can immediately triage and respond to problems with well-defined error information, before or even after the version gets published to the network.
    • Developers can upgrade the service deterministically, ensuring stability.

Issue breakdown

Minimal

  • Add indexer_management_server_url: Option<String> to configs. (issue #23)
  • In the VersionUpgradeMessage handler, optionally send a POST request to the url to offchain-sync new_hash; topics should be automatically updated according to topic_coverage (issue #24)
  • Ratelimit subgraph deployment

Can extend on the developer + indexer continuous interactions later on

  • Always persisting VersionUpgradeMessages as part of persisted_state (issue #25)

Alternative considerations

  • Indexers to send DeploymentHealth or IndexingStatus messages to the new hash channel while syncing the new subgraph, or to expose their public indexing status API for more active checking
  • developers can continue to run the radio until they receive one or more PublicPoiMessages; the radio notifies them that there are indexers synced up to chainhead, and then shuts down

Additional context
From old repo: graphops/poi-radio#211

[Feat.Req] Config input file and groupings

Problem statement

Currently the configurations are supplied to the CLI as arguments all on the same level. We can utilize an input file (in either TOML or YAML format) and group the arguments appropriately.

Expectation proposal

  • Add functionality to parse an input file if provided; its values should be overwritten by the CLI arguments (see the sketch below).
  • Group the arguments for better organization.
    • GraphStack: graph_node_endpoint: String, indexer_address: String, registry_subgraph: String, network_subgraph: String, private_key: Option<String>, mnemonic: Option<String>, ...
    • RadioInfrastructure: graphcast_network: GraphcastNetworkName, topics: Vec<String>, coverage: CoverageLevel, collect_message_duration: i64, slack_token: Option<String>, slack_channel: Option<String>, discord_webhook: Option<String>, telegram_token: Option<String>, telegram_chat_id: Option<i64>, metrics_host: String, metrics_port: Option<u16>, server_host: String, server_port: Option<u16>, persistence_file_path: Option<String>, radio_name: String, filter_protocol: Option<bool>, id_validation: IdentityValidation, topic_update_interval: u64, log_level: String, log_format: LogFormat, ...
    • Waku: waku_host: Option<String>, waku_port: Option<String>, waku_node_key: Option<String>, waku_addr: Option<String>, boot_node_addresses: Vec<String>, waku_log_level: Option<String>, discv5_enrs: Option<Vec<String>>, discv5_port: Option<u16>,
    • Replace checks like Parser::possible_values and Parser::min_values with Arg::value_parser
  • Utilize the Waku struct when passing it into the GraphcastAgent configuration, which first needs an update in the SDK
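
A sketch of the grouping with clap 4 (derive and env features assumed), showing only a few representative fields; an input file in TOML or YAML would be deserialized into the same structs and then overridden by CLI arguments:

  use clap::{Args, Parser};

  #[derive(Args, Debug)]
  struct GraphStack {
      #[arg(long, env = "GRAPH_NODE_ENDPOINT")]
      graph_node_endpoint: String,
      #[arg(long, env = "INDEXER_ADDRESS")]
      indexer_address: String,
  }

  #[derive(Args, Debug)]
  struct Waku {
      #[arg(long, env = "WAKU_HOST")]
      waku_host: Option<String>,
      #[arg(long, env = "WAKU_PORT")]
      waku_port: Option<String>,
  }

  #[derive(Parser, Debug)]
  struct Config {
      // Optional TOML/YAML input file; its values would be overwritten by CLI arguments.
      #[arg(long)]
      config_file: Option<String>,
      #[command(flatten)]
      graph_stack: GraphStack,
      #[command(flatten)]
      waku: Waku,
  }

  fn main() {
      let config = Config::parse();
      println!("{config:?}");
  }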

Alternative considerations

just a nice-to-have

Add banner to Subgraph Studio to advertise one-shot CLI

Expectation proposal
We should add a banner to Subgraph Studio that lets subgraph developers know that they can use our one-shot CLI to send messages about when they plan to publish a new version of their subgraph(s). That banner should direct them to our docs where we lay out the steps they need to take to send a message.

Alternative considerations
We could think of ways to integrate the one-shot CLI into Subgraph Studio itself, but that would require enormous effort since we would need to either 1. somehow wrap the existing one-shot CLI in WASM and create bindings for JS (which includes bundling the Go and C compilers) or 2. create a JS clone of one-shot CLI (using js-waku)

Enable daily summary mode for notifications

Allow users to configure an optional env var NOTIFICATIONS_MODE. The default is live, which keeps the current notification behaviour (sending messages when a divergence is spotted during result comparison); the second option is daily, where once a day we send a notification with the current comparison results.
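
A small sketch of how the variable could be modelled, with names following the proposal above (not an existing config in the radio):

  #[derive(Clone, Copy, Debug, Default, PartialEq)]
  enum NotificationMode {
      #[default]
      Live,  // current behaviour: notify on each newly spotted divergence
      Daily, // send one summary of the current comparison results per day
  }

  fn notification_mode_from_env() -> NotificationMode {
      match std::env::var("NOTIFICATIONS_MODE").as_deref() {
          Ok("daily") | Ok("Daily") | Ok("DAILY") => NotificationMode::Daily,
          _ => NotificationMode::Live,
      }
  }

  fn main() {
      println!("notification mode: {:?}", notification_mode_from_env());
  }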

Open questions
Should we add another var - NOTIFICATIONS_TIME where users can set preferred (local or UTC) time for the daily notifications?

Motivated by #71

Optionally pass in dependencies

Problem statement
It is useful to be able to know which dependencies are attached to a given POI attestation. This would help us root cause divergence issues.

Expectation proposal
Imagine an Indexer could pass any number of supported dependencies as configuration. If this were a flag, it could take the form:

--dep <type>:<id>=<uri>

poi-radio --dep postgresql:primary=postgresql://host:5432 --dep chain:mainnet=http://geth:8545

For each provided flag, handler logic could be defined per type that allows POI Radio to extract the version. For example, for the chain type, POI Radio could call web3_clientVersion at the provided uri to get the client version. A SQL statement could similarly be executed to get the PostgreSQL version.

The resulting dependency information could be attached to POI messages.
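
A sketch of parsing the proposed --dep <type>:<id>=<uri> flag and dispatching a version lookup per dependency type; the handler bodies are placeholders rather than real clients:

  #[derive(Debug)]
  struct Dependency {
      kind: String, // e.g. "postgresql" or "chain"
      id: String,   // e.g. "primary" or "mainnet"
      uri: String,
  }

  // Parse "<type>:<id>=<uri>" into a Dependency.
  fn parse_dep(flag: &str) -> Option<Dependency> {
      let (kind, rest) = flag.split_once(':')?;
      let (id, uri) = rest.split_once('=')?;
      Some(Dependency {
          kind: kind.to_string(),
          id: id.to_string(),
          uri: uri.to_string(),
      })
  }

  // Placeholder dispatch: real handlers would call web3_clientVersion over
  // JSON-RPC for "chain" and run `SELECT version()` for "postgresql".
  fn version_of(dep: &Dependency) -> String {
      match dep.kind.as_str() {
          "chain" => format!("would call web3_clientVersion at {}", dep.uri),
          "postgresql" => format!("would run SELECT version() against {}", dep.uri),
          other => format!("unsupported dependency type: {other}"),
      }
  }

  fn main() {
      let dep = parse_dep("chain:mainnet=http://geth:8545").expect("well-formed flag");
      println!("{} ({}): {}", dep.kind, dep.id, version_of(&dep));
  }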

Alternative considerations

  • Utilizes meta field in waku message
  • Expose a route on indexer-service/versions

Zeros appearing in comparison results panel

Describe the bug
Zeros appearing in the comparison results

Expected behavior
Zeros should not be appearing in the comparison results, because if a POI comparison has happened for a given subgraph on a given block, that should mean there is a local attestation for it

Screenshots
Two screenshots of the comparison results panel (taken 2023-09-04, not reproduced here).

Persist store in DB instead of JSON

Problem statement
In order to save the Radio's state between reruns, we currently persist the state in a JSON file. That does the job of seamlessly restarting the Radio, but as Radio traffic, features, message types, collected data, etc. increase, we will need a more scalable approach to data persistence.

Expectation proposal
We should adopt an approach, similar to the one in Listener Radio, which uses sqlx.

By default, it could use SQLite and still keep all the data in a local file, but more advanced users should be able to provide a Postgres endpoint and store the Radio's data there.
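
A sketch of how the backend could be chosen with sqlx (sqlite, postgres, and an async runtime feature assumed); database_url is a hypothetical config field, not an existing one:

  use sqlx::{postgres::PgPoolOptions, sqlite::SqlitePoolOptions};

  enum Store {
      Sqlite(sqlx::SqlitePool),
      Postgres(sqlx::PgPool),
  }

  // SQLite in a local file stays the default, mirroring today's JSON persistence.
  async fn connect(database_url: Option<&str>) -> Result<Store, sqlx::Error> {
      match database_url {
          Some(url) if url.starts_with("postgres") => {
              let pool = PgPoolOptions::new().max_connections(5).connect(url).await?;
              Ok(Store::Postgres(pool))
          }
          other => {
              let url = other.unwrap_or("sqlite://subgraph-radio-state.db?mode=rwc");
              let pool = SqlitePoolOptions::new().connect(url).await?;
              Ok(Store::Sqlite(pool))
          }
      }
  }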

Open questions
Should we still keep JSON as an option?

Update Subgraph Radio config in Stakesquid stack

We need to update the current Subgraph Radio configuration in Stakesquid's stack (both testnet and mainnet).

  • swap poi-radio image with subgraph-radio
  • add Grafana dashboard config
  • support configurable env vars (with sensible defaults)
  • make the prometheus container scrape Subgraph Radio metrics
  • update docs relating to Subgraph Radio

[Feat.Req] Ratelimit subgraph upgrades

Problem statement

When the subgraph versioning functionality is enabled by an indexer, the indexer exposes indexer management for the subgraphs covered by the auto_upgrade configuration. This means that a subgraph owner from that covered set of subgraphs can trigger automatic offchain syncing on the indexer. While there is a degree of trust in the subgraph owner and no direct attack vector, the indexer is more vulnerable to random version upgrade messages.

There is no ensured way to verify that the new deployment hash is truly an upgrade of the existing subgraph.

Expectation proposal

  • Ratelimit upgrades per subgraph deployment. Assuming that persisted_state tracks the version upgrade messages received within the last 24 hours (or something configurable), add a conditional to only send the offchain sync request to the indexer if the identifier is not already seen among the tracked version upgrade messages' identifier and new_hash fields (see the sketch below).
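
A sketch of that conditional, assuming persisted_state can expose the last accepted upgrade per subgraph identifier (the names and the keying choice are illustrative; the key could also include new_hash):

  use std::collections::HashMap;
  use std::time::{Duration, SystemTime};

  struct UpgradeTracker {
      window: Duration,                          // e.g. 24 hours, configurable
      last_upgrade: HashMap<String, SystemTime>, // subgraph identifier -> last accepted upgrade
  }

  impl UpgradeTracker {
      // Returns true if an offchain sync request should be sent for this upgrade.
      fn should_sync(&mut self, identifier: &str) -> bool {
          let now = SystemTime::now();
          // Drop entries older than the window so the map doesn't grow unbounded.
          self.last_upgrade
              .retain(|_, seen| now.duration_since(*seen).unwrap_or_default() < self.window);
          if self.last_upgrade.contains_key(identifier) {
              false // an upgrade for this deployment was already accepted within the window
          } else {
              self.last_upgrade.insert(identifier.to_string(), now);
              true
          }
      }
  }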

Alternative considerations
Enforce a lower bound on subgraph total signal (this might not be so necessary, as the existing subgraph identifier is pre-determined by the local indexer).

Tracking: Subgraph Radio Frontend

Introduction

As it operates, Subgraph Radio surfaces and handles a wide range of data, including:

  • Messages flowing through the Graphcast network on the subgraph-radio partition (based on the content topic)
  • Underlying Waku data (number of connected peers, boot nodes, other metadata)
  • Miscellaneous data used for the Radio's internal operations

The Radio handles this data and saves some of it to an in-memory store. That store gets persisted to a local JSON file at an interval, in order for the Radio to pick up where it left off between reruns. The data in the store is made available through a GraphQL endpoint that is exposed on the Radio's HTTP server. Aside from that, some of the data flowing through the Radio is being tracked with the help of Prometheus metrics and can be optionally exposed for scraping by a Prometheus server. The combination of the data in the store and the metrics serves as a base for the Grafana dashboard configuration. We can already see that the Radio has access to a pool of useful data that can be sampled at any time.

Subgraph Radio also defines multiple helper functions that are used to gather external data needed for its internal logic. This includes helpers for querying data from different GraphQL APIs (Core Network Subgraph, Registry Subgraph), as well as different parts of the Indexer stack - Graph Node endpoint(s), (optionally) the Indexer Management Server, etc. These helpers can be used to fetch any data that might not be readily available in the store, but that can be important especially for more complex and highly specific tasks.

Problem statement

All of the data mentioned above is currently scattered across different places, and it's hard for users to find what they need, especially if they want to dig deeper and understand a specific event: when and why a POI divergence has happened, which Indexers agree or disagree with the user's locally generated POI, how they're separated into groups, and more. All of this is in the context of the POI cross-checking feature of Subgraph Radio, but a user might also be interested to know when a Subgraph Developer has signalled an intent to upgrade their Subgraph, who that Developer is, when they last published a new version, etc., in the case of the Subgraph Upgrade Pre-sync feature. Looking forward, the use cases for a richer interface to the Radio's data set will only grow.

Existing tooling

Subgraph Radio users can currently utilise the Grafana dashboard JSON provided in the repo, in order to set up their dashboard and monitor different panels based on Prometheus metrics exposed by the Radio. That provides a snapshot of the current state of the Radio and also some historical data of how the metrics have changed over time.

While the Grafana dashboard is certainly helpful for monitoring the state of the Radio, diving further into the Radio's data needs to happen through the HTTP server's GraphQL API. The GraphQL API provides a lot of useful query options, but it has its limitations; after all, it can only serve data that's readily available to the Radio (in other words, data that is saved locally).

To illustrate the issue more clearly: let's say a user monitors Subgraph Radio using the Grafana dashboard and notices a POI divergence for a given subgraph on a given block. They then use the HTTP server's GraphQL endpoint to see all the senders that have sent a public POI that differs from the user's locally generated public POI. This is enough manual work as it is, but on top of that the user is still unable to identify those senders by their display name or Indexer URL, for instance. To do that, the user would have to send more requests to the Network subgraph GraphQL endpoint.

Potential solution

Abstract

Subgraph Radio can serve a frontend application that utilises the in-memory store to visualise POI comparison results, as well as other useful data. This frontend should provide an intuitive interface for users to click-through items and dig deeper into relevant data.

Specific (implementation ideas)

Basic

We can start with a single view, similar to the Comparison Results panel from the Grafana dashboard (also drawing inspiration from Poifier's interface). It's important to note that the goal of this frontend is not to mimic/duplicate the panels in the dashboard; the dashboard will remain in use, which is why we don't need to copy or replicate the other panels.

This Comparison Results table view should immediately convey the following information:

  • Subgraphs being actively cross-checked
  • Blocks that the comparisons have happened at
  • The comparison results themselves
  • Number of matching and divergent Subgraphs
  • The consensus public POI for each Subgraph
  • Stake weight of each public POI
  • Sender groups (by public POI)

This table view should be customisable, users should be able to filter results by subgraph, block, comparison result, sender(s), etc. Applying more than one filter at a time should be supported as well. Users should also be able to click on items such as, for instance, Subgraph deployment hash, sender address, block number, to dig deeper and view all the information we can provide for that item (for instance if it's a block number - we should display all the Subgraphs that were compared at that block number, if it's a Subgraph - all the comparison results we've saved for that Subgraph, all the senders that have also attested public POIs for it, all the blocks we've compared it on, etc).

All of this filtering and partitioning of data should use client-side routing.

Advanced

After the basic tabular view is in place, we can start supporting more advanced operations, such as:

  • Sending graphql requests to the Radio's internal GraphQL endpoint, for more custom and/or complex queries if needed
  • Sending graphql requests to the Core Network and Registry Subgraphs to get more detailed sender data
  • Interfacing with other parts of the Indexer stack like Graph Node endpoint(s), Indexer Agent, etc
  • Sending messages
  • Changing the Radio's configurations on the fly

Implementation Issues

Replicate Comparison Results view in Subgraph Radio frontend

Problem statement
As a first step to creating a Subgraph Radio frontend view, we should replicate the Comparison Results panel from the current Grafana dashboard configuration.

Expectation proposal
A yew.rs app within Subgraph Radio should be bootstrapped with a basic view - one table showing the Comparison Results. None of this even needs to be interactive for now, we just need to visualise it.

The data can be queried from Subgraph Radio's HTTP server's GraphQL endpoint.

Alternative considerations
We could also kick off the frontend with more or less scope, but this feels like a nice and concrete first step.

[Feat.Req] Release support for Polygon PoS subgraphs

Problem statement

Original report by @stake-machine.

Subgraph Radio is currently throwing this warning:

  2023-08-12T05:45:26.920034Z  WARN graphcast_sdk: err_msg: "Subgraph is indexing an unsupported network unknown, please report an issue on https://github.com/graphops/graphcast-rs"
    at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/graphcast-sdk-0.4.0/src/lib.rs:137

Dependent on graphops/graphcast-sdk#266

Boot node connectivity issues

stake-machine is reporting seeing these errors when running a mainnet Subgraph Radio:

2023-07-18T20:31:25.113Z ERROR gowaku.node2.lightpush lightpush/waku_lightpush.go:163 creating stream to peer {"peer": "16Uiu2HAmAbxWtHhJBMQ37X3qEDCKMhFVzgqnHkwwrkJAkAryRpdM", "error": "failed to dial 16Uiu2HAmAbxWtHhJBMQ37X3qEDCKMhFVzgqnHkwwrkJAkAryRpdM:\n * [/ip4/95.217.154.162/tcp/31900] dial backoff"}
2023-07-18T20:34:17.517Z ERROR gowaku.node2.filter legacy_filter/waku_filter.go:360 requesting subscription {"fullNode": false, "error": "failed to dial 16Uiu2HAm5uqfdh7z2YTEps2MhvsXTk3uvSHZ9AtVkzipZZGbKJEL:\n * [/ip4/5.78.76.185/tcp/31900] dial tcp4 5.78.76.185:31900: connect: connection refused"}

He can't seem to connect to our boot node(s) and is therefore unable to send/receive messages.

Improve one-shot CLI

Problem statement
The current one-shot CLI is perfect for sending one-off subgraph versioning update messages (as well as any message, really, with a few tweaks), but we need it to be more user-friendly. This means possibly changing its name and adding a script that runs it within Docker, eliminating the need for users to install all prerequisites like Go, Clang, etc. We also need to extract it to a separate repo.

Expectation proposal
Users (subgraph devs) should ideally be able to pull a Docker image (GHCR package) and run it with custom arguments.

Alternative considerations
We could also skip this and instead wait for graphcast-web to be functional before recommending the subgraph versioning feature to subgraph developers, but that would take a lot longer, and getting early feedback is vital.

[Feat.Req] Add indexer management server url

Problem statement

Currently the radio only connects to the Graph node, but doesn't connect to the indexer components. For any management of indexing status, the radio should go through the indexer's management server for consistency in management.

Expectation proposal

  • Add indexer_management_server_url: Option<String> to radio configs.
  • Add auto_upgrade: Coverage (default: comprehensive) to radio configs.
    • Add none as a variant to the Coverage enum
  • Add validation of the url upon startup, if a url is provided.

Related to #6

Invalid PublicPoiMessage tried as UpgradeIntentMessage

Problem statement
For each deployment and sender, we can expect two PublicPoiMessages to fail validation due to a first-time sender and a first nonce; these are subsequently tried as an UpgradeIntentMessage. This should not cause issues since the validity check will fail, but it might be misleading for users.

Expectation proposal
When parsing a version upgrade message, check the error that caused the PublicPoiMessage to fail.
Skip over the two ppoi messages that we can deterministically expect to be invalid: first-time sender and first nonce (sketched below).

  • We can update the validity check to throw away just the first message by saving sender and nonce before erroring out
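
A sketch of the proposed control flow, with a hypothetical error shape standing in for the SDK's real validation error type:

  // Hypothetical error shape; the SDK's real validation error type differs.
  enum ValidationError {
      FirstTimeSender,
      FirstNonce,
      Other(String),
  }

  // Only fall back to decoding the payload as an UpgradeIntentMessage when the
  // PublicPoiMessage failure is NOT one of the two deterministically expected
  // first-message errors.
  fn should_try_upgrade_intent(ppoi_error: &ValidationError) -> bool {
      !matches!(
          ppoi_error,
          ValidationError::FirstTimeSender | ValidationError::FirstNonce
      )
  }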

Bug: Far too many notifications

Describe the bug
Bug report from Sergey at P2P.org:

We have enabled discord notifications today. We are getting 16-20 messages per minute from the subgraph-radio right now and already got around 25 notifications per each subgraph. We are running v0.1.6. Is there a way to fine-tune the notifications?


DeploymentHealthMessage user story

Problem statement
Indexers could gather the deployment health status from gossip peers, for more detailed information when they fail to reach consensus from the Public PoI messages.

Spike first

  • Did not receive any immediate interest from IOH
  • identify the benefit for the users to communicate this type of message
  • identify the value add aside from indexer's already exposed public status API

Implementation Expectations

  • A new DeploymentHealthMessage struct should contain the fields
    deployment: String
    health: Health // enum
    errors: Vec<SubgraphError>

    where each SubgraphError is either a non-fatal or fatal error returned by the graph node status endpoint:
    error_type: ErrorType // enum
    message: String
    block: Option<BlockPointer>
    handler: Option<String>
    deterministic: bool
    
  • Construct DeploymentHealthMessage from indexing statuses query
  • Comparison mechanism for DeploymentHealthMessage
    • perhaps event driven instead of periodic messages like nPOI
    • Send notifications for major health discrepancy
    • Explore other use cases like unicast channels for automatic debug info sharing
  • Update the local_attestations struct to include fields for deployment health (or add them directly to PersistedState)

Alternative considerations
Generalize message handlers

Add Subgraph Radio support to Launchpad

Problem statement
There's manual work required to add the Subgraph Radio to Launchpad

Expectation proposal
  • Submit a pull request to launchpad to include Subgraph Radio as part of the default indexing stack
  • Add Helm chart to cluster
  • Doc updates that are required

Alternative considerations
N/A

Additional context
N/A

Warp syncing subgraph data

Problem statement

Indexers should be able to provide warp sync (such as subgraph snapshots and substream flatfiles) as a type of data service, whether for the latest or historical data. This service allows the requesting node to access the data without indexing it themselves. The requesting indexer can have immediate access to recent (or historically ranged) blocks' subgraph data and serve queries for that chunk of data, and can optionally backfill the earlier blocks.

To be clear, it is NOT suitable to warp sync over gossipsub, and it is NOT suitable to make payment agreements over gossipsub, as it is neither efficient nor offers security guarantees. However, it is possible to perform a handshake pre-check through gossip before any on-chain activity.

Expectation proposal

  • Spike then discuss
  • Clearly defined usage and motivation for participants
  • Identify security threats and performance concerns for the new/existing network participants
  • Value matching - how to match a data requester with someone willing to share the data on the network; using a posted price from the service side is sufficient to start with, but it is not the most efficient
  • FTP library for DB snapshot: https://github.com/veeso/suppaftp
    • Potentially request a feature from graph-node to support snapshot and ingest commands for the graphman CLI (is this generalizable to substreams and SQL-as-a-service?)
    • Direct message relay channel with asymmetric encryption, handshake messages with key sharing

Brief description of what we can expect for the general process

sequenceDiagram
    participant FTPS as Direct FTPS Channel
    participant Client as New Indexer to Subgraph A
    participant Network as Graphcast Network
    participant Server as Existing Indexer Radio (Server)
    participant Blockchain as Blockchain
    Server-->Client: Monitoring channel A
    Server-->>Network: Broadcast Public PoI messages
    Client-->>Network: Broadcast Warp Sync request
    Note right of Client: A at recent block X, set price
    opt Willing and able to accept request
        Server-->>Network: Acknowledge message with PoI
    end
    Client-->Network: Collect Acknowledgement msg
    Client->>Client: Select a set of acceptable indexers
    Client->>FTPS: Expose FTPS port
    Client->>Blockchain: Broadcast request on-chain with Time-Lock, deposit sync reward
    Blockchain->>Server: Verify client's on-chain agreement
    alt Verified agreement
        Server->>Server: Take database snapshot with verification
        Server->>FTPS: Establish FTPS channel
        FTPS->>Server: Handshake
        Server->>FTPS: Send snapshot over FTPS channel
        FTPS->>Client: Receive snapshot
    end
    Client->>Client: Ingest Snapshot into Local Database
    alt Invalid Snapshot
        Client->>Blockchain: Proof for failure
        Note over Blockchain: Time lock expired
        Blockchain->>Client: Deposit (sync reward) + % Indexer collateral
    else Verified Snapshot
        Blockchain->>Server: Sync reward + collateral
    end
    Client-->>Network: Broadcast Public PoI messages

Gossip Handshake

  • Requesting node A is starting to index a subgraph; it broadcasts a warp sync request (synchronize packet) for the desired chunk of subgraph data, which could be either the latest or historic data. The request can include the price it would like to pay for the target data.
  • Receivers check internally for data availability, generate a provable response, and send it back to node A (synchronize-acknowledge packet). They might not need to take a snapshot at this point, or they can take a snapshot and compress it to a hash to include in the response for verification. The receiver could also provide a proofOfIndexing hash or perform a random query against the dataset in question.
  • If node A gets a response and is able to verify the message, it should respond with an acknowledgement (acknowledge packet). If node A gets several competing responses, it selects a random one or the fastest-responding one to send the acknowledgement to.
  • There could be several more rounds of messages for the negotiation of security parameters, the exchange of cryptographic keys, server and client addresses, and the verification of digital certificates. There are no guarantees for the subsequent transmissions. There should be (asymmetric) encryption and authentication to secure the communications.

On-chain Verifiability

Once the handshake finishes successfully, both sides should opt-in to an on-chain agreement for payment transfer.

After ensuring the opt-in, the responding side prepares the requested data. Assume the on-chain agreement has specifications such as the subgraph deployment, an acceptable block range, the file format, and the cryptographic scheme. It is reasonable for the requester to deposit the payment and for the responder to post collateral at the point of opt-in, and to include a time-lock that allows disputes or automatically transfers the payment and collateral once the lock expires.

File Transfer

Suppose there's a generalizable way to export and import the requested data,

  • For example, graph-node has access to the database and understands the DB structure, so graphman would be the best interface for "snapshot" and "ingest" features that respectively read and write a snapshot from/to a database.
  • To warp sync a substream, we should look into how substream flatfiles are written and how they should be exported and imported.

The requesting client and the responding service make a direct connection using FTP, FTPS (secure), or GridFTP (parallel data streams).

  • The service grabs the snapshot via its graph-node access, accepts the connection request, then sends over the snapshot file.
  • The client receives the file and verifies it via internal consistency, state consistency (against any pre-existing state), and block requirements. After verification, the data can be ingested into the database.

Alternative considerations

Potential resources
OpenEth doc
Polkadot doc
Zcash sync library doc

Additional context

libp2p imposes a message size limitation: https://github.com/status-im/nim-libp2p/blob/1681197d67e695e6adccccf56ce0e2d586317d67/libp2p/muxers/mplex/coder.nim#L40
But there is auto-splitting of messages and no limitation on the number of splits

Comparison result ratios not working properly

Describe the bug
Given this query

query{
   comparisonResults(identifier: "QmdemKB9KFeuDcCxRn2iBuRE35LSZ63vDBCdKaBtaw2Qm9") {
    deployment
    blockNumber
    resultType
    localAttestation {
      ppoi
    }
    attestations {
      senders
      stakeWeight
      ppoi
    }
  }
}

the HTTP server's GraphQL endpoint returns a comparison result that does include a localAttestation for that deployment (the full response is not reproduced here).

But the panel in Grafana shows a count ratio of 2:0* for that subgraph hash on that block. The stake ratio also shows 0* local stake. This happens for all the divergent subgraphs, as well as the ones where there is no remote ppoi to compare against (only a local one).

Expected behaviour
The count ratio should be 2:1*

Stricter validation for INDEXER_ADDRESS

Off the back of #62, we should implement stricter checks for the INDEXER_ADDRESS variable. The Radio should fail to start if the provided PRIVATE_KEY/MNEMONIC doesn't resolve to the provided INDEXER_ADDRESS, using either the Graphcast Registry or the Network Subgraph.

Before that check, we should do a minimal check that the INDEXER_ADDRESS provided to the Radio is a well-formed Ethereum address, because in some cases (for instance, missing quotes in a .yml config file) the address format can be malformed (see the sketch below).
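
A minimal sketch of that format pre-check, assuming a plain 0x-prefixed 20-byte hex string is sufficient before the registry/network-subgraph resolution:

  // Minimal format pre-check: 0x prefix followed by exactly 40 hex characters.
  fn is_well_formed_eth_address(addr: &str) -> bool {
      addr.len() == 42
          && addr.starts_with("0x")
          && addr[2..].chars().all(|c| c.is_ascii_hexdigit())
  }

  fn main() {
      assert!(is_well_formed_eth_address("0xb4b4570df6f7fe320f10fdfb702dba7e35244550"));
      // e.g. a .yml config with missing quotes can drop the 0x prefix or mangle the value
      assert!(!is_well_formed_eth_address("b4b4570df6f7fe320f10fdfb702dba7e35244550"));
  }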

Wrong address returned for local sender identity

Describe the bug
A wrong address is being checked for the local sender (it only seems to happen sometimes; this is an intermittent log). It doesn't seem to disturb the Radio's operation; it's just misleading to read.

Expected behavior
The address should be the actual address of the Radio's corresponding Indexer, along with its stake.

Logs

  2023-08-16T14:44:31.917038Z  INFO subgraph_radio::config: Initializing radio operator for indexer identity, my_address: "310181730876301567336374126543954007774436252014", my_stake: 0.0
