paritytech / substrate-telemetry

Polkadot Telemetry service

License: GNU General Public License v3.0

TypeScript 29.85% HTML 0.10% CSS 4.74% Dockerfile 0.31% Rust 64.28% Shell 0.12% JavaScript 0.60%

substrate-telemetry's Introduction


Polkadot Telemetry

Overview

This repository contains the backend ingestion server for Substrate Telemetry (which itself comprises two binaries, telemetry_shard and telemetry_core), as well as the frontend you typically see running at telemetry.polkadot.io.

The backend is a Rust project and the frontend is a React/TypeScript project.

Substrate-based nodes can be connected to an arbitrary Telemetry backend using the --telemetry-url flag (see below for more complete instructions on how to get this up and running).

Messages

Depending on the configured verbosity, substrate nodes will send different types of messages to the Telemetry server. Verbosity level 0 is sufficient to provide the Telemetry server with almost all of the node information needed. Using this verbosity level will lead to the substrate node sending the following message types to Telemetry:

system.connected
system.interval
block.import
notify.finalized

Increasing the verbosity level to 1 will lead to additional "consensus info" messages being sent, one of which has the identifier:

afg.authority_set

We use this message to populate the "validator address" field, if applicable.

Increasing the verbosity level beyond 1 is unnecessary, and will not result in any additional messages that Telemetry can handle (but other metric gathering systems might find them useful).
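For example, a dev node can be started at verbosity level 1 so that it also sends the "consensus info" messages (this assumes the default shard address used in "Terminal 4 - Node" below):

```shell
# The trailing "1" is the verbosity level, passed as part of the
# --telemetry-url argument (hence the quotes).
polkadot --dev --telemetry-url 'ws://localhost:8001/submit 1'
```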

Getting Started

To run the backend, you will need cargo to build the binary. We recommend using rustup.

To run the frontend, make sure to grab the latest stable version of node and install dependencies before doing anything:

nvm install stable
(cd frontend && npm install)

Terminal 1 & 2 - Backend

Build the backend binaries by running the following:

cd backend
cargo build --release

And then, in two different terminals, run:

./target/release/telemetry_core

and

./target/release/telemetry_shard

Use --help on either binary to see the available options.

By default, telemetry_core will listen on 127.0.0.1:8000, and telemetry_shard will listen on 127.0.0.1:8001, and expect the telemetry_core to be listening on its default address. To listen on different addresses, use the --listen option on either binary, for example --listen 0.0.0.0:8000. The telemetry_shard also needs to be told where the core is, so if the core is configured with --listen 127.0.0.1:9090, remember to pass --core 127.0.0.1:9090 to the shard, too.
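For example, to run the core on a non-default address and point a shard at it (the addresses here are illustrative):

```shell
# Core listening on a custom port:
./target/release/telemetry_core --listen 127.0.0.1:9090

# Shard exposed to external nodes, told where to find the core:
./target/release/telemetry_shard --listen 0.0.0.0:8001 --core 127.0.0.1:9090
```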

Terminal 3 - Frontend

cd frontend
npm install
npm run start

Once this is running, you'll be able to navigate to http://localhost:3000 to view the UI.

Terminal 4 - Node

Follow the installation instructions from the Polkadot repo.

If you started the backend binaries with their default arguments, you can connect a node to the shard by running:

polkadot --dev --telemetry-url 'ws://localhost:8001/submit 0'

Note: The "0" at the end of the URL is a verbosity level, and not part of the URL itself. Verbosity levels range from 0-9, with 0 denoting the lowest verbosity. The URL and this verbosity level are parts of a single argument and must therefore be surrounded in quotes (as seen above) in order to be treated as such by your shell.

Docker

Building images

To build the backend docker image, navigate into the backend folder of this repository and run:

docker build -t parity/substrate-telemetry-backend .

The backend image contains both the telemetry_core and telemetry_shard binaries.

To build the frontend docker image, navigate into the frontend folder and run:

docker build -t parity/substrate-telemetry-frontend .

Run the backend and frontend using docker-compose

The easiest way to run the backend and frontend images is to use docker-compose. To do this, run docker-compose up in the root of this repository to build and run the images. Once running, you can view the UI by navigating a browser to http://localhost:3000.

To connect a substrate node and have it send telemetry to this running instance, you have to tell it where to send telemetry by appending the argument --telemetry-url 'ws://localhost:8001/submit 0' (see "Terminal 4 - Node" above).

Run the backend and frontend using docker

If you'd like to get things running manually using Docker, you can do the following. This assumes that you've built the images as per the above, and have two images named parity/substrate-telemetry-backend and parity/substrate-telemetry-frontend.

  1. Create a new shared network so that the various containers can communicate with each other:

    docker network create telemetry
    
  2. Start up the backend core process. We expose port 8000 so that a UI running in a host browser can connect to the /feed endpoint.

    docker run --rm -it --network=telemetry \
        --name backend-core \
        -p 8000:8000 \
        --read-only \
        parity/substrate-telemetry-backend \
        telemetry_core -l 0.0.0.0:8000
    
  3. In another terminal, start up the backend shard process. We tell it where it can reach the core to send messages (possible because it has been started on the same network), and we listen on and expose port 8001 so that nodes running in the host can connect and send telemetry to it.

    docker run --rm -it --network=telemetry \
        --name backend-shard \
        -p 8001:8001 \
        --read-only \
        parity/substrate-telemetry-backend \
        telemetry_shard -l 0.0.0.0:8001 -c http://backend-core:8000/shard_submit
    
  4. In another terminal, start up the frontend server. We pass a SUBSTRATE_TELEMETRY_URL env var to tell the UI how to connect to the core process to receive telemetry. This is relative to the host machine, since that is where the browser and UI will be running.

    docker run --rm -it --network=telemetry \
        --name frontend \
        -p 3000:8000 \
        -e SUBSTRATE_TELEMETRY_URL=ws://localhost:8000/feed \
        parity/substrate-telemetry-frontend
    

    NOTE: Here we used SUBSTRATE_TELEMETRY_URL=ws://localhost:8000/feed. This will work if you test with everything running locally on your machine but NOT if your backend runs on a remote server. Keep in mind that the frontend docker image serves a static site that runs in your browser. SUBSTRATE_TELEMETRY_URL is the WebSocket URL that your browser will use to reach the backend. If your backend runs on a remote server at foo.example.com, you will need to set SUBSTRATE_TELEMETRY_URL accordingly (in this case, to ws://foo.example.com/feed).

    NOTE: Running the frontend container in read-only mode reduces the attack surface that could be used to exploit the container. However, it requires a little more effort, mounting additional volumes as shown below:

    docker run --rm -it -p 80:8000 --name frontend \
       -e SUBSTRATE_TELEMETRY_URL=ws://localhost:8000/feed \
       --tmpfs /var/cache/nginx:uid=101,gid=101 \
       --tmpfs /var/run:uid=101,gid=101 \
       --tmpfs /app/tmp:uid=101,gid=101 \
       --read-only \
       parity/substrate-telemetry-frontend
    

With these running, you'll be able to navigate to http://localhost:3000 to view the UI. If you'd like to connect a node and have it send telemetry to your running shard, you can run the following:

docker run --rm -it --network=telemetry \
  --name substrate \
  -p 9944:9944 \
  chevdor/substrate \
  substrate --dev --telemetry-url 'ws://backend-shard:8001/submit 0'

You should now see your node showing up in your local telemetry frontend:

image

Deployment

This section covers the internal deployment of Substrate Telemetry to our staging and live environments.

Deployment to staging

Every time new code is merged to master, a new version of telemetry will be automatically built and deployed to our staging environment, so there is nothing that you need to do.

Deployment to live

Once we're happy with things in staging, we can do a deployment to live as follows:

  1. Ensure that the PRs you'd like to deploy are merged to master.
  2. Tag the commit on master that you'd like to deploy with the form v1.0-a1b2c3d.
    • The version number (1.0 here) should just be incremented from whatever the latest version found using git tag is. We don't use semantic versioning or anything like that; this is just a dumb "increment version number" approach so that we can see clearly what we've deployed to live and in what order.
    • The suffix is a short git commit hash (which can be generated with git rev-parse --short HEAD), just so that it's really easy to relate the built docker images back to the corresponding code.
  3. Pushing the tag (eg git push origin v1.0-a1b2c3d) will kick off the deployment process, which in this case will also lead to new docker images being built. You can view the progress at https://gitlab.parity.io/parity/substrate-telemetry/-/pipelines.
  4. Once a deployment to staging has been successful, run whatever tests you need against the staging deployment to convince yourself that you're happy with it.
  5. Visit the CI/CD pipelines page again (URL above) and click the "play" button on the "Deploy-production" stage to perform the deployment to live.
  6. Confirm that things are working once the deployment has finished by visiting https://telemetry.polkadot.io/.
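Steps 2-3 above can be sketched as follows (the version number is illustrative; pick the next one after whatever git tag reports):

```shell
# Tag the current master commit and push the tag to trigger deployment.
git checkout master && git pull
TAG="v1.1-$(git rev-parse --short HEAD)"
git tag "${TAG}"
git push origin "${TAG}"
```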

Rolling back to a previous deployment

If something goes wrong running the above, we can roll back the deployment to live as follows.

  1. Decide what image tag you'd like to roll back to. Go to https://hub.docker.com/r/parity/substrate-telemetry-backend/tags?page=1&ordering=last_updated and have a look at the available tags (eg v1.0-a1b2c3d) to select one. You can cross-reference this with the tags available via git tag in the repository to help see which tags correspond to which code changes.
  2. Navigate to https://gitlab.parity.io/parity/substrate-telemetry/-/pipelines/new.
  3. Add a variable called FORCE_DEPLOY with the value true.
  4. Add a variable called FORCE_DOCKER_TAG with a value corresponding to the tag you want to deploy, eg v1.0-a1b2c3d. Images must exist already for this tag.
  5. Hit 'Run Pipeline'. As above, a deployment to staging will be carried out, and if you're happy with that, you can hit the "play" button on the "Deploy-production" stage to perform the deployment to live.
  6. Confirm that things are working once the deployment has finished by visiting https://telemetry.polkadot.io/.

substrate-telemetry's People

Contributors

akru, alvicsam, arshamteymouri, aurevoirxavier, bulatsaif, chevdor, cmichi, cybai, dependabot[bot], dvdplm, gilescope, gterzian, jacogr, jsdw, lexnv, lovelaced, ltfschoen, lurpis, maciejhirsz, niklasad1, pmespresso, q9f, romanb, s3krit, sergejparity, simonljs, stefashkaa, tadeohepperle, woss, xanewok


substrate-telemetry's Issues

Allow defining a list of watched nodes

As the list grows, telemetry is a nice way to monitor nodes of interest.
As a user, I would like to define some 'watched' nodes that stay pinned at the top of the list.

Action required: Greenkeeper could not be activated 🚨

🚨 You need to enable Continuous Integration on all branches of this repository. 🚨

To enable Greenkeeper, you need to make sure that a commit status is reported on all branches. This is required by Greenkeeper because it uses your CI build statuses to figure out when to notify you about breaking changes.

Since we didn’t receive a CI status on the greenkeeper/initial branch, it’s possible that you don’t have CI set up yet. We recommend using Travis CI, but Greenkeeper will work with every other CI service as well.

If you have already set up a CI for this repository, you might need to check how it’s configured. Make sure it is set to run on all new branches. If you don’t want it to run on absolutely every branch, you can whitelist branches starting with greenkeeper/.

Once you have installed and configured CI on this repository correctly, you’ll need to re-trigger Greenkeeper’s initial pull request. To do this, please delete the greenkeeper/initial branch in this repository, and then remove and re-add this repository to the Greenkeeper App’s white list on Github. You'll find this list on your repo or organization’s settings page, under Installed GitHub Apps.

Implementation versions for ChainX

When I view the ChainX tabs in Telemetry, the "Implementation" version tabs have values as shown in below screenshots (i.e. ? or ? CARGO_PKG....):

screen shot 2019-02-03 at 1 08 40 pm

screen shot 2019-02-03 at 1 08 14 pm

Group nodes by origin

New import format:

{"msg":"block.import","level":"INFO","ts":"2018-07-13T17:22:05.789520968+02:00","origin":"NetworkInitialSync","best":"a7db922a8f98d182fedc3bb4649fb96cf7e1825ef300137009624bfa5c6d4fed","height":20784}

We should use origin to sort nodes. Additionally, if origin is NetworkInitialSync the node should be held in a limbo somewhere...


Fix FE tests

We lost CI after moving the repo to the paritytech org. At some point during that migration, jest on the FE stopped working; we need to fix that.


Downgrade best block on chain if node that produced it left

When removing a node, check whether that node is at the highest block; if so, reset the best block down to the highest remaining node.

Pitfall to avoid: this can be O(n) for every removal; the search through the list should terminate at the first node that matches the current best block.
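A hypothetical TypeScript sketch of that early-terminating search (the types and names here are illustrative, not the actual codebase):

```typescript
// Illustrative sketch of the proposed best-block downgrade logic.
interface ChainNode {
  name: string;
  height: number;
}

// After removing `removed`, recompute the chain's best height. If any
// remaining node is still at the old best we stop scanning immediately,
// keeping the common case cheap instead of always paying the full O(n).
function bestAfterRemoval(
  nodes: ChainNode[],
  removed: ChainNode,
  currentBest: number
): number {
  if (removed.height < currentBest) {
    return currentBest; // removed node was not at the tip; nothing to do
  }
  let best = 0;
  for (const node of nodes) {
    if (node.height === currentBest) {
      return currentBest; // another node is at the tip; terminate early
    }
    best = Math.max(best, node.height);
  }
  return best; // downgraded best block
}
```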

Display finalized block number

Would be useful to see how GRANDPA is progressing.
Longer-term it would be awesome to have telemetry/visualization of the GRANDPA voting.


Frontend does not build

I tried following the README using node 10 and 9.6 for #29.
No problem with the backend but starting the frontend fails:

$ cd packages/frontend/
12:15 will@KI-2773 frontend (will-docker)*$ yarn build
yarn run v1.7.0
$ react-scripts-ts build
Creating an optimized production build...
Starting type checking and linting service...
Using 1 worker with 2048MB memory limit
ts-loader: Using [email protected] and /Users/will/.../dotstats/packages/frontend/tsconfig.prod.json
Failed to compile.

/Users/will/.../dotstats/node_modules/@types/react-dom/index.d.ts
(19,39): Cannot use namespace 'ReactInstance' as a type.


error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

Connections aren't properly pruned?

From live server logs:

[System] 6788 open telemetry connections; 4768 open feed connections

There is no way we have 6788 connected nodes and I doubt we have that many people looking at telemetry at the same time.

Not particularly critical as the server is still handling traffic well with resources being well in control (360mb RAM, CPU use peaks at 4%, mostly 0)

A punished validator node still shows as a validator node in telemetry

Scenario

  • Create a full node using docker.
  • Create an account.
  • Restart the full node as a validator node with the key.
  • Stake the account.
  • Wait till it becomes a validator.

Observation

  • After some time, check the validators section on the staking overview.
  • You may find the newly added validator is no longer present.
  • Telemetry will still show the node as a validator node.

Note

The newly added validator node is kicked out of the network after 15 or 30 minutes.
This happens most of the time.


"% CPU Use" column displays 100x higher value than "% CPU" shown in macOS Activity Monitor and Memory Use is about 1.4x higher

On macOS 10.13.6, with rustc --version rustc 1.29.0 (aa3ca1994 2018-09-11) and ./target/debug/polkadot --version polkadot 0.3.0-d12426b6-x86_64-macos

I started syncing my node with:

./target/debug/polkadot --validator \
  --chain krummelanke \
  --execution both \
  --name "AUSSIE STAKE! 🔥🔥🔥" \
  --port 30333 \
  --pruning 256 \
  --rpc-port 9933 \
  --telemetry-url ws://telemetry.polkadot.io:1024 \
  --ws-port 9944

Then when I went to https://telemetry.polkadot.io/# the "% CPU Use" column displayed ~19000, whereas Activity Monitor on macOS showed the "polkadot" process using only ~190, so it appears to be showing a value 100x larger than it should be. See screenshot below.

CPU Use (Polkadot Dotstats vs macOS Activity Monitor)
screen shot 2018-09-23 at 7 33 17 pm

Memory Use (Polkadot Dotstats vs macOS Activity Monitor)
screen shot 2018-09-23 at 7 41 18 pm

Memory Use is always shown about 1.4x higher on Polkadot Dotstats vs macOS Activity Monitor

Move all text filter logic to the Filter component

To further reduce the size of Chain component and make the interactions between components more isolated.

We could (should?) also put the Filter directly into List and Map instead of having it hanging in Chain.

Errors building dotstats common package

I've modified the common package to export another utility function, but I need to build the common package in order for other files to import from it.

When I try and build the common package by running yarn build:common, I get the following output:

$ yarn build:common
yarn run v1.7.0
$ tsc -p packages/common
error TS5055: Cannot write file '/Users/Me/code/blockchain/clones/paritytech/dotstats/packages/common/build/feed.d.ts' because it would overwrite input file.

error TS5055: Cannot write file '/Users/Me/code/blockchain/clones/paritytech/dotstats/packages/common/build/helpers.d.ts' because it would overwrite input file.

error TS5055: Cannot write file '/Users/Me/code/blockchain/clones/paritytech/dotstats/packages/common/build/id.d.ts' because it would overwrite input file.

error TS5055: Cannot write file '/Users/Me/code/blockchain/clones/paritytech/dotstats/packages/common/build/index.d.ts' because it would overwrite input file.

error TS5055: Cannot write file '/Users/Me/code/blockchain/clones/paritytech/dotstats/packages/common/build/types.d.ts' because it would overwrite input file.

error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

If I remove "declaration": true from packages/common/tsconfig.json no errors are produced but I still can't import the new exported functions. I also tried adding the following to the tsconfig.json (if this makes any sense) but it still produces the errors:

"exclude": [
  "build/**/*.d.ts"
]

Input validation

  • Limit node name to 64 UTF-16 code points or Unicode characters.
  • Make sure all values make sense (hashes are actually hashes, number values are correct numbers).
  • Have some heuristics to kick misbehaving nodes.
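The first two checks above could be sketched as follows (the helper names are illustrative, not an actual API; the hash format follows the block.import example shown elsewhere in this document):

```typescript
// Illustrative validation helpers; not the actual telemetry backend code.

// Limit a node name to 64 Unicode code points (string spread iterates
// by code point, so surrogate pairs count as one character).
function truncateNodeName(name: string, limit = 64): string {
  return [...name].slice(0, limit).join("");
}

// A block hash should be a 32-byte hex string, with or without 0x prefix.
function isValidHash(hash: string): boolean {
  return /^(0x)?[0-9a-fA-F]{64}$/.test(hash);
}

// A block height should be a non-negative integer.
function isValidHeight(height: unknown): boolean {
  return typeof height === "number" && Number.isInteger(height) && height >= 0;
}
```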

Better `timeDiff`

The server sends a timestamp to the client every 10 seconds; the difference between local time and server time is then used to provide better values for block time events (particularly the "Ns ago" ones).

Instead of re-calculating the value based on most recent server time, given enough data has been sent to the client, the client should store up to 10 most recent timeDiffs, filter extremes (e.g.: ignore 2 highest and 2 lowest values) and then use the average of the remaining values.
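A minimal TypeScript sketch of the proposed scheme (names are hypothetical, not the actual frontend code): keep up to the 10 most recent timeDiffs, drop the 2 highest and 2 lowest, and average the rest.

```typescript
// Hypothetical sketch of the proposed timeDiff smoothing.
const MAX_SAMPLES = 10;
const TRIM = 2; // number of extremes to drop from each end

const samples: number[] = [];

// Called whenever the server sends its timestamp (every ~10s).
function recordTimeDiff(serverTime: number, localTime: number): void {
  samples.push(localTime - serverTime);
  if (samples.length > MAX_SAMPLES) {
    samples.shift(); // keep only the most recent MAX_SAMPLES
  }
}

// Average after filtering extremes; falls back to a plain average
// until enough samples have accumulated.
function currentTimeDiff(): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const trimmed =
    sorted.length > 2 * TRIM ? sorted.slice(TRIM, sorted.length - TRIM) : sorted;
  return trimmed.reduce((sum, d) => sum + d, 0) / trimmed.length;
}
```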

Average block time is always null and so is not appearing in UI

If I run a node and it's generating blocks, it sends message.payload to the front-end in https://github.com/polkadot-js/dotstats/blob/master/packages/frontend/src/Connection.ts#L102, but in BestBlock the message.payload is [820468, 1532502970369, null], where the last element in the array is the averageBlockTime sent from the backend in https://github.com/polkadot-js/dotstats/blob/master/packages/backend/src/Chain.ts#L59, whose default value is null. Should it be changing?
