cowprotocol / watch-tower Goto Github PK

View Code? Open in Web Editor NEW

4.0 4.0 1.0 1.25 MB

Conducting the music of Composable CoWs 🎶🐮

License: GNU General Public License v3.0

TypeScript 99.32% Shell 0.14% Dockerfile 0.15% JavaScript 0.39%

cow-protocol cowswap erc1271 erc20 ethersjs

watch-tower's Introduction

Watch-Tower for Programmatic Orders 🐄🤖

Overview

The programmatic order framework requires a watch-tower to monitor the blockchain for new orders, and to post them to the CoW Protocol OrderBook API. The watch-tower is a standalone application that can be run locally as a script for development, or deployed as a docker container to a server, or dappnode.

Deployment

If running your own watch-tower instance, you will need the following:

An RPC node connected to the Ethereum mainnet, Arbitrum One, Gnosis Chain, or Sepolia.
Internet access to the CoW Protocol OrderBook API.

CAUTION: Conditional order types may consume considerable RPC calls.

NOTE: deployment-block refers to the block number at which the ComposableCoW contract was deployed to the respective chain. This is used to optimise the watch-tower by only fetching events from the blockchain after this block number. Refer to Deployed Contracts for the respective chains.

NOTE: The pageSize option is used to specify the number of blocks to fetch from the blockchain when querying historical events (eth_getLogs). The default is 5000, which is the maximum number of blocks that can be fetched in a single request from Infura. If you are running the watch-tower against your own RPC, you may want to set this to 0 to fetch all blocks in one request, as opposed to paging requests.

Docker

The preferred method of deployment is using docker. The watch-tower is available as a docker image on GitHub. The tags available are:

latest - the latest version of the watch-tower.
vX.Y.Z - the version of the watch-tower.
main - the latest version of the watch-tower on the main branch.
pr-<PR_NUMBER> - the latest version of the watch-tower on the PR.

As an example, to run the latest version of the watch-tower via docker:

docker run --rm -it \
  -v "$(pwd)/config.json.example:/config.json" \
  ghcr.io/cowprotocol/watch-tower:latest \
  run \
  --config-path /config.json

NOTE: See the example config.json.example for an example configuration file.

DAppNode

For DAppNode, the watch-tower is available as a package. This package is held in a separate repository.

Running locally

Requirements

node (>= v16.18.0)
yarn

CLI

# Install dependencies
yarn
# Run watch-tower
yarn cli run --config-path ./config.json

Architecture

Events

The watch-tower monitors the following events:

ConditionalOrderCreated - emitted when a single new conditional order is created.
MerkleRootSet - emitted when a new merkle root (ie. n conditional orders) is set for a safe.

When a new event is discovered, the watch-tower will:

Fetch the conditional order(s) from the blockchain.
Post the discrete order(s) to the CoW Protocol OrderBook API.

Storage (registry)

The watch-tower stores the following state:

All owners (ie. safes) that have created at least one conditional order.
All conditional orders by safe that have not expired or been cancelled.

As orders expire, or are cancelled, they are removed from the registry to conserve storage space.

Database

The chosen architecture for the storage is a NoSQL (key-value) store. The watch-tower uses the following:

level
Default location: $PWD/database

LevelDB is chosen it it provides ACID guarantees, and is a simple key-value store. The watch-tower uses the level package to provide a simple interface to the database. All writes are batched, and if a write fails, the watch-tower will throw an error and exit. On restarting, the watch-tower will attempt to re-process from the last block that was successfully indexed, resulting in the database becoming eventually consistent with the blockchain.

Schema

The following keys are used:

LAST_PROCESSED_BLOCK - the last block (number, timestamp, and hash) that was processed by the watch-tower.
CONDITIONAL_ORDER_REGISTRY - the registry of conditional orders by safe.
CONDITIONAL_ORDER_REGISTRY_VERSION - the version of the registry. This is used to migrate the registry when the schema changes.
LAST_NOTIFIED_ERROR - the last time an error was notified via Slack. This is used to prevent spamming the slack channel.

Logging

To control logging level, you can set the LOG_LEVEL environment variable with one of the following values: TRACE, DEBUG, INFO, WARN, ERROR:

LOG_LEVEL=WARN

Additionally, you can enable module specific logging by specifying the log level for the module name:

# Enable logging for an specific module (chainContext in this case)
LOG_LEVEL=chainContext=INFO

# Of-course, you can provide the root log level, and the override at the same time
#   - All loggers will have WARN level
#   - Except the "chainContext" which will have INFO level
LOG_LEVEL=WARN,chainContext=INFO

You can specify more than one overrides

LOG_LEVEL=chainContext=INFO,_placeOrder=TRACE

The module definition is actually a regex pattern, so you can make more complex definitions:

# Match a logger using a pattern
#  Matches: chainContext:processBlock:100:30212964
#  Matches: chainContext:processBlock:1:30212964
#  Matches: chainContext:processBlock:5:30212964
LOG_LEVEL=chainContext:processBlock:(\d{1,3}):(\d*)$=DEBUG

# Another example
#  Matches: chainContext:processBlock:100:30212964
#  Matches: chainContext:processBlock:1:30212964
#  But not: chainContext:processBlock:5:30212964
LOG_LEVEL=chainContext:processBlock:(100|1):(\d*)$=DEBUG

Combine all of the above to control the log level of any modules:

 LOG_LEVEL="WARN,commands=DEBUG,^checkForAndPlaceOrder=WARN,^chainContext=INFO,_checkForAndPlaceOrder:1:=INFO" yarn cli

API Server

Commands that run the watch-tower in a watching mode, will also start an API server. By default the API server will start on port 8080. You can change the port using the --api-port <apiPort> CLI option.

The server exposes automatically:

An API, with:
- Version info: http://localhost:8080/api/version
- Dump Database: http://localhost:8080/api/dump/:chainId e.g. http://localhost:8080/api/dump/1
Prometheus Metrics: http://localhost:8080/metrics

You can prevent the API server from starting by setting the --disable-api flag for the run command.

The /api/version endpoint, exposes the information in the package.json. This can be helpful to identify the version of the watch-tower. Additionally, for environments using docker, the environment variable DOCKER_IMAGE_TAG can be used to specify the Docker image tag used.

Developers

Requirements

node (>= v16.18.0)
yarn
npm

Local development

It is recommended to test against the Goerli testnet. To run the watch-tower:

# Install dependencies
yarn
# Run watch-tower
yarn cli run --config-path ./config.json

Testing

To run the tests:

yarn test

Linting / Formatting

# To lint the code
yarn lint

# To fix linting errors
yarn lint:fix

# To format the code
yarn fmt

Building the docker image

To build the docker image:

docker build -t watch-tower .

watch-tower's People

Contributors

Stargazers

Watchers

Forkers

ahhda

watch-tower's Issues

chore: standardise variable nomenclature

Background

I'm frustrated by the creep that has been introduced over chain, chainId, chainContext, context etc. Let's standardise here 😀

Details

Refactor the variables so that they have a clear meaning:

chainId MUST be SupportedChainId | number | string
chainContext MUST contain chainId, and rpc

Acceptance criteria

Clearly defined variables

[Epic] WatchTower run with ZERO errors

WatchTower has many errors that are thrown to our alert/log system.

These errors are most of the time false positives of cases we want to handle better.

Handle special cases

This epic will point to some initiatives that would handle these cases.

Improve Logging / Monitoring

Also, as part of this effort on improving logging end monitoring:

Allow to debug locally with Production State: cowprotocol/composable-cow#54
Create a DEV environment to be able to re-iterate the watch-tower: cowprotocol/composable-cow#45
Add sentry, to overcome shortcomings of Tenderly logs and improve our observability/debug experience: cowprotocol/composable-cow#43
Allow to re-process old blocks, to re-create a failure indexing an order locally cowprotocol/composable-cow#41
Do not fail in intermediate errors processing events, but fail if there was at least one error so Tenderly can show the FAILED STATUS: cowprotocol/composable-cow#36
Handle errors writing in Tenderly Storage: cowprotocol/composable-cow#53

Other MISC

Add CLA: see https://github.com/cowprotocol/cla

chore: Reduce log verbosity

Background

For now logs are still very verbose. On its way to production we need to make sure relevant logs are visible, but we hide the irrelevants.

Now that we can increase logs per component on demand, we can also be less verbose and have a less noisy log level: WARN (and increase on very critical modules)

chore: top of waterfall 🌊

Background

Many PR comments for the new dockerised version are better addressed at the end of the PR waterfall. This issue tracks those.

Details

All PR action items best suited for top of water fall are integrated.

Acceptance criteria

chore: Create SHADOW watch Tower

Background

Watch Towers might fail. We should have a shadow watch tower, ideally using different RPC endpoints, that will ensure that if the main watch towers are not posting the orders in time, they post it themselves and also they write in the logs.

Details

They could watch orders pending to be posted, but they will defer the posting of the order 2min

feat: single-chain mode

Problem

The current default mode for running the watch-tower is only multi-chain. This is a requirement for self-hosted (eg. dappnode), however there's minimal blast radius protection (one chain may affect another's watch-tower).

Alternatives considered

Build blast radius protection into the current run mode of the watch-tower. This is likely an over-engineered solution that can be mitigated better by strict division between chain micro-services.

Acceptance criteria

run command moved to run-multi.
run command implements a single chain's watch-tower.
Command line configuration options for single-run are backwards compatible.

bug: breaking change in model

Problem:
A refactor caused a change to the model's object keys, with composableCow being renamed to address.

Solution:
Rewind the change to the model to maintain consistency. Address with in-line documentation that composableCow should be viewed as "ComposableCoW compatible".

References:
Originally posted by @anxolin in #10 (comment)

feat: don't index orders from incompatible contracts

These events cause noise in the alert system / logs, so they should be filtered out.

Here is a suggestions on how to do it from @mfw78:
https://github.com/cowprotocol/composable-cow/pull/46/files#r1298547398

bug: new block timeout

Problem

There may be times where the RPC that the watch tower is connected to fails to advance the block chain. This is likely an RPC error, and should be reported via logging. Causes may include RPC rate limiting, connectivity issues, downtime.

Impact

The impact of this is high. This would not result in loss of funds, but would result in service degradation.

To reproduce

Run your own RPC.
Run the watch tower.
Disconnect your internet from your RPC 😅

Expected behaviour

Logs should become noisy indicating that there is an underlying RPC issue.

feat: state rebuilder

Problem

It's frustrating when there are errors within Tenderly as we are unable to verify if the data held in state is accurate. So far there have been no complaints, likely due to running a prod and dev instance, with the dev having the optimisations built in, and therefore it hasn't been moved to parallel / non-sequential execution by Tenderly admins.

Alternatives considered

N/A

Additional context

It may also be handy to audit all the previous orders to ensure that the parts were at least placed.

Acceptance criteria

Script written for rebuilding registry state.

chore: Review Prometheus metrics and add additional ones

Background

Review metrics, and reiterate the initial proposal.

perf: remove owners with no orders

Background

When observing the performance of the tenderly watch tower in production, as time went on, it was observed that the storage has become less and less reliable (as noted by logs in loggly showing saving of registry fails occasionally). This has often been worked around by running multiple watch towers (sub-optimal solution).

When observing the production state for mainnet, the registry is 120kb in size. There are two reasons for this bloat:

Removal of conditional orders from the watch tower was not straight-forward without the now-implemented TWAP-specific logic.
A decision was not yet taken on whether or not to remove owners from the registry with no orders (can retain the fact that they are likely composable-cow enabled.

Details

Storage space should be limited to an absolute bare minimum given the seeming high cost of this storage API. Therefore, it's desired to:

Correctly prune defunct conditional orders (TWAP-specific logic): 120kb --> 17kb. (Implemented)
Prune owners with no conditional orders: 17kb -> 12kb. (This issue).

Acceptance criteria

Owners with no conditional orders are pruned from the registry.

feat: implement own event watcher / block watcher

feat: dump registry of chain

Background

I'm frustrated by not being able to retrieve the state of a specific chain's conditional order registry from the leveldb database in order to compare with what's running in the current Tenderly Web3 Actions watchtower.

Details

Implement a dump-db subcommand that will query the registry from leveldb, and return it in JSON prettified for comparison.

Acceptance criteria

Able to retrieve 1:1 comparison of the registry.

chore: unify logging

feat: posted order lru cache

Problem

As there is no method available for the conditional order to tell the watch-tower what action to take when the order is already in the order book, this may result in continually polling each block / spamming the API endpoint.

Alternatives considered

A full implementation of cowprotocol/composable-cow#74, however there is likely some other underlying work that would be done around the polling to unify/refactor the interfaces. This would be implemented at the ComposableCoW, and not the smart order as such (to maintain the single entry-point multiplexing), and therefore is for a later revision.

Acceptance criteria

Implemented an LRU cache to reduce pressure on the order book.

feat: Add Specific validation from smart orders by delegating to BaseConditionalOrder instances

Add Specific validation from smart orders by delegating to BaseConditionalOrder instances.

Make a new method in BaseConditionalOrder that would allow to delegate the validation to BaseConditionalOrder instances.

The validations should tell the watch-tower as a result one of the following:

Validations is OK
Error validating: for unhandled errors? (alternative, we could throw)
It's not yet time to place an order
This order will never generate a discrete order (i.e. expired TWAP)

chore: dont generate contract types any more, use SDK

We depend on the SDK now, so we can use the SDK types, including:

ComposableCoW,
ComposableCoWInterface,
IConditionalOrder,
etc.

This task is for deleting the process of generating the types, and using the SDK imports instead

bug: logging tools not initialised

Problem

In the refactor / migration from Tenderly to standalone, the logging tools via initContext are no longer instantiated.

Impact

The impact of this bug is high as it reduces the ability to observe the watch-tower's performance as it runs.

To reproduce

Look at the code to see that initContext isn't called in the code path.

Expected behaviour

Logging works as in the Web3 Action.

chore: Store the context

When we index a new order, there might be already some context information that we want to include.

Instead of fetching the context over and over for following checks, we can just store it when we add an order.

For TWAP it means, we store the t0

Move error handling to SDK

@mfw78 did some nice error handlings in #15

Now the polling happen in the SDK. See:

#16
and cowprotocol/cow-sdk#149

We might want to move some of the legacy logic to the SDK.

Here is a pointer in the code. The legacy is executed if the Factory can't create an order. This happens if the handler is unknown:
https://github.com/cowprotocol/tenderly-watch-tower/pull/16/files#diff-8e2dc90f4a6ebdf01f8603707f37f7794972fdf1827102bfce89497804f72837R157

We will need to decide if we want to index unknown handlers. If so, we can have in the SDK also some generic logic for the polling.

refactor: naming conventions

Background

Some variable names are becoming confusing, causing issues with readability / reasoning of the code base.

Details

Variable usage to be checked for consistency / reasoning

Acceptance criteria

PollResults saved to lastHint, NOT lastResult.
Refactor naming around WAITING_TIME_SECONDS_FOR_NOT_BALANCE to something more idiomatic relating to reducing back pressure on the API.

refactor: remove tenderly

chore: Make Tenderly config DRYer

This issue is to make sure I finish this PR (i merged before it was completed):
cowprotocol/composable-cow#45

The PR aimed to provide an easy way to deploy to different deployment instances. This became important because now we are doing small reiterations in Tenderly, but its important that we don't compromise the stability of the indexing.

For this reason, we should keep production stable, and the PRs could deploy in development.

I've made a full deployment in a secondary Tenderly account and created a new Sentry project for DEV.

Why am I merging before the time

Cause I managed to have the thing working, but I duplicated the file which is not good. So I want to be able to reiterate already and do deployments in DEVELOP.

I have some ideas on how to make it DRYer (do not repeat ourselves), like using a templating system, but it feels overengineered and I would prefer to wait for Tenderly support to suggest their preferred solution before I do something suboptimal.

feat: observability

Problem

As the standalone watch-tower moves into production, observability is critical in ensuring the reliability of the service.

Alternatives considered

Use of loggly / sentry. These have been trialed, and while loggly is suitable for some log analysis, this can easily be achieved by ELK. Loggly as well doesn't provide much in terms of metrics monitoring - this is the bread and butter of Prometheus / Grafana.

Acceptance criteria

/health endpoint that returns not ready if the chain is still in warm-up.
/metrics endpoint with targeted metrics distributed throughout.

bug: low level calls cause log spamming if not adhering to the interface

Problem

When the watch tower polls a custom order type, the selector / revert error returned has no guarantee of being compliant with the interface. It's currently being observed that old TWAP orders from the pre-production version of the contracts are returning OrderNotValid() selectors, as opposed to OrderNotValid(string) which is causing these to not be recognised by the low-level call handlers.

Impact

Causes needless log spamming, and doesn't promote strict interface adherence.

To reproduce

Observe the goerli watch tower in production.

Expected behaviour

If an order doesn't adhere to the interface, it should not be monitored. Any order that doesn't return a known custom error for the revert should be dropped immediately.

Screenshots/logs

2023-10-05T01:31:28.999Z ERROR checkForAndPlaceOrder:_pollLegacy:5:2.8@9810859: Error on CALL to getTradeableOrderWithSignature. Simulate: https://dashboard.tenderly.co/gp-v2/watch-tower-prod/simulator/new?network=5&contractAddress=0xfdaFc9d1902f4e0b84f65F49f244b32b31013b74&rawFunctionInput=0x26e0a1960000000000000000000000008654d1136f2a760ba3e1c9e131cb9ad217921b52000000000000000000000000000000000000000000000000000000000000008000000000000000000000000000000000000000000000000000000000000002400000000000000000000000000000000000000000000000000000000000000260000000000000000000000000910d00a310f7dc5b29fe73458f47f519be547d3d000000000000000000000000000000000000000000000000000000189e3ae4af00000000000000000000000000000000000000000000000000000000000000600000000000000000000000000000000000000000000000000000000000000140000000000000000000000000b4fbf271143f4fbf7b91a5ded31805e42b2208d600000000000000000000000091056d4a53e1faa1a84306d4deaec71085394bc80000000000000000000000008654d1136f2a760ba3e1c9e131cb9ad217921b5200000000000000000000000000000000000000000000000000b1a2bc2ec500000000000000000000000000000000000000000000000000062a37aa81e9beb9780000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000200000000000000000000000000000000000000000000000000000000000007080000000000000000000000000000000000000000000000000000000000000000506960793899dbd9225c61c44bd44927462151c82dec95315cf937f6b95ef21f00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
2023-10-05T01:31:28.999Z ERROR checkForAndPlaceOrder:_handleGetTradableOrderCall:5:2.8@9810859: checkForAndPlaceOrder:_handleGetTradableOrderCall:5:2.8@9810859 Unexpected error LowLevelError: low-level call failed
    at _pollLegacy (/usr/src/app/dist/src/domain/checkForAndPlaceOrder.js:340:19)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async _processConditionalOrder (/usr/src/app/dist/src/domain/checkForAndPlaceOrder.js:141:26)
    at async checkForAndPlaceOrder (/usr/src/app/dist/src/domain/checkForAndPlaceOrder.js:70:32)
    at async processBlock (/usr/src/app/dist/src/domain/chainContext.js:213:20)
    at async JsonRpcProvider.<anonymous> (/usr/src/app/dist/src/domain/chainContext.js:155:21) {
  data: '0xf3ec7a9f'
}

In this case, 0xf3ec7a9f corresponds to the previous OrderNotValid() selector prior to the insertion of the string for there reason (implemented just before push to production).

Tenderly watch-tower version/commit hash

Version: v1.0.1-rc.0

chore: Deploy to production

Problem

Define a release process, and deploy a stable version in production.

chore: integrate other PRs

feat: merkle tree support

Problem

Currently all smart orders that are processed by the watch tower are only single orders. There is presently no testing of merkle proofs.

Alternatives considered

N/A

Additional context

There may be cross-library dependencies on cow-sdk when handling multiple orders.

Acceptance criteria

Watch tower successfully handles create / remove / update of merkle roots.

chore: nomenclature

Problem:

Be more direct about what we're compatible with.

Solution:

Rename isCompatible to isComposableCowCompatible.
Insert a const for const composableCowBytecode = composableCow.deployedBytecode.object.

References:

Originally posted by @anxolin in #10 (comment)

bug: FluentD indexes wrong the logs

Problem

Currently is mixing logs from other pods in the cluster because of the parsing of multiline logs

Solution

Start all logs with a timestamp

perf: heuristically reject bad contracts emitting events

Problem

The watch tower listens for events, that are not filtered by a specific contract address. Options:

Filter by specific address: This makes the system more brittle, as contracts are redeployed with new features, the watch tower would be dependent on specific changes for the new deployment address.
Listen to events, blockchain-wide: This is the current method deployed, but introduces additional overhead by asserting that the contracts implement the required interface.

Solution

At the core of the watch tower is the regular checking for orders to be placed via the getTradeableOrderWithSignature function. If we were to check only for this function selector (bytes4), the number of false check passes may still be high. We can add an additional check for the cabinet(address,bytes32) function to reduce the number of false positives and create a more robust heuristic.

feat: Allow specific order implementations to signal the next checkTime

WatchTower default behaviour will check, for every single block, if its time to place an order.

This, doesn't scale well, because each check will require some RPC calls, and validations, and there's situations where this can be avoided.

For example, in TWAP it is deterministic what is the next moment in time where we will be able to place the next discreate order.

This PR is about signaling the watch tower what is the next checkTime, and waiting patiently until the time is right.

bug: becoming rate limited on orderbook api

Problem

When using axios with no backoff / rate-limiter, in conjunction with a local ethereum node, the warm-up causes a large spike in API requests to the backend, resulting in rate limiting.

Impact

This has a medium to high impact as it is likely to result in to-be-initialised watch-tower failing to warmup and sync to the current blockchain's state.

To reproduce

Run locally to an ethereum / gnosis chain node.
Attempt to sync from the deployment block.
Observe rate limiting.

Expected behaviour

Doesn't become rate limited.

Tenderly watch-tower version/commit hash

Commit hash b5ae796.

Additional context

N/A

feat: make it standalone

Background

It's very frustrating dealing with the current setup of the actions as there are moving parts that are abstracted and out of our control, creating race conditions affecting the reliability of the watch tower.

Details

Remove Web3 Actions / Tenderly from the code-base and make it so that the watch tower can be run as a completely separate instance with no external dependencies.

Acceptance criteria

test: unit coverage for poll legacy

Background

It's frustrating working on the polling legacy logic as it is detailed and lacks unit testing, inhibiting robustness / trustworthiness.

Details

Implement a test harness for unit testing the legacy polling logic. This should be accomplished using jest test harnesses with mock for the getTradeableOrderWithSignature, and unit testing for customErrorDecode.

Acceptance criteria

Test coverage of customErrorDecode
Happy path test coverage of getTradeableOrderWithSignature

bug: watch dog timing out on goerli

Problem

Goerli WatchTower deployment is crashing

2023-10-04T07:31:09.006Z ERROR chainContext:runBlockWatcher:5: Watchdog timeout

2023-10-04T07:30:40.056Z ERROR checkForAndPlaceOrder:_pollLegacy:5:2.9@9806576: Error on CALL to getTradeableOrderWithSignature. Simulate: https://dashboard.tenderly.co/gp-v2/watch-tower-prod/simulator/new?network=5&contractAddress=0xfdaFc9d1902f4e0b84f65F49f244b32b31013b74&rawFunctionInput=0x26e0a1960000000000000000000000008654d1136f2a760ba3e1c9e131cb9ad217921b52000000000000000000000000000000000000000000000000000000000000008000000000000000000000000000000000000000000000000000000000000002400000000000000000000000000000000000000000000000000000000000000260000000000000000000000000910d00a310f7dc5b29fe73458f47f519be547d3d000000000000000000000000000000000000000000000000000000189e3b096000000000000000000000000000000000000000000000000000000000000000600000000000000000000000000000000000000000000000000000000000000140000000000000000000000000b4fbf271143f4fbf7b91a5ded31805e42b2208d600000000000000000000000091056d4a53e1faa1a84306d4deaec71085394bc80000000000000000000000008654d1136f2a760ba3e1c9e131cb9ad217921b5200000000000000000000000000000000000000000000000000b1a2bc2ec500000000000000000000000000000000000000000000000000063fa67b061775f0c90000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000200000000000000000000000000000000000000000000000000000000000007080000000000000000000000000000000000000000000000000000000000000000506960793899dbd9225c61c44bd44927462151c82dec95315cf937f6b95ef21f00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

2023-10-04T07:30:40.056Z ERROR checkForAndPlaceOrder:_handleGetTradableOrderCall:5:2.9@9806576: checkForAndPlaceOrder:_handleGetTradableOrderCall:5:2.9@9806576 Unexpected error LowLevelError: low-level call failed     at _pollLegacy (/usr/src/app/dist/src/domain/checkForAndPlaceOrder.js:340:19)     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)     at async _processConditionalOrder (/usr/src/app/dist/src/domain/checkForAndPlaceOrder.js:141:26)     at async checkForAndPlaceOrder (/usr/src/app/dist/src/domain/checkForAndPlaceOrder.js:70:32)     at async processBlock (/usr/src/app/dist/src/domain/chainContext.js:213:20)     at async JsonRpcProvider.<anonymous> (/usr/src/app/dist/src/domain/chainContext.js:155:21) {   data: '0xf3ec7a9f' }

Impact

Causing restarts as well as alerts for on call

To reproduce

Check for logs in goerli-watch-tower deployment

Expected behaviour

The service should not crash

Screenshots/logs

Search for "Watchdog timeout" in ES
Example, https://production-6de61f.kb.eu-central-1.aws.cloud.es.io/app/r/s/Uk7oi

Tenderly watch-tower version/commit hash

v1.0.1-rc.0

feat: dappnode package

Problem

For true self-sovereignty, and trustless environments, I should be able to easily run my own watch-tower.

Alternatives considered

Alternatives would be for users to run directly via CLI, or via docker (both reasonable options). Providing a dappnode package "neatens" this solution, and makes it more accessible.

Acceptance criteria

Revise run-multi configuration to take a mix of RPC and deployment block (ie. chain-config is variadic and remove deploymentBlock option.
address option to filter monitor by a user specified addresses.
Entrypoint script for automatic configuration as much as possible.
Dappnode package defined with CI/CD to push to Pinata.

chore: Setup alerts

Background

Research on best way to create alerts, define channels, and process to review issues.

Probably we will use Grafana/Prometheus for this. But validate with backend

chore: dockerise the service for deployment

chore: Delete dead code

When we sort out Tenderly Logs:

cowprotocol/composable-cow#52 (comment)

dep: nethermind returns custom errors in manner not expected by ethers

Problem

Gnosis Chain is often used as a testing ground for low value-at-risk trades in a "testing" environment. However, Gnosis Chain is heavily biased towards Nethermind as an execution client, and as such, inherits bugs there. Notably, NethermindEth/nethermind#6024 refers to Nethermind not correctly returning revert codes (as may be expected).

Solution

There are two options:

Investigate use of eth_estimateGas to see if this would return the correct revert value.
Derive some intermediate step for handling of Gnosis Chain until nodes are migrated to a version of Nethermind that fixes this fundamental issue.

CC: @cowprotocol/backend @cowprotocol/frontend

chore: Create Grafana Dashboard

Background

Make sure we can check the health, performance, and debug when there's an issue with the watch-tower

feat: use leveldb for persistence

chore: use commander for environment / cli configuration

feat: smart contract hinted polling

Problem

It was always frustrating with the initial orders from ComposableCoW not being able to relay important information to the watch-tower to allow it to optimise it's polling / performance.

Alternatives considered

Implement custom logic for each handler (order type). This is not a scalable and resilient solution though as it requires all conditional order authors to duplicate business logic between the smart contract and an SDK module. This then also means that they need to PR / push upstream this logic for the watch-tower to consume it.

Acceptance criteria

Conditional orders are able to provide hints to the watch-tower.

refactor: model

Currently, watch tower depends on the model to do its tasks.

Each version could introduce potentially breaking changes in the model, this of course would break the indexing.
For this reason we kept the original model stable.

However, it would be good to be able to evolve the model over time.

Tasks:

Version model #21
Do not use typechain generated structs

Version model

This issue suggests using some versioning for the model, and some simple migration logics.

Do not use IConditionalOrder.ConditionalOrderParamsStruct

IConditionalOrder.ConditionalOrderParamsStruct has a weird definition:

export type ConditionalOrderParamsStruct = {
    handler: PromiseOrValue<string>;
    salt: PromiseOrValue<BytesLike>;
    staticInput: PromiseOrValue<BytesLike>;
  };

Actually, this is not what ends up being persisted. See the hack: https://github.com/cowprotocol/tenderly-watch-tower/blob/88ab5365f303a0f316b70eaea98a1604767ba45a/actions/checkForAndPlaceOrder.ts#L201

const [handler, salt, staticInput] = conditionalOrder.params as any as [
      string,
      string,
      string
    ];

https://github.com/cowprotocol/tenderly-watch-tower/blob/88ab5365f303a0f316b70eaea98a1604767ba45a/actions/checkForAndPlaceOrder.ts#L199

bug: memory leak on gnosis chain

Problem

During staging, it has been observed that there has been a continual steady memory leak. This is resulting in the pod being OOMKilled.

Impact

The impact is relatively contained as the pod is able to gracefully recover, but this may lead to momentary downtime. The risk observed in practice for this bug is low to medium (recovery is generally within a couple of blocks).

To reproduce

Deploy to infrastructure (such as Kubernetes).
Observe pod memory consumption growth.

Expected behaviour

The service should be able to start up and run with a steady resource state. There will be some memory growth due to extensive metrics logging, but this shouldn't result in frequent OOMKill (expected OOMKill <= once a month).

Screenshots/logs

Watch-tower version/commit hash

Commit: 8240a27

Additional context

Research on this topic has suggested the problems may lie in:

prom-client having a memory leak.
The express-prometheus-middleware leaking memory.
Excess journal build-up in the registry (LevelDB) between writes.

chore: Address comments in consolidated PRs

Background

We are deploying watch tower to PROD, and we need some of the changes in the open PRs. This is why I will consolidate all those open PRs and close version.

Some PRs have follow ups, so this issue is just a reminder to handle the comments:

Merged this PR but might have some comments:

chore: migrate to yarn (coherence with other projects)

right now it has both yarn and npm

cowprotocol / watch-tower Goto Github PK

watch-tower's Introduction

Watch-Tower for Programmatic Orders 🐄🤖

Overview

Deployment

Docker

DAppNode

Running locally

Requirements

CLI

Architecture

Events

Storage (registry)

Database

Schema

Logging

API Server

Developers

Requirements

Local development

Testing

Linting / Formatting

Building the docker image

watch-tower's People

Contributors

Stargazers

Watchers

Forkers

watch-tower's Issues

Background

Details

Acceptance criteria

Handle special cases

Improve Logging / Monitoring

Other MISC

Background

Background

Details

Acceptance criteria

Background

Details

Problem

Suggested solution

Alternatives considered

Acceptance criteria

Problem

Impact

To reproduce

Expected behaviour

Problem

Suggested solution

Alternatives considered

Additional context

Acceptance criteria

Background

Background

Details

Acceptance criteria

Background

Details

Acceptance criteria

Problem

Suggested solution

Alternatives considered

Acceptance criteria

Problem

Impact

To reproduce

Expected behaviour

Background

Details

Acceptance criteria

Why am I merging before the time

Problem

Suggested solution

Alternatives considered

Acceptance criteria

Problem

Impact

To reproduce