
mustekala's Introduction

mustekala's People

Contributors

dryajov, gitter-badger, kumavis


mustekala's Issues

[WIP] Mustekala roadmap

Mustekala Roadmap

Requirements

Ethereum storage requirements

  • Fetch account and token balances
    • Identify the slice associated with the token address
  • Get the slice
    • Figure out whether the slice is complete
      • We'll know whether this is a final slice or not, and download the remainder when we run the contract(s)
  • Run the contract
    • Intercept the storage calls and download the slices that contain that storage (this is done already)

Kitsunet requirements

  • Nodes need to figure out which slices they are serving
    • We can use some sort of consistent hashing to map slices onto peers
    • The node also keeps the slices that it needs, for example for the accounts it's interested in
    • As a possible optimization, we might also want a notion of topologically close nodes that collaborate with each other
  • Block source
    • MVP: Infura via HTTP
  • Slice source
    • Ideas:
        1. Slice-producing geth forks join kitsunet
           • Cons: more complexity, frequent deploys, must ensure browser + node coverage
        2. Slice-producing geth fork exposes HTTP/WS, accessed by "activated" nodes
           • Pros: simple, lets us focus on gossip/topology

MVP

As a first step, we want to subscribe to a few slices and use them to get account balances and token balances.

  • multiple slice feeds
  • block header feed
  • passive nodes track header and 1 slice
  • active nodes change their slice subscription

Documentation of this Project

Overview

Mustekala is the name of the Ethereum Light Client project of Metamask.

Motivation

State of the art

  • As of 2018.06.06, MetaMask relies on RPC communications with the user's local node, the INFURA service, or a custom remote node.
  • The functionality obtained via RPC can be grouped as:
    • Retrieval of Ethereum data (state, storage and indexes)
      • Example: last block
      • Example: account balance query
      • Example: logs
    • Execution of code in the EVM
      • Example: token operations
    • Broadcasting of transactions to Ethereum's devp2p network
      • To be mined into a block

Current Challenges

  • As of 2018.06.06, a full archive node requires about 1 TB of blockchain storage. Fast synchronization mitigates this by keeping only the current state plus some recent states, bringing the figure down to roughly 10 to 20 GB; both cases remain very storage-intensive.
  • Moreover, the process of synchronizing can be very demanding on IOPS: instead of pulling each block header from devp2p, checking which elements of the state changed, and pulling that delta, nodes execute every transaction over the state.
  • Light clients only synchronize the canonical chain block headers and pull the exact elements of the state they need, plus their merkle proofs. However, the light client protocol is not part of the protocol run by the majority of nodes: a node must be explicitly configured to serve it and to admit a certain number of light clients consuming its data.
  • Both fast-synchronized nodes and light clients ignore or prune past state data, which is inconvenient for Dapp developers who rely on indexes such as logs in the blockchain.
  • The discovery process of devp2p does not guarantee "good nodes", i.e. nodes synchronized to the current block and willing to connect and share their information with the requester.
  • Finally, as managing and maintaining synchronized Ethereum nodes becomes a burden for users, they grow dependent on external services to provide that facility. Economies of scale are needed to run such infrastructure, which in turn converges to some form or another of centralization.

TLDR

The current devp2p network is broken:

  • It is hard to discover useful peers
  • It is hard to get just the data you want
  • Do you want to run a quick script to "just hook up to any node in the network and check your favorite token balance"? YOU CANNOT, sorry

The proposal of Mustekala

  • Make your browser a peer in a million-peer p2p network
    • We name our fox network kitsunet
  • Easily discover other peers to consume and share ethereum data
  • Easily set up Hubs in your laptop or server to boost the data availability
    • These DO NOT need to synchronize the whole state (GBs), but only the elements you need
  • By using libp2p, you can easily write your program to consume the data you want
  • By implementing IPLD, every tiny bit of ethereum information can be accessed instantly

Advantages of this approach

  • Data availability at face value
  • Easy consumption of information
  • More devices able to interact with the blockchain
  • Expand the horizon for Dapps

Even More!

  • Migration from devp2p to libp2p as the de-facto network for ethereum
  • Cattle vs Pets approach for providers of data
  • Insanely fast synchronizations

Architecture

The project comprises the following four layers:

  • Layer 1: devp2p data sources
  • Layer 2: IPFS Bridges and Hubs
  • Layer 3: Kitsunet Peers
  • Layer 4: Content Routing System

A more thorough discussion can be found in this document

MVP

The concept is straightforward: Get the data from an Ethereum node, dump it into IPFS, consume it in a browser peer.

More of this point in this document

Beyond MVP

A plethora of scaling challenges and use cases emerge from the MVP, including (but not limited to)

  • Extend the mainstream ethereum clients to kitsunet (share and consume data)
  • Multi Blockchain support
  • IoT devices

The comprehensive reference is in this file

Sprints

List of Sprints: Dates and Goals

Other

Sprints

Sprints

2018.08.20 - 2018.08.30

PoC

  • Kolmogorov PoC
    • Issue Link: #8

Development

  • @hermanjunge

    • bridges
    • metrics framework
    • optimization plan
    • live documentation of the process
      • with the goal of finding adepts on the way
  • @dryajov

    • work out issues with the circuit relay network
    • cleanup/refactor metamask visualizer
    • work on the kitsunet client

2018.08.06 - 2018.08.16

PoC

  • Kolmogorov PoC
    • Issue Link: #8

Development


2018.07.23 - 2018.08.02

PoC

  • Kolmogorov PoC
    • Issue Link: #8

Development


2018.07.02 - 2018.07.19

PoC

  • Kolmogorov PoC
    • Issue Link: #8

Development


2018.06.18 - 2018.06.22

Short sprint. Team retreat next week.

PoC

  • Kolmogorov PoC
    • Issue Link: #8

Development

  • @hermanjunge

    • Bentobox: consume from geth and parity to IPFS client
      • PR Link: #7
        • block headers
        • uncles
        • state trie nodes
        • storage trie nodes
        • transactions (as trie nodes)
        • transaction receipts (as trie nodes)
  • @dryajov

    • start working on the Consume and Visualize Data section of #8
    • get webrtc working more reliably

2018.06.04 - 2018.06.14

Team

PoC

  • Kolmogorov PoC features write-up
    • Issue Link: #8

Documentation

  • Write documentation

Development

  • Microservice: consume from parity to IPFS client

    • @hermanjunge
    • PR Link: #7
      • block headers
      • uncles
      • state trie nodes
      • storage trie nodes
      • transactions (as trie nodes)
      • transaction receipts (as trie nodes)
  • Improve kitsunet peer discovery over rendezvous

[Experiment] - 1st data exchange protocol in kitsunet

  1. A kitsunet node (KN) has interest in some slices (can be random per id, can be balances it needs, or storage it will use)
  2. A KN finds neighbours with similar slices. If there are none, it reports "can't find similar neighbours" / "no shared interest found"
  3. A KN subscribes to something that notifies it of block header updates
  4. A KN asks its similar-interest neighbours (SIN) for the delta between the slice at root X and the slice at root Y
  5. Upon receiving the delta, a KN prepares a want list: it communicates to peers that it needs the trie nodes, so it can get them in a distributed way
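Steps 4 and 5 could be sketched as below. `buildWantList` and `assignRequests` are hypothetical helpers, and the assumption that a slice can be represented as a list of trie-node hashes is mine, not the protocol's:

```javascript
// Sketch of the want-list step of the experiment above.

function buildWantList(currentNodes, deltaNodes) {
  // The want list is every trie node in the delta we don't already hold.
  const have = new Set(currentNodes);
  return deltaNodes.filter(hash => !have.has(hash));
}

function assignRequests(wantList, neighbours) {
  // Round-robin the want list over similar-interest neighbours (SIN)
  // so the trie nodes are fetched in a distributed way.
  const assignments = new Map(neighbours.map(n => [n, []]));
  wantList.forEach((hash, i) => {
    assignments.get(neighbours[i % neighbours.length]).push(hash);
  });
  return assignments;
}
```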

Mesh Testing, Feat/stability

Linked to MetaMask/mesh-testing#53


DO NOT MERGE YET - experimental branch to work out stability issues in libp2p.

For the last few days I've been troubleshooting some stability issues with libp2p; below I'll describe what those are and what the possible fixes for them might be:

libp2p connection and stability issues

The current issues with connection management in libp2p prevent it from connecting to more than ~10 simultaneous peers. I've observed that after we reach that threshold, things become very unstable: we're no longer able to send messages over pubsub reliably, nor keep those connections open for any significant period of time (connection drops). This is due to several issues:

  • Physical (non-muxed) connections don't get reused correctly; they instead get thrown away on each dial, which under certain situations can lead to connections piling up and eventually backing up the connection queue (connection backlog). The particular line where this might be happening is this - https://github.com/libp2p/js-libp2p-switch/blob/master/src/dial.js#L184.

  • Too many concurrent dials and a low connection timeout. I'm not entirely sure whether this is a real issue just yet, but in some cases I've seen improvements from lowering the number of concurrent dials done by libp2p-switch (https://github.com/libp2p/js-libp2p-switch/blob/master/src/transport.js#L11) to ~2 and increasing the timeout to ~2 minutes (https://github.com/libp2p/js-libp2p-switch/blob/master/src/transport.js#L15). This is not conclusive and might be affected by other factors, but it's definitely something to keep an eye on.

  • Wrong way of closing connections. We don't seem to be closing connections properly in libp2p when the connection manager signals a disconnect: we do close the muxed connection, but in some cases the physical connection is left dangling, leading to connection pile-up. This should not happen, as destroying the muxed connection should in theory destroy the underlying connection, but I've seen a number of cases where it doesn't. In any case, I believe we should have an explicit Connection.destroy in the interface-connection code to ensure that the connection closes properly in all cases.

  • No way of detecting stale connections. Libp2p supports a variety of transports, and not all of them have a way of detecting whether the connection has gone stale (the other side died/dropped). For this, I believe we need a heartbeat mechanism that would detect and disconnect stale/dangling connections and perform the correct steps to clean them up.

Action items:

  • Fix the dial flow to ensure that connections don't get lost/untracked before making sure they are properly cleaned up
  • Fix all possible disconnect issues
  • Add a heartbeat mechanism
  • Experiment with Connection.destroy() in interface-connection
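A minimal sketch of the heartbeat idea from the action items. The `conn.ping()`/`conn.destroy()` shape is an assumption for illustration, not the actual libp2p interface:

```javascript
// Sketch: ping each tracked connection periodically, and tear down
// any connection that misses `maxMisses` pings in a row.
class Heartbeat {
  constructor({ intervalMs = 10000, maxMisses = 3 } = {}) {
    this.intervalMs = intervalMs;
    this.maxMisses = maxMisses;
    this.misses = new Map(); // conn -> consecutive missed pings
  }

  async check(conns) {
    for (const conn of conns) {
      try {
        await conn.ping();            // assumed liveness probe
        this.misses.set(conn, 0);     // any success resets the counter
      } catch (err) {
        const n = (this.misses.get(conn) || 0) + 1;
        this.misses.set(conn, n);
        if (n >= this.maxMisses) {
          conn.destroy();             // explicit close, no dangling socket
          this.misses.delete(conn);
        }
      }
    }
  }

  start(conns) {
    this.timer = setInterval(() => this.check(conns), this.intervalMs);
  }

  stop() { clearInterval(this.timer); }
}
```

Requiring several consecutive misses before destroying avoids killing connections over one slow or dropped probe.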

One thing that puzzled me while troubleshooting this was that our deployed mesh doesn't seem to run into these issues: we're able to maintain a stable mesh with well over ~100 simultaneous peers. But once I tried adding a circuit-relay node to the mesh, everything would start falling apart. The reason is that we keep the number of concurrent connections in libp2p-wrtc-star at 4, due to wrtc-specific limitations in the browser. This effectively mitigates the issues listed above when using wrtc, but they show up as soon as another transport is used — in this case websocket, which is used to connect to the relay node.


Disclaimer: It's a bit difficult to pinpoint these issues when so many things are happening at the same time, hence some of my observations/conclusions might be off, but I believe I've identified at least some of the issues. The action items above should increase stability considerably.

Research Kitsunet libp2p mesh stability and scalability issues

Build a stable libp2p based kitsunet mesh.

  • Troubleshoot stability issues when connecting to many nodes from both the browser and node
  • Make sure that kitsunet peers can discover other peers in the network (rndvz, custom discovery, etc)
  • Work on getting relay performance up to guarantee mesh stability

Improvements to the bridge in go-ethereum list

TODO LIST



  • Mustekala Bridges Monitoring
    • Based on EthStats
    • Variables
      • Last Block
      • Memory (geth default monitoring variables)
      • Memory (Slice cache)
  • Propose other metrics for the bridges, implement them
  • Consider in the future that parity bridges will have to be able to report to this monitor

  • Memory Management
    • Slice Cache: Insert, Query, Eviction Strategies.

  • Implement geth "kitsunet sync"


  • Blog Post of Slice Map visualization and stats.

  • Testing of Features (Continuous Integration)

  • Enable the consumption of slices via web sockets

  • Explore the option of having the geth light client consuming and seeding mustekala slices

[Experiment] - Slice "delta" computation prototype

So we have a slice of the ethereum state valid for a given root (and hence a block); this exercise aims to find the best algorithm for determining the difference of the same slice at a different root. This way, we can schedule the trie-node requests needed to update our slice. Get it?
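One way to start prototyping this, under the assumption (mine, not the actual slice format) that a slice can be modelled as a Map of trie path to node hash:

```javascript
// Sketch: given the same slice at two different roots, the "delta" is
// the set of trie nodes that are new or changed — exactly the nodes we
// would need to request to bring our local slice up to date.
function sliceDelta(oldSlice, newSlice) {
  const toFetch = [];
  for (const [path, hash] of newSlice) {
    if (oldSlice.get(path) !== hash) toFetch.push(hash); // new or changed node
  }
  return toFetch;
}
```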

The Roadmap

Yet another roadmap...

Moving from MetaMask/eth-ipfs-browser-client#1

Ethereum Browser Light Client Roadmap (MVP)

Divided in five big areas

  • Obtaining Data from devp2p
  • Kitsunet: metamask peer network
  • Bridge Peers: make eth content available to libp2p
  • Browser Peers: why we are here
  • Metamask Extension Ops

OBTAINING DATA FROM devp2p

  • Devp2p Node Scraper
  • Block Header Sync
  • Client Fork (to add trie nodes to RedisDB)
  • Snapshot download (from devp2p)

KITSUNET (Metamask peer network)

  • Add mustekala box mk1 docker to repo
    • Deploy
  • Signalling servers
    • Docker package
    • Deploy
    • Network bootstrapping
  • PubSub Testing for blockchain head update
    • May include signature of messages / "network of trust" ?
  • Messaging testing
    • Request an "index"
  • Metrics
    • Collection

BRIDGE (Make eth content available to libp2p)

(go-ipfs)

  • Mount RedisDB Datastore problem
  • Bootstrap into kitsunet
  • Custom Content Routing System
    • Sharding criteria, metadata indexes, "delta computations", etc
  • Publish new blockchain head
  • Respond to requests
    • eth - ipld dag requests
    • CRS indexes requests

PEERS

(js-ipfs)

  • Bootstrap into kitsunet
  • Subscribe to new blockchain head
  • Custom Content Routing System
    • Maintaining Ethereum Trie Shards
      • Deterministic and/or discrete
      • Incentive system
      • Updating them on triggers, request indexes, compute deltas, request parts.
  • Manage other peers requests
    • eth - ipld dag requests
    • CRS indexes requests

Metamask Extension Ops

  • eth_blockNumber
  • eth_getBalance
  • ... more to come. (See breakdown in comment below)
