ipfs-embed

A small, fast and reliable ipfs implementation designed for embedding into complex p2p applications.

  • node discovery via mdns
  • provider discovery via kademlia
  • exchange blocks via bitswap
  • lru eviction policy
  • aliases, an abstraction of recursively named pins
  • temporary recursive pins for building dags, preventing races with the garbage collector
  • efficiently syncing large dags of blocks

Some compatibility with go-ipfs can be enabled with the compat feature flag.
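
For example, in Cargo.toml (the version shown is illustrative; use the release you actually depend on):

[dependencies]
ipfs-embed = { version = "0.22", features = ["compat"] }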

Getting started

use ipfs_embed::{Config, DefaultParams, Ipfs};
use libipld::DagCbor;
use libipld::store::Store;

#[derive(Clone, DagCbor, Debug, Eq, PartialEq)]
struct Identity {
    id: u64,
    name: String,
    age: u8,
}

#[async_std::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cache_size = 10;
    let ipfs = Ipfs::<DefaultParams>::new(Config::new(None, cache_size)).await?;
    ipfs.listen_on("/ip4/0.0.0.0/tcp/0".parse()?).await?;

    let identity = Identity {
        id: 0,
        name: "David Craven".into(),
        age: 26,
    };
    let cid = ipfs.insert(&identity)?;
    let identity2 = ipfs.get(&cid)?;
    assert_eq!(identity, identity2);
    println!("identity cid is {}", cid);

    Ok(())
}

Below are some notes on the history of ipfs-embed. The information is no longer accurate for the current implementation.

What is ipfs?

Ipfs is a p2p network for locating and providing chunks of content addressed data called blocks.

Content addressing means that data is located via its hash, as opposed to location addressing.

Unsurprisingly this is done using a distributed hash table. To avoid storing large amounts of data in the dht, the dht stores which peers have a block. After determining the peers that are providing a block, the block is requested from those peers.

To verify that the peer is sending the requested block and not an infinite stream of garbage, blocks need to have a finite size. In practice we'll assume a maximum block size of 1MB.

Encoding arbitrary data into 1 MB blocks imposes two requirements on the codec. It needs a canonical representation, to ensure that the same data results in the same hash, and it needs to support linking to other content addressed blocks. Codecs having these two properties are called ipld codecs.

A property that follows from content addressing (an edge is represented by the hash of the block it points to) is that arbitrary graphs of blocks are not possible. A graph of blocks is guaranteed to be directed and acyclic.

{"a":3}
{
  "a": 3,
}
{"/":"QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u"}

Block storage

Let's start with a naive model of a persistent block store.

trait BlockStorage {
    fn get(&self, cid: &Cid) -> Result<Option<Vec<u8>>>;
    fn insert(&mut self, cid: &Cid, data: &[u8]) -> Result<()>;
    fn remove(&mut self, cid: &Cid) -> Result<()>;
}

Since content addressed blocks form a directed acyclic graph, blocks can't simply be deleted. A block may be referenced by multiple nodes, so some form of reference counting and garbage collection is required to determine when a block can safely be deleted. In the interest of being a good peer on the p2p network, we may want to keep old blocks around that other peers may want. So thinking of it as a reference counted cache may be a more appropriate model. We end up with something like this:

trait BlockStorage {
    fn get(&self, cid: &Cid) -> Result<Option<Vec<u8>>>;
    fn insert(&mut self, cid: &Cid, data: &[u8], references: &[Cid]) -> Result<()>;
    fn evict(&mut self) -> Result<()>;
    fn pin(&mut self, cid: &Cid) -> Result<()>;
    fn unpin(&mut self, cid: &Cid) -> Result<()>;
}

To mutate a block we need to perform three steps: get the block, insert the modified copy, and finally remove the old one. We also need a map from keys to cids, so even more steps are required. Any of these steps can fail, leaving the block store in an inconsistent state and leaking data. To prevent data leakage, every api consumer would have to implement a write-ahead log. To resolve these issues we extend the store with named pins called aliases.

trait BlockStorage {
    fn get(&self, cid: &Cid) -> Result<Option<Vec<u8>>>;
    fn insert(&mut self, cid: &Cid, data: &[u8], references: &[Cid]) -> Result<()>;
    fn evict(&mut self) -> Result<()>;
    fn alias(&mut self, alias: &[u8], cid: Option<&Cid>) -> Result<()>;
    fn resolve(&self, alias: &[u8]) -> Result<Option<Cid>>;
}

Assuming that each operation is atomic and durable, we have the minimal set of operations required to store dags of content addressed blocks.
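
A sketch of how a consumer could use this trait to move a dag root atomically; update_root and its exact error handling are illustrative, not part of the trait:

// Move the "root" alias to a new head block. If aliasing fails, the old
// root stays aliased, so no data is leaked or lost.
fn update_root(store: &mut impl BlockStorage, head: &Cid, data: &[u8], refs: &[Cid]) -> Result<()> {
    // Insert the new head; its references keep the rest of the dag reachable.
    store.insert(head, data, refs)?;
    // Re-point the alias in a single atomic step.
    store.alias(b"root", Some(head))?;
    Ok(())
}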

Networked block storage - the ipfs-embed api

impl Ipfs {
    pub fn new(storage: Arc<S>, network: Arc<N>) -> Self { .. }
    pub fn local_peer_id(&self) -> &PeerId { .. }
    pub async fn listeners(&self) -> Vec<Multiaddr> { .. }
    pub async fn external_addresses(&self) -> Vec<Multiaddr> { .. }
    pub async fn pinned(&self, cid: &Cid) -> Result<Option<bool>> { .. }
    pub async fn get(&self, cid: &Cid) -> Result<Block> {
        if let Some(block) = self.storage.get(cid)? {
            return Ok(block);
        }
        self.network.get(cid).await?;
        if let Some(block) = self.storage.get(cid)? {
            return Ok(block);
        }
        log::error!("block evicted too soon");
        Err(BlockNotFound(*cid))
    }
    pub async fn insert(&self, block: &Block) -> Result<()> {
        self.storage.insert(block)?;
        self.network.provide(block.cid())?;
        Ok(())
    }
    pub async fn alias(&self, alias: &[u8], cid: Option<&Cid>) -> Result<()> {
        if let Some(cid) = cid {
            self.network.sync(cid).await?;
        }
        self.storage.alias(alias, cid).await?;
        Ok(())
    }
    pub async fn resolve(&self, alias: &[u8]) -> Result<Option<Cid>> {
        Ok(self.storage.resolve(alias)?)
    }
}

Design patterns - ipfs-embed in action

We'll be looking at some patterns used in the chain example. The chain example uses ipfs-embed to store a chain of blocks. A block is defined as:

#[derive(Debug, Default, DagCbor)]
pub struct Block {
    prev: Option<Cid>,
    id: u32,
    loopback: Option<Cid>,
    payload: Vec<u8>,
}

Atomicity

We have two different databases in this example: the one managed by ipfs-embed that stores blocks and aliases, and one specific to the example that maps the block index to the block cid, so that we can look up blocks quickly without having to traverse the entire chain. To guarantee atomicity we define two aliases and perform the syncing in two steps. This ensures that the synced chain always has its blocks indexed.

const TMP_ROOT: &str = alias!(tmp_root);
const ROOT: &str = alias!(root);

ipfs.alias(TMP_ROOT, Some(new_root)).await?;
for _ in prev_root_id..new_root_id {
    // index block may error for various reasons
}
ipfs.alias(ROOT, Some(new_root)).await?;
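
A sketch of the matching recovery step on startup, assuming the aliases above and the async alias/resolve calls sketched in the api section; reindex is a hypothetical helper:

// If TMP_ROOT and ROOT point to different cids, the last run crashed after
// syncing but before (or during) indexing. Re-run the indexing and then move
// ROOT forward; both steps are idempotent.
let tmp_root = ipfs.resolve(TMP_ROOT).await?;
let root = ipfs.resolve(ROOT).await?;
if tmp_root != root {
    if let Some(new_root) = tmp_root {
        reindex(&ipfs, &new_root).await?; // hypothetical helper
        ipfs.alias(ROOT, Some(&new_root)).await?;
    }
}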

Dagification

The recursive syncing algorithm performs worst when syncing a chain, as every block needs to be synced one after the other, without any opportunity for parallelism. To resolve this issue we increase the linking of the chain by including loopbacks, which increase the branching of the dag.

An algorithm was proposed by @rklaehn for this purpose:

fn loopback(block: usize) -> Option<usize> {
    let x = block.trailing_zeros();
    if x > 1 && block > 0 {
        Some(block - (1 << (x - 1)))
    } else {
        None
    }
}
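
A quick check of the targets this produces for the first few block numbers (using the loopback function above):

fn main() {
    // block 4 (0b100) has two trailing zeros and links back to 4 - 2 = 2,
    // block 8 (0b1000) links back to 8 - 4 = 4, block 16 links back to 16 - 8 = 8;
    // odd blocks and blocks with a single trailing zero get no loopback.
    for block in 0..=16 {
        println!("{:2} -> {:?}", block, loopback(block));
    }
}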

Selectors

Syncing can take a long time and doesn't allow selecting only the subset of data that is needed. For this purpose there is an experimental alias_with_syncer api that allows customizing the syncing behaviour. In the chain example it is used to validate blocks during sync, although this api is likely to change in the future.

pub struct ChainSyncer<S: StoreParams, T: Storage<S>> {
    index: sled::Db,
    storage: BitswapStorage<S, T>,
}

impl<S: StoreParams, T: Storage<S>> BitswapSync for ChainSyncer<S, T>
where
    S::Codecs: Into<DagCborCodec>,
{
    fn references(&self, cid: &Cid) -> Box<dyn Iterator<Item = Cid>> {
        if let Some(data) = self.storage.get(cid) {
            let ipld_block = libipld::Block::<S>::new_unchecked(*cid, data);
            if let Ok(block) = ipld_block.decode::<DagCborCodec, Block>() {
                return Box::new(block.prev.into_iter().chain(block.loopback.into_iter()));
            }
        }
        Box::new(std::iter::empty())
    }

    fn contains(&self, cid: &Cid) -> bool {
        self.storage.contains(cid)
    }
}

Efficient block storage implementation - ipfs-embed internals

The historical block store described here was built on sled, a performant embeddable key-value store; each of the maps below is a sled tree. (The current implementation uses an SQLite-based block store instead.)

type Id = u64;
type Atime = u64;

#[derive(Clone)]
struct BlockCache {
    // Cid -> Id
    lookup: Tree,
    // Id -> Cid
    cid: Tree,
    // Id -> Vec<u8>
    data: Tree,
    // Id -> Vec<Id>
    refs: Tree,
    // Id -> Atime
    atime: Tree,
    // Atime -> Id
    lru: Tree,
}

impl BlockCache {
    // Updates the atime and lru trees and returns the data from the data tree.
    fn get(&self, cid: &Cid) -> Result<Option<Vec<u8>>> { .. }
    // Returns an iterator of blocks sorted by least recently used.
    fn lru(&self) -> impl Iterator<Item = Result<Id>> { self.lru.iter().values() }
    // Inserts into all trees.
    fn insert(&self, cid: &Cid, data: &[u8]) -> Result<()> { ... }
    // Removes from all trees.
    fn remove(&self, id: &Id) -> Result<()> { ... }
    // Returns the recursive set of references.
    fn closure(&self, cid: &Cid) -> Result<Vec<Id>> { ... }
    // A stream of insert/remove events, useful for plugging in a network layer.
    fn subscribe(&self) -> impl Stream<Item = StorageEvent> { ... }
}

Given this description of the operations and how they are structured in terms of trees, they are straightforward to implement.
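
For example, closure is essentially a depth-first walk over the refs tree. A sketch, where lookup_id and refs_of are assumed helpers that decode entries from the lookup and refs trees:

fn closure(&self, cid: &Cid) -> Result<Vec<Id>> {
    // Resolve the root id via the lookup tree (Cid -> Id).
    let root = self.lookup_id(cid)?;
    let mut stack = vec![root];
    let mut seen = std::collections::HashSet::new();
    while let Some(id) = stack.pop() {
        // Skip ids we have already visited; dags can share subtrees.
        if !seen.insert(id) {
            continue;
        }
        // Push the direct references from the refs tree (Id -> Vec<Id>).
        for child in self.refs_of(id)? {
            stack.push(child);
        }
    }
    Ok(seen.into_iter().collect())
}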

#[derive(Clone)]
struct BlockStorage {
    cache: BlockCache,
    // Vec<u8> -> Id
    alias: Tree,
    // Bag of live ids
    filter: Arc<Mutex<CuckooFilter>>,
    // Id -> Vec<Id>
    closure: Tree,
}

impl BlockStorage {
    // get from cache
    fn get(&self, cid: &Cid) -> Result<Option<Vec<u8>>> { self.cache.get(cid) }
    // insert to cache
    fn insert(&self, cid: &Cid, data: &[u8]) -> Result<()> { self.cache.insert(cid, data) }
    // returns the value of the alias tree
    fn resolve(&self, alias: &[u8]) -> Result<Option<Cid>> { ... }
    // remove the lru block that is not in the bag of live ids and remove its closure from
    // the closure tree
    fn evict(&self) -> Result<()> { ... }
    // aliasing is an expensive operation, the implementation is sketched in pseudo code
    fn alias(&self, alias: &[u8], cid: Option<&Cid>) -> Result<()> {
        // precompute the closure
        let prev_id = self.alias.get(alias)?;
        let prev_closure = self.closure.get(&prev_id)?;
        let new_id = self.cache.lookup(&cid);
        let new_closure = self.cache.closure(&cid);

        // lock the filter preventing evictions
        let mut filter = self.filter.lock().unwrap();
        // make sure that new closure wasn't evicted in the mean time
        for id in &new_closure {
            if !self.cache.contains_id(&id) {
                return Err("cannot alias, missing references");
            }
        }
        // update the live set
        for id in &new_closure {
            filter.add(id);
        }
        for id in &prev_closure {
            filter.delete(id);
        }
        // perform transaction
        let res = (&self.alias, &self.closure).transaction(|(talias, tclosure)| {
            if prev_id.is_some() {
                talias.remove(alias)?;
            }
            if let Some(id) = new_id.as_ref() {
                talias.insert(alias, id)?;
                tclosure.insert(id, &new_closure)?;
            }
            Ok(())
        });
        // if the transaction failed, revert the live set to its previous state
        if res.is_err() {
            for id in &prev_closure {
                filter.add(id);
            }
            for id in &new_closure {
                filter.delete(id);
            }
        }
        res
    }
}

Efficiently syncing dags of blocks - libp2p-bitswap internals

Bitswap is a very simple protocol. It was adapted and simplified for ipfs-embed. The message format can be represented by the following enums.

pub enum BitswapRequest {
    Have(Cid),
    Block(Cid),
}

pub enum BitswapResponse {
    Have(bool),
    Block(Vec<u8>),
}
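
As an illustration of how little state the responder needs, a peer can answer each request from its local store. A sketch, where have and block stand in for local store lookups and how a missing block is signalled is left open:

fn answer(request: &BitswapRequest) -> Option<BitswapResponse> {
    match request {
        // "Do you have this block?" -> answer with a boolean.
        BitswapRequest::Have(cid) => Some(BitswapResponse::Have(have(cid))),
        // "Send me this block." -> answer with the data if we have it.
        BitswapRequest::Block(cid) => block(cid).map(BitswapResponse::Block),
    }
}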

The mechanism for locating providers can be abstracted: a dht can be plugged in, or a centralized db can be queried. The bitswap api looks as follows:

pub enum Query {
    Get(Cid),
    Sync(Cid),
}

pub enum BitswapEvent {
    GetProviders(Cid),
    QueryComplete(Query, Result<()>),
}

impl Bitswap {
    pub fn add_address(&mut self, peer_id: &PeerId, addr: Multiaddr) { .. }
    pub fn get(&mut self, cid: Cid) { .. }
    pub fn cancel_get(&mut self, cid: Cid) { .. }
    pub fn add_provider(&mut self, cid: Cid, peer_id: PeerId) { .. }
    pub fn complete_get_providers(&mut self, cid: Cid) { .. }
    pub fn poll(&mut self, cx: &mut Context) -> BitswapEvent { .. }
}

So what happens when you create a get request? First, all the providers in the initial set are queried with a have request. As an optimization, in every batch of queries a block request is sent instead of a have request. If the get query finds a block, it signals query completion. If the block wasn't found in the initial set, a GetProviders(Cid) event is emitted. This is where the bitswap consumer tries to locate providers, for example by performing a dht lookup. These providers are registered by calling the add_provider method. Once locating providers completes, this is signaled by calling complete_get_providers. The query manager then performs bitswap requests using the new provider set, which results in the block being found or a block-not-found error.
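
A sketch of the consumer side of this flow against the api above; dht_lookup stands in for whatever provider discovery the application uses:

fn drive(bitswap: &mut Bitswap, cx: &mut Context) {
    match bitswap.poll(cx) {
        BitswapEvent::GetProviders(cid) => {
            // The initial provider set did not have the block; find more
            // providers out of band, e.g. with a dht lookup.
            for peer in dht_lookup(&cid) {
                bitswap.add_provider(cid.clone(), peer);
            }
            // Signal that provider discovery is done, so the query manager
            // retries the requests against the new provider set.
            bitswap.complete_get_providers(cid);
        }
        BitswapEvent::QueryComplete(_query, _result) => {
            // The query either found the block or failed with block-not-found;
            // hand the result back to whoever started the query.
        }
    }
}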

Often we want to sync an entire dag of blocks. We can do this efficiently by adding a sync query that runs get queries in parallel for all the references of a block. The set of providers that had a block is used as the initial provider set when querying for its references. For this we extend the api with the following calls.

/// Bitswap sync trait for customizing the syncing behaviour.
pub trait BitswapSync {
    /// Returns the list of blocks that need to be synced.
    fn references(&self, cid: &Cid) -> Box<dyn Iterator<Item = Cid>>;
    /// Returns if a cid needs to be synced.
    fn contains(&self, cid: &Cid) -> bool;
}

impl Bitswap {
    pub fn sync(&mut self, cid: Cid, syncer: Arc<dyn BitswapSync>) { .. }
    pub fn cancel_sync(&mut self, cid: Cid) { .. }
}

Note that we can customize the syncing behaviour arbitrarily by selecting a subset of blocks we want to sync. See design patterns for more information.

License

MIT OR Apache-2.0

ipfs-embed's People

Contributors

4meta5, dvc94ch, jmg-duarte, joepio, matthiasbeyer, rklaehn, rkuhn, vmx, wngr

ipfs-embed's Issues

[noob] what does "no go-ipfs compatibility" mean?

In the readme, I noticed the following disclaimer:

It does not aim at being compatible in any way with go-ipfs.

Could I kindly ask you to help me understand what this means, for an IPFS noob? Most notably, does it mean:

  • (a) that files shared on IPFS by go-ipfs or js-ipfs are not visible/gettable from ipfs-embed, and vice versa? (i.e. they are completely separate ecosystems/networks)
  • (b) or, that the files are cross-visible, but the local on-disk database and the contents of downloaded/shared files are not compatible? In other words, they are "network compatible", but ipfs-embed is just not (and will not be) a drop-in replacement for go-ipfs, and vice versa?

TIA! ❤️

(edit: In my particular case, I'm interested in writing a small personal-use app for sharing photos over IPFS - more or less trying to rewrite https://github.com/akavel/catation in Rust - and wonder if that's currently something I could manage to achieve in Rust somehow)

dht

In particular, these tests should be completed:

test store::tests::test_exchange_kad ... ignored
test store::tests::test_provider_not_found_kad ... ignored

Is there any example like in README.md?

I want to use this crate and tried to build the code in README.md#Getting Started, but it has too many errors to fix. The code in /examples doesn't use the ipfs network (just the chain). Could you fix the code in README.md so that it works with the ipfs network?

Support for windows / WSL2

ipfs-embed does not work on Windows WSL2, because mdns does not work there.

This can be traced back to if-watch not working. I think WSL2 is, for better or worse, a very popular platform, especially for developers, so it should work out of the box; otherwise adoption will be limited.

The best way to accomplish this would probably be to fall back to the poll based approach whenever the OS specific approach does not work, for whatever reason.

Optimize bitswap

We should try to make it at least as fast as the previous record holder, libp2p-ax-bitswap, preferably without having to alter the protocol.

ipfs-embed should be executor agnostic

ipfs-embed is a library, meant to be embedded in bigger applications rather than used standalone. It should be the user's decision where and how the futures spawned by this library are executed.

The currently used async_global_executor offers the possibility to select the runtime, but uses a global resource to store a handle to said runtime, which can only be initialized once. This is restrictive in the sense that users can't replace the runtime without restarting the process (users might want to do that in order to implement the Let It Crash pattern within process boundaries).

I see the following possibilities to provide that:

  1. Use a comprehensive crate that abstracts over the executor. I found the following, but none of them immediately fit the bill:
  • agnostik: Seems this is to be replaced, and considered buggy by its authors, see [0]
  • async_executors: this is a bit too opinionated for my case
  • async-spawner: like async_global_executor it uses a global resource; not sure about your future intentions with this crate, too
  • ... ?
  2. Provide a very small trait users can implement (like rust-libp2p does). Returning task handles might complicate it, but we could probably come up with a small API surface.

Now I reckon that the public API of ipfs-embed will probably not become nicer by exposing that, but we might guard it behind a feature flag, defaulting to async_global_executor as it is right now.

Transactional database ops

Currently, we don't have a way to do transactional database ops, which severely limits performance when interacting with stores that have paranoid transactional guarantees, such as sqlite in synchronous = normal or synchronous = full mode. But I do like paranoid transactional guarantees.

The ipfs sqlite block store does support transactions as of some time ago, so we should expose that in the public interface of ipfs-embed.

compile error: trait bound not satisfied for `sled::ivec::IVec`

I noticed this when trying to compile the client in sunshine-node, which uses ipfs-embed

and I reproduced it after pulling this repo and running cargo update:

➜  ipfs-embed git:(master) ✗ carb        
   Compiling syn v1.0.32
   Compiling bytes v0.5.5
   Compiling pin-project-internal v0.4.22
   Compiling proc-macro-nested v0.1.6
   Compiling remove_dir_all v0.5.3
   Compiling tinyvec v0.3.3
   Compiling serde v1.0.113
   Compiling pin-project-lite v0.1.7
   Compiling adler32 v1.1.0
   Compiling object v0.20.0
   Compiling indexmap v1.4.0
   Compiling ring v0.16.15
   Compiling clear_on_drop v0.2.4
   Compiling addr2line v0.12.2
   Compiling quote v1.0.7
   Compiling tempfile v3.1.0
   Compiling quicksink v0.1.2
   Compiling miniz_oxide v0.3.7
   Compiling unicode-normalization v0.1.13
   Compiling ed25519-dalek v1.0.0-pre.3
   Compiling petgraph v0.5.1
   Compiling idna v0.2.0
   Compiling url v2.1.1
   Compiling backtrace v0.3.49
   Compiling sled v0.32.0
   Compiling synstructure v0.12.4
   Compiling futures-macro v0.3.5
   Compiling prost-derive v0.6.1
   Compiling thiserror-impl v1.0.20
   Compiling asn1_der_derive v0.1.2
   Compiling data-encoding-macro-internal v0.1.8
   Compiling async-attributes v1.1.1
   Compiling libp2p-core-derive v0.19.1
   Compiling libipld-cbor-derive v0.3.0
   Compiling async-std v1.5.0
   Compiling asn1_der v0.6.3
   Compiling data-encoding-macro v0.1.8
   Compiling thiserror v1.0.20
   Compiling pin-project v0.4.22
   Compiling multibase v0.8.0
   Compiling prost v0.6.1
   Compiling futures-util v0.3.5
   Compiling prost-types v0.6.1
   Compiling prost-build v0.6.1
   Compiling libp2p-core v0.19.1
   Compiling libp2p-identify v0.19.1
   Compiling libp2p-secio v0.19.1
   Compiling libp2p-kad v0.19.0
   Compiling libp2p-bitswap v0.4.1
   Compiling futures-executor v0.3.5
   Compiling futures v0.3.5
   Compiling futures_codec v0.3.4
   Compiling rw-stream-sink v0.2.1
   Compiling wasm-timer v0.2.4
   Compiling yamux v0.4.7
   Compiling unsigned-varint v0.3.3
   Compiling multihash v0.11.2
   Compiling multistream-select v0.8.1
   Compiling parity-multiaddr v0.9.0
   Compiling cid v0.5.1
   Compiling libipld-core v0.3.0
   Compiling libipld-macro v0.3.0
   Compiling libipld-cbor v0.3.0
   Compiling libipld v0.3.0
   Compiling libp2p-swarm v0.19.0
   Compiling libp2p-yamux v0.19.0
   Compiling libp2p-mplex v0.19.1
   Compiling libp2p-tcp v0.19.1
   Compiling libp2p-ping v0.19.2
   Compiling libp2p-mdns v0.19.1
   Compiling libp2p v0.19.1
   Compiling ipfs-embed v0.1.0 (/Users/4meta5/sunshine-protocol/ipfs/ipfs-embed)
error[E0277]: the trait bound `sled::ivec::IVec: std::convert::From<std::boxed::Box<[u8]>>` is not satisfied
   --> src/network/mod.rs:101:70
    |
101 |                     if let Err(err) = self.storage.insert(&cid, data.into(), Visibility::Public) {
    |                                                                      ^^^^ the trait `std::convert::From<std::boxed::Box<[u8]>>` is not implemented for `sled::ivec::IVec`
    |
    = help: the following implementations were found:
              <sled::ivec::IVec as std::convert::From<&[u8; 0]>>
              <sled::ivec::IVec as std::convert::From<&[u8; 10]>>
              <sled::ivec::IVec as std::convert::From<&[u8; 11]>>
              <sled::ivec::IVec as std::convert::From<&[u8; 12]>>
            and 35 others
    = note: required because of the requirements on the impl of `std::convert::Into<sled::ivec::IVec>` for `std::boxed::Box<[u8]>`

error[E0277]: the trait bound `sled::ivec::IVec: std::convert::From<std::boxed::Box<[u8]>>` is not satisfied
  --> src/store.rs:93:64
   |
93 |         Box::pin(async move { Ok(self.storage.insert(cid, data.into(), visibility)?) })
   |                                                                ^^^^ the trait `std::convert::From<std::boxed::Box<[u8]>>` is not implemented for `sled::ivec::IVec`
   |
   = help: the following implementations were found:
             <sled::ivec::IVec as std::convert::From<&[u8; 0]>>
             <sled::ivec::IVec as std::convert::From<&[u8; 10]>>
             <sled::ivec::IVec as std::convert::From<&[u8; 11]>>
             <sled::ivec::IVec as std::convert::From<&[u8; 12]>>
           and 35 others
   = note: required because of the requirements on the impl of `std::convert::Into<sled::ivec::IVec>` for `std::boxed::Box<[u8]>`

error[E0277]: the trait bound `sled::ivec::IVec: std::convert::From<std::boxed::Box<[u8]>>` is not satisfied
   --> src/store.rs:103:51
    |
103 |             .map(|Block { cid, data }| (cid, data.into()));
    |                                                   ^^^^ the trait `std::convert::From<std::boxed::Box<[u8]>>` is not implemented for `sled::ivec::IVec`
    |
    = help: the following implementations were found:
              <sled::ivec::IVec as std::convert::From<&[u8; 0]>>
              <sled::ivec::IVec as std::convert::From<&[u8; 10]>>
              <sled::ivec::IVec as std::convert::From<&[u8; 11]>>
              <sled::ivec::IVec as std::convert::From<&[u8; 12]>>
            and 35 others
    = note: required because of the requirements on the impl of `std::convert::Into<sled::ivec::IVec>` for `std::boxed::Box<[u8]>`

error: aborting due to 3 previous errors

For more information about this error, try `rustc --explain E0277`.
error: could not compile `ipfs-embed`.

To learn more, run the command again with --verbose.

Could this Support Web?

Hey there, this is a very interesting project! I was wondering whether or not it is possible to make this work in the browser with WASM. Are there traits that could be used to abstract the storage and networking components so that we could make browser-compatible implementations? Are there any other issues that we might run into?

I'm mostly curious and not sure if I'd end up using it for anything serious, but was wondering how that might work.

Better documentation and tutorials

If I understand correctly, the intention is to have the ipfs-embed interface more or less stable from now on.

I think now would be a good time to write more documentation, including examples for how to use the different components.

  • writing the docs might lead to some improvements in the APIs
  • It would be very useful for people to get started with ipfs
  • If the examples are checked on build, that would provide additional assurance and an incentive to keep the interface stable.

Fantastic work so far!

First, I'm really impressed by all the work you've done so far. I really want to contribute, because I believe in your vision, which, as I see it, is to make a version of IPFS that is better than go-ipfs, using all that the Rust community has to offer, including sled and blake3, and focusing on a subset of forward-looking features, rather than simply reimplementing what currently exists, which is a fool's errand.

Go has some crypto libraries that are really well optimized, as I've noticed from some comments on the rage project, but the Go approach of hand-optimized assembly is less than ideal, for obvious reasons.

I am super confused about one thing, however: I would prefer this library use a QUIC implementation that implements libnoise. How is that going? I can't quite make sense of the discussion occurring all around the libp2p repo, including in this thread: libp2p/rust-libp2p#1334

Finally, I've had lots of trouble trying to get this particular repo to build on my machine. Can you tell me the current state of this project, and where I can jump in to help move things along?

rewrite sled block store

The ipfs-sqlite-block-store supports features like reverse_alias, missing_blocks, tmp_pin and stats, supports O(1) aliasing, and has an incremental garbage collector. The current ipfs-embed-db needs to be redesigned to support these features.

For debuggability, the block store needs to be able to dump a snapshot and to start with an in-memory copy of a snapshot.

support for gossipsub

Is any support for publish/subscribe planned? At the moment it looks like this wouldn't be possible without forking the crate.

Extremely long compile times when compiling examples/sync.rs with nightly compiler

Hello everyone,

after copying the content of examples/sync.rs into an empty rust project and adding the following versions of the dependencies:

[dependencies]
anyhow = "1.0.48"
async-std = { version = "1.10.0", features = ["attributes"] }
futures = "0.3.18"
ipfs-embed = "0.22.4"
libipld = "0.12.0"
rand = "0.8.4"
tracing-subscriber = {version="0.3.2", features=["env-filter"]}

when trying to compile it with either the latest stable (rustc 1.56.1 (59eed8a2a 2021-11-01)) or nightly (rustc 1.58.0-nightly (dd549dcab 2021-11-25)) toolchain (x86_64-pc-windows-msvc) on Windows 10, I get compile times exceeding 2 hours. I can provide a verbose compiler log if required.

Best regards,

Earthnuker

ipfs-embed should cleanup its resources

It seems ipfs-embed spawns some background resources which are not dropped when the main Ipfs handle is dropped.

Reproducer:

use std::time::Duration;

use ipfs_embed::{Config, Ipfs};
use libipld::{store::StoreParams, IpldCodec};

#[derive(Debug, Clone)]
struct Sp;

impl StoreParams for Sp {
    type Hashes = libipld::multihash::Code;
    type Codecs = IpldCodec;
    const MAX_BLOCK_SIZE: usize = 1024 * 1024 * 4;
}

#[async_std::main]
async fn main() -> anyhow::Result<()> {
    for _ in 1..256 {
        let config = Config::new(None, 1024 * 1024);
        let ipfs = Ipfs::<Sp>::new(config).await?;

        drop(ipfs)
    }

    std::thread::sleep(Duration::from_secs(300_000_000));

    Ok(())
}

One thing I noticed is that on every iteration a new blocking-<nr> thread is spawned, with the postfixed integer incremented (up to 500):

➜  ~ ps -e -T | grep 168268 | wc -l
270

There might be other resources that are not properly released; for example, I have at one point seen an error message about concurrent access to the sqlite block store.

It would be nice to have a future that releases everything and can be awaited. Another option would be to let users provide a custom executor, so that the runtime can be cleaned up by the user.

Allow specifying optional temp pin when storing a block

If you are building a DAG, it is very likely that you don't want the rug pulled out from under you by GC. So having an additional temp_pin option guides you towards correct API usage. I can't think of many cases where you would not want a temp pin when building a dag.

(When you get a dag from somewhere external, it is most likely that you get it via sync, so you don't really interact with individual nodes)

The current two separate methods are also a bit inefficient, because they require two write transactions where there used to be one, back when insert had an optional temp pin.

graphsync

A graphsync implementation is still missing.

parity-db vs sled

Evaluate parity-db as a storage backend for content addressed blocks

Default block size too small for go-ipfs interop

The maximum block size in go-ipfs is 4 MB, so it is possible to create a block in go-ipfs that cannot be bitswapped.

Either we should increase the default block size to 4 MB (the exact size is somewhere in the maze of repos that is go-ipfs), or at least document that the default size will not work seamlessly for go-ipfs interop.

Note that at Actyx we might have some old blocks from the go-ipfs times lying around which are > 1 MB.
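
A possible workaround until the default changes, mirroring the StoreParams reproducer in the resource-cleanup issue above, is to define custom store params with a 4 MB limit; GoCompatParams is a hypothetical name:

use libipld::{store::StoreParams, IpldCodec};

// Custom store params raising the maximum block size to the go-ipfs limit of 4 MB.
#[derive(Debug, Clone)]
struct GoCompatParams;

impl StoreParams for GoCompatParams {
    type Hashes = libipld::multihash::Code;
    type Codecs = IpldCodec;
    const MAX_BLOCK_SIZE: usize = 1024 * 1024 * 4;
}

// let ipfs = Ipfs::<GoCompatParams>::new(Config::new(None, cache_size)).await?;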

privacy improvements

The nymtech mixnet looks pretty cool. We should add a transport for it to anonymize the networking layer without too much effort. This is not a complete solution, as all the individual protocols used by ipfs-embed would need to be analyzed in terms of what information they leak to whom.
