ssb-replication-scheduler

Triggers replication of feeds identified as friendly in the social graph or in private groups.

Depends on ssb-friends and ssb-tribes2, and calls ssb-ebt APIs.

Installation

Prerequisites:

  • Requires Node.js 12 or higher
  • Requires ssb-db2 (it may work with ssb-db if you don't use partial replication)
  • Requires ssb-friends version 5.0 or higher
  • Requires ssb-ebt version 9.0 or higher

npm install --save ssb-replication-scheduler

Add this secret-stack plugin like this:

 const SecretStack = require('secret-stack')
 const caps = require('ssb-caps')

 const createSsbServer = SecretStack({ caps })
     .use(require('ssb-master'))
     .use(require('ssb-db2'))
     .use(require('ssb-ebt'))
     .use(require('ssb-friends'))
+    .use(require('ssb-replication-scheduler'))
     .use(require('ssb-conn'))
     // ...

Usage

Typically there is nothing you need to do after installing this plugin. As soon as the SSB peer is initialized, ssb-replication-scheduler automatically queries the social graph and either requests or stops replication of each feed, depending on whether that feed is friendly or blocked.

Opinions embedded in the scheduler:

  • Replication is enabled for:
    • The main feed, ssb.id, because this allows you to recover your feed
    • Any friendly feed at a distance of at most config.friends.hops
      • Includes your friends (if config.friends.hops >= 1)
      • Includes friends of friends (if config.friends.hops >= 2)
      • Includes friends of friends of friends (if config.friends.hops >= 3)
      • And so forth
  • Replication is strictly disabled for:
    • Any feed you explicitly block
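
The opinions above can be sketched as a small decision function. This is only an illustration, not the plugin's actual code: replicationDecision is a hypothetical name, the sketch assumes that a negative distance stands for a blocked feed, and the 'skip' result for feeds beyond the configured hops is likewise an assumption.

```javascript
// Hypothetical helper, not part of the ssb-replication-scheduler API.
// `distance` is the hops distance of a feed in the social graph.
// ASSUMPTION: a negative distance means "you explicitly block this feed".
function replicationDecision(distance, maxHops) {
  if (distance < 0) return 'stop' // blocked: replication strictly disabled
  if (distance <= maxHops) return 'request' // yourself (0), friends (1), ...
  return 'skip' // ASSUMPTION: feeds beyond config.friends.hops are not requested
}
```

With config.friends.hops set to 2, this sketch would request yourself, your friends, and friends of friends, and nothing further away.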

There are two APIs available in case you want more control over this module: start() and reconfigure(). Read more about these in the APIs section below.

Configuration

Some parameters and opinions can be configured by the user or by application code through the conventional ssb-config object. The possible options are listed below:

{
  replicationScheduler: {
    /**
     * Whether the replication scheduler should start automatically as soon as
     * the SSB app is initialized. When `false`, you have to call
     * `ssb.replicationScheduler.start()` manually. Default is `true`.
     */
    autostart: true,

    /**
     * If `partialReplication` is an object, it tells the replication scheduler
     * to perform partial replication, whenever remote feeds support it. If
     * `partialReplication` is `null` (which it is, by default), then all
     * friendly feeds will be requested in full.
     *
     * Read below more about this configuration.
     */
    partialReplication: null,
  }
}
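
As a minimal sketch of the autostart option, the snippet below disables automatic startup and starts the scheduler by hand. The function name startReplicationLater is hypothetical, and ssb is assumed to be a peer created with this config and with ssb-replication-scheduler installed.

```javascript
// Config fragment: the scheduler will wait until start() is called.
const config = {
  replicationScheduler: {
    autostart: false,
    partialReplication: null, // all friendly feeds are requested in full
  },
}

// Hypothetical helper: call it whenever the application decides that
// replication may begin (for instance after some app-specific setup).
// ASSUMPTION: `ssb` was created with `config` and has this plugin installed.
function startReplicationLater(ssb) {
  ssb.replicationScheduler.start()
}
```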

Configuring partial replication

The config.replicationScheduler.partialReplication object describes the tree of meta feeds that we are interested in replicating, for each hops level. Each hops level has its own template describing how replication should work at that level. Notice that this configuration cannot specify who we replicate (that's the job of ssb-friends and your chosen hops, see the Usage section above). It only specifies how we should replicate a friendly peer, in other words, the level of granularity for those peers.

Template per hops

The high-level overview of the partialReplication configuration is:

replicationScheduler: {
  partialReplication: {
    0: TEMPLATE_FOR_HOPS_0,
    1: TEMPLATE_FOR_HOPS_1,
    2: TEMPLATE_FOR_HOPS_2_AND_ABOVE,
    group: TEMPLATE_FOR_GROUP_MEMBERS,
  }
}

Soon we'll show how those TEMPLATE_FOR_HOPS work, but for now notice that the highest number handles all the hops beyond that number. Here 2 is the highest number, so TEMPLATE_FOR_HOPS_2_AND_ABOVE configures how to replicate peers at hops 2, 3, 4, or higher. There's nothing special about the number 2; it could also have been this:

replicationScheduler: {
  partialReplication: {
    0: TEMPLATE_FOR_HOPS_0,
    1: TEMPLATE_FOR_HOPS_1_AND_ABOVE,
  }
}

Or even this (which means we use the same template for all peers, regardless of their hops distance):

replicationScheduler: {
  partialReplication: {
    0: TEMPLATE_FOR_HOPS_0_AND_ABOVE,
  }
}

Or even fractional numbers:

replicationScheduler: {
  partialReplication: {
    0: TEMPLATE_FOR_HOPS_0,
    0.5: TEMPLATE_FOR_HOPS_HALF,
    1: TEMPLATE_FOR_HOPS_1_AND_ABOVE,
  }
}
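
To make the lookup rule concrete, here is a rough sketch of how a template could be chosen for a given hops distance. It is not the plugin's implementation: pickTemplate is a hypothetical name, and the rule used for a distance that falls between two configured levels (take the next level up) is an assumption, since the text above only specifies that the highest level covers everything beyond it.

```javascript
// Hypothetical helper illustrating template lookup; not the library's code.
function pickTemplate(partialReplication, hops) {
  // Object keys are strings, so convert them to numbers and drop
  // non-numeric keys such as "group".
  const levels = Object.keys(partialReplication)
    .map(Number)
    .filter((n) => !Number.isNaN(n))
    .sort((a, b) => a - b)
  if (levels.length === 0) return null

  // ASSUMPTION: an in-between distance uses the lowest level that is >= hops.
  const covering = levels.find((level) => level >= hops)
  // As described above, the highest level handles all hops beyond it.
  const chosen = covering !== undefined ? covering : levels[levels.length - 1]
  return partialReplication[chosen]
}
```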

Template structure

A Template is JSON that describes how we should do partial replication. If the template is null or another falsy value, then for that hops level we don't do partial replication and instead do full replication (which means pre-2022 SSB replication of the peer's main feed).

When the template is a JSON array, it means we want to replicate only some leaf feeds in the "metafeed tree", where the root of the tree is always the root meta feed. The structure of the tree is assumed to follow the "tree structure v1", which means we're only concerned about the leaf feeds.

Each item in the template should be an object describing which keys in a leaf feed must match exactly the values given for that leaf to be replicated. So that if we write {purpose: 'git-ssb'}, it means we are interested in matching the leaf feed that has the field purpose exactly matching the value "git-ssb". All specified fields must match, but omitted fields are allowed to be any value. If you omit all the fields, i.e. if you pass the empty object {}, then this means "replicate ALL leaf feeds".
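
The matching rule can be sketched roughly as follows. This is an illustration under assumptions, not the plugin's code: leafMatchesItem and leafMatchesTemplate are hypothetical names, leaf stands for the details of one leaf feed, and nested values such as metadata are assumed to be compared by deep equality.

```javascript
const { isDeepStrictEqual } = require('util')

// Hypothetical matcher. `item` is one entry of a template array and
// `leaf` describes one leaf feed, e.g. { purpose: 'index', metadata: {...} }.
// Every field present in `item` must equal the same field in `leaf`;
// fields omitted from `item` may hold any value, so {} matches every leaf.
function leafMatchesItem(leaf, item) {
  return Object.keys(item).every((key) =>
    isDeepStrictEqual(leaf[key], item[key])
  )
}

// A leaf is replicated when at least one item in the template matches it.
// A falsy template means full replication, so there is nothing to match.
function leafMatchesTemplate(leaf, template) {
  if (!template) return false
  return template.some((item) => leafMatchesItem(leaf, item))
}
```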

Special variables

Some values are special, in the sense that they are not taken literally, but are going to be substituted by other context-relative values. These special variables are always prefixed with $.

  • Special values
    • $main
    • $root
    • $groupSecret (only in purpose field)

If the value of a field (e.g. in ssb-ql-0 queries) is the special string "$main" or "$root", then it refers to the ID of the main feed or of the root meta feed, respectively. The shape {purpose: '$groupSecret'} corresponds to any leaf feed whose purpose matches one of the group secrets known by the local peer.
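
A substitution step for "$main" and "$root" might look like the sketch below. The function name resolveSpecials and the context object with mainId and rootId are hypothetical, introduced only for illustration. "$groupSecret" is deliberately left untouched here, because resolving it requires the group secrets known by the local peer.

```javascript
// Hypothetical substitution step; not the plugin's actual code.
// `context` supplies the IDs that the special strings stand for.
function resolveSpecials(value, context) {
  if (value === '$main') return context.mainId
  if (value === '$root') return context.rootId
  if (Array.isArray(value)) {
    return value.map((entry) => resolveSpecials(entry, context))
  }
  if (value !== null && typeof value === 'object') {
    const resolved = {}
    for (const [key, entry] of Object.entries(value)) {
      resolved[key] = resolveSpecials(entry, context)
    }
    return resolved
  }
  return value // literals (including null and '$groupSecret') pass through
}
```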

Example

In the example below, we set up partial replication with the meaning:

  • For hops 0 (that is, "yourself"), replicate some app feeds and all index feeds
  • For hops 1 (direct friends), replicate only 5 specific index feeds
  • For hops 2, replicate only 2 specific index feeds
  • For hops 3 and beyond, replicate only 1 specific index feed

partialReplication: {
  0: [
    { purpose: 'main' },
    { purpose: 'coolgame' },
    { purpose: 'git-ssb' },
    { purpose: 'index' }
  ],

  1: [
    {
      purpose: 'index',
      metadata: {
        querylang: 'ssb-ql-0',
        query: { author: '$main', type: null, private: true },
      },
    },
    {
      purpose: 'index',
      metadata: {
        querylang: 'ssb-ql-0',
        query: { author: '$main', type: 'post', private: false },
      },
    },
    {
      purpose: 'index',
      metadata: {
        querylang: 'ssb-ql-0',
        query: { author: '$main', type: 'vote', private: false },
      },
    },
    {
      purpose: 'index',
      metadata: {
        querylang: 'ssb-ql-0',
        query: { author: '$main', type: 'about', private: false },
      },
    },
    {
      purpose: 'index',
      metadata: {
        querylang: 'ssb-ql-0',
        query: { author: '$main', type: 'contact', private: false },
      },
    },
  ],

  2: [
    {
      purpose: 'index',
      metadata: {
        querylang: 'ssb-ql-0',
        query: { author: '$main', type: 'about', private: false },
      },
    },
    {
      purpose: 'index',
      metadata: {
        querylang: 'ssb-ql-0',
        query: { author: '$main', type: 'contact', private: false },
      },
    },
  ],

  3: [
    {
      purpose: 'index',
      metadata: {
        querylang: 'ssb-ql-0',
        query: { author: '$main', type: 'about', private: false },
      },
    },
  ],
}

APIs

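ssb.replicationScheduler.start() => void (sync)

Starts the replication scheduler. You only need to call this yourself when config.replicationScheduler.autostart is false.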

ssb.replicationScheduler.reconfigure(config) => void (sync)

At any point during the execution of your program, you can reconfigure the replication rules using this API. The configuration object passed to this API has the same shape as config.replicationScheduler (see above).
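
As a small usage sketch, the function below switches a running peer to a narrower set of templates. The name narrowReplication and the chosen templates are made up for illustration, and ssb is assumed to be an initialized peer with ssb-replication-scheduler installed.

```javascript
// Hypothetical helper showing one way to call reconfigure().
// ASSUMPTION: `ssb` is an initialized peer with this plugin installed.
function narrowReplication(ssb) {
  ssb.replicationScheduler.reconfigure({
    partialReplication: {
      0: [{ purpose: 'main' }, { purpose: 'git-ssb' }],
      1: [{ purpose: 'main' }], // applies to hops 1 and beyond
    },
  })
}
```

Passing { partialReplication: null } in the same way would switch back to full replication of all friendly feeds.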

Security considerations

The exclusion spec says that we should stop replicating new messages from an excluded member. That is not implemented so far (see also relevant proposed updates to the spec) because of a lack of time. So an excluded member could in theory keep posting to the group, even if they wouldn't be able to see things remaining members posted.

License

LGPL-3.0

ssb-replication-scheduler's Issues

Sympathy replication, an idea

I had an idea of implementing sympathetic replication like this:

partialReplication: {
  0: [
    { purpose: 'main' },
    { purpose: 'git-ssb' },
    { purpose: 'index' }
  ],

  1: [
    { purpose: 'main' },
    { purpose: 'git-ssb', $certainty: 50 },
  ],
}

Note $certainty: 50, which means that I will replicate a friend's git-ssb feed with 50% probability. This wouldn't be truly random, because every sbot session needs to replicate the same "lucky" friend git-ssb feeds, so the choice has to be deterministic.

I thought that one way to achieve determinism is to pluck the first nibble of the friend's subfeed, and replicate that subfeed only if it belongs to my set of "chosen lucky nibble". With 50%, suppose that my deterministic lucky nibbles are: 0, 2, 4, 6, 8, a, c, e. If I had chosen 100% certainty, then the lucky nibbles would be all the 16 possible: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f. The lucky nibbles can be selected "randomly" too.

I don't know if this is too complicated, maybe there is a simpler and easier way of deterministically choosing whether a friend's subfeed is "lucky" given my certainty parameter, but my point is that sympathetic replication overall seems like it could be achieved with this simple $certainty value.
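
A rough sketch of the lucky-nibble idea, just to make it concrete. Everything here is hypothetical: the function names, the fixed nibble ordering (evens first, so that 50% gives the set listed above), and the assumption that the subfeed ID looks like "@<base64 key>.ed25519".

```javascript
// Fixed ordering of the 16 possible nibbles. The first N entries form the
// lucky set. The ordering itself is arbitrary; it only has to stay the same
// across sessions so that the choice is deterministic.
const NIBBLE_ORDER = [0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15]

function luckyNibbles(certainty) {
  const wanted = Math.round((16 * certainty) / 100)
  const count = Math.max(0, Math.min(16, wanted))
  return new Set(NIBBLE_ORDER.slice(0, count))
}

// ASSUMPTION: the ID has the classic "@<base64 key>.ed25519" shape.
function firstNibble(feedId) {
  const base64Key = feedId.slice(1, feedId.indexOf('.'))
  return Buffer.from(base64Key, 'base64')[0] >> 4 // high 4 bits of byte 0
}

function isLucky(feedId, certainty) {
  return luckyNibbles(certainty).has(firstNibble(feedId))
}
```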

Thoughts @arj03 ?

Correct group replication test

const msgAtC = await p(alice.db.get)(bobHi.key).catch(t.error)
t.equals(msgAtC.content.text, 'hi', "carol has replicated bob's group msg")

When I change this to carol.db.get, which seems to be intended, I get an error:

✖ 20) Error: Msg %BYhQdJTDN52qRn8Rh1RJYa5icwgMnWmjPQxxjalxZns=.sha256 not found in leveldb index
      Expected error to be falsy
      At: async Test.<anonymous> (/test/integration/groups.js:607:18)

Big plan for partial replication

(Poor title for this issue, and I'm not sure which repo to put this in, but this repo seems to be the highest level)

Context

Problems

Consider what happens to messages in the log (avoid duplicates, avoid ssb-ebt replication bugs and incorrect state) when:

  1. Switching from partial replication to full replication
    • Due to hops change: e.g. when a peer used to be hops 2 for you (and you had configured indexed feed replication for hops 2) but they change to hops 1 (which is configured to replicate the main feed) because you followed them
    • Due to subset replication: e.g. when you fetched and appended a metafeed/announce msg (from the main feed) via ssb-subset-rpc but after that replicated the full main feed for that peer
  2. Switching from full replication to partial replication
    • Due to upgrading to metafeeds: e.g. when a peer adopts a new version of the JS stack that has metafeeds, your peer will detect that and switch replication from full to partial (e.g. using indexed feeds)
  3. Indexed feeds plus full replication
    • If a pub wants to fully replicate the main feed AND replicate indexed-feed msgs, then ssb-ebt needs to be changed so that indexed replication DOESN'T do addOOO.

cc @arj03 FYI

Tasks

  • ssb-ebt: Add benchmarks for indexed feed replication ssbc/ssb-ebt#73
  • ssb-db2: addOOO should not update Base index ssbc/ssb-db2#395
  • ssb-db2: Update deleteFeed to support ooo feeds in a world where addOOO doesn't update Base index ssbc/ssb-db2#417
  • ssb-replication-scheduler: delete the main feed when switching from full replication to metafeeds and indexed feeds replication
  • ssb-replication-scheduler: delete the partially replicated feeds when switching to full replication
  • ssb-replication-scheduler: tests for all the use cases mentioned in "Problems" above
  • (For perf) ssb-db2: New API to synchronously get the state (base.getLatest() is not that, it does a leveldb read)
  • (For perf) ssb-ebt: Update indexed.js to refuse payloads in case state exists for that feed, and this must improve benchmarks
  • (For bandwidth perf) ssb-ebt: specify ranges of sequences ssbc/ssb-ebt#80
  • (For private group removal) ssb-ebt: specify range ssbc/ssb-ebt#79

(groups): Don't forward Alice's metafeed tree to Bob if Alice blocks Bob

Say I follow both Alice and Bob, but Alice blocks Bob.

It's currently a feature in ssb-replication-scheduler that we won't forward Alice's (main) feed to Bob, and it's implemented here:

ssb.ebt.block(source, dest, value === -1)

But that's only for main feeds. Bob is still capable of replicating Alice's metafeed tree.

We should support preventing Bob from getting Alice's metafeed tree (any of Alice's feeds).

But there is an exception to this: Bob can always get Alice's group leaf feed (and the whole branch above that) if both Alice and Bob are in the same group.
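
The proposed rule could be expressed as a small predicate, sketched below. This is not existing ssb-replication-scheduler code: mayForwardFeed and its three flags are hypothetical names, and isGroupFeed is meant to cover both the group leaf feed and the branch above it.

```javascript
// Hypothetical decision helper for the behaviour proposed in this issue.
// authorBlocksRecipient: the feed's author (Alice) blocks the recipient (Bob)
// isGroupFeed: the feed is a group leaf feed or part of the branch above it
// shareGroup: author and recipient are members of the same group
function mayForwardFeed({ authorBlocksRecipient, isGroupFeed, shareGroup }) {
  if (!authorBlocksRecipient) return true
  // Exception: group feeds still flow between members of the same group.
  return Boolean(isGroupFeed && shareGroup)
}
```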

Also mention ssb-subset-rpc as a maybe required dep

I was having trouble with (partial?) replication, and things started to work once I installed ssb-subset-rpc, but that's not mentioned in the readme here.

my setup before the fix

module.exports = function startSbot() {
  const stack = SecretStack({ caps: { shs } })
    .use(require("ssb-db2/core"))
    .use(require("ssb-classic"))
    .use(require("ssb-bendy-butt"))
    .use(require("ssb-meta-feeds"))
    .use(require("ssb-box2"))
    .use(require("ssb-db2/compat/feedstate"))
    .use(require("ssb-db2/compat/ebt"))
    .use(require("ssb-db2/compat/db")) // for legacy replicate
    .use(require("ssb-db2/compat/history-stream")) // for legacy replicate
    .use(require("ssb-friends"))
    .use(require("ssb-ebt"))
    .use(require("ssb-tribes2"))
    .use(require("ssb-lan"))
    //.use(require("ssb-subset-rpc")) // this (uncommented) is all i added to fix it
    .use(require("ssb-replication-scheduler"));

  const sbot = stack({
    path: dir,
    keys,
    ebt: {
      // logging: true,
    },
    db2: {
      flushDebounce: 10,
      writeTimeout: 10,
    },
    tribes2: {
      // timeoutLow: opts.timeoutLow,
      // timeoutHigh: opts.timeoutHigh,
    },
    friends: {
      hops: 1,
    },
    replicationScheduler: {
      //debouncePeriod: 1,
      partialReplication: {
        0: [{}],
        1: [{ purpose: "main" }, { purpose: "group/additions" }],
        group: [{ purpose: "$groupSecret" }],
      },
    },
  });

  sbot.name = "demo";
  sbot.ebt.registerFormat(bendyButtFormat);

  return sbot;
};

Debounce friend stream

When adding a lot of messages for a feed, right now you might get a follow and then an unfollow or block right after. It would be good if these were debounced a bit so that ebt doesn't start requesting these messages.
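
One possible shape for this, sketched under assumptions: collect the updates that arrive within a short window and keep only the last value per feed before acting on them. The function name coalesceUpdates and the [feedId, hops] pair format are hypothetical, and the timer that would decide when a window ends is left out.

```javascript
// Hypothetical sketch: `updates` is the batch of [feedId, hops] pairs that
// arrived during one debounce window, in arrival order.
// Only the most recent value per feed survives, so a follow that is
// immediately followed by a block results in just the block.
function coalesceUpdates(updates) {
  const latest = new Map()
  for (const [feedId, hops] of updates) {
    latest.set(feedId, hops)
  }
  return latest
}
```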
