cabal-club / cabal-core
Core database and replication for cabal.
License: GNU Affero General Public License v3.0
currently, any peer can change the channel topic for all other peers, even peers who have hidden them. ideally, a user i don't want to communicate with shouldn't be able to set the channel topic in my view.
I think it would make sense to pass the "sparse: true" flag to hyperdb for normal clients. If a cabal dat has a lot of history, that might take a long time to download.
For seeders, it probably makes sense to have "sparse: false" so they archive the complete history of a chat.
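As a sketch of how the two modes could be selected (the exact plumbing through cabal-core is an assumption, but hyperdb does accept a "sparse" option):

```javascript
// Hypothetical helper: build hyperdb options depending on whether this
// node is a plain client or an archiving seeder.
function storageOptions (isSeeder) {
  return {
    sparse: !isSeeder,     // clients fetch history blocks on demand
    valueEncoding: 'json'  // cabal messages are JSON
  }
}
```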
Hyperswarm just landed the ability to export/import bootstrapping nodes via
in the near-ish future, it would be cool if we could persist & load discovered nodes. that way, if the default set of bootstrap nodes doesn't work for some reason, people could either cycle through previously discovered nodes, or set up their own nodes and add them to cabal in some kind of config or similar
from irc:
14:07:44 < cblgh> password protected cabals would be interesting
14:07:56 < cblgh> it could allow for having a known cabal address, that you can connect to from anywhere
14:08:11 < cblgh> (e.g. a dns-pinned address like cabal.chat)
14:09:04 < cblgh> it would allow for doing npx cabal cabal://cabal.chat from anywhere
14:09:21 < cblgh> but only allowing allowed people to read its contents / post to it
17:55:17 <@noffle> actually that kinda sounds equivalent to an encrypted cabal /w a blind peer, right? blind peers get the sync-only key, and members get the decryption-key as well
17:55:20 <@noffle> which is kinda like a password
17:55:32 <@noffle> but maybe you want something short enough to remember, is your idea?
18:16:41 < cblgh> noffle: yeah basically
18:16:59 < cblgh> something that you could use while at a friends computer and you need to send a message or check something
18:17:14 <@noffle> yeah!
We could skip running indexers for folx running cabal headlessly as a pure backup/sync node. That would make sync faster (no CPU contention between syncing and indexing) and lower the draw on CPU and I/O.
It would be really nice if both peers, before replicating, could exchange metadata about themselves and decide whether to sync. I wrote something for doing just this for Mapeo called handshake-stream. For now, the payload could just be a cabal protocol version string.
This would let clients potentially show meaningful sync errors, and also avoid showing peers by their pubkey on the sidebar when we aren't even replicating with them.
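The compatibility check itself could be tiny. A sketch, assuming the handshake payload carries a semver-style protocol version string (the version format is an assumption):

```javascript
// Decide whether to replicate with a peer based on exchanged metadata.
// Here we only require matching major protocol versions.
function shouldReplicate (ourVersion, theirVersion) {
  return ourVersion.split('.')[0] === theirVersion.split('.')[0]
}
```

A client could surface a false result as a "peer is on an incompatible version" message instead of silently listing the peer's pubkey in the sidebar.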
Please feel free to disregard if you don't want to go ahead with this, but this is a small tweak that I'd like to propose that I think will improve the reliability of cabal especially for mobile, where clocks are less reliable.
Messages in cabal are currently ordered by 'real' timestamp. This poses an issue for the common case where 'real' clocks are not reliable: if a device goes offline for 24+ hours or a few days, it can suffer from clock drift, or sometimes just reset randomly to some date in January 2017 or the like.
An improvement is to pair the 'real' clock time with a counter derived from the counters you have seen on messages in your log, so ordering no longer depends on wall clocks alone. The general form of this is a vector clock; a simpler variant is a Lamport clock. Just a silly old European way of taking a simple idea and putting someone's name on it, but it's pretty straightforward.
This is a quick watch and a really useful one if you have time: https://www.dotconferences.com/2019/12/james-long-crdts-for-mortals
The benefits for Cabal would be pretty unnoticed for most people especially us, since we are all usually connected to the Internet. But it might also help for messages that come 'back in time' as it would be more clear which exact message before (based on the number), that the person was responding to or messaging after.
Curious what people think about this change?
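For concreteness, a minimal Lamport-style counter looks like this (a simplification of the hybrid logical clocks covered in the talk above):

```javascript
// Lamport clock: a counter that is always ahead of every counter
// observed on incoming messages, giving a causal ordering that does
// not depend on wall-clock accuracy.
class LamportClock {
  constructor () { this.counter = 0 }

  // Call when sending a message; stamp the message with the result.
  send () { return ++this.counter }

  // Call when receiving a message stamped with `remote`.
  receive (remote) {
    this.counter = Math.max(this.counter, remote) + 1
    return this.counter
  }
}
```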
while working on cabal-club/cabal-cli#179, adding an /ids command to toggle showing a public-key suffix for all nicknames in chat, i noticed that i wanted to call a function the first time a new peer is added.
looking around in core it seems we don't have any such event; i think it would be a good thing to emit
Right now, if you post a message to a channel, everyone on the cabal swarm receives the plaintext of that message, whether or not they are in that channel. I think that is a bad default in terms of privacy and is not what somebody unfamiliar with cabal might expect. The privacy aspects are compounded by the historical append-only nature of hypercore data: all somebody needs to do to eavesdrop on an entire cabal is to connect once, download all the history, and disconnect, without ever showing up as having even joined a channel.
At the very least, each channel could be encrypted with a randomly-generated box key. A client could publish a type: 'chat/join' message, and a user in one of those channels (selected deterministically on some schedule if no key has been sent in time) could send the box key for that channel, encrypted to the client's public key. For invite-only channels, these type: 'chat/join' messages could go into a queue to be manually verified, or else the channel box keys could be sent encrypted directly. Private 1:1 conversations could use client public keys and wouldn't need the extra step of having a channel key.
Another way to do this could be to generate a unique random key per message and send the message decryption key to each user in the channel in an attachment on each message. I think deltachat might do something like that using autocrypt for group chat, but I'm not sure.
I did a bit of research on how search could be implemented in cabal.
The two general approaches are creating an index or applying the (possibly preprocessed) pattern to all messages.
With an index
When creating an index, sparse suffix trees would probably be the best option. They are rather hard to implement, and index-based methods always carry some overhead, but searching would be much faster.
Without an index
Without an index, the simplest option is naive search. There are a lot of ways to improve on this and speed it up considerably. One of these is the Boyer-Moore algorithm, which is used by grep for example.
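To make the index-free option concrete, here is the Boyer-Moore-Horspool simplification of Boyer-Moore, which keeps only the bad-character shift table:

```javascript
// Boyer-Moore-Horspool: on a mismatch, shift the pattern based on the
// last character of the current text window instead of moving by one.
function horspool (text, pattern) {
  const m = pattern.length
  if (m === 0) return 0
  // How far we may shift when a given character is the window's last char.
  const shift = new Map()
  for (let i = 0; i < m - 1; i++) shift.set(pattern[i], m - 1 - i)
  let pos = 0
  while (pos + m <= text.length) {
    let j = m - 1
    while (j >= 0 && text[pos + j] === pattern[j]) j--
    if (j < 0) return pos // full match at pos
    const last = text[pos + m - 1]
    pos += shift.has(last) ? shift.get(last) : m
  }
  return -1
}
```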
It would be nice to have a way to publish some information about your current status. This way you can announce that you're "afk", "on vacation, be back soon", or "Tryin to make a change :-/". Slack has something similar: https://slack.com/help/articles/201864558-Set-your-Slack-status-and-availability
One way it could work:

/setstatus [status] - push a new status out to the swarm
/status [user[.id]] - get a user's status

{
  timestamp: number,
  type: 'chat/status',
  content: {
    user: [user id],
    status: [the new status]
  }
}
ephemeral messages are messages whose content is only available for a poster-defined time interval. implementing this correctly entails encrypting the contents somehow.
they can be used for sending sensitive information (transmitting files or sharing email addresses) or for scheduling when to grab some pineapple pizza later that same day (i.e. time-limited information)
an approach similar to substack's private channel's proposal (#34) could be taken.
an ephemeral message is posted containing an arbitrary payload (text, file, emote; basically any other message type) and a future timestamp that details the message lifespan. the lifespan is regarded as depleted if a client's local time exceeds the ephemeral message's lifespan. the message payload is symmetrically encrypted and the symmetric key is transmitted to peers in some fashion:
once we have sparse log capabilities in cabal, this could potentially be used in combination with the ephemeral message payload. (i still think it is valuable to signify that there has been an ephemeral message; it might also be valuable to allow for the option to erase all trace.)
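the lifespan check itself is simple. a sketch, assuming the message carries an absolute expiry timestamp as described above (field names are placeholders):

```javascript
// An ephemeral message is considered depleted once the local clock
// passes its expiry timestamp.
function isDepleted (msg, now = Date.now()) {
  return now > msg.content.expiresAt
}
```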
Uncaught Error: /home/fabian/Development/Open/cabal-desktop/node_modules/noise-protocol/node_modules/sodium-native/build/Release/sodium.node: undefined symbol: _ZN2v816FunctionTemplate3NewEPNS_7IsolateEPFvRKNS_20FunctionCallbackInfoINS_5ValueEEEENS_5LocalIS4_EENSA_INS_9SignatureEEEiNS_19ConstructorBehaviorENS_14SideEffectTypeEPKNS_9CFunctionE
at process.func [as dlopen] (electron/js2c/asar.js:140)
at Object.Module._extensions..node (internal/modules/cjs/loader.js:1016)
at Object.func [as .node] (electron/js2c/asar.js:140)
at Module.load (internal/modules/cjs/loader.js:816)
at Module._load (internal/modules/cjs/loader.js:728)
at Function.Module._load (electron/js2c/asar.js:748)
at Module.require (internal/modules/cjs/loader.js:853)
at require (internal/modules/cjs/helpers.js:74)
at load (/home/fabian/Development/Open/cabal-desktop/node_modules/node-gyp-build/index.js:21)
at Object.<anonymous> (/home/fabian/Development/Open/cabal-desktop/node_modules/noise-protocol/node_modules/sodium-native/index.js:1)
Cabal already works great in no-internet situations and high-bandwidth situations, but what about metered bandwidth and high latency situations? Some places charge a lot for internet access: what if we had a replication mode that was optimized for these situations?
It would be nice if cabal-core exposed some Typescript definition files (*.d.ts) that define the shape of the API. These types make cabal-core usable in other Typescript projects, and provide useful autocomplete information for IDEs like VSCode.
Typescript can generate these files automatically using a combination of type inference and JSDoc. I have a branch that fixes most of the typing errors and includes some .d.ts files generated by running tsc. Ideally, we would have a simple script, npm run generate-types or something, that would regenerate the definition files (and inform the user of any type errors, another bonus!), and we would just run that script before releasing to keep the types up to date.
Sadly, the Cabal class is defined as a function which recognizes when it is being called as a function instead of a constructor and corrects that. However, this is too clever for Typescript to automatically recognize while also recognizing that Cabal extends EventEmitter.
One solution is to declare Cabal as a normal ES6 class instead of a constructor: class Cabal extends EventEmitter {...}. However, since the canonical way to instantiate a new Cabal is currently Cabal(...) instead of new Cabal(...), this would be a breaking API change, which is a lot to ask just to add some typing support.
This issue is vague: I'm still trying to understand what's happening.
Since the early days of cabal I've noticed this general pattern where, as a cabal gets older & bigger on my machine, it seems to discover fewer peers, and hold a connection open with those peers more briefly. Eventually, it seems, I'm not able to really sync at all.
However, when I run cabal --temp $ADDRESS
I notice that, generally, I discover more peers, and those connections tend to stay open longer.
multifeed supports an encryption key via opts.key. We just need to add the plumbing so cabal-core offers a way for clients to pass it in.
see the JSDoc implementation for cabal-client
initial issue: cabal-club/cabal-client#18 and pull request: cabal-club/cabal-client#19
thank you again @fenwick67!
Should we be using a hash of the cabal's key as the discovery key, rather than the key itself? I could be wrong but it sounds like discovery keys can be sniffed on discovery networks, so most chats would be effectively public.
https://github.com/cabal-club/cabal-core/blob/master/swarm.js#L6
Dat uses hypercore's discovery key, which is a hash, to keep the actual dat key private, as seen here: https://github.com/mafintosh/hypercore/blob/master/index.js#L57
https://docs.datproject.org/docs/security-faq#is-it-possible-to-discover-read-keys-via-man-in-the-middle
Hey friends @cblgh @nikolaiwarner @ralphtheninja, any objections to me renaming this repo to cabal-core? I think it will be clearer that this is the backend and not the client.
from irc
10:30:17 <@telamohn> Let me make a couple of new code repos, i have an experimental core-store that let's you purge & delete/gc feeds
10:30:49 < cblgh> oh nice, so we could remove feeds that just joined but never posted anything?
10:30:59 <@telamohn> it's not to be confused with andrew-wosh's corestore which is more targeted towards being embedded in data-structures.
10:31:01 < cblgh> that's been kinda annoying tbh
10:31:08 <@telamohn> hmm
10:31:25 <@telamohn> cblgh: ofcourse, you actually don't even need to replicate them
10:32:33 <@telamohn> just implement the describe() and accept() function in cabal-core.
10:32:59 < cblgh> are those from multifeed?
10:33:01 <@telamohn> during describe attach the feed.length to meta.
10:33:17 <@telamohn> and during accept return false if feed.lenght is zero.
10:33:36 <@telamohn> *if meta.length is zero
10:34:01 <@telamohn> cblgh: well once the PR goes through they're gonna be in multifeed :)
10:34:49 <@telamohn> https://github.com/telamon/replic8#middleware-interface <-- i'm talking about this interface.
10:34:53 <@telamohn> sorry posted link a bit late
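pulling the rule out of the log above: during describe() attach the feed length to the metadata, and during accept() reject feeds whose advertised length is zero. a sketch of just that rule (the surrounding middleware plumbing follows the replic8 interface linked above, so only the rule is shown):

```javascript
// Advertise how many messages a feed contains.
function describeMeta (feed) {
  return { length: feed.length }
}

// Don't bother replicating feeds that joined but never posted.
function acceptFeed (meta) {
  return meta.length > 0
}
```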
Something like
core.publish({type:'chat/text', content: 'hello'}, function (err, msg, uploadListener) {
uploadListener.on('uploaded', function (peerKey) {
// update UI
})
})
This'd be ephemeral data, and so wouldn't be persisted to disk (unless we wanted to start doing that). It'd be enough for a short-lived indicator beside a message to show that it's been shared to some peers in the swarm.
@sammacbeth had the following tip:
I found that discovery-swarm-webrtc (with a turn server) works better for difficult networks, and as a bonus it can use ipv6.
if we want to go down this TURN-server approach (since it seems implausible that regular clients behind unconfigured firewalls and random NATs will be holepunchable in a significant share of cases, and cjdns sadly does not magically solve this problem either), maybe the following approach could work:
long-lived cabal peers become TURN servers, and they broadcast their existence on a predefined hyperswarm topic. newly started clients (with internet) that cannot connect (or: have not historically connected) to anyone on a given cabal key within <timeout>
proceed to query the hyperswarm TURN topic to find a TURN-server. since the TURN server does not know the cabal key, the contents that are being ferried through it are not readable by it.
for peers to deterministically end up at the same TURN server, the cabal key is used, somehow.
so, the long-lived super peers (i.e. peers which others can connect to) of one cabal help peers of cabals without super peers by ferrying encrypted traffic for them (i.e. becoming TURN servers, but maybe another technique is possible using the same idea)
peers should be able to turn off the automatic usage of TURN-after-timeout.
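the fallback trigger could be a simple predicate. a sketch (the timeout value and the opt-out flag are assumptions):

```javascript
// Decide whether a client should query the TURN topic: only if TURN is
// enabled, no peer has ever been reached on the cabal's own topic, and
// the timeout has elapsed.
function shouldQueryTurn ({ turnEnabled, everConnected, msSinceStart, timeoutMs = 30000 }) {
  return turnEnabled && !everConnected && msSinceStart >= timeoutMs
}
```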
I think it could be very useful to downstream consumers if this module published a changelog[1]. It'd be nice to be able to get a high-level understanding of what is changing between minor and especially major versions.
When using Cabal behind a firewall/NAT environment, we need a way to specify explicit listening port or port ranges such that they can be allowed through. For example, my setup looks like this:
{ internet } <--> pfSense firewall <--> CentOS 7 firewalld <--> cabal
I must NAT through the firewall and open up a port(s) at firewalld as well.
As instructed in chat, a starting point is simply patching swarm.js:
`hyperswarm({ preferredPort: 49737 })`
This does the trick as a test. However, a configurable set of ports/port ranges would be better to allow for conflicts etc. I would submit a PR, but I just started poking around with this yesterday and don't know the code base yet.
posting @noffle's idea from irc here
00:17:53 < noffle> fyi that a cabal<->irc mirror exists: https://github.com/cabal-club/cabal-irc
00:20:58 < noffle> it'd be cool if cabal chat messages had an optional property that said "hey this is a msg from a bridge" so clinets could render those messages in a more aesthetically pleasing +
readable way
00:21:03 < noffle> very easy to add
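a sketch of what the optional property could look like on a chat message (the field name "via" is an assumption):

```javascript
// A bridged message carries enough metadata for clients to render it
// differently, e.g. "noffle (irc)" instead of a bare nickname.
const bridgedMessage = {
  type: 'chat/text',
  content: {
    channel: 'default',
    text: 'hello from the other side',
    via: { bridge: 'irc', nick: 'noffle' } // hypothetical bridge metadata
  },
  timestamp: Date.now()
}
```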
A null reference error gets thrown from deep inside multifeed after cabal replication has been started with cabal.swarm(). Please see my proof-of-concept with steps to reproduce for more details.
/root/poc/node_modules/multifeed/mux.js:203
feed.ready(function() { // wait for each feed to be ready before replicating.
^
TypeError: Cannot read property 'ready' of undefined
at /root/poc/node_modules/multifeed/mux.js:203:12
at Array.forEach (<anonymous>)
at startFeedReplication (/root/poc/node_modules/multifeed/mux.js:202:11)
at /root/poc/node_modules/multifeed/index.js:238:7
at release (/root/poc/node_modules/mutexify/index.js:23:13)
at /root/poc/node_modules/multifeed/index.js:275:11
at /root/poc/node_modules/multifeed/index.js:307:27
at /root/poc/node_modules/hypercore/index.js:213:15
at apply (/root/poc/node_modules/thunky/index.js:44:12)
at process._tickCallback (internal/process/next_tick.js:63:19)
I say "tentative" because I worry that this skirts a "centralization" cultural / architectural boundary :)
I've noticed in using cabal that (obviously) the reliability of message propagation for small cabals (a few peers) goes up significantly if I set up a 'superpeer' / 'seeder' cabal process on a always-online remote server.
It occurred to me that in the Secure Scuttlebutt world, "pubs" have a sort of special status, for this very reason -- in asynchronous p2p, communication these "relay stations" can really improve message distribution statistics.
If I were using cabal and I saw that a given cabal superpeer "relay" were online, I'd feel very confident that my message would make it through the system.
So, tentative proposal: what if we somehow highlighted the existence and status of cabal "relays" in the "connected peers" list?
This status could be derived from the way the cabal cli is invoked (with "--seed"); or (more advanced) it could somehow be derived from statistics about the peer's online availability.
I love how easy it is to set up a cabal "relay", by the way. And I'm playing with using the word "relay" instead of "superpeer" or "pub", because I think it conveys the role nicely in language that non-technical people can understand (though, maybe there's a better word?).
Perhaps this doesn't deserve its own issue; it might best be combined with this one -- or perhaps this one ...
Cheers!
It would be useful sometimes to have a one-on-one conversation with another user. Would it be possible to create a concept of a direct message?
Perhaps it's just a special channel for the two participants. I'm guessing everyone connected to a cabal can read every message, so it may not be private, but it still might be a useful feature.
Would it be possible to encrypt a message before it's sent so that only a single recipient may read it? The rest of the cabal would be there to help get the message to its destination but couldn't read the contents.
Without this, as new data comes in, the main node process has to share time between indexing those new messages and adding new messages to local hypercores. kappa-core has a pause/resume API now, so it's just a matter of either
now that we're using hyperswarm we should ideally try to clean up properly.
it looks like we should call the (undocumented) hyperswarm.destroy(cb) function: hyperswarm/swarm.js#L294. iirc maf said it was important to wait until the passed-in callback has fired, so as to not muddy up the dht.
i think it would also be good for cabal-client/cabal-desktop to properly kill all of their processes.
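a sketch of the shutdown step, assuming hyperswarm's destroy(cb) as noted above: wait for the callback before letting the process exit.

```javascript
// Tear down the swarm cleanly; exiting before the callback fires can
// leave stale entries in the DHT.
function shutdown (swarm, done) {
  swarm.destroy(function (err) {
    done(err || null)
  })
}
```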
Line 62 in 779a500: could this check for uri.hostname as a fallback as well?
Related to password-protected cabals, it should be possible to create a cabal that has:
Such a cabal must have at least one admin to be protected in this way. It is important to protect entry, since obscurity is not privacy, and important to protect reads in case the multifeed falls into hostile hands.
For example, consider this hypothetical scenario:
{
"type": "chat/encrypted",
"content": "..."
}
The value of the content
field can then be decrypted into a normal message:
{
  type: 'chat/text',
  content: {
    channel: 'default',
    text: 'hello friendo'
  }
}
In this way the encryption key is never written to the log itself.
The end result is a private, encrypted cabal using public hyperswarm infrastructure.
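A sketch of the unwrap step (the decrypt primitive is a stand-in for whatever symmetric scheme is chosen; the field layout follows the hypothetical messages above):

```javascript
// Turn a chat/encrypted envelope back into a normal message, given a
// decrypt function driven by the cabal's shared key.
function unwrap (envelope, decrypt) {
  if (envelope.type !== 'chat/encrypted') return envelope
  return JSON.parse(decrypt(envelope.content))
}
```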
originally suggested by @substack: it seems we have a few connectivity issues for sessions that stay open for a long time.
by adding a timeout to the replication stream, we can likely solve the problems we've been seeing
hypercore@8
hypercore-protocol
methods
What do y'all think about adopting hyperdiscovery for the swarm rather than using the raw discovery-swarm commands?
This'd be a breaking change since it'd use a different discovery key and would use the raw hypercore protocol for handshaking rather than the default discovery-swarm handshake.
So, breaking changes are scary, but this could be very beneficial: It'd make it easy to run on the web using discovery-swarm-web (getting that running has been a huge PITA). It'd also be less maintenance since we could reuse code with the rest of the dat ecosystem. Finally, once we get hyperswarm integrated into hyperdiscovery, it should be a minor version bump to get all its benefits into Cabal.
Would a PR introducing this breaking change be welcome?
Now that moderation (#45) is in, we can start removing messages from banned users.
Question: should all message history for a banned user be purged, or just all messages since the ban?
cc @cblgh @Karissa @substack @cinnamon-bun @nikolaiwarner @telamon
Yes! Excited that #45 is happening! Some questions and thoughts about next steps:
- What happens if the key param on the cabal-core constructor is a different modkey than on the last startup?
- ban/add makes it desirable to not show chat messages from that user anymore. However, their messages still reside in the chat history view. There are multiple ways to do this, like a) rebuilding all views on moderation commands (no thanks), b) having the moderation view actually edit other existing views (like purging existing entries from the banned user), or c) having the APIs on the other views do filtering on their get API based on moderation view state. I'm leaning toward (b) right now.

from bashrc in the public cabal:
I was wondering if this is vulnerable to the same failure modes as irc. In irc, without any logins to a central server, an adversary can create a million accounts and have them all pump out garbage as a denial of service. Some of that was documented in the Snowden leaks.
i answered:
not really, flooding is hard to disambiguate for p2p distributed systems
what's the diff between someone flooding and someone coming online after a lengthy (but message-producing) internet absence
the resistance would be moderation actions as applied to flooders
other approaches could also temporarily restrict connections to only known ids (to prevent e.g. raids)
we need to be able to close an opened cabal
this would close the underlying kappa-core and stop replication
Breakdown:
- multifeed#close (using hypercore#close) kappa-db/multifeed#10
- a way to stop a multifeed-index that's running
- kappa-core#close, which 1) stops all indexes, and then 2) stops the multifeed
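a sketch of the close sequence (all names here are placeholders standing in for the breakdown's steps, not current cabal-core API):

```javascript
// Stop indexing first, then close the underlying feeds, then leave
// the swarm; each step waits for the previous one to finish.
function closeCabal ({ stopIndexes, closeFeeds, leaveSwarm }, cb) {
  stopIndexes(function (err) {
    if (err) return cb(err)
    closeFeeds(function (err) {
      if (err) return cb(err)
      leaveSwarm(cb)
    })
  })
}
```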
Just discovered Cabal, and I have to say it looks amazing all around!
My only minor suggestion is that I want to star something to keep it in mind and follow its progress, but the lack of a central cabal-club/cabal
repo for the project as a whole makes it difficult to infer what I should be following.
If this were SEO, it's like your project has multiple domains cabal-core
, cabal-cli
, etc as opposed to a single canonical domain.
Just think it could help cabal grow and reach more people.
On the technical side, I have nothing to contribute haha -- Keep up the great work pushing the boundaries of decentralized chat!
Message schemes are really hard to change as the protocol gets older, so if there are any inclinations of interoperability I'd heavily suggest adopting some shared semantics.
{
"type": "chat/text",
"content": {
"channel": "default",
"text": "hello *world*"
},
"timestamp": 1576152732000
}
{
"@context": "https://www.w3.org/ns/activitystreams",
"type": "Note",
"to": "cabal:channel:default",
"mediaType": "text/markdown",
"content": "hello *world*",
"published": "2019-12-12T12:12:12Z"
}