earthstar-project / earthstar

Storage for private, distributed, offline-first applications.

Home Page: https://earthstar-project.org

License: GNU Lesser General Public License v3.0

TypeScript 98.81% JavaScript 1.01% Makefile 0.17%
browser databases deno distributed earthstar local-first offline-first p2p p4p sync

earthstar's People

Contributors

achou11, anactualemerald, christianbundy, cinnamon-bun, dependabot[bot], johanbove, sgwilym, tlowrimore, zachmandeville


earthstar's Issues

Fetch Interface based on URLs/ HTTP Methods

What's the problem you want solved?

I'd like to integrate protocol handlers for Earthstar in Electron apps or other browser-like apps, specifically in my p2p web browser project.

Ideally it should provide read and write access to documents, with key management handled by the protocol handler.

We talked about this a bit on SSB here: %MbsFkcmbZU/pnx5JS5PpcVDtueMM0xslrzAXpqXfDwE=.sha256

Is there a solution you'd like to recommend?

I'd like to propose something that looks like the following pseudocode:

GET earthstar://:workspace/path/to/document => JSON blob or whatever

GET earthstar://:workspace/path/prefix?author=example => Array of docs, or maybe a multipart response if it's raw files?

// If this is the first time you're using `name` as an identity
// The browser can prompt the user to initialize it
// If `name` exists but hasn't been used here before, ask permission from user
PUT earthstar://:name@:workspace/path/to/document {content in body}

// You can use whatever strings for methods, so why not invent SYNC? :P
SYNC earthstar://workspace/ {list of peers or pubs in body}

SYNC earthstar://workspace/path/to/document {list of peers or pubs in body}

I think that this could also be adapted into an HTTP API for some sort of daemon, where earthstar:// could be replaced with http://somegateway.com/someapi/path/ to point at an HTTP server running either locally or remotely.

I'm interested in making a JS library called earthstar-fetch which will encapsulate this functionality in a similar way to dat-fetch, and play around with it before drafting a full standard. A side effect would be that we'd get a fetch-like interface in regular web browsers alongside protocol handlers in browsers like Agregore.
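To make the shape of this more concrete, here is a rough sketch of how a fetch-like wrapper might parse these earthstar:// URLs before handing them to a Storage. Every name in it is hypothetical; nothing like this exists in the library yet.

// Hypothetical sketch of the earthstar-fetch idea; none of these names are real Earthstar APIs.
type EarthstarRequest = {
    method: 'GET' | 'PUT' | 'SYNC',
    workspace: string,   // e.g. "+gardening.xxxx"
    author?: string,     // taken from the userinfo part of the URL, if any
    path: string,        // e.g. "/path/to/document"
    body?: string,
};

function parseEarthstarUrl(method: EarthstarRequest['method'], url: string, body?: string): EarthstarRequest {
    let withoutScheme = url.replace(/^earthstar:\/\//, '');
    let [authority, ...pathParts] = withoutScheme.split('/');
    let author: string | undefined;
    let workspace = authority;
    if (authority.includes('@')) {
        [author, workspace] = authority.split('@');
    }
    return { method, workspace, author, path: '/' + pathParts.join('/'), body };
}

// parseEarthstarUrl('GET', 'earthstar://+gardening.xxxx/path/to/document')
//   => { method: 'GET', workspace: '+gardening.xxxx', path: '/path/to/document', ... }

A protocol handler or earthstar-fetch could then dispatch on the method: GET maps to a Storage read, PUT to a write (after the key prompt described above), and SYNC to kicking off a sync with the listed peers.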

More comprehensive query options

What's the problem you want solved?

The query object specification only has limited ways to match against document fields.

For example, we have lowPath which does <= and highPath which does >. What if we want to do < or >=?

We need to add timestamp range queries too. Do we need a combinatorial explosion of query parameters?

Generalized query options for matching document fields

A more systematic way to do this is to combine a field and an operation:

let query = {
    timestamp_lte: 15000000000,  // timestamp less than or equal to
    path_prefix: "/wiki/",  // path has prefix
    author_in: ["@suzy.bxxxx", "@matt.bxxxxx"],  // author in list
    content_neq: "",   // content !== ""
}

Operations:

< lt
<= lte
> gt
>= gte
== === eq     // this is the default if no operation is specified
!= !== <> neq
prefix or startsWith
suffix or endsWith
in

Symbols or letters? Spaces or underscores?

{
    // which style is better?
    timestamp_lte: 15000000,
    "timestamp <=", 15000000,
}

This would make for quite a large Query type. Luckily the code doing the query could handle this in a generalized way by splitting the properties at _ instead of hardcoding each combination.

type Query = {
    path?: string,
    path_lt?: string,
    path_lte?: string,
    path_gt?: string,
    path_gte?: string,
    path_eq?: string,
    path_neq?: string,
    // etc
}

let doQuery = (query: Query) => {
    for (let [property, value] of Object.entries(query)) {
        let [fieldToQuery, operation] = property.split('_');
        // etc
    }
}
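A hedged sketch of how that generalized matcher could dispatch on the parsed operation. The Doc shape here is simplified, and the two-part split would need to get smarter for the metadata_* keys discussed below:

// Sketch only: a simplified Doc and a generic field/operation matcher.
// (Non-field options like limit would be handled separately, see below.)
type Doc = { path: string, author: string, timestamp: number, content: string };
type Op = 'lt' | 'lte' | 'gt' | 'gte' | 'eq' | 'neq' | 'prefix' | 'suffix' | 'in';

let matchesProperty = (doc: Doc, property: string, value: any): boolean => {
    let [field, op = 'eq'] = property.split('_') as [keyof Doc, Op | undefined];
    let docValue = doc[field];
    switch (op) {
        case 'lt': return docValue < value;
        case 'lte': return docValue <= value;
        case 'gt': return docValue > value;
        case 'gte': return docValue >= value;
        case 'neq': return docValue !== value;
        case 'prefix': return String(docValue).startsWith(value);
        case 'suffix': return String(docValue).endsWith(value);
        case 'in': return (value as any[]).includes(docValue);
        default: return docValue === value;  // 'eq', the default operation
    }
};

let doQuery = (docs: Doc[], query: Record<string, any>): Doc[] =>
    docs.filter(doc =>
        Object.entries(query).every(([property, value]) => matchesProperty(doc, property, value)));

// doQuery(docs, { path_prefix: '/wiki/', timestamp_lte: 15000000000 })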

Other query options stay the same

Some query options are not about matching specific fields. These would continue to work in the old way, without an operation in the property name:

limit
includeHistory  (pending issue #44 )
now

Queryable metadata

If we implement #9 we would also need to query the metadata keys and values. Maybe like this?

The metadata we want to search for:

let doc = {
    path: "/posts/blah/blah",
    author: "@suzy.bxxxxxxx",
    ...etc...,
    metadata: {
        category: "gardening",
        createdTimestamp: 1500000077,
    }
}

The query:

let query = {
    metadata_category: "gardening",
    metadata_createdTimestamp_gt: 1500000000,
}

Fail loudly when storage.set() document is supplied with wrong kind of timestamp

What's the problem you want solved?

Timestamps should be in microseconds (Date.now() * 1000).

The Validator will reject timestamps that are small enough that they were probably accidentally made in milliseconds (Date.now()).

However, when calling storage.set(doc), the doc's timestamp is bumped forward to be later than the existing docs in that path. This can mask a bad timestamp because it happens before the timestamp validity check.

Is there a solution you'd like to recommend?

In storage.set(), check the doc timestamp with the Validator's timestamp check function and fail early if it's a bad timestamp.
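A minimal sketch of that early check, under the assumption that timestamps smaller than some cutoff must have been made in milliseconds; the cutoff constant below is illustrative, not Earthstar's real one:

// Sketch: a guard that storage.set() could run before bumping the timestamp
// forward past existing docs. Any value this small must have come from
// Date.now() (milliseconds) instead of Date.now() * 1000 (microseconds).
const MIN_PLAUSIBLE_MICROSECONDS = 10_000_000_000_000;  // illustrative cutoff, ~April 1970 in microseconds

type ValidationError = { err: string };

let checkTimestampIsMicroseconds = (timestamp: number): ValidationError | null => {
    if (!Number.isInteger(timestamp) || timestamp < MIN_PLAUSIBLE_MICROSECONDS) {
        return { err: `timestamp ${timestamp} looks like milliseconds; use Date.now() * 1000` };
    }
    return null;
};

// checkTimestampIsMicroseconds(Date.now())        => { err: '...' }  (fails loudly)
// checkTimestampIsMicroseconds(Date.now() * 1000) => null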

Add helper functions for deterministic random colors by author address

What's the problem you want solved?

In a user interface, two authors could look identical if we only show their shortnames and display names and not their pubkeys.

Is there a solution you'd like to recommend?

Give each author a random color that's deterministically derived from their author address.

The function should also accept a salt parameter which could be made different on each device. That way an imposter won't be able to generate a pubkey with a similar looking color because they won't know your salt.

Here's a way I did this in the past. It will need to be adapted to base32.

import md5 from 'md5';  // assumes the "md5" package from npm

export let detRandom = (s: string): number => {
    // return a random-ish float between 0 and 1, deterministically derived from a hash of the string
    let m = md5(s) as string;
    return parseInt(m.slice(0, 16), 16) / parseInt('ffffffffffffffff', 16);
};
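A possible follow-up helper, with a hypothetical name, that folds in the per-device salt and maps the result onto a hue:

// Hypothetical helper built on detRandom above; not part of Earthstar today.
// The salt is device-local, so an imposter can't precompute a similar-looking color.
export let detColorForAuthor = (authorAddress: string, salt: string): string => {
    let hue = Math.floor(detRandom(authorAddress + salt) * 360);
    return `hsl(${hue}, 60%, 50%)`;  // fixed saturation and lightness keep colors readable
};

// detColorForAuthor('@suzy.bxxxx', 'my-device-salt')  =>  e.g. 'hsl(217, 60%, 50%)'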

Invite code format: combining workspace address, key, and pubs

What's the problem you want solved?

To invite someone to a workspace, you have to tell them the workspace address, the key, and the addresses of some pubs.

That's too many things to copy-paste.

Is there a solution you'd like to recommend?

How can we combine all those things into a single string?

+gardening.xxxx?privatekey=yyyyyy&pubs=https://mypub.com,https://pub2.com

+gardening.xxxx.yyyyy|https://mypub.com|https://pub2.com

{"workspace":"+gardening.xxxx","privatekey":"yyyyyy","pubs":["https://mypub.com","https://pub2.com"]}

earthstar://gardening.xxx?...

???
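A hedged sketch of serializing and parsing the first, query-string style option; the field names follow the example above but nothing here is a settled format:

// Sketch of the query-string style invite code; not a settled format.
type Invite = { workspace: string, privatekey?: string, pubs: string[] };

let makeInviteCode = (invite: Invite): string => {
    let params = new URLSearchParams();
    if (invite.privatekey) { params.set('privatekey', invite.privatekey); }
    if (invite.pubs.length > 0) { params.set('pubs', invite.pubs.join(',')); }
    return `${invite.workspace}?${params.toString()}`;
};

let parseInviteCode = (code: string): Invite => {
    let [workspace, query = ''] = code.split('?');
    let params = new URLSearchParams(query);
    return {
        workspace,
        privatekey: params.get('privatekey') ?? undefined,
        pubs: (params.get('pubs') ?? '').split(',').filter(p => p !== ''),
    };
};

// Note that URLSearchParams percent-encodes the pub URLs, so the result is
// uglier than the hand-written example above but survives URL parsers.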

detChoice problems in browser

What's the problem you want solved?

This usage of detChoice:

detChoice(author.address, ["alpha", "beta", "gamma"]);

Results in the following error:

[screenshot of the error]

Am I using it wrong somehow?

And: is this function meant to be used in the browser? Calling it in a client gives me warning messages about how resource-hungry the current tab is.

Comparison to Kappa-db?

What's the problem you want solved?

Hi, I see in the readme you compare to DAT, but it might be best to split out the comparison a bit to be more accurate, since the ecosystem is quite large.

Is there a solution you'd like to recommend?

Hyperspace will be the new RPC module for creating applications that are compatible with the dat ecosystem (https://github.com/hyperspace-org). It has the same concerns you note with Hypercore: multi-writer is not possible out of the box, and it is a bit more complex to achieve.

Kappa-db (github.com/kappa-db/) is quite close to Earthstar, but it is less 'batteries-included' and more for customizing database behaviors. I really like the approach earthstar has taken to make these patterns more accessible to the common dev!

Thanks ~K

Real-time sync / change feed

Make a live sync mode where syncing lasts forever and changes are streamed as they occur.

Augment the existing sync by hooking up the onChange Store event.

Safely discovering which workspaces you have in common with another peer (without disclosing the others)

What's the problem you want solved?

Workspace addresses are supposed to be kept secret.

How can peers discover which workspaces they both have (so they can sync them), without disclosing the workspaces they don't have in common?

Example:
Peer1 has W1, W2
Peer2 has W2, W3.

They should discover they both have W2. Peer1 should not learn about W3. Peer2 should not learn about W1.

Is there a solution you'd like to recommend?

Share the hashes of the workspace addresses?

  • Peer1 and Peer2 each generate a random nonce.
  • Each peer hashes their workspaces with the nonces and shares them with each other: sha256(workspaceAddress + nonce1 + nonce2)

The hashes they have in common correspond to the workspaces they both have.

The hashes that are unique to one peer will reveal no information to the other peer.

A MitM won't learn the workspace addresses even if they know both of the nonces.

HTTP example

Alice: hey, here's a nonce, give me your workspace hashes.
--> GET /workspaceHashes?nonceA=foo

                Bob: ok, I made my own nonce too, here's the result
                <-- {
                  nonceA: "foo", nonceB: "bar",
                  workspaceHashes: [
                    // sha(workspace + nonceA + nonceB)
                    "bq49f8jq0o4f9jqf",
                    "b098ja0jhahahfa3",
                  ]
                }

Alice: now I can compute the same hashes from
my own workspace list, and now I know which
workspaces we have in common.
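A minimal sketch of the hash-and-intersect step on Alice's side, using Node's crypto module; the hex encoding is an assumption since the issue doesn't fix one:

import { createHash } from 'crypto';

// sha256(workspaceAddress + nonceA + nonceB), hex-encoded.
let hashWorkspace = (workspace: string, nonceA: string, nonceB: string): string =>
    createHash('sha256').update(workspace + nonceA + nonceB).digest('hex');

// Given our own workspace list and the hashes the other peer sent,
// recover which workspaces we have in common.
let commonWorkspaces = (myWorkspaces: string[], theirHashes: string[], nonceA: string, nonceB: string): string[] => {
    let theirs = new Set(theirHashes);
    return myWorkspaces.filter(ws => theirs.has(hashWorkspace(ws, nonceA, nonceB)));
};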

Security thoughts

The peers will learn the number of workspaces they each have. 🤷 They could add random fake entries to the list, but you could still collect a statistical sample and infer the real number.

The nonce prevents a replay attack by making the hashes specific to one particular sync session.

Add IndexedDb support to help with larger workspaces in the browser

[issue updated march 2021]

Currently there are 3 implementations of the IStorage interface:

  • storageMemory.ts - node or browser
  • storageSqlite.ts - node only
  • storageLocalStorage.ts - browser only

Browsers limit localStorage to 5 MB per origin, so we can't have larger workspaces in the browser yet.

The solution is:

  • First, do #78 (Run tests in browsers) - we currently can only bundle and run one test file at a time; need to figure out how to run them all at once.
  • Add a storage class using IndexedDb

Sadly, IndexedDb uses an async API so we had to update the entire Earthstar system to support async calls. This is done now and we're ready to start on IndexedDb.

Adjustable future threshold for settings where device clocks are inaccurate

What's the problem you want solved?

If some peers have inaccurate clocks, their messages won't be able to sync around the network because they'll be "from the future".

Removing the "from the future" limit allows malicious peers to create documents that can't be overwritten by anyone else, because they have a timestamp of MAX_INT.

Is there a solution you'd like to recommend?

"From the future" is currently set to "10 minutes". Make this configurable and disable-able.

Apps that loosen this restriction should either:

  • Disallow multiple authors writing to the same path. Always use an author write restriction in paths like /wiki/~@suzy.bxxxx/Flowers
  • Or allow overwriting to shared paths, and accept that a malicious peer can create a non-overwritable document

Changes to make

  • Allow apps to change FUTURE_CUTOFF_MINUTES. Also allow null to disable it.
  • Consider making a single author use monotonic timestamps, e.g. each time they write they use a timestamp of max(now, myPreviousHighestTimestampAcrossAllMyDocuments + 1), as in the sketch below. This will help if their clock gets reset to 1970
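A small sketch of those two rules together, with illustrative names; the 10-minute default is the existing value mentioned above:

// Sketch: per-author monotonic timestamps plus a configurable future cutoff.
// Setting FUTURE_CUTOFF_MINUTES to null disables the future check, as proposed above.
let FUTURE_CUTOFF_MINUTES: number | null = 10;

let chooseTimestamp = (nowMicroseconds: number, myPreviousHighestTimestamp: number): number =>
    Math.max(nowMicroseconds, myPreviousHighestTimestamp + 1);

let isFromTheFuture = (docTimestamp: number, nowMicroseconds: number): boolean => {
    if (FUTURE_CUTOFF_MINUTES === null) { return false; }  // check disabled
    return docTimestamp > nowMicroseconds + FUTURE_CUTOFF_MINUTES * 60 * 1_000_000;
};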

Background

Unfortunately DAG backlinks don't work well in Earthstar because you might have gaps in your documents, so we have to use timestamps or version vectors, which are both vulnerable to MAX_INT type attacks. I don't have a solution to that except using the wall clock as a limit to force the numbers to grow slowly instead of jumping right to MAX_INT.

I also just learned about bloom clocks and I think they have the same vulnerability.

See timestamps.md for much more detail

Benchmark crypto libraries

The crypto needs of Earthstar can be filled by 2 swappable implementations so far:

  • cryptoChloride.ts - using chloride which chooses native or browser support as needed
  • cryptoNode.ts - using native node crypto

Benchmark

  • The speed of their main operations (signing & validation of signatures).
  • In native & browser contexts
  • Also how they affect the browserify bundle size

See also #4 Change crypto library for smaller browserify bundle - there are other libraries to try also

Make standard format for Layer names used in the first part of a path

For example, everyone's going to make apps that save to /todo/... but use incompatible data formats.

Figure out...

  • What is this part of the path called? It's a Layer name, or data format, or app name?

  • Recommend avoiding generic words. Name it todoodle or something instead of todo to avoid accidental collisions.

  • Recommend publishing Layer code as separate packages. That way other apps can use your data.

  • Make a registry of these format names. ...as a document in this repo, along with links to the Layer packages on npm.

  • Version the data. /todoodle-v1.2.3/...?

Putting the version into the path

We need a format for this version that will be compatible with path prefix searches -- the separator character has to be carefully chosen, maybe - is not the best choice. You might want to query for /todoodle-, /todoodle-v1, /todoodle-v.1.2, etc.

Normally path prefix searches end with / to make sure you get a specific folder, e.g. /todo/ to make sure you don't also get /todoooo by mistake. So we might want to use / as the version separator to avoid clashes with another format called /todoodle-doo which happens to include a dash...

Layer versioning

Should we use semantic versioning that matches the NPM package version for the Layer code?

Or maybe a simpler version number with just one integer, like we do with our document formats? Sometimes the code changes but the data format stays the same...

We have to think about forwards and backwards compatibility. New data and old code vs. old data and new code.

Implement participatingAuthor query in sqlite

This query field is implemented in the memory Storage, but not sqlite.

The different ways of querying by author are subtle, we need a diagram :)

types.ts:

    // If including history, find paths where the author ever wrote, and return all history for those paths by anyone
    // If not including history, find paths where the author ever wrote, and return the latest doc (maybe not by the author)
    participatingAuthor?: AuthorAddress,

    //// If including history, find paths with the given last-author, and return all history for those paths
    //// If not including history, find paths with the given last-author, and return just the last doc
    //lastAuthor?: AuthorAddress,

    // If including history, it's any revision by this author (heads and non-heads)
    // If not including history, it's any revision by this author which is a head
    versionsByAuthor?: AuthorAddress,

Also enable the tests at https://github.com/earthstar-project/earthstar/blob/master/src/test/storage.test.ts#L602-L607

Change some validator methods from private to public

What's the problem you want solved?

The validator methods that start with _ are useful to apps, but the _ suggests they are supposed to be treated as private methods.

_checkAuthorCanWriteToPath(doc.author, doc.path);
_checkTimestampIsOk(doc.timestamp, doc.deleteAfter, now);
_checkPathIsValid(doc.path)
etc

Is there a solution you'd like to recommend?

Remove the _.

These were originally private because they might be specific to a certain validator format and I was thinking ahead to handling multiple, different validator classes.

I'm still not quite sure how to handle that, but for now they can be public.

forget(): Delete local data using a replication filter

Related to #6 (Replication filters)

It should be possible to drop local content that matches or doesn't match a query. You would want to do this if you've just blocked someone, or if unfollowing someone means you want fewer people in your local database.

This isn't a delete message that will propagate, we're just forgetting local data.

  • Add a forget(query) method to IStorage
  • Do we also want the opposite, onlyKeep(query) ?

Queryable metadata? Or queryable content objects?

Sometimes we need to query items in several ways but the path only allows us to choose one access pattern.

Example: We might need to search for social media posts by author, by timestamp, by tag, by thread, etc etc.

Author and timestamp are already core database fields, but what about tag and thread? We could try to embed them into the paths for querying, but we have to choose one access pattern:

/posts/thread1/postA
    or
/posts/tagX/postA

...or we have to write two items, one in each place, which could get separated or out of sync with each other.

Considerations

  • Should be simple to serialize deterministically across different platforms and languages, for hashing and signatures
  • Should be simple to serialize and parse for wire transport (this can vary according to the transport and it doesn't have to be deterministic)
  • Should be easy to query in an SQL context, so maybe no deeply nested JSON
  • Should be easy to specify queries in a declarative way
  • Consider how this will work with encryption #10 (just the value), or #11 (wrapped entire items)

Benefits

  • Faster / more flexible queries for apps
  • Reduces pressure on the path to do too many things (support efficient queries, help with partial replication, etc). Paths could become shorter, info gets moved into the content or metadata.
  • Apps won't need to handle their own indexing
  • Authors don't have to anticipate future querying needs as much

Problems

  • Increased complexity, especially for Stores not built on a good underlying database

Option 1: metadata

Each item would have a k-v dictionary of metadata in addition to its regular path and content. Content and metadata will remain just strings. No nesting within metadata.

{
    path: '...',
    timestamp: 150000000,
    content: 'hello world',
    metadata: {
        tag: 'X',
        thread: '1',
    }
}

Option 2: make content into objects and index them

Move all this data inside content, which is now an object instead of a string.
Probably still limit all of the fields to contain strings with no nesting? Or maybe just atomic values (string / boolean / null / number)?

{
    path: '...',
    timestamp: 150000000,
    content: {
        text: 'hello world',
        tag: 'X',
        thread: '1',
    }
}

Unsolved challenges

  • How to declaratively make Query objects to query this. Do we practically need a whole query builder DSL?
  • A common but complex query would be "List all the threads, sorted by the most recent activity in them". In SQL this might require subqueries or GROUP BY.
  • How to encode lists, like the list of tags on a post? The Firebase style is to move the value into the key, like
{
    content: {
        'tag/gardening': true,
        'tag/flowers': true,
    }
}
  • It's tempting to use a triple-store or graph database style here, but to keep complexity down I'd like to stick with standard JSON style objects

TODO

  • Decide objects or metadata (see above)
  • Convert content from string to object
  • Update the feed format to hash and sign in a standardized way
  • Add query options to the query format for use cases like "documents where PROPERTY = VALUE"
  • Add indexing of properties to the Storage classes

Change optional document fields to required and nullable

What's the problem you want solved?

The deleteAfter field in a Document is optional and not nullable.

    deleteAfter?: number,

This seems to be awkward in GraphQL? Earthstar-graphql returns documents with deleteAfter: null.

Another optional field is also coming soon, workspaceSignature, for invite-only workspaces.

Is there a solution you'd like to recommend?

We could make the fields required and nullable.

    deleteAfter: number | null,

I actually prefer this since it makes the documents more self-documenting -- as a programmer, you won't ever be surprised by a document with a field you haven't seen before.

I had originally made them optional to save space, but it only costs 17 extra bytes per document to make them required and nullable.

Sync queries (a.k.a. selective sync, replication queries)

During a sync, apps should be able to specify filters for incoming and outgoing data. This could use the same QueryOpts type we already have.

Pseudocode:

syncer.setIncomingSyncFilters([
    { pathPrefix: 'wiki/' },
    { pathPrefix: 'about/' },
]);
syncer.setOutgoingSyncFilters([
    { author: '@aaa' },  // only upload our own data
]);
syncer.sync(url);

When you supply multiple filters they get OR'd together. In other words, we send things that match ANY filter.

When starting a sync, we'll send the incoming filters to the other peer so they can avoid sending us things we don't want. We'll also apply the incoming filters on our end in case the other peer didn't pay attention.

Any data we send will be filtered by the outgoing filters. This lets you upload only data from people you trust (yourself, people you follow, or people you haven't blocked).
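A tiny sketch of the OR semantics with a simplified filter shape; the real QueryOpts type has more fields than this:

// Sketch: a doc passes a set of sync filters if it matches ANY of them.
type Doc = { path: string, author: string };
type SyncFilter = { pathPrefix?: string, author?: string };

let matchesFilter = (doc: Doc, f: SyncFilter): boolean =>
    (f.pathPrefix === undefined || doc.path.startsWith(f.pathPrefix)) &&
    (f.author === undefined || doc.author === f.author);

let matchesAnyFilter = (doc: Doc, filters: SyncFilter[]): boolean =>
    filters.some(f => matchesFilter(doc, f));

// Incoming docs are kept only if matchesAnyFilter(doc, incomingFilters);
// outgoing docs are sent only if matchesAnyFilter(doc, outgoingFilters).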

Questions to resolve

  • Would these filters need to be different for different peers / pubs you're syncing with, instead of universal?
  • Would we specify a sort order (like newest first) in the query?
  • Would we want to sync in stages of priority, like "/about/" first, then "/wiki/"?
  • Can we specify a number of hops (social radius steps) or do we need to explicitly list every author we want? It could be a long list. But we don't have a formal system for following yet so we can't track the number of hops.

Related conversations

How to prevent infinite spread of data across peers?

Background

In SSB, data is limited to spread N hops across the network of peers. This is tracked using the social graph (following).

In Earthstar we can't count the number of hops because we don't have a social graph (no following mechanism, yet). Instead we have two classes of peer: users and pubs. Pubs are unattended peers that are not closely associated with a single user.

Pubs are passive buckets for users to put or get data from. The only way for data to spread from pub to pub is via a user who syncs with both pubs.

Users can also sync directly with each other.

So data zigzags between users and pubs as it spreads:

    pub      pub     pub
   /   \    /   \   /
user    user --- user

Problem

A workspace's data could spread widely across pubs and users, far beyond the people who are actually using it.

There are a few ways a workspace could get onto a new peer, and no ways for workspaces to get forgotten by a peer.

How to limit the spread

Any 2 peers should be allowed to sync a workspace if they both know its address. This problem is all about discovering and adding new workspaces.

The sync protocol could:

  • Make it impossible to enumerate the workspaces held by the other side
  • Not automatically accept new workspaces; make this an explicit action

Users should:

  • Only sync workspaces with pubs they trust, not with every pub they can find. Make this decision separately for each workspace they have.

Pubs could:

  • Never learn about new workspaces from another pub
  • Forget workspaces if they haven't been accessed in 6 months
  • Limit their ability to accept new workspaces:
    • Stop accepting workspaces after they have, say, 10 of them
    • Use an allowlist of workspaces (set by the pub operator)
    • Only host workspaces related to certain users, such as the pub operator or an allowlist of their friends

Do we need harder rules to limit the spread? E.g. a workspace's data could somehow include an allowlist of pubs that are allowed to host it, and we hope that all peers will respect that list and not spread it further?

Figure out: fully wrapped encrypted documents

(see also: #49 simple encryption)

A document's path can contain sensitive information. We'd like to encrypt an entire document including the path, and put a different less sensitive path on the outside.

This plan requires a more complex Store class.

Outer wrapper layer. The path will probably be a random value chosen by the app, like a UUID.

{
    path: 'encrypted/abcad-29a3a-bbda9-2294a',
    content: 'xxxxxxxxx...encrypted...data...xxxxxxxx'
    timestamp
    author
    format
    workspace
    etc
}

Once the content is decrypted, it contains another document, the inner layer:

    {
        path: 'wiki/Ladybug',
        content: 'Ladybugs are a kind of small beetle',
        ....timestamp etc might be missing here, inherited from the outer layer
    }

Details

Stores decrypt items as they arrive.

When apps query the database they can see the outer documents AND the inner documents?

When syncing, we only send the outer documents.

When applying sync filters, we look at the outer path only. This allows untrusted pubs to sync without decrypting anything.

The outer path still needs to contain access control information like a tilde and author pubkey (~@suzy.bxxxx), because untrusted pubs need to know that.

Recipients

Encrypted data can be encrypted for an author (using their public key), for everyone in a closed workspace (using the workspace key), or for some other kind of key that an app added to the Store's keychain somehow.

Figure out

  • A Store needs to decrypt documents as they arrive, so it needs to have a persistent keychain of keys to try
  • When adding a new key to the Store's keychain, we have to try to decrypt all the existing documents again
  • The Store now has a perspective - it can see some private things - which makes it more complicated to support multiple authors using one local Store instance. Either each author needs their own Store instance, or the Store has to track some details so it won't reveal private data to the wrong author when querying.
  • This makes path handling in Stores more complicated - it's similar to the Path Template feature from #8 Immutable items. Plan a design that works with both features

Sync over duplex stream

Make a protocol / stream handler that accomplishes a sync over a duplex stream. This might be easier after we have RPC.

(Related: #14 set up RPC)

Sync over hyperswarm

(Related: #15 Sync over duplex stream)

Add the ability to sync over a hyperswarm connection.

What should "includeHistory" actually do?

Document versions with the same path are related to each other. When querying, sometimes we want to handle them as a group and sometimes individually.

We only have one query parameter for this, includeHistory, and it doesn't let us do everything we want.

(vocabulary: a "head" is the latest document at a path)

We might want to...

  • query all document versions independently, then maybe only keep ones that are heads
  • query all document versions independently, then maybe only keep the latest one in each path that matched the query
  • query each group to see if it contains some kind of match, then return the whole group
  • query each group to see if it contains some kind of match, then return the head of the group

This complexity will hit any kind of query that can match only certain document versions in a path. We previously ran into this with querying by author. At the time I solved that by adding 3 ways to query by author:

participatingAuthor: match author anywhere in history; includeHistory happens after that
versionsByAuthor: includeHistory happens first, then match author in each version one by one
lastAuthor: only match author on latest doc version; includeHistory expands after that

This is confusing. Is there a more general way to specify how to handle history when querying?

Access-controlled workspaces (invite-only)

How it works

Right now anyone can write to a workspace if they know its address.

Let's add a second kind of workspace which has a pubkey in its address. Every item posted in this kind of workspace must be signed by the workspace private key AND the author private key. This will be enforced by the feed format validator class.

So to write you need to know the workspace private key. You can give the workspace secret to someone to invite them.

This doesn't encrypt messages (yet) so it doesn't limit who can read, only who can write. Once we have encryption, authors can encrypt their messages to the workspace pubkey.

Key inclusion in workspace addresses

Currently workspace addresses have this format:

"+" NAME "." B32

+gardening.bzwo4h3   // hard to guess randomness, any length
+gardening.baweoijca48jao93jl39cajl94j3  // public key, 53 characters

B32 is a base32-encoded buffer that holds randomness that's hard to guess, or a public key.

There isn't yet a good way to tell if the B32 is a key or not. For now we can check if it's 53 characters long (a key) or shorter (random).

It'd be nice to also have an "invite format" that includes the private key.

Tasks

  • Find better way to distinguish public key from randomness
  • Add workspaceSignature field to validator class, and check for valid workspace signature there
  • The Keychain type now needs to include the workspace secret as well as the author secret, hmm.
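A hedged sketch of the two checks described above: the 53-character heuristic for telling a public key from plain randomness, and requiring both signatures on a document. The verify function is injected so the sketch stays independent of whichever swappable crypto driver is in use; all names are illustrative:

// Sketch only; shapes and names are illustrative.
type Verify = (pubkeyB32: string, sig: string, message: string) => boolean;
type SignedDoc = { hash: string, signature: string, workspaceSignature?: string };

// "+gardening.baweoij..."  ->  the b32 section after the dot.
let workspaceB32 = (workspaceAddress: string): string =>
    workspaceAddress.split('.')[1] ?? '';

// Heuristic from above: 53 characters of b32 is a public key, shorter is plain randomness.
let workspaceHasPubkey = (workspaceAddress: string): boolean =>
    workspaceB32(workspaceAddress).length === 53;

let checkSignatures = (verify: Verify, doc: SignedDoc, workspaceAddress: string, authorPubkeyB32: string): boolean => {
    if (!verify(authorPubkeyB32, doc.signature, doc.hash)) { return false; }
    if (!workspaceHasPubkey(workspaceAddress)) { return true; }  // open workspace: author signature is enough
    // Invite-only workspace: the workspace key must also have signed.
    return doc.workspaceSignature !== undefined
        && verify(workspaceB32(workspaceAddress), doc.workspaceSignature, doc.hash);
};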

Making new workspace: confusing error message about name constraints

Attempted to make a new db with the CLI (the documentation has a different command than the code, by the way), but more concerningly I ran into an error: workspace address does not start with "@".

Adding a @ does not fix it.

in https://github.com/cinnamon-bun/earthstar/blob/master/src/util/addresses.ts:

if (!addr.startsWith('//')) {
    return { workspaceParsed: null, err: 'workspace address does not start with "@"' };
}

the error message does not match the test

Proxy Store

(Before this, do #14 Set up RPC)

Make a class that looks like a Store but actually calls out across the network to a remote Store.

Set up RPC

Right now sync happens over an HTTP REST API.

Generalize this using some RPC framework so it can happen over other transports too.

JSON-RPC is a good choice.

Efficient sync

(First, probably do #14 set up RPC)

Right now we sync by sending everything. Instead, do something like:

  • Both sides agree on a query filter and sort order
  • Both sides look at the next 1000 items we would send
  • Send each other the hash of all those items together: hash(list of individual item hashes)
  • If the hashes agree, move on to next 1000 items
  • Otherwise, send the actual 1000 items

This could obviously recurse further into the 1000 items, but there are tradeoffs:

  • Limit the number of network roundtrips
  • Limit the amount of hash computation. Use a cheap one like xor or md5.

We can't pre-compute the hashes because replication filters might choose a different subset of items each time.

This should eventually support #6 Replication filters.
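A small sketch of the batch-hash comparison; the item shape is simplified and sha256 stands in for whatever cheap combiner ends up being used:

import { createHash } from 'crypto';

// Sketch: compare batches of items by a combined hash before sending them.
type Item = { path: string, author: string, signature: string };
const BATCH_SIZE = 1000;

// hash(list of individual item hashes)
let batchHash = (items: Item[]): string => {
    let itemHashes = items.map(it =>
        createHash('sha256').update(it.path + it.author + it.signature).digest('hex'));
    return createHash('sha256').update(itemHashes.join('')).digest('hex');
};

// Walk both sides' already-filtered, already-sorted lists in batches:
// batches whose hashes agree are skipped, the rest are exchanged in full.
let batchesNeedingExchange = (mine: Item[], theirBatchHashes: string[]): number[] => {
    let mismatched: number[] = [];
    for (let i = 0; i * BATCH_SIZE < mine.length || i < theirBatchHashes.length; i++) {
        let myBatch = mine.slice(i * BATCH_SIZE, (i + 1) * BATCH_SIZE);
        if (batchHash(myBatch) !== theirBatchHashes[i]) { mismatched.push(i); }
    }
    return mismatched;
};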

Add helper function to encrypt values

This is duplicated by #49

Expose something like private-box in the API so apps can easily encrypt values.

Use case is something like this:

earthstar.set({
    key: 'secrets!',
    value: encryptSecretBox(myData, somePubkey),
});

let mySecret = decryptSecretBox(earthstar.get('secrets!'), someSecretKey)

The Store doesn't know anything about this, it's just a helper function for apps.

This doesn't help with encrypting keys, see #11 for that.

This should be implemented as a function in the Crypto class, in each of the interchangeable implementations.

Remember that we use base32 to encode our private and public keys.

Change crypto library to get smaller browserify bundle

(Related: #3 Benchmark the crypto libraries)

The crypto needs of Earthstar can be filled by 2 swappable implementations so far:

  • cryptoChloride.ts - using chloride which chooses native or browser support as needed
  • cryptoNode.ts - using native node crypto

Chloride takes up 1.5 MB when browserified.

We could switch to a smaller library:

noble-ed25519 is nice -- small and pure JS -- but has an async API and we need a sync one. It's only async because it depends on a browser API to do sha512 async'ly. We could fork noble-ed25519 and make a sync version that depends on some other sha512 library.

We could also try some other crypto library such as sodium-universal. See https://github.com/cinnamon-bun/browser-crypto-diagram

The crypto backend is swappable here, but it's hardcoded:
https://github.com/earthstar-project/earthstar/blob/master/src/crypto/crypto.ts#L1-L2

It would be better to let apps decide which one they wanted, and hopefully a tree-shaking bundler would know to omit the ones not being used.

Immutable items

This idea is very old, please see the comments for newer ideas



Goal

Allow documents to be (optionally) immutable, meaning you can't overwrite them with newer versions.

How

Paths could contain a special marker which will be replaced with the document's hash:

/moderation-actions/{id}
    will get expanded to
/moderation-actions/#xxxxxxHashOfItemxxxxxx

Because it's hard/impossible to create hash collisions, nobody will be able to create another document version with the same hash, so this document version can never be overwritten with a newer one.

New concepts

This splits the idea of paths into two:

  • Path Templates - before replacing the id with the hash
  • Expanded Paths - after replacing

Documents that are being signed or sent across the wire only have Path Templates. The path templates are kept forever as part of the original item. They are not used for path lookups and queries (?)

Expanded paths are derived state, computed by the Storage when receiving documents. They are used for all kinds of path lookups and queries.
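A minimal sketch of the expansion step, assuming {id} as the template marker and # as the reserved expansion character from the example above; neither is a settled choice:

import { createHash } from 'crypto';

// Sketch: turn a Path Template into an Expanded Path by replacing the
// {id} marker with the hash of the signed document.
let hashDocument = (signedDocumentBytes: string): string =>
    createHash('sha256').update(signedDocumentBytes).digest('hex');

let expandPath = (pathTemplate: string, signedDocumentBytes: string): string =>
    pathTemplate.replace('{id}', '#' + hashDocument(signedDocumentBytes));

// expandPath('/moderation-actions/{id}', serializedDoc)
//   => '/moderation-actions/#3a7bd3e2...'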

Details

We'll need to reserve a few characters for this, both for the Path Templates and the Expanded Paths.

You must not be able to create an Expanded Path directly, so in this example # would have to be disallowed from Path Templates.

Tasks

  • Decide on characters for Path Templates and Expanded Paths. See this comment for details on the available characters
  • Disallow the expanded-character from Path Templates, in the Validator
  • Update all the Storage classes to understand the two kinds of paths
  • Figure out which paths should appear in which situations (querying etc)

Bonus

Once we have these two kinds of path, we can add another expansion which is a shorthand for the author's own key:

/~@@/about/name
    expands to
/~@suzy.bxxxxxxx/about/name

This is just an optimization to save space from repeating the author's full address all the time. It might not be worth the extra complexity.

Syncer gets stuck on GraphQL pubs

What's the problem you want solved?

If the Syncer tries to talk to a GraphQL pub, it gets a 400 error response and dies while trying to parse JSON. Then it never finishes the sync lifecycle and goes back to an idle state.

Is there a solution you'd like to recommend?

Probably need to catch an error during JSON decoding.

(Uncertain if we want to support GraphQL pubs here in the core package or not; but at least it should fail cleanly)
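A small sketch of the defensive decode, using the standard fetch API; the result shape and error strings are illustrative:

// Sketch: never let a non-JSON (or non-200) pub response kill the sync lifecycle.
let fetchJsonSafely = async (url: string): Promise<{ ok: true, value: any } | { ok: false, err: string }> => {
    try {
        let res = await fetch(url);
        if (!res.ok) {
            return { ok: false, err: `pub returned HTTP ${res.status}` };
        }
        return { ok: true, value: await res.json() };
    } catch (e) {
        // covers network failures and JSON parse errors alike
        return { ok: false, err: `could not sync with ${url}: ${(e as Error).message}` };
    }
};

// The Syncer can then mark that pub as failed and return to idle instead of hanging.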

Write specifications

Write specification documents for...

Solid:

  • Author addresses
  • Workspace addresses
  • Document paths
  • Other document schema
  • Document hashing and signing

In flux:

  • URLs
  • Peer sync API methods

Speculative:

  • Peer finding (hyperswarm?)
  • Peer connections (REST, RPC, GraphQL, ?)

Consider switching from base58 to base32 to improve URL compatibility

Problem

Workspace names need to be case sensitive to preserve their base58 secrets.

URL locations are supposed to be NOT case sensitive, and some URL parsers will lowercase them for you.

This matters if we want to make our own URL scheme like earthstar:// without fighting the URL parser.

Solution?

Switch workspace and author addresses to lowercase base32 encoding.

Secrets will be 52 chars long instead of 44. This isn't significantly worse.

See this comment for details.

Record a timestamp for when a document was received

What's the problem you want solved?

We can't completely trust the timestamps authors put into their documents.

They could put a very old timestamp. This doesn't break the core replication or conflict resolution parts of Earthstar, but it could matter in a user interface.

Is there a solution you'd like to recommend?

Storage implementations should record the time each document was locally received.

This would let apps say "Author claimed to send this in 1974. Message received 3 days ago". There's no way to know if that gap is from dishonesty or a propagation delay, but it could help users to know.

This is a kind of metadata about a document, not part of the document itself where it would have to be included in the author's signature. This would not be sent during syncing.

All the Storage methods return Documents, so I don't know where to put this additional data.

Furthermore: other metadata

We could also record...

  • Which pub or peer was the first to give us each document. I'm not sure this would be helpful and it might be too privacy-invasive.
  • How many other peers each document has been synced to. This would help you know if your data has been safely uploaded or not.
  • Expiration date for ephemeral documents #31

Add "contentIsEmpty" query field for deleted documents

What's the problem you want solved?

"Deleted documents" are docs with content: "".

We need to preserve them behind the scenes, as tombstones.

Sometimes we want them (when syncing), and sometimes not (when showing things in a UI).

Is there a solution you'd like to recommend?

  • Add a query field which controls if they are included in results or not.
  • Decide what the default value is
  • Implement in memory storage
  • Implement in sqlite storage (partially working, blocked on #44 )
  • Figure out how it interacts with document history (blocked on #44 What should "includeHistory" actually do?)

Improvements to querying documents for sync filters

What's the problem you want solved?

earthstar-graphql has rough support for sync filters, and this feature has been a little hairy to implement.

As of writing, earthstar-graphql's sync filters are shaped like this:

{
  pathPrefixes: string[],
  versionsByAuthors: string[]
}

The idea is that the peer/pub will return documents that match ANY of these rules.

However, querying documents works like this:

workspace.documents({
  pathPrefix: "/something",
  versionsByAuthor: "@test.1234"
})

And the documents returned must match ALL of the queries.

What this means is that implementing sync filters is a little bit hairy. Here's earthstar-graphql's implementation: https://github.com/earthstar-project/earthstar-graphql/blob/master/src/util.ts#L253

It calls the documents method once for each member of each property in the sync filters, and then puts all the different lists together. This method would probably get a little more unwieldy once more properties are supported.

Is there a solution you'd like to recommend?

Could there be ways to query a workspace's documents a bit more like how sync filters operate, i.e. using OR logic, and supporting lists for each property? A new method on IStorage, or a (breaking) change to documents?
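For reference, a simplified sketch of the merge-and-dedupe workaround described above; a native OR query on IStorage would replace this:

// Sketch: run one query per filter value and union the results,
// deduplicating by path + author. Shapes are simplified.
type Doc = { path: string, author: string, timestamp: number };
type DocQuery = { pathPrefix?: string, versionsByAuthor?: string };

let queryWithOrFilters = (
    documents: (q: DocQuery) => Doc[],   // stands in for workspace.documents()
    filters: { pathPrefixes: string[], versionsByAuthors: string[] },
): Doc[] => {
    let results = new Map<string, Doc>();
    let addAll = (docs: Doc[]) => {
        for (let doc of docs) { results.set(doc.path + ' ' + doc.author, doc); }
    };
    for (let prefix of filters.pathPrefixes) { addAll(documents({ pathPrefix: prefix })); }
    for (let author of filters.versionsByAuthors) { addAll(documents({ versionsByAuthor: author })); }
    return [...results.values()];
};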

Add ephemeral documents (deleteAfter a certain timestamp)

What's the problem you want solved?

It would be nice to have ephemeral documents that get deleted after a given date. This helps with privacy and reduces storage needs.

Is there a solution you'd like to recommend?

  • Add a new optional field to documents: deleteAfter: <timestamp>.
  • The Validator should consider the document invalid after the expiration date. This prevents it from being synced.
  • Reads and queries should filter out expired documents, as if they don't exist
  • Reads and queries can trigger local deletion of expired documents when they're encountered (optional)
    • StorageMemory
    • StorageSqlite doesn't do this yet. It does filter out expired docs, it just doesn't delete them from the database when it encounters them.
  • Update docs
  • Validator should enforce that ephemeral docs use special character ! in their path, and regular docs don't. Add tests too.
  • Storage implementations should proactively check for and delete expired documents occasionally, maybe at startup and every hour thereafter? And upon close(), kill that recurring task.
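A tiny sketch of the expiry rule that the Validator and the read paths would share; the field name follows the proposal above:

// Sketch: a doc is expired once `now` (in microseconds) passes its deleteAfter.
// Docs without deleteAfter never expire.
type EphemeralFields = { deleteAfter?: number };

let isExpired = (doc: EphemeralFields, nowMicroseconds: number): boolean =>
    doc.deleteAfter !== undefined && doc.deleteAfter < nowMicroseconds;

// Reads and queries keep docs where !isExpired(doc, now);
// the Validator treats expired docs as invalid so they stop syncing.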

Standard format for `about`, profile info, following

Define a standard format for about info such as

  • Name
  • Profile info
  • User icon
  • Following
  • Blocking

Something like

about/~@xxxxxxxx/name = "Alex"
about/~@xxxxxxxx/info = "Here's some stuff about me"
about/~@xxxxxxxx/icon = "...base64 string of a low res image..."
about/~@xxxxxxxx/relationship/@aaaaaa = "follow" | "block"   ??? 

Sparse mode: handle documents without their actual content

Document signatures should depend on the hash of the content, not the content itself.

This would let you:

  • Download documents without their content, just metadata, and still verify signatures
  • Locally delete content but keep metadata (e.g. when removing blocked content).

Add contentHash field

  • Add required contentHash field to documents
  • Compute the hash when writing a document: sha256(content)
  • Use contentHash and not content when hashing a document
  • When verifying document, check the hash matches the content

Allow content to be null

  • Allow content to be null, meaning it's missing (note, empty documents are still content: "")
  • When verifying document, check the hash matches the content OR the content is null
  • Add query option includeContent: boolean (include or omit the content itself in the returned docs)
  • Add query option hasContent: boolean (find docs with missing content, or existing content)
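A short sketch of the hashing and verification rules from the two lists above; sha256 over the raw content string and hex encoding are assumptions here:

import { createHash } from 'crypto';

let sha256Hex = (s: string): string =>
    createHash('sha256').update(s).digest('hex');

type SparseDoc = { contentHash: string, content: string | null };

// When writing: compute the hash from the content.
let makeContentHash = (content: string): string => sha256Hex(content);

// When verifying: content may be missing (null), but if present it must match
// contentHash. The document signature covers contentHash, not content itself.
let contentIsValid = (doc: SparseDoc): boolean =>
    doc.content === null || sha256Hex(doc.content) === doc.contentHash;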

Add simple helper functions for encrypting & decrypting content

This issue is duplicated by #10

What's the problem you want solved?

Users have public keys, let's send them some private messages! Or eventually, encrypt messages to the workspace public key so only people with the workspace private key can read it (e.g. the members of the invite-only workspace).

Complicated solution

#11 Fully wrapped encrypted documents

Easy solution

Just encrypt the content and nothing else. The path and author will be exposed.

The recipient could be specified in the path so they know where to look for messages, or we could make recipients scan through everything looking for documents they can decrypt.

The encryption can probably be done with some function in Chloride. For multi-recipient messages we can use private-box

Potential crypto modules we can use:

Todo

  • Research how to do basic message encryption with Chloride
  • Write a helper function in crypto.ts
  • Think about how this affects IStores -- will they try to decrypt stuff for you? How do they know what to try? Should an IStore become responsible for one specific author's decrypted view of the data, or stay neutral? Maybe this is an app-level concern?

More consistent error reporting

What's the problem you want solved?

This codebase uses a variety of paradigms for reporting errors:

  • Return null on error, with no access to the reason for error
  • Raise exception
  • Return { result, err }

Is there a solution you'd like to recommend?

Choose and standardize on one style.

Here are some options:
https://github.com/earthstar-project/earthstar/blob/master/ignore/errorstyles.ts

Criteria:

  • Typescript understands it
  • Usable from plain Javascript
  • Easy to use (short code)
  • Easy to understand (widely used)
  • Returns rich error information
  • Serializable (so errors can be returned from pub servers over HTTP)
  • Works well with async functions
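As one concrete possibility (only one of the options in the linked file, not a decision), a tagged result type meets most of these criteria: TypeScript can narrow it, plain JavaScript can read it, and it serializes cleanly over HTTP:

// Sketch of one candidate style: a serializable tagged result.
type EarthstarError = { name: string, message: string };
type Result<T> =
    | { ok: true, value: T }
    | { ok: false, err: EarthstarError };

// Example: a parser that never throws.
let parseAuthorShortname = (address: string): Result<string> => {
    let match = address.match(/^@([a-z]+)\./);
    if (!match) {
        return { ok: false, err: { name: 'ValidationError', message: 'not an author address' } };
    }
    return { ok: true, value: match[1] };
};

let r = parseAuthorShortname('@suzy.bxxxx');
if (r.ok) { console.log(r.value); }      // TypeScript narrows to the success branch here
else { console.log(r.err.message); }     // ...and to the error branch here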

Add close() to Storage

What's the problem you want solved?

Some Storage implementations might need to do clean-up operations when closing.

Is there a solution you'd like to recommend?

Add a close() method to the IStorage interface.
