kitsune-soc / kitsune
🦊 (fast) ActivityPub-federated microblogging
Home Page: https://joinkitsune.org
License: Other
Implement the API endpoints to fetch the comments on a parent post.
Color palette starting point:
https://www.color-hex.com/color-palette/14887
Follow-up to #92
It would be nice to have support for the Range header, where a client can request just a part of a file.
S3 already supports that header, and implementing it for files located on disk should be doable, too.
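The parsing side of this can be sketched with the standard library alone; a minimal, std-only sketch (the function name and the single-range restriction are choices of this sketch, not finalised design):

```rust
/// Parse a single-range `Range: bytes=start-end` header against a known
/// file length, returning the inclusive byte span to serve.
/// Suffix ranges (`bytes=-500`) and open ends (`bytes=500-`) are handled;
/// multi-range requests are rejected for simplicity.
fn parse_range(header: &str, file_len: u64) -> Option<(u64, u64)> {
    let spec = header.strip_prefix("bytes=")?;
    if spec.contains(',') {
        return None; // multi-range unsupported in this sketch
    }
    let (start, end) = spec.split_once('-')?;
    let range = match (start.is_empty(), end.is_empty()) {
        // `bytes=-500`: the last 500 bytes
        (true, false) => {
            let suffix: u64 = end.parse().ok()?;
            (file_len.saturating_sub(suffix), file_len.checked_sub(1)?)
        }
        // `bytes=500-`: from offset 500 to the end
        (false, true) => (start.parse().ok()?, file_len.checked_sub(1)?),
        // `bytes=0-499`: an explicit inclusive span
        (false, false) => (start.parse().ok()?, end.parse().ok()?),
        (true, true) => return None,
    };
    (range.0 <= range.1 && range.1 < file_len).then_some(range)
}

fn main() {
    assert_eq!(parse_range("bytes=0-499", 1000), Some((0, 499)));
    assert_eq!(parse_range("bytes=-500", 1000), Some((500, 999)));
    assert_eq!(parse_range("bytes=900-", 1000), Some((900, 999)));
    assert_eq!(parse_range("bytes=900-2000", 1000), None);
    println!("ok");
}
```

The returned span would then drive a seek-and-read for on-disk files, or be forwarded verbatim to S3, which accepts the same header.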
Currently every call to a mapping function will do a few calls to the database.
This would be fine if it's done for only one or two entities, but this is done for every post on a user's timeline, the outbox, etc.
Since we already have a Redis instance running, I'm thinking about abstracting away some of the database accesses behind something like the repository pattern and then doing the caching via the repository.
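The repository idea can be sketched as a caching decorator around a database-backed implementation. Everything here is hypothetical shape, not the real API: a HashMap stands in for Redis, and the trait and method names are placeholders.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Hypothetical repository trait; the real one would wrap the database
// queries and be async.
trait AccountRepository {
    fn display_name(&self, id: u64) -> Option<String>;
}

struct DbRepository; // stand-in for the database-backed implementation

impl AccountRepository for DbRepository {
    fn display_name(&self, id: u64) -> Option<String> {
        // imagine a database roundtrip here
        Some(format!("user-{id}"))
    }
}

/// Decorator that answers from the cache when possible and only falls
/// through to the inner repository on a miss. A HashMap stands in for
/// the Redis instance in this sketch.
struct CachingRepository<R> {
    inner: R,
    cache: RefCell<HashMap<u64, String>>,
}

impl<R: AccountRepository> AccountRepository for CachingRepository<R> {
    fn display_name(&self, id: u64) -> Option<String> {
        if let Some(hit) = self.cache.borrow().get(&id) {
            return Some(hit.clone()); // served from cache, no database call
        }
        let value = self.inner.display_name(id)?;
        self.cache.borrow_mut().insert(id, value.clone());
        Some(value)
    }
}

fn main() {
    let repo = CachingRepository { inner: DbRepository, cache: RefCell::new(HashMap::new()) };
    assert_eq!(repo.display_name(1).as_deref(), Some("user-1"));
    assert_eq!(repo.display_name(1).as_deref(), Some("user-1")); // second call is a cache hit
    println!("ok");
}
```

The mapping functions would then depend on the trait rather than on the database directly, so the caching layer can be slotted in without touching the callers.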
Add a tool called kitsune-cli to give admin/moderator access, install frontends and emotes, etc.
Theoretically it's not really needed since Mastodon already has its own API documentation, but I'm personally not a fan of how browsable their documentation is.
We could use utoipa to generate our definitions at compile time.
Support for uploading media attachments and attaching them to posts locally was added in #92.
These attachments don't federate yet, nor do we store attachments from remote sources.
reqwest is fine for the most part. The problem we are facing right now is that reqwest doesn't have a response body size limit.
This is problematic because we have to fetch a bunch of untrusted URLs, where this could become a potential DoS vector.
I propose creating a new crate called something like kitsune-http-client that contains an opinionated HTTP client construction based on hyper and tower-http, and that utilises the Limited body wrapper from http-body to enforce a body limit.
We would then end up with a construction like this:
ServiceBuilder::new()
.layer(follow_redirect)
.layer(compression)
.layer(map_body)
.service(client)
(where map_body applies a transformation that wraps whatever body type with Limited)
We unfortunately can't use the content_length function provided by reqwest, since a bunch of servers don't send the HTTP Content-Length header.
Alternatively, we could implement the response decoding manually, stream the body into memory via bytes_stream, and keep track of how many bytes we have already read ourselves.
Personally, I don't find this much more ergonomic than composing a bunch of middleware together into an opinionated client stack.
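For comparison, the manual byte-counting alternative boils down to something like the following std-only sketch, where a chunk iterator stands in for bytes_stream and the helper name is made up:

```rust
/// Collect a body from a stream of chunks, erroring out once more than
/// `limit` bytes have been read -- the manual equivalent of wrapping the
/// body in `Limited`. An iterator of chunks stands in for `bytes_stream`.
fn collect_limited<I>(chunks: I, limit: usize) -> Result<Vec<u8>, &'static str>
where
    I: IntoIterator<Item = Vec<u8>>,
{
    let mut buf = Vec::new();
    for chunk in chunks {
        if buf.len() + chunk.len() > limit {
            return Err("body exceeded configured limit");
        }
        buf.extend_from_slice(&chunk);
    }
    Ok(buf)
}

fn main() {
    let chunks = vec![vec![0u8; 512], vec![0u8; 512]];
    // 1024 bytes total: fits a 2048-byte limit, exceeds a 1000-byte one
    assert!(collect_limited(chunks.clone(), 2048).is_ok());
    assert!(collect_limited(chunks, 1000).is_err());
    println!("ok");
}
```

It works, but every call site has to remember to do it, which is exactly why the middleware-stack approach seems preferable.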
It's a de-facto standard among ActivityPub instances to expose nodeinfo endpoints to provide some metadata in a standardised way.
Posts support Markdown formatting. Add a preview to the frontend to allow checking the formatting before posting
This would allow us to easily add caching to the database accesses (namely to the Mastodon API entity mapper) since services can have state
Somewhat related to #42
Say I wanna bundle Kitsune together with some other app-servers as part of an omg.lol-like service.
For this type of hydra-service, a unified login is necessary. The best open-source provider of that at the moment seems to be https://www.ory.sh; they also have an actively (auto-)updated Rust client.
So that as a user, I'd just sign up on Weird.one, and once signed up I would already have accounts on Revolt-net, Kitsune-net, etc.
Related:
Both TOTP and Yubikeys would be nice. Needs some exploration on the API side on how to implement it in a sensible way
Dhall looks pretty interesting. This would maybe make for a more maintainable configuration than flat environment variables.
The parameter I mainly thought of was the maximum pool size; others that make sense can be added, too, of course.
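For illustration, a configuration carrying the pool-size parameter could look something like this in Dhall (the schema and field names here are hypothetical, not a finalised design):

```dhall
-- hypothetical schema; field names are not finalised
let Database = { url : Text, maxPoolSize : Natural }

in  { database = { url = "postgres://localhost/kitsune", maxPoolSize = 10 }
        : Database
    }
```

Compared to flat environment variables, this gives us typed fields, comments, and the ability to factor shared settings into reusable records.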
Right now you have to use the separate search service.
I want similar configurability for the search service to what we have for the cache (e.g. a LIKE SQL query-based option).
Instead of storing reposts separately from posts, we should probably store them in the same table and identify reposts by having the reposted_post_id
set to a value (the column name is not finalised).
This makes aggregation of home timelines easier, and makes it easier to implement a potential quote repost feature
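The unified row could be sketched like this; the struct shape and the reposted_post_id name are placeholders taken from the note above, and the quote-repost rule mirrors the idea discussed elsewhere in this tracker:

```rust
/// Sketch of a unified posts-table row; `reposted_post_id` is not a
/// finalised column name.
struct Post {
    id: u64,
    content: Option<String>,
    reposted_post_id: Option<u64>,
}

impl Post {
    /// A plain repost references another post and carries no content of
    /// its own...
    fn is_repost(&self) -> bool {
        self.reposted_post_id.is_some() && self.content.is_none()
    }

    /// ...while a referencing post *with* content would be a quote repost.
    fn is_quote_repost(&self) -> bool {
        self.reposted_post_id.is_some() && self.content.is_some()
    }
}

fn main() {
    let repost = Post { id: 2, content: None, reposted_post_id: Some(1) };
    let quote = Post { id: 3, content: Some("nice post".into()), reposted_post_id: Some(1) };
    assert!(repost.is_repost() && !repost.is_quote_repost());
    assert!(quote.is_quote_repost());
    println!("ok");
}
```

With everything in one table, a home timeline becomes a single query over posts (original or reposted) instead of an aggregation across two tables.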
If you vote on a poll on Mastodon, you can't remove your vote or change it. Probably completely doable by using Update activities (or, worst case, Delete and Create).
This includes things like:
An event-based architecture. Services that mutate the state, such as the post or account service, can emit events to consumers. These events notify about additions, updates, and deletions (and potentially more; the architecture should be kept extensible).
This architecture is useful for:
The proposal would be implemented with Tokio broadcast channels in the beginning (since we are very much a monolithic application).
Instead of directly exposing the channels to the clients though, I would prefer to expose something like an event consumer struct that gives you a stream that one can poll for updates.
The idea behind this is that future changes to how events are exchanged would be trivial if the implementation details are as hidden as possible.
For example, we could switch to a RabbitMQ or Fluvio-based messaging backend in the future without having to make large code changes; the internals of the emitters and consumers may change, but that's all.
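The hiding-the-channel idea can be sketched as follows. This is a std-only illustration: std's mpsc stands in for Tokio's broadcast channel, and the event and type names are hypothetical.

```rust
use std::sync::mpsc;

// Hypothetical event type; the real set should be kept extensible.
#[derive(Clone, Debug, PartialEq)]
enum PostEvent {
    Created(u64),
    Deleted(u64),
}

/// Consumer wrapper that hides the underlying channel, so swapping the
/// transport later (broadcast channel today, RabbitMQ/Fluvio tomorrow)
/// doesn't leak into call sites.
struct EventConsumer {
    rx: mpsc::Receiver<PostEvent>,
}

impl EventConsumer {
    /// Poll for the next event; `None` when nothing is pending.
    fn next_event(&self) -> Option<PostEvent> {
        self.rx.try_recv().ok()
    }
}

/// Emitter half, handed to the services that mutate state.
struct EventEmitter {
    tx: mpsc::Sender<PostEvent>,
}

impl EventEmitter {
    fn emit(&self, event: PostEvent) {
        let _ = self.tx.send(event);
    }
}

fn channel() -> (EventEmitter, EventConsumer) {
    let (tx, rx) = mpsc::channel();
    (EventEmitter { tx }, EventConsumer { rx })
}

fn main() {
    let (emitter, consumer) = channel();
    emitter.emit(PostEvent::Created(1));
    assert_eq!(consumer.next_event(), Some(PostEvent::Created(1)));
    assert_eq!(consumer.next_event(), None);
    println!("ok");
}
```

Because callers only ever see EventEmitter and EventConsumer, replacing the mpsc pair with a broadcast channel or an external message broker is an internal change.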
I'm not entirely happy with the current database structure. Before making any more substantial changes to it, I would like to revise it.
Making changes later down the road only gets more annoying.
Right now we use tower_http's static file server in conjunction with fallback to provide static file serving.
This is annoying since we need the fallback configuration for our SPA in the future.
We either have to wait for axum v0.6 (and for the entire ecosystem to update) to get .nest_service, or we have to rewrite tower_http's ServeDir.
Currently tending towards the former option.
Reposts work via an Announce activity, and activities are just extended objects, so they can have content.
Idea: add content to the Announce activity. If there's no content, it's a normal repost; if there is content, it's a quote repost.
This server will feature its own GraphQL-based API and an optional feature for Mastodon API compatibility.
Mastodon endpoints for an MVP:
These will be implemented basically in tandem with the queries/mutations of the GraphQL endpoint
Add a media proxy to the backend. It shouldn't be too difficult to implement since every attachment, remote or local, gets its own ID in our database.
The path could be something like /media-proxy/[attachment ID]
and it just fetches and streams the media to the client.
This prevents client IPs from getting leaked to the remote server and allows admins to do some reverse-proxy-level caching to decrease the bandwidth usage on the remote server.
The current HTTP signatures code is more of a hack to get it working than anything else.
Goals for the rewrite are that the code is:
Right now we match mentions via a regex. This is all well and good, but we need to match all of the following:
These are all separate regex runs and replace calls. Instead, I propose writing a lexer-style parser for posts. This would tokenise the post into its components (text, mention, link, etc.) and would allow us to do per-token transformations.
This could be extracted into its own crate, to allow others to take advantage of this parser infrastructure.
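A toy version of the tokenising idea, using only std and a deliberately naive whitespace split (a real lexer would handle punctuation, links, hashtags, and emote shortcodes; all names here are hypothetical):

```rust
/// Token kinds a post-lexer might emit; a real implementation would add
/// links, hashtags, and emote shortcodes.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Text(&'a str),
    Mention(&'a str), // `@user` or `@user@domain`, without the leading `@`
}

/// Minimal whitespace-driven sketch: split the post into words and classify
/// each one, instead of running several regexes with separate replace passes.
fn lex(post: &str) -> Vec<Token<'_>> {
    post.split_whitespace()
        .map(|word| match word.strip_prefix('@') {
            Some(rest) if !rest.is_empty() => Token::Mention(rest),
            _ => Token::Text(word),
        })
        .collect()
}

fn main() {
    let tokens = lex("hello @aumetra@example.com check this");
    assert_eq!(tokens[0], Token::Text("hello"));
    assert_eq!(tokens[1], Token::Mention("aumetra@example.com"));
    println!("ok");
}
```

Each transformation (HTML rendering, mention resolution, link shortening) would then be a single pass over the token stream instead of its own regex-and-replace run.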
Relevant crates:
Replace the current token flow with an actual OAuth2 flow.
Required for the later plan of adding Mastodon API compatibility
If it becomes possible to host Kitsune on Shuttle, that'd make small Kitsune instances hostable on their free tier, thus making it more accessible.
Shuttle (with which I have no affiliation) is open source and written in Rust, so there'd be neither technical nor proprietary lock-in.
What I'm most curious about, to start with, is to what extent this is even possible. I'll ask the Shuttle devs if they can speak to this.
Looking at one of these tests, they are large and somewhat unreadable.
We should explore the viability of snapshot tests to replace those.
Snapshot testing library: Insta
I recently couldn't find the announcement board on corteximplant; that should be something to keep in mind when making our service.
We should revamp the search service to subscribe to the post events instead of requiring the service to be called in each function that creates/deletes a status.
The service could then add/remove posts from the index upon receiving events, after validating that the posts are public/unlisted.
Currently the more job workers we have, the more strain is put on the database (since every worker is querying for new jobs in a loop).
This is obviously not good. We should explore alternative designs for job workers that ideally only hit the database to update the job state and retrieve the job context.
Right now we just use a random profile picture URL I found on Google Images.
I also don't want to return some random black image or question mark (like Steam).
I'd like something similar to Mastodon's old default pictures (the ones they had years ago. It was something like a Kaomoji on light grey background; mastodon/mastodon@a41c348)
The database models are used as GraphQL object definitions as well. It's at least worth thinking about whether we want some domain separation and should split those into two separate struct definitions: one for the database, one for GraphQL.
Hey @aumetra, thanks for adopting SeaORM!
It's our pleasure to see more inspirational projects being built on top of SeaORM :)
Let us know if you have any feature recommendations or feedback. Your contribution is what drives us forward!
Some learning resources for you: Documentation, Tutorial, Cookbook, Q&A.
Join our Discord server to chat with others in the SeaQL community!
Feel free to submit a PR to showcase your project, SeaQL/sea-orm#403.
axum v0.6 introduces the concept of typed state. We should probably migrate our current state management (which is done via Extensions) to the new typed state extractor.
The current blocker is that async-graphql doesn't yet have an axum integration published on crates.io that's compatible with axum v0.6.
They have it on their git, but I'd prefer to avoid a git dependency on a repository I don't control if I can.
On Mastodon, if you make a typo or forget the alt text, you have to delete and re-post the entire post to change it.
We can totally solve this by changing the attachments of the post object.
New UUIDs were released that provide orderability and would enable performant keyset paging.
We unfortunately cannot really rely on the timestamp field alone, since it's not guaranteed to be unique, so the sorting would not be stable.
We could theoretically keep the current v4 UUIDs and paginate over (created_at, id), which has a stable chronological order since created_at provides the temporal ordering and id provides the uniqueness.
A problem we are facing though is that Mastodon is based on Snowflake IDs, which are inherently chronologically sortable, and their API is based around that by only giving the client the IDs to page by.
UUID version 7 fixes this by being completely chronologically sortable, since their most significant bits are a Unix timestamp in milliseconds. The remaining, less significant bits are random bytes generated by a CSPRNG, which provide the uniqueness.
And since these identifiers are in fact UUIDs, we wouldn't have to change our database schema.
The only caveat is that the v7 functionality in the uuid crate isn't stable yet, since the UUID v7 specification hasn't been finalised and might still change.
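The keyset-paging mechanics described above can be sketched in isolation; here plain u64 pairs stand in for (created_at, id), and the function name and cursor shape are made up for illustration:

```rust
/// Keyset pagination over (created_at, id): fetch the next page of rows
/// strictly older than the cursor, newest first. With UUIDv7 ids the id
/// alone would suffice; with v4 ids the timestamp supplies the ordering
/// and the id the tie-break. (u64 pairs stand in for real columns.)
fn next_page(posts: &[(u64, u64)], cursor: (u64, u64), limit: usize) -> Vec<(u64, u64)> {
    let mut page: Vec<(u64, u64)> = posts
        .iter()
        .copied()
        // lexicographic tuple compare: created_at first, id as tie-break
        .filter(|&p| p < cursor)
        .collect();
    page.sort_unstable_by(|a, b| b.cmp(a)); // newest first
    page.truncate(limit);
    page
}

fn main() {
    // two posts share created_at = 2; the id keeps their order stable
    let posts = vec![(1, 10), (2, 11), (2, 12), (3, 13)];
    assert_eq!(next_page(&posts, (2, 12), 2), vec![(2, 11), (1, 10)]);
    println!("ok");
}
```

In SQL this corresponds to a WHERE (created_at, id) < (:cursor_ts, :cursor_id) ORDER BY created_at DESC, id DESC LIMIT :n query, which an index on those two columns can serve efficiently.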
Instead of requiring the user to type out the text in an image manually, we should add functionality to use OCR to automatically detect the text in an image.
Maybe useful library: https://github.com/naptha/tesseract.js
cc @qarnax801
Add APIs for uploading media attachments.
These APIs would probably go through a storage abstraction, since we might want to support multiple storage backends (similar to kitsune-messaging).
Backends I definitely want to have are:
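The abstraction could start as a small trait; everything here is a hypothetical sketch (the real version would be async, stream bytes instead of buffering, and use proper error types), with an in-memory backend standing in for the filesystem/S3 implementations:

```rust
use std::collections::HashMap;

/// Hypothetical storage abstraction; the real trait would be async and
/// stream bytes instead of buffering whole files in memory.
trait StorageBackend {
    fn put(&mut self, path: &str, data: Vec<u8>) -> Result<(), String>;
    fn get(&self, path: &str) -> Result<Vec<u8>, String>;
}

/// In-memory backend standing in for the filesystem/S3 implementations.
#[derive(Default)]
struct InMemoryStorage {
    files: HashMap<String, Vec<u8>>,
}

impl StorageBackend for InMemoryStorage {
    fn put(&mut self, path: &str, data: Vec<u8>) -> Result<(), String> {
        self.files.insert(path.to_owned(), data);
        Ok(())
    }

    fn get(&self, path: &str) -> Result<Vec<u8>, String> {
        self.files
            .get(path)
            .cloned()
            .ok_or_else(|| format!("not found: {path}"))
    }
}

fn main() {
    let mut storage = InMemoryStorage::default();
    storage.put("attachments/1", b"cat picture".to_vec()).unwrap();
    assert_eq!(storage.get("attachments/1").unwrap(), b"cat picture".to_vec());
    assert!(storage.get("attachments/2").is_err());
    println!("ok");
}
```

The upload APIs would then take any StorageBackend, so adding a new backend later doesn't touch the endpoint code, mirroring how kitsune-messaging abstracts its backends.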
General design idea: an instance section containing custom_emoji, description, name, etc.
The custom emoji type could look like this:
let CustomEmoji = { name: Text, path: Text }
let CustomEmojis = List CustomEmoji
in {
CustomEmoji,
CustomEmojis
}
The custom emoji would then be mounted under some media sub-path, something like /media/custom-emoji/:emojiName, and would then, as shown in the path, be indexed by their name.
Not sure if this might be something that'd be better kept in the database, since this might need to synchronise between multiple nodes.
This is just a first design draft, completely open for discussion