kitsune-soc / kitsune
🦊 (fast) ActivityPub-federated microblogging
Home Page: https://joinkitsune.org
License: Other
Implement the API endpoints to fetch the comments on a parent post.
Color palette starting point:
https://www.color-hex.com/color-palette/14887
Follow-up to #92
It would be nice to have support for the Range header, where a client can request just a part of a file.
S3 already supports that header, and implementing it for files located on disk should be doable, too.
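The parsing side of this can be sketched with the standard library alone; a minimal, std-only sketch (the function name and the single-range restriction are choices of this sketch, not finalised design):

```rust
/// Parse a single-range `Range: bytes=start-end` header against a known
/// file length, returning the inclusive byte span to serve.
/// Suffix ranges (`bytes=-500`) and open ends (`bytes=500-`) are handled;
/// multi-range requests are rejected for simplicity.
fn parse_range(header: &str, file_len: u64) -> Option<(u64, u64)> {
    let spec = header.strip_prefix("bytes=")?;
    if spec.contains(',') {
        return None; // multi-range unsupported in this sketch
    }
    let (start, end) = spec.split_once('-')?;
    let range = match (start.is_empty(), end.is_empty()) {
        // `bytes=-500`: the last 500 bytes
        (true, false) => {
            let suffix: u64 = end.parse().ok()?;
            (file_len.saturating_sub(suffix), file_len.checked_sub(1)?)
        }
        // `bytes=500-`: from offset 500 to the end
        (false, true) => (start.parse().ok()?, file_len.checked_sub(1)?),
        // `bytes=0-499`: an explicit inclusive span
        (false, false) => (start.parse().ok()?, end.parse().ok()?),
        (true, true) => return None,
    };
    (range.0 <= range.1 && range.1 < file_len).then_some(range)
}

fn main() {
    assert_eq!(parse_range("bytes=0-499", 1000), Some((0, 499)));
    assert_eq!(parse_range("bytes=-500", 1000), Some((500, 999)));
    assert_eq!(parse_range("bytes=900-", 1000), Some((900, 999)));
    assert_eq!(parse_range("bytes=900-2000", 1000), None);
    println!("ok");
}
```

The returned span would then drive a seek-and-read for on-disk files, or be forwarded verbatim to S3, which accepts the same header.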
Currently every call to a mapping function will do a few calls to the database.
This would be fine if it's done for only one or two entities, but this is done for every post on a user's timeline, the outbox, etc.
Since we already have a Redis instance running, I'm thinking about abstracting away some of the database accesses behind something like the repository pattern and then doing the caching via the repository.
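The repository idea can be sketched as a caching decorator around a database-backed implementation. Everything here is hypothetical shape, not the real API: a HashMap stands in for Redis, and the trait and method names are placeholders.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Hypothetical repository trait; the real one would wrap the database
// queries and be async.
trait AccountRepository {
    fn display_name(&self, id: u64) -> Option<String>;
}

struct DbRepository; // stand-in for the database-backed implementation

impl AccountRepository for DbRepository {
    fn display_name(&self, id: u64) -> Option<String> {
        // imagine a database roundtrip here
        Some(format!("user-{id}"))
    }
}

/// Decorator that answers from the cache when possible and only falls
/// through to the inner repository on a miss. A HashMap stands in for
/// the Redis instance in this sketch.
struct CachingRepository<R> {
    inner: R,
    cache: RefCell<HashMap<u64, String>>,
}

impl<R: AccountRepository> AccountRepository for CachingRepository<R> {
    fn display_name(&self, id: u64) -> Option<String> {
        if let Some(hit) = self.cache.borrow().get(&id) {
            return Some(hit.clone()); // served from cache, no database call
        }
        let value = self.inner.display_name(id)?;
        self.cache.borrow_mut().insert(id, value.clone());
        Some(value)
    }
}

fn main() {
    let repo = CachingRepository { inner: DbRepository, cache: RefCell::new(HashMap::new()) };
    assert_eq!(repo.display_name(1).as_deref(), Some("user-1"));
    assert_eq!(repo.display_name(1).as_deref(), Some("user-1")); // second call is a cache hit
    println!("ok");
}
```

The mapping functions would then depend on the trait rather than on the database directly, so the caching layer can be slotted in without touching the callers.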
Add a tool called kitsune-cli to give admin/moderator access, install frontends and emotes, etc.
Theoretically it's not really needed since Mastodon already has its own API documentation, but I'm personally not a fan of how browsable their documentation is.
We could use utoipa to generate our definitions at compile time.
Support for uploading media attachments and attaching them to posts locally was added in #92.
These attachments don't federate yet, nor do we store attachments from remote sources.
reqwest is fine for the most part. The problem we are facing right now is that reqwest doesn't have a response body size limit.
This is problematic because we have to fetch a bunch of untrusted URLs, where this could become a potential DoS vector.
I propose creating a new crate called something like kitsune-http-client that contains an opinionated HTTP client construction based on hyper and tower-http, and that utilises the Limited body wrapper from http-body to enforce a body limit.
We would then end up with a construction like this:
ServiceBuilder::new()
.layer(follow_redirect)
.layer(compression)
.layer(map_body)
.service(client)
(where map_body applies a transformation that wraps whatever body type with Limited)
We unfortunately can't use the content_length function provided by reqwest, since a bunch of servers don't send the HTTP Content-Length header.
Alternatively, we could implement the response decoding manually, stream the body into memory via bytes_stream, and keep track of how many bytes we have already read ourselves.
Personally, I don't find this much more ergonomic than composing a bunch of middleware together into an opinionated client stack.
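For comparison, the manual byte-counting alternative boils down to something like the following std-only sketch, where a chunk iterator stands in for bytes_stream and the helper name is made up:

```rust
/// Collect a body from a stream of chunks, erroring out once more than
/// `limit` bytes have been read -- the manual equivalent of wrapping the
/// body in `Limited`. An iterator of chunks stands in for `bytes_stream`.
fn collect_limited<I>(chunks: I, limit: usize) -> Result<Vec<u8>, &'static str>
where
    I: IntoIterator<Item = Vec<u8>>,
{
    let mut buf = Vec::new();
    for chunk in chunks {
        if buf.len() + chunk.len() > limit {
            return Err("body exceeded configured limit");
        }
        buf.extend_from_slice(&chunk);
    }
    Ok(buf)
}

fn main() {
    let chunks = vec![vec![0u8; 512], vec![0u8; 512]];
    // 1024 bytes total: fits a 2048-byte limit, exceeds a 1000-byte one
    assert!(collect_limited(chunks.clone(), 2048).is_ok());
    assert!(collect_limited(chunks, 1000).is_err());
    println!("ok");
}
```

It works, but every call site has to remember to do it, which is exactly why the middleware-stack approach seems preferable.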
It's a de-facto standard among ActivityPub instances to expose nodeinfo endpoints to provide some metadata in a standardised way.
Posts support Markdown formatting. Add a preview to the frontend to allow checking the formatting before posting
This would allow us to easily add caching to the database accesses (namely to the Mastodon API entity mapper) since services can have state
Somewhat related to #42
Say I wanna bundle Kitsune together with some other app-servers as part of an omg.lol-like service.
For this type of hydra-service, a unified login is necessary. The best open-source provider of that at the moment seems to be https://www.ory.sh; they also have an actively (auto-)updated Rust client.
So that as a user, I'd just sign up on Weird.one, and once signed up I would already have accounts on Revolt-net, Kitsune-net, etc.
Related:
Both TOTP and Yubikeys would be nice. Needs some exploration on the API side on how to implement it in a sensible way
Dhall looks pretty interesting. This would maybe make for a more maintainable configuration than flat environment variables.
The parameter I mainly thought of was the maximum pool size; others that make sense can be added, too, of course.
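For illustration, a configuration carrying the pool-size parameter could look something like this in Dhall (the schema and field names here are hypothetical, not a finalised design):

```dhall
-- hypothetical schema; field names are not finalised
let Database = { url : Text, maxPoolSize : Natural }

in  { database = { url = "postgres://localhost/kitsune", maxPoolSize = 10 }
        : Database
    }
```

Compared to flat environment variables, this gives us typed fields, comments, and the ability to factor shared settings into reusable records.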
Right now you have to use the separate search service.
I want similar configurability for the search service to what we have for the cache (e.g. a LIKE SQL query-based option).
Instead of storing reposts separately from posts, we should probably store them in the same table and identify reposts by having the reposted_post_id
set to a value (the column name is not finalised).
This makes aggregation of home timelines easier, and makes it easier to implement a potential quote repost feature
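The unified row could be sketched like this; the struct shape and the reposted_post_id name are placeholders taken from the note above, and the quote-repost rule mirrors the idea discussed elsewhere in this tracker:

```rust
/// Sketch of a unified posts-table row; `reposted_post_id` is not a
/// finalised column name.
struct Post {
    id: u64,
    content: Option<String>,
    reposted_post_id: Option<u64>,
}

impl Post {
    /// A plain repost references another post and carries no content of
    /// its own...
    fn is_repost(&self) -> bool {
        self.reposted_post_id.is_some() && self.content.is_none()
    }

    /// ...while a referencing post *with* content would be a quote repost.
    fn is_quote_repost(&self) -> bool {
        self.reposted_post_id.is_some() && self.content.is_some()
    }
}

fn main() {
    let repost = Post { id: 2, content: None, reposted_post_id: Some(1) };
    let quote = Post { id: 3, content: Some("nice post".into()), reposted_post_id: Some(1) };
    assert!(repost.is_repost() && !repost.is_quote_repost());
    assert!(quote.is_quote_repost());
    println!("ok");
}
```

With everything in one table, a home timeline becomes a single query over posts (original or reposted) instead of an aggregation across two tables.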
If you vote on a poll on Mastodon, you can't remove your vote or change it. Probably completely doable by using Update activities (or, worst case, Delete and Create).
This includes things like:
An event-based architecture. Services that mutate the state, such as the post or account service, can emit events to consumers. These events notify about additions, updates, and deletions (and potentially more; the architecture should be kept extensible).
This architecture is useful for:
The proposal would be implemented with Tokio broadcast channels in the beginning (since we are very much a monolithic application).
Instead of directly exposing the channels to the clients though, I would prefer to expose something like an event consumer struct that gives you a stream that one can poll for updates.
The idea behind this is that future changes to how events are exchanged would be trivial if the implementation details are as hidden as possible.
For example, we could switch to a RabbitMQ or Fluvio-based messaging backend in the future without having to make large code changes; the internals of the emitters and consumers may change, but that's all.
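The hiding-the-channel idea can be sketched as follows. This is a std-only illustration: std's mpsc stands in for Tokio's broadcast channel, and the event and type names are hypothetical.

```rust
use std::sync::mpsc;

// Hypothetical event type; the real set should be kept extensible.
#[derive(Clone, Debug, PartialEq)]
enum PostEvent {
    Created(u64),
    Deleted(u64),
}

/// Consumer wrapper that hides the underlying channel, so swapping the
/// transport later (broadcast channel today, RabbitMQ/Fluvio tomorrow)
/// doesn't leak into call sites.
struct EventConsumer {
    rx: mpsc::Receiver<PostEvent>,
}

impl EventConsumer {
    /// Poll for the next event; `None` when nothing is pending.
    fn next_event(&self) -> Option<PostEvent> {
        self.rx.try_recv().ok()
    }
}

/// Emitter half, handed to the services that mutate state.
struct EventEmitter {
    tx: mpsc::Sender<PostEvent>,
}

impl EventEmitter {
    fn emit(&self, event: PostEvent) {
        let _ = self.tx.send(event);
    }
}

fn channel() -> (EventEmitter, EventConsumer) {
    let (tx, rx) = mpsc::channel();
    (EventEmitter { tx }, EventConsumer { rx })
}

fn main() {
    let (emitter, consumer) = channel();
    emitter.emit(PostEvent::Created(1));
    assert_eq!(consumer.next_event(), Some(PostEvent::Created(1)));
    assert_eq!(consumer.next_event(), None);
    println!("ok");
}
```

Because callers only ever see EventEmitter and EventConsumer, replacing the mpsc pair with a broadcast channel or an external message broker is an internal change.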
I'm not entirely happy with the current database structure. Before making any more substantial changes to it, I would like to revise it.
Making changes later down the road only gets more annoying.
Right now we use tower_http's static file server in conjunction with fallback to provide static file serving.
This is annoying since we need the fallback configuration for our SPA in the future.
We either have to wait for axum v0.6 (and for the entire ecosystem to update) to get .nest_service, or we have to rewrite tower_http's ServeDir.
Currently tending towards the former option.
Reposts work via an Announce activity, and activities are just extended objects, so they can have content.
Idea: add content to the Announce activity. If there's no content, it's a normal repost; if there is content, it's a quote repost.
This server will feature its own GraphQL-based API and an optional feature for Mastodon API compatibility.
Mastodon endpoints for an MVP:
These will be implemented basically in tandem with the queries/mutations of the GraphQL endpoint
Add a media proxy to the backend. It shouldn't be too difficult to implement since every attachment, remote or local, gets its own ID in our database.
The path could be something like /media-proxy/[attachment ID]
and it just fetches and streams the media to the client.
This prevents client IPs from getting leaked to the remote server and allows admins to do some reverse-proxy-level caching to decrease the bandwidth usage on the remote server.
The current HTTP signatures code is more of a hack to get it working than anything else.
Goals for the rewrite are that the code is:
Right now we match mentions via a regex. This is all well and good, but we need to match all of the following:
These are all separate regex runs and replace calls. Instead, I propose writing a lexer-style parser for posts. This would tokenise the post into its components (text, mention, link, etc.) and would allow us to do per-token transformations.
This could be extracted into its own crate, to allow others to take advantage of this parser infrastructure.
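A toy version of the tokenising idea, using only std and a deliberately naive whitespace split (a real lexer would handle punctuation, links, hashtags, and emote shortcodes; all names here are hypothetical):

```rust
/// Token kinds a post-lexer might emit; a real implementation would add
/// links, hashtags, and emote shortcodes.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Text(&'a str),
    Mention(&'a str), // `@user` or `@user@domain`, without the leading `@`
}

/// Minimal whitespace-driven sketch: split the post into words and classify
/// each one, instead of running several regexes with separate replace passes.
fn lex(post: &str) -> Vec<Token<'_>> {
    post.split_whitespace()
        .map(|word| match word.strip_prefix('@') {
            Some(rest) if !rest.is_empty() => Token::Mention(rest),
            _ => Token::Text(word),
        })
        .collect()
}

fn main() {
    let tokens = lex("hello @aumetra@example.com check this");
    assert_eq!(tokens[0], Token::Text("hello"));
    assert_eq!(tokens[1], Token::Mention("aumetra@example.com"));
    println!("ok");
}
```

Each transformation (HTML rendering, mention resolution, link shortening) would then be a single pass over the token stream instead of its own regex-and-replace run.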
Relevant crates:
Replace the current token flow with an actual OAuth2 flow.
Required for the later plan of adding Mastodon API compatibility
If it becomes possible to host Kitsune on Shuttle, that'd make small Kitsune instances hostable on their free tier, thus making it more accessible.
Shuttle (with which I have no affiliation) is open source and written in Rust, so there'd be neither technical nor proprietary lock-in.
What I'm most curious about, to start with, is to what extent this is even possible. I'll ask the Shuttle devs if they can speak to this.
Looking at one of these tests, they are large and somewhat unreadable.
We should explore the viability of snapshot tests to replace those.
Snapshot testing library: Insta
I recently couldn't find the announcement board on corteximplant; that should be something to keep in mind when making our service.
We should revamp the search service to subscribe to the post events instead of requiring the service to be called in each function that creates/deletes a status.
The service could then add/remove posts from the index upon receiving events, after validating that the posts are public/unlisted.
Currently the more job workers we have, the more strain is put on the database (since every worker is querying for new jobs in a loop).
This is obviously not good. We should explore alternative designs for job workers that ideally only hit the database to update the job state and retrieve the job context.
Right now we just use a random profile picture URL I found on Google Images.
I also don't want to return some random black image or question mark (like Steam).
I'd like something similar to Mastodon's old default pictures (the ones they had years ago. It was something like a Kaomoji on light grey background; mastodon/mastodon@a41c348)
The database models are used as GraphQL object definitions as well. It's at least worth thinking about whether we want some domain separation and should split those into two separate struct definitions: one for the database, one for GraphQL.
Hey @aumetra, thanks for adopting SeaORM!
It's our pleasure to see more inspirational projects being built on top of SeaORM :)
Let us know if you have any feature recommendations or feedback. Your contribution is what drives us forward!
Some learning resources for you: Documentation, Tutorial, Cookbook, Q&A.
Join our Discord server to chat with others in the SeaQL community!
Feel free to submit a PR to showcase your project, SeaQL/sea-orm#403.
axum v0.6 introduces the concept of typed state. We should probably migrate our current state management (which is done via Extensions) to the new typed state extractor.
The current blocker is that async-graphql doesn't yet have an axum integration published on crates.io that's compatible with axum v0.6.
They have it on their git, but I'd prefer to avoid a git dependency on a repository I don't control if I can.
On Mastodon, if you make a typo or forget the alt text, you have to delete and re-post the entire post to change it.
We can totally solve this by changing the attachments of the post object.
New UUIDs were released that provide orderability and would enable performant keyset paging.
We unfortunately cannot really rely on the timestamp field alone, since it's not guaranteed to be unique, so the sorting would not be stable.
We could theoretically keep the current v4 UUIDs and paginate over (created_at, id), which has a stable chronological order since created_at provides the temporal ordering and id provides the uniqueness.
A problem we are facing though is that Mastodon is based on Snowflake IDs, which are inherently chronologically sortable, and their API is based around that by only giving the client the IDs to page by.
UUID version 7 fixes this by being completely chronologically sortable, since their most significant bits are a Unix timestamp in milliseconds. The remaining, less significant bits are random bytes generated by a CSPRNG, which provide the uniqueness.
And since these identifiers are in fact UUIDs, we wouldn't have to change our database schema.
The only caveat is that the v7 functionality in the uuid crate isn't stable yet, since the UUID v7 specification hasn't been finalised and might still change.
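The keyset-paging mechanics described above can be sketched in isolation; here plain u64 pairs stand in for (created_at, id), and the function name and cursor shape are made up for illustration:

```rust
/// Keyset pagination over (created_at, id): fetch the next page of rows
/// strictly older than the cursor, newest first. With UUIDv7 ids the id
/// alone would suffice; with v4 ids the timestamp supplies the ordering
/// and the id the tie-break. (u64 pairs stand in for real columns.)
fn next_page(posts: &[(u64, u64)], cursor: (u64, u64), limit: usize) -> Vec<(u64, u64)> {
    let mut page: Vec<(u64, u64)> = posts
        .iter()
        .copied()
        // lexicographic tuple compare: created_at first, id as tie-break
        .filter(|&p| p < cursor)
        .collect();
    page.sort_unstable_by(|a, b| b.cmp(a)); // newest first
    page.truncate(limit);
    page
}

fn main() {
    // two posts share created_at = 2; the id keeps their order stable
    let posts = vec![(1, 10), (2, 11), (2, 12), (3, 13)];
    assert_eq!(next_page(&posts, (2, 12), 2), vec![(2, 11), (1, 10)]);
    println!("ok");
}
```

In SQL this corresponds to a WHERE (created_at, id) < (:cursor_ts, :cursor_id) ORDER BY created_at DESC, id DESC LIMIT :n query, which an index on those two columns can serve efficiently.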
Instead of requiring the user to type out the text in an image manually, we should add functionality to use OCR to automatically detect the text in an image.
Maybe useful library: https://github.com/naptha/tesseract.js
cc @qarnax801
Add APIs for uploading media attachments.
These APIs would probably go through a storage abstraction, since we might want to support multiple storage backends (similar to kitsune-messaging).
Backends I definitely want to have are:
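The abstraction could start as a small trait; everything here is a hypothetical sketch (the real version would be async, stream bytes instead of buffering, and use proper error types), with an in-memory backend standing in for the filesystem/S3 implementations:

```rust
use std::collections::HashMap;

/// Hypothetical storage abstraction; the real trait would be async and
/// stream bytes instead of buffering whole files in memory.
trait StorageBackend {
    fn put(&mut self, path: &str, data: Vec<u8>) -> Result<(), String>;
    fn get(&self, path: &str) -> Result<Vec<u8>, String>;
}

/// In-memory backend standing in for the filesystem/S3 implementations.
#[derive(Default)]
struct InMemoryStorage {
    files: HashMap<String, Vec<u8>>,
}

impl StorageBackend for InMemoryStorage {
    fn put(&mut self, path: &str, data: Vec<u8>) -> Result<(), String> {
        self.files.insert(path.to_owned(), data);
        Ok(())
    }

    fn get(&self, path: &str) -> Result<Vec<u8>, String> {
        self.files
            .get(path)
            .cloned()
            .ok_or_else(|| format!("not found: {path}"))
    }
}

fn main() {
    let mut storage = InMemoryStorage::default();
    storage.put("attachments/1", b"cat picture".to_vec()).unwrap();
    assert_eq!(storage.get("attachments/1").unwrap(), b"cat picture".to_vec());
    assert!(storage.get("attachments/2").is_err());
    println!("ok");
}
```

The upload APIs would then take any StorageBackend, so adding a new backend later doesn't touch the endpoint code, mirroring how kitsune-messaging abstracts its backends.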
General design idea: an instance section containing custom_emoji, description, name, etc.
The custom emoji type could look like this:
let CustomEmoji = { name: Text, path: Text }
let CustomEmojis = List CustomEmoji
in {
CustomEmoji,
CustomEmojis
}
The custom emoji would then be mounted under some media sub-path, something like /media/custom-emoji/:emojiName, and would then, as shown in the path, be indexed by their name.
Not sure if this might be something that'd be better kept in the database, since this might need to synchronise between multiple nodes.
This is just a first design draft, completely open for discussion