Comments (4)
Resources have an explicit parent. Getting the parent is a cheap operation, but generating the children is expensive, as it currently requires a TPF query that iterates over all items. This specific operation should be very quick, but I'd like to make the optimizations generalizable.
One way to optimize this TPF query is to keep track of incoming links. We could create a new inverted tree for this: `ValueIndex<valueURL, MentionedIn<property, Vec<subject>>>`. In human language: you can get a `MentionedIn` for any `value`, which contains a list of properties. For any given property, you can find the subjects that have used that property to refer to the valueURL.
So when we have this `ValueIndex`, we need to add stuff into it. When do we build the index? I think we should update it whenever we add or remove a resource. We iterate over all the PropVals, check if they have an AtomicURL or ResourceArray, and add the items to the index. When a resource is deleted, it needs to be deleted from the index, too.
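To make the idea concrete, here is a minimal in-memory sketch of such an index. The `Value` and `Resource` types are simplified, hypothetical stand-ins (not the actual atomic-server types), and the real index would live in the database rather than in a `BTreeMap`.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified stand-in for a stored value.
#[derive(Clone, Debug)]
pub enum Value {
    AtomicUrl(String),
    ResourceArray(Vec<String>),
    /// Strings, numbers, etc. Not indexed in this sketch.
    Other(String),
}

/// Hypothetical, simplified resource: a subject plus its property-value pairs.
pub struct Resource {
    pub subject: String,
    pub propvals: Vec<(String, Value)>,
}

/// valueURL -> property -> subjects that use that property to refer to the valueURL.
#[derive(Default)]
pub struct ValueIndex {
    map: BTreeMap<String, BTreeMap<String, BTreeSet<String>>>,
}

impl ValueIndex {
    /// Returns the URLs a value links to (empty for non-link values).
    fn urls(value: &Value) -> Vec<&String> {
        match value {
            Value::AtomicUrl(url) => vec![url],
            Value::ResourceArray(urls) => urls.iter().collect(),
            Value::Other(_) => vec![],
        }
    }

    /// Call whenever a resource is added.
    pub fn add_resource(&mut self, resource: &Resource) {
        for (prop, val) in &resource.propvals {
            for url in Self::urls(val) {
                self.map
                    .entry(url.clone())
                    .or_default()
                    .entry(prop.clone())
                    .or_default()
                    .insert(resource.subject.clone());
            }
        }
    }

    /// Call whenever a resource is deleted; also cleans up empty entries.
    pub fn remove_resource(&mut self, resource: &Resource) {
        for (prop, val) in &resource.propvals {
            for url in Self::urls(val) {
                let mut now_empty = false;
                if let Some(mentioned_in) = self.map.get_mut(url) {
                    if let Some(subjects) = mentioned_in.get_mut(prop) {
                        subjects.remove(&resource.subject);
                    }
                    mentioned_in.retain(|_, subjects| !subjects.is_empty());
                    now_empty = mentioned_in.is_empty();
                }
                if now_empty {
                    self.map.remove(url);
                }
            }
        }
    }

    /// All subjects that refer to `value_url` through `property`.
    pub fn subjects(&self, value_url: &str, property: &str) -> Vec<String> {
        self.map
            .get(value_url)
            .and_then(|mentioned_in| mentioned_in.get(property))
            .map(|subjects| subjects.iter().cloned().collect())
            .unwrap_or_default()
    }
}
```

With this in place, finding the children of a resource would be a single lookup such as `index.subjects(parent_url, parent_property)` instead of a scan over all items.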
from atomic-server.
While I was working on the document editor, performance for Collections became pretty bad. This was due to the Documents creating a lot of Commits and Elements, and there was no indexing in the DB. When a Collection was to be fetched, a TPF query would be done, which in turn iterated over every resource... So I've started to build a value index. The Value index seems to be working properly, and it solves the performance issues I've had locally.
But a new issue is arising: it takes way too long to build the index. On my dev machine, where I have some 500 MB of things in the store, it took about 30 minutes to build the index. It's insane. Every resource takes 100+ ms. But why? I have no clue.
Maybe it's time to add some serious benchmarking.
Cause for slow indexing
- some Atoms have a lot of values (long resource arrays for `write`)
- some values are very common (such as the "" empty string, or the `Commit` class URL)
... which respectively result in extremely big `HashMaps` and `HashSets`.
For every atom, the DB needs to read out and write these very big objects. The `PropSubjectMap` approach might be fundamentally flawed, performance-wise.
Some objects are over 1 MB, which means reading and writing more than 1 MB for some atoms.
Length for map https://atomicdata.dev/agents/8S2U/viqkaAQVzUisaolrpX6hx/G/L3e2MTjWA83Rxk= is 1666566 bytes
Length for map https://atomicdata.dev/classes/Commit is 1666571
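The cost pattern described above can be illustrated with a toy model. This is only a sketch: `BlobStore` and its newline-separated encoding are hypothetical stand-ins, not the sled API or the real serialization format. It shows that when a whole set is stored as one blob, every added subject reads and rewrites the entire blob, so the total bytes moved grow roughly quadratically with the number of subjects.

```rust
use std::collections::HashMap;

/// Toy key-value store holding one serialized blob per key.
/// It counts the bytes moved so the read-modify-write cost is visible.
#[derive(Default)]
pub struct BlobStore {
    blobs: HashMap<String, Vec<u8>>,
    pub bytes_read: usize,
    pub bytes_written: usize,
}

impl BlobStore {
    /// Adds `subject` to the set stored under `key` by reading the whole
    /// blob, decoding it, modifying it, and writing the whole blob back.
    pub fn add_subject(&mut self, key: &str, subject: &str) {
        let old = self.blobs.get(key).cloned().unwrap_or_default();
        self.bytes_read += old.len();

        // "Deserialize": one subject per line (hypothetical encoding).
        let text = String::from_utf8(old).expect("blob is valid UTF-8");
        let mut subjects: Vec<&str> = text.lines().collect();
        if !subjects.contains(&subject) {
            subjects.push(subject);
        }

        // "Serialize" and write everything back, however small the change.
        let new = subjects.join("\n").into_bytes();
        self.bytes_written += new.len();
        self.blobs.insert(key.to_string(), new);
    }
}
```

In this model, a key whose blob is already about 1.6 MB costs about 1.6 MB of reading and 1.6 MB of writing for each additional atom, which matches the kind of slowdown observed for very common values.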
Value-property index
Instead of having a `value` index, we could have a `value-property` index. Each key would be a `value-property` combination. This would simply be a `Set` containing a bunch of subjects.
But... A ValueProperty index for `https://atomicdata.dev/classes/Commit - https://atomicdata.dev/properties/isA` would still be huge, so I don't think this would actually solve anything.
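A minimal sketch of that layout, assuming an in-memory `BTreeMap` and a hypothetical `ValuePropIndex` type (neither is from the actual codebase):

```rust
use std::collections::{BTreeMap, BTreeSet};

/// (value, property) -> set of subjects. Hypothetical in-memory sketch.
#[derive(Default)]
pub struct ValuePropIndex {
    map: BTreeMap<(String, String), BTreeSet<String>>,
}

impl ValuePropIndex {
    /// Registers one atom (subject, property, value) in the index.
    pub fn add_atom(&mut self, subject: &str, property: &str, value: &str) {
        self.map
            .entry((value.to_string(), property.to_string()))
            .or_default()
            .insert(subject.to_string());
    }

    /// Removes one atom and drops the key once its set is empty.
    pub fn remove_atom(&mut self, subject: &str, property: &str, value: &str) {
        let key = (value.to_string(), property.to_string());
        let emptied = match self.map.get_mut(&key) {
            Some(subjects) => {
                subjects.remove(subject);
                subjects.is_empty()
            }
            None => false,
        };
        if emptied {
            self.map.remove(&key);
        }
    }

    /// Number of subjects stored under one value-property key.
    pub fn set_len(&self, property: &str, value: &str) -> usize {
        self.map
            .get(&(value.to_string(), property.to_string()))
            .map_or(0, |subjects| subjects.len())
    }
}
```

The sets get smaller for values that are used with many different properties, but every Commit still lands in the same `Commit` / `isA` set, so that single set keeps growing with the number of commits.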
Store commits in a separate index
Since about 90% of the time on indexing on my local machine was for the commits, we might be able to skip these, or treat them differently. Would probably mean that some queries would no longer work for commits. Maybe the commit collection will need to change, in order to achieve this.
Post far fewer resources
This problem only arises if we have... lots of resources. Which we might not need. NestedResources for Documents might be the best place to start.
Use a different approach to (de-)serializing sled data.
If I understand correctly, rkyv allows for mutating resources without deserializing and reserializing the binary. I think this could be a significant part of the slowdown.
Make it a background process
If building an index is slow, that might not be a really big problem, as long as it happens in the background....
We could spin up an actix thread on server initialization, which iterates over all resources.
Or maybe introduce an `indexQueue` that contains a bunch of Atoms that have to be processed.
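A rough sketch of the queue idea, using a plain `std::thread` and an `mpsc` channel instead of an actix thread to keep it self-contained. The `Atom` struct, the `Index` alias, and `spawn_indexer` are hypothetical names for illustration only.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Hypothetical, simplified atom: one subject-property-value triple.
#[derive(Debug, Clone)]
pub struct Atom {
    pub subject: String,
    pub property: String,
    pub value: String,
}

/// Shared index: (value, property) -> subjects.
pub type Index = Arc<Mutex<BTreeMap<(String, String), BTreeSet<String>>>>;

/// Spawns a background worker that drains the queue into the index.
/// Returns the queue's sending end and a handle that yields the number
/// of processed atoms once the queue is closed.
pub fn spawn_indexer(index: Index) -> (mpsc::Sender<Atom>, thread::JoinHandle<usize>) {
    let (sender, receiver) = mpsc::channel::<Atom>();
    let handle = thread::spawn(move || {
        let mut processed = 0;
        // Blocks until an Atom arrives; ends once every Sender is dropped.
        for atom in receiver {
            let mut guard = index.lock().expect("index lock poisoned");
            guard
                .entry((atom.value, atom.property))
                .or_default()
                .insert(atom.subject);
            processed += 1;
        }
        processed
    });
    (sender, handle)
}
```

Request handlers would only push Atoms onto the queue with `sender.send(atom)` and return immediately. The trade-off is that queries may briefly see a stale index until the worker catches up.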
I'm pretty content with the current implementation. Closing for now.