Comments (11)

concretevitamin commented on August 20, 2024

I uploaded a screenshot of write capacity on the edgestore table: pic

It seems like writes are clustered in alternating periods of ~10 minutes. What are the loader instance and DynamoDB doing during the idle 10-minute periods in between?

from dynamodb-janusgraph-storage-backend.

amcp commented on August 20, 2024

Please see my detailed reply on your first issue. The alternating periods are due to writes alternating between the edgestore and graphindex tables. The writes will happen in parallel if you correctly estimate the required setting for storage.buffer-size.

Assuming you use the multiple-item data model throughout, a BatchGraph batch size of 1000, and 5 indexed properties per vertex, then counting the hidden exists property and the indexed id property, each vertex incurs about 13 key-column mutations. 13 * 1000 = 13,000, so I would double that to account for poorly formed input data and set storage.buffer-size to 26k in this case. About 7 writes per vertex would go to edgestore and 6 to graphindex, so if you provision 7000 wps on edgestore and 6000 wps on graphindex for the vertex load, you should be able to write about 1000 vertices per second, or 10 million vertices in a little less than three hours. The number of properties and property indexes determines how much throughput you need to provision for this data load.
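
The arithmetic above can be sketched in a few lines of Python. This is only a rough estimate under the assumptions stated in this thread: the function and parameter names are hypothetical (they are not part of the storage backend), and the per-table split (properties + 2 for edgestore, properties + 1 for graphindex) is inferred from the 7/6 example here, so it may not hold for other schemas.

```python
def estimate_vertex_load(indexed_props, batch_size, vertices_per_second, safety_factor=2):
    """Rough sizing for a vertex load in the multiple-item data model.

    Assumption (inferred from the example in this thread, not from the backend):
    every property on the vertex is indexed.
    """
    # edgestore: one mutation per property, plus the hidden exists property and the id property
    edgestore_per_vertex = indexed_props + 2
    # graphindex: one mutation per indexed property, plus the indexed id property
    graphindex_per_vertex = indexed_props + 1
    per_vertex = edgestore_per_vertex + graphindex_per_vertex
    return {
        "mutations_per_vertex": per_vertex,
        # doubled by default to leave room for poorly formed input data
        "storage_buffer_size": per_vertex * batch_size * safety_factor,
        "edgestore_wps": edgestore_per_vertex * vertices_per_second,
        "graphindex_wps": graphindex_per_vertex * vertices_per_second,
    }


estimate = estimate_vertex_load(indexed_props=5, batch_size=1000, vertices_per_second=1000)
for name, value in estimate.items():
    print(f"{name}: {value}")
```

With 5 indexed properties this gives 13 mutations per vertex, a storage.buffer-size of 26,000, and 7000 / 6000 wps for edgestore / graphindex, matching the numbers above.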

amcp commented on August 20, 2024

Oh, and thank you for the warm wishes in your first issue! We are excited to share the DynamoDB Storage Backend for Titan.

concretevitamin commented on August 20, 2024

@amcp Thanks so much for your quick & detailed responses!

I see where the issue is now. However, my graph is unfortunately complicated. Each vertex has 40 indexed properties, so each vertex incurs 42+41=83 mutations. Now let's assume storage.buffer-size is set to 83 * <buffer size in batch graph> (actually, one thing in the analysis that's not clear to me -- what's the recommended BatchGraph buffer size here? 1k, 100k?), so that we can fully utilize the 13000wps capacity.

For 10 million nodes, that's 10*10^6 * 83 / 13000 / 60 / 60, which is 18 hours! That's unfortunately too long. Does this calculation sound about right?
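
As a sanity check on that estimate, here is a minimal sketch of the same calculation. The helper names are hypothetical, and it assumes the load is limited only by total provisioned write capacity, with every mutation costing one write.

```python
def load_hours(vertices, mutations_per_vertex, total_wps):
    """Hours needed to write all mutations at a sustained total write rate."""
    return vertices * mutations_per_vertex / total_wps / 3600


def required_wps(vertices, mutations_per_vertex, target_hours):
    """Total writes per second needed to finish within target_hours."""
    return vertices * mutations_per_vertex / (target_hours * 3600)


hours = load_hours(vertices=10_000_000, mutations_per_vertex=83, total_wps=13_000)
print(f"estimated load time: {hours:.1f} hours")

wps = required_wps(vertices=10_000_000, mutations_per_vertex=83, target_hours=3)
print(f"wps needed to finish in 3 hours: {wps:.0f}")
```

At 13,000 wps this comes out to roughly 17.7 hours, in line with the 18-hour figure.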

I put in a service limit request yesterday to increase the write capacity, but unfortunately the weekend has hit. Is it possible for that request to be expedited?

concretevitamin commented on August 20, 2024

I'm now experimenting with the following: BatchGraph buffer size 200k, storage.buffer-size 16.8 million (83 * first number, with some scratch space), and s.d.client.executor.max-queue-length 16.8 million as well (since the comment says it should be as large).

Now the graphindex and edgestore writes are in parallel, but I consistently see only ~50-60 writes per second. This is in comparison to my previous run, where -- although not in parallel -- I saw ~5000 writes per second.

Any ideas?

amcp commented on August 20, 2024

Yes, make sure you up the rate limiters, thread pool, and max HTTP connections in addition to the provisioning of the tables.

amcp commented on August 20, 2024

Write rate limits:
https://github.com/awslabs/dynamodb-titan-storage-backend/blob/0.5.4/src/test/resources/META-INF/dynamodb_store_manager_test.properties#L46

HTTP connections:
https://github.com/awslabs/dynamodb-titan-storage-backend/blob/0.5.4/src/test/resources/META-INF/dynamodb_store_manager_test.properties#L95

Threads:
https://github.com/awslabs/dynamodb-titan-storage-backend/blob/0.5.4/src/test/resources/META-INF/dynamodb_store_manager_test.properties#L119

Queue length:
https://github.com/awslabs/dynamodb-titan-storage-backend/blob/0.5.4/src/test/resources/META-INF/dynamodb_store_manager_test.properties#L123

Other considerations:

  1. Are you running on EC2?
  2. If you still are not able to achieve your previous throughput, turn on metrics and we can troubleshoot this some more.
    https://github.com/awslabs/dynamodb-titan-storage-backend/blob/0.5.4/src/test/resources/META-INF/dynamodb_store_manager_test.properties#L156

amcp commented on August 20, 2024

Rule of thumb for threads and connections: 1 thread and 1 connection per 50 writes per second in multiple item data model.
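
To make the rule of thumb concrete, here is a small sketch. The function name is hypothetical, and the 50 writes per second per thread figure is the rule of thumb above for the multiple-item data model, not a measured value.

```python
import math


def threads_and_connections(target_wps, wps_per_thread=50):
    """Apply the rule of thumb: 1 thread and 1 connection per 50 writes per second."""
    count = math.ceil(target_wps / wps_per_thread)
    # the same count is used for the thread pool size and the max HTTP connections
    return count, count


threads, connections = threads_and_connections(target_wps=13_000)
print(f"threads: {threads}, connections: {connections}")
```

For a 13,000 wps target this suggests about 260 threads and 260 connections.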

amcp commented on August 20, 2024

Closing this issue due to lack of activity. If you run into any more issues loading data, please let us know.

concretevitamin commented on August 20, 2024

@amcp Thanks, Alex. I didn't get a chance to test your new suggestions. Will report back if I do.

BatteryAcid commented on August 20, 2024

"Rule of thumb for threads and connections: 1 thread and 1 connection per 50 writes per second in multiple item data model."

@amcp What's the rule of thumb for reads?
