Comments (3)
Parallel downloading implemented in: c004283 b9a4fc2 fb4bc71 d43b154
from us.
I would like to comment on this: "Writes, on the other hand, cannot be significantly sped up by adding parallelism. This is because, as previously mentioned, writing requires accessing all of the hosts, so you'll always be bottlenecked by the slowest host. (Imagine that you have two hosts: one that takes 1s per sector, and one that takes 10s. You might be able to upload all of the first host's data before the second has finished a single sector, but this doesn't improve your overall throughput; you still need to wait for the second host to finish.)"
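The quoted argument can be made concrete with a small model. It assumes, as the quote says, that every host must receive a shard of each sector, and additionally that each host's per-sector time is fixed no matter how many requests are in flight. `writeSeconds` is a hypothetical helper for illustration, not code from the project.

```go
package main

import "fmt"

// writeSeconds models writing `sectors` sectors when every host must store a
// shard of each sector. Hosts are written concurrently and each host's
// per-sector time is assumed fixed, so the write completes only when the
// slowest host has received all of its shards.
func writeSeconds(sectors int, perSector []float64) float64 {
	var slowest float64
	for _, d := range perSector {
		if d > slowest {
			slowest = d
		}
	}
	return float64(sectors) * slowest
}

func main() {
	// One host takes 1s per sector, the other 10s.
	fmt.Println(writeSeconds(10, []float64{1, 10})) // 100
}
```

Under these assumptions, adding more parallelism cannot push the total below `sectors * slowest`; only a faster slowest host can.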
First, you need to add some levels of abstraction so that more advanced algorithms can be implemented. I propose the following entities:
- job: the total data that must be uploaded
- chunk: a small piece of the job's data (this may not be needed)
- worker: a process that uploads data to a host (there will be several workers)
- pool: the set of available hosts
- database: stores the allocation map (which data went to which host)
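The proposed entities could be modeled as plain Go types. This is only a sketch: the type names, fields, and the `Chunks` helper are hypothetical, and the allocation map is kept in memory here, whereas the proposal calls for storing it in a database.

```go
package main

import "fmt"

// HostID identifies a host in the pool (hypothetical type).
type HostID string

// Job is the total data that must be uploaded.
type Job struct {
	ID   int
	Data []byte
}

// Chunk is a small piece of a job's data.
type Chunk struct {
	JobID  int
	Index  int
	Offset int
	Data   []byte
}

// Chunks splits the job into consecutive chunks of at most size bytes.
func (j Job) Chunks(size int) []Chunk {
	var cs []Chunk
	for off := 0; off < len(j.Data); off += size {
		end := off + size
		if end > len(j.Data) {
			end = len(j.Data)
		}
		cs = append(cs, Chunk{JobID: j.ID, Index: len(cs), Offset: off, Data: j.Data[off:end]})
	}
	return cs
}

// Worker uploads chunks to the host it is currently assigned to.
type Worker struct {
	ID   int
	Host HostID
}

// HostPool is the set of hosts available for uploading.
type HostPool struct {
	Available []HostID
}

// AllocationMap records which host stores each chunk (chunk index -> host).
// A real implementation would persist this in a database.
type AllocationMap map[int]HostID

func main() {
	job := Job{ID: 1, Data: make([]byte, 10)}
	pool := HostPool{Available: []HostID{"host-a", "host-b"}}
	alloc := AllocationMap{}
	// Round-robin placement, only to populate the map for this example.
	for _, c := range job.Chunks(4) {
		alloc[c.Index] = pool.Available[c.Index%len(pool.Available)]
	}
	fmt.Println(alloc) // map[0:host-a 1:host-b 2:host-a]
}
```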
Algorithm
- Suppose you have 1 TB (1024 GB) of data to upload to 2 hosts. An even split means each host stores no more than 512 GB, but in practice the data could also be spread across 3 or more hosts, or split unevenly, for example 724 GB to one host and 300 GB to the other. All of this should be configurable by users, because it depends on their security policy.
- When a job starts, some process must feed the data to each worker in chunks. Each worker is also a separate process; with 2 hosts you have 2 workers.
- When a worker requests the next chunk to upload, check its performance against a stored map of the chunks requested for this job. This lets you detect that one worker has already uploaded more data than another, and then give that worker not only the next chunk but also an additional host from the pool to upload to.
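The first step, splitting a job across hosts according to user-configured proportions, can be sketched as a weighted allocation. `allocate` is a hypothetical helper; it assumes non-negative weights and treats GB as whole units.

```go
package main

import "fmt"

// allocate splits totalGB across hosts in proportion to user-configured
// weights (assumed non-negative). Rounding leftovers are handed out one GB at
// a time to hosts with a positive weight, so the shares always sum to totalGB
// whenever at least one weight is positive.
func allocate(totalGB int, weights []int) []int {
	shares := make([]int, len(weights))
	sum := 0
	for _, w := range weights {
		sum += w
	}
	if sum <= 0 {
		return shares // nothing can be allocated
	}
	assigned := 0
	for i, w := range weights {
		shares[i] = totalGB * w / sum
		assigned += shares[i]
	}
	for i := 0; assigned < totalGB; i = (i + 1) % len(weights) {
		if weights[i] > 0 {
			shares[i]++
			assigned++
		}
	}
	return shares
}

func main() {
	fmt.Println(allocate(1024, []int{1, 1}))     // [512 512]
	fmt.Println(allocate(1024, []int{724, 300})) // [724 300]
	fmt.Println(allocate(1024, []int{1, 1, 1}))  // [342 341 341]
}
```

Weights of `724` and `300` reproduce the uneven 724 GB / 300 GB split from the text; equal weights give the even 512 GB split.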
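The request-driven scheduling in the last two steps might look like the following. Everything here is hypothetical: `Scheduler`, `Next`, the `lead` threshold, and the rule "nominate a spare host once a worker is `lead` bytes ahead of the slowest worker" are assumptions chosen for illustration. Progress is tracked per request (the map of requested chunks), not per confirmed upload.

```go
package main

import (
	"fmt"
	"sync"
)

// Assignment tells a worker which chunk to upload next and, optionally, a
// spare host from the pool that it has been nominated to use as well.
type Assignment struct {
	Chunk     int
	ExtraHost string // empty when no host was nominated
}

// Scheduler hands out chunks on request and keeps the map of how much data
// each worker has requested for the current job.
type Scheduler struct {
	mu        sync.Mutex
	next      int            // index of the next chunk to hand out
	total     int            // number of chunks in the job
	chunkSize int            // bytes per chunk
	lead      int            // bytes ahead of the slowest worker needed to earn a spare host
	requested map[string]int // worker -> bytes requested so far
	pool      []string       // spare hosts not yet nominated
}

// NewScheduler creates a scheduler for a job of `total` chunks.
func NewScheduler(total, chunkSize, lead int, workers, pool []string) *Scheduler {
	req := make(map[string]int, len(workers))
	for _, w := range workers {
		req[w] = 0
	}
	return &Scheduler{total: total, chunkSize: chunkSize, lead: lead, requested: req, pool: pool}
}

// Next returns the next assignment for worker, or false once every chunk
// has been handed out.
func (s *Scheduler) Next(worker string) (Assignment, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.next >= s.total {
		return Assignment{}, false
	}
	a := Assignment{Chunk: s.next}
	s.next++

	// Compare this worker's progress with the slowest worker's.
	slowest := -1
	for _, b := range s.requested {
		if slowest < 0 || b < slowest {
			slowest = b
		}
	}
	if s.requested[worker]-slowest >= s.lead && len(s.pool) > 0 {
		a.ExtraHost = s.pool[0]
		s.pool = s.pool[1:]
	}
	s.requested[worker] += s.chunkSize
	return a, true
}

func main() {
	sched := NewScheduler(5, 1, 2, []string{"fast", "slow"}, []string{"spare"})
	// The fast worker asks three times before the slow one asks at all;
	// on its third request it is 2 bytes ahead and is nominated "spare".
	for i := 0; i < 3; i++ {
		a, _ := sched.Next("fast")
		fmt.Printf("chunk %d extra %q\n", a.Chunk, a.ExtraHost)
	}
}
```

What the worker does with the nominated host (for example, uploading part of its share there) is left open, as in the proposal.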
This algorithm is very flexible and should improve upload speed even when some hosts are slow, because faster workers end up taking on more of the data.
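To check the speed claim under the proposal's own assumption that each chunk is stored on just one host, the sketch below compares a fixed even split with pull-based assignment, where each chunk goes to whichever host becomes idle first. It uses the 1s and 10s hosts from the quoted example. Both functions are hypothetical. If every host must store a shard of every chunk, as the quoted text says, neither strategy applies and the write is still bound by the slowest host.

```go
package main

import "fmt"

// evenSplit returns the finish time when chunks are divided equally among
// hosts up front; perChunk[i] is host i's seconds per chunk.
func evenSplit(chunks int, perChunk []float64) float64 {
	var makespan float64
	n := len(perChunk)
	for i, d := range perChunk {
		share := chunks / n
		if i < chunks%n {
			share++ // spread the remainder over the first hosts
		}
		if t := float64(share) * d; t > makespan {
			makespan = t
		}
	}
	return makespan
}

// pullBased returns the finish time when each chunk is given to whichever
// host becomes idle first (ties go to the lower index).
func pullBased(chunks int, perChunk []float64) float64 {
	free := make([]float64, len(perChunk)) // time each host next becomes idle
	for c := 0; c < chunks; c++ {
		best := 0
		for i := range free {
			if free[i] < free[best] {
				best = i
			}
		}
		free[best] += perChunk[best]
	}
	var makespan float64
	for _, t := range free {
		if t > makespan {
			makespan = t
		}
	}
	return makespan
}

func main() {
	hosts := []float64{1, 10} // seconds per chunk
	fmt.Println(evenSplit(10, hosts), pullBased(10, hosts)) // 50 10
}
```

With 10 chunks, the even split finishes after 50s and pull-based assignment after 10s, because the fast host ends up taking 9 of the 10 chunks.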
Closing this; we can address other forms of I/O parallelism in separate issues.
Related Issues (20)
- Transaction spends a nonexisting siacoin output
- Migrate to RHP3
- RenewContract: ReadResponse: consensus conflict: provided transaction set is standalone and invalid: transaction cannot have an output or payout that has zero value
- PseudoKV roadmap (discussion)
- NewEphemeralMetaDB misses initializing meta
- Hot swapping PseudoKV contracts
- AddShard uses bucketChunks to calculate the next ID
- Tooling to retry failed contract renewals
- Conversion from untyped int to string yields a string of one rune, not a string of digits
- BoltMetaDB.ForEachBlob can cause a deadlock
- PseudoKV migration acceleration
- Data race in ParallelBlobDownloader
- Rejected for low paying host missed output
- TestCompatibility takes more than 10min and panics
- Could not find the desired sector
- Does migration re-upload everything?
- Lock: couldn't read LoopLock response: unexpected EOF
- Delete files in parallel (ParallelSectorDeleter)
- HostSet.Close forgets releasing locks
- tryLock is not released properly