
Support parallel downloads · lukechampine/us · 3 comments · CLOSED

lukechampine commented on June 9, 2024
Support parallel downloads


Comments (3)

lukechampine commented on June 9, 2024

Parallel downloading implemented in: c004283 b9a4fc2 fb4bc71 d43b154


vargrant commented on June 9, 2024

I would like to comment on this: "Writes, on the other hand, cannot be significantly sped up by adding parallelism. This is because, as previously mentioned, writing requires accessing all of the hosts, so you'll always be bottlenecked by the slowest host. (Imagine that you have two hosts: one that takes 1s per sector, and one that takes 10s. You might be able to upload all of the first host's data before the second has finished a single sector, but this doesn't improve your overall throughput; you still need to wait for the second host to finish.)"
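To make the quoted bottleneck concrete, here is a minimal Go sketch (Go being the language of the us repo). The per-sector times are the hypothetical ones from the quote; the function name is illustrative, not part of the project.

```go
package main

import "fmt"

// naiveWriteTime models the quoted argument: every write must reach every
// host, so total time is sectors * the slowest host's per-sector time,
// no matter how parallel the uploads are.
func naiveWriteTime(perSectorTimes []float64, sectors int) float64 {
	slowest := 0.0
	for _, t := range perSectorTimes {
		if t > slowest {
			slowest = t
		}
	}
	return float64(sectors) * slowest
}

func main() {
	// Two hosts: 1s/sector and 10s/sector, writing 5 sectors to each.
	fmt.Printf("%.0fs\n", naiveWriteTime([]float64{1, 10}, 5)) // prints 50s
}
```

The fast host finishes in 5s, but the write as a whole still takes 50s.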

First, you need to add some levels of abstraction to make it possible to implement a more advanced algorithm. I propose the following entities:

  1. job - the total data that must be uploaded
  2. chunk - a small piece of the job's data (this one may not be strictly necessary)
  3. worker - a process that uploads data to one host (you will have several workers)
  4. pool of available hosts
  5. some database to store the map of chunk allocations
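The entities above might be sketched in Go as follows. These types and the chunking helper are hypothetical illustrations of the proposal, not the project's actual API.

```go
package main

import "fmt"

// Chunk is a small piece of a job's data.
type Chunk struct {
	Index int
	Data  []byte
}

// Job is the total data to upload, split into chunks.
type Job struct {
	Chunks []Chunk
}

// Worker uploads chunks to a single host.
type Worker struct {
	Host string
}

// HostPool is the set of hosts available for assignment.
type HostPool struct {
	Hosts []string
}

// AllocationMap records which host stores which chunk, standing in
// for the proposed "database" of allocations.
type AllocationMap map[int]string

// newJob splits data into fixed-size chunks.
func newJob(data []byte, chunkSize int) Job {
	var job Job
	for i := 0; i < len(data); i += chunkSize {
		end := i + chunkSize
		if end > len(data) {
			end = len(data)
		}
		job.Chunks = append(job.Chunks, Chunk{Index: i / chunkSize, Data: data[i:end]})
	}
	return job
}

func main() {
	job := newJob(make([]byte, 10), 4)
	fmt.Println(len(job.Chunks)) // 10 bytes in 4-byte chunks -> prints 3
}
```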

Algorithm

  1. Suppose you have 1 TB (1024 GB) of data and need to upload it to 2 hosts. That means each host should hold no more than 512 GB, but in practice the data could be spread across 3 or more hosts. It might also be possible to upload 724 GB to one host and 300 GB to the other. All of this must be configurable by users, because it depends on their security policy.
  2. When a job starts, some process must feed chunks of data to each worker. A worker is also a separate process; you have 2 workers for 2 hosts.
  3. When a worker requests its next chunk to upload, you check its performance against a stored map of the chunks already requested for this job. If you detect that this worker has uploaded more data than the others, you can give it not only the next chunk but also nominate an additional host from your pool for it to upload to.

This algorithm is very flexible and will improve upload speed even when some hosts are slow.


lukechampine commented on June 9, 2024

Closing this; we can address other forms of I/O parallelism in separate issues.

