Comments (7)
There are many levers to pull, actually. How are you setting the pool, what kind of benchmark are you running, and do you have an idea of what is causing your current bottleneck? Since fsspec generally maintains its own IO thread/loop, a significant increase in performance is something I'd be happy to bake in.
from s3fs.
@martindurant I am currently passing `config_kwargs={"max_pool_connections": 50}` to the `S3FileSystem`.
I checked the peak transfer rate with iftop: it was only about 50 Mb/s out of 1 Gbps of network capacity (AKS -> LakeFS on AKS -> Azure Blob). It took around 15 s to read 6000 txt files. I think it could go faster, but I'm not sure :)
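The figures above already allow a rough sanity check. This is only back-of-envelope arithmetic on the numbers reported in this comment; the function names are hypothetical, and the Little's-law step assumes all 50 pool connections were busy for the entire run, which was not measured.

```python
# Back-of-envelope figures from the numbers reported above:
# 6000 files in ~15 s, iftop peak ~50 Mb/s on a 1 Gbps link.


def files_per_second(n_files: int, seconds: float) -> float:
    """Average request rate over the whole run."""
    return n_files / seconds


def link_utilisation(observed_mbps: float, capacity_mbps: float) -> float:
    """Fraction of the link's capacity actually in use."""
    return observed_mbps / capacity_mbps


def mean_latency_ms(concurrency: int, rate_per_s: float) -> float:
    """Little's law: in-flight = rate * latency, so latency = in-flight / rate."""
    return 1000.0 * concurrency / rate_per_s


if __name__ == "__main__":
    rate = files_per_second(6000, 15.0)
    print(f"{rate:.0f} files/s")                       # 400 files/s
    print(f"{link_utilisation(50, 1000):.0%} of link")  # 5% of link
    # Assumption: all 50 pool connections busy the whole time.
    print(f"{mean_latency_ms(50, rate):.0f} ms per request")  # 125 ms per request
```

At roughly 400 requests/s with about 5% of the link in use, the run looks more latency-bound than bandwidth-bound, though that is an inference from these numbers, not a measurement.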
Would you mind making a graph of `max_pool_connections` versus throughput? How many files (~ coroutines) are in flight at once?
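A graph like this can be produced with a small sweep harness. The sketch below is a simulation using only asyncio, not a measurement of s3fs: `fake_fetch` is a hypothetical stand-in that models each read as a fixed latency, and an `asyncio.Semaphore` plays the role of the pool size that `max_pool_connections` controls. To measure the real setup, you would swap `fake_fetch` for an actual async read against the bucket.

```python
import asyncio
import time


async def fake_fetch(path: str, latency_s: float) -> bytes:
    # Hypothetical stand-in for one remote read: pure latency, no payload.
    await asyncio.sleep(latency_s)
    return b""


async def run_batch(paths: list, limit: int, latency_s: float) -> float:
    """Read all paths with at most `limit` in flight; return files per second."""
    sem = asyncio.Semaphore(limit)  # models the connection pool size

    async def one(path: str) -> bytes:
        async with sem:
            return await fake_fetch(path, latency_s)

    start = time.perf_counter()
    await asyncio.gather(*(one(p) for p in paths))
    return len(paths) / (time.perf_counter() - start)


def sweep(limits, n_files: int = 100, latency_s: float = 0.005) -> dict:
    """Map each concurrency limit to measured throughput (files/s)."""
    paths = [f"bucket/file-{i}.txt" for i in range(n_files)]  # hypothetical keys
    return {lim: asyncio.run(run_batch(paths, lim, latency_s)) for lim in limits}


if __name__ == "__main__":
    for limit, rate in sweep([1, 10, 50]).items():
        print(f"limit={limit:>3}  {rate:8.0f} files/s")
```

With pure latency, simulated throughput rises roughly in proportion to the limit. If a real curve flattens much earlier, the cap is probably somewhere other than the pool, for example in how many coroutines fsspec runs at once or in the LakeFS hop.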
@martindurant, do you have any examples of how to access these values during execution?
- I thought throughput was exactly what you were already measuring.
- You should be able to get the number of files from a normal `glob` or `expand_path` call.
- You could perhaps use callbacks to measure the coroutines, but you would probably need to hack something into `fsspec.asyn._runner`.
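One way to see the in-flight count without depending on fsspec internals is to wrap whichever coroutine function does the per-file read with a counter that records the peak. This sketch uses only asyncio; `InFlight`, `fetch` and `demo` are hypothetical names, the sleep stands in for a real read, and the semaphore again models the pool limit. Where to attach the wrapper in s3fs/fsspec is left open, since, as noted above, that would likely mean patching internals.

```python
import asyncio
import functools


class InFlight:
    """Hypothetical helper: counts wrapped coroutines running at the same time."""

    def __init__(self) -> None:
        self.current = 0
        self.peak = 0

    def track(self, coro_fn):
        @functools.wraps(coro_fn)
        async def wrapper(*args, **kwargs):
            self.current += 1
            self.peak = max(self.peak, self.current)
            try:
                return await coro_fn(*args, **kwargs)
            finally:
                self.current -= 1  # runs even if the read raises
        return wrapper


async def demo(n_tasks: int, limit: int) -> int:
    counter = InFlight()

    @counter.track
    async def fetch(i: int) -> int:
        await asyncio.sleep(0.01)  # stand-in for a remote read
        return i

    sem = asyncio.Semaphore(limit)  # models the pool limit

    async def bounded(i: int) -> int:
        async with sem:
            return await fetch(i)

    await asyncio.gather(*(bounded(i) for i in range(n_tasks)))
    return counter.peak


if __name__ == "__main__":
    print(asyncio.run(demo(n_tasks=100, limit=16)))  # 16
```

Logging `counter.current` periodically, rather than only the peak, would show how concurrency evolves over a run.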