Comments (7)
Quick question before going further: if you use GCS, why not use gcsfs?
Secondly, rm_file (or async version) does I think exactly what you want, but only on one file at a time, with no recursion/expansion.
from s3fs.
Thanks for your quick response, @martindurant !
Quick question before going further: if you use GCS, why not use gcsfs?
That's something we can consider. My initial thought is that we simultaneously want to support AWS S3, GCS, seaweedfs, and localstack with a single common filesystem abstraction. So far s3fs has been able to do that for us pretty well, with the exception of not having convenient access to call the delete_object API (although there may be more inconsistencies we haven't run into yet).
Secondly, rm_file (or async version) does I think exactly what you want, but only on one file at a time, with no recursion/expansion.
I might be missing something, but from what I can tell, rm_file will invoke _rm, which on an instance of S3FileSystem will force the use of the delete_objects API. The implementation of _rm on the parent class (AsyncFileSystem) has exactly what I'm looking for (access to the delete_object endpoint that even supports recursive), but I haven't found a blocking/synchronous entrypoint for that logic on the S3FileSystem class...
a single common filesystem abstraction
The various implementations of fsspec are designed to be as close to each other as possible. Unfortunately, there can be many optional features, and S3 is particularly guilty of this, since it's not quite a standard, and the permissioning model is complex.
rm_file will invoke _rm
This is the current implementation of _rm_file:
async def _rm_file(self, path, **kwargs):
    bucket, key, _ = self.split_path(path)
    self.invalidate_cache(path)
    try:
        await self._call_s3("delete_object", Bucket=bucket, Key=key)
    except ClientError as e:
        raise translate_boto_error(e)
and you can see the call to "delete_object". The AsyncFileSystem's _rm calls _rm_file, which is why calling the superclass works for you. The "bulk delete" offered by S3FileSystem's _rm is more efficient for most people.
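To make the difference between the two _rm code paths concrete, here is a minimal toy sketch using only the standard library. ToyBaseFS, ToyBulkFS, _expand and the calls list are hypothetical names invented for illustration; they are not fsspec or s3fs classes, and the real implementations differ in detail (batching, error handling, cache invalidation).

```python
import asyncio


class ToyBaseFS:
    """Hypothetical base class: _rm issues one single-object delete per path."""

    def __init__(self, keys):
        self.keys = set(keys)
        self.calls = []  # records every simulated API request

    async def _rm_file(self, path):
        # Stands in for a single "delete_object" request
        self.calls.append(("delete_object", path))
        self.keys.discard(path)

    async def _expand(self, path, recursive):
        # Very simplified path expansion: match the prefix when recursive
        if recursive:
            return sorted(
                k for k in self.keys if k == path or k.startswith(path + "/")
            )
        return [path]

    async def _rm(self, path, recursive=False):
        paths = await self._expand(path, recursive)
        await asyncio.gather(*(self._rm_file(p) for p in paths))


class ToyBulkFS(ToyBaseFS):
    """Hypothetical subclass: _rm sends batched multi-object deletes instead."""

    batch = 1000  # assumed batch size for the toy

    async def _rm(self, path, recursive=False):
        paths = await self._expand(path, recursive)
        for i in range(0, len(paths), self.batch):
            chunk = paths[i:i + self.batch]
            # Stands in for a single "delete_objects" request
            self.calls.append(("delete_objects", tuple(chunk)))
            self.keys.difference_update(chunk)


keys = ["b/x/1", "b/x/2", "b/x/3"]
per_file, bulk, fallback = ToyBaseFS(keys), ToyBulkFS(keys), ToyBulkFS(keys)

asyncio.run(per_file._rm("b/x", recursive=True))
asyncio.run(bulk._rm("b/x", recursive=True))
# Calling the parent's _rm on the bulk instance uses per-object deletes again
asyncio.run(ToyBaseFS._rm(fallback, "b/x", recursive=True))
```

In this toy, the subclass needs one request for the whole prefix, while the parent's _rm needs one request per key but never touches the multi-object endpoint, which is the trade-off discussed above.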
rm_file will invoke _rm
This is the current implementation of _rm_file:
Thanks for pointing that out. Indeed, the private method S3FileSystem._rm_file would also work for me, although I'd lose out on the convenience of recursive that AsyncFileSystem._rm has.
Unfortunately, there also doesn't appear to be a public/blocking/synchronous entrypoint for calling _rm_file. The public rm_file API doesn't actually invoke _rm_file (copied below from fsspec).
def rm_file(self, path):
    """Delete a file"""
    self._rm(path)
I think my suggested solution for s3fs would be to implement an override for the AbstractFileSystem's rm_file method that calls _rm_file instead of _rm (ideally in strong imitation of AsyncFileSystem._rm). However, I'm not sure how taboo it would be to override a method from the spec class...
(ideally in strong imitation of AsyncFileSystem._rm). However, I'm not sure how taboo it would be to override a method from the spec class...
Actually, I can see that imitating AsyncFileSystem._rm would not be appropriate since it would diverge from the method signature of AbstractFileSystem.rm_file, so just overriding it to call _rm_file instead of _rm would seem more appropriate.
The public rm_file API doesn't actually invoke _rm_file
It does! Sync variants of all the async methods listed in AsyncFileSystem are auto-generated.
In [4]: fs = fsspec.filesystem("s3")
In [6]: fs.rm_file??
Signature: fs.rm_file(path, **kwargs)
Docstring: Delete a file
Source:
async def _rm_file(self, path, **kwargs):
    bucket, key, _ = self.split_path(path)
    self.invalidate_cache(path)
    try:
        await self._call_s3("delete_object", Bucket=bucket, Key=key)
    except ClientError as e:
        raise translate_boto_error(e)
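As a rough illustration of what "auto-generated" means here, the sketch below mirrors underscore-prefixed coroutine methods as blocking methods on a toy class. It uses only the standard library; make_sync and ToyAsyncFS are hypothetical names, and asyncio.run is a simplification (as far as I know, fsspec runs coroutines on a dedicated background event loop instead).

```python
import asyncio
import functools
import inspect


def make_sync(coro_func):
    """Return a blocking function that runs coro_func to completion."""

    @functools.wraps(coro_func)
    def wrapper(*args, **kwargs):
        # Simplification for the toy: start a fresh event loop per call
        return asyncio.run(coro_func(*args, **kwargs))

    return wrapper


class ToyAsyncFS:
    """Hypothetical in-memory filesystem, not part of fsspec or s3fs."""

    # Names of coroutine methods that should get a blocking twin
    async_methods = ["_rm_file", "_cat_file"]

    def __init__(self):
        self.store = {"bucket/a": b"1", "bucket/b": b"2"}
        for name in self.async_methods:
            method = getattr(self, name, None)
            if inspect.iscoroutinefunction(method):
                # "_rm_file" becomes the public, blocking "rm_file"
                setattr(self, name[1:], make_sync(method))

    async def _rm_file(self, path):
        """Delete a file"""
        del self.store[path]

    async def _cat_file(self, path):
        """Return the bytes stored at path"""
        return self.store[path]


fs = ToyAsyncFS()
fs.rm_file("bucket/a")  # blocking call that runs _rm_file under the hood
remaining = sorted(fs.store)
```

Because the wrapper copies the coroutine's metadata, introspecting the blocking method shows the docstring of the underlying async method, which is consistent with the IPython output above displaying the source of _rm_file for fs.rm_file.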
The public rm_file API doesn't actually invoke _rm_file
It does! Sync variants of all the async methods listed in AsyncFileSystem are auto-generated.
Fantastic! I didn't realize that it would be sort of auto-generated. Thanks!