Comments (7)

martindurant avatar martindurant commented on August 11, 2024

Quick question before going further: if you use GCS, why not use gcsfs?

Secondly, rm_file (or async version) does I think exactly what you want, but only on one file at a time, with no recursion/expansion.

from s3fs.

tboddyspargo avatar tboddyspargo commented on August 11, 2024

Thanks for your quick response, @martindurant!

> Quick question before going further: if you use GCS, why not use gcsfs?

That's something we can consider. My initial thought is that we want to support AWS S3, GCS, SeaweedFS, and LocalStack simultaneously through a single common filesystem abstraction. So far s3fs has done that for us pretty well, except that it offers no convenient way to call the delete_object API (although there may be more inconsistencies we haven't run into yet).

> Secondly, rm_file (or async version) does I think exactly what you want, but only on one file at a time, with no recursion/expansion.

I might be missing something, but from what I can tell, rm_file will invoke _rm, which on an instance of S3FileSystem will force the use of the delete_objects API. The implementation of _rm on the parent class (AsyncFileSystem) has exactly what I'm looking for (access to the delete_object endpoint that even supports recursive), but I haven't found a blocking/synchronous entrypoint for that logic on the S3FileSystem class...

martindurant avatar martindurant commented on August 11, 2024

> a single common filesystem abstraction

The various implementations of fsspec are designed to be as close to each other as possible. Unfortunately, there can be many optional features, and S3 is particularly guilty of this, since it's not quite a standard, and the permissioning model is complex.

> rm_file will invoke _rm

This is the current implementation of _rm_file:

    async def _rm_file(self, path, **kwargs):
        bucket, key, _ = self.split_path(path)
        self.invalidate_cache(path)

        try:
            await self._call_s3("delete_object", Bucket=bucket, Key=key)
        except ClientError as e:
            raise translate_boto_error(e)

and you can see the call to "delete_object". The AsyncFileSystem's _rm calls _rm_file, which is why calling the superclass works for you. The "bulk delete" offered by S3FileSystem's _rm is more efficient for most people.
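
To make the dispatch concrete, here is a toy model of the two code paths. It is only a sketch: ToyAsyncFileSystem and ToyS3FileSystem are hypothetical stand-ins, not the real fsspec/s3fs classes, and the real _rm also expands paths and batches requests. The point is just that the subclass's _rm issues one bulk request, while the parent's _rm falls back to one _rm_file call per path.

```python
import asyncio


class ToyAsyncFileSystem:
    """Hypothetical stand-in for the generic base class."""

    def __init__(self):
        self.api_calls = []  # records (operation, payload) for inspection

    async def _rm_file(self, path):
        raise NotImplementedError

    async def _rm(self, paths):
        # Generic behaviour: one _rm_file call per path.
        for path in paths:
            await self._rm_file(path)


class ToyS3FileSystem(ToyAsyncFileSystem):
    """Hypothetical stand-in for the S3 subclass."""

    async def _rm_file(self, path):
        # Single-object delete, analogous to the "delete_object" call.
        self.api_calls.append(("delete_object", path))

    async def _rm(self, paths):
        # Bulk delete: one request covering all paths.
        self.api_calls.append(("delete_objects", tuple(paths)))


def demo():
    paths = ["bucket/a", "bucket/b"]

    bulk = ToyS3FileSystem()
    asyncio.run(bulk._rm(paths))  # subclass override: bulk request

    single = ToyS3FileSystem()
    # Explicitly call the parent implementation on the subclass instance.
    asyncio.run(ToyAsyncFileSystem._rm(single, paths))
    return bulk.api_calls, single.api_calls
```

Calling demo() returns the recorded calls for each path, so you can see one "delete_objects" entry in the first list and one "delete_object" entry per path in the second.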

tboddyspargo avatar tboddyspargo commented on August 11, 2024

> > rm_file will invoke _rm
>
> This is the current implementation of _rm_file:

Thanks for pointing that out. Indeed, the private method S3FileSystem._rm_file would also work for me, although I'd lose out on the convenience of recursive that AsyncFileSystem._rm has.

Unfortunately, there also doesn't appear to be a public/blocking/synchronous entrypoint for calling _rm_file. The public rm_file API doesn't actually invoke _rm_file (copied below from fsspec).

    def rm_file(self, path):
        """Delete a file"""
        self._rm(path)

My suggestion for s3fs would be to override AbstractFileSystem's rm_file method so that it calls _rm_file instead of _rm (ideally in strong imitation of AsyncFileSystem._rm). However, I'm not sure how taboo it would be to override a method from the spec class...

tboddyspargo avatar tboddyspargo commented on August 11, 2024

> (ideally in strong imitation of AsyncFileSystem._rm). However, I'm not sure how taboo it would be to override a method from the spec class...

Actually, I can see that imitating AsyncFileSystem._rm would not be appropriate since it would diverge from the method signature of AbstractFileSystem.rm_file, so just overriding it to call _rm_file instead of _rm would seem more appropriate.

martindurant avatar martindurant commented on August 11, 2024

> The public rm_file API doesn't actually invoke _rm_file

It does! Sync variants of all the async methods listed in AsyncFileSystem are auto-generated.

In [4]: fs = fsspec.filesystem("s3")

In [6]: fs.rm_file??
Signature: fs.rm_file(path, **kwargs)
Docstring: Delete a file
Source:
    async def _rm_file(self, path, **kwargs):
        bucket, key, _ = self.split_path(path)
        self.invalidate_cache(path)

        try:
            await self._call_s3("delete_object", Bucket=bucket, Key=key)
        except ClientError as e:
            raise translate_boto_error(e)
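
For anyone wondering how a blocking rm_file can end up showing the source of _rm_file, the general idea can be sketched as below. This is a simplified illustration, not fsspec's actual code: sync_wrapper_toy, mirror_sync_methods_toy and ToyFileSystem are hypothetical names, and the real machinery differs in detail (for instance, this sketch simply calls asyncio.run for each call).

```python
import asyncio
import functools
import inspect


def sync_wrapper_toy(async_method):
    """Return a blocking function that runs `async_method` to completion."""

    @functools.wraps(async_method)  # keeps the docstring and a link to the original
    def wrapper(self, *args, **kwargs):
        return asyncio.run(async_method(self, *args, **kwargs))

    return wrapper


def mirror_sync_methods_toy(cls):
    """For each coroutine method `_name`, add a blocking `name` if none exists."""
    for attr, value in list(vars(cls).items()):
        is_private = attr.startswith("_") and not attr.startswith("__")
        if is_private and inspect.iscoroutinefunction(value):
            public = attr[1:]
            if public not in vars(cls):
                setattr(cls, public, sync_wrapper_toy(value))
    return cls


@mirror_sync_methods_toy
class ToyFileSystem:
    def __init__(self):
        self.deleted = []

    async def _rm_file(self, path):
        """Delete a file"""
        self.deleted.append(path)
```

With this in place, ToyFileSystem().rm_file("bucket/key") runs the _rm_file coroutine synchronously, even though no rm_file method was written by hand.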

tboddyspargo avatar tboddyspargo commented on August 11, 2024

> > The public rm_file API doesn't actually invoke _rm_file
>
> It does! Sync variants of all the async methods listed in AsyncFileSystem are auto-generated.

Fantastic! I didn't realize that it would be sort of auto-generated. Thanks!
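
For reference, the recursive convenience mentioned earlier can be approximated on top of rm_file with a small helper. This is a sketch under assumptions: rm_one_by_one is a hypothetical helper, not part of s3fs, and it relies only on the find and rm_file methods of the filesystem object. FakeFS is a hypothetical in-memory stand-in so the example runs without a bucket; with a real filesystem you would pass the object returned by fsspec.filesystem("s3") instead.

```python
def rm_one_by_one(fs, path):
    """Delete every file under `path`, one rm_file call per file."""
    targets = list(fs.find(path))  # recursive listing of files below `path`
    for target in targets:
        fs.rm_file(target)  # single-object delete for each file
    return targets


class FakeFS:
    """Hypothetical in-memory stand-in exposing only find() and rm_file()."""

    def __init__(self, files):
        self.files = set(files)

    def find(self, path):
        prefix = path.rstrip("/") + "/"
        return sorted(f for f in self.files if f == path or f.startswith(prefix))

    def rm_file(self, path):
        self.files.remove(path)


fs = FakeFS(["bucket/data/a.csv", "bucket/data/sub/b.csv", "bucket/other.txt"])
deleted = rm_one_by_one(fs, "bucket/data")
```

The trade-off is the one discussed above: this issues one request per object, so it is slower than the bulk delete, but it avoids the delete_objects endpoint entirely.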
