
Comments (3)

mdavis-xyz commented on July 19, 2024

Actually, I just noticed that .seek() is never called. It merely needs to be defined; it doesn't have to work.
Therefore I think this is a bug in the code, not a documentation issue.

import os  # needed for os.SEEK_SET in the seek() signature below
from io import BytesIO
import polars as pl

class MyFile:
    def __init__(self, f):
        self.f = f
        self.closed = False

    def read(self, size=-1):
        assert not self.closed
        return self.f.read(size)

    def seek(self, pos, whence=os.SEEK_SET):
        raise NotImplementedError()
    
    def close(self):
        self.closed = True
        self.f.close()


csv = b"""a,b
1,2
3,4
"""

with BytesIO(csv) as bio:
    mf = MyFile(bio)
    pl.read_csv(mf)

This code does not raise NotImplementedError. The DataFrame is created successfully.


cjackal commented on July 19, 2024

I think the polars documentation should point directly to the Python standard definition of a file-like object, which explicitly says that a file-like object should implement IOBase, rather than giving a vague rephrasing of it.


mdavis-xyz commented on July 19, 2024

I can confirm that when I inherit from IOBase there is no error. (Even if I set seekable() to return True, seek is still not called.)
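For anyone who wants to reproduce this, here is a minimal sketch of the kind of IOBase subclass I mean. The class name and the seek_calls counter are my own illustrative additions, not anything polars defines; the counter is just there so you can check afterwards whether seek() was ever invoked.

```python
import io
import os


class MyFile(io.IOBase):
    """Read-only wrapper around a binary stream that counts seek() calls."""

    def __init__(self, f):
        super().__init__()
        self.f = f
        self.seek_calls = 0  # illustrative counter, not part of any API

    def readable(self):
        return True

    def seekable(self):
        # Claims to be seekable even though seek() below always fails.
        return True

    def read(self, size=-1):
        return self.f.read(size)

    def seek(self, pos, whence=os.SEEK_SET):
        # Record the attempt, then refuse: seeking is deliberately unsupported.
        self.seek_calls += 1
        raise NotImplementedError("seek is deliberately unsupported")

    def close(self):
        # Close the wrapped stream first, then let IOBase mark us as closed.
        if not self.closed:
            self.f.close()
        super().close()
```

Passing an instance wrapped around the same BytesIO(csv) to pl.read_csv and then inspecting seek_calls is one way to check whether seek() is touched on your polars version.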

That said, the definition you referenced does not actually define a file-like object as a subclass of io.IOBase. It says only that a file-like object is an object with some unspecified subset of the methods defined in the io module. So the example class I gave does technically satisfy this definition. (And according to the docs for io.IOBase, the signature of .read() and other methods doesn't even have to be the same.)

So the docs for polars still need to define which methods are required. For example, open('file.csv', 'w') gives you a file-like object, but obviously that's not suitable. That the object must be readable is obvious, but it's not obvious whether .readline() or .seek() must be implemented. (And it's possible to implement some seek capabilities without others, e.g. forward but not backward, or no seeking relative to the end of the file.) What about being iterable? Merely saying the object is subclassed from io.IOBase does not clarify this.
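To make the point concrete, here is a rough sketch of what an explicit requirement could look like using typing.Protocol. The SupportsRead protocol and the is_usable_source helper are hypothetical names I made up for illustration; they are not what polars checks today.

```python
import io
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsRead(Protocol):
    """Hypothetical minimal interface: only read() is required."""

    def read(self, size: int = -1) -> bytes: ...


def is_usable_source(obj) -> bool:
    """Return True if obj has read() and does not report itself unreadable."""
    # isinstance() on a runtime_checkable Protocol only checks that the
    # method exists, not that it works or has the right signature.
    if not isinstance(obj, SupportsRead):
        return False
    # If the object offers readable(), trust it; otherwise assume readable.
    readable = getattr(obj, "readable", None)
    return bool(readable()) if callable(readable) else True


print(is_usable_source(io.BytesIO(b"a,b\n1,2\n")))
```

A check like this requires neither seek() nor io.IOBase, while a file opened in write mode would still be rejected, since it has a read() method but its readable() returns False.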

I think that if polars throws an error when .seek() is not defined, but doesn't actually call seek, then that's not in the spirit of duck typing. So I'm still of the view that the behaviour of read_csv should be changed to not require seek() to be defined. And if we don't do that, then just document that it needs to be defined but doesn't need to work.

