Comments (3)
Actually, I just noticed that .seek() is never called. It only needs to be defined; it doesn't have to work.
So I think this is a bug in the code, not a documentation issue.
import os
from io import BytesIO

import polars as pl


class MyFile:
    def __init__(self, f):
        self.f = f
        self.closed = False

    def read(self, size=-1):
        assert not self.closed
        return self.f.read(size)

    # seek() is defined but deliberately unusable: any call raises.
    def seek(self, pos, whence=os.SEEK_SET):
        raise NotImplementedError()

    def close(self):
        self.closed = True
        self.f.close()


csv = b"""a,b
1,2
3,4
"""

with BytesIO(csv) as bio:
    mf = MyFile(bio)
    pl.read_csv(mf)
This code does not raise a NotImplementedError. The dataframe is created successfully.
I think the polars documentation should point directly to the Python standard definition of file-like object, which explicitly says that a file-like object should implement IOBase, rather than giving a vague rephrasing of it.
I can confirm that when I inherit from IOBase there is no error. (Even if I set seekable() to return True, seek is still not called.)
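For anyone who wants to repeat that check, here is a rough sketch of the kind of wrapper I mean. The name TracingFile and the seek_calls list are made up for illustration; the snippet only exercises the wrapper with a plain read(), so it says nothing by itself about what polars does internally.

```python
import io


class TracingFile(io.IOBase):
    """Hypothetical IOBase subclass that records every seek() call."""

    def __init__(self, raw):
        super().__init__()
        self._raw = raw
        self.seek_calls = []  # (pos, whence) tuples, one per seek() call

    def readable(self):
        return True

    def seekable(self):
        return True

    def read(self, size=-1):
        return self._raw.read(size)

    def seek(self, pos, whence=io.SEEK_SET):
        # Record the call before delegating, so callers can inspect it later.
        self.seek_calls.append((pos, whence))
        return self._raw.seek(pos, whence)


tf = TracingFile(io.BytesIO(b"a,b\n1,2\n3,4\n"))
data = tf.read()
print(tf.seek_calls)  # stays empty unless a consumer actually calls seek()
```

Passing an instance like this to a reader and inspecting seek_calls afterwards shows whether seek was really used, rather than just required to exist.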
Although, that definition you referenced does not actually define a file-like object as a subclass of io.IOBase. It says merely that a file-like object is an object with some unspecified subset of the methods defined in the io module. So the example class I gave does technically satisfy this definition. (And the signature of .read() and other methods doesn't even have to be the same, according to the docs for io.IOBase.)
So the polars docs still need to define which methods are required. For example, open('file.csv', 'w') gives you a file-like object, but obviously that's not suitable. Being readable is obvious, but it's not obvious whether .readline() or .seek() must be implemented. (And it's possible to have some seek capabilities implemented without others, e.g. forward but not backward, or no seeking relative to the end of the file.) What about being iterable? Merely saying it's subclassed from io.IOBase does not clarify this.
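To make the ambiguity concrete, here is a small sketch using only the standard library. The helper stream_capabilities is a name I made up; it just reports which reader-related methods an object exposes and what readable()/seekable() claim.

```python
import io
import os
import tempfile


def stream_capabilities(obj):
    """Hypothetical helper: report reader-related methods and declared capabilities."""
    caps = {}
    for name in ("read", "readline", "seek", "__iter__"):
        caps[name] = callable(getattr(obj, name, None))
    # readable()/seekable() only exist on io-style objects; None means "not declared".
    for name in ("readable", "seekable"):
        probe = getattr(obj, name, None)
        caps[name + "()"] = probe() if callable(probe) else None
    return caps


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.csv")
    with open(path, "w") as write_only:
        write_caps = stream_capabilities(write_only)

read_caps = stream_capabilities(io.BytesIO(b"a,b\n1,2\n"))

# The write-only file still has a read attribute, but readable() is False.
print(write_caps["read"], write_caps["readable()"])
print(read_caps["read"], read_caps["readable()"])
```

Both objects are "file-like" in the loose sense and both have .read and .seek attributes, yet only one of them can actually be read from, which is why listing the required methods explicitly would help.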
I think that if polars throws an error when .seek() is not defined, but doesn't actually call seek, then that's not in the spirit of duck typing. So I'm still of the view that read_csv should be changed so that it does not require seek() to be defined. If we don't do that, then the docs should at least state that it needs to be defined but doesn't need to work.
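As an illustration of what I mean by duck typing here, a minimal sketch in plain Python. This is not how polars is implemented; read_all_bytes and ReadOnlySource are hypothetical names. The consumer checks only for the method it is going to call.

```python
import io


def read_all_bytes(source):
    """Hypothetical consumer: requires read() and never touches seek()."""
    read = getattr(source, "read", None)
    if not callable(read):
        raise TypeError("source must provide a callable read()")
    data = read()
    # Accept text sources too by encoding them; this choice is an assumption.
    if isinstance(data, str):
        data = data.encode("utf-8")
    return data


class ReadOnlySource:
    """Minimal object with read() only; seek() is not defined at all."""

    def __init__(self, payload):
        self._buf = io.BytesIO(payload)

    def read(self, size=-1):
        return self._buf.read(size)


rows = read_all_bytes(ReadOnlySource(b"a,b\n1,2\n3,4\n")).splitlines()
print(rows)
```

With this approach an object lacking seek() is accepted as long as nothing needs to seek, and an object lacking read() is rejected with a clear message.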