In pandas, I use the `read_csv` function with the parameter `sep='\s+'` to split the data:
```python
import pandas as pd
df = pd.read_csv(filename, header=None, skiprows=6, sep=r'\s+')  # raw string avoids an invalid-escape warning
```
Yeah, this also works, but as I said, Polars currently does not support a regex or multi-character string separator, only a single character.
There are workarounds, but they are not very nice 😆
```python
import polars as pl

DATA = """\
11.50225 34.62792 341.48861 60.23845 33.86916 340.52216
16.08011 46.36068 112.74108 82.09562 45.90745 112.68871
5.44448 64.20202 84.74526 92.26079 63.48149 84.83877
154.21007 40.30874 284.20968 248.08102 40.32464 284.05453
44.78606 81.08370 306.90320 207.53215 80.58101 307.01056
187.79354 52.18742 348.14328 254.43741 52.35809 348.16040
3.19632 58.35471 336.89014 83.53841 59.67276 335.88022
4.53459 54.00255 23.75481 66.02106 51.58699 23.86702
"""

# Read each line as one string column (no commas, so the default "," never splits it).
pl.read_csv(DATA.encode(), has_header=False, new_columns=["data"]).with_columns(
    pl.col("data")
    .str.strip_chars(" ")        # drop leading/trailing spaces
    .str.replace_all(" +", " ")  # collapse runs of spaces into one
    .str.split(" ")              # split into a list of tokens
    .list.to_struct()            # list -> struct with field_0, field_1, ...
).unnest(columns="data").with_columns(pl.all().cast(pl.Float64))  # struct -> Float64 columns
```
```
shape: (8, 6)
┌───────────┬──────────┬───────────┬───────────┬──────────┬───────────┐
│ field_0 ┆ field_1 ┆ field_2 ┆ field_3 ┆ field_4 ┆ field_5 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
╞═══════════╪══════════╪═══════════╪═══════════╪══════════╪═══════════╡
│ 11.50225 ┆ 34.62792 ┆ 341.48861 ┆ 60.23845 ┆ 33.86916 ┆ 340.52216 │
│ 16.08011 ┆ 46.36068 ┆ 112.74108 ┆ 82.09562 ┆ 45.90745 ┆ 112.68871 │
│ 5.44448 ┆ 64.20202 ┆ 84.74526 ┆ 92.26079 ┆ 63.48149 ┆ 84.83877 │
│ 154.21007 ┆ 40.30874 ┆ 284.20968 ┆ 248.08102 ┆ 40.32464 ┆ 284.05453 │
│ 44.78606 ┆ 81.0837 ┆ 306.9032 ┆ 207.53215 ┆ 80.58101 ┆ 307.01056 │
│ 187.79354 ┆ 52.18742 ┆ 348.14328 ┆ 254.43741 ┆ 52.35809 ┆ 348.1604 │
│ 3.19632 ┆ 58.35471 ┆ 336.89014 ┆ 83.53841 ┆ 59.67276 ┆ 335.88022 │
│ 4.53459 ┆ 54.00255 ┆ 23.75481 ┆ 66.02106 ┆ 51.58699 ┆ 23.86702 │
└───────────┴──────────┴───────────┴───────────┴──────────┴───────────┘
```
However, if the file is not huge, the best way is probably to read the data, replace every run of whitespace (`\s+`) with `,`, and then use Polars' `read_csv` on the "clean" CSV.
AFAIK this is currently not possible with Polars, because the separator must be a single character.
What you are looking for is the equivalent of pandas `read_fwf`, which reads "fixed-width formatted" data (https://pandas.pydata.org/docs/reference/api/pandas.read_fwf.html).
There are a few issues about this already, but it is not yet supported.
No, because the `separator` parameter of the `read_csv` method only accepts a single-byte character.
As answered above: this is not possible.