Comments (5)

Esword618 commented on July 22, 2024

In pandas, I use the read_csv function with the separator parameter sep=r'\s+' to split the data on runs of whitespace.

import pandas as pd
df = pd.read_csv(filename, header=None, skiprows=6, sep=r'\s+')  # raw string avoids an invalid-escape warning

from polars.

JulianCologne commented on July 22, 2024

yeah, this works in pandas, but as I said, polars currently does not support a regex or multi-character string as the separator, only a single character.

there are workarounds but they are not very nice 😆

import polars as pl

DATA = """\
     11.50225    34.62792   341.48861    60.23845    33.86916   340.52216
     16.08011    46.36068   112.74108    82.09562    45.90745   112.68871
      5.44448    64.20202    84.74526    92.26079    63.48149    84.83877
    154.21007    40.30874   284.20968   248.08102    40.32464   284.05453
     44.78606    81.08370   306.90320   207.53215    80.58101   307.01056
    187.79354    52.18742   348.14328   254.43741    52.35809   348.16040
      3.19632    58.35471   336.89014    83.53841    59.67276   335.88022
      4.53459    54.00255    23.75481    66.02106    51.58699    23.86702
"""

pl.read_csv(DATA.encode(), has_header=False, new_columns=["data"]).with_columns(
    pl.col("data")
    .str.strip_chars(" ")
    .str.replace_all(" +", " ")
    .str.split(" ")
    .list.to_struct()
).unnest(columns="data").with_columns(pl.all().cast(pl.Float64))

shape: (8, 6)
┌───────────┬──────────┬───────────┬───────────┬──────────┬───────────┐
│ field_0   ┆ field_1  ┆ field_2   ┆ field_3   ┆ field_4  ┆ field_5   │
│ ---       ┆ ---      ┆ ---       ┆ ---       ┆ ---      ┆ ---       │
│ f64       ┆ f64      ┆ f64       ┆ f64       ┆ f64      ┆ f64       │
╞═══════════╪══════════╪═══════════╪═══════════╪══════════╪═══════════╡
│ 11.50225  ┆ 34.62792 ┆ 341.48861 ┆ 60.23845  ┆ 33.86916 ┆ 340.52216 │
│ 16.08011  ┆ 46.36068 ┆ 112.74108 ┆ 82.09562  ┆ 45.90745 ┆ 112.68871 │
│ 5.44448   ┆ 64.20202 ┆ 84.74526  ┆ 92.26079  ┆ 63.48149 ┆ 84.83877  │
│ 154.21007 ┆ 40.30874 ┆ 284.20968 ┆ 248.08102 ┆ 40.32464 ┆ 284.05453 │
│ 44.78606  ┆ 81.0837  ┆ 306.9032  ┆ 207.53215 ┆ 80.58101 ┆ 307.01056 │
│ 187.79354 ┆ 52.18742 ┆ 348.14328 ┆ 254.43741 ┆ 52.35809 ┆ 348.1604  │
│ 3.19632   ┆ 58.35471 ┆ 336.89014 ┆ 83.53841  ┆ 59.67276 ┆ 335.88022 │
│ 4.53459   ┆ 54.00255 ┆ 23.75481  ┆ 66.02106  ┆ 51.58699 ┆ 23.86702  │
└───────────┴──────────┴───────────┴───────────┴──────────┴───────────┘

However, if the file is not huge, the best way is probably to read the data as text, replace every run of whitespace (\s+) with ',' and then read_csv the "clean" CSV using polars.
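A minimal sketch of that preprocessing idea, assuming the whole file fits in memory. The helper names `whitespace_to_csv` and `read_whitespace_separated` are made up for illustration and are not part of polars; only `re.sub` and `pl.read_csv` are real library calls.

```python
import io
import re


def whitespace_to_csv(text: str) -> str:
    """Turn whitespace-separated lines into comma-separated lines (hypothetical helper)."""
    cleaned = []
    for line in text.splitlines():
        stripped = line.strip()  # drop the leading padding so no empty first field appears
        if stripped:  # skip blank lines
            cleaned.append(re.sub(r"\s+", ",", stripped))
    return "\n".join(cleaned) + "\n"


def read_whitespace_separated(text: str):
    """Clean the text, then let polars parse it as ordinary CSV (hypothetical helper)."""
    import polars as pl  # imported here so the text helper works without polars installed

    csv_bytes = whitespace_to_csv(text).encode()
    return pl.read_csv(io.BytesIO(csv_bytes), has_header=False)
```

For a file on disk, something like `read_whitespace_separated(open(filename).read())` should do; any header lines to skip (the pandas example used skiprows=6) would have to be dropped before or during the cleaning step.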

JulianCologne commented on July 22, 2024

afaik this is not possible with polars currently because the separator must be a single character.

what you are looking for is the equivalent of pandas read_fwf to read "fixed-width-formatted" data (https://pandas.pydata.org/docs/reference/api/pandas.read_fwf.html)
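For reference, a small sketch of what read_fwf does on the pandas side. The sample text and the column widths of 8 characters are assumptions chosen for this illustration, not values from the original file.

```python
import io

import pandas as pd

# Two right-aligned columns, each exactly 8 characters wide (illustrative data).
SAMPLE = "    1.50    2.25\n   10.00   20.50\n"

# widths gives the character width of each fixed-width field.
df = pd.read_fwf(io.StringIO(SAMPLE), widths=[8, 8], header=None)
print(df)
```

If the widths are not known up front, read_fwf can also try to infer the column boundaries from the first rows when neither widths nor colspecs is passed.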

there are a few issues already but it is not yet supported.

#8312 #3151

IsmaelMousa commented on July 22, 2024

no, because the separator parameter of the read_csv method only accepts a single-byte character.

stinodego commented on July 22, 2024

As answered above: this is not possible.
