Comments (7)

stinodego commented on July 20, 2024

Why do you think this should be convertible to a float? Looks like bad input to me.

from polars.

jonashaag commented on July 20, 2024

I found this in a random CSV file that I wanted to parse. I guess it's a relatively rare but legitimate ("non-")character that's inserted by Excel etc. for number formatting.

mcrumiller commented on July 20, 2024

Was it the first two bytes? FE FF is the optional byte-order mark for big-endian UTF-16, and it's probably ending up in your CSV because the file was exported with a specified encoding and byte order. I've had plenty of CSVs that get wonky due to this mark, and I usually open them and re-encode them as UTF-8 in Notepad++ or something similar.

I don't know whether Polars checks for a byte-order mark. Maybe it should, but if so, that should be its own issue. It doesn't seem to have caused many problems for people up until now.
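
For reference, a leading byte-order mark can be checked for before a file is parsed. This is a minimal sketch using only Python's standard library. `detect_bom` is a hypothetical helper name, not a Polars function, and the sample bytes are made up for illustration.

```python
import codecs

# Longer marks first: the UTF-32 LE mark begins with the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a known BOM, else None."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None


# Illustrative input: a UTF-16 BE payload preceded by the bytes FE FF.
raw = codecs.BOM_UTF16_BE + "a;b\n".encode("utf-16-be")
found = detect_bom(raw)
if found is not None:
    encoding, skip = found
    text = raw[skip:].decode(encoding)  # decode without the leading mark
```

This only covers a mark at the very start of the data; it says nothing about a U+FEFF that shows up later in the file.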

jonashaag commented on July 20, 2024

No, it was between two separators, like this:

transaction_id;amount
abcdef;-\ufeff42

mcrumiller commented on July 20, 2024

According to https://www.unicode.org/faq/utf_bom.html#bom6, that shouldn't be there at all. I'd do a replace prior to trying to do anything with that string.
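
A rough sketch of that replace step, using the sample from this thread and only Python's standard library (the variable names are illustrative):

```python
import csv
import io

BOM = "\ufeff"  # U+FEFF, invisible when printed

# The sample from this thread, with the stray character after the minus sign.
raw = "transaction_id;amount\nabcdef;-\ufeff42\n"

# Drop every U+FEFF before parsing, wherever it appears in the text.
cleaned = raw.replace(BOM, "")

rows = list(csv.DictReader(io.StringIO(cleaned), delimiter=";"))
amounts = [float(row["amount"]) for row in rows]
```

Without the replace, `float("-\ufeff42")` raises `ValueError` in plain Python too. In Polars, the same idea would presumably be to read the column as a string, apply something like `str.replace_all` to it, and cast afterwards, but that variant isn't tested here.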

jonashaag commented on July 20, 2024

Hmm, OK, sounds like an actual bug in my data then.

The only annoying thing that remains as a potential todo is improving how the character is displayed in the error message. I'm not sure how to display it, though, since it's a "zero width" character...
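
One possible approach, sketched in plain Python, is to escape anything non-printable before putting the value into a message. `show_invisible` is a hypothetical helper for illustration, not how Polars formats its errors.

```python
import unicodedata


def show_invisible(value: str) -> str:
    """Replace non-printable characters with a visible escape sequence."""
    parts = []
    for ch in value:
        if ch.isprintable():
            parts.append(ch)
        elif ord(ch) <= 0xFFFF:
            parts.append(f"\\u{ord(ch):04x}")
        else:
            parts.append(f"\\U{ord(ch):08x}")
    return "".join(parts)


bad = "-\ufeff42"
visible = show_invisible(bad)  # the hidden character becomes a literal \ufeff
name = unicodedata.name("\ufeff")  # official Unicode name of the character
```

Python's built-in `ascii()` gives a similar result, though it also escapes every non-ASCII character, including visible ones.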

stinodego commented on July 20, 2024

Yeah, this doesn't sound like something Polars should fix. Closing then!