Giter Site home page Giter Site logo

Comments (3)

RDWimmers avatar RDWimmers commented on June 2, 2024 1

I see two options now, both breaking changes:

  • Require more content about the file argument (e.g. with a string_content=["path","file"] argument, but that would only have a meaning for file arguments of type str (kinda ugly)
  • As pandas.read_csv() does: Accept str | io.StringIO types for file and always infer the str type as a Path, and the io.StringIO as file content. This would require the user to convert a string to StringIO object

I like the second one. we can set the type of filepath_or_buffer to Path | io.StringIO. then we dont use strings and can use the Path.is_file() method.

from pygef.

tlukkezen avatar tlukkezen commented on June 2, 2024

In the current implementation, we just try to parse the possible contents of file and if it fails we move on to the next. The last parsing option that fails (parsing as bro-xml by default) is the one that provides the error. This is random behaviour and is the reason that the error that is returned doesn't reflect the actual problem.

I see the following resolutions now:

  • Require more content about the file argument (e.g. with a string_content=["path","file"] argument, but that would only have a meaning for file arguments of type str (kinda ugly) (Breaking change)

  • As pandas.read_csv() does: Accept str | io.StringIO types for file and always infer the str type as a Path, and the io.StringIO as file content. This would require the user to convert a string to StringIO object. (Breaking change)

  • Return a general custom exception (e.g. CPTParsingError) when all parsing options have failed. Although this will still return a ValueError for an erroneous gef-file path.

from pygef.

tversteeg avatar tversteeg commented on June 2, 2024

When a string has the form of a path separated by slashes or even a single short word it can never be a valid XML or GEF file right? All XML files need to start with a < character, so that's easy, and all GEF files have the form of key: value, so I think a proper heuristic would be:

Is almost certainly path if:

  1. Is the length <= 255 characters
  2. Does it end with case-insensitive .gef or .xml
  3. Does it not contain any < or : characters
  4. Can it be opened from disk

from pygef.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.