pygef's People

Contributors

breinbaas, dependabot[bot], ic144, jmmaljaars, maarten-betman, martinapippi, rdwimmers, ritchie46, sboonstraabt, tlukkezen, tversteeg


pygef's Issues

Redundant columns. Can be derived.

Some columns can be derived from other columns, much like a property decorator in Python classes.

Reduce redundant data by:

self._df = <dataframe with base columns>

@property
def df(self):
    # chain the assignments so both derived columns end up in the result
    return self._df.assign(
        derived_a=self._df["a"] + 2,
        derived_b=self._df["b"] + 2,
    )

Bug in the grouping algorithm

The following code produces an empty DataFrame:

import os
from pygef import Cpt

path_cpt = os.path.join(os.environ.get("DOC_PATH"), "../pygef/test_files/cpt.gef")

cpt = Cpt(path_cpt)
cpt.classify(classification="robertson", do_grouping=True, min_thickness=0.2, water_level_NAP=-10)

BUG: Parsing of zid is not working as expected

Current behaviour:

The regex string #ZID[=\s+]+[^,]*[,\s+]+([^,]+) in pygef.utils line 127 is not working as expected.

How to reproduce:

Insert the following text in https://regex101.com/. Using #ZID[=\s+]+[^,]*[,\s+]+([^,]+) the zid will not be parsed correctly.
#TESTID = B38C2094
#XYID = 31000,108025,432470
#ZID = 31000,-1.5
#MEASUREMENTTEXT = 9, maaiveld, vast horizontaal niveau

Possible solution:

Use #ZID[=\s+]+[^,]*[,\s+]+([^?!,$|\s$]+).
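The two patterns can be compared side by side with Python's `re` module, using the multi-line sample from the reproduction above:

```python
import re

text = """#TESTID = B38C2094
#XYID = 31000,108025,432470
#ZID = 31000,-1.5
#MEASUREMENTTEXT = 9, maaiveld, vast horizontaal niveau"""

# Current pattern: [^,]+ also matches newlines, so the capture group
# runs past the end of the #ZID line into the next header.
current = re.search(r"#ZID[=\s+]+[^,]*[,\s+]+([^,]+)", text).group(1)

# Proposed pattern: excluding whitespace from the capture stops it at -1.5.
proposed = re.search(r"#ZID[=\s+]+[^,]*[,\s+]+([^?!,$|\s$]+)", text).group(1)

print(repr(current))   # captures '-1.5' plus the start of the next header
print(repr(proposed))  # '-1.5'
```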

inclination column voids not handled well when correcting for depth

Had an issue where the column voids in the inclination column are used to correct for the depth. In combination with a pre-excavation this leads to an incorrect starting depth with respect to the reference level.

EDIT
In more detail: if the first row of the GEF contains a void in the inclination column, with a value such as -9999, that value is used in the depth correction. This leads to significant errors in the corrected depth when a large pre-excavated depth is present. The desired solution is to handle voids before the sentinel values that indicate a void can enter any calculation.
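A minimal sketch of the desired order of operations, using a hypothetical inclination column and -9999 as the #COLUMNVOID sentinel:

```python
VOID = -9999.0  # sentinel declared via #COLUMNVOID in the GEF header

# First row holds a void, as in the report above.
inclination = [VOID, 1.2, 1.3]

# Replace voids *before* any depth correction touches the column,
# so the sentinel can never leak into a calculation.
cleaned = [None if value == VOID else value for value in inclination]
print(cleaned)  # [None, 1.2, 1.3]
```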

Unable to read gef file

Hello,

I have been trying to read a gef file using the following code:

from pygef import Cpt

Cpt(r"..\GO\46358_10.GEF")

Unfortunately I get an error:

thread '<unnamed>' panicked at 'python apply failed: Any(InvalidOperation("abs not supportedd for series of type Float64"))', src\lazy\apply.rs:35:19
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "C:\Users\921479\Desktop\GEO\Z002\notebooks\test_gef.py", line 5, in <module>
    Cpt(r"..\GO\46358_10.GEF")
  File "C:\Users\921479\Desktop\GEO\Z002\notebooks\.venv\lib\site-packages\pygef\cpt.py", line 159, in __init__
    parsed = _GefCpt(path)
  File "C:\Users\921479\Desktop\GEO\Z002\notebooks\.venv\lib\site-packages\pygef\gef.py", line 276, in __init__
    self.parse_data(self._headers, self._data, column_names)
  File "C:\Users\921479\Desktop\GEO\Z002\notebooks\.venv\lib\site-packages\polars\lazy\frame.py", line 299, in collect
    return pl.eager.frame.wrap_df(ldf.collect())
pyo3_runtime.PanicException: python apply failed: Any(InvalidOperation("abs not supportedd for series of type Float64"))

I think there is a problem in the .collect() statement when the cpt data is converted to a DataFrame. Has anyone come across this issue before? Does there exist a workaround? Thanks for your consideration.

GEF with single line data, can't be read

Hi,
I have several borehole gef files, and in every gef where the description has a single line (see the attached file below) the code returns

self._df = PyDataFrame.read_csv(

RuntimeError: Any(NoData("empty csv"))

Can you support this? Below is the gef file I am referring to:

#GEFID= 1, 1, 0
#FILEOWNER= DataWS
#FILEDATE= 2022, 12, 22
#PROJECTID= Lob van Gennep, 2102701 HB, -
#COLUMN= 2
#COLUMNINFO= 1, m, Laag van, 1
#COLUMNINFO= 2, m, Laag tot, 2
#COMPANYID= -, -, 31
#DATAFORMAT= ASCII
#COLUMNSEPARATOR= ;
#COLUMNTEXT= 1
#LASTSCAN= 1
#XYID= 31000, 196276.20, 412672.60, 0.01, 0.01
#ZID= 31000, 13.22, 0.01
#MEASUREMENTCODE= NEN5104, 1, 0, 0, NNI 1989
#MEASUREMENTTEXT= 3, -, plaatsnaam boring
#MEASUREMENTTEXT= 5, 2022-03-02, datum boorbeschrijving
#MEASUREMENTTEXT= 6, Tla, beschrijver lagen
#MEASUREMENTTEXT= 7, 31000, locaal coördinatiesysteem
#MEASUREMENTTEXT= 8, 31000, locaal referentiesysteem
#MEASUREMENTTEXT= 9, maaiveld, vast horizontaal niveau
#MEASUREMENTTEXT= 13, -, boorbedrijf
#MEASUREMENTTEXT= 14, Nee, openbaar
#MEASUREMENTTEXT= 16, 2022-03-02, datum boring
#MEASUREMENTTEXT= 18, Nee, Peilbuis aanwezig
#MEASUREMENTTEXT= 23, Tla, naam boormeester
#MEASUREMENTTEXT= 31, EDM, boormethode1
#MEASUREMENTVAR= 16, 2.500000, m, eind diepte boring
#MEASUREMENTVAR= 31, 2.500000, m, diepte onderkant boortraject1
#SPECIMENTEXT= 11, 1, monstercode monster1
#SPECIMENTEXT= 12, 2022-03-02, datum monster1
#SPECIMENTEXT= 13, 13:18:29, tijd monster1
#SPECIMENTEXT= 14, G, (on)geroerd monster1
#SPECIMENTEXT= 18, 2, monstercode monster2
#SPECIMENTEXT= 19, 2022-03-02, datum monster2
#SPECIMENTEXT= 20, 13:18:29, tijd monster2
#SPECIMENTEXT= 21, G, (on)geroerd monster2
#SPECIMENTEXT= 25, 3, monstercode monster3
#SPECIMENTEXT= 26, 2022-03-02, datum monster3
#SPECIMENTEXT= 27, 13:18:29, tijd monster3
#SPECIMENTEXT= 28, G, (on)geroerd monster3
#SPECIMENVAR= 1, 3.000000, -, aantal monsters
#SPECIMENVAR= 12, 1.000000, m, onderkant monster1
#SPECIMENVAR= 18, 1.000000, m, bovenkant monster2
#SPECIMENVAR= 19, 2.000000, m, onderkant monster2
#SPECIMENVAR= 25, 2.000000, m, bovenkant monster3
#SPECIMENVAR= 26, 2.500000, m, onderkant monster3
#PROCEDURECODE= GEF-BORE-Report, 1, 0, 0, -
#TESTID= 183.HB4
#REPORTCODE= GEF-BORE-Report, 1, 0, 0, -
#RECORDSEPARATOR= !
#OS= DOS
#LANGUAGE= NL
#EOH=
0.0000e+000;2.5000e+000;'Kz3';;'DO BR';;'KHRD';!

Plot error with grouped option

Running the following code

def test_plot_classification_grouped(self):
    gef = Cpt("./tests/test_files/cpt.gef")
    gef.plot(
        show=False,
        classification="three_type_rule",
        do_grouping=True,
        min_thickness=0.2,
        water_level_NAP=-10,
    )

with this cpt

throws an error:

  File "c:\Users\brein\Documents\Development\Python\pygef\.env\lib\site-packages\matplotlib\axes\_axes.py", line 2381, in bar
    bottom = y - height / 2
TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "C:\Python310\lib\unittest\case.py", line 59, in testPartExecutor
    yield
  File "C:\Python310\lib\unittest\case.py", line 591, in run
    self._callTestMethod(testMethod)
  File "C:\Python310\lib\unittest\case.py", line 549, in _callTestMethod
    method()
  File "c:\Users\brein\Documents\Development\Python\pygef\tests\test_plot.py", line 45, in test_plot_classification_grouped
    gef.plot(
  File "c:\Users\brein\Documents\Development\Python\pygef\pygef\cpt.py", line 355, in plot
    return plot.plot_cpt(
  File "c:\Users\brein\Documents\Development\Python\pygef\pygef\plot_utils.py", line 119, in plot_cpt
    fig = add_grouped_classification(
  File "c:\Users\brein\Documents\Development\Python\pygef\pygef\plot_utils.py", line 250, in add_grouped_classification
    plt.barh(
  File "c:\Users\brein\Documents\Development\Python\pygef\.env\lib\site-packages\matplotlib\pyplot.py", line 2403, in barh
    return gca().barh(
  File "c:\Users\brein\Documents\Development\Python\pygef\.env\lib\site-packages\matplotlib\axes\_axes.py", line 2551, in barh
    patches = self.bar(x=left, height=height, width=width, bottom=y,
  File "c:\Users\brein\Documents\Development\Python\pygef\.env\lib\site-packages\matplotlib\__init__.py", line 1412, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "c:\Users\brein\Documents\Development\Python\pygef\.env\lib\site-packages\matplotlib\axes\_axes.py", line 2383, in bar
    raise TypeError(f'the dtypes of parameters y ({y.dtype}) '
TypeError: the dtypes of parameters y (object) and height (object) are incompatible

Change "NAP" to "ref" or allow only NAP as reference system

Currently it is assumed that the height system is always NAP, but this is not always the case. The attribute ParseGEF.height_system can also differ from the one associated with NAP.


We can change the name "NAP" to the more generic "ref" in the whole code.

Import from gpkg

There are now CPTs saved in gpkg format. I have a reader where the user provides a box (or polygon) with coordinates, and it reads all CPTs that exist there (and plots them in vtk file format, perhaps irrelevant here).
It would be great if you could make your package compatible with this file format. It saves a ton of time compared to bringing in each xml/gef separately (plus you don't need an xml reader anymore).

Is this something you can support?

bro_reader - Copy.txt

Test IDs that contain spaces are not fully parsed

If a test ID contains spaces, e.g. #TESTID= CPT 01, then it is parsed as CPT. This becomes problematic when a series of CPTs is enumerated in this way.

The desired solution would be to parse everything on the line after #TESTID=, so in the example this would result in CPT 01.
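A minimal sketch of the proposed fix; the pattern below is an illustration, not the exact regex used in pygef:

```python
import re

line = "#TESTID= CPT 01"

# Capture everything after '#TESTID=' up to the end of the line,
# instead of stopping at the first whitespace.
match = re.search(r"#TESTID\s*=\s*(.*)", line)
test_id = match.group(1).strip()
print(test_id)  # CPT 01
```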

Fix: empty column is added when data row doesn't end with COLUMNSEPARATOR

A new, empty column is added when the rows in the cpt data end with the column-separator value alone (in _GefCpt.parse_data()).

The last (redundant) column separator is currently only removed when it is followed by "!".

Solution

Data records should always be stripped of trailing column AND record separators, even if one of them is not present.
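A sketch of the proposed stripping logic; strip_record is a hypothetical helper, not an existing pygef function:

```python
def strip_record(line: str, col_sep: str = ";", rec_sep: str = "!") -> str:
    """Strip trailing record and column separators, whether or not both are present."""
    line = line.strip()
    if rec_sep and line.endswith(rec_sep):
        line = line[: -len(rec_sep)]
    if col_sep and line.endswith(col_sep):
        line = line[: -len(col_sep)]
    return line

print(strip_record("0.0;2.5;'Kz3';!"))  # 0.0;2.5;'Kz3'
print(strip_record("0.0;2.5;'Kz3';"))   # 0.0;2.5;'Kz3'
```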

Feature request: Extract Cone Id

Request that feature is added to extract cone_id.
Rationale is:

  • We compare cone_ids to the calibration certificates provided by the soil investigation subcontractors;
  • We track how many meters and tests the individual cones have done, which can flag cones for recalibration;
  • We monitor the zero-drift over time;
  • The cone id is saved in a database linked to the corresponding tests, so if a cone appears faulty, all previous tests can be inspected.

I will make a PR with this feature

Handle columnvoid headers for each column separately

Currently, the first encountered valid #COLUMNVOID header is used and replaced with None in all columns. The other #COLUMNVOID values are effectively ignored.

Desired functionality

The #COLUMNVOID values are provided for each column separately in the .gef file and should be applied to the corresponding columns only.

polars internal errors

for polars[pyarrow]<0.16.3:

TypeError: with_columns() takes from 1 to 2 positional arguments but 3 were given

bump polars to 0.16.3

Robertson classification and pandas

In pandas 1.4.3, when I classify a CPT, I get an error:
'DataFrame' object does not support 'Series' assignment by index. Use 'DataFrame.with_columns'

Add Coordinate Reference System EPSG codes as attributes of Cpt object

The pygef objects currently have coordinates (e.g. x, y and z) but have no proper universal Coordinate Reference System (CRS) definition. Only the "height_system" is a vertical CRS attribute, but the codes are linked to the GEF format, which is not a universal standard.

As a user I would like to be able to access an attribute of a Cpt object with the EPSG codes for both the horizontal (x & y) and vertical (z) oriented coordinates. These could be two attributes, e.g. xy_epsg, z_epsg.

EPSG codes are universally recognized geodetic definitions and have a scope way beyond geotechnical engineering, which makes a spatial object defined with epsg codes easy to work with by anyone.

The few CRS codes that are defined in the GEF format can be mapped to an EPSG code upon parsing, which will make the "height_system" attribute obsolete. See for instance the EPSG of RD and NAP (most commonly used in the Netherlands)
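A sketch of what such a mapping could look like; the EPSG codes below (28992 for Amersfoort / RD New, 5709 for NAP height) are my best understanding and should be verified against the GEF standard:

```python
# Hypothetical mapping from GEF coordinate-system codes to EPSG codes;
# 31000 is the code used for RD / NAP in Dutch GEF files.
GEF_XY_TO_EPSG = {31000: 28992}  # horizontal CRS per #XYID code
GEF_Z_TO_EPSG = {31000: 5709}    # vertical CRS per #ZID code

xyid_code, zid_code = 31000, 31000  # first fields of #XYID and #ZID
xy_epsg = GEF_XY_TO_EPSG.get(xyid_code)
z_epsg = GEF_Z_TO_EPSG.get(zid_code)
print(xy_epsg, z_epsg)  # 28992 5709
```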

Add use_old_naming flag

This flag should initially default to True and, after a new major version, default to False.

Bug plot boreholes

The soil type NBE (not a recognized soil) seems to generate a wrong plot.

path_bore = os.path.join(os.environ.get("DOC_PATH"), "../pygef/test_files/example_bore.gef")

bore = Bore(path_bore)
bore.plot()


read_cpt should return dedicated exception if file is not gef or xml

If the user provides a file to read_cpt that is neither in the correct .gef nor .xml format, pygef tries to read it as an xml file and throws confusing exceptions.

Specifically, if a user provides a .gef file with an erroneous format, the user gets BroXMLParser exceptions, which makes no sense.

Expected result
The user should get an insightful exception (e.g. custom UnknownFileFormatException) that explains that the provided file cannot be parsed as a .gef or .xml
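A minimal sketch of the expected behaviour; UnknownFileFormatException and detect_file_type are hypothetical names, not existing pygef API:

```python
class UnknownFileFormatException(Exception):
    """Raised when a file can be parsed neither as .gef nor as .xml."""

def detect_file_type(path: str) -> str:
    # Decide on the file suffix before dispatching to a parser,
    # instead of falling through to the XML parser.
    lower = path.lower()
    if lower.endswith(".gef"):
        return "gef"
    if lower.endswith(".xml"):
        return "xml"
    raise UnknownFileFormatException(
        f"{path!r} cannot be parsed as a .gef or .xml file"
    )

print(detect_file_type("cpt.gef"))  # gef
```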

Remove pandas dependency

There's still a pandas dependency needed for some pl.from_pandas() and df.to_pandas() calls. When these have been removed pandas can also be removed from the requirements.

Allow import/export from JSON

The current format is not serializable, so it can't be trivially sent over a network connection. We need to add Cpt.from_json() and Cpt.to_json() methods (and the same for Borehole).
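A sketch of what the round trip could look like; cpt_to_json/cpt_from_json and the selected fields are hypothetical, not the eventual pygef API:

```python
import json

def cpt_to_json(zid: float, test_id: str, data: dict) -> str:
    # Serialize the header attributes plus the data columns to a JSON string.
    return json.dumps({"zid": zid, "test_id": test_id, "data": data})

def cpt_from_json(s: str) -> dict:
    # Restore the plain-dict representation from a JSON string.
    return json.loads(s)

payload = cpt_to_json(-1.5, "CPT 01", {"depth": [0.0, 0.02], "qc": [1.1, 1.2]})
restored = cpt_from_json(payload)
print(restored["test_id"])  # CPT 01
```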

Bug parse CPT as a result of a Polars PanicException

Parsing the following cpt gives an error:

/opt/conda/lib/python3.7/site-packages/pygef/cpt.py in __init__(self, path, content)
    151             assert content["string"] is not None, "content['string'] must be specified"
    152             if content["file_type"] == "gef":
--> 153                 parsed = _GefCpt(string=content["string"])
    154             elif content["file_type"] == "xml":
    155                 parsed = _BroXmlCpt(string=content["string"])

/opt/conda/lib/python3.7/site-packages/pygef/gef.py in __init__(self, path, string)
    288                         calculate_friction_number(column_names),
    289                         self.calculate_elevation_with_respect_to_nap(
--> 290                             self.zid, self.height_system
    291                         ),
    292                     ]

/opt/conda/lib/python3.7/site-packages/polars/lazy/frame.py in collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, string_cache, no_optimization)
    297             string_cache,
    298         )
--> 299         return pl.eager.frame.wrap_df(ldf.collect())
    300 
    301     def fetch(

PanicException: python apply failed: Any(InvalidOperation("abs not supportedd for series of type Float64"))

file.zip

How to reproduce:
pygef.Cpt(path="./file.gef")

Issue with plotting with Robertson classification

Hi there,

I am a new user of the package pygef.

I successfully plotted a cpt, but when I pass "robertson" to the plotting function it returns an error:

cpt = Cpt(path)
a = cpt.plot("robertson")
a.show()


Traceback (most recent call last):
  File "C:\Users\amo.IPC\AppData\Local\Programs\Python\Python310\lib\site-packages\win32com\server\policy.py", line 303, in _Invoke_
    return self._invoke_(dispid, lcid, wFlags, args)
  File "C:\Users\amo.IPC\AppData\Local\Programs\Python\Python310\lib\site-packages\win32com\server\policy.py", line 308, in _invoke_
    return S_OK, -1, self._invokeex_(dispid, lcid, wFlags, args, None, None)
  File "C:\Users\amo.IPC\AppData\Local\Programs\Python\Python310\lib\site-packages\win32com\server\policy.py", line 637, in _invokeex_
    return func(*args)
  File "C:\Users\amo.IPC\AppData\Local\Programs\Python\Python310\lib\site-packages\xlwings\server.py", line 235, in CallUDF
    res = call_udf(script, fname, args, this_workbook, FromVariant(caller))
  File "C:\Users\amo.IPC\AppData\Local\Programs\Python\Python310\lib\site-packages\xlwings\udfs.py", line 539, in call_udf
    ret = func(*args)
  File "c:\Users\amo.IPC\OneDrive\02_docadmin\Software_and_Spreadsheets\Python\methods\testfile.py", line 11, in cpt
    a = cpt.plot("robertson")
  File "C:\Users\amo.IPC\AppData\Local\Programs\Python\Python310\lib\site-packages\pygef\cpt.py", line 301, in plot
    df = self.classify(
  File "C:\Users\amo.IPC\AppData\Local\Programs\Python\Python310\lib\site-packages\pygef\cpt.py", line 208, in classify
    df = robertson.classify(
  File "C:\Users\amo.IPC\AppData\Local\Programs\Python\Python310\lib\site-packages\pygef\robertson\__init__.py", line 39, in classify
    return iterate_robertson(
  File "C:\Users\amo.IPC\AppData\Local\Programs\Python\Python310\lib\site-packages\pygef\robertson\util.py", line 96, in iterate_robertson
    df["n"] = n
  File "C:\Users\amo.IPC\AppData\Local\Programs\Python\Python310\lib\site-packages\polars\internals\dataframe\frame.py", line 1401, in __setitem__
    raise TypeError(
TypeError: 'DataFrame' object does not support 'Series' assignment by index. Use 'DataFrame.with_columns'

Any advice?

plot inverted

When plotting with use_offset=True, invert_yaxis must be turned off.

matplotlib internal errors

for matplotlib==3.4.2

 TypeError: __init__() got an unexpected keyword argument 'layout'

for matplotlib==3.5.0

ValueError: Cannot __getitem__ on Series of dtype: 'Float64' with argument: '(slice(None, None, None), None)' of type: '<class 'tuple'>'.

bump matplotlib version to 3.6.0

Replace the transpose operation

There's a very costly transpose operation that should be replaced, because there's probably a mistake in the logic where the transpose is necessary.

Avoid division by zero warning

../pygef/gef.py:611: RuntimeWarning: invalid value encountered in true_divide
  df = df.assign(friction_number=(df["fs"].values / df["qc"].values * 100))
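A sketch of one way to avoid the warning with numpy: divide only where qc is nonzero and fill the remaining entries with NaN.

```python
import numpy as np

fs = np.array([0.02, 0.03, 0.04])
qc = np.array([1.5, 0.0, 0.0])  # zero cone resistance triggers the warning

# Only divide where qc is nonzero; masked entries keep the NaN fill value,
# so no RuntimeWarning is raised.
friction_number = np.divide(
    fs, qc, out=np.full_like(fs, np.nan), where=qc != 0.0
) * 100
print(friction_number)
```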
