cemsbv / pygef
Parse soil measurement data.
Home Page: https://cemsbv.github.io/pygef
License: MIT License
In case of a df.lazy().single_op().collect(), you can just write df.single_op().
Originally posted by @ritchie46 in #71 (comment)
Some columns are derived from other columns, much like the property decorator in Python classes. Reduce redundant data by:
self._df = <dataframe w/ base columns>

@property
def df(self):
    # chain the assigns so derived_a is not discarded by the second call
    df = self._df.assign(derived_a=self._df["a"] + 2)
    df = df.assign(derived_b=df["b"] + 2)
    return df
The following code produces an empty DataFrame:
import os
from pygef import Cpt
path_cpt = os.path.join(os.environ.get("DOC_PATH"), "../pygef/test_files/cpt.gef")
cpt = Cpt(path_cpt)
cpt.classify(classification="robertson", do_grouping=True, min_thickness=0.2, water_level_NAP=-10)
There's no API reference documentation for the CPTData object.
The regex string #ZID[=\s+]+[^,]*[,\s+]+([^,]+) in pygef.utils (line 127) is not working as expected.
Insert the following text in https://regex101.com/. Using #ZID[=\s+]+[^,]*[,\s+]+([^,]+), the zid will not be parsed correctly.
#TESTID = B38C2094
#XYID = 31000,108025,432470
#ZID = 31000,-1.5
#MEASUREMENTTEXT = 9, maaiveld, vast horizontaal niveau
Use #ZID[=\s+]+[^,]*[,\s+]+([^?!,$|\s$]+) instead.
The cpt-data is always assumed to have a "!" value for the record separator in _GefCpt.parse_data(), which is not desired.
Parse the #RECORDSEPARATOR header and use it for splitting the cpt-data records.
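The suggested change could look like this minimal sketch. Note that get_record_separator and split_records are hypothetical helpers, not pygef functions; the fallback to "!" preserves the current behaviour when the header is absent.

```python
import re

def get_record_separator(header_text: str, default: str = "!") -> str:
    """Return the value of the #RECORDSEPARATOR header, or a default."""
    match = re.search(r"#RECORDSEPARATOR\s*=\s*(\S+)", header_text)
    return match.group(1) if match else default

def split_records(data_text: str, separator: str) -> list:
    """Split the data block on the record separator, dropping empty trailers."""
    return [rec for rec in data_text.split(separator) if rec.strip()]

headers = "#COLUMNSEPARATOR= ;\n#RECORDSEPARATOR= !\n#EOH=\n"
sep = get_record_separator(headers)                    # "!"
records = split_records("1.0;2.0;!1.5;2.5;!", sep)     # two records
```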
Update the soil distribution based on the CRUX review.
From Polars version 0.12 it's not possible to do the following (where df is a pl.DataFrame):
plt.plot(df["x"], df["y"])
This raises a NotImplementedError.
Suggest the following:
plt.plot(df["x"].to_numpy(), df["y"].to_numpy())
One of the lines to change:
https://github.com/cemsbv/pygef/blob/master/pygef/plot_utils.py#L88
Had an issue where the voids in the inclination column are used to correct for the depth. In combination with a pre-excavation this leads to an incorrect starting depth with respect to the reference level.
EDIT
In more detail: if the first row of the GEF contains a void in the inclination column, with a value such as -9999, then this value is used to correct for the depth. This leads to significant errors in the corrected depth when a large pre-excavated depth is present. The desired solution is to handle voids before the values used to indicate a void can enter any calculation.
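A minimal sketch of the desired order of operations (mask_voids is a hypothetical helper, not pygef's actual void handling): the void marker is replaced before any derived quantity is computed, so a -9999 in the first inclination row can never leak into the depth correction.

```python
def mask_voids(values, void_value):
    """Replace the column's void marker with None before any calculation."""
    return [None if v == void_value else v for v in values]

inclination = [-9999.0, 0.5, 0.7]          # first row holds the void marker
masked = mask_voids(inclination, -9999.0)  # [None, 0.5, 0.7]

# Downstream code can now skip None entries (or interpolate) instead of
# feeding -9999 into the cumulative depth correction.
usable = [v for v in masked if v is not None]
```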
#71 removes the interpolation; add it again. There's a # TODO for it.
Hello,
I have been trying to read a gef file using the following code:
from pygef import Cpt
Cpt(r"..\GO\46358_10.GEF")
Unfortunately I get an error:
thread '<unnamed>' panicked at 'python apply failed: Any(InvalidOperation("abs not supportedd for series of type Float64"))', src\lazy\apply.rs:35:19
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
File "C:\Users\921479\Desktop\GEO\Z002\notebooks\test_gef.py", line 5, in <module>
Cpt(r"..\GO\46358_10.GEF")
File "C:\Users\921479\Desktop\GEO\Z002\notebooks\.venv\lib\site-packages\pygef\cpt.py", line 159, in __init__
parsed = _GefCpt(path)
File "C:\Users\921479\Desktop\GEO\Z002\notebooks\.venv\lib\site-packages\pygef\gef.py", line 276, in __init__
self.parse_data(self._headers, self._data, column_names)
File "C:\Users\921479\Desktop\GEO\Z002\notebooks\.venv\lib\site-packages\polars\lazy\frame.py", line 299, in collect
return pl.eager.frame.wrap_df(ldf.collect())
pyo3_runtime.PanicException: python apply failed: Any(InvalidOperation("abs not supportedd for series of type Float64"))
I think there is a problem in the .collect() statement when the cpt data is converted to a DataFrame. Has anyone come across this issue before? Is there a workaround? Thanks for your consideration.
Hi,
I have several borehole gef files, and in all gefs where the description has a single line (see the attached file below) the code returns:
self._df = PyDataFrame.read_csv(
RuntimeError: Any(NoData("empty csv"))
Can you support this? Below is the gef file I am referring to:
#GEFID= 1, 1, 0
#FILEOWNER= DataWS
#FILEDATE= 2022, 12, 22
#PROJECTID= Lob van Gennep, 2102701 HB, -
#COLUMN= 2
#COLUMNINFO= 1, m, Laag van, 1
#COLUMNINFO= 2, m, Laag tot, 2
#COMPANYID= -, -, 31
#DATAFORMAT= ASCII
#COLUMNSEPARATOR= ;
#COLUMNTEXT= 1
#LASTSCAN= 1
#XYID= 31000, 196276.20, 412672.60, 0.01, 0.01
#ZID= 31000, 13.22, 0.01
#MEASUREMENTCODE= NEN5104, 1, 0, 0, NNI 1989
#MEASUREMENTTEXT= 3, -, plaatsnaam boring
#MEASUREMENTTEXT= 5, 2022-03-02, datum boorbeschrijving
#MEASUREMENTTEXT= 6, Tla, beschrijver lagen
#MEASUREMENTTEXT= 7, 31000, locaal coördinatiesysteem
#MEASUREMENTTEXT= 8, 31000, locaal referentiesysteem
#MEASUREMENTTEXT= 9, maaiveld, vast horizontaal niveau
#MEASUREMENTTEXT= 13, -, boorbedrijf
#MEASUREMENTTEXT= 14, Nee, openbaar
#MEASUREMENTTEXT= 16, 2022-03-02, datum boring
#MEASUREMENTTEXT= 18, Nee, Peilbuis aanwezig
#MEASUREMENTTEXT= 23, Tla, naam boormeester
#MEASUREMENTTEXT= 31, EDM, boormethode1
#MEASUREMENTVAR= 16, 2.500000, m, eind diepte boring
#MEASUREMENTVAR= 31, 2.500000, m, diepte onderkant boortraject1
#SPECIMENTEXT= 11, 1, monstercode monster1
#SPECIMENTEXT= 12, 2022-03-02, datum monster1
#SPECIMENTEXT= 13, 13:18:29, tijd monster1
#SPECIMENTEXT= 14, G, (on)geroerd monster1
#SPECIMENTEXT= 18, 2, monstercode monster2
#SPECIMENTEXT= 19, 2022-03-02, datum monster2
#SPECIMENTEXT= 20, 13:18:29, tijd monster2
#SPECIMENTEXT= 21, G, (on)geroerd monster2
#SPECIMENTEXT= 25, 3, monstercode monster3
#SPECIMENTEXT= 26, 2022-03-02, datum monster3
#SPECIMENTEXT= 27, 13:18:29, tijd monster3
#SPECIMENTEXT= 28, G, (on)geroerd monster3
#SPECIMENVAR= 1, 3.000000, -, aantal monsters
#SPECIMENVAR= 12, 1.000000, m, onderkant monster1
#SPECIMENVAR= 18, 1.000000, m, bovenkant monster2
#SPECIMENVAR= 19, 2.000000, m, onderkant monster2
#SPECIMENVAR= 25, 2.000000, m, bovenkant monster3
#SPECIMENVAR= 26, 2.500000, m, onderkant monster3
#PROCEDURECODE= GEF-BORE-Report, 1, 0, 0, -
#TESTID= 183.HB4
#REPORTCODE= GEF-BORE-Report, 1, 0, 0, -
#RECORDSEPARATOR= !
#OS= DOS
#LANGUAGE= NL
#EOH=
0.0000e+000;2.5000e+000;'Kz3';;'DO BR';;'KHRD';!
Todo: Most of these pipes can likely be done in a single select. We only have to check which ones depend on the result of a previous one; those need to be done in separate select queries.
Originally posted by @ritchie46 in #71 (comment)
Running the following code
def test_plot_classification_grouped(self):
gef = Cpt("./tests/test_files/cpt.gef")
gef.plot(
show=False,
classification="three_type_rule",
do_grouping=True,
min_thickness=0.2,
water_level_NAP=-10,
)
with this cpt throws an error:
File "c:\Users\brein\Documents\Development\Python\pygef\.env\lib\site-packages\matplotlib\axes\_axes.py", line 2381, in bar
bottom = y - height / 2
TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Python310\lib\unittest\case.py", line 59, in testPartExecutor
yield
File "C:\Python310\lib\unittest\case.py", line 591, in run
self._callTestMethod(testMethod)
File "C:\Python310\lib\unittest\case.py", line 549, in _callTestMethod
method()
File "c:\Users\brein\Documents\Development\Python\pygef\tests\test_plot.py", line 45, in test_plot_classification_grouped
gef.plot(
File "c:\Users\brein\Documents\Development\Python\pygef\pygef\cpt.py", line 355, in plot
return plot.plot_cpt(
File "c:\Users\brein\Documents\Development\Python\pygef\pygef\plot_utils.py", line 119, in plot_cpt
fig = add_grouped_classification(
File "c:\Users\brein\Documents\Development\Python\pygef\pygef\plot_utils.py", line 250, in add_grouped_classification
plt.barh(
File "c:\Users\brein\Documents\Development\Python\pygef\.env\lib\site-packages\matplotlib\pyplot.py", line 2403, in barh
return gca().barh(
File "c:\Users\brein\Documents\Development\Python\pygef\.env\lib\site-packages\matplotlib\axes\_axes.py", line 2551, in barh
patches = self.bar(x=left, height=height, width=width, bottom=y,
File "c:\Users\brein\Documents\Development\Python\pygef\.env\lib\site-packages\matplotlib\__init__.py", line 1412, in inner
return func(ax, *map(sanitize_sequence, args), **kwargs)
File "c:\Users\brein\Documents\Development\Python\pygef\.env\lib\site-packages\matplotlib\axes\_axes.py", line 2383, in bar
raise TypeError(f'the dtypes of parameters y ({y.dtype}) '
TypeError: the dtypes of parameters y (object) and height (object) are incompatible
@martinapippi Is the Pygef team planning to release a new version to PyPI following the recent PRs?
There are now CPTs saved in gpkg format. I have a reader where the user provides a bounding box with coordinates (or a polygon) and it reads all CPTs that exist there (and plots to a vtk file, perhaps irrelevant here).
It would be great if you could make your package compatible with this file format. It saves a ton of time compared to loading each xml/gef separately (plus you no longer need an xml reader).
Is this something you can support?
This fill_nan operation will be redundant if we write the functions correctly. Make sure that the computations we do don't generate NaNs.
Originally posted by @ritchie46 in #71 (comment)
If a test ID contains spaces, e.g. #TESTID= CPT 01, then it is parsed as CPT. This becomes problematic when a series of CPTs is enumerated in this way.
The desired solution would be to parse everything on the line behind #TESTID=, so in the example this would result in CPT 01.
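The desired parsing could be sketched as follows (parse_test_id is a hypothetical helper, not the pygef implementation): capture everything after "#TESTID=" up to the end of the line and trim the surrounding whitespace, so IDs with spaces survive.

```python
import re

def parse_test_id(header_line):
    """Return the full test ID behind #TESTID=, or None if absent."""
    match = re.search(r"#TESTID\s*=\s*(.+)", header_line)
    return match.group(1).strip() if match else None

parse_test_id("#TESTID= CPT 01")    # "CPT 01", not "CPT"
parse_test_id("#TESTID= B38C2094")  # IDs without spaces still work
```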
Requirements in setup.py should not be pinned to an exact version, but should rather allow a range of patch versions.
This should all be expressions.
col("fs") / col("qc") * 100.0
pl.lit(0.0).alias("friction_number")
Originally posted by @ritchie46 in #71 (comment)
A contribution guide should be added.
Example: https://github.com/ritchie46/polars/blob/master/CONTRIBUTING.md
A new, empty column is added when the rows in the cpt data end with only the column-separator value (in _GefCpt.parse_data()). The last (redundant) column separator is only removed when it is followed by "!".
Data records should always be stripped of trailing column AND record separators, even if one of them is not present.
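A sketch of the proposed stripping rule (strip_record is a hypothetical helper; in practice the separators come from the file's #COLUMNSEPARATOR and #RECORDSEPARATOR headers): strip a trailing record separator if present, then a trailing column separator if present, so each combination is handled.

```python
def strip_record(record, column_sep=";", record_sep="!"):
    """Strip one trailing record separator and one trailing column separator."""
    record = record.strip()
    if record.endswith(record_sep):
        record = record[: -len(record_sep)]
    if record.endswith(column_sep):
        record = record[: -len(column_sep)]
    return record

strip_record("0.0;2.5;!")  # both separators present
strip_record("0.0;2.5;")   # only the column separator
strip_record("0.0;2.5!")   # only the record separator
```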
Request that a feature be added to extract the cone_id.
Rationale is:
I will make a PR with this feature
Not for this PR per se, but we can later change these kind of function to return the expression needed in the select query.
Originally posted by @ritchie46 in #71 (comment)
This code gives an error. Can you fix this? Or am I doing something wrong? Thank you!
from pygef.gef import ParseGEF
File = ".GEF"
gef = ParseGEF(File)
gef.plot(classification='been_jeffries', water_level_NAP=-1, min_thickness=0.2, show=True)
Currently, the first encountered valid #COLUMNVOID header is used, and its value is replaced with None in all columns. The other #COLUMNVOID values are effectively ignored.
The #COLUMNVOID values are provided for each column separately in the .gef file and should be applied only to the corresponding columns.
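A sketch of the desired per-column behaviour, using plain Python lists rather than pygef's actual DataFrame pipeline (parse_column_voids and apply_voids are hypothetical helpers): every #COLUMNVOID header is collected into a mapping, and each void value is applied only to its own column.

```python
import re

def parse_column_voids(header_text):
    """Map 1-based column index -> void value from all #COLUMNVOID headers."""
    voids = {}
    for col, value in re.findall(r"#COLUMNVOID\s*=\s*(\d+)\s*,\s*([^\s,]+)", header_text):
        voids[int(col)] = float(value)
    return voids

def apply_voids(rows, voids):
    """Replace a cell with None only when it matches its own column's void."""
    return [
        [None if (i + 1) in voids and cell == voids[i + 1] else cell
         for i, cell in enumerate(row)]
        for row in rows
    ]

headers = "#COLUMNVOID= 1, -9999.0\n#COLUMNVOID= 2, 999.9\n"
rows = [[-9999.0, 1.2], [0.5, 999.9]]
apply_voids(rows, parse_column_voids(headers))
# [[None, 1.2], [0.5, None]]
```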
For polars[pyarrow]<0.16.3:
TypeError: with_columns() takes from 1 to 2 positional arguments but 3 were given
Bump polars to 0.16.3.
In pandas 1.4.3, when I classify a CPT, I get an error:
'DataFrame' object does not support 'Series' assignment by index. Use 'DataFrame.with_columns'
The pygef objects currently have coordinates (e.g. x, y and z) but have no proper universal Coordinate Reference System (CRS) definition. Only the "height_system" is a vertical CRS attribute, but the codes are linked to the GEF format, which is not a universal standard.
As a user I would like to be able to access an attribute of a Cpt object with the EPSG codes for both the horizontal (x & y) and vertical (z) coordinates. These could be two attributes, e.g. xy_epsg and z_epsg.
EPSG codes are universally recognized geodetic definitions and have a scope way beyond geotechnical engineering, which makes a spatial object defined with epsg codes easy to work with by anyone.
The few CRS codes that are defined in the GEF format can be mapped to an EPSG code upon parsing, which will make the "height_system" attribute obsolete. See for instance the EPSG of RD and NAP (most commonly used in the Netherlands)
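A minimal sketch of such a mapping. The attribute names and mapping tables below are proposals, not existing pygef API; GEF code 31000 denotes RD for x/y and NAP for z in the examples above, and EPSG:28992 (RD New / Amersfoort) and EPSG:5709 (NAP height) are the commonly used Dutch definitions.

```python
# Proposed translation tables from GEF reference-system codes to EPSG codes.
GEF_XY_TO_EPSG = {31000: 28992}  # horizontal: RD -> EPSG:28992
GEF_Z_TO_EPSG = {31000: 5709}    # vertical: NAP -> EPSG:5709

def to_epsg(gef_code, table):
    """Return the EPSG code for a GEF reference-system code, or None."""
    return table.get(gef_code)

xy_epsg = to_epsg(31000, GEF_XY_TO_EPSG)  # 28992
z_epsg = to_epsg(31000, GEF_Z_TO_EPSG)    # 5709
```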
This function should start with being True, and after a new major version it should be False.
If the user provides a file to read_cpt that is neither in the correct .gef nor .xml format, pygef tries to read it as xml and throws confusing exceptions. Specifically, if a user provides a .gef file with an erroneous format, the user gets BroXMLParser exceptions, which makes no sense.
Expected result
The user should get an insightful exception (e.g. a custom UnknownFileFormatException) that explains that the provided file cannot be parsed as a .gef or .xml.
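A sketch of the proposed behaviour, with stub parsers standing in for pygef's real GEF/XML parsers (parse_gef, parse_xml and the exception classes here are all hypothetical): try GEF first, then XML, and raise one clear exception when both fail.

```python
class ParseGefError(Exception):
    pass

class ParseXmlError(Exception):
    pass

def parse_gef(content):
    # Stub: real parsing is far more involved.
    if not content.lstrip().startswith("#GEFID"):
        raise ParseGefError("not a GEF file")
    return "gef"

def parse_xml(content):
    # Stub: real parsing is far more involved.
    if not content.lstrip().startswith("<"):
        raise ParseXmlError("not an XML file")
    return "xml"

class UnknownFileFormatError(Exception):
    """Raised when a file can be parsed as neither .gef nor .xml."""

def read_cpt(content):
    # Only when both parsers fail do we raise one clear error, instead of
    # leaking a confusing XML-parser exception for a malformed .gef file.
    try:
        return parse_gef(content)
    except ParseGefError:
        pass
    try:
        return parse_xml(content)
    except ParseXmlError:
        raise UnknownFileFormatError(
            "The provided file could not be parsed as a .gef or .xml CPT file."
        ) from None
```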
Once #71 is done, replace the for loops over columns with frame_equal, or even better a (currently non-existing) assert_frame_equals function.
There's still a pandas dependency needed for some pl.from_pandas() and df.to_pandas() calls. When these have been removed, pandas can also be removed from the requirements.
The current format is not serializable, so it can't easily be sent over a network connection. We need to add Cpt.from_json() and Cpt.to_json() methods (same for Borehole).
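A minimal sketch of what such a round-trip could look like, assuming a heavily simplified Cpt with only a few attributes (the real object carries far more state; the method names match the proposal above but do not yet exist):

```python
import json
from dataclasses import dataclass, field

@dataclass
class Cpt:
    test_id: str
    zid: float
    data: dict = field(default_factory=dict)  # column name -> values

    def to_json(self):
        """Serialize the object to a JSON string."""
        return json.dumps({"test_id": self.test_id, "zid": self.zid, "data": self.data})

    @classmethod
    def from_json(cls, s):
        """Reconstruct a Cpt from its JSON representation."""
        return cls(**json.loads(s))

cpt = Cpt("CPT 01", -1.5, {"depth": [0.0, 0.5], "qc": [1.2, 1.4]})
restored = Cpt.from_json(cpt.to_json())  # round-trips to an equal object
```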
Parsing the following cpt gives an error:
/opt/conda/lib/python3.7/site-packages/pygef/cpt.py in __init__(self, path, content)
151 assert content["string"] is not None, "content['string'] must be specified"
152 if content["file_type"] == "gef":
--> 153 parsed = _GefCpt(string=content["string"])
154 elif content["file_type"] == "xml":
155 parsed = _BroXmlCpt(string=content["string"])
/opt/conda/lib/python3.7/site-packages/pygef/gef.py in __init__(self, path, string)
288 calculate_friction_number(column_names),
289 self.calculate_elevation_with_respect_to_nap(
--> 290 self.zid, self.height_system
291 ),
292 ]
/opt/conda/lib/python3.7/site-packages/polars/lazy/frame.py in collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, string_cache, no_optimization)
297 string_cache,
298 )
--> 299 return pl.eager.frame.wrap_df(ldf.collect())
300
301 def fetch(
PanicException: python apply failed: Any(InvalidOperation("abs not supportedd for series of type Float64"))
How to reproduce:
pygef.Cpt(path="./file.gef")
Hi there,
I am a new user of the pygef package.
I successfully plotted a cpt, but when I add "robertson" to the plotting function it returns an error:
cpt = Cpt(path)
a = cpt.plot("robertson")
a.show()
Traceback (most recent call last):
File "C:\Users\amo.IPC\AppData\Local\Programs\Python\Python310\lib\site-packages\win32com\server\policy.py", line 303, in _Invoke_
return self._invoke_(dispid, lcid, wFlags, args)
File "C:\Users\amo.IPC\AppData\Local\Programs\Python\Python310\lib\site-packages\win32com\server\policy.py", line 308, in _invoke_
return S_OK, -1, self._invokeex_(dispid, lcid, wFlags, args, None, None)
File "C:\Users\amo.IPC\AppData\Local\Programs\Python\Python310\lib\site-packages\win32com\server\policy.py", line 637, in _invokeex_
return func(*args)
File "C:\Users\amo.IPC\AppData\Local\Programs\Python\Python310\lib\site-packages\xlwings\server.py", line 235, in CallUDF
res = call_udf(script, fname, args, this_workbook, FromVariant(caller))
File "C:\Users\amo.IPC\AppData\Local\Programs\Python\Python310\lib\site-packages\xlwings\udfs.py", line 539, in call_udf
ret = func(*args)
File "c:\Users\amo.IPC\OneDrive\02_docadmin\Software_and_Spreadsheets\Python\methods\testfile.py", line 11, in cpt
a = cpt.plot("robertson")
File "C:\Users\amo.IPC\AppData\Local\Programs\Python\Python310\lib\site-packages\pygef\cpt.py", line 301, in plot
df = self.classify(
File "C:\Users\amo.IPC\AppData\Local\Programs\Python\Python310\lib\site-packages\pygef\cpt.py", line 208, in classify
df = robertson.classify(
File "C:\Users\amo.IPC\AppData\Local\Programs\Python\Python310\lib\site-packages\pygef\robertson\__init__.py", line 39, in classify
return iterate_robertson(
File "C:\Users\amo.IPC\AppData\Local\Programs\Python\Python310\lib\site-packages\pygef\robertson\util.py", line 96, in iterate_robertson
df["n"] = n
File "C:\Users\amo.IPC\AppData\Local\Programs\Python\Python310\lib\site-packages\polars\internals\dataframe\frame.py", line 1401, in __setitem__
raise TypeError(
TypeError: 'DataFrame' object does not support 'Series' assignment by index. Use 'DataFrame.with_columns'
Any advice?
Requirements in requirements.txt should be pinned to an exact version.
Replace these instances marked by a # TODO:
df["gamma_predict"] = np.tile(1.0, len(df.rows()))
Group classification is missing a lot of unit tests.
The GEF plot based on elevation_with_respect_to_nap is not correct when passing a NAP level.
See GEF file:
test.gef.txt
When plotting with use_offset=True, the invert_yaxis must be turned off.
For matplotlib==3.4.2:
TypeError: __init__() got an unexpected keyword argument 'layout'
For matplotlib==3.5.0:
ValueError: Cannot __getitem__ on Series of dtype: 'Float64' with argument: '(slice(None, None, None), None)' of type: '<class 'tuple'>'.
Bump the matplotlib version to 3.6.0.
There's a very costly transpose operation that should be replaced; there's probably a mistake in the logic that makes the transpose necessary.
../pygef/gef.py:611: RuntimeWarning: invalid value encountered in true_divide
df = df.assign(friction_number=(df["fs"].values / df["qc"].values * 100))
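One way to avoid the warning is to divide only where qc is non-zero. This sketch (friction_number is a hypothetical helper, not pygef's implementation) leaves NaN wherever qc is zero instead of triggering invalid-value division:

```python
import numpy as np

def friction_number(fs, qc):
    """Compute fs/qc*100, dividing only where qc is non-zero (NaN elsewhere)."""
    out = np.full_like(qc, np.nan, dtype=float)
    np.divide(fs, qc, out=out, where=qc != 0.0)
    return out * 100.0

friction_number(np.array([0.02, 0.05]), np.array([1.0, 0.0]))
# first entry is ~2.0, second stays NaN instead of raising a warning
```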