Given that much of geostatspy is written in pure Python, I would like to suggest some minor refactoring to enable adding numba @njit decorators to compute-intensive functions. For example, taking the geostatspy.varmapv function, we can split the manipulation of the pandas.DataFrame object from the numerical code:
def varmapv(df, xcol, ycol, vcol, tmin, tmax, nxlag, nylag, dxlag, dylag, minnp, isill):
    # Parameters - consistent with original GSLIB
    # df - DataFrame with the spatial data; xcol, ycol, vcol - coordinate and property columns
    # tmin, tmax - property trimming limits
    # nxlag, nylag - number of lags in the x and y directions
    # dxlag, dylag - lag sizes in the x and y directions
    # minnp - minimum number of pairs to report a lag cell
    # isill - 1 to standardize the sill
    # Load the data
    df_extract = df.loc[(df[vcol] >= tmin) & (df[vcol] <= tmax)]  # trim values outside tmin and tmax
    nd = len(df_extract)
    x = df_extract[xcol].values
    y = df_extract[ycol].values
    vr = df_extract[vcol].values
    # Summary statistics for the data after trimming
    ...
After refactoring:
from numba import njit

def varmapv(df, xcol, ycol, vcol, tmin, tmax, nxlag, nylag, dxlag, dylag, minnp, isill):
    # Parameters - consistent with original GSLIB
    # df - DataFrame with the spatial data; xcol, ycol, vcol - coordinate and property columns
    # tmin, tmax - property trimming limits
    # nxlag, nylag - number of lags in the x and y directions
    # dxlag, dylag - lag sizes in the x and y directions
    # minnp - minimum number of pairs to report a lag cell
    # isill - 1 to standardize the sill
    # Load the data
    df_extract = df.loc[(df[vcol] >= tmin) & (df[vcol] <= tmax)]  # trim values outside tmin and tmax
    nd = len(df_extract)
    x = df_extract[xcol].values
    y = df_extract[ycol].values
    vr = df_extract[vcol].values
    # Hand the raw NumPy arrays to the compiled numerical kernel
    return _varmapv(nd, x, y, vr, nxlag, nylag, dxlag, dylag, minnp, isill)

@njit
def _varmapv(nd, x, y, vr, nxlag, nylag, dxlag, dylag, minnp, isill):
    # Summary statistics for the data after trimming
    ...
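To give a sense of what lands inside the compiled kernel: the elided body is plain loops over NumPy arrays, which is exactly the kind of code numba compiles well. The following is only a hedged sketch of a variogram-map-style pair accumulation under assumed GSLIB-like conventions (the -999.0 flag for undefined cells and standardizing by the sample variance are my assumptions), not the actual body of _varmapv:

import numpy as np
from numba import njit

@njit
def _varmap_sketch(nd, x, y, vr, nxlag, nylag, dxlag, dylag, minnp, isill):
    # Hypothetical kernel for illustration only - not geostatspy's _varmapv
    npp = np.zeros((2 * nylag + 1, 2 * nxlag + 1))  # pair counts per lag cell
    gam = np.zeros((2 * nylag + 1, 2 * nxlag + 1))  # accumulated semivariance
    for i in range(nd):
        for j in range(nd):
            # Map the pair (i, j) to the nearest lag cell
            ixl = nxlag + int(round((x[j] - x[i]) / dxlag))
            iyl = nylag + int(round((y[j] - y[i]) / dylag))
            if 0 <= ixl <= 2 * nxlag and 0 <= iyl <= 2 * nylag:
                npp[iyl, ixl] += 1.0
                gam[iyl, ixl] += 0.5 * (vr[i] - vr[j]) ** 2
    sill = vr.var() if isill == 1 else 1.0  # assumed sill = sample variance
    for iyl in range(2 * nylag + 1):
        for ixl in range(2 * nxlag + 1):
            if npp[iyl, ixl] >= minnp:
                gam[iyl, ixl] /= npp[iyl, ixl] * sill  # average (standardized) semivariance
            else:
                gam[iyl, ixl] = -999.0  # assumed GSLIB-style flag for undefined cells
    return gam, npp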
Timing of the current implementation is 580 ms on my machine, while the numba-decorated version takes 2.02 ms, a speedup of nearly 300x. For scaling up to several thousand data points, a factor of that size is considerable!
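For anyone who wants to reproduce the comparison, a minimal timing harness along these lines could be used; the synthetic data, column names, and parameter values are placeholders, and the first call is a warm-up so JIT compilation time is excluded from the measurement:

import time
import numpy as np
import pandas as pd

# Placeholder synthetic data set - names and sizes are illustrative only
rng = np.random.default_rng(0)
df = pd.DataFrame({"X": rng.uniform(0.0, 1000.0, 2000),
                   "Y": rng.uniform(0.0, 1000.0, 2000),
                   "Por": rng.normal(0.20, 0.05, 2000)})

varmapv(df, "X", "Y", "Por", -9999, 9999, 10, 10, 50.0, 50.0, 5, 1)  # warm-up: triggers compilation
t0 = time.perf_counter()
varmapv(df, "X", "Y", "Por", -9999, 9999, 10, 10, 50.0, 50.0, 5, 1)
print(f"varmapv: {(time.perf_counter() - t0) * 1e3:.2f} ms")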
Further optimization is possible for functions with parallelizable loops, letting numba release the GIL (nogil=True) and compile the function for multi-threaded execution (parallel=True with prange).
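As a minimal illustration of that pattern, using a made-up nearest-neighbour helper rather than any existing geostatspy function: nogil=True releases the GIL so the kernel can overlap with other Python threads, and parallel=True with prange distributes the outer loop across cores. Each iteration writes only to its own output slot, which keeps the loop race-free:

import numpy as np
from numba import njit, prange

@njit(nogil=True, parallel=True)
def mean_nearest_distance(x, y):
    # Hypothetical helper for illustration - not part of geostatspy
    nd = x.shape[0]
    dmin = np.empty(nd)
    for i in prange(nd):  # outer loop iterations run across threads
        best = np.inf
        for j in range(nd):
            if i != j:
                d2 = (x[i] - x[j]) ** 2 + (y[i] - y[j]) ** 2
                if d2 < best:
                    best = d2
        dmin[i] = np.sqrt(best)  # each iteration owns exactly one slot
    return dmin.mean()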