geostatsguy / geostatspy

GeostatsPy is a Python package for spatial data analytics and geostatistics, mostly a reimplementation of GSLIB, the Geostatistical Library (Deutsch and Journel, 1992), in Python. I hope this resource is helpful. Prof. Michael Pyrcz

Home Page: https://pypi.org/project/geostatspy/

License: MIT License

Python 3.63% Jupyter Notebook 96.37%
dataanalytics geostatistics modeling spatial statistics

geostatspy's Introduction

I'm Michael Pyrcz (a.k.a. GeostatsGuy), a professor working in Data Analytics, Geostatistics and Machine Learning at The University of Texas at Austin, Austin, Texas, USA and a Ukrainian Canadian. I share all of my university content to support my students, potential students and working professionals interested in learning about data science. I have a lot of well-documented workflows in Python, R (and even Excel) in my repositories, including all of the hands-on exercises and demonstrations for all of my lectures shared freely on my YouTube channel. Follow me on Twitter, where I share resources and positivity daily.

geostatspy's People

Contributors

achandlr, ashwanth2001, clawson101, geostatsguy, jessepisel, julianslz, jwolff98, mathematom, qpenko, sharkman424, sonyapieklik, stonewalljohnson, travissalomaki, whenn0406, zebo616

geostatspy's Issues

gamv.out is not generated [Ubuntu 18.04]

Hi,
When I execute GSLIB.gamv_2d, the following error appears:
FileNotFoundError: [Errno 2] No such file or directory: 'gamv.out'

The path to the GSLIB executables is given on the first line. The other files, gamv.par and
gamv_out.dat, are generated correctly.

Why is the gamv.out file not generated?

My code:

nlag= 100; lagdist=100; azi=90; atol=22.2; bstand=100

lag_p, vario_p, nppiso_p = GSLIB.gamv_2d(data, 'X', 'Y', 'SBS', nlag, lagdist, azi, atol, bstand)
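
A missing gamv.out while gamv.par is written suggests the external program may never have run. A minimal sketch for checking that first, using only the standard library; find_gslib_executable and its arguments are hypothetical names for illustration, not part of geostatspy:

```python
import os
import shutil


def find_gslib_executable(name, search_dir=None):
    """Return the path of a runnable program called `name`, or None.

    If `search_dir` is given, only that directory is checked; otherwise the
    system PATH is searched. (Hypothetical helper, not a geostatspy function.)
    """
    if search_dir is not None:
        candidate = os.path.join(search_dir, name)
        # The file must exist AND carry execute permission.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
        return None
    return shutil.which(name)
```

If this returns None for the gamv program, the wrapper cannot produce gamv.out; on Linux a common cause is a binary that lacks execute permission.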

Conda

Hello, is it possible to install as a Conda package?

Thanks!

How do I create my own variogram?

Thanks for your outstanding work in helping others understand and apply geostatistics.

I want to build my own variogram model for my work. How should I do that?

Alternatively, could you explain the parameters of the make_variogram function?
I cannot find an explanation for this function.

Thanks for any help!
I look forward to your reply.

add requirements.txt/dependencies to setup.py

It looks like geostatspy doesn't have a requirements.txt file or specify dependencies in setup.py. It is fairly easy to fix, but I figured I would post the issue in case someone else gets to it before I can.

Parallelization for large datasets

Hi Michael,

Big fan of the repo - thank you very much. I haven't worked up the code yet, but I'd like to submit a PR for loop parallelization across cores for some functions. I'm working with datasets containing ~1e6 data points, creating ~1e8 pairs for each lag distance, and am finding that varmapv is particularly cumbersome at this time.

Thanks!

IndexError: index 4 is out of bounds for axis 0 with size 4

I am trying to implement the declustering function on my dataframe which consists of longitude, latitude, and sediment accumulation variables, but I am getting an out of index error. I am having trouble fixing this error myself, could someone help me figure this out? Below is the traceback:

Traceback (most recent call last):
File "", line 1, in
File "C:\Users\etachen\AppData\Local\JetBrains\PyCharm 2021.2.2\plugins\python\helpers\pydev_pydev_bundle\pydev_umd.py", line 198, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "C:\Users\etachen\AppData\Local\JetBrains\PyCharm 2021.2.2\plugins\python\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/etachen/PycharmProjects/ai_algo_tests/declustering_medium_test.py", line 21, in
W, Csize, Dmean = geostat.declus(df,'Longitude', 'Latitude', 'Average Accretion (mm)',
File "C:\Users\etachen\Anaconda3\envs\ai_algo_tests\lib\site-packages\geostatspy\geostats.py", line 1607, in declus
cellwt[icell] = cellwt[icell] + 1.0
IndexError: index 4 is out of bounds for axis 0 with size 4
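
An index equal to the array size is the typical symptom of 1-based cell numbering (as in Fortran) being used on a 0-based array, or of a point on the upper grid edge falling one cell past the end. I have not confirmed that this is the cause inside geostats.declus; the sketch below only illustrates the pattern, and cell_index with all of its parameters is a hypothetical helper:

```python
def cell_index(x, y, x_origin, y_origin, cell_size, ncellx, ncelly):
    """Map a point to a 0-based flat cell index on an ncellx-by-ncelly grid.

    Indices are clamped so that points on or beyond the grid edges fall in
    the nearest edge cell instead of running past the end of the array.
    """
    ix = int((x - x_origin) / cell_size)
    iy = int((y - y_origin) / cell_size)
    ix = max(0, min(ix, ncellx - 1))  # keep column within 0..ncellx-1
    iy = max(0, min(iy, ncelly - 1))  # keep row within 0..ncelly-1
    return ix + iy * ncellx


# A 2 x 2 grid has 4 cells, so valid indices are 0..3.
cellwt = [0.0] * 4
icell = cell_index(2.0, 2.0, 0.0, 0.0, 1.0, 2, 2)  # point on the upper corner
cellwt[icell] += 1.0
```

Without the clamp, the corner point above would map to column 2 and row 2, outside the 2 x 2 grid.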

NameError: name variogram_loop_3d is not defined

Hi Michael,

I'm attempting to use geostats.gamv_3d after from geostatspy.geostats import *, but I can't seem to find where variogram_loop_3d is defined. I see an instantiation of it in the source code, but not where the function is defined. Please assist when you can.

Thanks!

'Simulation'

What does the 'simulation' argument refer to in the following line from the Variogram_Demo file? Could you provide some insight to help me move forward?

#calculate a stochastic realization with standard normal distribution.
sim, value = GSLIB_sgsim_2d_uncond(1,nx,ny,cell_size, seed, range_min, azimuth, 'simulation')

The error is [ErrNo 2] No such file or directory: simulation.

I am assuming that the command creates a simulation file based on the dataset created earlier in the code. I have my directory mapped to the correct location on my system, etc.

Running Geostatspy on Ubuntu

Hello,

I have installed geostatspy on Windows and run a script to test the installation, and it works well. However, on Ubuntu the command os.system("./sgsim sgsim.par") does not run properly. The output file is empty.
Have you succeeded in using the package on Linux?
Thank you in advance for your attention.

Best regards,
Andrea
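
os.system hides most of what went wrong with the external program. A minimal sketch of running the same command through subprocess.run so that the exit code and error output can be inspected; run_program is a hypothetical helper name, not part of geostatspy:

```python
import subprocess


def run_program(command, cwd=None):
    """Run an external program and return (exit_code, stdout, stderr).

    `command` is a list such as ["./sgsim", "sgsim.par"]. A missing program
    raises FileNotFoundError, and a file without execute permission raises
    PermissionError, instead of failing silently.
    """
    result = subprocess.run(command, cwd=cwd, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


# Example call for the command in this issue (uncomment to use):
# code, out, err = run_program(["./sgsim", "sgsim.par"])
# print(code, err)
```

A non-zero exit code or text in stderr usually points to the reason the output file stays empty.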

Python 3.9.1 incompatibility?

I cannot pip install the package. When I try, I get an error that I believe is due to running Python 3.9.1 (from what I can gather from web forums). I also think numba may be involved.

It is worth looking into.

GSLIB plotting functions should allow for axis specification

Hi Dr. Pyrcz,

I'm currently doing your HW #2 and thought it might be useful to include axis arguments in locmap_st and hist_st so that one can supply ax1, ax2, ax3 args for further subplot customization.

I can try and create a PR sometime, but wanted to bring it up.

Jake G.

Geostats (kb2d and gamv_3D)

Good afternoon,

First of all, I want to congratulate the team on their work. I am doing research that involves building a three-dimensional geotechnical database, and I was excited to find this library. I am trying to wrap it in a graphical interface in QGIS so that it can be used more easily. While using the library I noticed some issues.

  1. The kb2d function uses the real function on lines 2394, 2449, 2489. This function is not recognized and causes an error during execution.
  2. Using the kb2d function, I always get the numpy.ndarray error.
  3. The gamv_3D function calls the variogram_loop_3D function, which does not exist in the library.

My suggestion is to create class objects for the 2D and 3D functions.

Thank you again for this work.

AttributeError: 'PathCollection' object has no property 'verts'

Having issues with multiple geostatspy plotting commands where I get this error message consistently.
This is for the plotting commands:

  • geostatspy.GSLIB.locmap
  • geostatspy.GSLIB.pixelplt_st
  • geostatspy.GSLIB.locpix_st

Could be for more plotting commands as well, but these are the only ones that I have used in my code where an error has occurred.

numba acceleration

Given that a lot of geostatspy is written in pure Python, I would like to offer the suggestion that some minor refactoring be performed to enable adding numba @njit decorators to compute-intensive functions.

For example, taking the geostatspy.varmapv function, we can split the manipulation of the pandas.DataFrame object from the numerical code:

def varmapv(df,xcol,ycol,vcol,tmin,tmax,nxlag,nylag,dxlag,dylag,minnp,isill): 

    # Parameters - consistent with original GSLIB    
    # df - DataFrame with the spatial data, xcol, ycol, vcol coordinates and property columns
    # tmin, tmax - property trimming limits
    # xlag, xltol - lag distance and lag distance tolerance
    # nlag - number of lags to calculate
    # azm, atol - azimuth and azimuth tolerance
    # bandwh - horizontal bandwidth / maximum distance offset orthogonal to azimuth
    # isill - 1 for standardize sill

    # Load the data

    df_extract = df.loc[(df[vcol] >= tmin) & (df[vcol] <= tmax)]    # trim values outside tmin and tmax
    nd = len(df_extract)
    x = df_extract[xcol].values
    y = df_extract[ycol].values
    vr = df_extract[vcol].values
    
    # Summary statistics for the data after trimming
   ...

After refactoring:

from numba import njit

def varmapv(df, xcol, ycol, vcol, tmin, tmax, nxlag, nylag, dxlag, dylag, minnp, isill): 

    # Parameters - consistent with original GSLIB    
    # df - DataFrame with the spatial data, xcol, ycol, vcol coordinates and property columns
    # tmin, tmax - property trimming limits
    # xlag, xltol - lag distance and lag distance tolerance
    # nlag - number of lags to calculate
    # azm, atol - azimuth and azimuth tolerance
    # bandwh - horizontal bandwidth / maximum distance offset orthogonal to azimuth
    # isill - 1 for standardize sill

    # Load the data
    df_extract = df.loc[(df[vcol] >= tmin) & (df[vcol] <= tmax)]    # trim values outside tmin and tmax
    nd = len(df_extract)
    x = df_extract[xcol].values
    y = df_extract[ycol].values
    vr = df_extract[vcol].values
    
    return _varmapv(nd, x, y, vr, nxlag, nylag, dxlag, dylag, minnp, isill)

@njit
def _varmapv(nd, x, y, vr, nxlag, nylag, dxlag, dylag, minnp, isill):

    
    # Summary statistics for the data after trimming
    ...

Timing of the current implementation is 580 ms on my machine, while the numba-decorated version takes 2.02 ms.

For scaling up to several thousand data points, a factor of over 100x is considerable!

Further optimization can be performed for functions that are parallelizable, letting numba release the GIL and optimize the function for multi-processing / multi-threading.
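
To make the wrapper/kernel split concrete, here is a small self-contained sketch of the same pattern. It is not the varmapv algorithm: the kernel only computes one semivariance value over all pairs within a distance, and semivariance, _pair_kernel and the record layout are hypothetical names. If numba is not installed, the decorator falls back to a no-op so the code still runs, just without acceleration:

```python
try:
    import numpy as np
    from numba import njit
    HAVE_NUMBA = True
except ImportError:
    HAVE_NUMBA = False

    def njit(func):
        # Fallback: leave the function as plain Python.
        return func


@njit
def _pair_kernel(x, y, vr, max_dist):
    """Numeric kernel: only loops and arithmetic, no DataFrame access."""
    total = 0.0
    npairs = 0
    n = len(vr)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[j] - x[i]
            dy = y[j] - y[i]
            if dx * dx + dy * dy <= max_dist * max_dist:
                diff = vr[j] - vr[i]
                total += diff * diff
                npairs += 1
    if npairs == 0:
        return 0.0, 0
    return 0.5 * total / npairs, npairs


def semivariance(records, xcol, ycol, vcol, max_dist):
    """Wrapper: extract plain numeric columns, then call the kernel."""
    x = [float(r[xcol]) for r in records]
    y = [float(r[ycol]) for r in records]
    vr = [float(r[vcol]) for r in records]
    if HAVE_NUMBA:
        # numba compiles most efficiently for NumPy arrays.
        x, y, vr = np.array(x), np.array(y), np.array(vr)
    return _pair_kernel(x, y, vr, float(max_dist))
```

Keeping everything table-related in the wrapper is what allows the kernel to be compiled, since numba does not understand pandas objects.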
