gispocoding / eis_toolkit

Python library for mineral prospectivity mapping

Home Page: https://eis-he.eu/

License: European Union Public License 1.2

Python 8.66% HTML 0.01% Jupyter Notebook 91.33% Dockerfile 0.01%

gis mineral-exploration modelling neural-networks python

eis_toolkit's People

nialov dipak6697 zelioluca rajuranpe richardscottoz laish93

eis_toolkit's Issues

Add reproject vector

Add reproject vector functionality.

Change exact file paths to relative

Related to test.py and testing.ipynb files.

Add rasterize to convert vector data to raster

Rasterization is the process of "burning" the values to a raster where a vector geometry intersects raster cells. The burned value can either be a binary value or a value from the vector geometry. Geometries can be points, lines or polygons.
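As a minimal sketch of the "burning" idea, the following pure-Python function handles point geometries only. The function name, the top-left `origin` convention and the `binary` flag are illustrative assumptions, not the toolkit's API; for lines and polygons a library routine such as `rasterio.features.rasterize` would be the more likely choice.

```python
from typing import List, Sequence, Tuple


def rasterize_points(
    points: Sequence[Tuple[float, float, float]],
    origin: Tuple[float, float],
    cell_size: float,
    shape: Tuple[int, int],
    binary: bool = True,
    fill: float = 0.0,
) -> List[List[float]]:
    """Burn (x, y, value) points into a grid (hypothetical helper).

    `origin` is the (x_min, y_max) corner of the grid, i.e. its top-left corner.
    """
    rows, cols = shape
    x_min, y_max = origin
    grid = [[fill] * cols for _ in range(rows)]
    for x, y, value in points:
        col = int((x - x_min) // cell_size)
        row = int((y_max - y) // cell_size)
        # Points falling outside the grid are ignored.
        if 0 <= row < rows and 0 <= col < cols:
            # Burn either a binary marker or the value carried by the geometry.
            grid[row][col] = 1.0 if binary else value
    return grid
```

With `binary=False` the cell receives the point's own value; if several points fall in one cell, the last one wins in this sketch.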

Add snap/align raster

Add function that snaps/aligns input raster.

Implement a draft command-line interface for `eis_toolkit`

Problem

If eis_toolkit is called using the command-line (e.g. using the subprocess module in Python) from the QGIS frontend, it needs a defined command-line interface.

Solution

Python has built-in solutions for creating a command-line interface (sys.argv, argparse). However, depending on the complexity, it would be more robust to use a mature, batteries-included command-line library, such as click or typer.

Through this interface, any functions within the library can be called, but the arguments to those functions must then be provided through the command-line.

Implementation

Addition of a command-line interface file, typically named cli.py, at eis_toolkit/cli.py. Additionally, to call the same command-line interface through the module system (python -m eis_toolkit ...), an eis_toolkit/__main__.py file with some boilerplate code is required.

I will create a draft click interface in the beginning as click is already a dependency of the project (used by some other dependency) so no additional dependencies are required.
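To illustrate the shape such an interface could take, here is a rough sketch that uses the built-in argparse instead of click, so that it runs without extra dependencies. The `clip` subcommand and its arguments are hypothetical and only stand in for whichever library functions end up being exposed.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build the top-level parser with one hypothetical subcommand."""
    parser = argparse.ArgumentParser(prog="eis_toolkit")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Hypothetical subcommand; the argument names are assumptions.
    clip = subparsers.add_parser("clip", help="Clip a raster with a polygon")
    clip.add_argument("input_raster")
    clip.add_argument("polygon")
    clip.add_argument("--output", default="clipped.tif")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    """Parse arguments and dispatch to the matching library function."""
    args = build_parser().parse_args(argv)
    if args.command == "clip":
        # A real implementation would call the library's clip function here.
        print(f"clip {args.input_raster} with {args.polygon} -> {args.output}")
    return 0
```

Under these assumptions, the boilerplate in `__main__.py` would only need to import `main` from the cli module and call it, which is what makes `python -m eis_toolkit clip ...` work.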

Check documentation-related dependencies

The issue

Currently we depend on a lot of packages related to generating documentation. Some of these are most likely not needed, and could be removed.

Things to do

Find the minimum dependencies for the current workflow of generating documentation
(If needed) Refactor how documentation is generated

Add code style check to pytest GH action

Add checks for adherence to the defined coding style principles to the automated GH workflow that runs pytest for every PR.

Remember to update instructions too.

Add function for raster to pandas conversion

raster_to_pandas.py file for function
raster_to_pandas_test.py file for tests
custom exceptions for conversion
documentation page

Modify CONTRIBUTIONS.md

Transfer instructions about naming conventions from Teams to CONTRIBUTIONS.md file
Remove transferred parts from Teams docx

Add jupyterlab

Jupyterlab could be a useful tool in development. I'm thinking the jupyterlab package should be a dev-dependency, and a notebooks directory could be added to the root of the repo to host the notebooks.

Or, do we want to separate notebook-based testing into another repo entirely?

add CONTRIBUTING.md

Currently we have a readme with instructions on environment setup. A CONTRIBUTING.md would be useful for holding more general contributing guidelines. At least:

Describe a suitable pull request process

Create surface derivatives

Create functionality for surface derivatives:

aspect
slope
different curvature sets
tests for valid results/values (e.g. cut occurring outliers if necessary)

Functionality will be based on single band raster datasets only.
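As a small worked example of one such derivative, the sketch below computes slope for a single 3x3 window of a single-band raster using Horn's finite-difference scheme. The choice of Horn's method and the function name are assumptions made for illustration; the issue does not state which algorithm will be used. Aspect and curvature would be derived from the same kind of gradients.

```python
import math
from typing import Sequence


def horn_slope_degrees(window: Sequence[Sequence[float]], cell_size: float) -> float:
    """Slope in degrees at the centre of a 3x3 elevation window (Horn's method)."""
    (a, b, c), (d, _e, f), (g, h, i) = window
    # Weighted differences between the east/west columns and south/north rows.
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
```

For a plane that rises by one elevation unit per cell towards the east, with a cell size of one unit, the gradient magnitude is 1 and the slope is 45 degrees.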

Add unify rasters

Add unify rasters functionality. This consists of the following steps: reproject, resample, snap/align, clip/mask. It should be decided if some of these steps are optional and the user can choose.

While developing this functionality, some errors were found in the reprojecting, resampling and snapping functions. These were fixed as part of this feature development.

Split local test data to input and output

Most testing will be done with local datasets (large files, sensitive data)
Split the local directory to input and output to keep the test data organized
- Data (input), Results (output)
Also add docs for adding test data to local

Add weights-of-evidence functionality for predictive mapping

Add weights-of-evidence (wofe) functionality: The WofE is a Bayesian method to estimate the probability of a hypothesis (H) based on the knowledge of occurrence of certain evidential events (E). Applied to predictive mapping of mineral deposits, the hypothesis to be predicted is the probability of existence of the targeted mineral deposit and the evidential events are mapped from the geoscientific datasets representing geological features such as lithology, structures, whole rock geochemistry etc.

Implementation using geospatial datasets involves quantification of spatial associations (i.e., the weights) between mineral deposits and the geospatial evidential layers and subsequent calculations of the posterior probabilities for potential of existence of a mineral deposit.

The WofE functionality will have the following steps:

Calculation of weights-of-evidences for multiclass (multi-feature) evidential events,
Reclassification of multiclass (multi-feature) evidential events to binary evidential events, based on the weights-of-evidences,
Recalculation of generalized weights-of-evidences after reclassification, and
Calculating posterior probabilities by combining the generalized weights of all the evidential events.

Computationally, the WofE functionality is to be implemented in the following two parts:

Quantifying the spatial association (i.e. weights) between the mineral deposit/occurrence and the evidential events.
Updating the posterior probabilities of the deposit occurrence by combining the weights-of-evidences of all the events.
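The two computational parts can be sketched with the commonly used count-based formulation of weights of evidence, where W+ = ln(P(B|D) / P(B|~D)) and W- = ln(P(~B|D) / P(~B|~D)) for evidence B and deposits D. The function names and the count-based inputs below are assumptions for illustration, and the sketch assumes none of the counts makes a probability exactly 0 or 1.

```python
import math
from typing import Iterable, Tuple


def weights_of_evidence(
    n_total: int, n_deposits: int, n_class: int, n_class_deposits: int
) -> Tuple[float, float, float]:
    """Return (W+, W-, contrast) for one evidence class from cell counts."""
    n_non_deposits = n_total - n_deposits
    n_class_non_deposits = n_class - n_class_deposits
    p_b_d = n_class_deposits / n_deposits  # P(B | D)
    p_b_nd = n_class_non_deposits / n_non_deposits  # P(B | not D)
    w_plus = math.log(p_b_d / p_b_nd)
    w_minus = math.log((1 - p_b_d) / (1 - p_b_nd))
    return w_plus, w_minus, w_plus - w_minus


def posterior_probability(prior_probability: float, weights: Iterable[float]) -> float:
    """Combine the prior with the weights of all evidence layers for one cell.

    `weights` holds W+ for each layer where the evidence is present in the
    cell and W- for each layer where it is absent.
    """
    logit = math.log(prior_probability / (1 - prior_probability)) + sum(weights)
    return 1 / (1 + math.exp(-logit))
```

For example, with 1000 cells, 10 deposits and an evidence class covering 100 cells that contain 8 of the deposits, W+ is positive and W- is negative, so the posterior probability rises above the prior of 0.01 inside the class and falls below it outside. Combining weights by simple summation assumes the evidence layers are conditionally independent.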

Change of scope

Remove everything related to testing eis_toolkit within QGIS environment. Add

better instructions on testing eis_toolkit functions via command line
a new jupyter lab notebook for demonstrating how it can be utilized for testing purposes

Change LICENCE

Replace MIT licence with EUPL licence

Add docs for writing tests

Basic instructions
Where (not) to add files

Add random forest functionality

...content to be filled @BerndTorchala

Publish documentation as GitHub pages

Can be done after changing repository visibility to public.

Check rasterio licence first for possible problems!

Conda install & test workflow fails for Windows

Something maybe changed upstream in conda-forge which seems to have broken the Windows installation in GitHub Actions. E.g. https://github.com/GispoCoding/eis_toolkit/actions/runs/4413929172/jobs/7734997459.

Will need to test the environment.yaml file again and see if the failure can be reproduced locally. Until then you should ignore the failing install and testing.

Add polygonize

Add polygonize functionality, the reverse of rasterize. Polygonize takes a raster as input and turns it into vector format. The input raster needs to be a binary raster; otherwise vector edges are drawn at the bounds of the raster and/or at nodata values.

Add LICENCE

Most probably MIT

Extract processing functionality to separate functions

Separate checks from processing code

Refactor clip.py

Abstract file reading away from the processing function
- Use objects instead of paths as inputs and outputs
  - Inputs: rasterio object (opened dataset), polygon as shapely.geometry
  - Output: rasterio object (opened dataset)

Create docker image for developers

@eemilhaa Add here a more detailed description / execution plan.

Add function to extract window from raster

Create new branch for extract_window function and add required code for function, tests and exceptions

Update package names

Rename the preprocessing folder to geoprocess etc. (depending on what gets agreed in the meeting).

Add also rest of the packages
Update the folder paths everywhere to match new package names (e.g. mkgendocs.yml)

Update notebooks

We have some example notebooks at notebooks/, but some of them are out of date.

Provide `pre-commit` configuration for formatting and linting of code

Problem

As I have noticed, some tools are provided for automatic formatting (e.g. black) and linting (e.g. flake8) in the invoke file, tasks.py. However, as long as these are run manually, there is much room for developer "error".

Solution

pre-commit can be used to run most of the checks, except for mypy. It checks the files before you can commit anything locally and reformats them using repository-configured tools.

Implementation

Addition of a .pre-commit-config.yaml file in the repository. E.g.

---
repos:
  - repo: https://github.com/python/black
    rev: 22.10.0
    hooks:
      - id: black
        language_version: python3
  - repo: https://github.com/pycqa/isort
    rev: 5.10.1
    hooks:
      - id: isort
        args: ["--profile", "black"]
  - repo: https://github.com/pycqa/flake8
    rev: 6.0.0
    hooks:
      - id: flake8

This probably should not be a strict requirement for developers: how it works might not be apparent to everyone, and teaching it might be too much overhead for this single project. However, those who use it can format the code and fix linting errors, for example during pull requests before merges, and know that between the users of pre-commit the style stays strictly the same. A GitHub action could also be added that checks the code with pre-commit.

Add resample function

Implement a function that resamples input raster.

Add distance computations

Distance computations consist of calculating, for each raster cell, the distance to the nearest vector geometry. This distance is stored as the value of the raster cell. The "empty" raster in which the values are calculated should be provided by the user or created from user inputs (e.g. bounds and cell size).
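A brute-force sketch of this idea for point geometries is shown below. It builds the "empty" raster from user inputs (top-left origin, cell size and shape) and measures distances from cell centres; the function name and these conventions are assumptions, and lines or polygons would need a proper geometry library.

```python
import math
from typing import List, Sequence, Tuple


def distance_grid(
    points: Sequence[Tuple[float, float]],
    origin: Tuple[float, float],
    cell_size: float,
    shape: Tuple[int, int],
) -> List[List[float]]:
    """Distance from each cell centre to the nearest point (hypothetical helper).

    `origin` is the (x_min, y_max) corner of the grid, i.e. its top-left corner.
    """
    rows, cols = shape
    x_min, y_max = origin
    grid = []
    for row in range(rows):
        centre_y = y_max - (row + 0.5) * cell_size
        out_row = []
        for col in range(cols):
            centre_x = x_min + (col + 0.5) * cell_size
            # Brute force: check every point and keep the smallest distance.
            out_row.append(
                min(math.hypot(centre_x - px, centre_y - py) for px, py in points)
            )
        grid.append(out_row)
    return grid
```

The cost grows with the number of cells times the number of geometries, so a spatial index would be worth considering for real datasets.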

Add statistics report

Add implementations for the following basic statistical analysis functions:

Mean
Quantiles
Standard deviation (normal and relative)
Skewness
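The listed statistics can be sketched with the standard library alone. The report layout, the use of the sample standard deviation, quartiles as the quantiles, and the moment-based skewness definition are all assumptions here; the issue does not fix these choices. The sketch also assumes a non-zero mean and at least two distinct values.

```python
import statistics
from typing import Dict, Sequence


def statistics_report(values: Sequence[float]) -> Dict[str, object]:
    """Basic descriptive statistics for a sequence of numbers (sketch)."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation
    pop_std = statistics.pstdev(values)  # population standard deviation
    # Moment-based skewness: third central moment over the cubed population std.
    third_moment = sum((v - mean) ** 3 for v in values) / len(values)
    return {
        "mean": mean,
        "quantiles": statistics.quantiles(values, n=4),  # quartile cut points
        "standard_deviation": std,
        "relative_standard_deviation": std / abs(mean),
        "skewness": third_moment / pop_std**3,
    }
```

For a symmetric sample such as 1, 2, 3, 4, 5 the mean is 3 and the skewness is 0.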

Improve snap raster function

The snap raster function currently has a few constraints that should be removed.

Implement toolkit - plugin interface

EIS Toolkit will need an interface and an entrypoint for calls by the EIS QGIS plugin. The interface will parse CLI arguments / potential config file, call EIS Toolkit functions and return data to the plugin.

Background

To describe the EIS QGIS plugin concisely, it will consist of:

EIS Wizard – a GUI that will guide the user to apply correct tools in appropriate order and facilitate managing a project
EIS Processing Algorithms – a collection of all the EIS algorithms meant to be called by end user, providing a GUI for each algorithm separately

Communication between the EIS QGIS plugin and EIS Toolkit will happen exclusively between individual EIS Processing Algorithms and an entrypoint script in the Toolkit. EIS Wizard will utilize the processing algorithms when ordering computations. The implementation will be extended later if complicated workflows need to be ordered directly from the Toolkit, and not by sequencing calls to processing algorithms.

Implementation

EIS QGIS plugin will use EIS Toolkit via the subprocess module. EIS Toolkit will exist as an installed Python library in a Python environment that the plugin will call when the correct path is provided.

The chosen library for the CLI implementation is Typer.

Add gridding check functionality

Add gridding check functionality.

Add basic plotting functions for exploratory analysis.

Add basic plotting functions for exploratory analysis: functions to plot histograms, scatterplots, and boxplots from numpy ndarray input data.

Refactor how test data is handled

The issue

Currently all test data is pushed on to the remote. This is not optimal for two main reasons:

Some datasets needed in testing are not open and thus should not be available on the remote
File sizes can get out of control fast as data is added

Plan for solving

This can be solved by configuring a new folder specifically for test data:

gitignore all contents of said folder
Include the folder itself with .gitkeep

And then:

Instruct developers to copy test data locally to the test data folder
Write tests to look for data in the test data folder

Now, upon cloning, the repository comes with no data included. Only the empty folder is tracked by git, and the tests look for data there.

Things needing to be done

Configure the new folder
Add instructions for adding test data
(A separate issue?) Make sure all tests look for data in the correct path

add gridding check functionality

Improve/extend resampling

Currently, resampling takes upscale_factor parameter which defines the increase/decrease in pixel size. However, this might not be the most convenient way for users to resample their data. The current parameter could be replaced, or a new one could be added to define the target pixel size.
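One way to support a target pixel size without changing the existing logic would be to convert it into a factor first. The sketch below assumes that upscale_factor multiplies the number of rows and columns (so a factor of 2 halves the pixel size); if the existing function defines the factor differently, the ratio would need to be inverted. Both function names are hypothetical.

```python
from typing import Tuple


def upscale_factor_from_pixel_size(current_size: float, target_size: float) -> float:
    """Factor by which rows and columns must be multiplied to reach target_size."""
    if current_size <= 0 or target_size <= 0:
        raise ValueError("Pixel sizes must be positive.")
    return current_size / target_size


def resampled_shape(height: int, width: int, upscale_factor: float) -> Tuple[int, int]:
    """Raster shape after resampling, never smaller than one cell per axis."""
    return (
        max(1, round(height * upscale_factor)),
        max(1, round(width * upscale_factor)),
    )
```

For example, going from 10 m to 5 m pixels gives a factor of 2, so a 100 x 200 raster becomes 200 x 400.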

Check pre-commit

Check if pre-commit hooks (especially linting) function as expected

Support a development environment with `conda`

Problem

Currently the supported environments for developing the project are poetry and docker+poetry. As you probably well know, neither of these works very well on Windows. An alternative is conda, which overhauls the whole installation process for compiled Python dependencies. It works okay(ish) on Windows, and many data scientists in particular are used to the tooling. Supporting it might make development easier for some people.

Solution

A simple environment.yml allows the creation of a working development environment:

name: eis_toolkit

channels:
  - conda-forge

# Changes here should be kept in sync with ./pyproject.toml
dependencies:
  - python >=3.10
  - gdal >=3.4.3
  - rasterio >=1.3.0,<1.4.0
  - pandas >=1.4.3,<1.5
  - geopandas >=0.11.1,<0.12
  - scikit-learn >=1.1.2,<1.2.0
  - matplotlib-base
  - statsmodels >=0.13.2,<0.14.0
  - keras >=2.9.0
  - tensorflow >=2.9.1
  - mkdocs-material >=8.4.0,<8.5.0
  # Dependencies for testing
  - pytest

The current test suite is also passed by this environment definition.

Implementation

Just requires adding the shown environment.yml file to the project.

Some of the versions are not locked and differ from pyproject.toml because, based on some brief testing, some combinations of versions could not be installed with conda. I can iterate on the versions if it is deemed necessary at this point.

Additionally, to keep the environment.yml consistent with the poetry environment, it is quite easy to add a conda workflow that tests the environment with the existing test suite on different operating systems. E.g. https://github.com/nialov/fractopo/blob/master/.github/workflows/conda.yml. In the future, if it is wanted, the package can also be distributed with conda using the conda-forge feedstock system.

I can do a pull request for the environment and workflow implementation if you consider it suitable. Understandably this adds some overhead and possibility for errors between programmers but maybe the pros outweigh the cons.

Also

To introduce myself here: I, Nikolas Ovaskainen ([email protected]), am a research scientist at the Geological Survey of Finland. I have working time for this project this year, so I will (hopefully) be making some feature contributions soon-ish.

Add instructions for testing

Add instructions for other users to test eis_toolkit functions via QGIS's python console.

In other words

take a look at https://stackoverflow.com/questions/41535915/python-pip-install-from-local-dir and select the relevant pieces to add to readme.rst
provide instructions on where to find QGIS's python console and how to execute test functions from there
provide instructions on how to test (everything except GDAL-related stuff) through a notebook

Add winsorizing function

Create a new branch (from master branch) and add

winsorize.py file into eis_toolkit/eis_toolkit/transformations folder (corresponds to clip.py)
winsorize_test.py file into eis_toolkit/tests folder (corresponds to clip_test.py). Utilize data in Teams folder in your tests if possible (if not possible, add necessary files to the Teams folder)!
more custom exception classes for eis_toolkit/eis_toolkit/exceptions.py

Optional additions:

a new jupyter lab notebook file into eis_toolkit/notebooks folder for testing winsorize tool in practice (corresponds to testing_clip.ipynb)
a new page into documentation site (see instructions folder and documentation.md for help)

Do not make too large commits. Rather multiple commits than one massive commit :) After finishing the branch, create a pull request and add @pavetsu14 or @eemilhaa as Reviewer.
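For orientation, winsorizing replaces the most extreme values of a dataset with less extreme ones, rather than dropping them. The sketch below is a count-based variant on a plain list; the function name, the percentage parameters and the index rule are assumptions and may differ from what ends up in winsorize.py.

```python
from typing import List, Sequence


def winsorize(
    values: Sequence[float], lower_percent: float, upper_percent: float
) -> List[float]:
    """Clamp the lowest and highest tails of `values` to the nearest kept value."""
    if lower_percent < 0 or upper_percent < 0 or lower_percent + upper_percent >= 100:
        raise ValueError("Percentages must be non-negative and sum to less than 100.")
    ordered = sorted(values)
    n = len(ordered)
    # Number of values to replace in each tail, rounded down.
    low = ordered[int(n * lower_percent / 100)]
    high = ordered[n - 1 - int(n * upper_percent / 100)]
    return [min(max(v, low), high) for v in values]
```

With the values 1, 2, 3, 4, 100 and an upper percentage of 20, the single largest value is replaced by 4 and the rest stay unchanged.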

Add density computations

Density computations consist of calculating the number of vector points/geometries within each raster cell. Alternatively, the points might be buffered and the number of buffered points (= polygons) intersecting each raster cell could be calculated. At least these two variations will be implemented.
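The first variation, counting points per cell, can be sketched in pure Python as follows. The function name and the top-left `origin` convention are assumptions, and the buffered variation is not covered here because it needs polygon intersection tests.

```python
from typing import List, Sequence, Tuple


def point_density(
    points: Sequence[Tuple[float, float]],
    origin: Tuple[float, float],
    cell_size: float,
    shape: Tuple[int, int],
) -> List[List[int]]:
    """Count the points that fall inside each raster cell (hypothetical helper).

    `origin` is the (x_min, y_max) corner of the grid, i.e. its top-left corner.
    """
    rows, cols = shape
    x_min, y_max = origin
    counts = [[0] * cols for _ in range(rows)]
    for x, y in points:
        col = int((x - x_min) // cell_size)
        row = int((y_max - y) // cell_size)
        # Points outside the grid extent do not contribute to any cell.
        if 0 <= row < rows and 0 <= col < cols:
            counts[row][col] += 1
    return counts
```

Dividing each count by the cell area would turn the counts into a density per unit area, if that is the desired output.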
