
gispocoding / eis_toolkit


Python library for mineral prospectivity mapping

Home Page: https://eis-he.eu/

License: European Union Public License 1.2

Python 8.66% HTML 0.01% Jupyter Notebook 91.33% Dockerfile 0.01%
gis mineral-exploration modelling neural-networks python

eis_toolkit's People

Contributors

alvella, chudasama-bijal, dipak6697, eemilhaa, em-t, jtlait, jtpesone, lehtonenp, msmiyels, msorvoja, mtk112, nialov, nmaarnio, pavetsu14, raineekman, richardscottoz, tmiosmauli, tomironkko, tomiturunen1


eis_toolkit's Issues

Add rasterize to convert vector data to raster

Rasterization is the process of "burning" the values to a raster where a vector geometry intersects raster cells. The burned value can either be a binary value or a value from the vector geometry. Geometries can be points, lines or polygons.
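To make the idea of "burning" concrete, here is a minimal sketch for point geometries only, written in plain Python so it runs without geospatial dependencies. The function name `burn_points` and its parameters are hypothetical and not part of eis_toolkit; a real implementation would more likely build on an existing rasterization routine and also handle lines and polygons.

```python
from typing import Iterable, List, Tuple


def burn_points(
    points: Iterable[Tuple[float, float, float]],
    x_min: float,
    y_max: float,
    cell_size: float,
    n_rows: int,
    n_cols: int,
    fill: float = 0.0,
) -> List[List[float]]:
    """Burn (x, y, value) points into a north-up grid.

    Row 0 is the northernmost row. Points outside the grid are ignored,
    and later points overwrite earlier ones that fall in the same cell.
    """
    grid = [[fill] * n_cols for _ in range(n_rows)]
    for x, y, value in points:
        col = int((x - x_min) // cell_size)
        row = int((y_max - y) // cell_size)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            grid[row][col] = value
    return grid


# Two points burned into a 3 x 3 grid with 1-unit cells.
grid = burn_points(
    [(0.5, 2.5, 1.0), (2.2, 0.1, 7.0)],
    x_min=0.0, y_max=3.0, cell_size=1.0, n_rows=3, n_cols=3,
)
```

Passing a constant value such as 1.0 for every point gives the binary variant; passing an attribute value gives the "value from the vector geometry" variant.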

Implement a draft command-line interface for `eis_toolkit`

Problem

If eis_toolkit is called using the command-line (e.g. using the subprocess module in Python) from the QGIS frontend, it needs a defined command-line interface.

Solution

Python has builtin solutions for creating a command-line interface (sys.argv, argparse). However, depending on the complexity, it would be more robust to use a mature, batteries-included command-line library, such as click or typer.

Through this interface, any functions within the library can be called, but the arguments to those functions must then be provided through the command-line.

Implementation

Addition of a command-line interface file, typically named cli.py, at eis_toolkit/cli.py. Additionally, to call the same command-line interface through the module system (python -m eis_toolkit ...), an eis_toolkit/__main__.py file with some boilerplate code is required.

I will create a draft click interface in the beginning as click is already a dependency of the project (used by some other dependency) so no additional dependencies are required.
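As a rough illustration of the shape such an interface could take, the sketch below uses the builtin argparse instead of click so that it runs without any third-party package. The `clip` subcommand, its arguments and the function names are hypothetical; the actual draft may look quite different.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per exposed library function."""
    parser = argparse.ArgumentParser(prog="eis_toolkit")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Hypothetical subcommand: every function argument becomes a CLI argument.
    clip = subparsers.add_parser("clip", help="Clip a raster with a polygon")
    clip.add_argument("input_raster")
    clip.add_argument("polygon")
    clip.add_argument("--output", default="clipped.tif")
    return parser


def main(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse arguments; a real CLI would dispatch to the library here."""
    return build_parser().parse_args(argv)


args = main(["clip", "in.tif", "area.shp", "--output", "out.tif"])
```

With this layout, the boilerplate in `__main__.py` would only need to import `main` from the cli module and call it.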

Check documentation-related dependencies

The issue

Currently we depend on a lot of packages related to generating documentation. Some of these are most likely not needed, and could be removed.

Things to do

  • Find the minimum dependencies for the current workflow of generating documentation
  • (If needed) Refactor how documentation is generated

Add code style check to pytest GH action

Add checks for adherence to the defined coding style principles into the automated GH workflow that runs pytest for every PR.

Remember to update instructions too.

Modify CONTRIBUTIONS.md

  • Transfer instructions about naming conventions from Teams to CONTRIBUTIONS.md file
  • Remove transferred parts from Teams docx

Add jupyterlab

Jupyterlab could be a useful tool in development. I'm thinking the jupyterlab package should be a dev-dependency, and a notebooks directory could be added to the root of the repo to host the notebooks.

Or, do we want to separate notebook-based testing into another repo entirely?

add CONTRIBUTING.md

Currently we have a readme with instructions on environment setup. A CONTRIBUTING.md would be useful for holding more general contributing guidelines. At least:

  • Describe a suitable pull request process

Create surface derivatives

Create functionality for surface derivatives:

  • aspect
  • slope
  • different curvature sets
  • tests for valid results/values (e.g. cut occurring outliers if necessary)

Functionality will be based on single band raster datasets only.
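As one possible starting point for the slope derivative, the sketch below computes slope in degrees from a single-band grid using central finite differences. It is an assumption about the method, not the agreed implementation: other schemes (for example Horn's 3x3 kernel) are common, and the function name is hypothetical. Border cells are left as NaN because they lack a full neighbourhood.

```python
import math
from typing import List


def slope_degrees(dem: List[List[float]], cell_size: float) -> List[List[float]]:
    """Slope in degrees for interior cells, using central differences."""
    n_rows, n_cols = len(dem), len(dem[0])
    out = [[math.nan] * n_cols for _ in range(n_rows)]
    for r in range(1, n_rows - 1):
        for c in range(1, n_cols - 1):
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            # Slope is the angle of the steepest gradient.
            out[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return out


# A plane rising one unit per cell in x has a 45 degree slope.
plane = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]
slope = slope_degrees(plane, cell_size=1.0)
```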

Add unify rasters

Add unify rasters functionality. This consists of the following steps: reproject, resample, snap/align, clip/mask. It should be decided if some of these steps are optional and the user can choose.

While developing this functionality, errors were found in the existing reprojecting, resampling and snapping functions. The fixes were made as part of this feature development.

Split local test data to input and output

  • Most testing will be done with local datasets (large files, sensitive data)
  • Split the local directory to input and output to keep the test data organized
    • Data (input) Results (output)
  • Also add docs for adding test data to local

Add weights-of-evidence functionality for predictive mapping

Add weights-of-evidence (wofe) functionality: The WofE is a Bayesian method to estimate the probability of a hypothesis (H) based on the knowledge of occurrence of certain evidential events (E). Applied to predictive mapping of mineral deposits, the hypothesis to be predicted is the probability of existence of the targeted mineral deposit and the evidential events are mapped from the geoscientific datasets representing geological features such as lithology, structures, whole rock geochemistry etc.

Implementation using geospatial datasets involves quantification of spatial associations (i.e., the weights) between mineral deposits and the geospatial evidential layers and subsequent calculations of the posterior probabilities for potential of existence of a mineral deposit.

The WofE functionality will have the following steps:

  1. Calculation of weights-of-evidences for multiclass (multi-feature) evidential events,
  2. Reclassification of multiclass (multi-feature) evidential events to binary evidential events, based on the weights-of-evidences,
  3. Recalculation of generalized weights-of-evidences after reclassification, and
  4. Calculating posterior probabilities by combining the generalized weights of all the evidential events.

Computationally, the WofE functionality is to be implemented in the following two parts:

  1. Quantifying the spatial association (i.e. weights) between the mineral deposit/occurrence and the evidential events.
  2. Updating the posterior probabilities of the deposit occurrence by combining the weights-of-evidences of all the events.
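The two computational parts can be sketched for a single binary evidence layer using the standard weights-of-evidence formulas. This sketch assumes cell counts as inputs, non-zero counts in every category, and conditional independence when weights are combined; the function names are hypothetical, and the reclassification and generalization steps are not covered.

```python
import math
from typing import Iterable, Tuple


def weights(n_total: int, n_deposits: int, n_pattern: int, n_both: int) -> Tuple[float, float]:
    """Return (W+, W-) for a binary evidence pattern B and deposits D.

    n_pattern is the number of cells where B is present and n_both the
    number of cells where B and D coincide.
    """
    n_non_deposits = n_total - n_deposits
    p_b_given_d = n_both / n_deposits
    p_b_given_not_d = (n_pattern - n_both) / n_non_deposits
    w_plus = math.log(p_b_given_d / p_b_given_not_d)
    w_minus = math.log((1 - p_b_given_d) / (1 - p_b_given_not_d))
    return w_plus, w_minus


def posterior_probability(prior: float, applied_weights: Iterable[float]) -> float:
    """Add the applicable weights to the prior log-odds and convert back."""
    logit = math.log(prior / (1 - prior)) + sum(applied_weights)
    return 1 / (1 + math.exp(-logit))


# Illustrative counts only: 1000 cells, 10 deposits, 100 pattern cells, 8 overlaps.
w_plus, w_minus = weights(1000, 10, 100, 8)
contrast = w_plus - w_minus
prior = 10 / 1000
posterior_present = posterior_probability(prior, [w_plus])
posterior_absent = posterior_probability(prior, [w_minus])
```

For each cell, W+ is applied where the evidence is present and W- where it is absent; with several evidence layers, one weight per layer is summed.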

Change of scope

Remove everything related to testing eis_toolkit within QGIS environment. Add

  • better instructions on testing eis_toolkit functions via command line
  • a new jupyter lab notebook for demonstrating how it can be utilized for testing purposes

Add polygonize

Add polygonize functionality, the reverse of rasterize. Polygonize takes a raster as input and turns it into vector format. The input raster should be a binary raster; otherwise, vector edges are drawn at the bounds of the raster and/or around nodata values.

Refactor clip.py

  • Abstract file reading away from the processing function
    • Use objects instead of paths as inputs and outputs
      • Inputs: rasterio object (opened dataset), polygon as shapely.geometry
      • Output: rasterio object (opened dataset)

Update package names

Rename the preprocessing folder to geoprocess, etc. (depending on what gets agreed in the meeting).

  • Add also rest of the packages
  • Update the folder paths everywhere to match new package names (e.g. mkgendocs.yml)

Update notebooks

We have some example notebooks at notebooks/, but some of them are out of date.

Provide `pre-commit` configuration for formatting and linting of code

Problem

As I have noticed, some tools are provided for automatic formatting (e.g. black) and linting (e.g. flake8) in the invoke file, tasks.py. However, as long as running these are done manually, there is room for much developer "error".

Solution

pre-commit can be used to run most of the checks, except for mypy. It checks the files before you can commit anything locally and reformats them using repository-configured tools.

Implementation

Addition of a .pre-commit-config.yaml file in the repository. E.g.

---
repos:
  - repo: https://github.com/psf/black
    rev: 22.10.0
    hooks:
      - id: black
        language_version: python3
  - repo: https://github.com/pycqa/isort
    rev: 5.10.1
    hooks:
      - id: isort
        args: ["--profile", "black"]
  - repo: https://github.com/pycqa/flake8
    rev: 6.0.0
    hooks:
      - id: flake8

This probably should not be a strict requirement for developers: how pre-commit works might not be apparent to everyone, and teaching it might be too much overhead for this single project. However, those who use it can format the code and fix linting errors, for example in pull requests before merging, and know that the style stays strictly the same among pre-commit users. A GitHub action could also be added that checks the code with pre-commit.

Add distance computations

Distance computations consist of calculating the distance to the nearest vector geometry for each raster cell. This distance is added as the value of the raster cell. The "empty" raster in which the values are calculated should be provided by the user or created from user inputs (e.g. bounds and cell size).
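A minimal brute-force sketch of this computation for point geometries is shown below, with the "empty" raster created from bounds and cell size. The function name and parameters are hypothetical; lines and polygons would need a point-to-segment distance instead, and a real implementation would likely use a spatial index rather than checking every point for every cell.

```python
import math
from typing import List, Sequence, Tuple


def distance_grid(
    points: Sequence[Tuple[float, float]],
    x_min: float,
    y_max: float,
    cell_size: float,
    n_rows: int,
    n_cols: int,
) -> List[List[float]]:
    """Distance from each cell centre to the nearest point (north-up grid)."""
    grid = []
    for row in range(n_rows):
        y = y_max - (row + 0.5) * cell_size
        row_values = []
        for col in range(n_cols):
            x = x_min + (col + 0.5) * cell_size
            row_values.append(min(math.hypot(x - px, y - py) for px, py in points))
        grid.append(row_values)
    return grid


# One point at the centre of the top-left cell of a 2 x 2 grid.
distances = distance_grid([(0.5, 1.5)], x_min=0.0, y_max=2.0, cell_size=1.0, n_rows=2, n_cols=2)
```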

Add statistics report

Add implementations for the following basic statistical analysis functions:

  • Mean
  • Quantiles
  • Standard deviation (normal and relative)
  • Skewness
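For orientation, the listed statistics can be sketched with the standard library alone. The choices below are assumptions, not decisions made in this issue: quartiles as the quantiles, sample standard deviation, relative standard deviation as standard deviation divided by the mean, and the population (Fisher-Pearson) skewness coefficient. The function name is hypothetical, and the sketch fails for a zero mean or constant data.

```python
import statistics
from typing import Dict, Sequence


def summarize(values: Sequence[float]) -> Dict[str, object]:
    """Return basic descriptive statistics for a sequence of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation
    pop_std = statistics.pstdev(values)  # population standard deviation
    # Third central moment divided by the cubed population standard deviation.
    skewness = sum((v - mean) ** 3 for v in values) / n / pop_std**3
    return {
        "mean": mean,
        "quartiles": statistics.quantiles(values, n=4),
        "std": std,
        "relative_std": std / mean,
        "skewness": skewness,
    }


report = summarize([1.0, 2.0, 3.0, 4.0, 5.0])
```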

Implement toolkit - plugin interface

EIS Toolkit will need an interface and an entrypoint for calls by the EIS QGIS plugin. The interface will parse CLI arguments / potential config file, call EIS Toolkit functions and return data to the plugin.

Background

To describe the EIS QGIS plugin concisely, it will consist of:

  • EIS Wizard: a GUI that will guide the user to apply correct tools in appropriate order and facilitate managing a project
  • EIS Processing Algorithms: a collection of all the EIS algorithms meant to be called by end user, providing a GUI for each algorithm separately

Communication between the EIS QGIS plugin and EIS Toolkit will happen exclusively between individual EIS Processing Algorithms and an entrypoint script in the Toolkit. EIS Wizard will utilize the processing algorithms when ordering computations. The implementation will be extended later if complicated workflows need to be ordered directly from the Toolkit, and not by sequencing calls to processing algorithms.

Implementation

EIS QGIS plugin will use EIS Toolkit via the subprocess module. EIS Toolkit will exist as an installed Python library in a Python environment that the plugin will call when the correct path is provided.

The chosen library for the CLI implementation is Typer.
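From the plugin side, the call could look roughly like the sketch below. It assumes the toolkit is invoked as a module (`python -m eis_toolkit`) with the interpreter path supplied by the user; the function names and the example arguments are hypothetical, and how results are actually returned to the plugin is not settled here.

```python
import subprocess
from typing import List, Sequence


def build_command(python_path: str, cli_args: Sequence[str]) -> List[str]:
    """Build the argument list for calling the toolkit CLI in its own environment."""
    return [python_path, "-m", "eis_toolkit", *cli_args]


def run_toolkit(python_path: str, cli_args: Sequence[str]) -> str:
    """Run the toolkit and return its standard output.

    check=True raises CalledProcessError on a non-zero exit code, so the
    plugin can report the failure to the user.
    """
    completed = subprocess.run(
        build_command(python_path, cli_args),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


# Hypothetical interpreter path and subcommand, for illustration only.
command = build_command("/path/to/env/bin/python", ["clip", "in.tif", "area.shp"])
```

Passing the command as a list rather than a single string avoids shell quoting problems with paths that contain spaces.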

Refactor how test data is handled

The issue

Currently all test data is pushed on to the remote. This is not optimal for two main reasons:

  • Some datasets needed in testing are not open and thus should not be available on the remote
  • File sizes can get out of control fast as data is added

Plan for solving

This can be solved by configuring a new folder specifically for test data:

  • gitignore all contents of said folder
  • Include the folder itself with .gitkeep

And then:

  • Instruct developers to copy test data locally to the test data folder
  • Write tests to look for data in the test data folder

Now, upon cloning, the repository comes with no data included. Only the empty folder is tracked by git, and the tests look for data there.
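On the test side, a small helper could make a missing local dataset fail with a clear message instead of an obscure read error. This is only a sketch: the helper name is hypothetical, and the demonstration uses a temporary directory in place of the real test data folder.

```python
import tempfile
from pathlib import Path


def find_test_data(data_dir: Path, file_name: str) -> Path:
    """Return the path to a local test dataset, or explain how to provide it."""
    path = data_dir / file_name
    if not path.is_file():
        raise FileNotFoundError(
            f"Test data '{file_name}' not found. Copy it into {data_dir} first."
        )
    return path


# Demonstration with a temporary stand-in for the git-ignored folder.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / ".gitkeep").touch()  # the only file git would track
    (data_dir / "example.tif").write_bytes(b"")  # locally copied dataset
    found_name = find_test_data(data_dir, "example.tif").name
```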

Things needing to be done

  • Configure the new folder
  • Add instructions for adding test data
  • (A separate issue?) Make sure all tests look for data in the correct path

Improve/extend resampling

Currently, resampling takes upscale_factor parameter which defines the increase/decrease in pixel size. However, this might not be the most convenient way for users to resample their data. The current parameter could be replaced, or a new one could be added to define the target pixel size.
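If a target pixel size parameter is added, it can be converted to a factor internally. The sketch below assumes that the factor scales the number of rows and columns, so a factor above 1 means smaller pixels; the function names are hypothetical and the convention should be checked against the current resampling code.

```python
from typing import Tuple


def upscale_factor_for(current_pixel_size: float, target_pixel_size: float) -> float:
    """Factor by which rows and columns are multiplied to reach the target size."""
    if current_pixel_size <= 0 or target_pixel_size <= 0:
        raise ValueError("Pixel sizes must be positive.")
    return current_pixel_size / target_pixel_size


def resampled_shape(n_rows: int, n_cols: int, factor: float) -> Tuple[int, int]:
    """Output raster shape after applying the factor (at least one cell per axis)."""
    return max(1, round(n_rows * factor)), max(1, round(n_cols * factor))


# Going from 10 m to 5 m pixels doubles the rows and columns.
factor = upscale_factor_for(10.0, 5.0)
shape = resampled_shape(100, 200, factor)
```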

Check pre-commit

  • Check if pre-commit hooks (especially linting) function as expected

Support a development environment with `conda`

Problem

Currently the supported environments for developing the project are poetry and docker+poetry. As you probably well know, neither of these works very well on Windows. An alternative is conda, which overhauls the whole installation process for compiled Python dependencies. It works okay(ish) on Windows, and many data scientists in particular are used to the tooling. Supporting it might make development easier for some people.

Solution

A simple environment.yml allows the creation of a working development environment:

name: eis_toolkit

channels:
  - conda-forge

# Changes here should be kept in sync with ./pyproject.toml
dependencies:
  - python >=3.10
  - gdal >=3.4.3
  - rasterio >=1.3.0,<1.4.0
  - pandas >=1.4.3,<1.5
  - geopandas >=0.11.1,<0.12
  - scikit-learn >=1.1.2,<1.2.0
  - matplotlib-base
  - statsmodels >=0.13.2,<0.14.0
  - keras >=2.9.0
  - tensorflow >=2.9.1
  - mkdocs-material >=8.4.0,<8.5.0
  # Dependencies for testing
  - pytest

The current test suite is also passed by this environment definition.

Implementation

Just requires adding the shown environment.yml file to the project.

Some of the versions are not locked and differ from pyproject.toml because, based on some brief testing, some combinations of versions could not be installed with conda. I can iterate on the versions if it is deemed necessary at this point.

Additionally, to keep the environment.yml consistent with the poetry environment, it is quite easy to add a conda workflow that tests the environment with the existing test suite on different operating systems. E.g. https://github.com/nialov/fractopo/blob/master/.github/workflows/conda.yml. In the future, if it is wanted, the package can also be distributed with conda using the conda-forge feedstock system.

I can do a pull request for the environment and workflow implementation if you consider it suitable. Understandably this adds some overhead and possibility for errors between programmers but maybe the pros outweigh the cons.

Also

To introduce myself here: I, Nikolas Ovaskainen, am a research scientist at the Geological Survey of Finland. I have working time for this project this year, so I will (hopefully) be making some feature contributions soon-ish.

Add winsorizing function

Create a new branch (from master branch) and add

  • winsorize.py file into eis_toolkit/eis_toolkit/transformations folder (corresponds to clip.py)
  • winsorize_test.py file into eis_toolkit/tests folder (corresponds to clip_test.py). Utilize data in Teams folder in your tests if possible (if not possible, add necessary files to the Teams folder)!
  • more custom exception classes for eis_toolkit/eis_toolkit/exceptions.py

Optional additions:

  • a new jupyter lab notebook file into eis_toolkit/notebooks folder for testing winsorize tool in practice (corresponds to testing_clip.ipynb)
  • a new page into documentation site (see instructions folder and documentation.md for help)

Do not make too large commits. Rather multiple commits than one massive commit :) After finishing the branch, create a pull request and add @pavetsu14 or @eemilhaa as Reviewer.
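For reference, winsorizing replaces values below a lower percentile and above an upper percentile with the values at those percentiles. The sketch below is one simple variant on a plain list, with limits picked by rounding to the nearest rank in the sorted data; the function name and the percentile convention are assumptions, and the actual winsorize.py may define the limits differently.

```python
from typing import List, Sequence


def winsorize(values: Sequence[float], lower_pct: float, upper_pct: float) -> List[float]:
    """Clamp values to the lower and upper percentile values of the data."""
    if not 0 <= lower_pct <= upper_pct <= 100:
        raise ValueError("Percentiles must satisfy 0 <= lower <= upper <= 100.")
    if not values:
        return []
    ordered = sorted(values)
    last = len(ordered) - 1
    # Nearest-rank limits taken from the sorted data.
    low = ordered[round(lower_pct / 100 * last)]
    high = ordered[round(upper_pct / 100 * last)]
    return [min(max(v, low), high) for v in values]


# The outlier 100 is pulled down to 9 and the minimum 1 is raised to 2.
result = winsorize([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], lower_pct=10, upper_pct=90)
```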

Add density computations

Density computations consist of calculating the number of vector points/geometries within each raster cell. Alternatively the points might be buffered and the amount of buffered points (=polygons) intersecting each raster cell could be calculated. These variations will at least be implemented.
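The first variation, counting points per raster cell, can be sketched in plain Python as below. The function name and parameters are hypothetical, and the buffered variation would additionally need a polygon-cell intersection test that is not shown here.

```python
from typing import Iterable, List, Tuple


def point_density(
    points: Iterable[Tuple[float, float]],
    x_min: float,
    y_max: float,
    cell_size: float,
    n_rows: int,
    n_cols: int,
) -> List[List[int]]:
    """Count the points falling inside each cell of a north-up grid."""
    counts = [[0] * n_cols for _ in range(n_rows)]
    for x, y in points:
        col = int((x - x_min) // cell_size)
        row = int((y_max - y) // cell_size)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            counts[row][col] += 1
    return counts


# Two points share the top-left cell; the last point lies outside the grid.
density = point_density(
    [(0.2, 1.8), (0.7, 1.1), (1.5, 0.5), (5.0, 5.0)],
    x_min=0.0, y_max=2.0, cell_size=1.0, n_rows=2, n_cols=2,
)
```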

MAC and eis_toolbox

Does QGIS use the so-called default Python (as on Linux), or does QGIS have its own separate Python (as on Windows)?

Test osgeo4w shell's python version

Check whether this is a QGIS-version-dependent thing on Windows.

What can be done if osgeo4w shell's python version does not match with what our toolbox requires?
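A quick way to test this is to run a small check inside the osgeo4w shell. The sketch below assumes a minimum of Python 3.10, taken from the `python >=3.10` line in the conda environment proposal; the real requirement in pyproject.toml may differ, and the function name is hypothetical.

```python
import sys
from typing import Sequence, Tuple

# Assumed minimum version; check pyproject.toml for the actual requirement.
MIN_PYTHON: Tuple[int, int] = (3, 10)


def python_is_supported(
    version_info: Sequence[int] = sys.version_info,
    minimum: Tuple[int, int] = MIN_PYTHON,
) -> bool:
    """Return True if the interpreter's major.minor version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


# Report the interpreter that is running this script.
current = ".".join(str(part) for part in sys.version_info[:3])
status = "supported" if python_is_supported() else "too old"
print(f"Python {current} at {sys.executable}: {status}")
```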
