gispocoding / eis_toolkit
Python library for mineral prospectivity mapping
Home Page: https://eis-he.eu/
License: European Union Public License 1.2
Add reproject vector functionality.
Related to test.py and testing.ipynb files.
Rasterization is the process of "burning" the values to a raster where a vector geometry intersects raster cells. The burned value can either be a binary value or a value from the vector geometry. Geometries can be points, lines or polygons.
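The burning step described above can be sketched in pure Python for point geometries only. This is a minimal illustration, not the toolkit's implementation: the function name `rasterize_points`, its parameters, and the row-major grid layout are all assumptions, and a real implementation would probably rely on a geospatial library and also handle lines and polygons.

```python
import math


def rasterize_points(points, bounds, cell_size, values=None, fill=0):
    """Burn points into a grid (hypothetical helper, points only).

    bounds is (min_x, min_y, max_x, max_y); row 0 is the top of the grid.
    With values=None a binary 1 is burned, otherwise values[i] is burned
    for points[i].
    """
    min_x, min_y, max_x, max_y = bounds
    n_cols = math.ceil((max_x - min_x) / cell_size)
    n_rows = math.ceil((max_y - min_y) / cell_size)
    grid = [[fill] * n_cols for _ in range(n_rows)]
    for i, (x, y) in enumerate(points):
        col = int((x - min_x) // cell_size)
        row = int((max_y - y) // cell_size)
        # Points outside the grid extent are skipped.
        if 0 <= row < n_rows and 0 <= col < n_cols:
            grid[row][col] = 1 if values is None else values[i]
    return grid


points = [(0.5, 1.5), (1.5, 0.5)]
binary = rasterize_points(points, (0, 0, 2, 2), 1)
attribute = rasterize_points(points, (0, 0, 2, 2), 1, values=[7, 9])
```

The `values` argument switches between the two modes mentioned in the issue: a binary burn and a burn of a value taken from each geometry.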
Add function that snaps/aligns input raster.
If eis_toolkit is called from the command line (e.g. using the subprocess module in Python) by the QGIS frontend, it needs a defined command-line interface.
Python has built-in solutions for creating a command-line interface (sys.argv, argparse). However, depending on the complexity, it would be more robust to use a mature, batteries-included command-line library, such as click or typer.
Through this interface, any functions within the library can be called, but the arguments to those functions must then be provided through the command-line.
Addition of a command-line interface file, typically named cli.py, at eis_toolkit/cli.py. Additionally, to call the same command-line interface through the module system (python -m eis_toolkit ...), an eis_toolkit/__main__.py file with some boilerplate code is required.
I will create a draft click interface to begin with, as click is already a dependency of the project (used by some other dependency), so no additional dependencies are required.
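As a rough sketch of what such a cli.py could contain, the example below uses the standard-library argparse as a stand-in for click, so it runs without extra dependencies. The `resample` subcommand, its arguments, and the `--upscale-factor` option are hypothetical and only illustrate how function arguments would be passed through the command line.

```python
import argparse


def build_parser():
    """Build a parser with one hypothetical subcommand per library function."""
    parser = argparse.ArgumentParser(prog="eis_toolkit")
    subparsers = parser.add_subparsers(dest="command", required=True)

    resample = subparsers.add_parser("resample", help="Resample a raster.")
    resample.add_argument("input_raster")
    resample.add_argument("output_raster")
    resample.add_argument("--upscale-factor", type=float, default=2.0)
    return parser


def main(argv=None):
    """Parse arguments and dispatch; returns a process exit code."""
    args = build_parser().parse_args(argv)
    # A real CLI would call the matching library function here.
    print(f"{args.command}: {args.input_raster} -> {args.output_raster} "
          f"(factor {args.upscale_factor})")
    return 0


# eis_toolkit/__main__.py would then only need boilerplate along the lines of:
#     from eis_toolkit.cli import main
#     raise SystemExit(main())
```

Keeping the parsing in `build_parser` and the dispatch in `main(argv)` makes the interface testable without spawning a subprocess.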
Currently we depend on a lot of packages related to generating documentation. Some of these are most likely not needed, and could be removed.
Add checks for adherence to the defined coding style principles to the automated GH workflow that runs pytest for every PR.
Remember to update instructions too.
Jupyterlab could be a useful tool in development. I'm thinking the jupyterlab package should be a dev-dependency, and a notebooks directory could be added to the root of the repo to host the notebooks.
Or, do we want to separate notebook-based testing into another repo entirely?
Currently we have a readme with instructions on environment setup. A CONTRIBUTING.md would be useful for holding more general contributing guidelines. At least:
Create functionality for surface derivatives:
Functionality will be based on single band raster datasets only.
Add unify rasters functionality. This consists of the following steps: reproject, resample, snap/align, clip/mask. It should be decided if some of these steps are optional and the user can choose.
While developing this functionality, some errors were found in the reprojecting, resampling and snapping functions. These were fixed as part of this feature development.
Add weights-of-evidence (wofe) functionality: The WofE is a Bayesian method to estimate the probability of a hypothesis (H) based on the knowledge of occurrence of certain evidential events (E). Applied to predictive mapping of mineral deposits, the hypothesis to be predicted is the probability of existence of the targeted mineral deposit and the evidential events are mapped from the geoscientific datasets representing geological features such as lithology, structures, whole rock geochemistry etc.
Implementation using geospatial datasets involves quantification of spatial associations (i.e., the weights) between mineral deposits and the geospatial evidential layers and subsequent calculations of the posterior probabilities for potential of existence of a mineral deposit.
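The quantification of weights and the posterior calculation can be sketched as follows. This assumes the commonly used WofE definitions, W+ = ln(P(E|H) / P(E|not H)) and W- = ln(P(not E|H) / P(not E|not H)), with the contrast C = W+ - W-, and a posterior obtained by adding weights to the prior logit. The function names and the cell-count inputs are illustrative assumptions, not the toolkit's API.

```python
import math


def wofe_weights(n_total, n_deposits, n_evidence, n_evidence_deposits):
    """Return (W+, W-, contrast) from cell counts (hypothetical helper).

    n_total: all cells, n_deposits: cells with a deposit,
    n_evidence: cells where the evidence is present,
    n_evidence_deposits: cells with both evidence and a deposit.
    """
    n_non_deposits = n_total - n_deposits
    p_e_given_d = n_evidence_deposits / n_deposits
    p_e_given_nd = (n_evidence - n_evidence_deposits) / n_non_deposits
    # The logarithms are undefined when a conditional probability is 0 or 1.
    if not (0 < p_e_given_d < 1 and 0 < p_e_given_nd < 1):
        raise ValueError("Conditional probabilities must lie strictly between 0 and 1.")
    w_plus = math.log(p_e_given_d / p_e_given_nd)
    w_minus = math.log((1 - p_e_given_d) / (1 - p_e_given_nd))
    return w_plus, w_minus, w_plus - w_minus


def posterior_probability(prior, weights):
    """Add the applicable weights to the prior logit and convert back."""
    logit = math.log(prior / (1 - prior)) + sum(weights)
    return 1 / (1 + math.exp(-logit))


w_plus, w_minus, contrast = wofe_weights(1000, 10, 100, 5)
posterior = posterior_probability(10 / 1000, [w_plus])
```

In this toy example half of the deposits fall on the evidence, which covers a tenth of the area, so W+ is positive, W- is negative, and the posterior exceeds the prior where the evidence is present.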
The WofE functionality will have the following steps:
Computationally, the WofE functionality is to be implemented in the following two parts:
Remove everything related to testing eis_toolkit within QGIS environment. Add
Add random forest functionality
...content to be filled @BerndTorchala
Can be done after changing repository visibility to public.
Something may have changed upstream in conda-forge, which seems to have broken the Windows installation in GitHub Actions. E.g. https://github.com/GispoCoding/eis_toolkit/actions/runs/4413929172/jobs/7734997459.
Will need to test the environment.yaml file again and see if the failure can be reproduced locally. Until then you should ignore the failing install and testing.
Add polygonize functionality, the reverse of rasterize. Polygonize takes a raster as input and turns it into vector format. The input raster needs to be a binary raster; otherwise vector edges are drawn at the bounds of the raster and/or at nodata values.
Most probably MIT
@eemilhaa Add here a more detailed description / execution plan.
Create new branch for extract_window function and add required code for function, tests and exceptions
Name preprocessing folder to geoprocess etc (depending on what gets agreed in the meeting).
We have some example notebooks at notebooks/, but some of them are out of date.
As I have noticed, some tools are provided for automatic formatting (e.g. black) and linting (e.g. flake8) in the invoke file, tasks.py. However, as long as running these is done manually, there is room for much developer "error".
pre-commit can be used to run most of the checks, except for mypy. It checks the files before you can commit anything locally and reformats them using repository-configured tools.
Addition of a .pre-commit-config.yaml file in the repository. E.g.
---
repos:
  - repo: https://github.com/psf/black
    rev: 22.10.0
    hooks:
      - id: black
        language_version: python3
  - repo: https://github.com/pycqa/isort
    rev: 5.10.1
    hooks:
      - id: isort
        args: ["--profile", "black"]
  - repo: https://github.com/pycqa/flake8
    rev: 6.0.0
    hooks:
      - id: flake8
This probably should not be a strict requirement for developers: how it works might not be apparent to everyone, and teaching it might be too much overhead for this single project. However, those who use it can format the code and fix linting errors, for example during pull requests before merges, and know that between the users of pre-commit the style stays strictly the same. A GitHub action could also be added that checks the code with pre-commit.
Implement a function that resamples input raster.
Distance computations consist of calculating the distance to the nearest vector geometry for each raster cell. This distance is set as the value of the raster cell. The "empty" raster in which the values are calculated should be provided by the user or created from user inputs (e.g. bounds and cell size).
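A minimal pure-Python sketch of this idea is shown below, restricted to point geometries and measuring from cell centers. The function name `distance_raster` and the way the grid is built from bounds and cell size are assumptions for illustration; lines and polygons would need a proper geometry library.

```python
import math


def distance_raster(points, bounds, cell_size):
    """Distance from each cell center to the nearest point (hypothetical helper).

    bounds is (min_x, min_y, max_x, max_y); row 0 is the top of the grid.
    """
    min_x, min_y, max_x, max_y = bounds
    n_cols = math.ceil((max_x - min_x) / cell_size)
    n_rows = math.ceil((max_y - min_y) / cell_size)
    grid = []
    for row in range(n_rows):
        center_y = max_y - (row + 0.5) * cell_size
        grid.append([
            min(
                math.hypot(min_x + (col + 0.5) * cell_size - px, center_y - py)
                for px, py in points
            )
            for col in range(n_cols)
        ])
    return grid


distances = distance_raster([(0.5, 1.5)], (0, 0, 2, 2), 1)
```

This brute-force version checks every point for every cell, which is fine for a sketch but would be slow for large rasters; a spatial index would be the usual remedy.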
Add implementations for the following basic statistical analysis functions:
Snap raster function has now a few constraints that should be removed.
EIS Toolkit will need an interface and an entrypoint for calls by the EIS QGIS plugin. The interface will parse CLI arguments / potential config file, call EIS Toolkit functions and return data to the plugin.
To describe the EIS QGIS plugin concisely, it will consist of:
Communication between the EIS QGIS plugin and EIS Toolkit will happen exclusively between individual EIS Processing Algorithms and an entrypoint script in the Toolkit. EIS Wizard will utilize the processing algorithms when ordering computations. The implementation will be extended later if complicated workflows need to be ordered directly from the Toolkit, and not by sequencing calls to processing algorithms.
EIS QGIS plugin will use EIS Toolkit via the subprocess module. EIS Toolkit will exist as an installed Python library in a Python environment that the plugin will call when the correct path is provided.
The chosen library for the CLI implementation is Typer.
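The subprocess-based call from the plugin could look roughly like the sketch below. It assumes the toolkit is invoked as `python -m eis_toolkit` with the interpreter of the environment where it is installed; the helper names and the example arguments are hypothetical, and the actual entrypoint and argument format are not defined here.

```python
import subprocess


def build_command(python_path, cli_args, module="eis_toolkit"):
    """Build the argument list for calling the toolkit via the module system."""
    return [python_path, "-m", module, *cli_args]


def run_toolkit(python_path, cli_args):
    """Run the toolkit in its own environment and return its standard output.

    check=True raises CalledProcessError if the toolkit exits with an error,
    so the plugin can report the failure to the user.
    """
    result = subprocess.run(
        build_command(python_path, cli_args),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


command = build_command("/path/to/env/bin/python", ["resample", "in.tif", "out.tif"])
```

Passing the command as a list rather than a single string avoids shell quoting problems with file paths that contain spaces.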
Add gridding check functionality.
Add basic plotting functions for exploratory analysis: functions to plot histograms, scatterplots, and boxplots from numpy ndarray input data.
Currently all test data is pushed on to the remote. This is not optimal for two main reasons:
This can be solved by configuring a new folder specifically for test data:
And then:
Now, upon cloning, the repository comes with no data included. Only the empty folder is tracked by git, and the tests look for data there.
Currently, resampling takes an upscale_factor parameter which defines the increase/decrease in pixel size. However, this might not be the most convenient way for users to resample their data. The current parameter could be replaced, or a new one could be added to define the target pixel size.
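One way to support a target pixel size without changing the existing logic is to convert it to a factor first. The sketch below assumes that the factor multiplies the number of cells along each axis, so a factor of 2 halves the pixel size; that interpretation, and both function names, are assumptions for illustration.

```python
def upscale_factor_for(current_pixel_size, target_pixel_size):
    """Convert a target pixel size into an equivalent upscale factor."""
    if current_pixel_size <= 0 or target_pixel_size <= 0:
        raise ValueError("Pixel sizes must be positive.")
    return current_pixel_size / target_pixel_size


def resampled_shape(height, width, factor):
    """Number of rows and columns after resampling by the given factor."""
    return max(1, round(height * factor)), max(1, round(width * factor))


factor = upscale_factor_for(10.0, 5.0)       # 10 m cells resampled to 5 m cells
shape = resampled_shape(100, 200, factor)
```

With this conversion in place, both parameters could coexist: the target pixel size is translated to a factor and the rest of the resampling code stays unchanged.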
Currently the supported environments for developing the project are poetry and docker+poetry. As you probably well know, neither of these works very well on Windows. An alternative is conda, which overhauls the whole installation process for compiled Python dependencies. It works okay(ish) on Windows, and many data scientists in particular are used to the tooling. Supporting it might make development easier for some people.
A simple environment.yml allows the creation of a working development environment:
name: eis_toolkit
channels:
  - conda-forge
# Changes here should be kept in sync with ./pyproject.toml
dependencies:
  - python >=3.10
  - gdal >=3.4.3
  - rasterio >=1.3.0,<1.4.0
  - pandas >=1.4.3,<1.5
  - geopandas >=0.11.1,<0.12
  - scikit-learn >=1.1.2,<1.2.0
  - matplotlib-base
  - statsmodels >=0.13.2,<0.14.0
  - keras >=2.9.0
  - tensorflow >=2.9.1
  - mkdocs-material >=8.4.0,<8.5.0
  # Dependencies for testing
  - pytest
The current test suite also passes in this environment.
This just requires adding the shown environment.yml file to the project.
Some of the versions are not locked and differ from pyproject.toml because some combinations of versions could not be installed with conda, based on some brief testing. I can iterate on the versions if it is deemed necessary at this point.
Additionally, to keep the environment.yml consistent with the poetry environment, it is quite easy to add a conda workflow to test the environment with the existing test suite on different operating systems. E.g. https://github.com/nialov/fractopo/blob/master/.github/workflows/conda.yml. In the future, if it is wanted, the package can also be distributed with conda using the conda-forge feedstock system.
I can do a pull request for the environment and workflow implementation if you consider it suitable. Understandably this adds some overhead and possibility for errors between programmers but maybe the pros outweigh the cons.
To introduce myself here: I, Nikolas Ovaskainen ([email protected]), am a research scientist at the Geological Survey of Finland. I have working time for this project this year, so I will (hopefully) be making some feature contributions soon-ish.
Add instructions for other users to test eis_toolkit functions via QGIS's python console.
In other words
Create a new branch (from master branch) and add
Optional additions:
Do not make too large commits. Rather multiple commits than one massive commit :) After finishing the branch, create a pull request and add @pavetsu14 or @eemilhaa as Reviewer.
Density computations consist of calculating the number of vector points/geometries within each raster cell. Alternatively, the points might be buffered and the number of buffered points (=polygons) intersecting each raster cell could be calculated. At least these variations will be implemented.
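Both variations can be sketched in pure Python for point input. With no buffer, each point is counted in the single cell that contains it; with a buffer, a point is counted in every cell its circular buffer touches. The function name `density_raster`, the `buffer` parameter, and the half-open cell convention are assumptions made for this illustration.

```python
import math


def density_raster(points, bounds, cell_size, buffer=0.0):
    """Count points (or buffered points) per cell (hypothetical helper).

    bounds is (min_x, min_y, max_x, max_y); row 0 is the top of the grid.
    """
    min_x, min_y, max_x, max_y = bounds
    n_cols = math.ceil((max_x - min_x) / cell_size)
    n_rows = math.ceil((max_y - min_y) / cell_size)
    grid = [[0] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        top = max_y - row * cell_size
        bottom = top - cell_size
        for col in range(n_cols):
            left = min_x + col * cell_size
            right = left + cell_size
            for px, py in points:
                if buffer > 0:
                    # Distance from the point to the cell rectangle.
                    dx = max(left - px, 0.0, px - right)
                    dy = max(bottom - py, 0.0, py - top)
                    hit = math.hypot(dx, dy) <= buffer
                else:
                    # Half-open cells so boundary points are counted once.
                    hit = left <= px < right and bottom < py <= top
                grid[row][col] += int(hit)
    return grid


points = [(0.5, 1.5), (0.6, 1.4), (1.5, 0.5)]
counts = density_raster(points, (0, 0, 2, 2), 1)
buffered_counts = density_raster(points, (0, 0, 2, 2), 1, buffer=0.6)
```

The buffered variant uses the point-to-rectangle distance instead of constructing buffer polygons, which gives the same intersection result for circular buffers.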
Add plot_rate_curve function that creates a success or prediction rate curve.
Create function that reprojects input raster to match given base raster.
Add general description of the eis_toolkit into Index page of the documentation site.
Does QGIS use the so-called default Python (as on Linux), or does QGIS have its own separate Python (as on Windows)?
Check whether this depends on the QGIS version on Windows.
What can be done if the OSGeo4W shell's Python version does not match what our toolbox requires?