ai4er-cdt / geograph

GeoGraph provides a tool for analysing habitat fragmentation and related problems in landscape ecology. GeoGraph builds a geospatially referenced graph from land cover or field survey data and enables graph-based landscape ecology analysis as well as interactive visualizations.

Home Page: https://geograph.readthedocs.io

License: MIT License

Shell 0.02% Makefile 0.24% Python 12.28% Jupyter Notebook 87.46%
landscape-ecology landscape-evolution biodiversity landscape-connectivity remote-sensing landcover biodiversity-informatics python

geograph's Introduction

GeoGraph

Binder License: MIT Code style: black Documentation Status PyPI version DOI

GeoGraphViewer demo gif

Table of contents:

  1. Description
  2. Installation
  3. Requirements
  4. Documentation

1. Description

GeoGraph provides a tool for analysing habitat fragmentation and related problems in landscape ecology. GeoGraph builds a geospatially referenced graph from land cover or field survey data and enables graph-based landscape ecology analysis as well as interactive visualizations. Beyond the graph-based features, GeoGraph also enables the computation of common landscape metrics.

2. Installation

GeoGraph is available via pip, so you can install it using

pip install geograph

Done, you're ready to go!

See the documentation for a full getting started guide.
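As a minimal usage sketch (the raster path and CRS below are placeholders, not shipped example data), building a graph from a land cover raster looks roughly like this:

from geograph import GeoGraph

# placeholder path and CRS -- point this at your own land cover raster
graph = GeoGraph("path/to/landcover.tif", crs="EPSG:3857")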

3. Requirements

GeoGraph is written in Python 3.8 and builds on NetworkX, ipyleaflet and many more packages. See the requirements directory for a full list of dependencies.

4. Documentation

Our documentation is available at geograph.readthedocs.io.

geograph's People

Contributors

croydon-brixton, crystallee1104, dependabot[bot], herbiebradley, kmacfarlanegreen, rdnfn, sdat2


geograph's Issues

Precision for calculating adjacent polygons

Question:
Should we allow for a given epsilon (e.g. EPS=1e-12) machine precision when calculating whether polygons touch in our build-up of the GeoGraph?

Context:
The reason I raise this is an issue in shapely (see here) explaining that polygon touch operations do not currently implement a buffer against machine-precision issues.
While it seems from that thread that this is not an issue when we use right-angle geometries only (such as our complex polygonized rasters), I do not know this for sure.

My take:
To be on the safe side, and to also support non-right-angle geometries for potential future use cases, I think it would be good to implement the option of epsilon-precision touches. The trade-off to consider is graph build-up speed: depending on how much the epsilon-precision touch slows down the build-up, I would make it the default or not.
@herbiebradley @arduinfindeis what's your take?
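For reference, a minimal illustration (standard Shapely only, not GeoGraph code) of what an epsilon-tolerant adjacency check could look like, using the EPS value mentioned above:

from shapely.geometry import Polygon

EPS = 1e-12  # machine-precision tolerance

def touches_with_tolerance(a: Polygon, b: Polygon, eps: float = EPS) -> bool:
    # distance() is 0 for geometries that touch or overlap exactly, so this
    # also covers the current behaviour while tolerating tiny gaps up to eps
    return a.distance(b) <= eps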


[feature] Is it possible to publish geograph on conda-forge?

Hi there. Thank you very much for this timely package! I have used it for a while and enjoy it very much along with pylandstats.

I was wondering if it is possible to publish geograph on conda-forge? Currently many users like to use anaconda to manage packages and environments, especially when dealing with spatial packages like geopandas or fiona, so it would be nice if geograph could also be available there.

Regarding projection system (crs) for graph generation

Hi,

I tried the EPSG:3857 projection.

"from geograph import GeoGraph
data= r'D:\Geograph\lulc_test\lulc_2021.tif'
graph = GeoGraph(data,crs="EPSG:3857")"

and

"from geograph.visualisation.geoviewer import GeoGraphViewer
viewer = GeoGraphViewer()
viewer.add_graph(graph, name='my_graph')
#viewer.enable_graph_controls()
viewer"

and I received the error:
"AttributeError Traceback (most recent call last)
Cell In[5], line 3
1 from geograph.visualisation.geoviewer import GeoGraphViewer
2 viewer = GeoGraphViewer()
----> 3 viewer.add_graph(graph, name='my_graph')
4 #viewer.enable_graph_controls()
5 viewer

File ~\anaconda3\envs\geograph-local\lib\site-packages\geograph\visualisation\geoviewer.py:231, in GeoGraphViewer.add_graph(self, graph, name, with_components)
226 self.logger.debug("Creating graph geometries layer (graph_geo_data).")
227 nodes, edges = graph_utils.create_node_edge_geometries(
228 current_graph, crs=self.gpd_crs_code
229 )
230 graph_geo_data = ipyleaflet.GeoData(
--> 231 geo_dataframe=edges.append(nodes)
232 .to_frame(name="geometry")
233 .reset_index(),
234 name=current_name + "_graph",
235 **self.layer_style["graph"]
236 )
238 # Creating choropleth layer for patch polygons
239 self.logger.debug("Creating patch polygons layer (pgon_choropleth).")

File ~\anaconda3\envs\geograph-local\lib\site-packages\geopandas\geoseries.py:235, in GeoSeries.append(self, *args, **kwargs)
234 def append(self, *args, **kwargs):
--> 235 return self._wrapped_pandas_method("append", *args, **kwargs)

File ~\anaconda3\envs\geograph-local\lib\site-packages\geopandas\geoseries.py:622, in GeoSeries._wrapped_pandas_method(self, mtd, *args, **kwargs)
620 def _wrapped_pandas_method(self, mtd, *args, **kwargs):
621 """Wrap a generic pandas method to ensure it returns a GeoSeries"""
--> 622 val = getattr(super(), mtd)(*args, **kwargs)
623 if type(val) == Series:
    624             val.__class__ = GeoSeries

AttributeError: 'super' object has no attribute 'append'

What might be the problem, and is there a workaround?

Create GeoGraph metrics

Description: it would be great to have a diverse set of metrics; for users this will be a key feature of our tool. To make it easy to add new metrics later on, I think it would be a good idea to agree on a unified signature for all GeoGraph metrics. I propose the following:

from typing import TypedDict

import geograph


class Metric(TypedDict):
    """This class defines a dictionary type for metrics.

    Example: {'name': 'average component area', 'value': 100, 'unit': 'm^2', 'variant': 'component'}
    """
    name: str
    value: float
    unit: str
    variant: str  # e.g. 'graph', 'conventional', 'component'


def get_avg_component_area(graph: geograph.GeoGraph) -> Metric:
    ...
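As a hedged sketch of what one such metric function could look like, building on the Metric type above. It assumes the GeoGraph exposes its underlying NetworkX graph as graph.graph and its polygon dataframe as graph.df; these attribute names are assumptions for illustration, not confirmed API:

import networkx as nx

def get_avg_component_area(graph) -> Metric:
    # total patch area divided by the number of connected components
    n_components = nx.number_connected_components(graph.graph)
    return Metric(
        name="average component area",
        value=graph.df.geometry.area.sum() / n_components,
        unit="m^2",
        variant="component",
    )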

Metrics to create:
(We don't necessarily need all, but a few would be great)

  • Average component area
  • Number of components
  • Average patch area
  • Total area
  • Percentage of landscape
  • Mean patch isolation (average distance to the next-nearest patch)
  • Mean component isolation (average distance to the next-nearest component)

Additional features:

  • Should we make habitats (pseudo) GeoGraph instances with a df attribute that references the main df of the parent graph (without deep copying)? That way habitats could be passed directly to metric functions.
  • It would be great if we could show changes in metrics along with changes in nodes and polygons via the GeoGraphTimeline class. Something like

Number of components: ⬆️ +27% (from 11 to 14)
Average component area: ⬇️ -40% (from 100 m^2 to 60 m^2)

Additional info:
This is roughly where the metrics would go in the UI

Docs: create docs website

Description: At the moment each user must use sphinx to create their own docs... this is suboptimal.

Tasks: Create online documentation website either via github pages or readthedocs using Sphinx html output.

Make graph components accessible

Description: for the UI it would be great to have the following method/function, either as part of GeoGraph or as a standalone function:

def get_graph_components(graph: nx.Graph) -> gpd.GeoDataFrame:
    """Return a GeoDataFrame with graph components.
    
    This method takes an nx.Graph and determines the individual disconnected graph
    components that make up the graph. Each row of the returned GeoDataFrame 
    corresponds to a graph component, with entries in column 'geometry' being the union
    of all individual polygons making up a component.
    
    This method allows for the UI to visualise components and output their number as 
    a metric.
    
    More info on the definition of graph components can be found here:
    https://en.wikipedia.org/wiki/Component_(graph_theory)

    Args:
        graph (nx.Graph): nx.Graph of a GeoGraph
    """

Using the returned GeoDataFrame, the UI can then color code the components and output their number as a metric.
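A hedged sketch of such a function, assuming each node stores its patch polygon in a 'geometry' node attribute (an assumption for illustration; the actual GeoGraph node attributes may differ):

import geopandas as gpd
import networkx as nx
from shapely.ops import unary_union

def get_graph_components(graph: nx.Graph) -> gpd.GeoDataFrame:
    """Return a GeoDataFrame with one row per connected graph component."""
    geometries = [
        unary_union([graph.nodes[node]["geometry"] for node in component])
        for component in nx.connected_components(graph)
    ]
    return gpd.GeoDataFrame({"geometry": geometries})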

Fix LaTeX errors in report submodule

Description: there are a number of errors in the LaTeX report submodule when compiling main.tex at the moment. Somebody familiar with the template should look at this and ideally fix them (if possible).

Latex output:

 LaTeX Error: Option clash for package xcolor.

See the LaTeX manual or LaTeX Companion for explanation.
Type  H <return>  for immediate help.
 ...                                              
                                                  
l.5 \RequirePackage
                   [dvipsnames]{color}
The package xcolor has already been loaded with options:
  []
There has now been an attempt to load it with options
  [dvipsnames]
Adding the global options:
  ,dvipsnames
to your \documentclass declaration may fix this.
Try typing  <return>  to proceed.
 Theme/linkcolors.sty, line 6

LaTeX Error: Option clash for package color.

See the LaTeX manual or LaTeX Companion for explanation.
Type  H <return>  for immediate help.
 ...                                              
                                                  
l.6 \RequirePackage
                   {hyperref}
The package color has already been loaded with options:
  []
There has now been an attempt to load it with options
  [dvipsnames]
Adding the global options:
  ,dvipsnames
to your \documentclass declaration may fix this.
Try typing  <return>  to proceed.

Package hyperref Info: Option `colorlinks' set `true' on input line 25.
 Theme/linkcolors.sty, line 25

Package kvsetkeys Error: Undefined key `footnotecolor'.

See the kvsetkeys package documentation for explanation.
Type  H <return>  for immediate help.
 ...                                              
                                                  
l.25 }
      
The keyval family of the key `footnotecolor' is `Hyp'.
The setting of the key is ignored because of the error.
 main.tex, line 57

Undefined control sequence.

The compiler is having trouble understanding a command you have used. Check that the command is spelled correctly. If the command is part of a package, make sure you have included the package in your preamble using \usepackage{...}.

\f@nch@olf ->\penname 
                      \strut 
l.57 \end{document}
                   
The control sequence at the end of the top line
of your error message was never \def'ed. If you have
misspelled it (e.g., `\hobx'), type `I' and the correct
spelling (e.g., `I\hbox'). Otherwise just continue,
and I'll forget about whatever was undefined.

Performance-improvement: Speed up de9im pattern matching

I feel like there has to be a way to make this much more efficient, maybe through stuff like numba or Cython, but that's something to leave for later.

Originally posted by @herbiebradley in #28 (comment)

Further info:
Side note: when searching for what the relate function does, we came across this: https://github.com/libgeos/geos/blob/186bbd32fbf07d8b5d419cdfd64c14e2270a418a/src/geom/IntersectionMatrix.cpp#L89 which contains the functions that a lot of Shapely calls end up in.

Overall, we haven't concentrated on this part much, because the string comparison is much faster than the overlap computations and the rtree query, so the overall speedup in the function would only be around 5%.
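For context, a tiny illustration of the DE-9IM strings being matched here, using plain Shapely calls (not GeoGraph internals):

from shapely.geometry import Polygon

a = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
b = Polygon([(1, 0), (2, 0), (2, 1), (1, 1)])  # shares an edge with a

de9im = a.relate(b)  # 9-character DE-9IM string, e.g. 'FF2F11212' here
adjacent = a.relate_pattern(b, "F***1****")  # interiors disjoint, boundaries share a line
print(de9im, adjacent)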

Node identification and operations

This issue tracks everything related to:

  • node identification
  • operating on nodes (merging/adding/removing, and the corresponding updates to the underlying data structure of GeoGraph, such as updating ids, areas or the rtree).

Decide on citation style in report

Description: I would suggest changing the citation style in the report to an author-year style. I think it reads significantly better, as you can write something like "Simon et al. (2021) propose a super cool graph-based method for habitat fragmentation ...". Number-based citations can be very tedious to read, and are harder to remember across sections (was it 87 or 78?). If you have another preference, feel free to comment below so we can discuss.

Possible Solution:
One way to do the author-year-style would be with the following in the preamble (using natbib):

\usepackage{natbib}
\bibliographystyle{abbrvnat}
\setcitestyle{authoryear}

Then use \citet{ } and \citep{} to create Simon et al. (2021) and (Simon et al., 2021) respectively.

For adding the bibliography we would use

\bibliography{references}

Create GeoGraphViewer

Description: create an interactive spatial data visualisation module that allows users to visualise all stages of our models, from input to graph to metrics. Ideally we would also be able to include temporal information in a useful way (either via automatic transformation or slider).

Features:

Basic (minimum viable product)
  • Show basic visualisation of GeoGraph
  • Folium
  • ipyleaflet
  • Enable habitat selection
  • Add polygons to visualisation
    • Add button selection to have graph and polygons in one line
  • Visualisation: Show poorly connected (single-edge) nodes #easy
  • Visualisation: Show individual components
  • Metrics: average component size
  • Metrics: add number of components to metrics panel
  • Metrics: add number of single edge nodes to metrics panel
  • Diff: Enable time change via slider
  • Diff: Enable diff selection (double slider) and visualisation
  • Diff: Show node diff
  • Diff: Show polygon diff
  • Allow full usage of the dashboard (incl. data loading) via the voila CLI interface alone, without the need to code anything in Python (maybe with limited functionality). This would radically broaden our user base, since many in the community appear to use R instead of Python.
  • Add order (so last added graph metrics are shown)
Extended (nice to have if time)
  • Metrics
    • Show standard metrics in dashboard #conventional
    • Allow computation of standard metrics in current view of graph #conventional
  • Add heatmap layer for some standard metrics #conventional
  • Create R interface: building on the voila CLI interface
  • Show graph components distinctly (maybe creating a mostly transparent coloured buffer around them)
  • Add custom style
    • Add geograph logo
    • Add custom dark-ish color scheme
  • Allow user to draw barrier on map
  • Remove edges accordingly

EDIT:

  • 2 March: added extended feature list

Merge polygons in the GeoGraph merge_classes function

In the merge_classes function in geograph.py, we can merge class labels together in the dataframe, but this may leave some neighbouring polygons of the same class: these need to be merged. The merge_nodes function should work.

See #42 (comment)

Creating this issue because I want to merge the PR quickly.

Distribution: set package distribution via PyPI up

Description: make the GeoGraph package installable via pip by adding it to PyPI (Python Package Index). This will significantly increase the potential reach of our project, and make it super easy for people to try.

Tasks:
To do this, a few minimal requirements need to be met:

  • We need to define our basic package description using setuptools package or similar
    • Introduce our own versioning (probably starting with 0.0.1)
    • Define package (not dev) dependencies, ideally as few and as flexibly as possible
  • We need to build the package and upload it to PyPI (having registered before)

Additionally, it would be great to have some of these features

  • Introduce testing to our package (e.g. using pytest); in the beginning this doesn't need to be a proper unit-test setup: it could just be a few tests checking that the code runs without errors. That would be especially useful to determine the range of dependencies we're able to work with.

Sources:

Allow minimal distance-based edges in a habitat graph

Description: Currently graph edges are only added to the graph when two polygons overlap or touch. For habitats there will almost always be a certain distance a species is able to travel across land that is not part of the habitat. Therefore we want to be able to add edges between polygons if they are within a certain maximum travel distance.

Potential solution: One way would be to add an edge between all polygons within a certain range (e.g. 10m) in the initial creation of the (parent) graph, and add an edge attribute min_distance between two polygons. Then, when we add habitat subgraphs later, we could easily filter the edges such that their min_distance is within the max_travel_distance.
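A hedged sketch of that candidate search (standalone illustration, not the GeoGraph implementation; it assumes node ids equal the dataframe's positional index):

import geopandas as gpd
import networkx as nx

def add_distance_edges(gdf: gpd.GeoDataFrame, graph: nx.Graph, max_travel_distance: float) -> None:
    sindex = gdf.sindex
    for i, geom in enumerate(gdf.geometry):
        # bounding-box candidates within range, via the spatial index
        candidates = sindex.query(geom.buffer(max_travel_distance))
        for j in candidates:
            if j <= i:  # skip self and pairs already handled
                continue
            dist = geom.distance(gdf.geometry.iloc[j])
            if dist <= max_travel_distance:
                graph.add_edge(i, j, min_distance=dist)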

Further notes:
The create_habitat method of the GeoGraph class might take the following arguments

  • name (e.g. "eagle habitat")
  • list of landcover classes that are part of the habitat (e.g. ["pine forests", ...])
  • max_travel_distance: maximum travel distance outside habitat (e.g. 50m)
  • list of landcover classes that are uncrossable/impassable (e.g. ["motorway",...])

Performance-improvement: Combine boolean masks

If numpy does short-circuit evaluation on these things, it'd be slightly faster to combine boolean masks.

Does anyone know how numpy handles these types of cases (below)?

Case: select_from_array[np.logical_or(condition_array1, condition_array2)]
Does it first evaluate both condition_array1 and condition_array2 in the slice [...] and then OR the conditions? (In that case it'd probably be slower, because we would calculate the geometry overlaps for shapes which won't agree in class label.)
Or does it calculate the first element of condition_array1 and then short-circuit decide whether that element of condition_array2 even needs to be calculated? (In that case I think it should be slightly faster.)

Originally posted by @Croydon-Brixton in #28 (comment)
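For reference: NumPy evaluates both operands of np.logical_or eagerly (the argument arrays are built before the call), so there is no short-circuiting. A small standalone illustration (not GeoGraph code) of applying the cheap mask first and only evaluating the expensive condition on the remaining candidates:

import numpy as np

values = np.arange(10)
cheap_mask = values % 2 == 0                 # cheap condition, computed for all elements
candidates = np.flatnonzero(~cheap_mask)     # only these still need the expensive check
expensive_on_candidates = (values[candidates] ** 2) > 25  # stand-in for a costly test

selected = np.concatenate([values[cheap_mask], values[candidates][expensive_on_candidates]])
# same elements as values[np.logical_or(cheap_mask, values ** 2 > 25)], up to ordering
print(np.sort(selected))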

Create full-stack case study

Description: It would be great to have one case study that illustrates our full stack of software, from WS1 to WS4. This would be best in the form of a Jupyter notebook, ideally with an interactive part that allows the user to see all the different intermediary spatial data (EO data -> classification -> polygons -> graph). The main purpose of this case study is to highlight the hard work done in all workstreams and enable new users to build on it 🚀

Features:
These are ordered, each building on top of the previous.

  • Create basic notebook that does the different steps one after the other (with one taking the input from the other). For readability this notebook should be as short as possible.
  • Add (time-static) spatial data from different time steps to GeoGraphViewer.
  • Figure out how we can make data for this case study accessible to other users (license, loading, hosting, etc.). This will make it easily reproducible and allow the notebook to serve as a great intro for new users to our library.
  • Optional: add all spatial data to GeoGraphTimeline, and let the user scroll through time in all spatial data layers.

Roughly, this is what the visualisation part of the notebook might look like.

EDIT:
To be more specific: to stitch together the most basic version of this case study, we would need the following contributions:

  • WS1:

    • EO data at a particular time (the visualisation will only use part of it)
    • A pre-trained model
    • A small notebook that loads this pre-trained model and applies it to EO data to create landcover data
  • WS2:

    • A notebook that runs the basic pylandstats analysis on the created landcover data. (See notebook recently added)
  • WS3&4:

    • Create GeoGraph and visualise it using the GeoGraphViewer.

Setup model evaluation framework

Purpose: we need to set up a rigorous framework to evaluate relative model performance. Whilst we create "metrics", these are part of the model and cannot be directly used to evaluate relative performance between models. How can we evaluate models relative to each other? How do we know a model is "good"?

Possible evaluation methods:

  • Create simple test cases (as in papers such as Fahrig 2003), e.g. two circles with increasing distance. See issue #10.
  • Pick a metric with ground truth (e.g. average patch size), and evaluate with respect to that ground truth.
  • What else?

TODOs:

  • Check how relative comparison is done in other papers

Reliability: Binder Notebooks Currently Don't Run

Hey, I just tried running the first notebook from binder. It broke on the first input with the output below.

As a general point, it would probably be good if there was an automatic test to see if the notebooks run without issue before merges. This would add a general smoke test to the project.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import rioxarray as rxr
import geopandas as gpd
import pylandstats as pls
from geograph import GeoGraph
from geograph.constants import UTM35N
from geograph.demo.binder_constants import DATA_DIR, ROIS

# Parse geotif landcover data
chernobyl_path = (
    lambda year: DATA_DIR / "chernobyl" / "esa_cci" / f"esa_cci_{year}_chernobyl.tif"
)

# Parse ROIS
rois = gpd.read_file(ROIS)
cez = rois[rois["name"] == "Chernobyl Exclusion Zone"]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xe
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Input In [2], in <cell line: 6>()
      4 import rioxarray as rxr
      5 import geopandas as gpd
----> 6 import pylandstats as pls
      7 from geograph import GeoGraph
      8 from geograph.constants import UTM35N

File /srv/conda/envs/notebook/lib/python3.8/site-packages/pylandstats/__init__.py:1, in <module>
----> 1 from .landscape import *
      2 from .spatiotemporal import *
      3 from .zonal import *

File /srv/conda/envs/notebook/lib/python3.8/site-packages/pylandstats/landscape.py:32, in <module>
     27 CELLLENGTH_RTOL = 0.001
     28 KERNEL_MOORE = ndimage.generate_binary_structure(2, 2)
     31 @transonic.boost
---> 32 def compute_adjacency_arr(padded_arr: 'uint32[:,:]', num_classes: 'int'):
     33     # flat-array approach to pixel adjacency from link below:
     34     # https://ilovesymposia.com/2016/12/20/numba-in-the-real-world/
     35     # the first axis of `adjacency_arr` is of fixed size of 2 and serves to
     36     # distinguish between vertical and horizontal adjacencies (we could also
     37     # use a tuple of two 2-D arrays)
     38     # adjacency_arr = np.zeros((2, num_classes + 1, num_classes + 1),
     39     #                          dtype=np.uint32)
     40     num_cols_adjacency = num_classes + 1
     41     horizontal_adjacency_arr = np.zeros(
     42         num_cols_adjacency * num_cols_adjacency, dtype=np.uint32)

File /srv/conda/envs/notebook/lib/python3.8/site-packages/transonic/aheadoftime.py:116, in boost(obj, backend, inline, boundscheck, wraparound, cdivision, nonecheck, nogil)
    113 if backend is not None and not isinstance(backend, str):
    114     raise TypeError
--> 116 ts = _get_transonic_calling_module(backend_name=backend)
    118 decor = ts.boost(
    119     inline=inline,
    120     nogil=nogil,
   (...)
    124     nonecheck=nonecheck,
    125 )
    126 if callable(obj) or isinstance(obj, type):

File /srv/conda/envs/notebook/lib/python3.8/site-packages/transonic/aheadoftime.py:90, in _get_transonic_calling_module(backend_name)
     88         ts = Transonic(frame=frame, reuse=False, backend=backend_name)
     89 else:
---> 90     ts = Transonic(frame=frame, reuse=False, backend=backend_name)
     92 return ts

File /srv/conda/envs/notebook/lib/python3.8/site-packages/transonic/aheadoftime.py:316, in Transonic.__init__(self, use_transonified, frame, reuse, backend)
    313     if path_ext_alt.exists():
    314         self.path_extension = path_ext = path_ext_alt
--> 316 self.reload_module_backend(module_backend_name)
    318 if not self.is_transpiled:
    319     logger.warning(
    320         f"Module {path_mod} has not been compiled for "
    321         f"Transonic-{backend.name_capitalized}"
    322     )

File /srv/conda/envs/notebook/lib/python3.8/site-packages/transonic/aheadoftime.py:344, in Transonic.reload_module_backend(self, module_backend_name)
    342     module_backend_name = self.module_backend.__name__
    343 if self.path_extension.exists() and not self.is_compiling:
--> 344     self.module_backend = import_from_path(
    345         self.path_extension, module_backend_name
    346     )
    347 elif self.path_backend.exists():
    348     self.module_backend = import_from_path(
    349         self.path_backend, module_backend_name
    350     )

File /srv/conda/envs/notebook/lib/python3.8/site-packages/transonic/util.py:360, in import_from_path(path, module_name)
    358 # for potential "local imports" in the module
    359 sys.path.insert(0, str(path.parent))
--> 360 module = importlib.util.module_from_spec(spec)
    361 spec.loader.exec_module(module)
    362 # clean sys.path

ImportError: numpy.core.multiarray failed to import

[feature] functionality and tutorial for landscape connectivity analysis

Hello. The package already provides an excellent presentation of the graph structure of a landscape, mostly for visualization and component/disconnection identification. Is it possible to take it one step further by providing a toolkit to analyze more landscape connectivity indices, like dPC, IIC, IF and more?

Feature-request: Enable subtracting polygons from graph

Description: For the policy use-case it would be great if we could subtract a given list of polygons from an existing GeoGraph via a subtraction function with a similar signature as below. By subtracting, I mean any polygon in the graph df has its intersection with the given list of polygons removed. The resulting graph may have fewer nodes than the original.

This would be helpful for building a feature that allows users to evaluate the impact of a land cover change (e.g. removing the given polygons from a habitat) on the habitat. Combining this function with ipyleaflet draw control and the GeoGraphTimeline, it would be relatively simple to allow the user to interactively draw polygons corresponding e.g. to planned infrastructure. Overall, this could be a very powerful feature for policy advisors.

def subtract_pgons_from_graph(graph: geograph.GeoGraph, polygons: List[shapely.Polygon]) -> geograph.GeoGraph:
    """Return graph resulting from subtracting polygons from given graph."""

Note: This feature may be out of scope for pre-report-submission.

Implementation of a baseline

EDIT:
Remaining things to be done:

  • Get ESA CCI data as test data
  • Define region of interest
  • Clip raster to region of interest
  • Separate zones
  • Reproject to suitable CRS
  • Perform pylandstats analysis on timeseries from 1992 to 2015
  • Sanity checks
  • Transfer final pipeline code from jupyter notebooks to python script
  • Finalize plotting procedures and add to python script

Rewrite `merge_classes` to traverse through the graph.

Nice - I agree with your analysis, thank you for clarifying!

Regarding the proposed solution:
Yes I think that works! (:
How intensive is the merge operation? Does the sequential merging take much time? If so, we might be able to optimize it by leveraging the graph structure (forgive the pseudo code):

(1) Create a set of all nodes with this class label (I'll call it _node_set, the underscore indicating that it's temporary).
(2) While _node_set is non-empty:
(2.0) Pop the first node: current_node = _node_set.pop()
(2.1) From current_node, do a graph traversal (e.g. BFS) along the nearest neighbours to find all nodes that can be reached from current_node by going through nodes with the same class label. Add those nodes to a list called nodes_to_merge. This will be one cluster of nodes that will be merged into a larger one.
(2.2) Merge all nodes in nodes_to_merge, with the final index of current_node
(2.3) Remove nodes_to_merge from _node_set

This way we'd be doing one "merge" per cluster of nodes (one in the above example), rather than one per neighbour (3 in the above example).

Originally posted by @Croydon-Brixton in #70 (comment)
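A hedged Python sketch of the traversal idea above (assuming a plain dict node_classes mapping node id to class label; restricting the graph to nodes of the target class makes each connected component exactly one cluster to merge):

import networkx as nx

def clusters_to_merge(graph: nx.Graph, node_classes: dict, class_label) -> list:
    same_class_nodes = [n for n, c in node_classes.items() if c == class_label]
    subgraph = graph.subgraph(same_class_nodes)
    # one merge operation per cluster instead of one per neighbouring pair
    return [set(component) for component in nx.connected_components(subgraph)]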

GeoGraph problem with identical attributes

Description: When loading a GeoGraph from a dataframe that already contains some of the attributes that are automatically added to each node, the graph is not created because of a duplicate keyword argument error.

Possible solution: Add all dataframe attributes as a single dict node attribute (e.g. df_attributes); that would avoid the possibility that the df and GeoGraph-internal keys clash.
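A small standalone illustration of that idea (hypothetical values; it only shows the node-attribute layout, not GeoGraph internals):

import networkx as nx
from shapely.geometry import Polygon

graph = nx.Graph()
polygon = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
row_attributes = {"class_label": 0, "area": 1.0}  # user columns, including a clashing 'area'

# internal keys (rep_point, area) stay separate from user dataframe columns,
# which all live under the single 'df_attributes' key
graph.add_node(
    0,
    rep_point=polygon.representative_point(),
    area=polygon.area,
    df_attributes=row_attributes,
)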

Reproducible test case:

from src.models import geograph
from src.data_loading import test_data

test_gdf = test_data.get_polygon_gdf("chernobyl_squares_touching")
test_gdf['class_label']=0
test_gdf
id geometry area class_label
0 0 POLYGON ((715639.122 5697662.734, 815639.122 5... 1.000000e+10 0
1 1 POLYGON ((815639.122 5697662.734, 915639.122 5... 1.000000e+10 0
graph = geograph.GeoGraph(test_gdf)
Step 1 of 2: Creating nodes and finding neighbours:   0%|          | 0/2 [00:00<?, ?it/s]



---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-10-5c05de071a97> in <module>
----> 1 graph = geograph.GeoGraph(test_gdf)


~/repos/gtc-biodiversity/src/models/geograph.py in __init__(self, data, attributes, graph_save_path, raster_save_path, tolerance, **kwargs)
    136         # Load from dataframe
    137         elif isinstance(data, gpd.GeoDataFrame):
--> 138             self._rtree = self._load_from_dataframe(
    139                 data, attributes, tolerance=self.tolerance
    140             )


~/repos/gtc-biodiversity/src/models/geograph.py in _load_from_dataframe(self, df, attributes, tolerance)
    399             row_attributes = dict(zip(attributes, [row[attr] for attr in attributes]))
    400             # add each polygon as a node to the graph with all attributes
--> 401             self.graph.add_node(
    402                 index,
    403                 rep_point=polygon.representative_point(),


TypeError: add_node() got multiple values for keyword argument 'area'

Self-directed edges for nodes in habitat

Description: it appears that the GeoGraph.add_habitat() method adds edges between nodes and themselves (self-loops), but we probably don't want this, as these edges don't have any meaning.

Example:
In the example here we have 2 nodes and 3 edges.


Create get_diff and related methods for GeoGraphTimeline

Description: for the UI it would be great to have a GeoGraphTimeline class method like

    def get_diff(
        self, start_date: datetime.date, end_date: datetime.date
    ) -> Tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]:
        """Get the node diff and polygon diff between two dates.

        This method returns the node diff and the polygon diff in the GeoGraphTimeline
        between `start_date` and `end_date`.

        The node diff is returned as a gpd.GeoDataFrame with columns 'id' and 'change',
        where 'id' is the unique node id and 'change' is one of
        ['added','removed','unchanged']. Change here indicates whether a node has been
        added, removed or remained unchanged between the dates. Further there could 
        be a column 'rate_of_change' that indicates for each node how much it has
        changed in percentages with a range [-1,1].

        The polygon diff is returned as a gpd.GeoDataFrame with columns 'id', 'geometry'
        and 'change'. The 'geometry' column contains polygons corresponding to changes
        between the times for node with id as in column 'id'. Column 'change' is one of
        ['added','removed','unchanged']. Each node may have multiple entries for
        different parts that have changed.

        Sidenote: If only years are considered, 1 January of that year should be used
        for the dates within the GeoGraphTimeline class.



        Raises:
            NotImplementedError: [description]
        """

Additionally, it would be great if the GeoGraphTimeline class could have something like a dates attribute that contains a list of all the datetime.date objects for the different GeoGraphs in the timeline.

@Croydon-Brixton: does this sound realistic to you, do you think this is implementable? Would you change something about the interface?
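As an illustration of the node-diff format described above, here is a small standalone sketch (plain pandas, made-up node ids) of how the 'change' column could be derived from the node ids present at the two dates:

import pandas as pd

ids_start = {1, 2, 3, 4}
ids_end = {2, 3, 4, 5, 6}

records = (
    [{"id": i, "change": "removed"} for i in ids_start - ids_end]
    + [{"id": i, "change": "added"} for i in ids_end - ids_start]
    + [{"id": i, "change": "unchanged"} for i in ids_start & ids_end]
)
node_diff = pd.DataFrame(records).sort_values("id")
print(node_diff)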

Resolve rasterio & fiona dependencies

I suggest we add an issue to resolve this compatibility problem and remember to unpin it again afterwards (or at least give a loose requirement). It comes from a problem in the interplay between fiona and rasterio in the latest rasterio version and might already be fixed by now.

Originally posted by @Croydon-Brixton in #70 (comment)
