
geodist's Introduction

hypertidy

[ ... ]

The hypertidy work plan

Build out a core set of packages for developing projects.

core components

  • point-in-polygon (see pfft, spacebucket, polyclip)
  • path-reconstruction-from-edges (dodgr magic)
  • distance-to and distance-from (best currently is spatstat and sf)
  • discretization and other forms of coordinate transformation
  • core input interfaces that are lazy, see lazyraster/vapour and tidync as workings toward a common framework

vapour

  • check vapour on MacOS
  • check license / copyright issues for vapour
  • build sfcore
  • put core raster logic in discrete

NetCDF

  • fix group_by for tidync
  • find a way to cut dimensions (dplyr::tile?)

anglr

  • integrate TR's quadmesh helpers

silicate

  • dynamic edges, record original start/end edge with throw-away Steiner points?

  • inner cascade and semi cascade, names for the concept, activate to start in the middle, formal join-ramp

  • finish edge and arc-node workers

  • restructure sc around the gibble concept

  • consolidate silicate (remove sc/scsf)

  • release unjoin and gibble for use by the sc family

overall approach

The hypertidy approach to complex data structures aims to bring the goals of the tidyverse to spatial data from a single point of perspective: the only thing that makes geo-spatial data special is the system of coordinate transformations, which provides a family of compromises for generating and working with a particular set of spatial properties. Every other aspect that gets special attention is shared with other domains such as graphics, model structures, grid domains, and aspects of user interactivity and ease of use. Further, the tidy principles dictate that the majority of data manipulation and analysis is best handled using database principles and technologies, and "geo-spatial" is no different. No special handling is required, and we believe strongly that current idioms and established practices built on special handling hinder innovation, education and understanding generally. Other fields have essentially solved the main problems in data analysis, handling and user interaction, but domain traditions prevent these solutions being used in optimal ways.

This applies to drawings, GIS vector points, lines, areas, simple features, segment-based linear paths, triangulations and other forms of mesh, and we consider them all to be either a piecewise linear complex or a simplicial complex, with (this bit is crucial) further levels of organization within and between primitive components.

Legacy optimizations in geo-spatial fields have produced a strong focus on the path, which is an ordered sequence of coordinates; the connectivity is implicit, defined by "joining the dots" between consecutive coordinates. The dual of the path is an unordered set of edges (a.k.a. line segments), where each pair of coordinates traversed by an edge is referenced implicitly by name.
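A tiny base-R sketch of that dual; the vertex_, .vx0 and .vx1 column names are only illustrative, not any package's API:

# A single path as an ordered coordinate sequence ("join the dots") ...
path <- cbind(x = c(0, 0, 1, 1), y = c(0, 1, 1, 0))
vertex <- data.frame(vertex_ = seq_len(nrow(path)),
                     x = path[, "x"], y = path[, "y"])

# ... and its dual: an unordered set of edges, each referencing two vertices
# implicitly by identifier rather than by position in a sequence.
edge <- data.frame(.vx0 = vertex$vertex_[-nrow(vertex)],
                   .vx1 = vertex$vertex_[-1])
edge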

This provides a set of forms for complex structures that collectively allows generic transformation workflows.

    1. Bespoke formats. This is what we have; there are many.
    2. Structural vertex-instance set and path-geometry map.
    3. Normal-form path topology.
    4. Simplicial complex forms.

The last two here include vertex topology, in that each vertex is a unique coordinate and may be referenced multiple times. Form 1 is a special case for transition between path-based forms.

There are several required forms. It's not clear to me that the list below is a sequence; for example, arc-node is not necessarily a good pathway from 3 to 5, since 4 is a specialization of planar linear forms for polygons or networks, not a required intermediate.

1 and 2 suffer from requiring an implicit or structural order for the sequence of coordinates within a path.

  1. Paths in multiple tables with a form of structure index, a run-length map. This is what sc_coord, sc_path, and sc_object provide.
  2. Relational paths, no structural index and no de-duplication, all that is needed is vertex, path, object.
  3. Relational paths normalized by vertex; requires a path_link_vertex table to link the paths and the de-duplicated vertices (a sketch follows this list).
  4. Relational paths normalized by vertex and by path. It's probably never worth normalizing a polygonal path, but it is worth it for arc-node models, such as TopoJSON, OSM ways, and the data at the core of the maps package.
  5. Relational directed linear segments; this first form treats segments the way that 2 treats coordinates - no de-duplication, so the path is recorded with the segment ID.
  6. Relational undirected linear segments; this second form de-duplicates segments, ignoring their orientation, and requires a new link table between segment and path (which is where the orientation could be stored, if it's needed; this is what TopoJSON does, storing a 1 or -1 for orientation).
  7. Relational triangles, composed of segments.
  8. Relational triangles, forget the segments.
  9. Higher forms?
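A minimal base-R sketch of form 3, normalizing a coordinate table by vertex; the table and column names (vertex_, path_, path_link_vertex) are illustrative rather than the sc/silicate API:

# Two square paths that share an edge, as raw coordinate instances.
coord <- data.frame(x = c(0, 0, 1, 1, 0,  1, 1, 2, 2, 1),
                    y = c(0, 1, 1, 0, 0,  0, 1, 1, 0, 0),
                    path_ = rep(c("p1", "p2"), each = 5))

# De-duplicate coordinates into a vertex table ...
vkey <- paste(coord$x, coord$y, sep = "|")
vertex <- coord[!duplicated(vkey), c("x", "y")]
vertex$vertex_ <- seq_len(nrow(vertex))

# ... and keep a link table so each path references shared vertices by id.
path_link_vertex <- data.frame(path_ = coord$path_,
                               vertex_ = match(vkey, vkey[!duplicated(vkey)]))
path_link_vertex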

path

  • same for polygons and lines and points
  • involves normalization of vertices, but maybe it should not
  • object, path, coordinate
  • combining paths is trivial, and is possibly the lowest common denominator for merging

path_topological

  • could involve vertex normalization
  • arc normalization?
  • how to record/infer closed paths; this probably should be explicit?
  • object, path, coordinate, vertex
  • combining these is tough: first expand all coords? normalize vertices of the separate inputs, then merge? (see the sketch after these notes)

: arc normalization is problematic I think; does it imply segment normalization first?

: is part of the key here to keep links to the inputs as they were?
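One possible answer to the combining question above, sketched in base R with the same illustrative structure as the previous sketch (an input is a list holding vertex and path_link_vertex tables): expand each input back to coordinate instances, bind, then re-normalize.

# Expand a vertex-normalized input back to in-order coordinate instances.
expand_coords <- function(x) {
  idx <- match(x$path_link_vertex$vertex_, x$vertex$vertex_)
  data.frame(x$vertex[idx, c("x", "y")], path_ = x$path_link_vertex$path_)
}

# Combine two inputs by expanding both, binding, and re-normalizing vertices
# (assumes path_ ids are already unique across the two inputs).
combine_normalized <- function(a, b) {
  coord <- rbind(expand_coords(a), expand_coords(b))
  vkey <- paste(coord$x, coord$y, sep = "|")
  vertex <- coord[!duplicated(vkey), c("x", "y")]
  vertex$vertex_ <- seq_len(nrow(vertex))
  list(vertex = vertex,
       path_link_vertex = data.frame(path_ = coord$path_,
                                     vertex_ = match(vkey, vkey[!duplicated(vkey)])))
}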

segment / 1D primitive

  • true edge graph
  • definitely requires node inference and inclusion

triangles / 2D primitive

  • segment is truly a prerequisite
  • CGAL seems prone to duplicates and cross over segs

Inputs?

  • lines and lines, needs noding result is

Lessons from silicore

https://github.com/hypertidy/silicore#the-longer-silicore-story

Lessons from the space bucket

Point in polygon is core.

Determining if a point falls inside a ring is classifying that point with that path. When paths can be nested there needs to be logic for holes, and for multiple classifications - however it's achieved. Obviously the search space can be optimized for multi-points, multi-paths.

In hypertidy/pfft we isolate the conversion of an edge-form to a triangulation and its complementary point-in-path so we can filter out holes and classify triangle instances.
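For reference, a plain-R crossing-number test for one point against one ring, with no hole or multi-path handling; this only sketches the classification step, it is not the pfft/polyclip code:

# Classic crossing-number ("ray casting") test for a single point and ring.
point_in_ring <- function(px, py, rx, ry) {
  n <- length(rx)
  j <- n
  inside <- FALSE
  for (i in seq_len(n)) {
    # toggle when a horizontal ray from the point crosses edge (j, i)
    if (((ry[i] > py) != (ry[j] > py)) &&
        (px < (rx[j] - rx[i]) * (py - ry[i]) / (ry[j] - ry[i]) + rx[i])) {
      inside <- !inside
    }
    j <- i
  }
  inside
}
point_in_ring(0.5, 0.5, c(0, 1, 1, 0), c(0, 0, 1, 1))
#> [1] TRUE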

Paths or segments?

We absolutely need both; for intersections we need the triangle filtering/classification. The gibble is a run-length encoding into the in-order set of coordinate instances, and this is straightforward from native sf, and also straightforward from the dense vertex set as long as the order can be maintained - otherwise via path composition from arbitrary edges. I feel that the gibble is invalidated by vertex densification, and probably we don't care if path composition is trivial, and holes are inherent in triangulations anyway. At any rate, if we use a triangulation for intersection then it is a dead end, because it provides information about the sf inputs. Otherwise we leave the inputs behind and use only primitives, then restore as needed.

Extents is core

We need entities that act like a set of bbox/extents, storing only four numbers for each. An sf form from these is purely on-demand. A raster is the densest, a rectilinear grid is next, a corner-based mesh is next, then the set of quads as a special case of the more general "set of extents".

We use an extents entity to quad-tree optimize things like point-in-polygon classification.
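A minimal sketch of such an extents entity in base R, with an on-demand containment test that shortlists candidates before any exact point-in-polygon work; the column names are illustrative:

# One row of four numbers per feature.
extent <- data.frame(xmin = c(0, 2), xmax = c(1, 3),
                     ymin = c(0, 2), ymax = c(1, 3))

# Which extents could contain this point? Only those candidates need the
# full point-in-polygon test.
point_in_extent <- function(px, py, ex) {
  which(px >= ex$xmin & px <= ex$xmax & py >= ex$ymin & py <= ex$ymax)
}
point_in_extent(0.5, 0.25, extent)
#> [1] 1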

geodist's People

Contributors

daniellemccool, mpadge, olivroy


geodist's Issues

Cheap is being expensive

Hi,
Using the default method "cheap" when points are on either side of the -180/180 line results in it calculating the long way around. Talking to @mdsumner I got around that by using the method "geodesic", but I thought I would highlight it.

lower <- c(176.6008, -71.094)    # NZ_1265_45_1
upper <- c(-176.8917, -66.77333) # NZ_1262_03_3
geodist::geodist(lower, upper)/1000
#>          [,1]
#> [1,] 14128.98

geodist::geodist(lower, upper, measure = "geodesic")/1000
#>          [,1]
#> [1,] 547.4822

Created on 2020-07-20 by the reprex package (v0.3.0)

which metrics?

here you go @mdsumner - this should be very quick to assemble. We can just start with vectorized C++ implementations of:

  1. Haversine
  2. Vincenty
  3. Mapbox cheap, as explained in this blog post.

This can all be done quickly with only an Rcpp dependency, but we could even dump that and then be essentially dependency-free?
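For reference, a textbook haversine in plain R (not the eventual C implementation in the package); the Earth-radius value is just an approximate mean radius in metres:

haversine <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(a))   # great-circle distance in metres
}
haversine(0, 0, 1, 1)
#> roughly 157000 (metres)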

Fix cheapdist

@mdsumner This one's for you, also to help you get your teeth into the code. I've implemented Haversine, Vincenty, and the mapbox cheap ruler, but the latter does not give consistent results:

library (geodist)

n <- 50
dx <- dy <- 0.01
x <- cbind (-100 + dx * runif (n), 20 + dy * runif (n))
y <- cbind (-100 + dx * runif (2 * n), 20 + dy * runif (2 * n))
colnames (x) <- colnames (y) <- c ("x", "y")
d1 <- geodist (x, measure = "haversine")
d2 <- geodist (x, measure = "cheap")
plot (d1, d2, pch = 1)


I think part of the problem is that the notes in their blog description purport to calculate cos((lat1 + lat2) / 2) as if these are in degrees rather than radians. I've converted them to radians, but have not converted other values. These kinds of inconsistencies are the likely cause, and likely all that needs to be done is to look inside the cheap ruler source code and fix my coded interpretation of the blog accordingly.
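For orientation, a minimal flat-earth sketch of the idea in the blog post, with the cosine taken in radians; this captures only the crux of the degrees/radians issue, not the full cheap-ruler series used in the package:

# Scale longitude differences by the cosine of the mean latitude (radians),
# then take a plain Euclidean distance on the scaled differences.
cheap_approx <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad * cos((lat1 + lat2) / 2 * to_rad)
  r * sqrt(dlat^2 + dlon^2)
}
cheap_approx(-100, 20, -100.01, 20.01)
#> about 1526 metres, close to haversine over this short distance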

geodist_benchmark function

geodist_benchmark (lon, lat, d)

returns a list of errors, both in metres and relative, for the available measures quantified against geodesic code.

units

extending from #1, implement a units parameter to set user-determined units. This is done very neatly in cheap ruler, conveniently giving us our desired list of units to include.

Updated version of package fails when y coordinate is called ymax

Discovered in the slopes package and documented here: ropensci/slopes#25

After some investigation it seems the names of the coordinates are the cause:

geodist::geodist(c(xmin = -9.2, ymin = 39), c(xmax = -9.1, ymin = 38))
#> Maximum distance is > 100km. The 'cheap' measure is inaccurate over such
#> large distances, you'd likely be better using a different 'measure'.
#>          [,1]
#> [1,] 111472.8
geodist::geodist(c(xmin = -9.2, ymin = 39), c(xmax = -9.1, ymax = 38))
#> Error in find_xy_cols(obj): Unable to determine longitude and latitude columns; perhaps try re-naming columns.

Created on 2021-02-02 by the reprex package (v1.0.0)

Fix package man entry

From Kurt Hornik, 19/08/2023:

Dear maintainer,

You have file 'geodist/man/geodist.Rd' with \docType{package}, likely
intended as a package overview help file, but without the appropriate
PKGNAME-package \alias as per "Documenting packages" in R-exts.

This seems to be the consequence of the breaking change

Using @doctype package no longer automatically adds a -package alias.
Instead document _PACKAGE to get all the defaults for package
documentation.

in roxygen2 7.0.0 (2019-11-12) having gone unnoticed, see
r-lib/roxygen2#1491.

As explained in the issue, to get the desired PKGNAME-package \alias
back, you should either change to the new approach and document the new
special sentinel

"_PACKAGE"

or manually add

@aliases geodist-package

if remaining with the old approach.

add sequential param?

sequential = TRUE (non-default) would calculate sequential distances along the rows of a single object, precisely as geosphere does it, so would return n-1 distances rather than the entire n-by-n matrix. @mdsumner whaddayareckon? Methinks likely quite useful.
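A sketch of what that would return, mimicked here by manually pairing consecutive rows (the sequential argument itself is the proposal; this assumes the existing paired mode):

x <- cbind(lon = c(0, 0.1, 0.2), lat = c(0, 0.1, 0.2))
n <- nrow(x)

# n - 1 distances between consecutive rows of a single n-row input.
d_seq <- geodist::geodist(x[-n, , drop = FALSE], x[-1, , drop = FALSE],
                          paired = TRUE, measure = "haversine")
length(d_seq)
#> [1] 2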

Request option not to warn when using cheap (or even test distances)

I use geodist to test whether or not two points are within a very small range of each other (80 - 200 meters) while identifying stops.

For these purposes, cheap is just fine, so I'd prefer to use it. Unfortunately, not all the points that are tested are close together. What this usually means is that, despite the fact that any distances that ARE above 100km are uninteresting to me anyway, I have to deal with a wall of warnings and, presumably, some efficiency loss because the maximum distance is being tested every time.

It's a minor annoyance, but maybe there are others that feel this way?
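Until an option exists, one workaround is to muffle the check around the call; a sketch, noting this drops every warning/message from that call, which may be acceptable here:

# Depending on whether the check is emitted as a warning or a message,
# one (or both) of these wrappers will silence it.
d <- suppressMessages(suppressWarnings(
  geodist::geodist(cbind(lon = 0, lat = 0), cbind(lon = 5, lat = 5),
                   measure = "cheap")
))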

benchmark not meaningful; sf doesn't use Vincenty

In the benchmark on your README, you compare runtime with sf but not the outcomes, which are entirely different: as the crs hasn't been set in the sf objects, it computes Euclidean distances. Setting it to 4326 will compute ellipsoidal distances in m (use %>% st_sfc(crs = 4326)). This will also increase your speed gain to a factor of 40.

The reason for the slowness of the library used by sf (PROJ) is that it computes ellipsoidal distances through the geographiclib routines by Karney (https://link.springer.com/content/pdf/10.1007/s00190-012-0578-z.pdf), which approximate numerically up to 8-byte double precision. The paper points out the differences between this method and Vincenty's - a recommended read.
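A sketch of the comparison with the crs set, so that sf returns ellipsoidal metres rather than Euclidean degrees:

library(sf)

p <- st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), crs = 4326)
st_distance(p)                                   # ellipsoidal, in metres

geodist::geodist(data.frame(lon = c(0, 1), lat = c(0, 1)),
                 measure = "geodesic")           # should agree closely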

Ditch Rcpp

Coz then we can make the all-important claim of being "Dependency free"!!! The calls are actually dead-easy, so it'll provide a good exercise for self and you @mdsumner in the ease of just using .Call().

geodist gives different results for single row matrix with cheap ruler

A one row matrix of from and to points gives a different distance as a many row matrix with the same coordinates when using the cheap ruler.

# example data
from <- matrix(rep(c(-0.193011, 52.15549),2), ncol = 2, byrow = TRUE)
to <- matrix(rep(c(-0.197722, 52.15395),2), ncol = 2, byrow = TRUE)
colnames(from) <- c("lon","lat")
colnames(to) <- c("lon","lat")

# Just first row of matrix
geodist::geodist(from[1, , drop = FALSE], to[1,, drop = FALSE], paired = TRUE, measure = "cheap")
# Both rows (gives different answer)
geodist::geodist(from, to, paired = TRUE, measure = "cheap")

# This doesn't happen with other measures
geodist::geodist(from[1, , drop = FALSE], to[1,, drop = FALSE], paired = TRUE, measure = "geodesic")
geodist::geodist(from, to, paired = TRUE, measure = "geodesic")

Min/Max Feature

Would it be a viable option to search for min/max distances? You answered a tweet of mine and showed geodist; the only problem with it was that it returns the full distance matrix.

If you have quite a large number of points this soon becomes a memory issue.
Would it be possible to only store a single number in the loop (e.g. check via ifelse whether it is bigger/smaller each iteration)? One could possibly set this as an argument in geodist().
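One interim sketch of that idea: compute distances in row chunks so only a chunk-by-M block is ever held in memory, keeping just the running minimum per row. The function and argument names here are hypothetical, not part of geodist:

# Row-wise minimum distance from 'from' to 'to' without the full matrix.
min_dist <- function(from, to, chunk_size = 1000, measure = "cheap") {
  out <- numeric(nrow(from))
  chunks <- split(seq_len(nrow(from)),
                  ceiling(seq_len(nrow(from)) / chunk_size))
  for (i in chunks) {
    d <- geodist::geodist(from[i, , drop = FALSE], to, measure = measure)
    out[i] <- apply(d, 1, min)
  }
  out
}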

Update CRAN version with geodist_paired_vec

I'm using geodist in a package that I might eventually want to put on CRAN, but I'd really benefit from the geodist_paired_vec method. I'm torn between adding it as a github dependency or keeping the current CRAN version.

If there were plans to update it soon, it'd save me the hard decision. :)

geodist_max/min functions

It would be useful to develop functions to return max or min distances, rather than all distances. Particularly useful in use cases with large inputs, say two relatively large 2-D matrices of N from and M to points where N-by-M would be too big to fit in memory, and you're only really interested in extracting minimal or maximal distances from each of the N points to whichever M point. The function would then return both max/min distance, and index into/ID of the M points.

Could then easily do a simple quadtree to cut the search space down based on coordinate differences only. This could simply be done in a fixed way, but would be more flexibly implemented as a "proper" quad-tree of successive cuts. Presuming the direct version ended up just selecting the closest 12.5% of points, that would require 2M subtractions and M/8 distance calculations (see the sketch below). An iterated quadtree would require:

iteration   n subtractions   n dists
1           2M               -
2           M                -
3           M / 2            -
4           M / 4            M / 8
thus adding only another 7M/4 subtractions, but far greater flexibility.
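A sketch of the fixed, single-cut version of that idea: a crude degree-space pre-filter that can misrank candidates near the dateline or poles, with all names hypothetical rather than proposed API:

# Nearest 'to' point for a single query: rough-rank by raw coordinate
# differences (2M subtractions), keep the closest eighth, then compute
# real distances only for those candidates (~M/8 distance calculations).
nearest_prefilter <- function(p, to, keep = 1 / 8, measure = "geodesic") {
  rough <- abs(to[, "lon"] - p["lon"]) + abs(to[, "lat"] - p["lat"])
  cand <- order(rough)[seq_len(max(1L, ceiling(keep * nrow(to))))]
  from <- matrix(p, nrow = 1, dimnames = list(NULL, names(p)))
  d <- geodist::geodist(from, to[cand, , drop = FALSE], measure = measure)
  list(index = cand[which.min(d)], dist = min(d))
}

to <- cbind(lon = runif(1000, -10, 10), lat = runif(1000, 40, 60))
nearest_prefilter(c(lon = 0, lat = 50), to)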

Issue warning when cheapdist becomes inaccurate

Following on from #25, the default measure = "cheap" ain't always appropriate, and it would probably be good to do a final check of the maximum distance returned, and issue a warning suggesting an alternative measure if this exceeds 100km.

Thanks

Hi guys, just wanted to say a huge thanks for this package: it's saved me a huge amount of programming and computer time! This is a godsend for work-in-progress updates to cyclestreets, an interface to a routing service for cyclists, by cyclists, from @mvl22 & co:

  dist_list = lapply(coord_list, function(x) {
    geodist::geodist(
      data.frame(x = x[, 1], y = x[, 2]),
      sequential = TRUE
    )
  })

Register C Callable

Inspired by recent discussions and progress in other libraries, and by data.table exposing their C functions, would you consider exposing the C functions from geodist so they can be called by other C/C++ code?

I've made a very quick prototype here, where all that was needed was to register a function with R_RegisterCCallable():

void R_init_geodist(DllInfo *dll)
{
    R_RegisterCCallable("geodist", "R_cheap", (DL_FUNC) &R_cheap);

    R_registerRoutines(dll, NULL, CallEntries, NULL, NULL);
    R_useDynamicSymbols(dll, FALSE);
}

Which then makes it available to other C/C++ code, for example:

library(Rcpp)
cppFunction(
  code = '
    SEXP cheap( SEXP x ) {
      typedef SEXP R_CHEAP_FUN(SEXP); 
      R_CHEAP_FUN *R_cheap = (R_CHEAP_FUN *) R_GetCCallable("geodist","R_cheap");
      return R_cheap( x );
    }
  '
)

And a quick check, using the example from ?geodist

n <- 50
# Default "cheap" distance measure is only accurate for short distances:
x <- cbind (runif (n, -0.1, 0.1), runif (n, -0.1, 0.1))
y <- cbind (runif (2 * n, -0.1, 0.1), runif (2 * n, -0.1, 0.1))
colnames (x) <- colnames (y) <- c ("x", "y")
d0 <- geodist (x) # A 50-by-50 matrix

d1 <- cheap(x)

all.equal( as.numeric(d0), d1 )
# [1] TRUE

Use parallel?

Distance computations could easily be done in parallel, but ...

  1. This is an arguably somewhat heavy dependency, contra our aims, and
  2. It is likely that these kinds of calculations will be used by people who really care about efficiency, and who will accordingly likely embed them within their own parallel routines.

Thoughts @mdsumner?

Speed up R API by allowing direct submission of numeric vector objects

The current interface presumes data.frame or matrix objects at all times, but the benchmarks of @symbolixAU show how much can be gained if simple numeric vectors could be submitted directly -- roughly a 3-times speed gain -- nope, that was an error; actual gains are around 28%. Importantly, submitting numeric matrices is no faster than data.frame objects, and it is only direct submission of vectors that speeds things up. The question is how such speed-ups might be enabled in the R interface here?

Sequential geodist produces different results to paired

There seem to be different results produced when geodist is used in sequential mode compared with manually pairing the points:

# Make example lon/lat points
mat <- matrix(c(-0.193011, 52.15549,
                -0.197722, 52.15395,
                -0.199949, 52.15527,
                -0.199533, 52.15762,
                -0.193205, 52.15757,
                -0.174015, 52.17529)
                , ncol = 2, byrow = TRUE)
colnames(mat) <- c("lon","lat")

# Use sequential mode
dist1 <- geodist::geodist(mat, sequential = TRUE)

# Manually make to and from objects
from <- mat[1:5,]
to <- mat[2:6,]
dist2 <- geodist::geodist(from, to, paired = TRUE)
identical(dist1, dist2) # FALSE

input formats

I'm currently imagining a generic

geodist(x, y)

where y is optional, and both are rectangular objects containing coordinates. If x only, then return a square matrix of distances; if y, then return a nrow(x)-by-nrow(y) matrix. Pretty much like geosphere functions.

@mdsumner Can you think of any other likely input formats? Have you ever wished geosphere allowed alternative input formats? (I surely haven't, and am always impressed by the input handling there.)
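A sketch of the two call patterns as proposed, using named lon/lat columns for illustration:

x <- cbind(lon = runif(5, -0.1, 0.1), lat = runif(5, -0.1, 0.1))
y <- cbind(lon = runif(3, -0.1, 0.1), lat = runif(3, -0.1, 0.1))

dim(geodist::geodist(x))      # x only: 5 x 5 square matrix
dim(geodist::geodist(x, y))   # x and y: 5 x 3 matrix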

pkgcheck results - main

Checks for geodist (v0.0.7.022)

git hash: 96552497

  • ✔️ Package is already on CRAN.
  • ✔️ has a 'CITATION' file.
  • ✔️ has a 'codemeta.json' file.
  • ✔️ has a 'contributing' file.
  • ✔️ uses 'roxygen2'.
  • ✔️ 'DESCRIPTION' has a URL field.
  • ✔️ 'DESCRIPTION' has a BugReports field.
  • ✔️ Package has at least one HTML vignette
  • ✔️ All functions have examples.
  • ✔️ Package has continuous integration checks.
  • ✔️ Package coverage is 91.8%.
  • ✔️ R CMD check found no errors.
  • ✔️ R CMD check found no warnings.

Package License: MIT + file LICENSE


1. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in C (89% in 14 files) and R (11% in 5 files)
  • 2 authors
  • 1 vignette
  • no internal data file
  • 1 imported package
  • 3 exported functions (median 27 lines of code)
  • 42 non-exported functions in R (median 12 lines of code)
  • 94 functions in src (median 27 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure                    value   percentile   noteworthy
files_R                        5         34.7
files_src                     14         95.0
files_vignettes                1         68.4
files_tests                    5         81.7
loc_R                        319         34.2
loc_src                     2692         80.5
loc_vignettes                182         46.0
loc_tests                    416         71.3
num_vignettes                  1         64.8
n_fns_r                       45         53.4
n_fns_r_exported               3         12.9
n_fns_r_not_exported          42         62.9
n_fns_src                     94         78.5
n_fns_per_file_r               6         73.0
n_fns_per_file_src             7         63.1
num_params_per_fn              4         54.6
loc_per_fn_r                  13         39.7
loc_per_fn_r_exp              27         58.8
loc_per_fn_r_not_exp          12         39.1
loc_per_fn_src                27         80.7
rel_whitespace_R              30         51.8
rel_whitespace_src            16         78.3
rel_whitespace_vignettes      12         15.1
rel_whitespace_tests          14         59.8
doclines_per_fn_exp           38         47.0
doclines_per_fn_not_exp        0          0.0   TRUE
fn_call_network_size         201         88.7

1a. Network visualisation

An interactive visualisation of calls between objects in the package has been uploaded as a workflow artefact. To view it, click on results from the latest 'pkgcheck' action, scroll to the bottom, and click on the 'visual-network' artefact.


2. goodpractice and other checks

Details of goodpractice and other checks (click to open)

3a. Continuous Integration Badges

R-CMD-check
pkgcheck

GitHub Workflow Results

name                          conclusion   sha      date
pages build and deployment    success      9a223d   2022-01-30
pkgcheck                                   965524   2022-01-30
pkgdown                       success      965524   2022-01-30
R-CMD-check                   success      965524   2022-01-30
test-coverage                 success      965524   2022-01-30

3b. goodpractice results

R CMD check with rcmdcheck

rcmdcheck found no errors, warnings, or notes

Test coverage with covr

Package coverage: 91.77

Cyclocomplexity with cyclocomp

No functions have cyclocomplexity >= 15

Static code analyses with lintr

lintr found no issues with this package!


Package Versions

package    version
pkgstats   0.0.3.88
pkgcheck   0.0.2.227
