Comments (15)

tdy-smacgill avatar tdy-smacgill commented on August 24, 2024

Part of the question, API aside, is what (if anything) is going to change in the structure itself. If we're keeping the existing data structure and only updating the API for accessing the data, we're limited in what we can change. There are some convenience possibilities in either case.

To that end, a rough summary of what's currently available for variable-resolution surfaces in the BAG file format:

  1. Data per raster pixel detailing resolutions and offsets for a tile associated with that pixel. Also referred to occasionally as a SuperGrid, this represents the resolution map as a fixed-resolution grid. (NOTE: CARIS variable-resolution surfaces represent resolution maps using a quadtree; exporting to BAG requires subdividing larger tiles to fit the grid.)
  2. For every pixel, an offset into a global index array. The offset is the start of an m*n list of indices for attributes within that pixel's variable-resolution grid.
  3. Global arrays of attributes for variable-resolution nodes.

So, for a brief example: suppose the BAG's resolution is 10m. Data for a particular pixel indicates a resolution of 1m, offsets of 0.5m, and an attribute offset of 564. This means that the pixel represents a 10x10 grid at 1m resolution, where attribute values are drawn from indices 564-663. Each entry in that table can in turn be a no-data entry (indicating a hole in the data) or an index into the global Depth and Uncertainty arrays.
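The index arithmetic in that example can be sketched as follows; the function name and signature are illustrative, not part of the BAG API:

```python
def refinement_index_range(bag_resolution_m, tile_resolution_m, attr_offset):
    """Compute the span of global-attribute indices for one VR tile.

    Illustrative only: assumes a square refinement grid that exactly
    subdivides the pixel, as in the example above.
    """
    n = round(bag_resolution_m / tile_resolution_m)  # nodes per side
    count = n * n                                    # total refinement nodes
    return attr_offset, attr_offset + count - 1      # inclusive index range

# 10 m pixel refined at 1 m, attribute offset 564:
# a 10x10 grid drawing from indices 564..663
start, end = refinement_index_range(10.0, 1.0, 564)
```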

For sake of argument, assume none of that changes.

Convenience support in the API might mean doing some work for the caller, allowing a tile to be queried and all useful data returned - the location, the grid size, and the depth/uncertainty arrays for that tile. This would allow slightly more convenient inspection without having to fully understand how all the underlying pieces connect. Is that the kind of thing we might want in the new API?
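As a sketch of what such a convenience query might hand back (all names here are hypothetical, not taken from the BAG library):

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class VRTile:
    """What a convenience 'query one refinement tile' call might return,
    so the caller needn't chase offsets and index arrays themselves."""
    origin_xy: Tuple[float, float]            # geolocated lower-left corner
    resolution_m: float                       # node spacing within the tile
    depth: List[List[Optional[float]]]        # n x n depths; None marks holes
    uncertainty: List[List[Optional[float]]]  # n x n matching uncertainties

# A 10x10 tile at 1 m resolution, not yet populated:
n = 10
tile = VRTile((412000.0, 5100000.0), 1.0,
              [[None] * n for _ in range(n)],
              [[None] * n for _ in range(n)])
```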

from bag.

tdy-smacgill avatar tdy-smacgill commented on August 24, 2024

After reading issue #10 I have a slightly clearer idea of what's going on here.

The main challenge with the variable-resolution refinements is they don't fit into the Dataset and Layer paradigm as cleanly. Resolution and offset can easily be presented as layers, but the refinement values themselves remain more of an open problem. I'm leaning more and more towards simplifying the API to handle read/write by sub-layer - one can identify the grid for a given pixel location, and proceed to populate/read it as needed.

Direct access to the global array, even if it's used internally, might be blocked. The main risk here - albeit one that already exists - is modification. I'd need to review the existing spec to find out what happens if there's an attempt to resize a refinement grid after populating refinements, but my hunch is that it's nothing good. I'm going to go out on a limb and assume there's no plans to allow users to resize the main BAG grid - so should we enforce a workflow of 'define resolutions, then define refinement grids'?

The API might have to include an allocate_refinement_grid(row, column) type call. I'll keep pondering this to figure out what a refinement-grid-centric API could permit or block.
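A minimal sketch of how such a call could enforce the 'define resolutions, then define refinement grids' workflow; class and method names are hypothetical:

```python
class VRRefinements:
    """Sketch of a refinement-grid-centric API in which a grid is
    allocated exactly once, so resizing after population is
    disallowed by construction. Hypothetical names throughout."""

    def __init__(self, rows: int, cols: int):
        self._shape = (rows, cols)
        self._grids = {}  # (row, col) -> n x n refinement grid

    def allocate_refinement_grid(self, row: int, col: int, n: int):
        """Allocate an n x n refinement grid for one tile, once only."""
        if (row, col) in self._grids:
            raise ValueError("refinement grid already allocated; "
                             "resizing is not permitted")
        self._grids[(row, col)] = [[None] * n for _ in range(n)]
        return self._grids[(row, col)]
```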

tdy-smacgill avatar tdy-smacgill commented on August 24, 2024

Another thought for consideration, if the goal here is to change the format anyway:

We could change how resolutions are stored. Other formats have had success using a quad-tree approach to variable-resolution tiles, allowing refinement points to represent grids of different sizes from different levels of the tree. The parents of these nodes generally have an intermediate or summary resolution, coarser than the data of the children and downsampling the refinements for a quick overview.

This would allow the following structure for variable-resolution surfaces, at the cost of a potentially larger file size:

  • The main raster acts as in a regular BAG, gridded at a known resolution. For variable-resolution surfaces, there is a tag to indicate whether this grid represents a summary or a refinement.
  • 'Summary' grids are populated with coarser-resolution approximations of the data they contain, and have subdivisions available. For a quad-tree there are four subdivisions, with null/empty entries indicated where needed.
  • 'Refinement' grids are populated with the fully refined data, but otherwise act like any other raster with multiple Layers available as needed.
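The bullets above could be modeled with a node type along these lines (field names are illustrative, not from any BAG spec):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class VRNode:
    """One node of the proposed quad-tree. 'summary' nodes hold a coarse
    approximation and point at up to four children; 'refinement' nodes
    hold the fully refined data and act as leaves."""
    kind: str           # "summary" or "refinement"
    resolution_m: float
    # NW, NE, SW, SE quadrants; None marks an empty subdivision
    children: List[Optional["VRNode"]] = field(
        default_factory=lambda: [None] * 4)

# A 32 m summary tile whose NW quadrant holds an 8 m refinement:
root = VRNode("summary", 32.0)
root.children[0] = VRNode("refinement", 8.0)
```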

[Figure: VRBagGridsFinal2]

Visualization of the variable-resolution surface thus allows for multiple levels of detail, ranging from a rough overview to the fully refined data. Validation tools could focus exclusively on the 'Refinement' grids, with 'Summary' grids being used for quick visual checks. With explicit storage of separate grids for the 'Summary' nodes, some tools could even apply specialized downsampling rules, like requiring a minimum data coverage.

The main difficulty is converting from the existing VR-BAG format to this one: the current format has an explicit overview summary for the entire dataset along with individual refinement grids, but the intermediate nodes would need to be generated at conversion time. This is, however, doable.

Caveat that these are rough notes of what might be possible for a revision of the VR-BAG format; other stakeholders may wish to contribute their thoughts.

johnsonst avatar johnsonst commented on August 24, 2024

I think we would only want to store the coarsest summary resolution (Super Grid) and the final refinement resolution. All levels in between can be generated from the final resolution, so there is no need to store them in the BAG.

mduzee avatar mduzee commented on August 24, 2024

If the goal is to keep the size of the BAG files as small as possible, then we probably don't want to store all of the summary grids. But if we are interested in performance (for display), then we may want to store the summary grids. This is a similar concept to pyramid levels that are used by some image formats.

We, Caris, can't make the call. Hopefully we can get additional input from other stakeholders.

johnsonst avatar johnsonst commented on August 24, 2024

I am fine with having the ability to store all the refinements but just don't want it to be a requirement.

The use case you mentioned about visualization is one I hadn't considered when I responded. I think a design exists that can accommodate both sets of uses (minimizing storage and easing visualization).

tdy-smacgill avatar tdy-smacgill commented on August 24, 2024

If we want to leave out intermediate levels, we probably don't need intermediate resolutions either - we can basically say that something is a refinement or it's not. The quadtree could then be skeletal - intermediate nodes having no metadata other than 'which children have data' - or we could look into other structures.

Admittedly I'd have to go digging for anything that would be fitting here. I assume that refinement grids should not overlap, so the structure is space-partitioning. I feel like allowing for some refinement grids to be both coarser (larger space between nodes) and larger (more area covered at a single resolution) would allow more flexibility than the current Supergrid. I also assume we want both easy access to 'resolution at a particular geolocation' and 'total coverage of refinement grids'. As such, some kind of k-d tree or quadtree seems best, but there might be other space partitions that could suit our needs.

tdy-smacgill avatar tdy-smacgill commented on August 24, 2024

After some discussions, I think I see two viable paths to take.

1: Incremental Improvement. The Supergrid approach - a raster of tiles that have refinements - is workable. The main change I'd like to make is to have some kind of support for variable-size tiles; this could be accomplished with some kind of redirect flag. In more detail:
Suppose we have a Supergrid with 32m tiles. Suppose that there's a region of coverage that warrants a 20m resolution. Rather than have a large number of 2x2 refinements, we define a 256m x 256m block (8x8 tiles) as having a single 20m refinement grid. One of the tiles in that area (lower-left?) contains the information pointing to the refinement grid, the resolution, the offset, and the fact that it's an 8x8 section instead of 1x1. The rest of the tiles in the area simply redirect to that one information carrier with some kind of 'duplicate' flag (as opposed to a 'null' flag).
This approach lets us keep most of the existing paradigm, but adds support for larger sections of the dataset to be represented as coarser grids. In particular, this would allow supporting node spacings larger than the SuperGrid spacing.
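The redirect idea could look roughly like this; the 'carrier'/'duplicate' flag names and tile fields are invented for illustration:

```python
def resolve_tile(tiles: dict, row: int, col: int) -> dict:
    """Follow a 'duplicate' flag back to the tile that actually carries
    the refinement information for its block (hypothetical field names)."""
    tile = tiles[(row, col)]
    if tile.get("flag") == "duplicate":
        return tiles[tile["carrier"]]  # e.g. the block's lower-left tile
    return tile

# An 8x8 block of 32 m tiles sharing one 20 m refinement grid:
tiles = {
    (0, 0): {"flag": "carrier", "resolution_m": 20.0,
             "block": (8, 8), "attr_offset": 564},
    (3, 5): {"flag": "duplicate", "carrier": (0, 0)},
}
```

Any tile inside the block resolves to the same information carrier, so readers never need to special-case the dummy entries.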

2: Quadtree Approach: In this approach, we encode a tree of resolution data. Intermediate nodes are frankly secondary; any intermediate resolution is more of a 'resolution to view a summary at' rather than 'actual spacing of data in this area', and can likely be discarded outright. Specifically, the tree consists of three kinds of nodes:

  • Refinement nodes are the leaves of the tree; each is either NULL, or contains a resolution for an associated refinement grid. These act like the SuperGrid nodes of VR-BAG 1.0.
  • Intermediate nodes are the bones of the tree; each contains only cursory metadata on which quadrants of the node contain valid data, and possibly a summary of finest/coarsest resolutions contained within.
  • Summary nodes are optional. These have the data of the Intermediate node, but can also contain a 'rough' refinement grid at an arbitrary resolution. This allows BAGs to be written with viewable portrayals following whatever policies might be preferred - but these portrayals would not be mandatory, so that more space-efficient BAGs can also be written.
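A skeletal Intermediate node as described above needs little more than a quadrant-occupancy bitmask; this sketch uses invented field names:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class IntermediateNode:
    """Bare-bones intermediate node: a 4-bit occupancy mask
    (bit 0=NW, 1=NE, 2=SW, 3=SE) plus an optional summary of the
    finest/coarsest resolutions beneath it. Illustrative only."""
    occupied: int                     # bitmask of quadrants with valid data
    finest_m: Optional[float] = None
    coarsest_m: Optional[float] = None

    def has_quadrant(self, q: int) -> bool:
        """True if quadrant q (0=NW .. 3=SE) contains valid data."""
        return bool(self.occupied & (1 << q))

# NW and SW quadrants populated, spanning 1 m to 4 m data:
node = IntermediateNode(0b0101, finest_m=1.0, coarsest_m=4.0)
```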

One possible use case for the summary nodes is to allow some kind of level policy. A user who wants to do a visual inspection of data coverage could set a policy that holes in the data take priority over valid values, create a rendering at some intermediate resolution above the actual refinements (e.g. 25m for 1m to 4m data), and inspect it to see where there might be holes in coverage.

Between these two options, the former is much faster to explain and learn, but I feel like the latter might be more robust. That's my personal opinion, though, and while I have an interest I'm not the only stakeholder here.

GlenRice-NOAA avatar GlenRice-NOAA commented on August 24, 2024

As an exchange format I think the BAG should contain as little redundant data as possible to keep the file sizes at a minimum. If consistent visualization (and thus consistent intermediate layers) is a requirement, perhaps we should define how the intermediate layers should be derived from the finest resolution or mapped to a specific data structure.

tdy-smacgill avatar tdy-smacgill commented on August 24, 2024

That's a reasonable stance to take. I was thinking of users who have particular visualization requirements at certain map scales, but an exchange format might not be the right place for that kind of approach.

In the name of reducing redundant data, a full quadtree is unnecessary. While having nested nodes can be useful for overviews, if this is meant purely as an exchange format then the expectation is that users can assemble that kind of metadata themselves. This means that directly viewing data within a BAG might be slower as aggregations must be done on the fly, but for an exchange format that's probably going to have to be the expectation.

I do, however, think there's important value to be had in supporting variable-size tiles. If this is going to be an exchange format, then it can't mandate that all resolution tiles be the same size; otherwise, systems producing different-sized tiles would have to repartition data in a way that can't be reliably reconstructed later on. I can see a few ways to accomplish this:

  1. Unordered List. Allow the format to support any size of tiles, with arbitrary sizes and locations. Simply allow appending a tile to a list with a location, size, and resolution. Advantage is that this is the most flexible with respect to supported exports; any model of resolution map could be contained within. Disadvantage is that there's no index for reading and no controls preventing tiles from overlapping - which makes this the most difficult option for viewers to support.
  2. SuperGrid With Indirection. As proposed earlier, keep the raster of tiles, with a flag to indicate that a tile is a piece of a larger block. This fits best with what is currently done with BAG, which makes it straightforward to implement for viewers. Disadvantage is that it requires some redundancy to be baked in for those larger tiles, with numerous dummy entries included.
  3. Quadtree Node List. Rather than store the entire quadtree, we could simply store the list of Refinement nodes with a tree-index. Assign bits to NW/NE/SW/SE paths, and encode the path to a node. This list could be sorted for easy search/traversal, and a simple list of leaves would be the most space-efficient option. Disadvantages are that this enforces an equal binary partitioning of space, meaning that the full extent is guaranteed to be a power-of-2 multiple of the size of any tile.
  4. Binary Tree Implementation. Rather than an exact quadtree, we could try supporting arbitrary division of space along the X/Y axes. This supports a great deal of flexibility, but might allow non-square refinement grids - which may not fit with current variable-resolution models. The nature of the k-d tree also means that a simple index key is not enough; some representation of the tree's internal structure would need to be stored.

I think any of these options could be used to represent a partitioning of space into refinement grids. My preference is option 3, as I've worked with quadtrees before - but once again, other opinions hold sway here.
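The tree-index of option 3 might be encoded like this, using an assumed two-bits-per-level convention (NW=0, NE=1, SW=2, SE=3 - an illustrative choice, not part of any BAG spec):

```python
def quadtree_key(path):
    """Encode a root-to-leaf path as a sortable integer key.

    Two bits per level; keys of leaves at the same depth sort so that
    spatially adjacent subtrees are grouped together, which supports
    easy search/traversal over a plain sorted list of leaves.
    """
    codes = {"NW": 0, "NE": 1, "SW": 2, "SE": 3}
    key = 0
    for step in path:
        key = (key << 2) | codes[step]
    return key

# Depth-2 leaf reached via NE then SW: (1 << 2) | 2 == 6
key = quadtree_key(["NE", "SW"])
```

Storing only (key, depth, resolution) per Refinement leaf keeps the list compact while fully determining each tile's location and size within the power-of-2 partition.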

tdy-smacgill avatar tdy-smacgill commented on August 24, 2024

@GlenRice-NOAA could you add the outcomes of yesterday's call to this issue? I never received an invitation, so I don't know what was decided.

GlenRice-NOAA avatar GlenRice-NOAA commented on August 24, 2024

Consolidated notes on the various decisions will be forthcoming.

An established time for the meeting was sent to [email protected] on Oct 1. The agenda was sent to the same address on Oct 11. I understand emails to this address are forwarded to navsurf_dev by default. There seem to be a few people who did not get these emails, but I am guessing that a spam filter may be to blame, since several of those people seem to be at the same organization. We should probably post this kind of information on GitHub in the future to work around this kind of issue.

As for the variable resolution format specifically, we established a subgroup to explore storing the variable resolution refinements as arrays representing each refinement tile. From what I recall, the summary of the discussion was that this would minimize the code required to reconstitute the refinements and make it easier for other software packages to extract them. Ideally this will also simplify the product specification. The other proposed solutions were discussed but were deemed too tied to visualization requirements rather than simply supporting data exchange, and would be more complicated to define with broad agreement and utility.

Those who agreed (or were nominated by their employer) to be part of the refinement format subgroup included @GlenRice-NOAA, @giumas, @tdy-smacgill, Jeff Adams from Leidos, and Mark Paton from QPS. This group agreed to respond to proposals within 48 hrs so that we would not inhibit progress on the new 2.0 API under the NOAA contract. We also agreed to make decisions unanimously, although no response within 48 hrs was considered an abstention.

tdy-smacgill avatar tdy-smacgill commented on August 24, 2024

Could you confirm that people on [email protected] would have also received that invitation? My understanding was that dev was supposed to receive everything from general, but no one here received that meeting time or agenda.

I would still like to hear details on the decisions.

GlenRice-NOAA avatar GlenRice-NOAA commented on August 24, 2024

I have confirmed that navsurf_general includes all members of navsurf_dev. If neither email was received I would suspect an email filter of some sort.

I'm sure Brian will get the meeting notes out asap.

GlenRice-NOAA avatar GlenRice-NOAA commented on August 24, 2024

Updating the VR structure was explored by the subgroup. We found that changing the structure of the BAG created too much overhead in file size and compressed poorly. We recommended leaving the structure as is for now. The VR API has been reestablished in the V2 branch.
