Giter Site home page Giter Site logo

rtree's Introduction

rtree


Coverity Scan
Maven Central
codecov

In-memory immutable 2D R-tree implementation in java using RxJava Observables for reactive processing of search results.

Status: released to Maven Central

Note that the next version (without a reactive API and without serialization) is at rtree2.

An R-tree is a commonly used spatial index.

This was fun to make, has an elegant concise algorithm, is thread-safe, fast, and reasonably memory efficient (uses structural sharing).

The algorithm to achieve immutability is cute. For insertion/deletion it involves recursion down to the required leaf node then recursion back up to replace the parent nodes up to the root. The guts of it is in Leaf.java and NonLeaf.java.

Backpressure support required some complexity because effectively a bookmark needed to be kept for a position in the tree and returned to later to continue traversal. An immutable stack containing the node and child index of the path nodes came to the rescue here and recursion was abandoned in favour of looping to prevent stack overflow (unfortunately java doesn't support tail recursion!).

Maven site reports are here including javadoc.

Features

  • immutable R-tree suitable for concurrency
  • Guttman's heuristics (Quadratic splitter) (paper)
  • R*-tree heuristics (paper)
  • Customizable splitter and selector
  • 10x faster index creation with STR bulk loading (paper).
  • search returns Observable
  • search is cancelled by unsubscription
  • search is O(log(n)) on average
  • insert, delete are O(n) worst case
  • all search methods return lazy-evaluated streams offering efficiency and flexibility of functional style including functional composition and concurrency
  • balanced delete
  • uses structural sharing
  • supports backpressure
  • JMH benchmarks
  • visualizer included
  • serialization using FlatBuffers
  • high unit test code coverage
  • R*-tree performs 900,000 searches/second returning 22 entries from a tree of 38,377 Greek earthquake locations on [email protected] (maxChildren=4, minChildren=1). Insert at 240,000 entries per second.
  • requires java 1.6 or later

Number of points = 1000, max children per node 8:

Quadratic split R*-tree split STR bulk loaded

Notice that there is little overlap in the R*-tree split compared to the Quadratic split. This should provide better search performance (and in general benchmarks show this).

STR bulk loaded R-tree has a bit more overlap than R*-tree, which affects the search performance at some extent.

Getting started

Add this maven dependency to your pom.xml:

<dependency>
  <groupId>com.github.davidmoten</groupId>
  <artifactId>rtree</artifactId>
  <version>VERSION_HERE</version>
</dependency>

Instantiate an R-Tree

Use the static builder methods on the RTree class:

// create an R-tree using Quadratic split with max
// children per node 4, min children 2 (the threshold
// at which members are redistributed)
RTree<String, Geometry> tree = RTree.create();

You can specify a few parameters to the builder, including minChildren, maxChildren, splitter, selector:

RTree<String, Geometry> tree = RTree.minChildren(3).maxChildren(6).create();

Geometries

The following geometries are supported for insertion in an RTree:

  • Rectangle
  • Point
  • Circle
  • Line

Generic typing

If for instance you know that the entry geometry is always Point then create an RTree specifying that generic type to gain more type safety:

RTree<String, Point> tree = RTree.create();

R*-tree

If you'd like an R*-tree (which uses a topological splitter on minimal margin, overlap area and area and a selector combination of minimal area increase, minimal overlap, and area):

RTree<String, Geometry> tree = RTree.star().maxChildren(6).create();

See benchmarks below for some of the performance differences.

Add items to the R-tree

When you add an item to the R-tree you need to provide a geometry that represents the 2D physical location or extension of the item. The Geometries builder provides these factory methods:

  • Geometries.rectangle
  • Geometries.circle
  • Geometries.point
  • Geometries.line (requires jts-core dependency)

To add an item to an R-tree:

RTree<T,Geometry> tree = RTree.create();
tree = tree.add(item, Geometries.point(10,20));

or

tree = tree.add(Entries.entry(item, Geometries.point(10,20));

Important note: being an immutable data structure, calling tree.add(item, geometry) does nothing to tree, it returns a new RTree containing the addition. Make sure you use the result of the add!

Remove an item in the R-tree

To remove an item from an R-tree, you need to match the item and its geometry:

tree = tree.delete(item, Geometries.point(10,20));

or

tree = tree.delete(entry);

Important note: being an immutable data structure, calling tree.delete(item, geometry) does nothing to tree, it returns a new RTree without the deleted item. Make sure you use the result of the delete!

Geospatial geometries (lats and longs)

To handle wraparounds of longitude values on the earth (180/-180 boundary trickiness) there are special factory methods in the Geometries class. If you want to do geospatial searches then you should use these methods to build Points and Rectangles:

Point point = Geometries.pointGeographic(lon, lat);
Rectangle rectangle = Geometries.rectangleGeographic(lon1, lat1, lon2, lat2);

Under the covers these methods normalize the longitude value to be in the interval [-180, 180) and for rectangles the rightmost longitude has 360 added to it if it is less than the leftmost longitude.

Custom geometries

You can also write your own implementation of Geometry. An implementation of Geometry needs to specify methods to:

  • check intersection with a rectangle (you can reuse the distance method here if you want but it might affect performance)
  • provide a minimum bounding rectangle
  • implement equals and hashCode for consistent equality checking
  • measure distance to a rectangle (0 means they intersect). Note that this method is only used for search within a distance so implementing this method is optional. If you don't want to implement this method just throw a RuntimeException.

For the R-tree to be well-behaved, the distance function if implemented needs to satisfy these properties:

  • distance(r) >= 0 for all rectangles r
  • if rectangle r1 contains r2 then distance(r1)<=distance(r2)
  • distance(r) = 0 if and only if the geometry intersects the rectangle r

Searching

The advantage of an R-tree is the ability to search for items in a region reasonably quickly. On average search is O(log(n)) but worst case is O(n).

Search methods return Observable sequences:

Observable<Entry<T, Geometry>> results =
    tree.search(Geometries.rectangle(0,0,2,2));

or search for items within a distance from the given geometry:

Observable<Entry<T, Geometry>> results =
    tree.search(Geometries.rectangle(0,0,2,2),5.0);

To return all entries from an R-tree:

Observable<Entry<T, Geometry>> results = tree.entries();

Search with a custom geometry

Suppose you make a custom geometry like Polygon and you want to search an RTree<String,Point> for points inside the polygon. This is how you do it:

RTree<String, Point> tree = RTree.create();
Func2<Point, Polygon, Boolean> pointInPolygon = ...
Polygon polygon = ...
...
entries = tree.search(polygon, pointInPolygon);

The key is that you need to supply the intersects function (pointInPolygon) to the search. It is on you to implement that for all types of geometry present in the RTree. This is one reason that the generic Geometry type was added in rtree 0.5 (so the type system could tell you what geometry types you needed to calculate intersection for) .

Search with a custom geometry and maxDistance

As per the example above to do a proximity search you need to specify how to calculate distance between the geometry you are searching and the entry geometries:

RTree<String, Point> tree = RTree.create();
Func2<Point, Polygon, Boolean> distancePointToPolygon = ...
Polygon polygon = ...
...
entries = tree.search(polygon, 10, distancePointToPolygon);

Example

import com.github.davidmoten.rtree.RTree;
import static com.github.davidmoten.rtree.geometry.Geometries.*;

RTree<String, Point> tree = RTree.maxChildren(5).create();
tree = tree.add("DAVE", point(10, 20))
           .add("FRED", point(12, 25))
           .add("MARY", point(97, 125));
 
Observable<Entry<String, Point>> entries =
    tree.search(Geometries.rectangle(8, 15, 30, 35));

Searching by distance on lat longs

See LatLongExampleTest.java for an example. The example depends on grumpy-core artifact which is also on Maven Central.

Another lat long example searching geo circles

See LatLongExampleTest.testSearchLatLongCircles() for an example of searching circles around geographic points (using great circle distance).

What do I do with the Observable thing?

Very useful, see RxJava.

As an example, suppose you want to filter the search results then apply a function on each and reduce to some best answer:

import rx.Observable;
import rx.functions.*;
import rx.schedulers.Schedulers;

Character result = 
    tree.search(Geometries.rectangle(8, 15, 30, 35))
        // filter for names alphabetically less than M
        .filter(entry -> entry.value() < "M")
        // get the first character of the name
        .map(entry -> entry.value().charAt(0))
        // reduce to the first character alphabetically 
        .reduce((x,y) -> x <= y ? x : y)
        // subscribe to the stream and block for the result
        .toBlocking().single();
System.out.println(list);

output:

D

How to configure the R-tree for best performance

Check out the benchmarks below and refer to another benchmark results, but I recommend you do your own benchmarks because every data set will behave differently. If you don't want to benchmark then use the defaults. General rules based on the benchmarks:

  • for data sets of <10,000 entries use the default R-tree (quadratic splitter with maxChildren=4)
  • for data sets of >=10,000 entries use the star R-tree (R*-tree heuristics with maxChildren=4 by default)
  • use STR bulk loaded R-tree (quadratic splitter or R*-tree heuristics) for large (where index creation time is important) or static (where insertion and deletion are not frequent) data sets

Watch out though, the benchmark data sets had quite specific characteristics. The 1000 entry dataset was randomly generated (so is more or less uniformly distributed) and the Greek dataset was earthquake data with its own clustering characteristics.

What about memory use?

To minimize memory use you can use geometries that store single precision decimal values (float) instead of double precision (double). Here are examples:

// create geometry using double precision 
Rectangle r = Geometries.rectangle(1.0, 2.0, 3.0, 4.0);

// create geometry using single precision
Rectangle r = Geometries.rectangle(1.0f, 2.0f, 3.0f, 4.0f);

The same creation methods exist for Circle and Line.

How do I just get an Iterable back from a search?

If you are not familiar with the Observable API and want to skip the reactive stuff then here's how to get an Iterable from a search:

Iterable<T> it = tree.search(Geometries.point(4,5))
                     .toBlocking().toIterable();

Backpressure

The backpressure slow path may be enabled by some RxJava operators. This may slow search performance by a factor of 3 but avoids possible out of memory errors and thread starvation due to asynchronous buffering. Backpressure is benchmarked below.

Visualizer

To visualize the R-tree in a PNG file of size 600 by 600 pixels just call:

tree.visualize(600,600)
    .save("target/mytree.png");

The result is like the images in the Features section above.

Visualize as text

The RTree.asString() method returns output like this:

mbr=Rectangle [x1=10.0, y1=4.0, x2=62.0, y2=85.0]
  mbr=Rectangle [x1=28.0, y1=4.0, x2=34.0, y2=85.0]
    entry=Entry [value=2, geometry=Point [x=29.0, y=4.0]]
    entry=Entry [value=1, geometry=Point [x=28.0, y=19.0]]
    entry=Entry [value=4, geometry=Point [x=34.0, y=85.0]]
  mbr=Rectangle [x1=10.0, y1=45.0, x2=62.0, y2=63.0]
    entry=Entry [value=5, geometry=Point [x=62.0, y=45.0]]
    entry=Entry [value=3, geometry=Point [x=10.0, y=63.0]]

Serialization

Release 0.8 includes flatbuffers support as a serialization format and as a lower performance but lower memory consumption (approximately one third) option for an RTree.

The greek earthquake data (38,377 entries) when placed in a default RTree with maxChildren=10 takes up 4,548,133 bytes in memory. If that data is serialized then reloaded into memory using the InternalStructure.FLATBUFFERS_SINGLE_ARRAY option then the RTree takes up 1,431,772 bytes in memory (approximately one third the memory usage). Bear in mind though that searches are much more expensive (at the moment) with this data structure because of object creation and gc pressures (see benchmarks). Further work would be to enable direct searching of the underlying array without object creation expenses required to match the current search routines.

As of 5 March 2016, indicative RTree metrics using flatbuffers data structure are:

  • one third the memory use with log(N) object creations per search
  • one third the speed with backpressure (e.g. if flatMap or observeOn is downstream)
  • one tenth the speed without backpressure

Note that serialization uses an optional dependency on flatbuffers. Add the following to your pom dependencies:

<dependency>
    <groupId>com.google.flatbuffers</groupId>
    <artifactId>flatbuffers-java</artifactId>
    <version>2.0.3</version>
    <optional>true</optional>
</dependency>

Serialization example

Write an RTree to an OutputStream:

RTree<String, Point> tree = ...;
OutputStream os = ...;
Serializer<String, Point> serializer = 
  Serializers.flatBuffers().utf8();
serializer.write(tree, os); 

Read an RTree from an InputStream into a low-memory flatbuffers based structure:

RTree<String, Point> tree = 
  serializer.read(is, lengthBytes, InternalStructure.SINGLE_ARRAY);

Read an RTree from an InputStream into a default structure:

RTree<String, Point> tree = 
  serializer.read(is, lengthBytes, InternalStructure.DEFAULT);

Dependencies

As of 0.7.5 this library does not depend on guava (>2M) but rather depends on guava-mini (11K). The nearest search used to depend on MinMaxPriorityQueue from guava but now uses a backport of Java 8 PriorityQueue inside a custom BoundedPriorityQueue class that gives about 1.7x the throughput as the guava class.

How to build

git clone https://github.com/davidmoten/rtree.git
cd rtree
mvn clean install

How to run benchmarks

Benchmarks are provided by

mvn clean install -Pbenchmark

Coverity scan

This codebase is scanned by Coverity scan whenever the branch coverity_scan is updated.

For the project committers if a coverity scan is desired just do this:

git checkout coverity_scan
git pull origin master
git push origin coverity_scan

Notes

The Greek data referred to in the benchmarks is a collection of some 38,377 entries corresponding to the epicentres of earthquakes in Greece between 1964 and 2000. This data set is used by multiple studies on R-trees as a test case.

Results

These were run on i7-920 @2.67GHz with rtree version 0.8-RC7:

Benchmark                                                               Mode  Cnt        Score       Error  Units

defaultRTreeInsertOneEntryInto1000EntriesMaxChildren004                thrpt   10   262260.993 ±  2767.035  ops/s
defaultRTreeInsertOneEntryInto1000EntriesMaxChildren010                thrpt   10   296264.913 ±  2836.358  ops/s
defaultRTreeInsertOneEntryInto1000EntriesMaxChildren032                thrpt   10   135118.271 ±  1722.039  ops/s
defaultRTreeInsertOneEntryInto1000EntriesMaxChildren128                thrpt   10   315851.452 ±  3097.496  ops/s
defaultRTreeInsertOneEntryIntoGreekDataEntriesMaxChildren004           thrpt   10   278761.674 ±  4182.761  ops/s
defaultRTreeInsertOneEntryIntoGreekDataEntriesMaxChildren010           thrpt   10   315254.478 ±  4104.206  ops/s
defaultRTreeInsertOneEntryIntoGreekDataEntriesMaxChildren032           thrpt   10   214509.476 ±  1555.816  ops/s
defaultRTreeInsertOneEntryIntoGreekDataEntriesMaxChildren128           thrpt   10   118094.486 ±  1118.983  ops/s
defaultRTreeSearchOf1000PointsMaxChildren004                           thrpt   10  1122140.598 ±  8509.106  ops/s
defaultRTreeSearchOf1000PointsMaxChildren010                           thrpt   10   569779.807 ±  4206.544  ops/s
defaultRTreeSearchOf1000PointsMaxChildren032                           thrpt   10   238251.898 ±  3916.281  ops/s
defaultRTreeSearchOf1000PointsMaxChildren128                           thrpt   10   702437.901 ±  5108.786  ops/s
defaultRTreeSearchOfGreekDataPointsMaxChildren004                      thrpt   10   462243.509 ±  7076.045  ops/s
defaultRTreeSearchOfGreekDataPointsMaxChildren010                      thrpt   10   326395.724 ±  1699.043  ops/s
defaultRTreeSearchOfGreekDataPointsMaxChildren032                      thrpt   10   156978.822 ±  1993.372  ops/s
defaultRTreeSearchOfGreekDataPointsMaxChildren128                      thrpt   10    68267.160 ±   929.236  ops/s
rStarTreeDeleteOneEveryOccurrenceFromGreekDataChildren010              thrpt   10   211881.061 ±  3246.693  ops/s
rStarTreeInsertOneEntryInto1000EntriesMaxChildren004                   thrpt   10   187062.089 ±  3005.413  ops/s
rStarTreeInsertOneEntryInto1000EntriesMaxChildren010                   thrpt   10   186767.045 ±  2291.196  ops/s
rStarTreeInsertOneEntryInto1000EntriesMaxChildren032                   thrpt   10    37940.625 ±   743.789  ops/s
rStarTreeInsertOneEntryInto1000EntriesMaxChildren128                   thrpt   10   151897.089 ±   674.941  ops/s
rStarTreeInsertOneEntryIntoGreekDataEntriesMaxChildren004              thrpt   10   237708.825 ±  1644.611  ops/s
rStarTreeInsertOneEntryIntoGreekDataEntriesMaxChildren010              thrpt   10   229577.905 ±  4234.760  ops/s
rStarTreeInsertOneEntryIntoGreekDataEntriesMaxChildren032              thrpt   10    78290.971 ±   393.030  ops/s
rStarTreeInsertOneEntryIntoGreekDataEntriesMaxChildren128              thrpt   10     6521.010 ±    50.798  ops/s
rStarTreeSearchOf1000PointsMaxChildren004                              thrpt   10  1330510.951 ± 18289.410  ops/s
rStarTreeSearchOf1000PointsMaxChildren010                              thrpt   10  1204347.202 ± 17403.105  ops/s
rStarTreeSearchOf1000PointsMaxChildren032                              thrpt   10   576765.468 ±  8909.880  ops/s
rStarTreeSearchOf1000PointsMaxChildren128                              thrpt   10  1028316.856 ± 13747.282  ops/s
rStarTreeSearchOfGreekDataPointsMaxChildren004                         thrpt   10   904494.751 ± 15640.005  ops/s
rStarTreeSearchOfGreekDataPointsMaxChildren010                         thrpt   10   649636.969 ± 16383.786  ops/s
rStarTreeSearchOfGreekDataPointsMaxChildren010FlatBuffers              thrpt   10    84230.053 ±  1869.345  ops/s
rStarTreeSearchOfGreekDataPointsMaxChildren010FlatBuffersBackpressure  thrpt   10    36420.500 ±  1572.298  ops/s
rStarTreeSearchOfGreekDataPointsMaxChildren010WithBackpressure         thrpt   10   116970.445 ±  1955.659  ops/s
rStarTreeSearchOfGreekDataPointsMaxChildren032                         thrpt   10   224874.016 ± 14462.325  ops/s
rStarTreeSearchOfGreekDataPointsMaxChildren128                         thrpt   10   358636.637 ±  4886.459  ops/s
searchNearestGreek                                                     thrpt   10     3715.020 ±    46.570  ops/s

There is a related project rtree-benchmark that presents a more comprehensive benchmark with results and analysis on this rtree implementation.

rtree's People

Contributors

a10y avatar ambling avatar balazsjonas avatar bryant1410 avatar davidmoten avatar dependabot[bot] avatar empirephoenix avatar killme avatar maxamel avatar vlaskinvlad avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

rtree's Issues

[Question] How can I improve performance ?

Hi David,

So I've been playing with the benchmarks changing some stuff to comply with my use case and see how this goes:

  • As you did with the greek points, i created entries composed of circles (geocircles, most of which have a radius of 0.1 km) . My data set contains 85k geopoints + radius.
  • In the search benchmarks, instead of searching a rectangle, i'm searching a point ("is that point in any of the circles ?").

Well, the bench ran for about an hour and the results were really bad: I barely reach 100 ops / sec whatever the maxChildren I used.

When I take a look at the visualization, it seems a bit weird:
starm4:
starm4

defaultm4:
defaultm4

Maybe It will mean something to you but it looks nothing like what you're showing on your main page.

So I tried replacing my circles by rectangles (i did that really quick just using the circle's center from my data and creating a square by adding +0.5 to the lat and lon). Now i'm starting to get decent results: 5k / ops. I also tried increasing the rectangle size (+1.0 instead of +0.5) and got down to 4k ops / sec.

The visualisation seems a bit more coherent there:
starM4:
rect-starm4

defaultM4:
rect-defaultm4

So here are the things I'd like to ask:

  • Are circles that bad ?
  • When using Geometries.circle(x, y, radius) can you confirm that x is the lon and y is the lat (provided they've been normalized beforehand) ? Also what is the unit of the radius ? I assumed it was km but now that i'm looking at it, it seems it's not and it would be more of a x-y variation value.

It feels like I'm doing something wrong.

Just to be clear. my ultimate goal is to find the best compromise between precision and speed to search all 85k (but i'm targeting 1m) geocircles that matches a specific geopoint.

Thanks for your insight on the matter.

Possible issue with SplitterRStar

I've been checking implementation and found following issue with method SplitterRStar
.getPairs - it is expected to generate possible partitionings of list of nodes where each partition has at least minSize items. However in the implementation 'last' partitioning is not checked - for example when there are 9 items in the list, and minSize is 3, possible partitionings should be: 3-6,4-5,5-4,6-3; but current implementation does not generate last one.

I'm not sure if it is a bug or expected behavior as I havent read original paper and I 'like' more picture generated by the original algorithm (without 'fixing this bug'). However if it is a bug then for-loop in SplitterRStar.getPairs should be changed to

for (int i = minSize; i < list.size() - minSize + 1; i++) 

Deserialization bug

Using 0.8-RC7, I've seen the following problem when trying to deserialize a tree:

Caused by: java.lang.RuntimeException: unexpected
    at com.github.davidmoten.rtree.fbs.SerializerFlatBuffers.readFully(SerializerFlatBuffers.java:165)
    at com.github.davidmoten.rtree.fbs.SerializerFlatBuffers.read(SerializerFlatBuffers.java:123)

The error occurs here:

    @VisibleForTesting
    static byte[] readFully(InputStream is, int numBytes) throws IOException {
        byte[] b = new byte[numBytes];
        int n = is.read(b);
        if (n != numBytes)
            throw new RuntimeException("unexpected");
        return b;
    }

I believe the contract of InputStream.read(byte[]) does not always do full reads, there should be some form of loop here to retry. I've seen such errors before.

Do let me know if I'm missing something.

Bulk loading (with STR)

I have a rather large set of entries (>= 300k) and I would like to somewhat improve the performance of the tree. The data is available beforehand so I think that bulk loading might help, in particular the "Sort-Tile-Recursive" strategy. I think this could be implemented fairly easily.

RTree is not serializable

Hi - it would be awesome if this RTree implementation supports serialization. Now it can take a long time to index very large data sets.

Rtree and R*tree return inconsistent results

Hi there. I've been looking for a Thread safe RTree implementation and yours is the only one I could find. However when I tried to use it in my software it failed to work. Upon further investigation I've found the R tree and R*tree appear to return inconsistent results.
Here is my test code:

import java.util.HashSet;

import java.util.Set;

import java.util.UUID;

import rx.Observable;

import com.github.davidmoten.rtree.Entry;

import com.github.davidmoten.rtree.RTree;

import com.github.davidmoten.rtree.geometry.Point;

import com.github.davidmoten.rtree.geometry.Rectangle;

public class RTreeTest {

   public static void main(String[] args) {

          RTree<UUID> tree1 = RTree.create();

          RTree<UUID> tree2 = RTree.star().create();



          Rectangle[] testRects = {

                 Rectangle.create(0, 0, 0, 0),

                 Rectangle.create(0, 0, 100, 100),

                 Rectangle.create(0, 0, 10, 10),

                 Rectangle.create(0.12, 0.25, 50.356, 50.756),

                 Rectangle.create(1, 0.252, 50, 69.23),

                 Rectangle.create(13.12, 23.123, 50.45, 80.9),

                 Rectangle.create(10, 10, 50, 50)

          };



          for (int i = 0 ; i < 50000 ; i++) {

                 Point point = nextPoint();

                 UUID randomUUID = UUID.randomUUID();

                 tree1 = tree1.add(randomUUID, point);

                 tree2 = tree2.add(randomUUID, point);

          }



          for (Rectangle r : testRects) {

                 Set<UUID> res1 = new HashSet<>();

                 Set<UUID> res2 = new HashSet<>();

                 Observable<Entry<UUID>> search1 = tree1.search(r);

                 Observable<Entry<UUID>> search2 = tree2.search(r);



                 search1.toBlocking().toIterable().forEach(u -> res1.add(u.value()));

                 search2.toBlocking().toIterable().forEach(u -> res2.add(u.value()));



                 System.out.println(r);

                 System.out.println(res1.size());

                 System.out.println(res2.size());

                 System.out.println("Trees found equal sets of stuff? " + res1.equals(res2) + "\n");

          }

   }



   private static Point nextPoint() {

          double randomX = Math.random() * 100;

          double randomY = Math.random() * 100;

          return Point.create(randomX, randomY);

   }

}

The trees should return the same set of Uuids for each test search area, but they don't on a typical run.
What am I doing wrong?

Thanks for the awesome library!

I write this issue not for reporting bug.
I'm using davidmoten/rtree for big,large map world for mmo SLG, AOI (area of interect) is very important feature for this kind of game.
I find a lot of r-tree libs for java, davidmoten is the best, not one of.
Most of all, davidmoten/rtree is very easy to make a dynamical rtree, every entity in the game world, could change their position if they want.
Thanks davidmoten.

Unable to serialize large R*Tree using FlatBuffer

I was trying to Serialize a large R*Tree using the APIs in this Library, but it failed with error :

Exception in thread "main" java.lang.AssertionError: FlatBuffers: cannot grow buffer beyond 2 gigabytes.
at com.google.flatbuffers.FlatBufferBuilder.growByteBuffer(FlatBufferBuilder.java:127)
at com.google.flatbuffers.FlatBufferBuilder.prep(FlatBufferBuilder.java:173)
at com.google.flatbuffers.FlatBufferBuilder.addOffset(FlatBufferBuilder.java:291)
at com.google.flatbuffers.FlatBufferBuilder.addOffset(FlatBufferBuilder.java:562)
at com.github.davidmoten.rtree.fbs.generated.Entry_.addObject(Entry_.java:33)
at com.github.davidmoten.rtree.fbs.generated.Entry_.createEntry_(Entry_.java:26)
at com.github.davidmoten.rtree.fbs.FlatBuffersHelper.addEntries(FlatBuffersHelper.java:77)
at com.github.davidmoten.rtree.fbs.SerializerFlatBuffers.addNode(SerializerFlatBuffers.java:96)
at com.github.davidmoten.rtree.fbs.SerializerFlatBuffers.addNode(SerializerFlatBuffers.java:102)
at com.github.davidmoten.rtree.fbs.SerializerFlatBuffers.addNode(SerializerFlatBuffers.java:102)
at com.github.davidmoten.rtree.fbs.SerializerFlatBuffers.addNode(SerializerFlatBuffers.java:102)
at com.github.davidmoten.rtree.fbs.SerializerFlatBuffers.addNode(SerializerFlatBuffers.java:102)
at com.github.davidmoten.rtree.fbs.SerializerFlatBuffers.addNode(SerializerFlatBuffers.java:102)
at com.github.davidmoten.rtree.fbs.SerializerFlatBuffers.addNode(SerializerFlatBuffers.java:102)
at com.github.davidmoten.rtree.fbs.SerializerFlatBuffers.addNode(SerializerFlatBuffers.java:102)
at com.github.davidmoten.rtree.fbs.SerializerFlatBuffers.addNode(SerializerFlatBuffers.java:102)
at com.github.davidmoten.rtree.fbs.SerializerFlatBuffers.write(SerializerFlatBuffers.java:73)

Seems like FlatBuffers can't serialize large structures with size more than 2GB? Am I mistaken here?
Does this implementation support any other serialization apart from FlatBuffer?
I saw few test cases for Kryo Serialization but don't think if the current structure supports this.
Please guide.

Longitude -180 normalized to 180

I encountered a problem using the R-Tree and the root cause appeared to be that Geometries.normalizeLongitude was adjusting -180 to 180.

That seems to contradict the documentation, which indicates that longitudes are normalized to [-180, 180).

Thanks!

Rectangle efficiency

I'm creating a large RTree with about 10M points. I wanted to use a flyweight pattern for the geometry, but alas I can't because 1) Util.mbr() creates a new Rectangle instance instead of using my geometry directly, and 2) Rectangle is a class not an interface with an implementation.

Why does this matter? The Rectangle instances take 600MB of my heap whereas the underlying (point) data takes less than 200MB. If Rectangle was an interface with a default implementation, and if Util could use it directly when the items collection only has 1 item in it, that could have a big performance gain in terms of memory. Done right, it wouldn't hurt speed in any significant way.

Benchmark fails with Cannot parse argument '-i' of option ['f']

Just trying to launch benchmarks with mvn clean install -Pbenchmark and I get the following error:

[INFO] --- exec-maven-plugin:1.3.2:exec (run-benchmarks) @ rtree ---
Error parsing command line:
 Cannot parse argument '-i' of option ['f']

I'm using maven 3.3.9.
Thanks :)

Geometry type classes are final

It's not a real issue, but the problem is that class names like Rectangle and Point collide with other class names, e.g. java.awt.Rectangle, so I'm stuck using full class names. Would that be possible to remove the final modifier from those?

groupBy hangs (0.7.5)

Hello, after last update groupBy is not working anymore,
here is the test for it

    @Test(timeout = 1000)
    public void testGroupBy() {
        RTree<Integer, Geometry> tree = RTree.star().create();

        tree = tree.add(1, Geometries.point(13.0, 52.0));
        tree = tree.add(2, Geometries.point(13.0, 52.0));
        tree = tree.add(3, Geometries.point(13.0, 52.0));
        tree = tree.add(4, Geometries.point(13.0, 52.0));
        tree = tree.add(5, Geometries.point(13.0, 52.0));
        tree = tree.add(6, Geometries.point(13.0, 52.0));

        Rectangle rectangle = Geometries.rectangle(12.9, 51.9, 13.1, 52.1);
        assertEquals(Integer.valueOf(6), tree.search(rectangle).count().toBlocking().single());
        assertEquals(Integer.valueOf(2), tree.search(rectangle)
                .groupBy(new Func1<Entry<Integer, Geometry>, Boolean>() {
                    @Override
                    public Boolean call(Entry<Integer, Geometry> entry) {
                        return entry.value() % 2 == 0;
                    }
                })
                .flatMap(new Func1<GroupedObservable<Boolean, Entry<Integer, Geometry>>, Observable<Integer>>() {
                    @Override
                    public Observable<Integer> call(GroupedObservable<Boolean, Entry<Integer, Geometry>> group) {
                        return group.count();
                    }
                })
                .count()
                .toBlocking()
                .single());
    }

type error of geometry.line in fbs schema file

I think this is an error rather than an intensional type change:

table Geometry_ {
...
line: Box_; // should be Line_
}

And also the type of the argument in FlatBuffersHelper.createLine

slowly in add

I try to add 1024 * 1024 node, in rtree
scala code:

var tree1: RTree[String, Geometry] = RTree.star().maxChildren(1024 * 1024).create()
val rects = ArrayBuffer[Entry[String, Geometry]]()
(0 until 1024).foreach { i =>
  (0 until 512).foreach { ii =>
    rects += new Entry((i*ii).toString, Rectangle.create(i * 4f, ii * 4f, (i + 1) * 4f, (ii + 1) * 4f))
  }
}
tree1 = tree1.add(rects.toIterable.asJava)

it take a lot of time, any way to speed it up???

What if i want to do the K-NN request without getting result in a time?

Hello, davidmoten
Thanks for reading my questions.
I want to do the K-NN query to a MBR but i don't know how much will k be. I want to get the result set incrementally. For example.It's a while loop. In each round, i want to get 5 more K-NN MBRs. What can i do to reach this goal in your code?

Looking forward to your reply.
Thanks very much!

BTW, your code really helped me a lot! Thank you very much!

asString() print nothing

I use 0.8-RC9 release. Here is a simple test code:

`public class ProcessFilesClient {
public static void main( String[] args ) {
RTree<String, Geometry> tree = RTree.minChildren(2).maxChildren(4).star().create();
System.out.println(tree.toString());

	tree.add("a", Geometries.point(2, 3));
	tree.add("b", Geometries.point(3, 8));
	tree.add("c", Geometries.point(3, 5));
	tree.add("f", Geometries.point(9, 17));
	tree.add("g", Geometries.point(87, 56));
	tree.add("j", Geometries.point(86, 54));
	tree.add("u", Geometries.point(89, 59));
	tree.add("p", Geometries.point(81, 50));
	System.out.println("str: " + tree.asString());

}

}`

but I print out:

'com.github.davidmoten.rtree.RTree@77459877
str: '

What is the problem? is there any dependencies I did not have?

Tree hangs when finding the nearest

I have the following example:

import com.github.davidmoten.rtree.RTree;
import com.github.davidmoten.rtree.geometry.Geometries;
import com.github.davidmoten.rtree.geometry.Line;

public class TreeExample {

    public static void main(String[] args) {

        RTree<String, Line> tree = RTree.star().create();

        for(int i = 0; i < 5;++i) {
            tree = tree.add(String.format("Hello %d", i), Geometries.line(-i, -i, 5 + i, i));
        }

        System.out.println(tree.nearest(Geometries.point(2, 0.4), Double.MAX_VALUE, 1)
                .toBlocking().single().value());

    }
}

Unfortunately the program hangs and I never get any output. Any ideas what is causing this?

I am using the following version:

<dependency>
  <groupId>com.github.davidmoten</groupId>
  <artifactId>rtree</artifactId>
  <version>0.7.2</version>
</dependency>
<dependency>
  <groupId>com.vividsolutions</groupId>
  <artifactId>jts-core</artifactId>
  <version>1.14.0</version>
</dependency>

RTree creation API might seem a bit contraintuitive

Hello,
First of all thanks for this library, it is very useful and overall well done. I just got stuck with it for some minutes, not realizing what was wrong in my code. I finally realized that the RTree object is returning a new copy of itself when adding a point. This makes sense with the concept of immutable RTree, and I recognize I should have paid more attention when reading the examples. However I think this API is contraintuitive - If the RTree is immutable then why does it have an add() method? - and might lead to the same mistake by other people. What I would suggest as an improved API is to make an RTreeBuilder, which would be in charge of receiving all the points and then create an immutable RTree, which wouldn't have the method add(). If online modifications to the RTree are needed maybe they can be provided via static methods of it, e.g. RTreeBuilder.delete(tree, point): RTree

Visualizer does not render points

I am trying to write a program to visualize the data points read from a file. However, rtree.visualize().save() does not render the data points as shown in the README example.

why need it so long time to create R tree?

code are in rtree/src/test/java/com/github/davidmoten/rtree/GreekEarthquakes.java

public static void main(String[] args) throws InterruptedException {
    RTree<Object, Point> tree = RTree.star().create();
    tree = tree.add(entries()).last().toBlocking().single();
    System.gc();
    Thread.sleep(10000000);  //???   I don't bear it
    System.out.println(tree.size());
}

10000000 > 2 hours.
Only 38377 rows == requires so long time??

      I don't understand.

Rtree Entries

is there any solution how to add entries with this manner
<id,String, Rtree> like (1,david,(-1,3))

`rtree.nearest` documentation wrong?

Looking at the implementation of rtree.nearest, it states that it returns objects in no particular order. However, it looks like the comparator that it uses is called ascendingDistance... I haven't actually tested this of course, but was wondering.

Does this code support k-nn query?

Thanks for your sharing.
I have many points which records latitude and longitude. I want to know whether this code support K-nn query for each point?
I don't read relevant message except

As of 0.7.5 this library does not depend on guava (>2M) but rather depends on guava-mini (11K). The nearest search used to depend on MinMaxPriorityQueue from guava but now uses a backport of Java 8 PriorityQueue inside a custom BoundedPriorityQueue class that gives about 1.7x the throughput as the guava class.

But there is no specific example for k-nn query.
Thanks so much.

Maven vs. Gradle

Hi David,
Nowadays I see more and more projects migrating from maven/ant to gradle.
Are there any plans of such a migration on your side?

Thanks.

Max.

Util.mbr() issues with negative numbers such as negative longitudes

Float.MIN_VALUE is "the smallest positive nonzero value of type float". This causes issues with negative values. In particular, I noticed in my tests that a negative longitude value was resulting in a maxX2 (and resulting Rectangle x2) value of 1.4E-45.

Please consider changing from:

float maxX2 = Float.MIN_VALUE;
float maxY2 = Float.MIN_VALUE;

to:

float maxX2 = -Float.MAX_VALUE;
float maxY2 = -Float.MAX_VALUE;

400000 geographic data bring rtree error

public class LatLongExampleTest { (I read from a.txt)
400000 data geographic data
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at com.github.davidmoten.rtree.internal.Util.replace(Util.java:78)
at com.github.davidmoten.rtree.internal.NonLeafHelper.add(NonLeafHelper.java:48)
at com.github.davidmoten.rtree.internal.NonLeafDefault.add(NonLeafDefault.java:47)
at com.github.davidmoten.rtree.RTree.add(RTree.java:320)
at com.github.davidmoten.rtree.RTree.add(RTree.java:346)
at com.github.davidmoten.rtree.Utilities.entries400000(Utilities.java:59)
at com.github.davidmoten.rtree.RTreeTest.main(RTreeTest.java:1067)

Support for additional Geometries

The current RTree supports adding/querying shapes of type Circle, Rectangle and Point.
Would it be possible to add other shapes like Line and Ellipse for example?

replace MinMaxPriorityQueue (and remove guava dependency)?

I read this stackoverflow report with comments from google employee (apparently) that other implementations would be significantly more performant than MinMaxPriorityQueue:

the constant factors will be significantly worse than a normal PriorityQueue. Using a normal PriorityQueue will do much better, or better yet, Ordering.greatestOf uses an O(n) time, O(k) memory algorithm. (We are giving some consideration to deprecating MinMaxPriorityQueue, just because it tends to get misused in this way.) – Louis Wasserman Aug 21 '15 at 17:33

I'll set up a JMH benchmark and experiment with other implementations. If anyone has one that they like I'd love to know, thanks!

Can't Serialize Custom Geometries

Hello,

I am trying to serialize my RTree of type <CustomObject, Polygon> using the following code:

OutputStream os = new FileOutputStream(new File("test.tree")); Serializer<Fence, Polygon> serializer = Serializers.flatBuffers().javaIo(); serializer.write(tree, os);

I keep getting the following error:

java.lang.RuntimeException: unexpected at com.github.davidmoten.rtree.fbs.FlatBuffersHelper.addEntries(FlatBuffersHelper.java:60) at com.github.davidmoten.rtree.fbs.SerializerFlatBuffers.addNode(SerializerFlatBuffers.java:96) at com.github.davidmoten.rtree.fbs.SerializerFlatBuffers.write(SerializerFlatBuffers.java:73) at CreateDataTests.main(CreateDataTests.java:31) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)

It looks like the addEntries method of FlatBuffersHelper can only process the built in geometries. Is there a way around this?

Thanks

Retrieving the root mbr of the tree

thank you very much for this great implementation, I was wondering if there is a way I can retrieve the root node or basically the minimum bounding rectangle that covers all the points in the tree.

Search for nearest neighbors with bounds on distances along different axes, not the eucledian distance

The problem is with this method:

public Observable<Entry<T, S>> search(final Rectangle r, final double maxDistance) {
        return search(new Func1<Geometry, Boolean>() {
            @Override
            public Boolean call(Geometry g) {
                return g.distance(r) < maxDistance;
            }
        });
    }

which only accepts a single distance.

It would be nice to have something like public Observable<Entry<T, S>> search(final Rectangle r, final double maxDistX, double maxDistY) or public Observable<Entry<T, S>> search(final Rectangle r, final double[] maxDistance).

The use case: the dimensions of the axes are different, so using a single distance doesn't make much sense. Trying to bring both dimensions to the same scale is not so simple, plus I just know the constraints I want to put on each of those coordinates.

The fact that r-tree class is marked final and the constructors are private also doesn't help extending the code.

I know that I can run RTree.search(Rectangle) with the desired constraints, but there are cases where this can bring up a lot of unnecessary stuff (e.g. larger rectangles in the tree covering the same region) and cause unnecessary slowdows, while I only need to get a few neighbors of a certain point.

Thanks for the great library!

Suggestion on how does Immutability of this Implementation suitable for writing Concurrenct, Thread Safe Code

I was looking into RTree.add(), RTree.delete() and I am wondering if I can perform these operations using multiple threads without worrying about any synchronization construct at my end. I assume that it will not return me correct tree because if I add 2 entries to the RTree parallely, once both the threads complete, the resulting RTree wont have both entries, similarly for the Delete Operation.

Please clarify if my understanding is wrong.

If I talk about search operation, it is not going to mutate the RTree's state, so immutability has no role here. These questions are arising as I might have to create the RTree multiple times in the same process with too much of data and hence will have to perform add operation in multiple threads.

Regards
Saurabh

search with distance only valid for Rtree of rectangle or point

For example if you have

RTree<String, Circle> tree = RTree.create();
...
double maxDistance = 10;
entries = tree.search(rectangle, maxDistance);

Then the entries returned have mbrs within maxDistance rather than being exactly within maxDistance of the entry circles.

A fix for this will require an API change where the distance function is passed to the method.

RTree did not find the object I inserted.

I inserted a point and search a large rectangle for the point, but the result is empty.

        RTree<String, Geometry> rTree = RTree.create();
        rTree.add("first", Geometries.point(5,5));

        Observable<Entry<String, Geometry>> result = rTree.search(Geometries.rectangle(0,0,500,500));
        System.out.print(result.toList().toBlocking().single());

2 step process

Will your Rtree automatically handle

  • pruning by envelopes
  • intersection / contains (i.e. filtered cross join)

automatically, or should the user code perform the second task manually?
I would like to handle Multipolygons from JTS and am unsure if implementing my own geometry is better or if simply adding the envelope as a rectangle and then manually checking for intersection.

Results not correct

I tried a set of Lat, Lon to create R star Tree using this library and SQLLite Implementation R* Tree. The results are different. Has anyone tried to compare the results of this Library with SQLLite?

Provide RTree.search(Geometry) method

A feature request: it seems that search queries must only use rectangle geometries. Most often I would want to be searching by point and it feels wrong to create degenerate rectangles every time.

RTree<String> rtree = RTree.create();
rtree.add("foo", Geometries.rectangle(24.0d, 0.0d, 42.0, 1.0));
...

rtree.search(Geometries.point(36.0d, 0.5d));  // compile error

Thanks for your consideration!

Joining the rtree project

Hi David,
My name is Max. I found your project a few days ago and it sounds very interesting.
I was wondering if you're still working on it, and need some help developing, fixing bugs or anything else.
I would love to contribute somehow, even if it's with little things.
Let me know if you're interested.

Best Regards,
Max.

P.S.
I already cloned the project, imported to eclipse and got it working. I have two tests that are failing for some reason (testBaskpressureIterationForUpTo1000Entries , testVisualizerWithGreekData).

Making RTree Serializable

Hi,

I was trying to integrate this in Spark for one of my use cases. However, post constructing the RTree object I wanted to broadcast it to the executors, but failed since the RTree object was not serializable.

Work around: Initialized it once per executor, but still it would be great if we could simply mark it serializable.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.