tensorics / tensorics-core

The core library of Tensorics - a Java Library for Manipulating Multi-Dimensional Data with Pleasure

Home Page: http://tensorics.org

License: Apache License 2.0

Language: Java (100.00%)
Topics: tensors, java, multikey-map

tensorics-core's People

Contributors

agorzawski, andreacalia, cs4r, delph74, jcgarnier, kaifox, kamilkrol, matgabriel, michi42, soper07

Forkers

cs4r, anlon-burke

tensorics-core's Issues

Consistently Treat Reduction with Resampling

Now that a prototype of multidimensional resampling is in place, it is time to think about how to combine the two (i.e. use resampling during reduction). This is basically a generalization of the already available

from(aTensor).reduce(ACoordinate.class).byInterpolatedSlicing(...);

One way is of course to add more methods like this and stick with the dogma of only reducing one dimension at a time. However, for resampling another road was probed which probably looks a bit more modern (more stream-like ;-), namely potentially defining several dimensions in one clause. The disadvantage would of course be that we have more than one parameter per method ;-)

What I mean could be something like the following:

Tensor<V> reduced = Tensorics.reduce(aTensor)
    .sliceAt(Coord1.class, c1) // exact; only available values would be taken into account
    .then().repeatAt(Coord2.class, c2) // 
    .toTensor();

As a bonus, one could even add at the end things like:

toScalar(); // would throw if not zero dimensions
to(ATensorbacked.class); // would throw if dimensions not compatible
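
A rough sketch of the types that could back such a chain (all names here are assumptions, not a decided API):

public interface OngoingReduction<V> {
    <C> ReductionClause<V> sliceAt(Class<C> dimension, C coordinate);
    <C> ReductionClause<V> repeatAt(Class<C> dimension, C coordinate);
}

public interface ReductionClause<V> {
    OngoingReduction<V> then();              // continue with the next dimension
    Tensor<V> toTensor();                    // finish with the remaining dimensions
    <T> T to(Class<T> tensorbackedClass);    // would throw if dimensions are not compatible
    V toScalar();                            // would throw if dimensions are left over
}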

With a starting point that is FieldAware one could accomplish:

TensoricDouble.reduce(aDoubleTensor)
    .sliceAt(Coord1.class, c1) // exact; only available values would be taken into account
    .then().repeatAt(Coord2.class, c2) // 
    .then().linearAt(Coord3.class, c3, Coord3::doubleValue)
    .then().average(Coord4.class)
    .then().rms(Coord5.class)
    .toTensor();

What are your opinions?

Make builder methods chainable

In the original API, we sacrificed (common) method chaining of the builders in favour of a fluent API, which turned out to be problematic because of the large amount of object creation and destruction for big tensors (builder.at(c1, c2, c3).put(value);).

If we intend to give this up in any case, there is no argument anymore why builder methods should not be chainable. This would mean that common things like

builder.putAt(value1, ca1, cb1).putAt(value2, ca2, cb2).build();

would be possible.
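
A minimal sketch of what this would mean for the builder interface (assuming the existing putAt(value, coordinates...) signature; only the return type changes):

public interface TensorBuilder<V> {
    TensorBuilder<V> putAt(V value, Object... coordinates); // returns this, to allow chaining
    Tensor<V> build();
}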

Agree and streamline treatment of context in tensor

Currently, the context can give problems in the following cases:

  • in the equals() method: currently, the context is taken into account, but should it be? What could be the alternatives? E.g. having an additional method like isIdentical(), which takes the context into account, and removing the context from the default equals() method (see the sketch below)?
  • e.g. when extracting a tensor and then using it in a calculation where broadcasting is required (with the same dimension): one solution could be that the context determination becomes a method of the broadcasting strategy ... probably it already is?? To be evaluated...
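
A rough sketch of the isIdentical() idea (a hypothetical default method on the Tensor interface, assuming equals() would no longer take the context into account):

default boolean isIdentical(Tensor<?> other) {
    return this.equals(other)                          // shape and values only
            && this.context().equals(other.context()); // additionally require matching contexts
}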

Drop the Entry Interface

Change its state from deprecated to removed, as it is no longer used to contain/expose entries. Some time ago this was changed to a regular Map<Position, Value>.

Remove asMap() method from Tensor interface

Although the asMap() method looked logical at first sight, it increasingly hinders making things more flexible: e.g. lazy tensors or tensors with an infinite number of elements cannot easily be viewed as a map.
One idea could be to extract an equivalent of the asMap() method as a static method and remove it from the tensor interface.
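
A sketch of how such a static extraction could look (mapFrom is a hypothetical name; the shape().positionSet() and get(position) accessors are assumed):

public static <V> Map<Position, V> mapFrom(Tensor<V> tensor) {
    Map<Position, V> map = new HashMap<>();
    for (Position position : tensor.shape().positionSet()) {
        map.put(position, tensor.get(position));
    }
    return map;
}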

Add possibility for tensors with 'preferred' dimensions

E.g. for long time series which you know you are not going to slice in the time dimension: with the current API, this creates huge tensors which are rather slow to process. One could imagine having e.g. a small tensor (over the other dimensions) of collections/lists (along a 'preferred' dimension, e.g. time).

The tensorics-ext-cern CircularBufferTensorbacked uses a similar approach, but has to flatten down to a regular tensor eventually...
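
A hypothetical illustration of the idea (Device and the variables are made up): the time samples are kept as a plain list per entry instead of being modelled as an additional tensor dimension.

TensorBuilder<List<Double>> builder = Tensorics.builder(Device.class);
builder.putAt(samplesOfDevice1, device1); // List<Double> of time-ordered samples
builder.putAt(samplesOfDevice2, device2);
Tensor<List<Double>> compact = builder.build();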

Remove deprecated put-method variants from TensorBuilder

The original put() API was a bit problematic: the fluent API introduced a lot of object creation and destruction... This was solved by the putAt(...)-style API. However, the old methods are still there and should be removed.

An alternative would even be to introduce a put syntax which would be the same as for a map, i.e.

public void put(Position p, V value)

... this would still be quite nicely readable if we provide a (static) Position factory method called 'at'. The code would then look like:

builder.put(at(c1, c2, c3), value);
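
A sketch of how the 'at' factory could be provided (assuming it simply delegates to the existing Position.of(...)):

public static Position at(Object... coordinates) {
    return Position.of(coordinates);
}

With a static import of at(...), filling a builder would then read like filling a map.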

Explicitly express multidimensionality and finite dimensionality

This comes from a comment by michi on the resampling pull request:
" I don't think it's clear from the code that a Tensoric is supposed to be bound to a certain dimension set, as the dimension set is by no way obtainable from the interface ..."

#33

I fully agree with Michi's comment on the Tensoric. I thought along similar lines, but left this out as it was not immediately necessary ... and I was not exactly sure how to implement it. I see the following options:

a) Extract an interface from shape, which would contain the following methods:

  • int dimensionality();
  • contains(...)
  • Set<Class<?>> dimensions(); (BTW, I would like to introduce this as a replacement for dimensionSet(), as it is more in line with our other naming, and deprecate dimensionSet())

The name of the interface could be e.g. Dimensional.
(NOTE: whether the contains(..) method belongs here is not fully clear ... probably better in a dedicated interface....)
Shape and Tensoric would then implement this... (and thus also the tensor, which might be convenient in any case and could simply delegate to shape()).

b) Optionally, we could have a similar interface/object, e.g. called Dimensionality, which could be returned from Tensoric by a dimensionality() method....

c) We could strip down the Shape interface (which should be created anyhow) to those methods and e.g. rename the actual Shape (the one with the positions() method) to something like a FiniteShape....
Tensoric would then have a method:

Shape shape();

and Tensor could then override this by:

FiniteShape shape();

I think I like c) for the moment ... probably with a combination of a) so that we have the convenience methods in Tensor + Tensoric.
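
A rough sketch of option c), possibly combined with a); all names are tentative, and the interfaces are shown only with the methods relevant here:

public interface Shape {                   // stripped-down: only dimensional information
    int dimensionality();
    Set<Class<?>> dimensions();
}

public interface FiniteShape extends Shape {
    Set<Position> positions();             // what the current Shape additionally provides
}

public interface Tensoric<V> {
    Shape shape();
}

public interface Tensor<V> extends Tensoric<V> {
    @Override
    FiniteShape shape();                   // covariant override: a tensor always has a finite shape
}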

... still undecided ... input welcome ;-)

Context should become Position

The current implementation uses a Context (e.g. tensor.context()) which is just a wrapper around an ordinary Position.
This was justified by the idea that at some point one might want to put more into the context. Nowadays we see that a Position is enough. The wrapper should be removed.

NOTE: be careful with this change, as it will break the tensor API...
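
A simplified sketch of the change on the tensor interface:

public interface Tensor<V> {
    // before: Context context();   (Context being just a wrapper around a Position)
    Position context();              // after: expose the Position directly
}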

java.lang.IllegalArgumentException: Multiple entries with same key in an empty bi-dimensional tensor

I came across some unexpected behaviour when creating a tensor with two dimensions, both of which have the same class.

In essence, I wanted to perform the addition of two tensors that do not share common coordinates. I attach the code that illustrates the problem and the exception it throws at line 6 (builder2.putAt(1D, 0, 1);).

Reading the Coordinates.mapOf method, I deduced that I cannot have duplicate keys (dimensions), but I was indeed able to create the first tensor (tensor1).

Snippet:

TensorBuilder<Double> builder1 = Tensorics.builder(Integer.class, Integer.class);
builder1.putAt(1D, 0, 0);
builder1.putAt(1D, 1, 1);
Tensor<Double> tensor1 = builder1.build();

TensorBuilder<Double> builder2 = Tensorics.builder(Integer.class, Integer.class);
builder2.putAt(1D, 0, 1);
builder2.putAt(1D, 1, 0);
Tensor<Double> tensor2 = builder2.build();

Tensor<Double> minus = calculate(tensor1).plus(tensor2);
System.out.println(minus);

Stack trace:

java.lang.IllegalArgumentException: Multiple entries with same key: class java.lang.Integer=1 and class java.lang.Integer=0
    at com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:150)
    at com.google.common.collect.RegularImmutableMap.checkNoConflictInBucket(RegularImmutableMap.java:104)
    at com.google.common.collect.RegularImmutableMap.<init>(RegularImmutableMap.java:70)
    at com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:254)
    at com.google.common.collect.ImmutableClassToInstanceMap$Builder.build(ImmutableClassToInstanceMap.java:107)
    at org.tensorics.core.tensor.Coordinates.mapOf(Coordinates.java:59)
    at org.tensorics.core.tensor.Position.<init>(Position.java:56)
    at org.tensorics.core.tensor.Position.of(Position.java:69)
    at org.tensorics.core.tensor.Position.of(Position.java:79)
    at org.tensorics.core.tensor.AbstractTensorBuilder.putAt(AbstractTensorBuilder.java:85)
    at org.tensorics.core.examples.sprint.MyTest.myTest(MyTest.java:23)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
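
A possible workaround (not from the original report, just an illustration): since positions key their coordinates by class, each dimension needs its own coordinate class. Wrapping the integer indices into two distinct types avoids the collision.

final class Row {
    final int index;
    Row(int index) { this.index = index; }
    // equals()/hashCode() based on index are needed for real use
}

final class Column {
    final int index;
    Column(int index) { this.index = index; }
}

TensorBuilder<Double> builder = Tensorics.builder(Row.class, Column.class);
builder.putAt(1D, new Row(0), new Column(1));
builder.putAt(1D, new Row(1), new Column(0));
Tensor<Double> tensor = builder.build();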

Shape should become an interface

This would have the advantage that not every shape would have to copy the references to the positions, but could e.g. simply be a view on the tensor content itself.
Disadvantage: this could be problematic when being serialized (e.g. similar to map.keySet()).

Fix Travis build

It seems that no Travis build is running anymore.

... this could have to do with the fact that we should move to travis-ci.com instead of travis-ci.org...

Introduce Scalar

Hi all,

I have been hesitating for a long time (since the beginning ;-) about whether we should introduce a type Scalar into the library. The doubts are related to the fact that even in mathematics the term scalar is sometimes used for (in the language of tensorics) the values of a tensor, and sometimes for a tensor of zero dimensions. The term is fuzzy (and we have some leftovers in the code: sometimes S (scalar) is used as a type parameter, sometimes V (value), sometimes E (element)).

Long story short: I think it paid off to wait a bit here and learn from others ;-) ... If we look at reactive streams, each of them implements a dedicated object for 'zero or one' elements (Mono in Project Reactor, Single in RxJava 2).... So it seems to make sense....
However, looking a bit more at our use case... one main feature of a scalar object in tensorics would be that it still has a context ... so a big difference w.r.t. processing Scalars (if we call zero-dimensional tensors like this) compared to bare values would be that they transport the context ... This would come in very handy as soon as scalars are values of reactive streams (aka online analysis at CERN)... (e.g. one can merge them into tensors etc....)

However, to make this simpler, it seems natural to guarantee one more thing compared to Mono/Single, namely that it always contains exactly one element. (This avoids a lot of checking of size()... like avoiding null checks in Java.) ... I might be wrong ... still not sure...

In any case for a first iteration, I would propose:

Scalar extends Tensor

with the following features:

  • Always size 1
  • Always zero dimensions
  • Its only additional method is get()
  • The shape is constant

So it is basically only a wrapper around a single value, still implementing the tensor interface and having a potentially non-empty context. Therefore it needs no conversion and can be used everywhere a tensor can be used ...
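
A rough sketch of what is proposed (simplified, not a final API):

public interface Scalar<V> extends Tensor<V> {

    /** The single value contained in this zero-dimensional tensor. */
    V get();

    /* Implied invariants:
       - the shape has zero dimensions and exactly one (empty) position,
       - the context may still be non-empty and is carried along. */
}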

Builder Value Type is not well defined after builder(Class<?> ...) method ... Consider PreBuilders

Unfortunately, with the current implementation, it is not possible to create a tensor by writing something like:

Tensor<Double> tensor = Tensorics.builder(C1.class, C2.class).put(at(c1a, c2a), 0.0).build();

The reason for this is that the compiler cannot infer the value type from the builder(..) call alone.

Currently, there exist 2 workarounds:

  1. Assigning the builder to a local variable, so that the compiler can infer the types:
     ImmutableTensor.Builder<Double> builder = Tensorics.builder(C1.class, C2.class);
     Tensor<Double> tensor = builder.put(at(c1a, c2a), 0.0).build();
  2. Specifying explicit type arguments:
     Tensor<Double> tensor = Tensorics.<Double>builder(C1.class, C2.class).put(at(c1a, c2a), 0.0).build();

Each of them has its specific 'ugliness':
(1) is just annoying for tensors with only a few arguments, where chaining the methods would simply be convenient ... assigning to a dedicated variable is just boilerplate.
(2) does not allow a static import of the builder() method ... it always requires the 'Tensorics' class in front ....

Both become much more prominent in case we decide to introduce limited support for typed coordinates (#36). In that case, the lines in both workarounds will be polluted by a lot of generic arguments ... so it would then give even more value to work on this problem.

One idea, which also plays in the direction of other issues, would be something like explicitly giving the value type of the tensor. Imagine something like:

Tensor<Double> tensor = Tensorics.builder(C1.class, C2.class)
                            .valueType(Double.class)
                            .put(at(c1a, c2a), 0.0)
                            .build();

This might not look like a gain at first glance... however... imagine we introduce a shortcut so that the first put(...) call further specifies the value type (let's call the returned object a 'PreBuilder' for the moment):

Tensor<Double> tensor = Tensorics.builder(C1.class, C2.class) // type: PreBuilder<Object> (or no type parameter at all)
                            .put(at(c1a, c2a), 0.0)  // Builder<Double>
                            .build();

... still does not look impressive, I know ...

However, if we look at other issues we have, for example that we are considering introducing specific tensor implementations for certain cases (e.g. preferred dimensions - #11 - or even for certain value types - e.g. backed by a double array) ... this could be a potential solution: whenever the first put() method is called, the PreBuilder materializes into a builder of an appropriate implementation, best guessed from the arguments given so far (e.g. max number of values, ordering of dimensions ...)....
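
A rough sketch of the PreBuilder idea (hypothetical names): the pre-builder only knows the dimensions; the first put(...) fixes the value type and lets the library choose an appropriate builder implementation.

public interface TensorPreBuilder {
    <V> TensorBuilder<V> put(Position position, V value); // first put decides the value type
}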

... just thoughts ;-)

Provide limited support for tensors of typed dimensions

As noted recently by Andrea, one limiting factor, which might scare off new users, is the fact that dimensions are never typed and that the number of dimensions is never known when using a tensor ....

In early designs such things were tried, but they always led to horrible 'generic' monsters, so it was decided to give this up at that time....

However, since the core functionality is now well established and proves to be useful.... it is probably time to rethink this approach and add limited 'typed tensor' sugar on top of the existing one....

For sure we should follow an approach where all the utility methods only ever act on Tensors (only the value is typed). Otherwise we will have an explosion of methods .... However, we could potentially have e.g. typed tensors (and builders) up to a limited number of dimensions...
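
A hypothetical sketch of such 'typed tensor' sugar, limited to a fixed number of dimensions (names and shape of the API are pure assumptions):

public interface Tensor2d<C1, C2, V> {
    V get(C1 c1, C2 c2);     // typed access, number of dimensions fixed at two
    Tensor<V> asTensor();    // delegate back to the untyped core tensor
}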

Better support for Tensorbackeds

compose(ATensorbacked.class).from(aTensor);
vs.:
aTensor.as(ATensorbacked.class);
(Usually, the latter should be possible ... as long as it does not require anything additional (e.g. a field).)

Further:

calculate(ATensorbacked.class).by(a).times(b);
(or from(a)?)
