The accumulo-access from apache

Possible re-validation of an AccessExpression

#71 introduced MultiAccessEvaluatorImpl which uses different AccessEvaluatorImpl instances when evaluating an expression against multiple sets of Authorizations. Each AccessEvaluatorImpl will re-validate the expression, which introduces a minor inefficiency.

Consider removing AccessEvaluator as an API or improving its entry points

I think the entry point for AccessEvaluator entry point has room for improvement. Currently, the entry point is a static method named of that is overloaded. However, it almost exclusively takes an Authorizations object of some form.

Rather than construct an Authorizations object, then construct a separate AccessEvaluator object, it would probably make more sense to have a method that dangles off of Authorizations that evaluates.

For example:

var evaluator = AccessEvaluator.of(Authorizations.of("a", "b"));
// vs.
var evaluator = Authorizations.of("a", "b").evaluator();

What I'm suggesting is similar to the way Java regex Patterns can construct a Matcher. You wouldn't do Matcher m = new Matcher(Pattern.compile("p"), "other");. Instead, you do Pattern.compile("p").matcher("other"). You start with the thing you have, then you derive the related objects from it. In our case, our starting point is the Authorizations.

Also, while Authorizations and AccessExpression also use of as named entry points, I think it makes more sense to use for those than it does for AccessEvaluator. For the first two, you're literally composing them "of" the string provided, because they are simple objects that wrap and represent the thing it is being composed "of". That's not quite true for AccessEvaluator, which isn't merely wrapping, but is a tool to do evaluating work using the Authorizations used to initialize it. I think something like .compile(auths), .using(auths), or even .with(auths) might make more sense. However I'd still rather see if we can do away with it entirely, and just have an entry point off of Authorizations instead.

Add unit test that check error messages

For different invalid access expressions it would be nice to have to unit test that check the error messages in the exceptions.

build is broken, Arrays import is missing from AccessExpression

We should add the GitHub build actions

Improve readme

The readme is currently not user facing. Seems like it should at least contain the following.

quick overview of the history of the library and how it relates to Accumulo in general
description of the functionality offered by the library
some short examples of using the library
maven coordinates of the library

IllegalAccessExpressionException could use improvements

IllegalAccessExpressionException starts with IllegalAccess, which has a particular meaning, and I think the current name could cause confusion. It could be renamed, so as to make it clear that it's an AccessExpression exception and not an IllegalAccess exception.

Also, the supertype is PatternSyntaxException, which is a type that comes from the java.util.regex package, and specifically pertains to exceptions when dealing with regular expressions. I understand we're using it to help display helpful error messages that show where the problem occurred (if we can), but while that's fine in an internal API to construct a log message, it's probably too confusing when used in a public API for users to handle.

Contrib project pom's do not get updated on release candidate creation

When a release candidate is made for Accumulo-Access, the poms for contrib/antlr4 and contrib/getting-started do not get updated. We don't create official releases of the contrib artifacts, so I think that it's likely ok that their versions remain SNAPSHOTs. However, we might want to create a post-release action to update the version of accumulo-access that they rely on to use the newly released version.

Release Candidate script issue

There is an issue with the email generated by the src/build/create-release-candidate.sh script. Specifically, the use of the variable "projName" causes some of the build artifact links to have an upper case A in the word access. Currently the email links need to be modified manually before the release candidate email is sent out.

Evaluate whether implementations should be / can be Serializable

https://github.com/apache/accumulo/blob/main/core/src/main/java/org/apache/accumulo/core/security/Authorizations.java implements Serializable and Iterable<byte[]>. I think AccessExpression could be Serializable as well.

Create benchmark to evaluate parsing and evaluation performance

Remove caching functionality

The visibility evaluator builder has an option for caching evaluation execution. The cache it offers may not satisfy users needs. It would be very easy for a user to wrap the evaluator with something like a guava cache and use its extensive options. Therefore it probably not something this project needs to offer.

When removing the cache we could update the class level javadocs to recommend caching for the case where expressions are expected to repeat.

Duplicate authorization will throw an Illegal Argument Exception

accumulo-access/src/main/java/org/apache/accumulo/access/Authorizations.java

Line 67 in 2da31d3

return new Authorizations(Set.of(authorizations));

Set.of will throw an IllegalArgumentException as expected, but as a consumer to the API we should avoid the exception and prune duplicates .

Would love thoughts. I think it's reasonable to suggest that the client of the Authorizations API should do this work.

I'll submit a PR for this trivial change if there is any concurrence from @keith-turner or @dlmarion that we can de-dupe in the of helper method in Authorization

Post-vote release checklist for version 1.0.0-beta

Consider removing overloaded byte array APIs

AccessExpressions should be composed solely of printable characters that are human-readable. The library might benefit from making the APIs enforce stronger types for what it accepts when constructing its own objects.

This stronger typing would probably result in simpler implementation and less API bloat, working entirely with characters, and not having to worry about encoding or conversion within the library. However, in order to use the library this way, users who currently store their access expressions as byte arrays, may be forced to convert them to Strings for evaluation, and that may be undesirable for performance reasons (though perhaps not really that bad, and may not be noticeable at all with caching).

It's not clear to me at this time what those performance differences and the full extent of the trade-offs would be, but I think this is worth at least some investigation, since having a clean API with strong types out the gate for a new library will bode well for this library's longevity.

Consider adding default expression for empty expressions

Accumulo supports the functionality of setting a default expression for the empty expression. Need to determine if this library should offer a similar functionality that would use a configured access expression when it sees an empty expression.

This could look like the following the API, where when the empty expression is encountered the expression C&D is used instead.

        var evaluator = AccessEvaluator.builder()
                                       .authorizations(authsSet)
                                       .emptyExpressionDefault("C&D")
                                       .build();

Could possibly have an option to disallow the empty expression instead of this, where it would throw an exception when seeing the empty expression. Something like the following.

        var evaluator = AccessEvaluator.builder()
                                       .authorizations(authsSet)
                                       .failOnEmptyExpression(true)
                                       .build();

AccessExpression.abnf could be deleted

src/main/resources/org/apache/accumulo/access/specification/AccessExpression.abnf exists for the sole purpose of validating that the ABNF in SPECIFICATION.md is valid ABNF. The contents of AccessExpression.abnf are a copy of the ABNF in SPECIFICATION.md. AccessExpressionTest.testSpecificationDocumentation and AccessExpression.abnf could be deleted and the method used in AccessExpressionTest.testSpecificationDocumentation to get the ABNF from SPECIFICATION.md could be used in SpecificationTest.

Create profiles for running benchmarks to simplify the documented commands to run them

The exec-maven-plugin's execution, along with corresponding configuration to execute a benchmark, should be placed in a profile, so it becomes much easier to document how to run a benchmark.

Authorizations set view could be sorted

Currently, Authorizations is implemented as an unordered set (Set.copyOf(authorizations)). It also has some unnecessary copying (Authorizations.of(Collection<String>) results in two separate calls to Set.copyOf, one of which is redundant). When calling asSet to view the result, the returned set could be in any order.

It would be nice to make some improvements, so that a particular set of strings had a canonical view. This may also have implications for hashCode and equals if the implementation stores them in different orderings, but I'm not certain about that. It's also just nice to have a sorted view of the authorizations when calling asSet or toString.

Authorizer can be replaced with `Predicate<String>`

In AccessEvaluator, there is a small type called Authorizer that allows users to plug in an object that determines whether an entity being evaluated has an authorization or not. This is basically just a Predicate<String> and can be replaced by that.

HAS_AUTH_PREDICATE.test("someAuth");
// is basically the same as
HAS_AUTH_AUTHORIZER.isAuthorized("someAuth");

The main difference is that Predicates can be much more easily composed using and and or, and works better with all the Java Functional ecosystem. There is an argument to be had for granting a new convenience name to an equivalent functional object. But, I don't think it adds much here, and even if it did, it could at least extend Predicate<String> with a default method, so either way of calling could be used. But, I still think we could probably just get rid of it entirely, since Predicate<String> by itself does the job well enough.

LICENSE file contains information for non-existent files

I think the LICENSE file was copied from the main Accumulo project. It has information in the file for files that do not exist in the Accumulo-Access repo. These can be removed.

apache / accumulo-access Goto Github PK

accumulo-access's People

Contributors

Stargazers

Watchers

Forkers

accumulo-access's Issues

Recommend Projects

Recommend Topics

Recommend Org