hadrienk / java-vtl

Java implementation of the Validation Transformation Language

License: Apache License 2.0

Java 50.80% GAP 1.75% ANTLR 0.66% HTML 1.45% JavaScript 40.50% CSS 4.64% Shell 0.10% RAML 0.10%

java-vtl's Introduction


Java VTL: Java implementation of VTL

The Java VTL project is an open source Java implementation of the VTL 1.1 draft specification. It follows the JSR-223 Java Scripting API and exposes a simple connector interface that can be implemented to integrate with any data store.

Visit the interactive reference manual for more information.

Modules

The project is divided into the following modules:

  • java-vtl-parent
    • java-vtl-parser, contains the lexer and parser for VTL.
    • java-vtl-model, VTL data model.
    • java-vtl-script, JSR-223 (ScriptEngine) implementation.
    • java-vtl-connector, connector API.
    • java-vtl-tools, various tools.

Usage

Add the following dependency to your Maven project:

<dependency>
    <groupId>no.ssb.vtl</groupId>
    <artifactId>java-vtl-script</artifactId>
    <version>[VERSION]</version>
</dependency>

Evaluate VTL expressions

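// connector: any implementation of no.ssb.vtl.connector.Connector (see below)
ScriptEngine engine = new VTLScriptEngine(connector);

Bindings bindings = engine.getBindings(ScriptContext.ENGINE_SCOPE);
// "\n" keeps one statement per line once the strings are concatenated
engine.eval("ds1 := get(\"foo\")\n" +
            "ds2 := get(\"bar\")\n" +
            "ds3 := [ds1, ds2] {\n" +
            "   filter ds1.id = \"string\",\n" +
            "   total := ds1.measure + ds2.measure\n" +
            "}");

// The assigned datasets are available in the engine bindings
System.out.println(bindings.get("ds3"));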

Connect to external systems

Java VTL uses the no.ssb.vtl.connector.Connector interface to import data from and export data to external systems.

The Connector interface defines three methods:

public interface Connector {

    boolean canHandle(String identifier);

    Dataset getDataset(String identifier) throws ConnectorException;

    Dataset putDataset(String identifier, Dataset dataset) throws ConnectorException;

}

The method canHandle(String identifier) is used by the engine to find which connector is able to provide a Dataset for a given identifier.
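The lookup that canHandle enables can be sketched as follows. This is a minimal illustration, not the engine's actual code: `SimpleConnector` and `ConnectorLookup` are hypothetical stand-ins, the dataset type is left generic so the sketch compiles without java-vtl on the classpath, and it assumes the first connector that accepts an identifier wins.

```java
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for no.ssb.vtl.connector.Connector; D replaces Dataset.
interface SimpleConnector<D> {
    boolean canHandle(String identifier);

    D getDataset(String identifier);
}

// Hypothetical helper: asks each connector in order and uses the first one that accepts the identifier.
final class ConnectorLookup {
    private ConnectorLookup() {
    }

    static <D> D get(List<? extends SimpleConnector<D>> connectors, String identifier) {
        for (SimpleConnector<D> connector : connectors) {
            if (connector.canHandle(identifier)) {
                // Only the connector that claimed the identifier is asked for the data.
                return connector.getDataset(identifier);
            }
        }
        throw new NoSuchElementException("no connector can handle " + identifier);
    }
}
```

With a first-match rule the order of the list matters: a catch-all connector placed first would shadow every more specific connector after it.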

The method getDataset(String identifier) is then called to get the dataset. Example implementations can be found in the java-vtl-ssb-api-connector module, but a very crude Dataset implementation that a connector could return might look like this:

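import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Stream;
// Dataset, DataStructure, DataPoint and Role come from the java-vtl-model module.

class StaticDataset implements Dataset {

    // Two identifiers, one measure and one attribute.
    private final DataStructure structure = DataStructure.builder()
            .put("id", Role.IDENTIFIER, String.class)
            .put("period", Role.IDENTIFIER, Instant.class)
            .put("measure", Role.MEASURE, Long.class)
            .put("attribute", Role.ATTRIBUTE, String.class)
            .build();

    @Override
    public Stream<DataPoint> getData() {

        List<Map<String, Object>> data = new ArrayList<>();
        Instant period = Instant.now();
        for (int i = 0; i < 100; i++) {
            // A new map per row: reusing a single map would make all 100 rows identical.
            Map<String, Object> row = new HashMap<>();
            row.put("id", "id #" + i);
            row.put("period", period);
            row.put("measure", Long.valueOf(i));
            row.put("attribute", "attribute #" + i);
            data.add(row);
        }

        // Convert each row into a DataPoint that follows the structure.
        return data.stream().map(structure::wrap);
    }

    @Override
    public Optional<Map<String, Integer>> getDistinctValuesCount() {
        // No statistics available for this dataset.
        return Optional.empty();
    }

    @Override
    public Optional<Long> getSize() {
        return Optional.of(100L);
    }

    @Override
    public DataStructure getDataStructure() {
        return structure;
    }
}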
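A connector serving datasets like the one above only needs to map identifiers to datasets. The sketch below is a hypothetical in-memory connector (`InMemoryConnector` is not part of java-vtl): it keeps the shapes of the three Connector methods, but uses a generic `D` in place of `Dataset` and an unchecked exception in place of `ConnectorException` so that it runs without the java-vtl modules. Returning the stored dataset from putDataset is an assumption about that method's contract.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory connector; D stands in for the java-vtl Dataset type.
final class InMemoryConnector<D> {
    private final String prefix;
    private final Map<String, D> store = new ConcurrentHashMap<>();

    InMemoryConnector(String prefix) {
        this.prefix = prefix;
    }

    // Claims every identifier under this connector's prefix, e.g. "mem:foo".
    boolean canHandle(String identifier) {
        return identifier.startsWith(prefix);
    }

    D getDataset(String identifier) {
        D dataset = store.get(identifier);
        if (dataset == null) {
            // The real interface would throw ConnectorException here.
            throw new IllegalArgumentException("unknown dataset " + identifier);
        }
        return dataset;
    }

    // Stores (or replaces) the dataset and returns it, mirroring putDataset's return type.
    D putDataset(String identifier, D dataset) {
        if (!canHandle(identifier)) {
            throw new IllegalArgumentException("cannot handle " + identifier);
        }
        store.put(identifier, dataset);
        return dataset;
    }
}
```

A prefix check keeps canHandle cheap, which matters if the engine calls it on every registered connector for each identifier it resolves.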

Implementation roadmap

This is an overview of the implementation progress.

| Group | Operators | Progress | Comment |
|---|---|---|---|
| General purpose | round parenthesis | done | |
| General purpose | := (assignment) | done | |
| General purpose | membership | done | |
| General purpose | get | usable | The keep, filter and aggregate parameters are not yet reflected in the connector interface. |
| General purpose | put | usable | The Connector interface is defined, but put expressions are not recognized yet. |
| Join expression | []{} | done | |
| Join clause | filter | done | |
| Join clause | keep | done | |
| Join clause | drop | done | |
| Join clause | fold | done | |
| Join clause | unfold | done | |
| Join clause | rename | done | |
| Join clause | := (assignment) | done | |
| Join clause | . (membership) | done | |
| Clauses | rename | done | |
| Clauses | filter | done | |
| Clauses | keep | done | |
| Clauses | calc | todo | |
| Clauses | attrcalc | todo | |
| Clauses | aggregate | todo | |
| Conditional | if-then-else | todo | |
| Conditional | nvl | done | |
| Validation | comparisons (>, <, >=, <=, =, <>) | done | |
| Validation | in, not in, between | todo | |
| Validation | isnull | done | Implemented syntaxes: isnull(value), value is null and value is not null. |
| Validation | exist_in, not_exist_in | todo | |
| Validation | exist_in_all, not_exist_in_all | todo | |
| Validation | check | usable | The boolean dataset must be built manually (no lifting). |
| Validation | match_characters | todo | |
| Validation | match_values | todo | |
| Statistical | min, max | todo | |
| Statistical | hierarchy | usable | The inline definition is not supported. A dataset with the correct structure can be used instead. |
| Statistical | aggregate | todo | |
| Relational | union | done | |
| Relational | intersect | todo | |
| Relational | symdiff | todo | |
| Relational | setdiff | done | |
| Relational | merge | todo | |
| Boolean | and | usable | Only inside a join expression (no lifting). |
| Boolean | or | usable | Only inside a join expression (no lifting). |
| Boolean | xor | usable | Only inside a join expression (no lifting). |
| Boolean | not | usable | Only inside a join expression (no lifting). |
| Mathematical | unary plus and minus | done | |
| Mathematical | addition, subtraction | done | |
| Mathematical | multiplication, division | done | |
| Mathematical | round, ceil, floor | done | |
| Mathematical | abs | done | |
| Mathematical | trunc | done | |
| Mathematical | power, exp, nroot | done | |
| Mathematical | ln, log | done | |
| Mathematical | mod | done | |
| String | length | todo | |
| String | concatenation | done | |
| String | trim | todo | |
| String | upper/lower case | todo | |
| String | substr | usable | No lifting. |
| String | indexof | todo | |
| String | date_from_string | usable | Dataset input is not implemented. Only the YYYY date format is accepted. |
| Outside specification | integer_from_string | done | |
| Outside specification | float_from_string | done | |
| Outside specification | string_from_number | done | |


java-vtl's People

Contributors

eivindgi, hadrienk, pawbu, takvamborgen, trygu


java-vtl's Issues

Get operator

Dataset ds_o := get(PersistentDataset ds_id {, PersistentDataset ds_id}*
    {, keep(keepPart Component {, keepClause Component})}
    {, filter(Component filterPart)}
    {, aggregate(
        [sum|avg|median|count|count_distinct|min|max]({include NULLS} MeasureComponent aggrPart)
        {, [sum|avg|median|count|count_distinct|min|max]({include NULLS} MeasureComponent aggrPart)}
    )})

  • Update grammar tests
  • Update grammar
  • Refine Connector interface
  • Test that the input datasets ds_id have the same Logical Data Structure, i.e. the same Components in number, name and type (static).
  • Test that keepPart is a Component expression containing exactly the name of a Component of any ds; complex Component expressions combining more than one Component are not allowed (static).
  • Test that aggrPart is a Component expression containing exactly the name of a MeasureComponent present in any ds; complex Component expressions combining more than one Component are not allowed. If there is at least one aggrPart, there must be one for each MeasureComponent present in a keepPart. If keepPart is omitted, every MeasureComponent must be aggregated. In other words, a kept MeasureComponent cannot be left out of the aggregations (static).
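The aggrPart rules in the last item can be expressed as a small static check. This is a hypothetical sketch (`GetAggregateCheck.check` is not part of java-vtl) that works on component names as plain strings; it covers only the aggrPart rules and leaves out the Logical Data Structure and keepPart checks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical static check for the aggrPart rules of get(...); not java-vtl code.
final class GetAggregateCheck {
    private GetAggregateCheck() {
    }

    /**
     * @param measures  names of the MeasureComponents of the input datasets
     * @param keepParts component names listed in keep(...), or empty when keep is omitted
     * @param aggrParts measure names used in aggregate(...), empty when there is no aggregate
     * @return error messages; an empty list means the call passes these rules
     */
    static List<String> check(Set<String> measures, Optional<Set<String>> keepParts, Set<String> aggrParts) {
        List<String> errors = new ArrayList<>();
        // Each aggrPart must name an existing MeasureComponent (sorted for stable messages).
        for (String aggr : new TreeSet<>(aggrParts)) {
            if (!measures.contains(aggr)) {
                errors.add(aggr + " is not a MeasureComponent");
            }
        }
        if (aggrParts.isEmpty()) {
            return errors; // Without aggregate(...) there is nothing more to check.
        }
        // Kept measures: the measures named in keep(...), or all of them when keep is omitted.
        Set<String> kept = new TreeSet<>(keepParts.orElse(measures));
        kept.retainAll(measures);
        for (String measure : kept) {
            if (!aggrParts.contains(measure)) {
                errors.add(measure + " is kept but not aggregated");
            }
        }
        return errors;
    }
}
```

Returning a list of messages instead of throwing on the first problem lets a grammar test assert on every violation in one pass.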

rename operator

Dataset ds_2 := Dataset ds_1
    [rename Component k as Constant compName {role=[MEASURE|IDENTIFIER|ATTRIBUTE]}
        {, Component k as Constant compName {role=[MEASURE|IDENTIFIER|ATTRIBUTE]}}*]

  • Test that k is a Component expression that can have only Component literals of ds_1 (static).
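A minimal sketch of that static check, applied to a list of component names, could look like this. `RenameCheck` is hypothetical, roles are ignored, and the rejection of renames that produce duplicate names is an added assumption rather than a rule stated above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of rename on a list of component names; roles are ignored.
final class RenameCheck {
    private RenameCheck() {
    }

    static List<String> apply(List<String> components, Map<String, String> renames) {
        // Static check: every k being renamed must be a Component of ds_1.
        for (String k : renames.keySet()) {
            if (!components.contains(k)) {
                throw new IllegalArgumentException("unknown component " + k);
            }
        }
        List<String> result = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String name : components) {
            // Components keep their position; only renamed ones change name.
            String newName = renames.getOrDefault(name, name);
            // Assumption: a rename must not produce two components with the same name.
            if (!seen.add(newName)) {
                throw new IllegalArgumentException("duplicate component " + newName);
            }
            result.add(newName);
        }
        return result;
    }
}
```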

package name in script is incorrect

In the main sources of the script module, the package directory is:
main/java/kohl.hadrien.vtl/script/
while in the tests it is:
test/java/kohl/hadrien/vtl/script/

The test path is correct.

Union operator

Implement union (records)

Syntax
Dataset ds_2 := union (Dataset ds_1 {, Dataset<?> ds_1}*)

Test:

  • Test antlr grammar
  • Create antlr grammar
  • Test that all the Datasets ds_1 have the same IdentifierComponents and MeasureComponents, in name and type (static).
  • If only one argument is given, it is returned unchanged.
  • Test that the output cannot contain two rows with the same values for all the IdentifierComponents and MeasureComponents.
  • Test Attribute propagation
  • Implement a Visitor in the script
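The row semantics in these tests can be sketched in memory. In this hypothetical sketch (`UnionSketch` is not java-vtl code) rows are maps from component name to value, `keyComponents` lists the shared IdentifierComponent and MeasureComponent names (the structure check is assumed to have passed already), and keeping the first of several rows with the same key is an assumption, since attribute propagation is still an open item.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of union: rows map component names to values.
final class UnionSketch {
    private UnionSketch() {
    }

    static List<Map<String, Object>> union(List<String> keyComponents,
                                           List<List<Map<String, Object>>> datasets) {
        if (datasets.size() == 1) {
            return datasets.get(0); // A single argument is returned unchanged.
        }
        // Insertion-ordered map: the first row seen for each key wins.
        Map<List<Object>, Map<String, Object>> byKey = new LinkedHashMap<>();
        for (List<Map<String, Object>> dataset : datasets) {
            for (Map<String, Object> row : dataset) {
                // The key is the row's values for all identifier and measure components.
                List<Object> key = new ArrayList<>();
                for (String component : keyComponents) {
                    key.add(row.get(component));
                }
                byKey.putIfAbsent(key, row);
            }
        }
        return new ArrayList<>(byKey.values());
    }
}
```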

filter operator

Dataset ds_1 := Dataset ds_2
[filter Component f]

  • Test that f is a Component expression over the Components of ds_1 (static).
  • Model interactions with other operators: for example, when a filter is performed on a union, the filter could be applied to each input dataset instead (evaluate the performance impact as well).
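The filter-over-union rewrite can be checked on a simplified model. In this hypothetical sketch (`FilterPushdown` is not java-vtl code) rows are compared as whole values, so union is a plain set union and filtering before or after it gives the same result. VTL union deduplicates on identifiers and measures only, so there the pushdown is safe only when the predicate reads just those components; a predicate on attributes could change which duplicate survives.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: rows are whole values, so union is a set union (concat + distinct).
final class FilterPushdown {
    private FilterPushdown() {
    }

    // filter(union(a, b)): combine first, then filter once.
    static <T> List<T> filterAfterUnion(List<T> a, List<T> b, Predicate<T> p) {
        return Stream.concat(a.stream(), b.stream())
                .distinct()
                .filter(p)
                .collect(Collectors.toList());
    }

    // union(filter(a), filter(b)): filter each input, so fewer rows reach the union.
    static <T> List<T> filterBeforeUnion(List<T> a, List<T> b, Predicate<T> p) {
        return Stream.concat(a.stream().filter(p), b.stream().filter(p))
                .distinct()
                .collect(Collectors.toList());
    }
}
```

Both methods keep the first occurrence of each row in input order, so they return equal lists, not just equal sets. The pushed-down form is the one a connector could exploit, since the predicate would then reach each source dataset.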
