discovery's Introduction

See my personal GitHub Pages

❤️ Repositories

KnoMa, Hierarchical data transformations, Open hypothesis, Data bot, Horizon

📘 Linked Data Repositories

Extract Transform Load DCAT-AP Viewer SPARQLess

📘 Bioinformatics & Cheminformatics Repositories

Autodock Vina

discovery's People

Contributors

skodapetr

discovery's Issues

Ordering of experiment-level output CSVs

Output CSVs should be ordered.
discovery.csv and application-discovery.csv should be ordered according to the ordering of discoveries in the input experiment rdf:List.
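A minimal sketch of the requested ordering, assuming the discovery IRIs have already been read from the experiment's rdf:List; the method and row layout are illustrative, not the project's actual API:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: sort CSV rows by the position of their discovery IRI
// in the experiment's rdf:List.
public class OrderRows {

    public static List<String[]> sortByListOrder(
            List<String[]> rows, List<String> discoveryOrder) {
        // Map each discovery IRI to its index in the rdf:List.
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < discoveryOrder.size(); i++) {
            index.put(discoveryOrder.get(i), i);
        }
        List<String[]> sorted = new ArrayList<>(rows);
        // Assume the first column of each row holds the discovery IRI;
        // rows with unknown IRIs sort last.
        sorted.sort(Comparator.comparingInt(
                row -> index.getOrDefault(row[0], Integer.MAX_VALUE)));
        return sorted;
    }
}
```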

NPE when running discovery

klimek@KLIMEK-MFF-NTB:/mnt/c/Users/Kuba/Documents/GitHub/discovery/deploy$ java -jar discovery.jar --Filter no-filter -e https://discovery.linkedpipes.com/resource/experiment/label/config -o labels-no-filter
07:54:23 [main] INFO  c.l.d.cli.RunExperiment - Collected 5 discoveries in experiment https://discovery.linkedpipes.com/resource/experiment/label/config
07:54:23 [main] INFO  c.l.d.c.f.RemoteDefinition - Collecting templates for: https://discovery.linkedpipes.com/resource/discovery/label-00/config
07:54:23 [main] INFO  c.l.d.c.f.RemoteDefinition - Loading templates ...
07:54:55 [main] INFO  c.l.d.c.f.RemoteDefinition - Loaded applications: 1 transformers: 0 datasets: 114
Exception in thread "main" java.lang.NullPointerException
        at com.linkedpipes.discovery.cli.factory.DiscoveriesFromUrl.createDiscoveryBuilder(DiscoveriesFromUrl.java:95)
        at com.linkedpipes.discovery.cli.factory.DiscoveriesFromUrl.create(DiscoveriesFromUrl.java:59)
        at com.linkedpipes.discovery.cli.RunDiscovery.runDiscoveriesFromUrl(RunDiscovery.java:72)
        at com.linkedpipes.discovery.cli.RunDiscovery.run(RunDiscovery.java:58)
        at com.linkedpipes.discovery.cli.RunExperiment.run(RunExperiment.java:57)
        at com.linkedpipes.discovery.cli.AppEntry.run(AppEntry.java:31)
        at com.linkedpipes.discovery.cli.AppEntry.main(AppEntry.java:21)

Discovery not generating pipelines.json correctly

With: java -jar discovery.jar -o out -d https://discovery.linkedpipes.com/resource/discovery/dbpedia-test-01/config, a pipeline is seemingly discovered in pipelines.json. However, it is missing a transformer that must have been used, because otherwise the pipeline would not work:

{
  "pipelines" : [ {
    "components" : [ {
      "node" : "node_00001",
      "iri" : "https://ldcp.opendata.cz/resource/dbpedia/datasource-templates/Category-Charter_77_signatories",
      "label" : "Data source"
    }, {
      "node" : "node_00001",
      "iri" : "https://discovery.linkedpipes.com/resource/application/map/template",
      "label" : "Map Application"
    } ]
  } ]
}

Add another "transformers used" column

Currently, the "transformers used" column counts transformers applicable to datasets, even from pipelines not ending with an application. We can keep that, but we need to add another number: transformers used in pipelines which end with an application.
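The distinction between the two counts can be sketched as follows; the Component/Kind model is a simplified stand-in for the project's classes, not its actual API:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch: count distinct transformers, but only from pipelines
// whose last component is an application.
public class TransformerStats {

    enum Kind { DATA_SOURCE, TRANSFORMER, APPLICATION }

    record Component(String iri, Kind kind) {}

    public static long countUsedInApplicationPipelines(
            List<List<Component>> pipelines) {
        Set<String> used = new HashSet<>();
        for (List<Component> pipeline : pipelines) {
            if (pipeline.isEmpty()) {
                continue;
            }
            // Keep only pipelines that end with an application.
            if (pipeline.get(pipeline.size() - 1).kind() != Kind.APPLICATION) {
                continue;
            }
            for (Component component : pipeline) {
                if (component.kind() == Kind.TRANSFORMER) {
                    used.add(component.iri());
                }
            }
        }
        return used.size();
    }
}
```

The existing column would use the same logic without the application check.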

Update proposal

Now each data sample can contain multiple resources. An alternative is to allow a single resource per data sample; as a result, components would need to have multiple inputs, one per resource / class type.

This should make it easier to scale as the data samples are smaller and are less likely to change. It may also help to deal with integration of different resources.

However, we need to specify exactly how the data samples should be split and how the output data samples should be produced.

Extend export

Export data sample for each node as an extra file in the output directory.
The vertices file should contain, besides the transformer name, also its IRI; for the first node (the data source), the IRI of the data source should be used.

The pipeline file should then be a JSON array, where each object represents a pipeline, and a pipeline consists of a series of nodes. For each node, we store the IRI of the transformer and a link to the vertices file (the node id).
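A possible shape for the proposed pipeline file, following the style of the existing pipelines.json; the field names and file naming scheme are only a suggestion:

```json
{
  "pipelines" : [ {
    "nodes" : [ {
      "node" : "node_00001",
      "iri" : "https://example.org/resource/data-source",
      "dataSampleFile" : "node_00001.ttl"
    }, {
      "node" : "node_00002",
      "iri" : "https://example.org/resource/transformer",
      "dataSampleFile" : "node_00002.ttl"
    } ]
  } ]
}
```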

Add option to ignore missing templates and report them instead

It would be useful to have a switch enabling the user to skip over missing components (and report them). The tool would report the number of missing templates and produce an extra file with their list.

Exception in thread "main" java.io.FileNotFoundException: https://discovery.linkedpipes.com/resource/lod/templates/http---202.45.139.84-10035-catalogs-fao-repositories-agrovoc
        at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1915)
        at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1515)
        at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:250)
        at java.base/java.net.URL.openStream(URL.java:1139)
        at com.linkedpipes.discovery.rdf.RdfAdapter.fromHttp(RdfAdapter.java:61)
        at com.linkedpipes.discovery.rdf.RdfAdapter.asStatements(RdfAdapter.java:52)
        at com.linkedpipes.discovery.cli.factory.FromExperiment.loadTemplates(FromExperiment.java:70)
        at com.linkedpipes.discovery.cli.factory.FromExperiment.create(FromExperiment.java:49)
        at com.linkedpipes.discovery.cli.AppEntry.runExperiment(AppEntry.java:141)
        at com.linkedpipes.discovery.cli.AppEntry.run(AppEntry.java:74)
        at com.linkedpipes.discovery.cli.AppEntry.main(AppEntry.java:40)
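The requested behaviour could look roughly like this: instead of failing on the first missing template, collect the missing IRIs and report them afterwards. The loader interface and method names below are hypothetical, not the project's actual code:

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;

// Sketch of a "skip missing templates" mode.
public class SkipMissingTemplates {

    interface TemplateLoader {
        void load(String iri) throws FileNotFoundException;
    }

    /** Returns the list of template IRIs that could not be resolved. */
    public static List<String> loadIgnoringMissing(
            List<String> iris, TemplateLoader loader) {
        List<String> missing = new ArrayList<>();
        for (String iri : iris) {
            try {
                loader.load(iri);
            } catch (FileNotFoundException ex) {
                // Skip and remember; report later, e.g. in an extra output file.
                missing.add(iri);
            }
        }
        return missing;
    }
}
```

The tool would then log the size of the returned list and write its contents to the extra file.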

Add support for inclusion of components into discovery definition

Right now, many discovery definitions run on the same set of components. However, these lists of components are copied into each discovery definition. When such a list needs to be updated (like now), we would have to update all the discovery definitions; see example and another example.

This is highly impractical. Therefore, an "inclusion" feature is requested.

Given a discovery input such as this one:

<https://discovery.linkedpipes.com/resource/discovery/label-02/config> a <https://discovery.linkedpipes.com/vocabulary/discovery/Input> ;
    <https://discovery.linkedpipes.com/vocabulary/discovery/hasTemplate> 
      <https://discovery.linkedpipes.com/resource/application/dcterms/template>,

a list of components could be imported into the definition from another location, like this:

<https://discovery.linkedpipes.com/resource/discovery/label-02/config>
  <https://discovery.linkedpipes.com/vocabulary/discovery/import>
    <https://discovery.linkedpipes.com/resource/lod/list> .

where

<https://discovery.linkedpipes.com/resource/lod/list> a <https://discovery.linkedpipes.com/vocabulary/discovery/Input>;
  <https://discovery.linkedpipes.com/vocabulary/discovery/hasTemplate> <https://discovery.linkedpipes.com/resource/application/dcterms/template>,
    <https://discovery.linkedpipes.com/resource/application/personal-profiles/template>,
...

This way, the list of imported components can be regenerated by a LP-ETL pipeline without breaking everything.
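Resolving such imports could be sketched as a recursive walk with a visited set, so that cyclic imports terminate instead of recursing forever; the maps below stand in for dereferencing the IRIs, and all names are illustrative:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of resolving the proposed "import" links.
public class ResolveImports {

    public static Set<String> collectTemplates(
            String definitionIri,
            Map<String, List<String>> imports,      // definition -> imported definitions
            Map<String, List<String>> templates) {  // definition -> hasTemplate values
        Set<String> visited = new LinkedHashSet<>();
        Set<String> result = new LinkedHashSet<>();
        collect(definitionIri, imports, templates, visited, result);
        return result;
    }

    private static void collect(
            String iri,
            Map<String, List<String>> imports,
            Map<String, List<String>> templates,
            Set<String> visited, Set<String> result) {
        if (!visited.add(iri)) {
            return; // already processed: guards against import cycles
        }
        result.addAll(templates.getOrDefault(iri, List.of()));
        for (String imported : imports.getOrDefault(iri, List.of())) {
            collect(imported, imports, templates, visited, result);
        }
    }
}
```

The visited set is what keeps a cyclic import (A imports B, B imports A) from looping.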

Discovery not displaying node IDs correctly

With: java -jar discovery.jar -o out -d https://discovery.linkedpipes.com/resource/discovery/dbpedia-test-01/config, I get duplicate node IDs in pipelines.json:

{
  "pipelines" : [ {
    "components" : [ {
      "node" : "node_00001",
      "iri" : "https://ldcp.opendata.cz/resource/dbpedia/datasource-templates/Category-Charter_77_signatories",
      "label" : "Data source"
    }, {
      "node" : "node_00001",
      "iri" : "https://discovery.linkedpipes.com/resource/application/map/template",
      "label" : "Map Application"
    } ]
  } ]
}

Import definitions not working

Running java -jar discovery.jar -o label-01 -d https://discovery.linkedpipes.com/resource/discovery/label-01/config now gives me a StackOverflowError.
