onto-med / smog

Spreadsheet Model Generator (SMOG): A Lightweight Tool for Object-Spreadsheet Mapping

Home Page: https://onto-med.github.io/SMOG/

License: MIT License

Java 99.94% Shell 0.06%
ontologies spreadsheet-mapping

smog's People

Contributors

alexu75, christophb, dependabot[bot], konradhoeffner

Forkers

konradhoeffner

smog's Issues

Add CLI options to generate Maven artefact

This is a feature that was available in some code that has not been ported to this GitHub repository.

Basically, the following tasks must be addressed:

  • generate pom.xml file (additional configurations might be needed)
  • bundle resulting code to a JAR file
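
The two tasks above can be sketched with the JDK alone. This is a minimal illustration, not the code that was left unported: the class name MavenArtifactSketch, both method names, and the bare-bones POM layout are assumptions, and a real pom.xml would likely need dependencies and plugin configuration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; none of these names exist in SMOG.
class MavenArtifactSketch {

    /** Renders a minimal pom.xml with coordinates only. */
    static String renderPom(String groupId, String artifactId, String version) {
        return String.join("\n",
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
            "<project xmlns=\"http://maven.apache.org/POM/4.0.0\">",
            "  <modelVersion>4.0.0</modelVersion>",
            "  <groupId>" + groupId + "</groupId>",
            "  <artifactId>" + artifactId + "</artifactId>",
            "  <version>" + version + "</version>",
            "  <packaging>jar</packaging>",
            "</project>",
            "");
    }

    /** Packs every regular file below classesDir into a JAR written to jarFile. */
    static void bundleJar(Path classesDir, Path jarFile) throws IOException {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");

        // Collect the files first so the directory walk is closed before writing.
        List<Path> files;
        try (Stream<Path> walk = Files.walk(classesDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }

        try (OutputStream out = Files.newOutputStream(jarFile);
             JarOutputStream jar = new JarOutputStream(out, manifest)) {
            for (Path file : files) {
                // JAR entry names always use forward slashes.
                String name = classesDir.relativize(file).toString().replace('\\', '/');
                jar.putNextEntry(new JarEntry(name));
                Files.copy(file, jar);
                jar.closeEntry();
            }
        }
    }
}
```

The sketch assumes jarFile lies outside classesDir and that classesDir does not already contain a META-INF/MANIFEST.MF, which would collide with the generated manifest.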

Add documentation for OWL/RDF export

Describe the solution you'd like
In 2022, a new undocumented feature was introduced to create OWL/RDF exports.

  • add documentation and examples to README.md
  • add documentation for the config.yaml file that is used to generate OWL/RDF

Use LF line endings instead of CRLF

Problem
Right now, all the files seem to use CRLF (Windows) line endings, which creates problems for developers on other operating systems.

Describe the solution you'd like
It is customary to always use LF line endings in the repository.
If Windows users need CRLF endings on their local machine, then git config core.autocrlf should be used.
See for example https://docs.github.com/en/get-started/getting-started-with-git/configuring-git-to-handle-line-endings?platform=windows.

add test for SimpleOWLExport

There does not seem to be a test for SimpleOWLExport yet, although it is an important feature.

  • generate test data if necessary
  • perform SimpleOWLExport on the test data
  • verify that the output is correct
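
The verification step could look roughly like the sketch below. Since this issue does not show how SimpleOWLExport is invoked, the export call itself is left out; the helper OwlOutputCheck is hypothetical and only inspects RDF/XML text that is assumed to come from the export, using the XML parser shipped with the JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical test helper; not part of SMOG.
class OwlOutputCheck {
    static final String OWL_NS = "http://www.w3.org/2002/07/owl#";

    /** Counts owl:Class elements in an RDF/XML document given as a string. */
    static int countOwlClasses(String rdfXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS
        Document doc = factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(rdfXml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagNameNS(OWL_NS, "Class").getLength();
    }
}
```

A test would then run the export on a small fixture spreadsheet and compare the count, or individual IRIs, against the rows in the fixture.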

support individuals in SimpleOWLExport

Is your feature request related to a problem? Please describe.
The ANNO project includes visual definition links, which are images.
Because they are generated from a spreadsheet column, they are transformed into OWL classes like everything else. This does not make sense, as the example below shows.
Additionally, it bloats the automatically generated ontology documentation with a large number of entries that should not exist.

<owl:Class rdf:about="https://annosaxfdm.de/ontology/IMAGE0009">
    <rdfs:subClassOf rdf:resource="https://annosaxfdm.de/ontology/VisualDefinition"/>
    <ontology:file>Mandibula_AL_anterior.png</ontology:file>
    <ontology:headline>1, Corpus mandibulae, anterior</ontology:headline>
    <ontology:id>IMAGE_0009</ontology:id>
    <ontology:location rdf:resource="https://biosciences.hs-mittweida.de/anno/Mandibula_AL_anterior.png"/>
    <ontology:source rdf:resource="https://annosaxfdm.de/ontology/SO0052"/>
    <ontology:source rdf:resource="https://annosaxfdm.de/ontology/SO0110"/>
</owl:Class>

Describe the solution you'd like
There should be a way to mark a spreadsheet as containing individuals.
For example, instead of "Tree: Categories", it could be "Individuals".
This could then result in something like this:

<owl:NamedIndividual rdf:about="https://annosaxfdm.de/ontology/IMAGE0009">
    <rdf:type rdf:resource="https://annosaxfdm.de/ontology/VisualDefinition"/>
    <ontology:file>Mandibula_AL_anterior.png</ontology:file>
    <ontology:headline>1, Corpus mandibulae, anterior</ontology:headline>
    <ontology:id>IMAGE_0009</ontology:id>
    <ontology:location rdf:resource="https://biosciences.hs-mittweida.de/anno/Mandibula_AL_anterior.png"/>
    <ontology:source rdf:resource="https://annosaxfdm.de/ontology/SO0052"/>
    <ontology:source rdf:resource="https://annosaxfdm.de/ontology/SO0110"/>
</owl:NamedIndividual>
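
A minimal sketch of the proposed switch, assuming the sheet marker is available as a plain string. The names EntityRenderer, SheetKind, kindOf and render are hypothetical, and the real exporter presumably builds its output through an OWL library rather than by string concatenation.

```java
// Hypothetical illustration of the proposal; none of these names exist in SMOG.
class EntityRenderer {

    enum SheetKind { CLASSES, INDIVIDUALS }

    /** Maps the proposed "Individuals" marker to a kind; anything else stays a class sheet. */
    static SheetKind kindOf(String sheetMarker) {
        return sheetMarker.trim().equalsIgnoreCase("Individuals")
            ? SheetKind.INDIVIDUALS
            : SheetKind.CLASSES;
    }

    /** Renders one row either as a subclass or as an instance of its parent. */
    static String render(SheetKind kind, String iri, String parentIri) {
        if (kind == SheetKind.INDIVIDUALS) {
            return "<owl:NamedIndividual rdf:about=\"" + iri + "\">\n"
                 + "    <rdf:type rdf:resource=\"" + parentIri + "\"/>\n"
                 + "</owl:NamedIndividual>";
        }
        return "<owl:Class rdf:about=\"" + iri + "\">\n"
             + "    <rdfs:subClassOf rdf:resource=\"" + parentIri + "\"/>\n"
             + "</owl:Class>";
    }
}
```

The only difference between the two branches is the element name and the relation to the parent (rdf:type instead of rdfs:subClassOf), which matches the two XML excerpts in this issue.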

add code style guideline, configuration and workflow

Problem
Right now there is no code style guideline mentioned in the readme, no code formatter configuration file and no automatic code formatting workflow.
This means contributors do not know which code style to use, so it may differ between contributors.
Also, code formatting has to be done manually, which takes away valuable developer time.

Solution

  1. Describe the code style in the readme (e.g. "Google Java Format" or refer to a specific tool like Prettier with the Java plugin)
  2. If the code style has parameters, add a configuration file for that.
  3. Add a workflow that rejects wrongly formatted commits or, alternatively, fixes the formatting automatically.

provide metaclass export option

Is your feature request related to a problem? Please describe.

Ontologies often need validation checks for basic issues such as missing values, invalid references and cardinality violations.
The full power of a reasoner is often not needed and can also provide unintuitive messages.
For example, if you model a horse as having 4 legs and forget to model one, then a reasoner may tell you that your individual is not a horse, whereas the user expects to be told to add the missing leg.
SHACL shapes with validators such as PySHACL provide easily automatable validation with messages that are intuitive to domain experts.
However, SHACL shapes target instances of classes, not subclasses of classes.

Describe the solution you'd like
As described at https://stackoverflow.com/questions/70756167/how-to-apply-shacl-to-subclasses-instead-of-instances and already successfully used in SNIK, metaclasses solve this problem.

So on top of this:

:Elefant rdf:type owl:Class;
 rdfs:subClassOf :Elephantidae.

:Elephantidae rdf:type owl:Class;
 rdfs:subClassOf :Animal.

You would add:

:Elefant rdf:type owl:Class;
 rdfs:subClassOf :Elephantidae;
 a :AnimalClass.

Where :AnimalClass is a metaclass.
This allows one to see immediately, without going through the whole hierarchy, that :Elefant is an :Animal and, crucially, allows tools like PySHACL to function.
This also allows easier development and more resource-efficient tooling because SPARQL 1.1 property paths, which can be performance intensive, are not required to identify which subclass belongs to which core class.
I estimate this to be easy to implement in SMOG and could be gated behind a parameter.
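
One way the assignment could work is to walk up the subclass chain until a core class with a known metaclass is reached. The sketch below is an assumption about the approach, not SMOG code: the class MetaclassAssigner, the two lookup maps, and the single-parent hierarchy are all hypothetical simplifications.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch; assumes each class has at most one direct superclass.
class MetaclassAssigner {

    /**
     * Walks from cls upward until a class with a registered metaclass is found.
     * The search starts at cls itself, so a core class also receives its own metaclass.
     */
    static Optional<String> metaclassFor(String cls,
                                         Map<String, String> superClassOf,
                                         Map<String, String> metaclassOfCore) {
        Set<String> visited = new HashSet<>(); // guards against cyclic hierarchies
        String current = cls;
        while (current != null && visited.add(current)) {
            String meta = metaclassOfCore.get(current);
            if (meta != null) {
                return Optional.of(meta);
            }
            current = superClassOf.get(current);
        }
        return Optional.empty();
    }

    /** Formats the extra triple in Turtle. */
    static String toTurtle(String cls, String metaclass) {
        return cls + " a " + metaclass + " .";
    }

    public static void main(String[] args) {
        Map<String, String> superClassOf =
            Map.of(":Elefant", ":Elephantidae", ":Elephantidae", ":Animal");
        Map<String, String> metaclassOfCore = Map.of(":Animal", ":AnimalClass");
        metaclassFor(":Elefant", superClassOf, metaclassOfCore)
            .ifPresent(meta -> System.out.println(toTurtle(":Elefant", meta)));
    }
}
```

Whether the core class itself should carry the metaclass is a design choice this issue leaves open; starting the walk at the direct superclass instead would exclude it.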

Describe alternatives you've considered

  • I will investigate whether a metaclass could also be entered as a relation in the tabular source, which would then need to be repeated for each value.
  • Alternatively, PySHACL could add an ontology mode, but this will probably not be implemented.

update Java to newest LTS version 17

Problem
Currently, SMOG uses Java 8 from March 2014, which lost Oracle Premier Support in March 2022 and lacks some key features such as records.

Describe the solution you'd like
Update to Java 17, which is the newest current LTS, and take advantage of new features like records.
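
To illustrate what records would offer, here is a small value type as it might be written after the upgrade. SheetCell and its fields are hypothetical and do not correspond to an existing SMOG class; the point is only that the accessors, equals, hashCode and toString come for free.

```java
/** Hypothetical value type for one spreadsheet cell; requires Java 16 or newer. */
record SheetCell(int row, int column, String value) {
    // Compact constructor: validates the components once, at construction time.
    SheetCell {
        if (row < 0 || column < 0) {
            throw new IllegalArgumentException("row and column must not be negative");
        }
    }
}

class RecordDemo {
    public static void main(String[] args) {
        SheetCell a = new SheetCell(0, 1, "IMAGE_0009");
        SheetCell b = new SheetCell(0, 1, "IMAGE_0009");
        // Records compare by component values, not by identity.
        System.out.println(a.equals(b));
        // Accessors are named after the components.
        System.out.println(a.value());
    }
}
```

On Java 8 the same type would need hand-written fields, a constructor, getters, equals and hashCode.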

ProjectGenerator export NoSuchMethodError

Running class ProjectGenerator with parameters export config.yml causes:

picocli.CommandLine$ExecutionException: Error while calling command (int de.imise.excel_api.ProjectGenerator.export(java.io.File)): java.lang.NoSuchMethodError: 'com.fasterxml.jackson.core.StreamReadConstraints com.fasterxml.jackson.dataformat.yaml.YAMLParser.streamReadConstraints()'
	at picocli.CommandLine.executeUserObject(CommandLine.java:2078)
	at picocli.CommandLine.access$1500(CommandLine.java:148)
	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
	at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2273)
	at picocli.CommandLine$RunLast.execute(CommandLine.java:2417)
	at picocli.CommandLine.execute(CommandLine.java:2170)
	at de.imise.excel_api.ProjectGenerator.main(ProjectGenerator.java:36)
Caused by: java.lang.NoSuchMethodError: 'com.fasterxml.jackson.core.StreamReadConstraints com.fasterxml.jackson.dataformat.yaml.YAMLParser.streamReadConstraints()'
	at com.fasterxml.jackson.dataformat.yaml.YAMLParser._parseNumericValue(YAMLParser.java:1109)
	at com.fasterxml.jackson.core.base.ParserBase.getNumberValue(ParserBase.java:599)
	at com.fasterxml.jackson.databind.deser.std.StdDeserializer._checkFloatToStringCoercion(StdDeserializer.java:1573)
	at com.fasterxml.jackson.databind.deser.std.StdDeserializer._parseString(StdDeserializer.java:1426)
	at com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:48)
	at com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:11)
	at com.fasterxml.jackson.databind.deser.impl.MethodProperty.deserializeAndSet(MethodProperty.java:129)
	at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:314)
	at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:177)
	at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)
	at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4697)
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3517)
	at de.imise.excel_api.owl_export.Config.get(Config.java:35)
	at de.imise.excel_api.ProjectGenerator.export(ProjectGenerator.java:111)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at picocli.CommandLine.executeUserObject(CommandLine.java:2066)
	... 8 more
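
A NoSuchMethodError like this usually means that classes compiled against one version of a library are running against another. Judging from the trace alone, a plausible but unconfirmed cause is that the Jackson YAML module and jackson-core on the classpath come from different releases. The diagnostic below is a sketch for checking that: JacksonVersionCheck and describe are hypothetical names, and the reported version depends on the JAR manifest providing an Implementation-Version entry.

```java
import java.security.CodeSource;

// Hypothetical diagnostic; run it with the same classpath as ProjectGenerator.
class JacksonVersionCheck {

    /** Reports the manifest version and JAR location a class was loaded from. */
    static String describe(String className) {
        try {
            Class<?> cls = Class.forName(className);
            Package pkg = cls.getPackage();
            String version = pkg == null ? null : pkg.getImplementationVersion();
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            String location = source == null
                ? "(bootstrap or unknown)"
                : String.valueOf(source.getLocation());
            return className + " version=" + version + " from " + location;
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " not loadable: " + e;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "com.fasterxml.jackson.core.JsonParser",
            "com.fasterxml.jackson.databind.ObjectMapper",
            "com.fasterxml.jackson.dataformat.yaml.YAMLParser"
        };
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

If the three lines report different versions, aligning the Jackson artifacts in pom.xml to one release would be the first thing to try.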

Add Command Line Interface

Is your feature request related to a problem? Please describe.
If I understand it correctly, SMOG currently does not have a command line interface and needs to be used as a dependency from within another Java project.
This tool could be useful to many research groups but the hurdle of writing another Java application could be significantly lowered using a command line interface.

Describe the solution you'd like
Add a Main class with a simple command line interface.
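
A dependency-free sketch of such an entry point follows. The class name SmogCliSketch and the generate subcommand are assumptions; export mirrors the subcommand named in the ProjectGenerator issue, and the actual work is only indicated by placeholder messages.

```java
import java.io.File;

// Hypothetical command line entry point; the real one may use a CLI library instead.
class SmogCliSketch {

    /** Parses "<command> <file>" and returns a process exit code. */
    static int run(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: smog (generate|export) <file>");
            return 2;
        }
        File input = new File(args[1]);
        switch (args[0]) {
            case "generate":
                // Placeholder: model generation from a spreadsheet would start here.
                System.out.println("would generate a model from " + input);
                return 0;
            case "export":
                // Placeholder: OWL/RDF export driven by a config file would start here.
                System.out.println("would export using " + input);
                return 0;
            default:
                System.err.println("unknown command: " + args[0]);
                return 2;
        }
    }

    public static void main(String[] args) {
        System.exit(run(args));
    }
}
```

Keeping the logic in run and only calling System.exit from main makes the argument handling testable without terminating the JVM.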

add data property restrictions

In addition to ref-a (annotations) and ref-r (restriction), add a way for spreadsheet editors to specify data property restrictions.

  • rename ref-r to ref-o for object property restriction, but keep ref-r for backwards compatibility
  • introduce ref-d for data property restrictions
  • test with an example document
  • check whether unit tests exist for the SMOG simple OWL generator and, if so, adapt them
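
The renaming with backwards compatibility could be handled by a small lookup like the one sketched here. It assumes that the marker appears at the start of a column header, which this issue does not state; ReferenceColumns, Kind and kindOf are hypothetical names.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical mapping from column markers to the kind of axiom they produce.
class ReferenceColumns {

    enum Kind { ANNOTATION, OBJECT_RESTRICTION, DATA_RESTRICTION }

    /** Classifies a column header by its marker; unknown headers yield an empty result. */
    static Optional<Kind> kindOf(String header) {
        String h = header.trim().toLowerCase(Locale.ROOT);
        if (h.startsWith("ref-a")) {
            return Optional.of(Kind.ANNOTATION);
        }
        // ref-r is the legacy spelling of ref-o and is kept for existing spreadsheets.
        if (h.startsWith("ref-o") || h.startsWith("ref-r")) {
            return Optional.of(Kind.OBJECT_RESTRICTION);
        }
        if (h.startsWith("ref-d")) {
            return Optional.of(Kind.DATA_RESTRICTION);
        }
        return Optional.empty();
    }
}
```

Treating ref-r as an alias in a single place keeps old documents working while new ones can use the more explicit ref-o.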

Sketch

[Image: Screenshot from 2024-04-11 14-10-31]

See also http://owlcs.github.io/owlapi/apidocs_5/org/semanticweb/owlapi/model/OWLDataRestriction.html.

The work is done in branch https://github.com/KonradHoeffner/SMOG/tree/pr-data-restriction.

Log4j2 could not find a logging implementation

Describe the bug
Running the tests with mvn test produces the following error:

Running de.imise.excel_api.model_generator.ModelGeneratorTest
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...

To Reproduce

  1. mvn test
