metafacture / metafacture-core
Core package of the Metafacture tool suite for metadata processing.
Home Page: https://metafacture.org
License: Apache License 2.0
onResetStream() in AbstractBatcher should be made final. However, StreamBatchMerger needs to override this method.
There are several possibilities for increasing the performance of WildcardTrie which is used by the Metamorph class.
Some modules are built as wrappers around other modules. Currently, this is clumsy to implement because all method calls have to be relayed manually from the wrapper module to the inner module. Additionally, using the Default*Pipe implementations becomes difficult because they automatically forward resetStream() and closeStream() events.
When building a distribution (assembly:single), the Metamorph schema is currently contained only in the jar. It should additionally be put in a folder so that it can easily be referenced by XML editors.
One of the classes should be removed
MetafactureException is currently an unchecked exception. However, this makes it necessary to catch RuntimeException (which is bad style according to checkstyle).
This raises two questions:
The metafacture package should contain a constant named Metafacture.VERSION or similar.
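One possible shape for such a constant is sketched below. It assumes the version is published as Implementation-Version in the jar manifest, which this issue does not state; the fallback value "unknown" and the helper readVersion() are hypothetical and not part of Metafacture.

```java
/**
 * Sketch of a version holder. Assumes the build writes
 * Implementation-Version into the jar manifest (an assumption).
 */
public final class Metafacture {

    /** Version from the jar manifest, or "unknown" if none is available. */
    public static final String VERSION = readVersion();

    private Metafacture() {
        // Constants holder, no instances.
    }

    private static String readVersion() {
        final Package pkg = Metafacture.class.getPackage();
        // getImplementationVersion() returns null when the class was not
        // loaded from a jar that carries the manifest attribute.
        final String version = pkg == null ? null : pkg.getImplementationVersion();
        return version == null ? "unknown" : version;
    }
}
```

When the class is run from a plain class directory rather than a packaged jar, this sketch reports "unknown".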
Currently, the nesting of collectors has no effect on the behaviour of flushWith and reset. However, from the user's perspective, morph scripts might be easier to understand if collectors inherited the flushWith and reset settings from their parent. What do you think, @mgeipel?
Use of Validator
The Metamorph XSD schema should be more modular so that plugins can extend the schema more easily.
substring keeps the original char[] which leads to high memory usage in sorting.
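If this refers to java.lang.String.substring, the behaviour matches JVMs older than Java 7 update 6, where the substring shared the character array of its parent string. A common workaround on those JVMs is to copy the slice, as in this minimal sketch. The class and method names are hypothetical, and whether this fits the sorting code in question is an assumption.

```java
/** Sketch: obtain a substring that does not pin the parent's char[]. */
public final class SubstringCopy {

    private SubstringCopy() {
        // Utility class, no instances.
    }

    /** Returns source[begin, end) as a string detached from the parent. */
    public static String detachedSubstring(final String source,
            final int begin, final int end) {
        // On JVMs where substring() shares the parent's array, the
        // String(String) constructor trims the result to a right-sized
        // array, so the large parent can be garbage collected. On newer
        // JVMs substring() already copies and this call is merely redundant.
        return new String(source.substring(begin, end));
    }
}
```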
Except for some test cases the cg-xml format is not used much. Formeta offers a much more concise syntax and should be used instead.
The parser classes should not be part of the FormetaDecoder, to allow for reusability and to simplify the interface of the decoder module (it does not need to support partial records any more).
This is similar to the function tag.
Example:
<maps>
<javamap name="myJavaMap" class="org.culturegraph.MyMap" attributeA="xy"/>
</maps>
Enum properties in modules should be accessible from Flux
Most don't!
with no arguments: stdout
with incorrect number of arguments: stderr
Variable replacements in strings should be implemented using a proper parser instead of regexes, so that support for escape sequences can be added.
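To illustrate what a parser-based replacement could look like, here is a minimal single-pass scanner. The ${name} placeholder syntax, the backslash escape rule, and the class name VariableResolver are assumptions made for this sketch; they are not taken from the Metafacture code base.

```java
import java.util.Map;

/** Sketch: resolve ${name} placeholders with backslash escapes. */
public final class VariableResolver {

    private VariableResolver() {
        // Utility class, no instances.
    }

    public static String resolve(final String template,
            final Map<String, String> vars) {
        final StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < template.length()) {
            final char c = template.charAt(i);
            if (c == '\\' && i + 1 < template.length()) {
                // Escape sequence: emit the next character literally.
                out.append(template.charAt(i + 1));
                i += 2;
            } else if (c == '$' && i + 1 < template.length()
                    && template.charAt(i + 1) == '{') {
                final int end = template.indexOf('}', i + 2);
                if (end < 0) {
                    throw new IllegalArgumentException(
                            "Unterminated variable at index " + i);
                }
                final String name = template.substring(i + 2, end);
                final String value = vars.get(name);
                if (value == null) {
                    throw new IllegalArgumentException(
                            "Undefined variable: " + name);
                }
                out.append(value);
                i = end + 1;
            } else {
                // Ordinary character, including a lone trailing backslash.
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

Because the scanner consumes an escaped character before it looks for a placeholder, an input such as \${name} is emitted as the literal text ${name}, which is hard to express reliably with a single regular expression.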
At the moment the semantics of the LifeCycle.closeStream() method are not consistently implemented. Some modules can be reused after calling closeStream() while others cannot. However, the contract of the closeStream() method clearly states that a module must not be used after it has been closed.
Many test cases rely on the ability to reuse certain modules even after closeStream() has been called. This violates the contract and should be fixed.
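One way to make the contract enforceable is for a module to reject every call that arrives after closeStream(). The sketch below uses a hypothetical stand-alone class, ClosableModule, rather than the real Metafacture interfaces, and counting records is only a placeholder for actual processing.

```java
/** Sketch: a module that fails fast when used after closeStream(). */
public final class ClosableModule {

    private boolean closed;
    private int recordCount;

    /** Placeholder for real work: counts the records it receives. */
    public void process(final String record) {
        ensureOpen();
        recordCount++;
    }

    /** Resets internal state; only legal while the module is open. */
    public void resetStream() {
        ensureOpen();
        recordCount = 0;
    }

    /** Marks the module as closed; later calls to process() will throw. */
    public void closeStream() {
        closed = true;
    }

    public int getRecordCount() {
        return recordCount;
    }

    private void ensureOpen() {
        if (closed) {
            throw new IllegalStateException(
                    "Module used after closeStream()");
        }
    }
}
```

With a guard like this, test cases that reuse a closed module fail immediately instead of silently depending on behaviour the contract does not guarantee.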
ListMap.java:[90,19] error: name clash: put(K#1,V#1)
Hide large stacktraces. Maybe add --verbose switch for full stacktraces.
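A rough sketch of how such a switch might be honoured when reporting an error follows. The class name ErrorReporter, the describe() method, and the one-line summary format are hypothetical; how the --verbose flag is parsed is left out.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

/** Sketch: short error summary by default, full stack trace if verbose. */
public final class ErrorReporter {

    private ErrorReporter() {
        // Utility class, no instances.
    }

    public static String describe(final Throwable error,
            final boolean verbose) {
        if (!verbose) {
            // Compact form: exception type and message only.
            return error.getClass().getSimpleName() + ": "
                    + error.getMessage();
        }
        // Verbose form: render the complete stack trace into a string.
        final StringWriter buffer = new StringWriter();
        error.printStackTrace(new PrintWriter(buffer, true));
        return buffer.toString();
    }
}
```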
Is it intentional that choose clears its value after emitting it, @mgeipel? In cases where choose is used within an entity, it might make sense to emit the result of the choose element multiple times.
The FileObjectWriter needs to set the file encoding.
@mgeipel RDF Schema doesn't contain the attribute rdf:reference (see http://www.w3schools.com/rdf/rdf_reference.asp). It must be replaced by rdf:resource.
ex: <ore:Aggregation rdf:about="http://www.dnb.de/018238327">
<edm:aggregatedCHO rdf:reference="http://d-nb.info/018238327" />
See https://gist.github.com/cboehme/5204711
and https://gist.github.com/neothemachine/4060735
and metafacture-mediawiki
org.culturegraph.mf.stream.pipe.StreamBufferTest should be rewritten to use StreamValidator. @cboehme what's your opinion?
this is a test
FormatException is used as a base class for WellformednessException and ValidationException, but it is also used for flagging errors while parsing Flux files and Formeta data. This makes it difficult to distinguish between parser errors and failed assertions during unit testing (see FormetaDecoderTest.java).
another test
Returns only first match. In analogy to it should return all matches.
The Formatter classes defined in FormetaEncoder can also be handy outside of a module or in other modules. They should probably be moved to the utility package.