marda-alliance / metadata_extractors_registry
Archive. See Datatractor Yard, below:
Home Page: https://github.com/datatractor/yard
License: MIT License
We should deploy the API with a /v<major> prefix (e.g., /v0).
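A minimal sketch of how such a prefix could be derived from a version string. The helper names `api_prefix` and `versioned_path` are hypothetical, and the version numbers are placeholders rather than the registry's actual versions:

```python
def api_prefix(version: str) -> str:
    """Return the URL prefix for a version string, e.g. '0.3.1' -> '/v0'."""
    # Accept an optional leading "v" and keep only the major component.
    major = version.lstrip("v").split(".")[0]
    if not major.isdigit():
        raise ValueError(f"Cannot parse major version from {version!r}")
    return f"/v{major}"


def versioned_path(version: str, path: str) -> str:
    """Join the version prefix and an endpoint path with a single slash."""
    return api_prefix(version) + "/" + path.lstrip("/")


if __name__ == "__main__":
    # Placeholder version; the endpoint path is the one used elsewhere on this page.
    print(versioned_path("0.3.1", "/registry/filetypes/biologic-mpr"))
```

Keeping only the major version in the URL means minor and patch releases do not change client-facing paths.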
Committing an example file for everything under the sun into this repo directly is probably not the best way forward. We should have a mechanism for providing persistent links to example files (e.g., archived files with DOIs) that the registry can download and use as test data. Probably this ends up being a registry of example files too, in that case...
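One way this could look, as a rough sketch only: each example file is described by a record holding a persistent link and a checksum, and the registry verifies whatever it downloads before using it as test data. The `ExampleFile` fields, the `verify_example` helper, and the URL are all assumptions made for illustration; the download step itself is left out.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ExampleFile:
    """Hypothetical record pointing at an archived example file."""
    filetype_id: str
    url: str      # persistent link, e.g. a DOI-backed archive URL
    sha256: str   # checksum recorded when the entry was registered


def verify_example(record: ExampleFile, content: bytes) -> bool:
    """Check downloaded bytes against the checksum stored in the record."""
    return hashlib.sha256(content).hexdigest() == record.sha256


if __name__ == "__main__":
    payload = b"placeholder file contents"  # stands in for a downloaded file
    record = ExampleFile(
        filetype_id="biologic-mpr",
        url="https://example.org/placeholder-archive/example.mpr",  # not a real link
        sha256=hashlib.sha256(payload).hexdigest(),
    )
    print(verify_example(record, payload))
    print(verify_example(record, b"corrupted"))
```

Storing a checksum next to the link lets the test suite detect an archived file that has changed or failed to download completely.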
Now that we have a few more entries, we can see that they are sorted in order of addition; alphabetical order probably makes more sense.
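A small sketch of alphabetical ordering, assuming entries are dictionaries with an `id` key (an assumption about the data model; the ids other than `biologic-mpr` are invented):

```python
def sort_entries(entries: list[dict]) -> list[dict]:
    """Return entries sorted alphabetically by id, ignoring case."""
    return sorted(entries, key=lambda entry: entry["id"].casefold())


if __name__ == "__main__":
    # Listed in a made-up order of addition.
    entries = [
        {"id": "zeta-format"},
        {"id": "biologic-mpr"},
        {"id": "Alpha-format"},
    ]
    for entry in sort_entries(entries):
        print(entry["id"])
```

Using `casefold()` keeps ids that start with a capital letter from sorting ahead of all lowercase ones.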
Deployed registry version should match e.g., release-<commit>
but somehow magically works in the HTML app.
Time to add proper testing!
Potential idea as discussed: save .marda.yaml as a file in a GitHub repo that outlines the entry in this registry, then simply submit the link to the repo here.
Repo can then be watched by the CI of the registry and entries updated.
Can also make a form/UI for creating the initial yaml file.
This issue can be used to vaguely track specific filetypes and extractors we want in the registry:
Intermittent build problems, e.g., "remote builder app unavailable", seem to be related to Docker layer caching. Destroying the "builder" (not the app) with fly destroy <builder name> seems to make the next build work.
This could either be additional data added to the single entry file type endpoint, e.g., /registry/filetypes/biologic-mpr also returns

    "relationships": {
        "extractors": [
            "yadg"
        ]
    }

or it could be a search endpoint for registry/extractors?supported_filetypes=biologic-mpr.
I think I prefer the former to start with.
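A rough sketch of the first option, assuming extractor entries carry a `supported_filetypes` list (the field name is borrowed from the query parameter above; the in-memory dictionaries and function names are hypothetical stand-ins for the real data model):

```python
import json


def extractors_for(filetype_id: str, extractors: dict[str, dict]) -> list[str]:
    """Return the sorted ids of extractors that list the given file type."""
    return sorted(
        extractor_id
        for extractor_id, entry in extractors.items()
        if filetype_id in entry.get("supported_filetypes", [])
    )


def filetype_response(filetype_id: str, filetypes: dict[str, dict],
                      extractors: dict[str, dict]) -> dict:
    """Build a single-entry response with an added relationships block."""
    response = dict(filetypes[filetype_id])  # copy so stored data is untouched
    response["relationships"] = {
        "extractors": extractors_for(filetype_id, extractors)
    }
    return response


if __name__ == "__main__":
    filetypes = {"biologic-mpr": {"id": "biologic-mpr"}}
    extractors = {"yadg": {"supported_filetypes": ["biologic-mpr"]}}
    print(json.dumps(filetype_response("biologic-mpr", filetypes, extractors), indent=2))
```

The same `extractors_for` lookup could later back the search endpoint, so starting with the embedded relationships does not rule out the second option.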
The most important thing in our "schema" was perhaps the link to an example file (as simply the name of the instrument and extension oftentimes doesn't describe much). Perhaps one could also consider adding it here.
To make this possible, I once started a "chemical files registry" here, where I also use a yml schema similar to yours: https://github.com/kjappelbaum/chemical-files-registry/blob/master/fileDescriptions/analyticalMethods/thermogravimetricAnalysis/ta-txt/description.yml.
I didn't have any time to work on this, but the idea was to collect example files and link to them (and the filetype schema) from the parser registry.
Originally posted by @kjappelbaum in marda-alliance/metadata_extractors_schema#2 (comment)
This will enable free-text search over entries (if we use e.g., Mongo). Can also just use sqlite or something if that ends up being easier.
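As a sketch of the sqlite route, the snippet below uses a plain `LIKE` substring match from Python's standard `sqlite3` module. This is a simpler stand-in for a true full-text index, not a replacement for one. The table layout, function names, and descriptions are assumptions for illustration:

```python
import sqlite3


def build_index(entries: list[tuple[str, str]]) -> sqlite3.Connection:
    """Load (id, description) pairs into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (id TEXT PRIMARY KEY, description TEXT)")
    conn.executemany("INSERT INTO entries VALUES (?, ?)", entries)
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, term: str) -> list[str]:
    """Return ids whose id or description contains the term."""
    pattern = f"%{term}%"  # note: % and _ in the term act as wildcards
    rows = conn.execute(
        "SELECT id FROM entries WHERE id LIKE ? OR description LIKE ? ORDER BY id",
        (pattern, pattern),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_index([
        ("biologic-mpr", "placeholder description of a binary file format"),
        ("yadg", "placeholder description of an extractor"),
    ])
    print(search(conn, "extractor"))
```

SQLite's `LIKE` is case-insensitive for ASCII characters by default, which is usually what a simple search box wants.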
See Materials-Consortia/optimade-python-tools#2027
and as such, the registry is down. Good time to migrate!
data folder and make sure they correspond to registered file types