Comments (14)
We're asking ourselves the same question! cc @noamross @maelle, who are talking about how this would work as part of the onboarding requirements/checks at rOpenSci (see https://github.com/ropensci/onboarding).
Automated PRs across GH is an interesting but more provocative question; I know this is something @arfon has thought about wrt codemeta.json, and I'd be curious to hear his latest thinking.
In practice, we've found that it's often better if some more metadata can be added to the DESCRIPTION files than what most authors are currently doing (in particular, it's really nice for codemeta.json to have ORCID iDs, and though DESCRIPTION files now support that thanks in part to codemeta, few authors have adopted this so far).
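As an illustrative sketch (the name, email, and ORCID iD here are placeholders, not from the source), this is how an ORCID iD can be recorded in a DESCRIPTION file's Authors@R field, which codemetar can then carry over into codemeta.json:

```
Authors@R: person("Jane", "Doe",
                  email = "jane.doe@example.org",
                  role = c("aut", "cre"),
                  comment = c(ORCID = "0000-0002-1825-0097"))
```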
from codemetar.
I'd prefer to keep it in the one place where it started.
> Automated PRs across GH is an interesting but more provocative question; I know this is something @arfon has thought about wrt codemeta.json, and I'd be curious to hear his latest thinking.
Anything automated is considered spam by GitHub and you'd quickly be banned. Something that makes it super-simple for a human to open a pull request to someone else's repository is likely more acceptable.
It's the latter ;-)
related to #20
I now think that a bottom-up approach would not be successful, because it essentially means pushing "speculative complexity" towards projects (and into their repos). "Speculative", because one hopes that a repository will make use of the codemeta.json. "Complexity", because it adds an auto-generated file to the Git repo and one step to the build/release process.
I understood CodeMeta's main goal as "improv[ing] how [repositories] talk to each other" in order to close gaps in software "preservation, discovery, reuse, and attribution" on the "infrastructure" level. Thus, the IMHO strongest argument for a bottom-up approach would be prototyping the metadata aggregation (currently in codemeta.json), no? But with codemetar, is that happening on the "infrastructure" level? Currently, the infrastructure providers seem to be reducing complexity on their end (by reading-in a file, rather than aggregating themselves), but is that aligned with CodeMeta's goals?
If not, maybe a long-term winning strategy (and more elegant handling of the complexity) on their level would be to roll out a "2nd system" for repositories to integrate into their import procedures. Metaphorically speaking: The metadata camel needs to go through the eye of the repo-needle at some point, so it should be lubricated then and there ;-)
Good questions, with no simple answers, but maybe I can add some perspective.
Personally, I agree with you that getting major repositories on board, in more of a top-down approach, is most efficient in realizing systematic change. That was reflected from basically the start of codemeta, where most of the initial workshop participants represented major repositories: https://codemeta.github.io/workshop/
Second, it is worth noting that Zenodo already does something quite like this with its GitHub integration, and has for some time. That is, it parses information, not from DESCRIPTION files, but from GitHub metadata, and constructs a metadata record which it can provide in JSON-LD form. However, despite their central importance, repositories like Zenodo have very limited capacity for additional developer support, so I don't think they would ever take it upon themselves to implement a direct parser for R DESCRIPTION files and then similarly for all possible languages. Zenodo and other repositories that participated in the codemeta workshop are still interested in tackling the 'easier' case of parsing a standardized format like codemeta.json that can be shared across languages, but even adopting something like that is a big lift on top of their existing infrastructure and limited capacity, so we're still waiting for that to happen.
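To make that "easier case" concrete, here is a minimal sketch (an illustrative record, not Zenodo's actual code) of a repository ingesting an already-standardized codemeta.json; the field names come from the CodeMeta/schema.org vocabulary:

```python
# Sketch: a repository consuming a standardized codemeta.json record,
# rather than parsing per-language metadata (DESCRIPTION, setup.py, ...).
# The record below is a made-up minimal example.
import json

record = json.loads("""
{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "name": "examplepkg",
  "codeRepository": "https://github.com/example/examplepkg",
  "author": [{"@type": "Person", "givenName": "Jane", "familyName": "Doe"}]
}
""")

# The same few lines work for software written in any language,
# which is the whole point of a shared metadata format.
title = record["name"]
authors = [f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])]
print(title, authors)
```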
There's also a flip side in which not all of this is easily automated. codemetar now gives "opinions" back to the user to help encourage best practices for where and how authors can add additional metadata, but without such interaction with the user, such metadata will be more limited and not always perfectly correct. The R package model is also already somewhat richer in metadata than many other languages. In some (possibly many) cases, what might be much more useful is a simple generic tool for authoring codemeta.json manually, agnostic of the computer language, rather than 'automatically' populating it. (@arfon has some nice ruby-based CLI tools for this; I'm meaning to add a web-based UI when I get a chance as well.)
So in the long run, I agree that we are likely to get mostly only the extreme tail of users to adopt a 'codemeta.json' file in a purely bottom-up manner, but I also think there's a key role for organizations like ours to play in coordinating / facilitating the effort of the repositories on one hand and field-testing the approaches on the other.
Thanks for continuing the discussion :-)
> […] despite their central importance, repositories like Zenodo have very limited capacity for additional developer support, so I don't think they would ever take it upon themselves to implement a direct parser for R DESCRIPTION files and then similarly for all possible languages
Agreed, they should never have to. So, the next/real question may be how easy or difficult it is for them to integrate (Caution: pseudo-code!):
```
# pseudo-code sketch of a per-language step in a repository's import pipeline
if [ "$import" = "R package" ]; then
    Rscript -e "codemetar::write_codemeta()"
elif [ "$import" = "Python module" ]; then
    pip show -v somepackage | codemetapy > codemeta.json
fi
# ... other languages ...
metadata=$(cat codemeta.json)
```
into their import pipelines, isn't it?
About the "opinions": I know (#174) ;-) There are interesting use-cases for that, both offline and during the repo-import. I'm not sure where to write about them, but here is not the best place, IMHO.
> So, the next/real question may be how easy or difficult it is for them to integrate ...
Ah, right, I see what you mean now. Still, running something like R on incoming packages at that scale is a whole 'nother ball game from parsing some JSON data...
In terms of complexity of the necessary pipeline (CI, VMs, etc.), or in terms of computational load on the repository providers' servers?
I imagine it as a step before reading a codemeta.json, as before; just one where the file is self-generated intermediately, instead of already being part of the ingested set of files.
Sorry, I was really just speculating above, which isn't that helpful. Really, this is a discussion we should have with the individual providers.
zenodo/zenodo#1504 is thinking about this as well :-)
Should we close this issue, which is more of a general discussion, and take the convo to https://discuss.ropensci.org/?