Comments (14)
We're asking ourselves the same question! cc @noamross @maelle, who are talking about how this would work as part of the onboarding requirements/checks at rOpenSci (see https://github.com/ropensci/onboarding).
Automated PRs across GH is an interesting but more provocative question; I know this is something @arfon has thought about wrt codemeta.json, and I'd be curious to hear his latest thinking.
In practice, we've found that it's often better if some more metadata can be added to the DESCRIPTION files than what most authors are currently doing (in particular, it's really nice for codemeta.json to have ORCID iDs, and though DESCRIPTION files now support that thanks in part to codemeta, few authors have adopted this so far).
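As an illustrative sketch (the name, email, and ORCID iD here are placeholders, not from the source), this is how an ORCID iD can be recorded in a DESCRIPTION file's Authors@R field, which codemetar can then carry over into codemeta.json:

```
Authors@R: person("Jane", "Doe",
                  email = "jane.doe@example.org",
                  role = c("aut", "cre"),
                  comment = c(ORCID = "0000-0002-1825-0097"))
```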
from codemetar.
I'd prefer to keep it in the one place where it started.
> Automated PRs across GH is an interesting but more provocative question; I know this is something @arfon has thought about wrt codemeta.json, and I'd be curious to hear his latest thinking.
Anything automated is considered spam by GitHub and you'd quickly be banned. Something that makes it super-simple for a human to open a pull request to someone else's repository is likely more acceptable.
It's the latter ;-)
related to #20
I now think that a bottom-up approach would not be successful, because it essentially means pushing "speculative complexity" towards projects (and into their repos). "Speculative", because one hopes that a repository will make use of the codemeta.json. "Complexity", because it adds an auto-generated file to the Git repo and one step to the build/release process.
I understood CodeMeta's main goal as "improv[ing] how [repositories] talk to each other" in order to close gaps in software "preservation, discovery, reuse, and attribution" on the "infrastructure" level. Thus, the IMHO strongest argument for a bottom-up approach would be prototyping the metadata aggregation (currently in codemeta.json), no? But with codemetar, is that happening on the "infrastructure" level? Currently, the infrastructure providers seem to be reducing complexity on their end (by reading-in a file, rather than aggregating themselves), but is that aligned with CodeMeta's goals?
If not, maybe a long-term winning strategy (and more elegant handling of the complexity) on their level would be to roll out a "2nd system" for repositories to integrate into their import procedures. Metaphorically speaking: The metadata camel needs to go through the eye of the repo-needle at some point, so it should be lubricated then and there ;-)
Good questions, with no simple answers, but maybe I can add some perspective.
Personally, I agree with you that getting major repositories on board, in more of a top-down approach, is most efficient in realizing systematic change. That was reflected from basically the start of codemeta, where most of the initial workshop participants represented major repositories: https://codemeta.github.io/workshop/
Second, it is worth noting that Zenodo already does something quite like this with its GitHub integration, and has for some time. That is, it parses information, not from DESCRIPTION files, but from GitHub metadata, and constructs a metadata record which it can provide in JSON-LD form. However, despite their central importance, repositories like Zenodo have very limited capacity for additional developer support, so I don't think they would ever take it upon themselves to implement a direct parser for R DESCRIPTION files and then similarly for all possible languages. Zenodo and other repositories that participated in the codemeta workshop are still interested in tackling the 'easier' case of parsing a standardized format like codemeta.json that can be shared across languages, but even adopting something like that is a big lift on top of their existing infrastructure and limited capacity, so we're still waiting for that to happen.
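To make that "easier case" concrete, here is a minimal sketch (an illustrative record, not Zenodo's actual code) of a repository ingesting an already-standardized codemeta.json; the field names come from the CodeMeta/schema.org vocabulary:

```python
# Sketch: a repository consuming a standardized codemeta.json record,
# rather than parsing per-language metadata (DESCRIPTION, setup.py, ...).
# The record below is a made-up minimal example.
import json

record = json.loads("""
{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "name": "examplepkg",
  "codeRepository": "https://github.com/example/examplepkg",
  "author": [{"@type": "Person", "givenName": "Jane", "familyName": "Doe"}]
}
""")

# The same few lines work for software written in any language,
# which is the whole point of a shared metadata format.
title = record["name"]
authors = [f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])]
print(title, authors)
```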
There's also a flip side in which not all of this is easily automated. codemetar now gives "opinions" back to the user to help encourage best practices for where and how authors can add additional metadata, but without such interaction with the user, such metadata will be more limited and not always perfectly correct. The R package model is also already somewhat richer in metadata than many other languages. In some (possibly many) cases, what might be much more useful is a simple generic tool for authoring codemeta.json manually, agnostic of the computer language, rather than 'automatically' populating it. (@arfon has some nice ruby-based CLI tools for this; I'm meaning to add a web-based UI when I get a chance as well.)
So in the long run, I agree that we are likely to get mostly only the extreme tail of users to adopt a 'codemeta.json' file in a purely bottom-up manner, but I also think there's a key role for organizations like ours to play in coordinating / facilitating the effort of the repositories on one hand and field-testing the approaches on the other.
Thanks for continuing the discussion :-)
> […] despite their central importance, repositories like Zenodo have very limited capacity for additional developer support, so I don't think they would ever take it upon themselves to implement a direct parser for R DESCRIPTION files and then similarly for all possible languages
Agreed, they should never have to. So, the next/real question may be how easy or difficult it is for them to integrate (Caution: pseudo-code!):
```
# pseudo-code sketch of a per-language step in a repository's import pipeline
if [ "$import" = "R package" ]; then
    Rscript -e "codemetar::write_codemeta()"
elif [ "$import" = "Python module" ]; then
    pip show -v somepackage | codemetapy > codemeta.json
fi
# ... other languages ...
metadata=$(cat codemeta.json)
```
into their import pipelines, isn't it?
About the "opinions": I know (#174) ;-) There are interesting use-cases for that, both offline and during the repo-import. I'm not sure where to write about them, but here is not the best place, IMHO.
> So, the next/real question may be how easy or difficult it is for them to integrate ...
Ah, right, I see what you mean now. Still, running something like R on incoming packages at that scale is a whole 'nother ball game from parsing some JSON data...
In terms of complexity of the necessary pipeline (CI, VMs, etc.), or in terms of computational load on the repository providers' servers?
I imagine it as a step before reading a codemeta.json, as before; just one where the file is self-generated intermediately, instead of already being part of the ingested set of files.
Sorry, I was really just speculating above, which isn't that helpful. Really, this is a discussion we should have with the individual providers.
zenodo/zenodo#1504 is thinking about this as well :-)
Should we close this issue, which is more of a general discussion, and take the convo to https://discuss.ropensci.org/?