ropensci / codemetar
an R package for generating and working with codemeta
Home Page: https://docs.ropensci.org/codemetar
Left over from a vain attempt to avoid a read_csv dependency
Codemeta information should be embedded in README output
Add a function that could be called invisibly (i.e. echo=FALSE, results="asis") in the README.Rmd which would embed the codemeta information inside a <script> HTML element. This way the information could be discovered by Google etc. when the package is built on CRAN and/or a pkgdown website.
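A minimal sketch of such a function (embed_codemeta is a hypothetical name, not part of codemetar): it prints the codemeta.json contents wrapped in a script tag, suitable for a README.Rmd chunk with echo=FALSE, results="asis".

```r
# Hypothetical helper: emit codemeta.json inside a JSON-LD <script> tag so
# crawlers can discover it in the rendered README / pkgdown page.
embed_codemeta <- function(path = "codemeta.json") {
  json <- paste(readLines(path), collapse = "\n")
  cat('<script type="application/ld+json">\n', json, '\n</script>\n', sep = "")
}
```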
SPDX defines a widely-used set of abbreviations for open source licenses: https://spdx.org/licenses/
codemetar should be able to recognize any license recognized by CRAN and map it to the SPDX term. (Also deal with, or at least strip, the + file LICENSE template when parsing license strings.)
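A minimal sketch of the idea, with an assumed and deliberately incomplete mapping table (codemetar's real lookup would need to cover every license CRAN accepts): strip the "+ file LICENSE" suffix, then look up the SPDX identifier.

```r
# Sketch only: the map below is an assumed excerpt, not the full CRAN list.
spdx_license <- function(license) {
  base <- trimws(sub("\\+\\s*file LICEN[SC]E\\s*$", "", license))
  map <- c("MIT" = "MIT",
           "GPL-2" = "GPL-2.0",
           "GPL-3" = "GPL-3.0",
           "Apache License 2.0" = "Apache-2.0")
  unname(map[base])
}

spdx_license("MIT + file LICENSE")  # "MIT"
```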
R-core has recently (with our urging) added an allowable MARC code of rev to persons in the Authors@R field of DESCRIPTION, indicating individuals or organizations that reviewed the package. It would be great if codemetar could recognize this and put information about these persons in the appropriate JSON-LD fields. (My understanding is that this would be a reviewedBy field.)
Note that there's not a standard way to link to the reviews themselves, though the developing convention for rOpenSci is to include the URL in the comment field of the person() call, e.g.,
Authors@R: c(person("Sam", "Albers", email = "[email protected]", role = c("aut", "cre")),
person("David", "Hutchinson", email = "[email protected]", role = "ctb"),
person("Province of British Columbia", role = "cph"),
person("Luke", "Winslow", role = "rev",
comment = "Reviewed for rOpenSci, see <https://github.com/ropensci/onboarding/issues/118>"),
person("Laura", "DeCicco", role = "rev",
comment = "Reviewed for rOpenSci, see <https://github.com/ropensci/onboarding/issues/118>")
)
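A sketch of the rev-to-reviewedBy mapping discussed above (the reviewedBy field name and the node layout are assumptions, not codemetar's actual output): pull the persons with role "rev" out of a parsed Authors@R vector.

```r
# Filter a person vector down to reviewers and build indicative JSON-LD nodes.
reviewed_by <- function(authors) {
  revs <- Filter(function(p) "rev" %in% p$role, unclass(authors))
  lapply(revs, function(p)
    list("@type" = "Person", givenName = p$given, familyName = p$family))
}

authors <- c(person("Sam", "Albers", role = c("aut", "cre")),
             person("Luke", "Winslow", role = "rev"))
reviewed_by(authors)
```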
In the package-level documentation, the role "rev" doesn't seem to be translated: https://github.com/ropensci/codemetar/blob/master/man/codemetar-package.Rd#L51
I'm pretty sure it's not a problem in this package, but where does it come from? I have only looked quickly at roxygen2 and couldn't see anything amiss.
Most of my repositories have a master branch (even with the current release, or ahead but working) and a devel branch (where I break stuff and then merge over to master).
In the codemeta.json file, though, it seems that the releaseNotes and readme fields point to the devel branch and not to master.
I've tried to figure out if I've set something in my git configuration somewhere, or how this happens, but am not finding anything.
Example from https://github.com/ropensci/bomrang
"releaseNotes": "https://github.com/ropensci/bomrang/blob/devel/NEWS.md",
"readme": "https://github.com/ropensci/bomrang/blob/devel/README.md",
It should instead be:
"releaseNotes": "https://github.com/ropensci/bomrang/NEWS.md",
"readme": "https://github.com/ropensci/bomrang/README.md",
Is this something I need to correct in my settings somewhere?
thanks
Sorry if I missed this in the docs.
How could one ensure that codemeta.json is updated often enough?
Could one hope to add a check to the devtools::release checks? E.g. if codemeta.json exists, the check would be to look at when it was last updated, or just to ask the developer whether they have updated it.
Could the first use of codemetar::write_codemeta create a pre-commit hook like the one that usethis has for README.Rmd vs README.md (comparing the codemeta.json update time to those of the files it uses as sources of information)?
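A sketch of the staleness check behind such a hook (the source-file list here is an assumption): codemeta.json is considered stale if any of its source files has a newer modification time.

```r
# TRUE if codemeta.json is missing or older than any of its source files.
codemeta_is_stale <- function(pkg = ".") {
  cm <- file.path(pkg, "codemeta.json")
  if (!file.exists(cm)) return(TRUE)
  sources <- file.path(pkg, c("DESCRIPTION", "README.md"))
  sources <- sources[file.exists(sources)]
  any(file.mtime(sources) > file.mtime(cm))
}
```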
See https://github.com/Crossref/open-funder-registry
Function to support adding funder information, including looking up funder ids from the open funder registry (RDF file).
Also related to force11/force11-sciwg#36, #3 and #20.
To practice shell programming, I created a gist to clone an R package repo, generate a codemeta.json, and prepare a pull/merge request. (EDIT: deleted for the reasons given below.)
I was wondering whether that kind of approach would be OK? Productive procrastination, but also kind of cold-calling.
Or is the consensus rather that codemeta.json generation should happen within the workflows that people already use, and as automatically as possible?
@mbjones @gothub For some reason, the codemeta repos don't show up for me in AppVeyor. No idea why; I see AppVeyor is already enabled for the codemeta organization generally (maybe I just have too many repos for AppVeyor to track). Can you take a look and flip the switch to turn AppVeyor on? Thanks!
Also rebuild / review vignettes, Readme.
I couldn't find its source, sorry.
"it's" in "without any change to it’s information content."
"becuause" in "becuase all nodes"
I encountered an error when I first ran install() on the cloned package. The dependency package V8 failed to install. However, the error message was helpful in specifying the version I needed to install with homebrew.
Yup, credit to Jeroen and the V8 package there.
Running build_vignettes() at first failed, with Error: processing vignette 'codemeta-parsing.Rmd' failed with diagnostics: could not find function "as_tibble". Adding library(tibble) to that vignette’s library statements should solve that problem, as building the vignettes after loading that package succeeded. This also caused devtools::check() to fail.
Good catch, fixed.
Going through the package’s documentation alongside the ROpenSci Packaging Guide’s Documentation section, I made the following observations:
There’s no top-level documentation. The package should use a codemetar.R file with package-level help, so that ?codemetar leads to something (e.g. https://github.com/tidyverse/dplyr/blob/master/R/dplyr.r).
Added now, thanks! See #25
Help is present for a function add_context() which isn't exported, whereas its sister function drop_context() is exported. If one is exported, the other probably should be, and vice versa. If both are exported, they should have examples added to their documentation.
Both are now exported and have examples
Help is present for functions crosswalk() and crosswalk_transform(), the latter of which isn’t exported. The former is an exported function, but isn’t well-documented — the “Description” field is just “crosswalk”, and doesn’t sufficiently explain what the function does. Maybe it’s not supposed to be exported, but if it is, its purpose should be explained more fully.
The write_codemeta() function should have a fuller ”Description” statement, but is otherwise sufficient. Specifically, it should explain how its usage differs from create_codemeta().
Added to the description. (#26)
The vignettes and README.md file are generally rich, and a great source for expanding a user’s understanding of the package’s purpose and potential use cases. However, for entirely novice users, a reference to some introductory material might be helpful (per the package’s previous review).
Indeed! Substantial background material has been added along the lines advised in Anna's review.
There is no CONTRIBUTING file, and the README.md doesn't contain community guidelines.
Added.
However, the DESCRIPTION file specifies the requisite fields.
The package’s core functionality successfully executed when I tested it on a few different packages (i.e. write_codemeta() wrote the appropriate JSON-LD file).
👍
The paper.md “References” section is currently empty.
The only refs are URLs already in the description, so I've just dropped that section; though would appreciate more clarity on that from anyone more familiar with JOSS.
The file R/add_metadata.R seems to be a vestige of development, and contains a bit of commented-out code. If it’s not needed, it should probably be deleted.
It's more of a reminder to me for a future feature to create codemeta json-ld file from scratch, rather than extracting it from R package locations. Would potentially be useful to document other types of software.
I’m not sure what the purpose of R/sysdata.rda file is (and from a few brief searches of the code, could not find a reference to it). If it serves none, it should probably be removed.
Removed (was just a cached copy of the crosswalk, but decided the online update was best). (#29)
The exported functions write_codemeta() and create_codemeta() both follow a nice verb-subject naming scheme, and also reference the package name. It seems like the function codemeta_validate() could be renamed to validate_codemeta() to be consistent with these, though that is just personal preference.
Yeah, I was trying to follow Hadley on this (e.g. see xml2). The standard advice is that function names go from general to specific (i.e. start with the package namespace), which is best for tab completion. The read / write functions are usually done backwards just to match R's other read/write functions. So I guess create_codemeta is backwards.
The crosswalk() function looks like it requires internet access. This should probably be noted somewhere.
Added to the function's documentation.
Create a simple function to release a package to DataONE. Could include the following features:
- git tag if the repo is a git repository?

codemeta includes fields that don't exist in DESCRIPTION (e.g. author ORCIDs). Currently the interface just writes out a codemeta.json file directly, rather than providing an intermediate R object and utility functions for modifying (or extracting) metadata from it.
Ideally create_codemeta() would return an S3 object that could be modified with appropriate helper functions for adding these additional fields. A clever implementation could include the ability to infer some of these fields from sources other than the DESCRIPTION file (e.g. one could imagine querying GitHub for a list of contributors, and helping add these contributors to the codemeta.json (and optionally, to the DESCRIPTION?) with appropriate flags, e.g. some GitHub contributors might be mustBeCited = FALSE).
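A sketch of the proposed S3 workflow (add_orcid is a hypothetical helper and the author-node layout is only indicative): mutate the intermediate object with helpers, then serialize it once at the end.

```r
# Hypothetical intermediate object plus a helper that appends an author node
# carrying an ORCID @id; none of this is codemetar's actual API.
cm <- structure(list(name = "mypackage"), class = "codemeta")

add_orcid <- function(cm, given, family, orcid) {
  cm$author <- c(cm$author,
                 list(list("@id" = paste0("https://orcid.org/", orcid),
                           givenName = given, familyName = family)))
  cm
}

cm <- add_orcid(cm, "Jane", "Doe", "0000-0000-0000-0000")
```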
When R generates a bibtex citation entry for a package, it fills in the bibtex field title using the format Package: Title, not just the Title from the DESCRIPTION field. This implies that the citation title should not be mapped to the Title field alone, but should always include the package name as well, e.g.
@Manual{,
title = {dplyr: A Grammar of Data Manipulation},
author = {Hadley Wickham and Romain Francois},
year = {2016},
note = {R package version 0.5.0},
url = {https://CRAN.R-project.org/package=dplyr},
}
(Note also that only Authors@R developers with [aut] get listed.)
I bring this up because I think R packages are the only software metadata standard that has both a package name and a title, and I think this has implications for mapping. Schema.org doesn't actually have any notion of title in this sense (https://schema.org/title strictly refers to a job title), and what one would call the title of a ScholarlyArticle, Book, or anything else is referred to in Schema.org as name.
I believe this suggests the following mapping for the R DESCRIPTION:
- schema:name - "Package: Title"
- schema:identifier - "Package"
and not, as the crosswalk suggests, schema:name - Package and dcterms:title - Title. Thoughts?
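The proposed schema:name mapping can be sketched directly from a DESCRIPTION file with base R's read.dcf:

```r
# Build "Package: Title" from DESCRIPTION, matching R's own bibtex title.
schema_name <- function(description_path) {
  d <- read.dcf(description_path, fields = c("Package", "Title"))
  paste0(d[, "Package"], ": ", d[, "Title"])
}
```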
The validation vignette points readers to the official JSON-LD framing documentation, while some of them might need a more intro-level documentation. I know I did. 😸 When googling "JSON-LD framing" all I could find were these posts that helped because their authors were at the beginning of their "framing journey". 😉 But I did not find any real beginner-friendly intro.
I also struggled with frame vs. schema but was helped by @cboettig's explanation here and here.
So I've thought of an analogy/several analogies that I tested on @annakrystalli (thanks, Anna!). I'm not sure where they could live but they could in any case be linked from the validation/parsing/framing vignette.
A frame is like a grocery list. Say you want vanilla yoghurt. Depending on the supermarket it can be in the dairy/yoghurt aisle or in the dairy aisle or, weirdly, in a vanilla aisle (different levels of nesting/organization). Your frame/grocery list just says "vanilla yoghurt" and your personal shopper, jsonld::jsonld_frame, finds vanilla yoghurt for you no matter the supermarket/the level of nesting/organization.
Anna said that URIs are barcodes!
I'm not sure what not setting explicit to FALSE would be. A hungry personal shopper who'd bring you many more things than what you said you needed?
A schema is a floor plan for a supermarket, and a validator is a manager who checks that products are in their place based on that floor plan.
Some projects / applications may wish to use only the terms supported by the schema.org SoftwareSourceCode type. Consider a utility that can generate just this subset of the codemeta information.
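A sketch of such a utility (the schema_terms default here is a small assumed subset, not the full SoftwareSourceCode vocabulary): keep only properties that are valid schema.org terms, dropping codemeta-specific extensions.

```r
# Filter a codemeta-style list down to an assumed schema.org-only subset.
schema_only <- function(cm, schema_terms = c("name", "description", "author",
                                             "codeRepository", "license",
                                             "programmingLanguage")) {
  cm[names(cm) %in% c("@context", "@type", schema_terms)]
}
```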
Having IDs for author nodes is very convenient in JSON-LD (e.g. easy way to refer to the same person across a suite of packages in a codemeta.json compendium).
Currently codemetar allows a user to specify an ORCID id as a comment field to a person object in the Authors@R string. Clearly this is something of a hack, and we need a more natural solution.
This issue is particularly thorny in light of #9, since simply editing the JSON by hand to add an ORCID would get overwritten if the author node is regenerated from DESCRIPTION.
My preference, when available, is to use https over http.
With CRAN specifying that ORCID should be given as comment = c(ORCID = "XXXX-XXXX-XXXX-XXXX"), codemetar converts the ORCID properly to the URL for the profile, but uses only http, not https, where it is available.
Is it worthwhile to consider using https?
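The suggested upgrade is a one-line rewrite of the generated profile URL (sketch only; where codemetar would apply it is an open question):

```r
# Rewrite an http:// ORCID profile URL to https://.
orcid_https <- function(url) {
  sub("^http://", "https://", url)
}

orcid_https("http://orcid.org/0000-0000-0000-0000")
```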
See https://github.com/r-lib/desc, from the awesome @gaborcsardi and co
A standard program interface would parse codemeta files, probably transform them into a standard tree structure using jsonld::jsonld_frame() (see codemeta/codemeta#128), and perhaps provide helper utilities for extracting data of interest; e.g. generating data.frame representations of metadata over a large set of codemeta files (though maybe that's best left to a vignette documenting a basic JSON parsing strategy with purrr).
In the index of vignettes, I think the sequence would work better as:
- intro
- Translating between schema using JSON-LD
- Validation in JSON-LD
- Parsing CodeMeta Data
and include a really short description of what each vignette contains (like a sentence)
CRAN's Writing R Extensions clearly states that we can add additional fields:
There is no restriction on the use of other fields not mentioned here (but using other capitalizations of these field names would cause confusion). Fields Note, Contact (for contacting the authors/developers) and MailingList are in common use. Some repositories (including CRAN and R-forge) add their own fields.
And from CRAN Policies
Additional DESCRIPTION fields could be used for providing email addresses for contacting the package authors/developers (e.g., ‘Contact’), or a URL for submitting bug reports (e.g., ‘BugReports’).
However, using an unrecognized field does throw a NOTE (though you'll need to be using devtools::check(check_version = TRUE) or devtools::release() to see it):
checking CRAN incoming feasibility ... NOTE
Maintainer: ‘Carl Boettiger <[email protected]>’
Unknown, possibly mis-spelled, fields in DESCRIPTION:
‘keywords’
The list of recognized terms appears in tools:::.get_standard_DESCRIPTION_fields(); see source: https://github.com/wch/r-source/blob/trunk/src/library/tools/R/utils.R#L1179. (Thanks @kevinushey.)
Given the policies above, and as the NOTE suggests, this CRAN check may only be a hack to check spelling in the DESCRIPTION file, so it's not clear whether or not CRAN would object to the term above. OTOH, it would certainly give us pause to start all rOpenSci packages throwing an extraneous NOTE to indicate provider: https://rOpenSci.org or something similar.
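The check can be reproduced locally (note this relies on an unexported function, so it may change between R versions): list the DESCRIPTION fields that are not in R's standard set and would therefore trigger the NOTE.

```r
# Fields in a DESCRIPTION file that fall outside R's standard field list.
nonstandard_fields <- function(description_path) {
  standard <- tools:::.get_standard_DESCRIPTION_fields()
  setdiff(colnames(read.dcf(description_path)), standard)
}
```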
Alternatively, codemetar users could just enter additional fields manually into the codemeta.json or through dedicated R functions, but this does seem a bit clunky.
It is a great addition to rOpenSci and general movements towards both linked data and better curation, visibility and citability of software. Overall, the functions are smooth and easy to use. I think the most difficult part of the package is getting your head round the concepts. There is good codemeta and JSON-LD documentation to which the codemetar documentation links (I love Manu's videos!). I also realise that the purpose of package documentation is mainly to demonstrate use and not necessarily educate on the background. However I feel that a small amount of extra explanation and jargon busting could really help users get their head round what's going on and why it's such an important and awesome initiative!
Yay! Thanks so much. Yes, have tried to do this a bit more now, see below.
As mentioned above, here are some suggestions on how I feel the readme could be more informative and self contained:
I would have loved a short background section in the intro that could include a few key definitions (which could then be used consistently throughout) and a touch of historical context, e.g. (sorry, these are probably rubbish definitions but hopefully you get the gist!):
- Linked data: data that has a context which links the fields used in the data to an online agreed standard.
- context: a mapping between a data source's fields and a schema. Usually schema.org, but domain-specific ones also (e.g. codemeta)
Briefly explain the difference between the data types (i.e. JSON, codemeta JSON-LD, codemeta R list) so that users can be cognisant of them when using the package.
Describe how the project is the convergence of a number of initiatives:
- Schema.org: the main initiative to link data on the web through a catalogue of standard metadata fields
- codemeta: a later initiative to formalise the metadata fields included in typical software metadata records and introduce important fields that did not have clear equivalents. The codemeta crosswalk provides an explicit map between the metadata fields used by a broad range of software repositories, registries and archives
- JSON-LD: the data type enabling crosswalk through embedding contexts into the data itself.
- codemetar: mapping fields used to describe packages in R to the standard fields agreed in schema.org & codemeta <- consensus schema
Excellent suggestions!! Added all of this to both the (newly created) top-level documentation (#25) and the intro vignette.
Let's just say this package could act as a gateway drug to JSON-LD, it certainly has for me!
❤️ ❤️ ❤️
function documentation
codemeta_validate
- I think the description might be incomplete? (i.e. ...verifying that the result... matches or conforms to something?)
- The codemeta argument description only mentions a path/filename. The function, however, is geared up to accept JSON-LD, functionality demonstrated in the crosswalking vignette. So this should probably be listed in the help file also.
Added to docs.
crosswalk
- Description in the help file could be a touch more informative, e.g.: Crosswalk between different metadata fields used by different repositories, registries and archives. For more details see here.
- Also suggest that where the crosswalk table is mentioned, a link to it be supplied, e.g. from -> the corresponding column name from the crosswalk table.
Added, thanks!
- My understanding is that what is referred to here as a JSON list is the same as what is referred to in the create_codemeta documentation as a codemeta object (list)? I think standardisation of the terminology referring to the different data types throughout the documentation would be best.
Yeah, agree this is confusing. Technically they are both list-class objects, but here it doesn't have to be a list representing 'codemeta' JSON; it could be other JSON.
This kind of relates to the bigger issue of how to refer to these things: JSON data in R can be in an external file (path or URL), a string, or an R-list format. We try to make most of the functions agnostic to this, but it's still confusing.
Overall function documentation comments:
I wonder if it could be useful to make a distinction in the function names between functions that work with codemeta JSON and codemeta R list objects. Let's say we associated codemeta with the JSON-LD format and codemetar with R lists. Using these associations in function names could make it a bit more explicit to the user. E.g. the functions write_codemeta and validate_codemeta would remain the same because they either output or work on JSON-LD formats, but create_codemeta could become create_codemetar to indicate that the output is an R list? I only mention it because it confused me a little at times, but appreciate this is no biggie.
Yup, we've tried to clarify the write / create distinction better in the function descriptions now.
Vignettes
Good scope and demonstration of the utility of the package. A few suggestions that, as a novice to the concepts, would have helped me a little.
In the index of vignettes, I think the sequence would work better as:
- intro
- Translating between schema using JSON-LD
- Validation in JSON-LD
- Parsing CodeMeta Data
and include a really short description of what each vignette contains (like a sentence)
Done! Note: the only way I could find to order vignettes was to prefix letters A, B, C, D to the names. Numeric codes (01, 02, 03) throw a WARNING in check. Any other solution would be greatly appreciated.
Intro
Really like the ability to add metadata through the DESCRIPTION file. Where you mention
See the DESCRIPTION file of the codemetar package for an example.
it could include an actual link to the DESCRIPTION file.
Done! (In the README as well.) Side note: this section now also describes the CRAN-approved / compliant format for adding arbitrary schema.org terms into DESCRIPTION files.
Translating between schema using JSON-LD
I found this paragraph confusing and have suggested how it might make the concept clearer. Also I think URI needs a short definition.
Unrecognized properties are dropped, since there is no consensus context into which we can expand them. Second, the expanded terms are then compacted down into the new context (Zenodo in this case). This time, any terms that are not part of the Zenodo context are kept, but not compacted, since they still have meaningful contexts (that is, full URIs) that can be associated with them:
Link added and paragraph revised, thanks!
Validation in JSON-LD
Motivating example slightly confusing, because while this sentence mentions an error being thrown, all that is returned in R is a NULL:
However, there’s other data that is missing in our example that could potentially cause problems for our application. For instance, our first author lists no affiliation, so the following code throws an error:
Then when framing, the value associated with the field whose data is missing is also NULL. I appreciate that the real value in the process is that the JSON-LD now contains an explicit field that contains a @null value, but it might be worth spelling it out, because the actual outputs in R pre & post framing are the same, i.e. NULL.
Yeah, clearly this isn't a good motivating example since R is basically already inferring the NULL correctly, just like you say. I think the other cases are a bit clearer so I've just removed this case.
A few super minor typos which I've corrected; happy to send through a pull request?
That would be awesome, thanks!
I found out (well, should have known) on a plane, when I planned to work on this review, that the functions require an internet connection. It actually got me wondering whether an internet connection might be a good thing to list more generally as part of software requirements?
Yeah, though this really applies primarily to the crosswalk function (where it is now mentioned) and then the JSON-LD functions from the vignettes, which aren't technically package functions. The JSON-LD functions only need the internet if/when the context file is given as a remote URL; technically one can just embed the literal context file in the context element, and then there's no need to resolve anything.
crosswalk
While it is relatively straightforward to get the crosswalk .csv, I feel it'd be good to be able to access the information through the package. Here are some suggestions based on what I would find personally useful:
- At the very least, have a function that just fetches the .csv.
- Moving beyond that, it could be useful to have a few helper functions to quickly interrogate it. I'd find it quite useful to quickly get the options for the arguments to and from in crosswalk. Could be cool to have another function, e.g. crosswalks, that prints the available crosswalk column options, e.g.:
library(readr)
crosswalks <- function(){
df <-
readr::read_csv(
"https://github.com/codemeta/codemeta/raw/master/crosswalk.csv",
col_types = cols(.default = "c"))
names(df)[!names(df) %in% c("Parent Type", "Property", "Type", "Description")]
}
crosswalks()
#> [1] "codemeta-V1"
#> [2] "DataCite"
#> [3] "OntoSoft"
#> [4] "Zenodo"
#> [5] "GitHub"
#> [6] "Figshare"
#> [7] "Software Ontology"
#> [8] "Software Discovery Index"
#> [9] "Dublin Core"
#> [10] "R Package Description"
#> [11] "Debian Package"
#> [12] "Python Distutils (PyPI)"
#> [13] "Trove Software Map"
#> [14] "Perl Module Description (CPAN::Meta)"
#> [15] "NodeJS"
#> [16] "Java (Maven)"
#> [17] "Octave"
#> [18] "Ruby Gem"
#> [19] "ASCL"
#> [20] "DOAP"
#> [21] "Wikidata"
- I also found the non-exported function crosswalk_table quite useful (some commented-out code in there). Others might too?
Nice. But your suggestion below is even better, so I've actually just renamed the function you give below as crosswalk_table, since it serves both that same role (if to = NULL) and the purpose you illustrate!
- But I feel the most useful would be to be able to narrow down field mappings between particular repositories of interest. So, building on the crosswalk_table function, I would probably find the following function quite useful:
library(readr)
crosswalk_map <- function(from, to,
full_crosswalk =
"https://github.com/codemeta/codemeta/raw/master/crosswalk.csv",
trim = FALSE){
df <-
readr::read_csv(full_crosswalk,
col_types = cols(.default = "c"))
df <- df[c("Property", from, to)]
if(trim) df <- df[!is.na(df[,from]),] # trim to `from` argument fields
df
}
crosswalk_map(from = "GitHub", to = c("Zenodo", "Figshare"), trim = TRUE)
#> # A tibble: 11 x 4
#> Property GitHub Zenodo Figshare
#> <chr> <chr> <chr> <chr>
#> 1 codeRepository html_url relatedLink relatedLink
#> 2 programmingLanguage languages_url <NA> <NA>
#> 3 downloadUrl archive_url <NA> <NA>
#> 4 author login creators <NA>
#> 5 dateCreated created_at <NA> <NA>
#> 6 dateModified updated_at <NA> <NA>
#> 7 license license license License
#> 8 description description description/notes Description
#> 9 identifier id id <NA>
#> 10 name full_name title Title
#> 11 issueTracker issues_url <NA> <NA>
Great suggestion, this is now the (exported) function crosswalk_table().
- BTW, it seems that example_json_r_list %>% crosswalk("R Package Description") is the only way to get from a JSON-LD R list to JSON-LD in R, as toJSON doesn't work. While it's great that there is a way to do it, it almost feels a bit of a hack. At the very least I feel it should be included as an example in the vignette, so users are aware of it, but I'm wondering if an explicit function for that might also be useful?
Hmm, toJSON should work: create_codemeta() %>% toJSON() seems happy for me. Can you give an example? Moving between list representation, string representation & external file representation is a bit confusing, but should be handled okay by the standard JSON & JSON-LD tools and not be specific to any codemeta stuff...
write_codemeta
- When writing the file into a package (i.e. when a DESCRIPTION is detected), adding "codemeta.json" to .Rbuildignore assumes that the user has not changed the path. While it is advised in the help file to leave it as the default, as the user can change it, there could theoretically be a situation where the user has called it something else but the function has written "codemeta.json" to .Rbuildignore. Just wondering whether that would cause any problems downstream?
Interesting question. The function simply uses the devtools routine to add the file to the .Rbuildignore list, which is already used by many other devtools functions, so I figured it was better to be consistent with that function than attempt a custom solution. If this really is an issue, I imagine it would first be encountered and patched upstream, and we'd get the benefits that way.
- Also, when supplying a JSON-LD R list instead of a path to pkg, the function works but throws a warning: the condition has length > 1 and only the first element will be used
Thanks, should be fixed now.
Compliance with rOpenSci Packaging Guide
Contributing
- No contributing.md or anything about contributing in the README. However, it does include a good code of conduct.
Added
You can see more about my review here
Sorry if I've missed this in the docs! I was wondering whether it'd make sense to have a function that'd use the topics defined in the codemeta.json to populate Github topics (https://developer.github.com/v3/repos/#list-all-topics-for-a-repository & https://developer.github.com/v3/repos/#replace-all-topics-for-a-repository)? And whether such a function should live in this package?
If one creates a fork of a package, like I've done now for codemetar, and modifies the DESCRIPTION to add a contributor or dependency, one will have to update codemeta.json, because it's best practice and also because of the pre-commit hook, cf #59.
But if one does so, the codemeta.json will use the GitHub links of the fork.
Would it make sense to add some guidance about this to the docs, and maybe some code? My ideas:
Have an option "doc_fork" or so in create_codemeta, FALSE by default.
It'd only change GH links if TRUE; otherwise it'd leave them as they are. This way, if one decides to document a fork, one would need to force it, and if one just wants to update codemeta.json in the context of a pull request, one wouldn't need to worry about the whole thing.
@noamross Really like your suggestion about including a bit more about the review. Trying to think how best to do this using existing vocabularies, since it makes interoperability of data so much easier.
"review": {
"@type": "Review",
"url": "https://github.com/ropensci/onboarding/issues/130",
"provider": "http://ropensci.org"
}
I'm not sure what the right field is to indicate the status of the review (e.g. in review, accepted, etc.); maybe status would be the term? Note that http://schema.org/Review defines the property reviewRating, but that's obviously not the context we really have in mind here.
I've indicated ropensci as the provider (could be more verbose and indicate that ropensci is an organization); note that it would be natural to include the review author & editor here too (not clear if you'd list all, and of course that would be harder to scrape from the svg badge...)
Wanted to cc @mfenner on this too, Martin, any thoughts at a common vocabulary for describing reviews of scholarly works? Has this come up at all on your end?
The write_codemeta() function should have a fuller ”Description” statement, but is otherwise sufficient. Specifically, it should explain how its usage differs from create_codemeta().
We could better distinguish between these two in the function descriptions. (write writes an external file, create creates an R object)
The installation instructions say:
# install.packages("devtools")
devtools::install_github("codemeta/codemetar")
Perhaps you mean, "ropensci/codemetar"?
Also, since the package is on CRAN now, you might want to add CRAN installation instructions.
As a non-native speaker, when I see "frame" I mostly think of data.frames. Maybe I'm not the only one reasoning this way. 🙈
I'd suggest adding a few lines about this at the beginning of the parsing vignette. Saying one uses the concept of framing the JSON-LD to put it into a data.frame using a JSON-LD frame. A flowchart might even be useful.
It's not obvious how a user should best maintain their `codemeta.json` document. The workflow is pretty simple if the user adds no additional terms beyond what `write_codemeta()` guesses from the README and DESCRIPTION files: in that case, a user should simply re-run `write_codemeta()` prior to any git tag release, CRAN release, etc.
Things are less obvious, though, if we start adding extra metadata (such as `keywords`) to the `codemeta.json`, as illustrated in the README. Then, once the DESCRIPTION file is updated, what is the best workflow to update `codemeta.json` to capture these changes (e.g. a new contributor or a new dependency) without losing the additional elements of `codemeta.json`?
One could have an R script do this before release, e.g. an `update_meta.R` script reading:

```r
library("codemetar")
cm <- create_codemeta(".")
cm$keywords <- list("metadata", "ropensci")
write_codemeta(cm)
```
but it seems a bit clumsy to have such an R script be the canonical document for updating keywords. An ideal workflow would allow a user to edit the `codemeta.json` by hand, which is perhaps the most obvious way to extend the metadata.
Ideally, `codemetar` could parse the DESCRIPTION, detect and adopt any changes in the resulting fields relative to any existing `codemeta.json`, but otherwise defer to the existing metadata rather than lose it. (This should just require starting from any existing `codemeta.json`.)
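One possible shape for that behavior, as a hedged sketch: `update_codemeta()` is a hypothetical helper (not an existing codemetar function) that regenerates everything derivable from the DESCRIPTION, then copies over any top-level fields that only exist in the hand-edited file. It assumes `write_codemeta()` accepts a codemeta object and a path, as in the README example.

```r
library("codemetar")
library("jsonlite")

# Hypothetical helper: refresh codemeta.json without losing hand-added fields
update_codemeta <- function(pkg = ".") {
  path <- file.path(pkg, "codemeta.json")
  fresh <- create_codemeta(pkg)  # regenerated from DESCRIPTION, README, etc.
  if (file.exists(path)) {
    existing <- read_json(path)
    # top-level fields only present in the existing file (e.g. hand-added
    # keywords) are carried over; regenerated fields win otherwise
    manual <- setdiff(names(existing), names(fresh))
    fresh[manual] <- existing[manual]
  }
  write_codemeta(fresh, path = path)
}
```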
Not sure why, but sometime in the last week, any code that needs to perform JSON-LD operations using the DOI-based context file seems to fail (e.g. the `codemeta::crosswalk()` function).
@mbjones @gothub Would you know if anything changed with the handling of the DOI header or what not? I also don't think we ever got that working with DOI-based context files on json-ld playground, but it was definitely working in R a week ago (as evidenced by that travis log!).
Some R packages will provide additional citation information for a software paper in a dedicated CITATION file, which R knows to parse when asked for `citation("packageName")`. Ideally, `write_codemeta()` should check for such a file and automatically parse it for the `relatedPublication` field.
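For an installed package, a rough sketch of that mapping might look like this (hedged: the `relatedPublication` shape and field names here are illustrative assumptions, not the codemeta-mandated structure):

```r
# Reshape the first CITATION entry of an installed package into a candidate
# relatedPublication block; citation() returns a bibentry with $-access
cit <- citation("knitr")[[1]]
related <- list(
  "@type" = "ScholarlyArticle",
  name = cit$title,
  datePublished = cit$year
)
```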
There’s no top-level documentation. The package should use a codemetar.R file with package-level help, so that ?codemetar leads to something (e.g. https://github.com/tidyverse/dplyr/blob/master/R/dplyr.r).
Currently uses stability badges for package status; the http://tidyverse.org/lifecycle badges are probably a better fit!
DESCRIPTION files of packages in development tend not to have dates. Installed packages do have dates:

```r
cm <- create_codemeta("testthat")
cm$datePublished
#> "2016-04-23 08:37:40"
```
If a user is generating the `codemeta.json` in the development version (e.g. with `create_codemeta(".")`), do we want to just use the current date? In general it would be nice to have at least one date in the metadata, to give an idea of how old the package / metadata is.
Ideally we'd also keep track of things like `dateCreated` (e.g. in the DESCRIPTION file?), though I suppose technically that would refer to the date the version was created, rather than the package project as a whole?
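A minimal sketch of the fallback logic, assuming we settle on "use today's date when the DESCRIPTION has none" (the helper name is hypothetical):

```r
# Prefer the DESCRIPTION Date field; fall back to today's date otherwise.
# `desc` is the character matrix returned by read.dcf("DESCRIPTION").
date_from_desc <- function(desc) {
  if ("Date" %in% colnames(desc)) {
    desc[1, "Date"]
  } else {
    format(Sys.Date(), "%Y-%m-%d")
  }
}
# e.g. date_from_desc(read.dcf("DESCRIPTION"))
```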
I get this error when I attempt to convert a codemeta file (generated from a package DESCRIPTION) to Zenodo format:

```r
library("codemetar")
crosswalk(
  x = "https://github.com/ecohealthalliance/fasterize/blob/master/codemeta.json",
  from = "codemeta-V1",
  to = "Zenodo")
#> Error in context[[1]][[original_term]]: no such index at level 1
```
devtools::session_info()
#> Session info -------------------------------------------------------------
#> setting value
#> version R version 3.4.4 (2018-03-15)
#> system x86_64, darwin15.6.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> tz America/New_York
#> date 2018-03-23
#> Packages -----------------------------------------------------------------
#> package * version date source
#> backports 1.1.2 2017-12-13 CRAN (R 3.4.3)
#> base * 3.4.4 2018-03-15 local
#> codemetar * 0.1.4 2018-02-12 CRAN (R 3.4.3)
#> compiler 3.4.4 2018-03-15 local
#> curl 3.1 2017-12-12 CRAN (R 3.4.3)
#> datasets * 3.4.4 2018-03-15 local
#> devtools 1.13.5 2018-02-18 CRAN (R 3.4.3)
#> digest 0.6.15 2018-01-28 CRAN (R 3.4.3)
#> evaluate 0.10.1 2017-06-24 CRAN (R 3.4.1)
#> git2r 0.21.0 2018-01-04 CRAN (R 3.4.3)
#> graphics * 3.4.4 2018-03-15 local
#> grDevices * 3.4.4 2018-03-15 local
#> hms 0.4.1 2018-01-24 CRAN (R 3.4.3)
#> htmltools 0.3.6 2017-04-28 CRAN (R 3.4.0)
#> jsonld 1.2 2017-04-11 cran (@1.2)
#> jsonlite 1.5 2017-06-01 CRAN (R 3.4.0)
#> knitr 1.20 2018-02-20 cran (@1.20)
#> magrittr 1.5 2014-11-22 CRAN (R 3.4.0)
#> memoise 1.1.0 2017-04-21 CRAN (R 3.4.0)
#> methods * 3.4.4 2018-03-15 local
#> pillar 1.2.1 2018-02-27 CRAN (R 3.4.3)
#> pkgconfig 2.0.1 2017-03-21 CRAN (R 3.4.0)
#> R6 2.2.2 2017-06-17 CRAN (R 3.4.0)
#> Rcpp 0.12.16 2018-03-13 cran (@0.12.16)
#> readr 1.1.1 2017-05-16 CRAN (R 3.4.0)
#> rlang 0.2.0.9000 2018-03-15 Github (tidyverse/rlang@1b81816)
#> rmarkdown 1.9.5 2018-03-21 Github (rstudio/rmarkdown@b73f4ce)
#> rprojroot 1.3-2 2018-01-03 CRAN (R 3.4.3)
#> stats * 3.4.4 2018-03-15 local
#> stringi 1.1.7 2018-03-12 cran (@1.1.7)
#> stringr 1.3.0 2018-02-19 cran (@1.3.0)
#> tibble 1.4.2 2018-01-22 CRAN (R 3.4.3)
#> tools 3.4.4 2018-03-15 local
#> utils * 3.4.4 2018-03-15 local
#> V8 1.5 2017-04-25 CRAN (R 3.4.3)
#> withr 2.1.2 2018-03-15 CRAN (R 3.4.4)
#> yaml 2.1.18 2018-03-08 cran (@2.1.18)
The error occurs in `get_crosswalk_context`, and I think it happens when the loop hits NA values in the df (`relatedIdentifier`). Note also that `dateCrated` should probably be `dateCreated`:
Property | codemeta-V1 |
---|---|
codeRepository | codeRepository |
programmingLanguage | programmingLanguage |
downloadUrl | downloadLink |
operatingSystem | operatingSystems |
softwareRequirements | depends |
author | agents |
citation | relatedLink |
copyrightHolder | agents [role=copyrightHolder] |
creator | agent |
dateCreated | dateCrated |
dateModified | dateModified |
datePublished | datePublished |
keywords | controlledTerms |
license | licenseId |
publisher | publisher |
version | version |
description | description |
identifier | identifier |
name | name |
url | URL |
affiliation | affiliation |
identifier | identifier |
softwareSuggestions | suggests |
maintainer | uploadedBy |
contIntegration | contIntegration |
buildInstructions | buildInstructions |
developmentStatus | developmentStatus |
embargoDate | embargoDate |
funding | funding |
issueTracker | issueTracker |
referencePublication | relatedPublications |
readme | readme |
NA | relatedIdentifer |
NA | relatedIdentiferType |
NA | relationshipType |
NA | title |
NA | namespace |
NA | role |
NA | roleCode |
NA | softwarePaperCitationIdenifiers |
I also note that finding that my `from` argument had to be `"codemeta-V1"` took some digging: it's not in any of the vignettes or docs, and I had to look at the raw crosswalk table. (Likewise, that "Zenodo" had to be capitalized.) Perhaps this could be made visible with a function like `crosswalks_available()`, which just lists the table column headers and is prominent throughout vignettes and docs?
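A sketch of what such a helper could do (hedged: `crosswalks_available()` is the hypothetical name suggested above, and I'm assuming the crosswalk table lives at crosswalk.csv in the codemeta/codemeta repo):

```r
# Hypothetical helper: list the valid `from` / `to` values for crosswalk(),
# i.e. the column names of the official crosswalk table (network access assumed)
crosswalks_available <- function() {
  url <- "https://raw.githubusercontent.com/codemeta/codemeta/master/crosswalk.csv"
  colnames(read.csv(url, check.names = FALSE, stringsAsFactors = FALSE))
}
```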
Line 113 reads:

> ... Almost any additional codemeta field (see `codemetar:::additional_codemeta_terms` for a list) and can be added to and read from the DESCRIPTION into a `codemeta.json` file.

This sentence is not a proper sentence as far as I can tell. Perhaps you meant to remove the "and"?
It would be great to have a function which could operate on any GitHub repo to generate at least a rough `codemeta.json` file from whatever information is returned by the GitHub API (basically what the current import to Zenodo does).
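As a rough, assumption-laden sketch of that idea (the field mapping is illustrative, the unauthenticated `repos` endpoint is rate-limited, and `gh_codemeta()` is a hypothetical name):

```r
library("jsonlite")

# Build a rough codemeta list for any GitHub repo ("owner/name") from the
# public repos API; only a handful of fields can be recovered this way
gh_codemeta <- function(repo) {
  info <- fromJSON(paste0("https://api.github.com/repos/", repo))
  list(
    "@context" = "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type" = "SoftwareSourceCode",
    name = info$name,
    description = info$description,
    codeRepository = info$html_url,
    issueTracker = paste0(info$html_url, "/issues"),
    dateCreated = info$created_at,
    dateModified = info$updated_at,
    programmingLanguage = info$language,
    license = info$license$spdx_id
  )
}
# toJSON(gh_codemeta("ropensci/codemetar"), pretty = TRUE, auto_unbox = TRUE)
```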
@mbjones @gothub Just wanted to get your input on the choice of software identifier when generating codemeta.json from an R DESCRIPTION file. According to our crosswalk, the identifier for an R package is the `Package` field of the DESCRIPTION, i.e. the package name (without a version). Maybe we already went over this, but it seems like this isn't ideal -- e.g. assuming the package is archived on KNB or Zenodo, I'd expect this to be the DOI.
In the `codemetar` implementation, I've allowed the function `write_codemeta()` to take an `id` as an argument. (Currently it's generating a UUID, which is probably silly -- perhaps the default should be the package name, unless a DOI is provided?) As always, there's a bit of a catch-22 in this process, in that we'd like to generate `codemeta.json` before uploading to Zenodo / KNB, but in the simplest workflow we only get the DOI after the upload is complete. (Though I think you can reserve the DOI ahead of time with KNB, right? So in that case at least the package could handle this negotiation automatically, I suppose.) Thoughts?
Support all additional / arbitrary codemeta fields via the DESCRIPTION file? Easy for all top-level types, though additional author-level metadata (affiliation, address) is trickier to add this way.
Here are all the properties in the codemeta context, indicating which ones we do and do not have a mechanism for specifying via `codemetar`:
In the latest version installed from GitHub (at this commit: https://github.com/ropensci/codemetar/tree/c371dbbeb080992a1cfd596cb0354c983d6f285e) I was getting bad URLs, at least these two. Diff from running `codemetar::write_codemeta()` in the `fulltext` base dir:

```diff
- "releaseNotes": "https://github.com/ropensci/fulltext/blob/master/NEWS.md",
- "readme": "https://github.com/ropensci/fulltext/blob/master/README.md",
- "fileSize": "3775.796KB"
+ "releaseNotes": "https://github.com/ropensci/fulltext.git/blob/master/NEWS.md",
+ "readme": "https://github.com/ropensci/fulltext.git/blob/master/README.md",
+ "fileSize": "3776.894KB",
+ "developmentStatus": "active"
}
```
Unless the `.git` is supposed to be there for some reason?
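If not, the fix is presumably to normalize the remote URL before building the blob links; a one-line sketch (the helper name is hypothetical):

```r
# Strip a trailing ".git" from a remote URL (as returned by `git remote`)
strip_git <- function(url) sub("\\.git$", "", url)
strip_git("https://github.com/ropensci/fulltext.git")
#> [1] "https://github.com/ropensci/fulltext"
```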