ropensci / codemetar

an R package for generating and working with codemeta

Home Page: https://docs.ropensci.org/codemetar


codemetar's Issues

Embed Codemeta JSON-LD in README

Codemeta information should be embedded in README output

Add a function that could be called invisibly (i.e. echo=FALSE, results="asis") in the README.Rmd, which would embed the codemeta information inside a <script> HTML element. This way the information could be discovered by Google etc. when the package is built on CRAN and/or a pkgdown website.
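A minimal sketch of what such a helper might look like (the function name embed_codemeta is hypothetical, not an existing codemetar function):

```r
# Hypothetical helper: read codemeta.json and print it inside a
# <script> element, so that a README.Rmd chunk with echo=FALSE,
# results="asis" embeds the metadata in the rendered README.
embed_codemeta <- function(path = "codemeta.json") {
  json <- paste(readLines(path), collapse = "\n")
  cat('<script type="application/ld+json">\n', json, '\n</script>\n', sep = "")
}
```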

Generating reviewer metadata

R-core has recently (with our urging) added an allowable MARC code of rev to persons in the Authors@R field of DESCRIPTION, indicating individuals or organizations that reviewed the package. It would be great if codemetar could recognize this and put information about these persons in the appropriate JSON-LD fields. (My understanding is that this would be a reviewedBy field.)

Note that there's not a standard way to link to the reviews themselves, though the developing convention for rOpenSci is to include the URL in the comment field of the person() call, e.g.,

Authors@R: c(person("Sam", "Albers", email = "[email protected]", role = c("aut", "cre")),
    person("David", "Hutchinson", email = "[email protected]", role = "ctb"),
    person("Province of British Columbia", role = "cph"),
    person("Luke", "Winslow", role = "rev", 
            comment = "Reviewed for rOpenSci, see <https://github.com/ropensci/onboarding/issues/118>"),
    person("Laura", "DeCicco", role = "rev", 
            comment = "Reviewed for rOpenSci, see <https://github.com/ropensci/onboarding/issues/118>")
    )

releaseNotes and readme fields point to devel branch

Most of my repositories have a master branch (in sync with the current release, or ahead of it but working) and a devel branch (where I break stuff and later merge over to master).

In the codemeta.json file though it seems that the releaseNotes and readme fields point to the devel branch and not the master.

I've tried to figure out if I've set something in my git configuration somewhere or how this happens, but am not finding anything.

Example from https://github.com/ropensci/bomrang

  "releaseNotes": "https://github.com/ropensci/bomrang/blob/devel/NEWS.md",
  "readme": "https://github.com/ropensci/bomrang/blob/devel/README.md",

It should instead be:

  "releaseNotes": "https://github.com/ropensci/bomrang/NEWS.md",
  "readme": "https://github.com/ropensci/bomrang/README.md",

Is this something I need to correct in my settings somewhere?

thanks

include more guessable fields

  • downloadUrl -- github zip address? for release version only?
  • releaseNotes - NEWS / NEWS.md (should be URL. to GitHub? CRAN?)
  • fileSize - of what? the .zip archive? built package? whole repo (i.e. with .git history)?
  • readme -- github URL, CRAN URL?
    ...

codemetar and releases

Sorry if I missed this in the docs.

How could one ensure that codemeta.json is updated often enough?

  • Could one hope to add a check to the devtools::release checks? E.g. if codemeta.json exists, the check would look at when it was last updated, or just ask the developer whether they updated it.

  • Could the first use of codemetar::write_codemeta create a pre-commit hook like the one that usethis has for README.Rmd vs README.md (comparing the codemeta.json update time to those of the files it uses as sources of information)?
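One possible shape for such a check, comparing modification times (a sketch; the source-file list and function name are my own, not existing codemetar or usethis API):

```r
# Sketch: TRUE if codemeta.json is missing or older than its sources.
codemeta_outdated <- function(sources = c("DESCRIPTION", "README.md")) {
  if (!file.exists("codemeta.json")) return(TRUE)
  sources <- sources[file.exists(sources)]
  any(file.mtime(sources) > file.mtime("codemeta.json"))
}

# A pre-commit hook (or a devtools::release question) could then do:
if (codemeta_outdated()) {
  warning("codemeta.json looks out of date; re-run codemetar::write_codemeta()")
}
```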

Bottom-up / crowdsourcing approach?

Also related to force11/force11-sciwg#36, #3 and #20.

To practice shell programming, I created this gist to clone an R package repo, generate a codemeta.json, and prepare a pull/merge request. EDIT: deleted for the reasons given below.

I was wondering whether that kind of approach would be OK? Productive procrastination, but also kind of cold-calling.

Or, is the consensus rather, that codemeta.json generation should happen within the workflows that people already use and as automatically as possible?

enable appveyor to check builds on windows (help needed)

@mbjones @gothub For some reason, the codemeta repos don't show up for me in appveyor. No idea why, I see appveyor is already enabled for the codemeta organization generally, (maybe I just have too many repos for appveyor to track). Can you take a look and flip the switch to turn appveyor on? Thanks!

Typos in framing vignette

I couldn't find its source, sorry.

  • "it's" in "without any change to it’s information content."

  • "becuause" in "becuase all nodes"

Toph review

I encountered an error when I first ran install() on the cloned package. The dependency package V8 failed to install. However, the error message was helpful in specifying the version I needed to install with homebrew.

yup, credit to Jeroen and the V8 package there.

Running build_vignettes() at first failed, with Error: processing vignette 'codemeta-parsing.Rmd' failed with diagnostics: could not find function "as_tibble". Adding library(tibble) to that vignette’s library statements should solve that problem, as building the vignettes after loading that package succeeded. This also caused devtools::check() to fail.

Good catch, fixed.

Going through the package’s documentation alongside the ROpenSci Packaging Guide’s Documentation section, I made the following observations:

There’s no top-level documentation. The package should use a codemetar.R file with package-level help, so that ?codemetar leads to something (e.g. https://github.com/tidyverse/dplyr/blob/master/R/dplyr.r).

Added now, thanks! See #25

Help is present for a function add_context() which isn’t exported, whereas its sister function drop_context() is exported. If one is exported, the other probably should be, and vice versa. If both are exported, they should have examples added to their documentation.

Both are now exported and have examples

Help is present for functions crosswalk() and crosswalk_transform(), the latter of which isn’t exported. The former is an exported function, but isn’t well-documented — the “Description” field is just “crosswalk”, and doesn’t sufficiently explain what the function does. Maybe it’s not supposed to be exported, but if it is, its purpose should be explained more fully.

The write_codemeta() function should have a fuller ”Description” statement, but is otherwise sufficient. Specifically, it should explain how its usage differs from create_codemeta().

Added to the description. (#26)

The vignettes and README.md file are generally rich, and a great source for expanding a user’s understanding of the package’s purpose and potential use cases. However, for entirely novice users, a reference to some introductory material might be helpful (per the package’s previous review).

Indeed! Substantial background material has been added along the lines advised in Anna's review.

There is no CONTRIBUTING file, and the README.md doesn’t contain community guidelines.

Added.

However, the DESCRIPTION file specifies the requisite fields.
The package’s core functionality successfully executed when I tested it on a few different packages (i.e. write_codemeta() wrote the appropriate JSON-LD file).

👍

The paper.md “References” section is currently empty.

The only refs are URLs already in the description, so I've just dropped that section; though would appreciate more clarity on that from anyone more familiar with JOSS.

The file R/add_metadata.R seems to be a vestige of development, and contains a bit of commented-out code. If it’s not needed, it should probably be deleted.

It's more of a reminder to me for a future feature to create codemeta json-ld file from scratch, rather than extracting it from R package locations. Would potentially be useful to document other types of software.

I’m not sure what the purpose of R/sysdata.rda file is (and from a few brief searches of the code, could not find a reference to it). If it serves none, it should probably be removed.

Removed (was just a cached copy of the crosswalk, but decided the online update was best). (#29)

The exported functions write_codemeta() and create_codemeta() both follow a nice verb-subject naming scheme, and also reference the package name. It seems like the function codemeta_validate() could be renamed to validate_codemeta() to be consistent with these, though that is just personal preference.

Yeah, I was trying to follow Hadley on this (e.g. see xml2). The standard advice is that functions go from general to specific (i.e. start with package namespace), which is best for tab completion. The read / write functions are usually done backwards just to match R's other read/write functions. So I guess create_codemeta is backwards.

The crosswalk() function looks like it requires internet access. This should probably be noted somewhere.

added to the function's documentation.

Implement DataONE upload utility

Create a simple function to release package to DataONE. Could include the following features:

  • Reserve a DOI and use it as identifier when releasing the package
  • Optionally, could check for use of a git tag if repo is a git repository?

Implement S3 class & utilities for modifying the codemeta object

codemeta includes fields that don't exist in DESCRIPTION (e.g. author ORCIDs). Currently the interface just writes out a codemeta.json file directly, rather than providing an intermediate R object and utility functions for modifying (or extracting) metadata from it.

Ideally create_codemeta() would return an S3 object that could be modified with appropriate helper functions for adding these additional fields. A clever implementation could include the ability to infer some of these fields from sources other than the DESCRIPTION file (e.g. one could imagine querying GitHub for a list of contributors, and helping add these contributors to the codemeta.json (and optionally, to the DESCRIPTION?) with appropriate flags (e.g. some GitHub contributors might be mustBeCited = FALSE)).

Title metadata

When R generates a bibtex citation entry for a package, it fills in the bibtex field title using the format Package: Title, not just Title from the Description field. This implies that the citation title should not be mapped to the Title field alone, but should always include the package name as well, e.g.

 @Manual{,
    title = {dplyr: A Grammar of Data Manipulation},
    author = {Hadley Wickham and Romain Francois},
    year = {2016},
    note = {R package version 0.5.0},
    url = {https://CRAN.R-project.org/package=dplyr},
  }

(Note also that only Authors@R developers with [aut] get listed).

I bring this up because I think R packages are the only software metadata standard that has both a package name and a title, and I think this has implications for mapping. Schema.org doesn't actually have any notion of title in this sense (https://schema.org/title strictly refers to a job title), and what one would call the title of a ScholarlyArticle, Book, or anything else is referred to in Schema.org as name.

I believe this suggests the following mapping for R Description:

schema:name - "Package: Title"
schema:identifier - "Package"

and not, as the crosswalk suggests, schema:name - Package and dcterms:title - Title. Thoughts?
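In code, the proposed mapping could be derived from DESCRIPTION roughly like this (a sketch using base R's read.dcf(); the variable names are illustrative):

```r
# Read the first (only) record of DESCRIPTION as a named list.
desc <- as.list(read.dcf("DESCRIPTION")[1, ])

# Proposed mapping: schema:name gets "Package: Title", matching the
# title field of R's auto-generated bibtex citation entry.
schema_name       <- paste0(desc$Package, ": ", desc$Title)
schema_identifier <- desc$Package
```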

Framing & schema analogies

The validation vignette points readers to the official JSON-LD framing documentation, while some of them might need a more intro-level documentation. I know I did. 😸 When googling "JSON-LD framing" all I could find were these posts that helped because their authors were at the beginning of their "framing journey". 😉 But I did not find any real beginner-friendly intro.

I also struggled with frame vs. schema but was helped by @cboettig's explanation here and here.

So I've thought of an analogy/several analogies that I tested on @annakrystalli (thanks, Anna!). I'm not sure where they could live but they could in any case be linked from the validation/parsing/framing vignette.

A frame is like a grocery list. Say you want vanilla yoghurt. Depending on the supermarket it can be in the dairy/yoghurt aisle or in the dairy aisle or, weirdly, in a vanilla aisle (different levels of nesting/organizations). Your frame/grocery list just says "vanilla yoghurt" and your personal shopper jsonld::jsonld_frame finds vanilla yoghurt for you no matter the supermarket/the level of nesting/organization.

Anna said that URIs are barcodes!

I'm not sure what not setting explicit to FALSE would be. A hungry personal shopper who'd bring you many more things than what you said you needed?

A schema is a floor plan for a supermarket, and a validator is a manager who checks that products are in their place based on that floor plan.

ORCID ids for authors

Having IDs for author nodes is very convenient in JSON-LD (e.g. easy way to refer to the same person across a suite of packages in a codemeta.json compendium).

Currently codemetar allows a user to specify an ORCID id as a comment field to a person object in the Authors@R string. Clearly this is something of a hack, and we need a more natural solution.

This issue is particularly thorny in light of #9, since simply editing the JSON by hand to add an ORCID would get overwritten if the author node is regenerated from DESCRIPTION.

Use https for ORCID?

My preference, when available, is to use https over http.

With CRAN specifying that an ORCID should be given as `comment = c(ORCID = "XXXX-XXXX-XXXX-XXXX")`, codemetar converts the ORCID properly to the URL for the profile, but uses only http, not https, where the latter is available.

Is it worthwhile to consider using https?

Implement utilities for working with codemeta files

A standard program interface would parse codemeta files, probably transform them into a standard tree structure using jsonld::jsonld_frame() (see codemeta/codemeta#128), and perhaps provide helper utilities for extracting data of interest; e.g. generating data.frame representations of metadata over a large set of codemeta files (though maybe that's best left to a vignette documenting a basic JSON parsing strategy with purrr).
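Such an interface might look roughly like this (a sketch; the frame shown is minimal, and the codemeta 2.0 context URL is assumed to resolve over the network):

```r
library(jsonld)
library(jsonlite)

# A minimal frame asking for SoftwareSourceCode nodes in codemeta terms.
frame <- '{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode"
}'

doc    <- paste(readLines("codemeta.json"), collapse = "\n")
framed <- jsonld::jsonld_frame(doc, frame)  # predictable tree structure
meta   <- jsonlite::fromJSON(framed)        # R list for further extraction
meta$name
```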

Re-order vignettes & improve their descriptions

In the index of vignettes, I think the sequence would work better as:

  • intro
  • Translating between schema using JSON-LD
  • Validation in JSON-LD
  • Parsing CodeMeta Data

and include a really short description of what each vignette contains (like a sentence)

Should we be using a Non-DESCRIPTION home for additional metadata fields?

CRAN's Writing R Extensions clearly states that we can add additional fields:

There is no restriction on the use of other fields not mentioned here (but using other capitalizations of these field names would cause confusion). Fields Note, Contact (for contacting the authors/developers) and MailingList are in common use. Some repositories (including CRAN and R-forge) add their own fields.

And from CRAN Policies

Additional DESCRIPTION fields could be used for providing email addresses for contacting the package authors/developers (e.g., ‘Contact’), or a URL for submitting bug reports (e.g., ‘BugReports’).

However, using an unrecognized field does throw a NOTE (though you'll need to be using devtools::check(check_version=TRUE) or devtools::release() to see it)

checking CRAN incoming feasibility ... NOTE
Maintainer: ‘Carl Boettiger <[email protected]>’

Unknown, possibly mis-spelled, fields in DESCRIPTION:
  ‘keywords’

The list of recognized terms appears in `tools:::.get_standard_DESCRIPTION_fields()`; see source: https://github.com/wch/r-source/blob/trunk/src/library/tools/R/utils.R#L1179. (Thanks @kevinushey.)

Given the policies above, and as the NOTE suggests, this CRAN check may only be a hack to check spelling in the DESCRIPTION file, so it's not clear whether or not CRAN would object to the term above. OTOH, it would certainly give us pause to have all rOpenSci packages start throwing an extraneous NOTE just to indicate provider: https://rOpenSci.org or something similar.

Alternatively, codemetar users could just enter additional fields manually into the codemeta.json or through dedicated R functions, but this does seem a bit clunky.
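The manual route might look something like this (a sketch; the provider field is illustrative, and the last line assumes write_codemeta() accepts the object rather than a package path):

```r
library(codemetar)

cm <- create_codemeta(".")              # fields guessed from DESCRIPTION
cm$provider <- "https://ropensci.org"   # extra field entered by hand
write_codemeta(cm)                      # assumption: accepts a codemeta object
```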

Anna Review

It is a great addition to rOpenSci and to general movements towards both linked data and better curation, visibility and citability of software. Overall, the functions are smooth and easy to use. I think the most difficult part of the package is getting your head round the concepts. There is good codemeta and JSON-LD documentation to which the codemetar documentation links (I love Manu's videos!). I also realise that the purpose of package documentation is mainly to demonstrate use and not necessarily to educate on the background. However, I feel that a small amount of extra explanation and jargon busting could really help users get their heads round what's going on and why it's such an important and awesome initiative!

Yay! Thanks so much. Yes, have tried to do this a bit more now, see below.

As mentioned above, here are some suggestions on how I feel the readme could be more informative and self contained:
I would have loved a short background section in the intro that could include a few key definitions (which could then be used consistently throughout) and a touch of historical context: eg. (sorry, these are probably rubbish definitions but hopefully you get the gist!)

  • Linked data: data that has a context which links the fields used in the data to an online agreed standard.
  • context: a mapping between a data source's fields and a schema. Usually schema.org, but domain-specific ones also exist (eg codemeta)

Briefly explain the difference between the data types (ie json, codemeta json-ld, codemeta r list) so that users can be cognisant of them when using the package.

Describe how the project is the convergence of a number of initiatives:
  • Schema.org: the main initiative to link data on the web through a catalogue of standard metadata fields
  • codemeta: a later initiative to formalise the metadata fields included in typical software metadata records and introduce important fields that did not have clear equivalents. The codemeta crosswalk provides an explicit map between the metadata fields used by a broad range of software repositories, registries and archives
  • JSON-LD: the data type enabling crosswalk through embedding contexts into the data itself.
  • codemetar: mapping fields used to describe packages in R to the standard fields agreed in schema & codemeta <- consensus schema

Excellent suggestions!! Added all of this to both the (newly created) top-level documentation (#25) and the intro vignette.

Let's just say this package could act as a gateway drug to JSON-LD, it certainly has for me!

❤️ ❤️ ❤️

function documentation

codemeta_validate
  • I think the description might be incomplete? (i.e. ...verifying that the result... matches or conforms to something?).
  • The codemeta argument description only mentions a path/filename. The function, however, is geared up to accept JSON-LD, functionality demonstrated in the crosswalking vignette. So this should probably be listed in the help file also.

Added to docs.

crosswalk
  • Description in help file could be a touch more informative, eg: Crosswalk between different metadata fields used by different repositories, registries and archives. For more details see here.
  • Also suggest that where the crosswalk table is mentioned, a link to it be supplied. e.g. from -> the corresponding column name from the crosswalk table.

Added, thanks!

  • My understanding is that what is referred to here as a JSON list is the same as what is referred to in the create_codemeta documentation as a codemeta object (list)? I think standardisation of the terminology referring to different data types throughout the documentation would be best.

Yeah, agree this is confusing. Technically they are both list-class objects, but here it doesn't have to be a list representing 'codemeta' json, it could be other json.

This kind of relates to the bigger issue of how to refer to these things: JSON data in R can be in an external file (path or url), a string, or an R-list format. We try and make most of the functions agnostic to this, but it's still confusing.

Overall function documentation comments:

I wonder if it could be useful to make a distinction in the function names between functions that work with codemeta JSON and codemeta R list objects. Let's say we associated codemeta with the JSON-LD format and codemetar with R lists. Using these associations in function names could make it a bit more explicit to the user. E.g. functions write_codemeta and validate_codemeta would remain the same because they either output or work on JSON-LD formats, but create_codemeta could become create_codemetar to indicate that the output is an R list? I only mention it because it confused me a little at times, but appreciate this is no biggie.

Yup, we've tried to clarify the write / create distinction better in the function descriptions now.

Vignettes

Good scope and demonstration of the utility of the package. A few suggestions that, as a novice to the concepts, would have helped me a little.

In the index of vignettes, I think the sequence would work better as:

  1. intro
  2. Translating between schema using JSON-LD
  3. Validation in JSON-LD
  4. Parsing CodeMeta Data

and include a really short description of what each vignette contains (like a sentence)

Done! Note: the only way I could find to order vignettes was to prefix letters A, B, C, D to the names. Numeric codes, 01, 02, 03, throw a WARNING in check. Any other solution would be greatly appreciated.

Intro

Really like the ability to add fields through the DESCRIPTION file. Where you mention

See the DESCRIPTION file of the codemetar package for an example.
It could include an actual link to the DESCRIPTION file.

Done! (In README as well.) Side note: this section now also describes the CRAN-approved / compliant format for adding arbitrary schema.org terms into DESCRIPTION files.

Translating between schema using JSON-LD

I found this paragraph confusing and have suggested how it might be revised to make the concept clearer. Also, I think URI needs a short definition.

Unrecognized properties are dropped, since there is no consensus context into which we can expand them. Second, the expanded terms are then compacted down into the new context (Zenodo in this case.) This time, any terms that are not part of the codemeta Zenodo context are kept, but not compacted, since they still have meaningful contexts (that is, full URIs) that can be associated with them, but not compacted:

Link added and paragraph revised, thanks!

Validation in JSON-LD

The motivating example is slightly confusing because, while this sentence mentions an error being thrown, all that is returned in R is a NULL.

However, there’s other data that is missing in our example that could potentially cause problems for our application. For instance, our first author lists no affiliation, so the following code throws an error:

Then when framing, the value associated with the field whose data is missing is also NULL. I appreciate that the real value in the process is that the JSON-LD now contains an explicit field that contains a @null value, but it might be worth spelling it out, because the actual outputs in R pre & post framing are the same, ie NULL.

Yeah, clearly this isn't a good motivating example since R is basically already inferring the NULL correctly, just like you say. I think the other cases are a bit clearer so I've just removed this case.

A few super minor typos which I've corrected; happy to send them through as a pull request?

That would be awesome, thanks!

functions

I found out (well, should have known) on a plane, when I planned to work on this review, that the functions require an internet connection. It actually got me wondering whether an internet connection might be a good thing to list more generally as part of software requirements?

Yeah, though this really applies primarily to the crosswalk function (where it is now mentioned) and then the json-ld functions from the vignettes; which aren't technically package functions. The JSON-LD functions only need the internet if/when the context file is given as a remote URL; technically one can just embed the literal context file in the context element, and then there's no need to resolve anything.

crosswalk

While it is relatively straightforward to get the crosswalk .csv, I feel it'd be good to be able to access the information through the package. Here are some suggestions based on what I would find personally useful:

  • At the very least to have a function that just fetches the .csv.
  • Moving beyond that it could be useful to have a few helper functions to quickly interrogate it?
    + I'd find it quite useful to quickly get the options for the arguments to and from in crosswalk. Could be cool to have another function, eg crosswalks, that prints the available crosswalk column options, eg:
library(readr)
crosswalks <- function(){
    df <-
        readr::read_csv(
            "https://github.com/codemeta/codemeta/raw/master/crosswalk.csv",
            col_types = cols(.default = "c"))
    names(df)[!names(df) %in% c("Parent Type", "Property", "Type", "Description")]
}

crosswalks()
#>  [1] "codemeta-V1"                         
#>  [2] "DataCite"                            
#>  [3] "OntoSoft"                            
#>  [4] "Zenodo"                              
#>  [5] "GitHub"                              
#>  [6] "Figshare"                            
#>  [7] "Software Ontology"                   
#>  [8] "Software Discovery Index"            
#>  [9] "Dublin Core"                         
#> [10] "R Package Description"               
#> [11] "Debian Package"                      
#> [12] "Python Distutils (PyPI)"             
#> [13] "Trove Software Map"                  
#> [14] "Perl Module Description (CPAN::Meta)"
#> [15] "NodeJS"                              
#> [16] "Java (Maven)"                        
#> [17] "Octave"                              
#> [18] "Ruby Gem"                            
#> [19] "ASCL"                                
#> [20] "DOAP"                                
#> [21] "Wikidata"
  • I also found the non-exported function crosswalk_table quite useful (some commented-out code in there). Others might too?

Nice. But your suggestion below is even better, so I've actually just renamed the function you give below as crosswalk_table, since it serves both that same role (if to=NULL), and the purpose you illustrate!

  • But I feel the most useful would be to be able to narrow down field mappings between particular repositories of interest. So building on the crosswalk_table function, I would probably find the following functions quite useful:
library(readr)
crosswalk_map <- function(from, to, 
                            full_crosswalk =
  "https://github.com/codemeta/codemeta/raw/master/crosswalk.csv",
  trim = FALSE){
  df <-
    readr::read_csv(full_crosswalk,
             col_types = cols(.default = "c"))
  df <- df[c("Property", from, to)]
  if(trim) df <- df[!is.na(df[,from]),] # trim to `from` argument fields
  df
}

crosswalk_map(from = "GitHub", to = c("Zenodo", "Figshare"), trim = T)
#> # A tibble: 11 x 4
#>               Property        GitHub            Zenodo    Figshare
#>                  <chr>         <chr>             <chr>       <chr>
#>  1      codeRepository      html_url       relatedLink relatedLink
#>  2 programmingLanguage languages_url              <NA>        <NA>
#>  3         downloadUrl   archive_url              <NA>        <NA>
#>  4              author         login          creators        <NA>
#>  5         dateCreated    created_at              <NA>        <NA>
#>  6        dateModified    updated_at              <NA>        <NA>
#>  7             license       license           license     License
#>  8         description   description description/notes Description
#>  9          identifier            id                id        <NA>
#> 10                name     full_name             title       Title
#> 11        issueTracker    issues_url              <NA>        <NA>

Great suggestion, this is now the (exported) function, crosswalk_table().

  • BTW, it seems that example_json_r_list %>% crosswalk("R Package Description") is the only way to get from a JSON-LD R list to JSON-LD in R, as toJSON doesn't work. While it's great that there is a way to do it, it almost feels a bit of a hack. At the very least I feel it should be included as an example in the vignette so users are aware of it, but I'm wondering if an explicit function for that might also be useful?

Hmm, toJSON should work: create_codemeta() %>% toJSON() seems happy for me. Can you give an example? Moving between list representation, string representation & external file representation is a bit confusing, but should be handled okay by the standard JSON & JSON-LD tools and not be specific to any codemeta stuff...

write_codemeta
  • When writing the file into a package (ie when a DESCRIPTION is detected), adding "codemeta.json" to .Rbuildignore assumes that the user has not changed the path. While it is advised in the help file to leave the default, as the user can change it, there could theoretically be a situation where the user has called it something else but the function has written "codemeta.json" to .Rbuildignore. Just wondering whether that would cause any problems downstream?

Interesting question. The function simply uses the devtools routine to add the file to the .Rbuildignore list, which is already used by many other devtools functions, so I figured it was better to be consistent with that function than attempt a custom solution. If this really is an issue I imagine it would first be encountered and patched upstream and we'd get the benefits that way.

  • Also when supplying a JSON-LD r list instead of a path to pkg, function works but throws a warning: the condition has length > 1 and only the first element will be used

Thanks, should be fixed now.

Compliance with rOpenSci Packaging Guide

Contributing
  • No contributing.md or anything about contributing in the README. However, it does include a good code of conduct.

Added

You can see more about my review here

Guidance for contributors

If one creates a fork of a package, like I've done now for codemetar, and modifies the DESCRIPTION to add a contributor or dependency, one will have to update codemeta.json, because it's best practice and also because of the pre-commit hook, cf. #59.

But if one does so, the codemeta.json will use the GitHub links of the fork.

Would it make sense to add some guidance about this to the docs, and maybe some code? My ideas:

  • Have an option "doc_fork" or so in create_codemeta, FALSE by default.

  • It'd only change GH links if TRUE; otherwise it'd leave them as they are. This way, if one decides to document a fork, one would need to force it, and if one just wants to update codemeta.json in the context of a pull request, one wouldn't need to worry about the whole thing.

Generate review rOpenSci metadata

@noamross Really like your suggestion about including a bit more about the review. Trying to think how best to do this using existing vocabularies, since it makes interoperability of data so much easier.

"review":  {
   "@type": "Review",
   "url": "https://github.com/ropensci/onboarding/issues/130",
   "provider": "http://ropensci.org"
                  }

I'm not sure what the right field is to indicate the status of the review (e.g. in review, accepted, etc.); maybe status would be the term? Note that http://schema.org/Review defines the property reviewRating, but that's obviously not the context we really have in mind here.

I've indicated ropensci as the provider (could be more verbose and indicate that ropensci is an organization); note that it would be natural to include the review author & editor here too (not clear if you'd list all, and of course that would be harder to scrape from the svg badge...).

Wanted to cc @mfenner on this too, Martin, any thoughts at a common vocabulary for describing reviews of scholarly works? Has this come up at all on your end?

distinguish between write_codemeta & create_codemeta more clearly

The write_codemeta() function should have a fuller "Description" statement, but is otherwise sufficient. Specifically, it should explain how its usage differs from create_codemeta().

We could better distinguish between these two in the function descriptions. (write writes an external file, create creates an R object)
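For instance, the function documentation could open with a two-step example along these lines (a sketch, assuming a working package directory):

```r
library("codemetar")

# create_codemeta() returns an in-memory R object (a list) that you can
# inspect and modify before anything touches disk:
cm <- create_codemeta(".")
cm$keywords <- list("metadata", "ropensci")

# write_codemeta() serializes that object to an external codemeta.json file:
write_codemeta(cm)
```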

Outdated installation instructions in README

The installation instructions say:

# install.packages("devtools")
devtools::install_github("codemeta/codemetar")

Perhaps you mean, "ropensci/codemetar"?

Also, since the package is on CRAN now, you might want to add CRAN installation instructions.
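Concretely, the README could show both routes (assuming the repository stays at ropensci/codemetar and the package remains on CRAN):

```r
# Install the released version from CRAN:
install.packages("codemetar")

# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("ropensci/codemetar")
```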

Framing vs. data-framing

As a non-native speaker, when I see "frame" I mostly think of data.frames. Maybe I'm not the only one reasoning this way. 🙈

I'd suggest adding a few lines about this at the beginning of the parsing vignette, explaining that one uses the JSON-LD concept of framing, via a JSON-LD frame, to put the data into a data.frame. A flowchart might even be useful.
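For example, the vignette could open with a minimal illustration of the two senses of "frame" (a sketch using the jsonld and jsonlite packages; the document and frame shown here are illustrative):

```r
library("jsonld")
library("jsonlite")

doc <- '{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "name": "codemetar"
}'

# A JSON-LD *frame* reshapes the graph into a predictable JSON layout:
frame <- '{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode"
}'

framed <- jsonld_frame(doc, frame)

# ...and only that regular JSON is then turned into a data.frame:
df <- fromJSON(framed)
```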

Define workflow for updating a codemeta.json document

It's not obvious how a user should best maintain their codemeta.json document. The workflow is pretty simple if the user adds no additional terms beyond what write_codemeta() guesses from README and DESCRIPTION file. In this case, a user should simply re-run write_codemeta() prior to any git tag release / CRAN release, etc.

Things are less obvious, though, if we start adding extra metadata (such as keywords) to the codemeta.json, as illustrated in the README. Then, once, say, the DESCRIPTION file is updated, what is the best workflow to update codemeta.json to capture these changes (e.g. a new contributor or a new dependency) without losing the additional elements of codemeta.json?

One could have an R script do this before release, e.g. an update_meta.R script reading:

library("codemetar")
cm <- create_codemeta(".")
cm$keywords <- list("metadata", "ropensci")
write_codemeta(cm)

but it seems a bit clumsy to have such an R script be the canonical document to update keywords. An ideal workflow would allow a user to edit the codemeta.json by hand, which is perhaps the most obvious way to extend the metadata.

Ideally codemetar could parse the DESCRIPTION and detect and adopt any changes in the resulting fields relative to any existing codemeta.json, but would otherwise defer to the existing metadata rather than lose it. (Should just require starting with any existing codemeta.json)

DOI-based context file is causing errors now

Not sure why, but sometime in the last week, any code that needs to perform JSON-LD operations using the DOI-based context file seems to fail. (e.g. codemeta::crosswalk() functions).

@mbjones @gothub Would you know if anything changed with the handling of the DOI header or whatnot? I also don't think we ever got that working with DOI-based context files on the JSON-LD playground, but it was definitely working in R a week ago (as evidenced by that Travis log!).

Consider parsing a CITATION file in the Package?

Some R packages will provide additional citation information to a Software paper in a dedicated CITATION file, which R knows to parse when asked for citation("packageName"). Ideally, write_codemeta() should check for such a file and automatically parse it for the relatedPublication field.
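A sketch of what that check could look like internally (utils::readCitationFile() is R's own parser for CITATION files; the mapping to relatedPublication entries and the helper name are hypothetical):

```r
# Hypothetical helper: read inst/CITATION and map each entry to a
# simplified schema.org ScholarlyArticle for the relatedPublication field.
citation_to_related_publication <- function(pkg_dir) {
  citation_file <- file.path(pkg_dir, "inst", "CITATION")
  if (!file.exists(citation_file)) return(NULL)

  entries <- utils::readCitationFile(citation_file)

  lapply(entries, function(entry) {
    list("@type" = "ScholarlyArticle",
         name = entry$title,
         datePublished = entry$year)
  })
}
```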

what to do about Dates

DESCRIPTION files of packages in development tend not to have dates. Installed packages do have dates:

cm <- create_codemeta("testthat")
cm$datePublished
#> "2016-04-23 08:37:40"

If a user is generating the codemeta.json in the development version (e.g. with create_codemeta(".")), do we want to just use the current date? In general it would be nice to have at least one date in the metadata to have an idea of how old the package / metadata is.

Ideally we'd also keep track of things like dateCreated (e.g. in the DESCRIPTION file?), though I suppose technically that would refer to the date the version was created, rather than the package project as a whole?
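One possible fallback, sketched here (the helper name is hypothetical): use the DESCRIPTION Date field when present, otherwise record today's date for a development checkout, so the metadata at least captures when it was generated:

```r
# descr: a named character vector for one package, as returned by read.dcf()
guess_date_published <- function(descr) {
  if ("Date" %in% names(descr)) {
    descr[["Date"]]
  } else {
    # Development sources usually have no Date field; fall back to the
    # current date so the metadata records when it was generated:
    format(Sys.Date())
  }
}
```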

Unable to generate .zenodo.json from codemeta.json

I get this error when I attempt to convert a codemeta file (generated from a package DESCRIPTION) to Zenodo format:

library("codemetar")

crosswalk(
  x = "https://github.com/ecohealthalliance/fasterize/blob/master/codemeta.json",
  from = "codemeta-V1",
  to = "Zenodo")
#> Error in context[[1]][[original_term]]: no such index at level 1
Session info
devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.4.4 (2018-03-15)
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  tz       America/New_York            
#>  date     2018-03-23
#> Packages -----------------------------------------------------------------
#>  package   * version    date       source                            
#>  backports   1.1.2      2017-12-13 CRAN (R 3.4.3)                    
#>  base      * 3.4.4      2018-03-15 local                             
#>  codemetar * 0.1.4      2018-02-12 CRAN (R 3.4.3)                    
#>  compiler    3.4.4      2018-03-15 local                             
#>  curl        3.1        2017-12-12 CRAN (R 3.4.3)                    
#>  datasets  * 3.4.4      2018-03-15 local                             
#>  devtools    1.13.5     2018-02-18 CRAN (R 3.4.3)                    
#>  digest      0.6.15     2018-01-28 CRAN (R 3.4.3)                    
#>  evaluate    0.10.1     2017-06-24 CRAN (R 3.4.1)                    
#>  git2r       0.21.0     2018-01-04 CRAN (R 3.4.3)                    
#>  graphics  * 3.4.4      2018-03-15 local                             
#>  grDevices * 3.4.4      2018-03-15 local                             
#>  hms         0.4.1      2018-01-24 CRAN (R 3.4.3)                    
#>  htmltools   0.3.6      2017-04-28 CRAN (R 3.4.0)                    
#>  jsonld      1.2        2017-04-11 cran (@1.2)                       
#>  jsonlite    1.5        2017-06-01 CRAN (R 3.4.0)                    
#>  knitr       1.20       2018-02-20 cran (@1.20)                      
#>  magrittr    1.5        2014-11-22 CRAN (R 3.4.0)                    
#>  memoise     1.1.0      2017-04-21 CRAN (R 3.4.0)                    
#>  methods   * 3.4.4      2018-03-15 local                             
#>  pillar      1.2.1      2018-02-27 CRAN (R 3.4.3)                    
#>  pkgconfig   2.0.1      2017-03-21 CRAN (R 3.4.0)                    
#>  R6          2.2.2      2017-06-17 CRAN (R 3.4.0)                    
#>  Rcpp        0.12.16    2018-03-13 cran (@0.12.16)                   
#>  readr       1.1.1      2017-05-16 CRAN (R 3.4.0)                    
#>  rlang       0.2.0.9000 2018-03-15 Github (tidyverse/rlang@1b81816)  
#>  rmarkdown   1.9.5      2018-03-21 Github (rstudio/rmarkdown@b73f4ce)
#>  rprojroot   1.3-2      2018-01-03 CRAN (R 3.4.3)                    
#>  stats     * 3.4.4      2018-03-15 local                             
#>  stringi     1.1.7      2018-03-12 cran (@1.1.7)                     
#>  stringr     1.3.0      2018-02-19 cran (@1.3.0)                     
#>  tibble      1.4.2      2018-01-22 CRAN (R 3.4.3)                    
#>  tools       3.4.4      2018-03-15 local                             
#>  utils     * 3.4.4      2018-03-15 local                             
#>  V8          1.5        2017-04-25 CRAN (R 3.4.3)                    
#>  withr       2.1.2      2018-03-15 CRAN (R 3.4.4)                    
#>  yaml        2.1.18     2018-03-08 cran (@2.1.18)

The error occurs in get_crosswalk_context, and I think it happens when the loop hits NA values in the df (relatedIdentifier). Note also that dateCrated should probably be dateCreated:

Property codemeta-V1
codeRepository codeRepository
programmingLanguage programmingLanguage
downloadUrl downloadLink
operatingSystem operatingSystems
softwareRequirements depends
author agents
citation relatedLink
copyrightHolder agents [role=copyrightHolder]
creator agent
dateCreated dateCrated
dateModified dateModified
datePublished datePublished
keywords controlledTerms
license licenseId
publisher publisher
version version
description description
identifier identifier
name name
url URL
email email
affiliation affiliation
identifier identifier
softwareSuggestions suggests
maintainer uploadedBy
contIntegration contIntegration
buildInstructions buildInstructions
developmentStatus developmentStatus
embargoDate embargoDate
funding funding
issueTracker issueTracker
referencePublication relatedPublications
readme readme
NA relatedIdentifer
NA relatedIdentiferType
NA relationshipType
NA title
NA namespace
NA role
NA roleCode
NA softwarePaperCitationIdenifiers

I also note that finding that my from argument had to be "codemeta-V1" took some digging: it's not in any of the vignettes or docs, and I had to look at the raw crosswalk table. (Likewise, it took digging to find that "Zenodo" had to be capitalized.) Perhaps this could be made visible with a function like crosswalks_available(), which just lists the table column headers and is prominent throughout vignettes and docs?
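The proposed helper could be as simple as reading the column names off the crosswalk table (crosswalks_available() is hypothetical, and the CSV location shown is an assumption about where the codemeta project publishes it):

```r
# Hypothetical helper: list the crosswalk column names that are valid
# values for the `from` and `to` arguments of crosswalk().
crosswalks_available <- function() {
  url <- paste0("https://raw.githubusercontent.com/codemeta/codemeta/",
                "master/crosswalk.csv")
  crosswalk_table <- read.csv(url, check.names = FALSE)
  # Each non-property column is a named crosswalk, e.g. "codemeta-V1", "Zenodo"
  names(crosswalk_table)
}
```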

Typo in the A-codemeta-intro.Rmd file

Line 113 reads:

... Almost any additional codemeta field (see codemetar:::additional_codemeta_terms for a list) and can be added to and read from the DESCRIPTION into a codemeta.json file.

This sentence is not a proper sentence as far as I can tell. Perhaps you meant to remove the "and"?

selecting the software identifier

@mbjones @gothub Just wanted to get your input on the choice of software identifier when generating codemeta.json from an R DESCRIPTION file. According to our crosswalk, the identifier for an R package is the Package field of the DESCRIPTION; e.g. the package name (without a version). Maybe we already went over this, but it seems like this isn't ideal -- e.g. assuming the package is archived on KNB or Zenodo, I'd expect this to be the DOI.

In the codemetar implementation, I've allowed the function write_codemeta() to take an id as an argument. (Currently it's generating a UUID, which is probably silly -- perhaps the default should be the package name, unless a DOI is provided?) As always, there's a bit of a catch-22 in this process, in that we'd like to generate codemeta.json before uploading to Zenodo / KNB, but in the simplest workflow we only get the DOI after the upload is complete (Though I think you can reserve the DOI ahead of time with KNB; right? so in that case at least the package could handle this negotiation automatically I suppose). Thoughts?
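In code, the suggested default could look something like this (a sketch; select_identifier() is hypothetical, and the DOI shown is a placeholder):

```r
# Hypothetical default: prefer a DOI when the caller has one (e.g. reserved
# ahead of time on KNB or Zenodo); otherwise fall back to the package name
# rather than generating an arbitrary UUID.
select_identifier <- function(pkg_name, doi = NULL) {
  if (!is.null(doi)) doi else pkg_name
}

select_identifier("codemetar")
#> [1] "codemetar"
select_identifier("codemetar", doi = "10.5281/zenodo.XXXXXX")
#> [1] "10.5281/zenodo.XXXXXX"
```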

Terms in Codemeta context not (yet) implemented in codemetar parser:

Support all additional / arbitrary codemeta fields via DESCRIPTION file? Easy for all top-level types, though additional author-level metadata (affiliation, address) is more tricky to add in this way.

Here are all the properties in the codemeta context, indicating which ones we do and do not have a mechanism for specifying via codemetar:

  • type (SoftwareSourceCode)
  • id
  • Organization (guessed from name parsing)
  • Person,
  • address -- would be nice
  • affiliation -- would be nice
  • applicationCategory -- would be nice
  • applicationSubCategory -- would be nice
  • citation -- would be nice
  • codeRepository,
  • contributor,
  • copyrightHolder,
  • copyrightYear,
  • creator,
  • dateCreated, -- would be nice
  • dateModified -- would be nice
  • datePublished
  • description,
  • downloadUrl, -- would be nice
  • email,
  • editor, -- not a priority
  • encoding, -- not a priority
  • familyName,
  • fileFormat, -- not a priority
  • fileSize,
  • funder -- would be nice
  • givenName,
  • hasPart,
  • identifier,
  • installUrl,
  • isAccessibleForFree, -- not a priority
  • isPartOf
  • keywords,
  • license,
  • memoryRequirements,
  • name,
  • operatingSystem,
  • permissions, -- not a priority
  • position, -- not a priority
  • processorRequirements, -- not a priority
  • producer, -- not a priority
  • programmingLanguage,
  • provider,
  • publisher -- would be nice
  • funding -- would be nice
  • relatedLink -- would be nice
  • releaseNotes,
  • runtimePlatform,
  • sameAs -- would be nice
  • softwareHelp -- would be nice
  • softwareRequirements,
  • softwareVersion -- use version instead
  • sponsor -- not a priority
  • storageRequirements, -- not a priority
  • supportingData -- would be nice
  • targetProduct, -- not a priority
  • url,
  • version,
  • author,
  • softwareSuggestions,
  • contIntegration,
  • buildInstructions,
  • developmentStatus,
  • embargoDate
  • readme,
  • issueTracker,
  • referencePublication,
  • maintainer

URLs badly formed

In the latest version installed from GitHub (at this commit https://github.com/ropensci/codemetar/tree/c371dbbeb080992a1cfd596cb0354c983d6f285e ), I was getting bad URLs, at least these two. Here is the diff from running codemetar::write_codemeta() in the fulltext base dir:

-  "releaseNotes": "https://github.com/ropensci/fulltext/blob/master/NEWS.md",
-  "readme": "https://github.com/ropensci/fulltext/blob/master/README.md",
-  "fileSize": "3775.796KB"
+  "releaseNotes": "https://github.com/ropensci/fulltext.git/blob/master/NEWS.md",
+  "readme": "https://github.com/ropensci/fulltext.git/blob/master/README.md",
+  "fileSize": "3776.894KB",
+  "developmentStatus": "active"
 }

Unless the .git is supposed to be there for some reason?
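The .git suffix looks like it comes from the git remote URL; a one-line fix would be to strip it before building the derived links (a sketch, not necessarily where codemetar constructs these URLs):

```r
# Remote URL as reported by git:
remote <- "https://github.com/ropensci/fulltext.git"

# Drop a trailing ".git" before constructing releaseNotes/readme links:
base_url <- sub("\\.git$", "", remote)
paste0(base_url, "/blob/master/README.md")
#> [1] "https://github.com/ropensci/fulltext/blob/master/README.md"
```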
