
social-science-data-editors / guidance


Guidance by Data Editors

Home Page: https://social-science-data-editors.github.io/guidance/

License: Other

HTML 94.71% TeX 4.99% Stata 0.13% SAS 0.02% SCSS 0.07% Ruby 0.07% Shell 0.02%


guidance's Issues

Bug in link to Danish registers.

In some cases, governments publish lists of their (named) registers. For instance, Statistics Denmark provides the full list of registers at [http://www.dst.dk/extranet/forskningvariabellister/Oversigt%20over%20registre.html](http://www.dst.dk/extranet/forskningvariabellister/Oversigt%20over%20registre.html). These can be used to craft data citations, for instance

Talk about author contributions

See https://journals.plos.org/plosbiology/s/authorship#loc-author-contributions and https://casrai.org/credit/. While author contribution statements are not typical in the social sciences (in particular in economics), they might warrant a discussion.

Cases where data (and code) authorship differs from paper authorship are not at all common in economics. One example I can think of is https://www.aeaweb.org/articles?id=10.1257/pol.20170704, with replication package https://doi.org/10.3886/E110642V1 explicitly single-authored by one of the authors. I've also seen some papers (need to dig out references) where the GitHub repository with the code might be "Author A + RA" whereas the paper is "Author A + Author B".

The generic guidance I have in mind is https://casrai.org/credit/, which has been adopted by some journals, such as PLOS https://journals.plos.org/plosbiology/s/authorship#loc-author-contributions, where it applies to the paper. If considering the data + the code as distinct first-class research objects, the CRediT taxonomy directly applies there as well, and there's nothing that requires that the two be the same.

Data citations access date clarification

Question received:

For the access date for cited datasets, what is the definition of “access date”? I believe it is the date the data were actually accessed. Is this correct? The wording of the question was whether it is "the dates we first accessed the actual data itself” or, instead, “the date we accessed the websites describing the data.” I believe it is the former.

It is, in fact, the former.

Need to add this to relevant FAQ and guidance.

Suggested improvement: filter-branch for splitting repos

I read this comment in the FAQ:

If not, then what I suggest is to do the following:

  • clean up the repo (possibly in a branch)
  • on GitHub, there is no way to fork to your own space, and a fork would carry the entire history anyway. So this assumes manual interaction (I'm going to assume you use the command line for this; this works in git-bash, or bash on Linux/OSX).
  • create a new clone of your (now cleaned) repo, and switch to the clean branch:

```
git clone (WHATEVER)
cd whatever
# switch to the cleaned branch ("git branch" alone would only create it)
git checkout cleaned
```

  • now wipe out all git information:

```
rm -rf .git
```

  • create a new repo:

```
git init
```

  • add all files:

```
git add *
```

I think there are two things going on here, which can be handled differently:

  1. The author wishes to wipe the commit history (e.g. for privacy reasons). Then the advice above is most germane.
  2. The author keeps several papers related to an ongoing or long-running project in a single repo, and wants to isolate the code for a tidy submission alongside an "offshoot paper".

In the latter case, I'd recommend suggesting the git filter-branch approach, which is essentially a way to split off a subset of a repo into a new repo (e.g. a subdirectory paper-1). It will conveniently inherit the commit history as well. See guide:

https://help.github.com/en/articles/splitting-a-subfolder-out-into-a-new-repository

I think it would help if the FAQ delineated these two competing goals and stated more clearly the differences/approaches here.
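The second case can be sketched as follows. This is a minimal, self-contained illustration rather than the FAQ's own procedure: it builds a throwaway repository with two hypothetical subdirectories (`paper-1`, `paper-2`, each with a hypothetical `main.do`), then splits `paper-1` into its own repository using the `--subdirectory-filter` option described in the GitHub guide linked above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a throwaway demo repository so the sketch is self-contained.
# Directory and file names (paper-1, paper-2, main.do) are hypothetical.
workdir="$(mktemp -d)"
git init --quiet "$workdir/project"
cd "$workdir/project"
git config user.name "Demo Author"
git config user.email "demo@example.com"

mkdir paper-1 paper-2
echo 'display "paper 1"' > paper-1/main.do
echo 'display "paper 2"' > paper-2/main.do
git add .
git commit --quiet -m "Add code for both papers"

# Work on a fresh clone so the original repository stays untouched.
git clone --quiet "$workdir/project" "$workdir/paper-1-only"
cd "$workdir/paper-1-only"

# Keep only the history that touches paper-1/ and make that folder the new root.
# The environment variable silences the advisory notice printed by newer Git versions.
FILTER_BRANCH_SQUELCH_WARNING=1 \
  git filter-branch --prune-empty --subdirectory-filter paper-1 HEAD

# List what is left in the split-off repository.
split_files="$(git ls-files)"
echo "$split_files"
```

After the rewrite, the clone should contain only the files that lived under `paper-1/`, together with the commits that touched them. Recent Git versions also point to the separate `git-filter-repo` tool as an alternative to `filter-branch`; it is not used here because the linked guide and this issue discuss `filter-branch`.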

where do data citations go

In the template, we should make clearer where data citations go. Authors often send me a README that is very good, but none of them put data citations in the References. Lars explained: in the data availability statement (DAS), the author explains how to get access to the data and cites it, but the full reference then goes in the reference list. I don't think this is explicit enough.

Example:
"Trade data for 1974-2000 were downloaded from the NBER-UN World Trade Flows dataset (Feenstra & Lipsey, 2005) originally created by Feenstra et al. (2005). Data can be directly downloaded using https://cid.econ.ucdavis.edu/nberus.html. We use files wtf74 through wtf00. The data are in the public domain."

And then later in the references:

Feenstra & Lipsey. 2005. "NBER-United Nations Trade Data 1962-2000". Center for International Trade, UC Davis [distributor]. https://cid.econ.ucdavis.edu/nberus.html accessed on 2021-03-24.

Add example for anonymous source to data citation guidance

In a case where the authors cannot name the company, you would cite as follows:

"Anonymous Firm (DATE OF FILE CREATION)"

and have in the references the following entry:

Anonymous Firm. (DATE OF FILE CREATION), Property Insurance Claims. Accessed via Cornell Restricted Access Data Center (CRADC). Last accessed on (DATE).

where

  • Author = "Anonymous Firm"
  • Title (of dataset) = "Property Insurance Claims"
  • Distributor = the secure access center where the data reside/were accessed = "CRADC"
  • Date = date the data were created (authors should know that)
  • Version = in this case, proxied by "Last accessed on (DATE)"

Add guidance on citing your own data

Add to the guidance on Citing Data and Code:

Data is published

The DOI is thus public, and all repositories will provide a suggested citation. One can also use https://www.doi2bib.org/ or https://citation.crosscite.org/ to get a citation.
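A formatted citation can also be requested from the command line through DOI content negotiation, which resolvers for Crossref and DataCite DOIs support. The sketch below assumes `curl` is installed and network access is available; `fetch_citation` is a hypothetical helper name, and the DOI in the usage comment is the replication package mentioned in the author-contributions issue above.

```shell
#!/usr/bin/env bash

# Ask the DOI resolver for a formatted citation via content negotiation.
# fetch_citation is a hypothetical helper; the citation style defaults to APA.
fetch_citation() {
  local doi="$1" style="${2:-apa}"
  curl --silent --location --fail \
    --header "Accept: text/x-bibliography; style=${style}" \
    "https://doi.org/${doi}"
}

# Example calls (require network access, so they are left commented out):
#   fetch_citation 10.3886/E110642V1
#   fetch_citation 10.3886/E110642V1 chicago-author-date
```

The returned text is only a starting point: it should still be checked against the repository's own suggested citation before it goes into the reference list.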

Data is not published

This is trickier. The data deposit does not necessarily have a title that is related to the paper. Some repositories allow authors to "reserve" a DOI (Zenodo) or to delay publication. For some repositories, the DOI, while not officially reserved, can be derived from information already available (see this FAQ for openICPSR; something similar may be possible at Dataverse).
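As an illustration of deriving a DOI before publication, the sketch below assumes that openICPSR DOIs follow the pattern seen in the replication package quoted earlier on this page (`10.3886/E110642V1`), that is, a fixed prefix, the project number, and a version number. That pattern is an assumption to be checked against the openICPSR FAQ, and `openicpsr_doi` is a hypothetical helper name.

```shell
#!/usr/bin/env bash

# Build the presumed DOI for an openICPSR deposit.
# Assumed pattern: 10.3886/E<project number>V<version>; verify against the FAQ.
openicpsr_doi() {
  local project="$1" version="${2:-1}"
  printf '10.3886/E%sV%s\n' "$project" "$version"
}

# Project number taken from the example DOI quoted earlier on this page.
doi="$(openicpsr_doi 110642 1)"
echo "https://doi.org/${doi}"
```

A DOI derived this way will not resolve until the deposit is actually published, so it is best treated as a placeholder in the manuscript until then.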

In some cases, authors may be able to delay publication, and coordinate it with the publication of the article (openICPSR, possibly Zenodo).

In all cases

The data deposit should be cited in the main manuscript, and referenced in the data availability statement (some journals) or the README (other journals).

Add distinction between repositories with anonymous pre-publication access and those that do not have that capability

Nature's list of repositories is the only one [...] that a requirement that the repository support anonymous sharing for the purpose of double-blind peer review. [...] The discovery of such an existing list of repositories was very helpful for me in drafting our own data policy because many journals and publishers that are currently requiring data sharing either (a) use single-blind or transparent peer review or (b) only require data and code sharing at the publication stage, not at the submission stage.
