
jeromyanglim commented on August 11, 2024

Journals make it difficult

  • Some journals do not accept LaTeX or PDF (i.e., the typical output formats of reproducible journal article submissions)
  • Some journals have very specific style and formatting requirements that are difficult to comply with using relatively automated LaTeX-style approaches.
  • The infrastructure for archiving and sharing such reproducible research documents is not in place, or it is not obvious where to find it or how to use it.

Lack of knowledge of how to perform reproducible research

  • Researchers may be trained in other workflows that fall short of complete reproducibility. (Reproducibility is a matter of degree.) For instance, by my observation, many researchers in psychology and the social sciences adopt a workflow that combines GUI-centric data analysis software such as SPSS with GUI-based word processors such as MS Word.
  • Related to this problem is a lack of examples and training material on how to implement reproducible research in a discipline-specific fashion.

It creates more work

  • It is a lot of work to polish a journal article, and opening up the internal workings of a journal article may require those workings to have the same degree of polish.
  • Also, more broadly, complete automation of steps like formatting of tables, numbers, and graphs can be quite time consuming, particularly while you are still learning reproducible research tools (a minimal sketch of what such automation looks like follows this list).
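
To make the automation concrete, here is a minimal R sketch of the kind of scripted table and figure generation that an R Markdown / knitr workflow embeds in a document. The data file and column names are hypothetical placeholders; the functions used (read.csv, knitr::kable, ggplot2) are standard.

```r
# Packages for formatted tables and scriptable figures
library(knitr)    # kable(): renders a data frame as a formatted table
library(ggplot2)  # figures regenerate automatically when the data change

# Hypothetical raw data file; substitute your own
dat <- read.csv("data/raw_responses.csv")

# Summary table built by code rather than copy-and-paste
summary_tab <- aggregate(score ~ condition, data = dat,
                         FUN = function(x) round(mean(x), 2))
kable(summary_tab, col.names = c("Condition", "Mean score"))

# A figure produced by the same script, so it always matches the data
ggplot(dat, aes(x = condition, y = score)) +
  geom_boxplot() +
  labs(x = "Condition", y = "Score")
```

Because every number, table, and figure is generated by code, rerunning the script after a data correction updates the whole document; the up-front cost is learning the tooling.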

There are no incentives

  • There are often minimal incentives to share reproducible research.

There are a few exceptions:

  • Some grants encourage or even notionally require data sharing.
  • A couple of journals encourage sharing of reproducible research repositories.
  • You could argue that other researchers are more likely to build on, and cite, research that allows them to examine the raw data and the analysis steps.

I also feel that it is not enough to simply share a repository; it is important to make the repository user-friendly.
User-friendly could mean:

  • Sharing the repository in a way that makes it easy to obtain (e.g., findable from a Google search or via a clear link from the journal article; ideally a single click to obtain the repository; no paywall; no log-in required)
  • Allowing web navigation of the repository (e.g., as on GitHub)
  • Providing documentation of the elements of the repository and how it can be used
  • Providing adequate information on permitted re-uses (i.e., licensing)
  • Making it clear what software needs to be installed to run the code (a minimal install script along these lines is sketched after this list)
  • Using open source, cross-platform, and popular software is also an advantage
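
On the software-requirements point, a short script at the top of the repository can both install dependencies and record the versions used. This is only a sketch; the package names are placeholders, but the functions are base R.

```r
# install_dependencies.R -- run once before reproducing the analysis.
# The package list is illustrative; list whatever your code actually uses.
required <- c("knitr", "ggplot2")
missing  <- setdiff(required, rownames(installed.packages()))
if (length(missing) > 0) install.packages(missing)

# Record the exact software versions the analysis was run under,
# so readers can diagnose version-related discrepancies.
writeLines(capture.output(sessionInfo()), "sessionInfo.txt")
```

Committing the resulting sessionInfo.txt alongside the code gives later readers a fighting chance of rebuilding the original environment.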

Deprivation of future papers

  • Some datasets yield multiple papers. Researchers may fear that publishing the raw data will allow another researcher to take the data and publish a paper based on it.
  • Even if this is unlikely, and even if the researcher has no specific plans to publish anything more, it may still be a concern.

Fear of making it too easy for the competition

I see science as a collaborative process. One of the major benefits of reproducible research is that it helps others see exactly how to analyse research data of a given sort.

However, it is possible that some researchers might see this as a negative thing as they seek to be a dominant figure in a particular area.

Some analysis software makes automation difficult or impossible

  • Some data analysis software does not have a scripting language that would permit incorporation into an automated reproducible workflow.
  • Some data analysis software does have a scripting language, but it differs substantially from the primary interface, and thus requires substantial investment in order to learn it.
  • Some data analysis software provides a poor interface for automating extraction of content.

Naturally, this raises the question of why anyone would use "un-automatable" software. However,

  • The researcher may be trained in the software that can't be automated.
  • Such software may have features unavailable in software that can be automated.
  • Software that can't be automated may be more user friendly for the researcher.

Fear of a mistake being publicly identified

  • In a reproducible data analysis, not only is the finished product on display, but so is much of the inner workings. If an error did occur in the statistical analysis, this can much more readily be identified by others than if only summary statistics are provided.
  • There have been various high-profile cases of fraudulent data analysis. Some of these involved creating data that did not exist. Another involved selectively deleting data from groups until statistical significance was achieved. In both kinds of case, detection would have been relatively trivial had the researchers supplied a reproducible research document with raw data. While this is a great argument for requiring researchers to supply reproducible research documents, it is also a reason why researchers might not want to provide them.

There is a wide spectrum of data analytic misconduct. If we take a legal perspective, we can think of different kinds of intentions (intentional, reckless, negligent) and consequences (how consequential was it to the paper's findings, etc.).

I have heard advocates of open source software state that one reason open source software is better than proprietary software is that the code is on display to the community. A similar process could plausibly operate in a reproducible data analysis context: researchers would be more inclined to adopt workflows and procedures that keep their analyses clean and tidy, and more likely to incorporate quality control procedures that check for possible errors (of the kind sketched below).
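
As an example of such in-line quality control, an analysis script can assert properties the data must satisfy before any statistics are computed. A minimal sketch, with hypothetical file and column names:

```r
# Sanity checks that halt the analysis early if the data look wrong.
dat <- read.csv("data/raw_responses.csv")  # hypothetical raw data file

stopifnot(
  nrow(dat) > 0,                            # data actually loaded
  !any(is.na(dat$score)),                   # no missing outcome values
  all(dat$score >= 0 & dat$score <= 100),   # scores within the valid range
  all(table(dat$condition) >= 2)            # every condition has cases
)
```

When such checks are part of a shared, reproducible script, readers can see not only the analysis but also the error-detection discipline behind it.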

It would be interesting to see how journals deal with the potential increase in errata that might emerge. At present, while journals permit the incorporation of errata, publishing one generally seems to be a fairly big deal. In contrast, software is often framed as a work under development, where bugs are identified and gradually fixed. Admittedly, in some respects journal articles are more static in their scope and application than software projects are.

Ethical concerns related to data sharing

  • This mainly applies to sharing raw data. Maintaining absolute anonymity can be challenging: even if obvious identifying information, such as names, phone numbers, email addresses, and home addresses, is removed, there are many ways that data can be de-anonymised.
  • Even in situations where it seems likely that the data cannot easily be de-anonymised, and even if the data itself is not sensitive, researchers may still be worried about the faint possibility that it might be.

Compliance with ethics committees

  • In theory, satisfying ethical concerns and complying with ethics committees would be the same thing. In practice, however, there may be situations where it is ethically acceptable to share anonymised raw data, yet it is more work to justify this to an ethics committee. Research might be delayed if the committee questions the data sharing policy, so there is an incentive to take the easy option of stricter control over data sharing.
  • In other cases, ethics applications may be completed quickly without consideration of data sharing aims. It may then be very difficult to share the data later, once participants have completed consent forms whose provisions limit data sharing.

Limitations imposed by collaboration

  • Even if some researchers understand reproducible data analysis, if a collaborator does not, it may be deemed easier to fall back on the common language of word processors.

Copyright restrictions

In some instances, sharing various algorithms or metadata may be prohibited by copyright restrictions.

  • Item text for many psychological tests is copyrighted, preventing the sharing of this information.
  • In some cases the copyright status of scientific content may be ambiguous, as often seems to be the case with psychological tests that have been published in scientific journals. Thus, there may be a decision to err on the side of caution.

from rmarkdown-rmeetup-2012.
