
acmsigsoft / EmpiricalStandards


Tools and standards for conducting and evaluating research in software engineering

Home Page: https://acmsigsoft.github.io/EmpiricalStandards/

License: Creative Commons Zero v1.0 Universal

Languages: JavaScript 78.97%, HTML 19.55%, CSS 1.48%
Topics: empirical-standards, research, sigsoft, softwareengineering, standards

empiricalstandards's People

Contributors

arhamarshad1, captainemerson, cassandra-cupryk, david-istvan, dfucci, drpaulralph, eschltz, fernandocastor, fzieris, guerzh, iivanoo, jeffcarver, koppor, luiscruz, melvidoni, mrksbrg, msallin, nzjohng, rahuljyu, sbaltes, shravs7455, slingerbv, soerenhenning, steghoja, taher-ghaleb, terry-one, timm, timmenzies, tushartushar, whasselbring


empiricalstandards's Issues

Custom follow-ups and error flows

As an editor (of the standards)

I want to write custom follow-up questions for some, but not all, essential criteria,

So that I can better guide reviewer decision-making.

For example, if a reviewer clicks 'no' to "discusses implications of the results", right now the system asks "is this deviation reasonable?", but I would like it to display a text box where the reviewer can list important limitations that are missing or explain why the limitations listed are incorrect. This is just an example. What I'm after is a simple way of specifying these follow-up questions (e.g., in markdown files) without hardcoding them in JavaScript.
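
One possible shape for this, sketched below: keep the custom follow-up text next to its criterion in a markdown file and have the checklist generator read it at build time. The file name, section format, and function are hypothetical, not part of the current codebase.

    // Hypothetical sidecar file, e.g. followups/GeneralStandard.md:
    //
    //   ## discusses implications of the results
    //   If 'no': show a text box asking the reviewer to list important
    //   limitations that are missing or explain why the listed limitations
    //   are incorrect.
    //
    // Minimal loader sketch (assumed, not the current implementation):
    function parseFollowUps(markdownText) {
      var followUps = {};
      var current = null;
      markdownText.split("\n").forEach(function (line) {
        var heading = line.match(/^##\s+(.+)$/);
        if (heading) {
          current = heading[1].trim();
          followUps[current] = "";
        } else if (current && line.trim()) {
          followUps[current] += (followUps[current] ? " " : "") + line.trim();
        }
      });
      return followUps; // maps criterion text -> custom follow-up prompt
    }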

Add "too few participants" as an invalid criticism to experiments with human participants

In the “Experiments (with Human Participants)” section, I suggest adding as an invalid criticism “too few participants unless there is an objection to the statistical analysis or participant selection process.” Too often, I see reviewers eyeballing the number of participants in studies and complaining that it doesn’t look like enough participants. Presumably the reviewers are trying to address external validity, but the way to address that is either via objections to the statistical analysis or results, or objections to the participant pool or selection process.

Potential improvements to case study standard

The case study standard doesn't say much about theory. Is it desirable for a case study to do one or more of the following?

  • generate a theory
  • base its coding scheme on a theory, or
  • test an a priori theory

Update Subquestions for review checklist

As a reviewer in a two-phase review process,
I want to indicate the "type" of unreasonable deviations
So that the editor can distinguish minor revisions from major revisions.

The types are as follows:

Type 1: can be fixed by editing text only; e.g. clarifying text, adding references, changing a diagram, describing an additional limitation, copyediting.

Type 2: can be fixed by doing some new data analysis, redoing some existing data analysis, or collecting a small amount of additional data (e.g. going back to one interviewee, collecting some additional primary studies for a systematic review).

Type 3: can be fixed by completely redoing data analysis, OR by collecting additional data (e.g. conducting new or additional experiments or case studies; several new interviews, one or more additional rounds of questionnaire data collection).

Type 4: unacceptable conduct (e.g. plagiarism, p-hacking, HARKing, unethical data collection) OR problems that cannot be fixed without doing a brand new study (e.g. fundamentally invalid measures, data collection or analysis insufficient by an order of magnitude, no chain of evidence whatsoever from data to conclusions).

Also say something like: "Pick the largest number that applies."

This story will entail changes to the appearance of the checklist (to be discussed) and corresponding changes to the .txt export.

Hot key for mark all checklist items 'yes'

As a tester
I want to be able to quickly check all the 'yes' radio buttons
So that I can test the checklists faster

We could use a hotkey, but I'm open to other solutions. I don't want this feature to be visible, because we don't want to encourage reviewers to just click 'yes to all' without thinking about each criterion.
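
A minimal sketch of one way to do this, assuming the checklist uses radio buttons whose value is "Yes"; the key combination and selector are placeholders, not a settled design:

    // Hidden tester shortcut: Ctrl+Shift+Y checks every "Yes" radio button.
    // Nothing in the UI advertises it, so ordinary reviewers won't stumble on it.
    document.addEventListener("keydown", function (event) {
      if (event.ctrlKey && event.shiftKey && event.key === "Y") {
        document.querySelectorAll('input[type="radio"][value="Yes"]').forEach(function (radio) {
          radio.checked = true;
        });
      }
    });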

Only show desirable and extraordinary attributes possessed

As a reviewer,
when I download my review checklist export,
I want it to only show the Desirable and Extraordinary attributes that I selected
so that I don't get overwhelmed by all the text.

If no desirable attributes are selected, the desirable heading should not display. Likewise, if no extraordinary attributes are selected, the extraordinary heading should not display.

BEFORE:

=================
Review Checklist

Recommended Decision: ACCEPT

Unreasonable Deviations Requiring Revision
F presents a clear chain of evidence from observations to findings

Essential
Y states a purpose, problem, objective, or research question
...
Y defines unit(s) of analysis

Desirable
N summarizes and synthesizes a reasonable selection of related work
N clearly describes relationship between contribution(s) and related work
N states epistemological stance
N appropriate statistical power
Y reasonable attempts to investigate or mitigate limitations
N discusses study’s realism, assumptions and sensitivity of the results to its realism/assumptions
N provides plausibly useful interpretations or recommendations for practice, education or research
N openly shares data and materials to the extent possible within practical and ethical limits
N concise, precise, well-organized and easy-to-read presentation
N visualizations advance the paper’s arguments or contribution
Y clarifies the roles and responsibilities of the researchers
N provides an auto-reflection or assessment of the authors’ own work
N uses multiple raters for any subjective judgments
N provides supplemental materials such as interview guide(s), coding schemes, coding examples, decision rules, or extended chain-of-evidence tables
N triangulates across data sources, informants or researchers
N cross-checks interviewee statements
N validates results using member checking, dialogical interviewing, feedback from non-participant practitioners or research audits of coding by advisors or other researchers
Y reports the type of case study
N describes external events and other factors that may have affected the case or site
N uses quotations to illustrate findings
N EITHER: evaluates an a priori theory

Extraordinary
N applies two or more data collection or analysis strategies to the same research question
N approaches the same research question(s) from multiple epistemological perspectives
Y innovates on research methodology while completing an empirical study
N multiple, deep, fully-developed cases with cross-case triangulation
N uses multiple judges and analyzes inter-rater reliability
N uses direct observation and clearly integrates direct observations into results
N published a case study protocol beforehand and made it publicly accessible

=======
Legend

...

AFTER:

=================
Review Checklist

Recommended Decision: ACCEPT

Unreasonable Deviations Requiring Revision
F presents a clear chain of evidence from observations to findings

Essential
Y states a purpose, problem, objective, or research question
...
Y defines unit(s) of analysis

Desirable
Y reasonable attempts to investigate or mitigate limitations
Y clarifies the roles and responsibilities of the researchers
Y reports the type of case study

Extraordinary
Y innovates on research methodology while completing an empirical study

=======
Legend
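
A minimal sketch of the filtering described above, assuming the export builds each optional section from a heading and a list of items with the reviewer's answer attached; the names are placeholders, not the actual export code:

    // Keep only Desirable/Extraordinary items answered "Y"; omit the heading
    // entirely when nothing was selected.
    function formatOptionalSection(heading, items) {
      var selected = items.filter(function (item) { return item.answer === "Y"; });
      if (selected.length === 0) return "";
      return heading + "\n" +
        selected.map(function (item) { return "Y " + item.text; }).join("\n") + "\n\n";
    }

    // e.g. output = formatOptionalSection("Desirable", desirableItems)
    //             + formatOptionalSection("Extraordinary", extraordinaryItems);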

EasyChair integration

As a program chair,

I want to import reviewer checklists and decision logic directly into EasyChair,

So that reviewers don't have to go to two separate websites to review a paper.

Create a standard for Qualitative Simulations

Create a standard for qualitative simulations including:

  • Protocol studies / Protocol analysis
  • quasi-experimental simulations of the sort used with undergraduate students
  • usability studies like heuristic walkthroughs, cognitive walkthroughs, Wizard-of-Oz studies, etc. (used extensively in HCI)

Greedy replacement of criteria texts in parentheses

A criterion such as "describes data sources (e.g. participants' demographics, work roles)" is shortened to "describes data sources" in the downloadable text file. This is fine.

Another criterion such as "explains how key patterns (e.g. categories) emerged from GT steps (e.g. selective coding)", however, is shortened to "explains how key patterns", which is missing key parts of the phrase.

The cause appears to be a greedy version of this regex (presumably ".+" rather than ".+?"), which matches from the first "(" to the last ")" and removes everything in between.

The fix should be easy; make the quantifier lazy:

var regex7 = / \(.+?\)/g;
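
A quick demonstration of the difference, using the criterion from this issue (the greedy pattern is the presumed current behaviour, the lazy one the proposed fix):

    var criterion = "explains how key patterns (e.g. categories) emerged from GT steps (e.g. selective coding)";
    var greedy = / \(.+\)/g;   // presumed current pattern: matches first "(" to last ")"
    var lazy   = / \(.+?\)/g;  // proposed fix: matches each parenthetical separately
    console.log(criterion.replace(greedy, ""));
    // -> "explains how key patterns"
    console.log(criterion.replace(lazy, ""));
    // -> "explains how key patterns emerged from GT steps"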

typically have a predictive/preemptive/corrective modelling?

@drpaulralph writes

I get what you're saying about "data-scientific endeavours typically have a predictive/preemptive/corrective modelling in mind by design". Should we state this in the application section? I.e., "applies to studies that primarily analyze existing software engineering artifacts," ... "using predictive, preemptive or corrective modelling"? Does that make sense?

but I don't see that in the current text: https://github.com/acmsigsoft/EmpiricalStandards/blob/master/docs/DataScience.md.

Is this still current?

Correct weird formatting of indented lists

Have a look at the screenshots below. Many of the standards have simple lists like this one from the engineering standard, using manual line breaks to make the text easier to understand.

[Screenshot: an attribute from the engineering standard, rendered with its indented list]

When this attribute appears in the checklist, the manual line breaks are removed:

[Screenshot: the same attribute in the checklist, with the manual line breaks removed]

We can make changes to the standards themselves, if necessary, to get this to look better. Right now the parser doesn't seem to recognize indented lists like this:

  • Item 1
    • Subitem 1
    • Subitem 2
    • Subitem 3
  • Item 2
    • ...
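
One possible direction, sketched below under the assumption that the checklist builder currently joins an attribute's lines into a single string; the function name and list-item detection are hypothetical, not the existing parser code:

    // Preserve manual line breaks and indentation for nested list items
    // instead of collapsing the attribute into one line.
    function formatAttributeText(rawMarkdown) {
      var listItem = /^\s*([-*•]|\d+\.)\s+/;
      return rawMarkdown
        .split("\n")
        .map(function (line) {
          // keep leading whitespace on list items so sub-items stay indented
          return listItem.test(line) ? line.replace(/\s+$/, "") : line.trim();
        })
        .join("\n");
    }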

Should Reflexivity be in the general standard?

This argument was raised at a Dagstuhl seminar I attended:

"Reflexivity (author biases and demographics) should be a method-agnostic criteria where the research intersects GBA+" (GBA+ is a Canadian catch-all term for gender, race, class, sexual orientation, religion, etc.) Basically the argument is that if the research touches on these issues, the paper should discuss reflexivity, typically in the limitations section, regardless of its philosophical foundation or method. I'm not certain how to proceed here so am raising the issue for discussion.

Create a standard for Replication Studies

Carver (2010) proposed guidelines for reporting replications. A 2014 special issue in EMSE uses four papers to test the reporting guidelines, and the editorial provides further pointers to the literature about replication studies.

Dennis and Valacich propose a useful taxonomy in their replication manifesto.

da Silva et al. carry out a systematic mapping study of replication studies in software engineering and draw a number of informative and actionable conclusions.

References:

  • Carver JC (2010) Towards reporting guidelines for experimental replications: a proposal. In: RESER’2010: proceedings of the 1st international workshop on replication in empirical software engineering research

Editorial Manager Integration

As an Editor-in-Chief,

I want to import reviewer checklists and decision logic directly into Editorial Manager (Elsevier),

So that reviewers don't have to go to two separate websites to review a paper.

Locations in presubmission checklist

  1. As an author, I want to indicate the location of each presubmission checklist item (instead of just indicating that it's in the paper somewhere), to prevent reviewers from missing things.
  2. As an author, I want to download a txt export of the presubmission checklist, which shows the locations of each item, so that I can provide it to reviewers.

Notes:

  • "location" does not apply to all entries. For example, "contributes in some way to the collective body of knowledge" wouldn't have a location
  • Some items may have multiple locations. It should be easy for the author to indicate multiple locations
  • The best way to do this would be with line numbers, but not all docs have line numbers, so we need to be flexible
  • I'm not sure what this should look like exactly.

Organize essential attributes into IMRaD groups

As a checklist user,
I want the essential attributes to be organized by where they typically appear in papers rather than by the standard they're drawn from
So I don't have to flip back and forth so much

Delete the "does the manuscript justify the deviation?" follow-up question

Replace with:
Is the deviation reasonable?
Yes: "OK. Not grounds for rejection"
No: "Explain in your review why the deviation is unreasonable and suggest possible fixes. Reject the paper unless fixes are trivial.

Tooltip on "is the deviation reasonable": If the manuscript justifies the deviation, consider the justification offered.

Copyediting in README

... primarily "emprical" → "empirical" in the first paragraph. Patch file below.

index f6655c4..1360d53 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # Empirical Standards

-An _Empirical Standard_ is a brief public document that communicates expectations for emprical research. Here _empirical_ denotes research that uses data. The data can be qualitative or quantitative; real or synthetic. _Empirical_ distinguishes research that involves collecting and analyzing data from other kinds of scholarship like a mathematical proof or a philosophical treatise.
+An _Empirical Standard_ is a brief public document that communicates expectations for empirical research. Here _empirical_ denotes research that uses data. The data can be qualitative or quantitative; real or synthetic. _Empirical_ distinguishes research that involves collecting and analyzing data from other kinds of scholarship like a mathematical proof or a philosophical treatise.

 Moreover, our empirical standards are:

@@ -15,13 +15,13 @@ The empirical standards have three main uses:
 2. Designing better studies
 3. Educating graduate students

-Scholarly peer review is simultaneously “the lynchpin about which the whole business of science is pivoted" [1] and "prejudiced, capricious, inefficient, ineffective, and generally unscientific” [2]. Many of the problems with peer review boild down to reviewers inventing their own evaluation criteria. Devising appropriate evaluation criteria for any given manuscript is extraordinarily difficult, so most reviewers' criteria are not very good. Reviewers create criteria that are inconsistent with other reviewers', the venue's, the editor's, the methodological literature and---crucially---the author's. In effect, the real criteria by which our research is judged are not merely opaque; they don't even exist until after the manuscript is submitted. This is why peer review is so frustrating, unpredictable, and unscientific.
+Scholarly peer review is simultaneously “the lynchpin about which the whole business of science is pivoted" [1] and "prejudiced, capricious, inefficient, ineffective, and generally unscientific” [2]. Many of the problems with peer review boiled down to reviewers inventing their own evaluation criteria. Devising appropriate evaluation criteria for any given manuscript is extraordinarily difficult, so most reviewers' criteria are not very good. Reviewers create criteria that are inconsistent with other reviewers', the venue's, the editor's, the methodological literature and---crucially---the author's. In effect, the real criteria by which our research is judged are not merely opaque; they don't even exist until after the manuscript is submitted. This is why peer review is so frustrating, unpredictable, and unscientific.

 Empirical standards are the secret to fixing this situation. With the standards, all the reviewers use the same criteria and the authors know the criteria in advance. Used appropriately, the standards discourage or prevent reviewers from either accepting research with fatal flaws or rejecting research based on bogus criteria.

 Obviously, if authors have these criteria in advance, they can use the criteria to design more rigorous studies. There's a lot to remember when designing a study, and robust methodological training is rare in our community. The standards provide concise, convenient checklists to help us remember all the core practices for a given method.

-The standards can also be used for educational purposes. While they cannot replace a good methods textbook, the lists of references and exemplars can be used to construct reading lists, and the specific attributes can be used to sheppherd graduate students through their study designs and paper write-ups.
+The standards can also be used for educational purposes. While they cannot replace a good methods textbook, the lists of references and exemplars can be used to construct reading lists, and the specific attributes can be used to shepherd graduate students through their study designs and paper write-ups.

 ## Creation and Maintenance

@@ -41,6 +41,6 @@ In the main directory:

 ## References

-[1] John M Ziman. 1968.Public knowledge: An essay concerning the socialdimension of science. Vol. 519. CUP Archive.
+[1] John M Ziman. 1968. Public knowledge: An essay concerning the social dimension of science. Vol. 519. CUP Archive.
 [2] Paul Ralph. 2016. Practical suggestions for improving scholarly peer review quality and reducing cycle times. _Communications of the Association for Information Systems_ 38, 1 (2016), Article 13.
 [3] Paul Ralph et al. 2020 "Empirical Standards for Software Engineering Research." _arXiv_:2010.03525.

Indentation bug

One of the recent updates created a bug where the radio buttons and check boxes are indented relative to their headings. Obviously the 'yes' radio button should be directly beneath the 'yes' heading and so on.

[Screenshot: radio buttons and check boxes indented relative to their 'yes'/'no' headings]

HotCRP integration

As a program chair,

I want to import reviewer checklists and decision logic directly into HotCRP,

So that reviewers don't have to go to two separate websites to review a paper.

Surveys vs. interviews

I think of surveys and interviews as being very different from each other (p. 16); surveys try to gather data from larger numbers of people in a lower-bandwidth way (sometimes with no direct interaction between researchers and participants), whereas interviews are in-depth, usually 1-1, and with fewer participants. I suggest clarifying that this page refers specifically to interviews, not surveys. Then a separate standard can be written for surveys, which do not meet criterion #1 ("Researcher(s) have synchronous conversations with one participant at a time").

Move recommended decision to top of txt export

Currently the decision is underneath the lists. Move it to the top, as shown below. Do this for both one-phase and two-phase checklists.

=================
Review Checklist

Recommended Decision: REJECT. [or ACCEPT, or REVISION, etc.]

In your review please explain the deviations and why they are not reasonable. Give constructive suggestions. [This is the same text that appears with the decision on the checklist before downloading the text file.]

Essential
Y states a purpose, problem, objective, or research question
Y explains why the problem, objective, or research question is important
...

Create a "standard" for novel or other methods

We need some kind of recommendation for reviewing a method for which there is no existing standard, so that a reviewer can click "other" in the reviewer checklist generator and still get some sensible advice beyond the general standard.

No "Standards/" directory

The README currently states "The standards themselves can be found in the Standards directory."

I cannot find this directory.

davisjam@davisjams-MacBookPro162-13-inch-2020-4TB3-ports EmpiricalStandards % find . -type d  -iname Standards | wc -l
       0

Possible duplicate of #26?

Assign different decision mapping to one and two phase checklists

Implement different mappings:

Conference:
one or more 3s or 4s: Reject
one or more 2s: Gatekeeping
one or more 1s: Accept
no 1s 2s 3s or 4s: Accept

Journal:
one or more 4s: Reject
one or more 3s: Reject but invite resubmission
one or more 2s: Major Revision
one or more 1s: Minor Revision
no 1s 2s 3s or 4s: Accept as is
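
A minimal sketch of the two mappings, assuming each unreasonable deviation has already been assigned a type number (1-4) as in the earlier "types" story; the function and argument names are placeholders:

    // types: array of deviation type numbers (1-4); empty if there are no unreasonable deviations
    function recommendDecision(types, venue) {
      var max = types.length ? Math.max.apply(null, types) : 0;
      if (venue === "conference") {
        if (max >= 3) return "Reject";
        if (max === 2) return "Gatekeeping";
        return "Accept";               // one or more 1s, or no deviations at all
      }
      // journal
      if (max === 4) return "Reject";
      if (max === 3) return "Reject but invite resubmission";
      if (max === 2) return "Major Revision";
      if (max === 1) return "Minor Revision";
      return "Accept as is";
    }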

Accept-message

As a reviewer,
if I select "yes" to all essential criteria"
I get a message in red at the end of the essential criteria but before the desirable criteria that says "The manuscript meets all essential criteria and should be accepted."
So that I know what decision I should enter.

github.io links -> github.io pages

As an author or reviewer using the checklist generator,
When I click on the name of a method, I want to go to the page for the corresponding standard on the github.io site instead of going to the corresponding standard on the github.com repo.

Before attempting this issue, please resolve issue 28: #28

Add "list of unreasonable deviations" to .txt export

This is related to issue 31. If issue 31 is done, put the list of unreasonable deviations, described below, AFTER the recommendation.

The .txt export currently looks like this:

Review Checklist

Essential

Y states a purpose, problem, objective, or research question
Y explains why the problem, objective, or research question is important
Y defines jargon, acronyms and key concepts
Y methodology is appropriate for stated purpose or questions
Y describes in detail what, where, when and how data were collected
R describes in detail how the data were analyzed
R discusses and validates assumptions of any statistical tests used
R presents results
R results directly address research questions
R supports main claims or conclusions with explicit evidence or arguments
U discusses implications of the results
U discusses the study's limitations and threats to validity
U contributes in some way to the collective body of knowledge
U language is not misleading; any grammatical problems do not substantially hinder understanding
Y balances the study's anticipated benefits with its potential risks or harms, minimizing risk or harm wherever possible
Y visualizations/graphs are not misleading
... [Decision, Legend, standards used, etc.]

We want to split up the "essential" list into bad and good, as follows. Take note that "R"s stay in the essential list; only "U"s go in the new list. The name of the new list depends on the decision. If the decision is "reject," the new list should be called "Reasons for Rejection." If the decision is "accept," the new list should be called "Unreasonable Deviations Requiring Revision." Note that items from the Desirable and Extraordinary lists never go in the new Unreasonable Deviations / Reasons for Rejection list.

Review Checklist

Unreasonable Deviations Requiring Revision [for Accept or revision decisions] OR

Reasons for Rejection [for Reject decisions]

U discusses implications of the results
U discusses the study's limitations and threats to validity
U contributes in some way to the collective body of knowledge
U language is not misleading; any grammatical problems do not substantially hinder understanding

Essential

Y states a purpose, problem, objective, or research question
Y explains why the problem, objective, or research question is important
Y defines jargon, acronyms and key concepts
Y methodology is appropriate for stated purpose or questions
Y describes in detail what, where, when and how data were collected
R describes in detail how the data were analyzed
R discusses and validates assumptions of any statistical tests used
R presents results
R results directly address research questions
R supports main claims or conclusions with explicit evidence or arguments
Y balances the study's anticipated benefits with its potential risks or harms, minimizing risk or harm wherever possible
Y visualizations/graphs are not misleading

... [Decision, Legend, standards used, etc.]
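
A sketch of the splitting rule described above; the item shape and function name are assumptions, not the current export code:

    // essentials: [{ answer: "Y" | "R" | "U", text: "..." }, ...]
    function splitEssentials(essentials, decision) {
      var deviations = essentials.filter(function (item) { return item.answer === "U"; });
      var remaining  = essentials.filter(function (item) { return item.answer !== "U"; }); // R's stay
      var heading = decision === "REJECT"
        ? "Reasons for Rejection"
        : "Unreasonable Deviations Requiring Revision";
      return { heading: heading, deviations: deviations, essentials: remaining };
    }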

Methods names link to standards

As a reviewer or author,

on the page where I select the methods to generate a presubmission or review checklist,

I want to click on the name of the method to open the corresponding standard from the github repo in a new window

So that I can make sure I'm selecting the right standards

Break up systematic review standard

The systematic review standard should be divided into at least three standards, covering meta-analysis, case survey, critical review, and meta-synthesis. Somewhere we need to explain that ad hoc reviews, rapid reviews and tertiary reviews are not covered. Decide whether to accept scoping reviews / systematic mapping studies, or argue that they aren't mature enough for full papers.

Fix the legend in the review export

Now that we're numbering problems 1 to 4, we need to use the same numbers in the text export, as below. Notes: 1) reasons for rejection should be listed in descending order of type number; 2) we update the legend accordingly; 3) the legend should use the same text as the type 1-4 tooltips, minus the parenthetical examples.

BEFORE:

=================
Review Checklist

Recommended Decision: REJECT

Reasons for Rejection
F states a purpose, problem, objective, or research question
U explains why the problem, objective, or research question is important
U defines jargon, acronyms and key concepts
U methodology is appropriate for stated purpose or questions

Essential
Y describes in detail what, where, when and how data were collected
...
Y presents a clear chain of evidence from observations to findings

=======
Legend

Y = Yes, the paper has this attribute
R = Reasonable deviation
F = (easily) Fixable deviation
U = Unfixable (or not easily fixable) deviation
N = No, the paper does not have this attribute

AFTER:

=================
Review Checklist

Recommended Decision: REJECT

Reasons for Rejection
4 states a purpose, problem, objective, or research question
3 explains why the problem, objective, or research question is important
2 defines jargon, acronyms and key concepts
1 methodology is appropriate for stated purpose or questions

Essential
Y describes in detail what, where, when and how data were collected
...
Y presents a clear chain of evidence from observations to findings

=======
Legend

Y = yes, the paper has this attribute
R = a reasonable, acceptable deviation from the standards
1 = can be fixed by editing text only
2 = can be fixed by doing some new data analysis, redoing some existing data analysis, or collecting a small amount of additional data
3 = can be fixed by completely redoing data analysis, or by collecting additional data
4 = unacceptable conduct, or problems that cannot be fixed without doing a brand new study
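
A small sketch of the export change, assuming each reason for rejection carries its type number (1-4); the names are placeholders, not the actual export code:

    // reasons: [{ type: <1-4>, text: "..." }, ...]
    function formatReasonsForRejection(reasons) {
      return reasons
        .slice()
        .sort(function (a, b) { return b.type - a.type; })  // descending order of type number
        .map(function (reason) { return reason.type + " " + reason.text; })
        .join("\n");
    }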
