how_wrong_is_the_code

Repo for CollabW20 hack day - how wrong is the research code base?

The original pitch for the project:

CW20 - 2020-03-31 to 2020-04-02

Alpha - HP1-CW20 How Wrong Is The Research Code Base?

Hackday Idea Proposer

Dan Hobley

Context / Research Domain

General research software development

Problem

This proposal is motivated by four observations:

  1. There is a huge legacy science codebase, supporting lots of publications.
  2. Even professional software developers make errors at a rate of 15-50 per 1000 lines of code (McConnell, 2004).
  3. Modern software best practice (reviewing, testing, etc.) has only recently begun to penetrate the research software community, broadly defined.
  4. Without formal testing, only code-breaking bugs and those that are obviously wrong are likely to be caught.

So, most legacy code is wrong. The famous cases become notorious, but the problem is probably pervasive. This proposal aims to quantify how broken legacy code is, and how severe the problem might be - ideally heading towards a publishable standard of work.
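As a rough illustration of the scale implied by observation 2, the sketch below converts the quoted range of 15-50 errors per 1000 lines into an expected error count for a codebase of a given size. The function name and the 20,000-line example are hypothetical, and the calculation assumes the rate applies uniformly across the code, which real projects will not satisfy.

```python
def expected_errors(lines_of_code, rate_low=15, rate_high=50):
    """Return (low, high) expected error counts for a codebase.

    Rates are errors per 1000 lines and default to the range quoted
    from McConnell (2004). Assumes a uniform rate across the code.
    """
    if lines_of_code < 0:
        raise ValueError("lines_of_code must be non-negative")
    low = lines_of_code * rate_low / 1000
    high = lines_of_code * rate_high / 1000
    return low, high


# Hypothetical example: a 20,000-line legacy model.
low, high = expected_errors(20_000)
print(f"Expected errors: {low:.0f} to {high:.0f}")
```

Under these assumptions, even a modest research code would carry hundreds of errors before any testing or review catches them.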

Solution

The GitHub API provides a tool to have a crack at answering this. The first objective of the team will be to define exactly what method/approach will get used, but I foresee something broadly like this:

  • Assume that code that makes it to GitHub is broadly representative of the wider code base (this is questionable… Track with changes through time?). So, use the API to randomly sample repos.

  • Separate out the “research” repos using key terms, e.g. “doi.org/”.

  • Separate code that uses tests from code that doesn’t - tested code should perform better on any metrics the project produces. By taking the ratio of tested to untested code, we should be able to assess the quality of the legacy code (like, or worse than, the untested GitHub code) against the (broadly known) quality of the tested code.

  • Try to track bug fixing once the code makes it to GitHub. We can assume any bug caught on GitHub is the pernicious variety. Identify trends through time. This will probably also depend on the number of editors active, and how “mature” the code is on GitHub. These should be usable to back-project the bug density when the code went up on GitHub.

  • Survey the nature of the bugs found. How severe are they? What fraction are code-breaking?

  • Use what we learn from the tested code to estimate how broken the untested code might be. We can probably assume the average error rate in tested vs untested code should be the same (??), so what’s happening in the tested code should be a guide to the untested code… which is probably broadly representative of the legacy code.
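One possible way to approach the random sampling step is sketched below. It relies on the GitHub REST API's "list public repositories" endpoint (GET /repositories), which takes a `since` repository ID and returns public repositories with higher IDs. Drawing `since` at random gives an approximately random starting point for each page. The function name and the upper bound on repository IDs are assumptions made for illustration, and the sketch only builds request URLs: it does not contact the API or handle authentication and rate limits.

```python
import random
from urllib.parse import urlencode

API_ROOT = "https://api.github.com"
# Placeholder upper bound on repository IDs (an assumption; check before use).
ASSUMED_MAX_REPO_ID = 250_000_000


def random_sample_urls(n, seed=None, max_id=ASSUMED_MAX_REPO_ID):
    """Build n request URLs, each starting at a random repository ID."""
    rng = random.Random(seed)  # seeded so a sample can be reproduced
    urls = []
    for _ in range(n):
        since = rng.randrange(max_id)
        query = urlencode({"since": since})
        urls.append(f"{API_ROOT}/repositories?{query}")
    return urls


for url in random_sample_urls(3, seed=0):
    print(url)
```

Fixing the seed keeps the sample reproducible, which would matter for any publishable analysis. Gaps in the ID space (deleted or private repos) mean the sample is not perfectly uniform, so that bias would need checking.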

Clearly, much detail to be added here, but that’s the spirit of it.
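As one example of that detail, the two "separate out" steps could be prototyped on already-downloaded repo metadata, as in the minimal sketch below. The record layout (`name`, `readme`, `paths`), the test-detection heuristics, and the sample DOIs are all hypothetical. Only the “doi.org/” key term comes from the proposal.

```python
from pathlib import PurePosixPath

RESEARCH_KEY_TERMS = ("doi.org/",)  # key term from the proposal; extend as needed
TEST_DIR_NAMES = {"test", "tests"}  # assumed naming convention


def is_research(readme_text):
    """Flag a repo as research if its README contains a key term."""
    text = readme_text.lower()
    return any(term in text for term in RESEARCH_KEY_TERMS)


def has_tests(paths):
    """Guess whether a repo is tested from its file paths alone."""
    for raw in paths:
        path = PurePosixPath(raw)
        # Any parent directory named test/tests counts as a test suite.
        if any(part.lower() in TEST_DIR_NAMES for part in path.parts[:-1]):
            return True
        name = path.name.lower()
        if name.startswith("test_") or name.endswith("_test.py"):
            return True
    return False


def partition(repos):
    """Split research repos into tested and untested groups."""
    groups = {"tested": [], "untested": []}
    for repo in repos:
        if not is_research(repo["readme"]):
            continue  # non-research repos are dropped from the sample
        key = "tested" if has_tests(repo["paths"]) else "untested"
        groups[key].append(repo["name"])
    return groups


# Hypothetical sample records with placeholder DOIs.
sample = [
    {"name": "a/model", "readme": "Cite via https://doi.org/10.0000/placeholder",
     "paths": ["model.py", "tests/test_model.py"]},
    {"name": "b/scripts", "readme": "See doi.org/10.0000/other",
     "paths": ["run.py"]},
    {"name": "c/website", "readme": "My blog", "paths": ["index.html"]},
]
print(partition(sample))
```

File-name heuristics like these will misclassify some repos, so a manual check of a subsample would be a sensible complement.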

The visualised final output will ultimately be an estimate of how many bugs are found in poorly tested/legacy code under a variety of possible conditions (age, users, etc.).

This hack would probably suit Python-minded people, who are also familiar with simple, widely-used GitHub workflows and the very basics of testing. Surveying the nature of bugs would be a non-code-y job. Specific API knowledge probably isn’t necessary, but could be useful if someone has it.

Anyone know if this has been tried before (with these kinds of tools)?

Useful resources:

https://stackoverflow.com/questions/26881441/can-you-get-the-number-of-lines-of-code-from-a-github-repository/29012789#29012789

Contributors

cjfullerton, siccarpoint
