nlesc / guide

Software Development Guide

Home Page: https://guide.esciencecenter.nl

License: Creative Commons Attribution 4.0 International

HTML 80.90% CSS 19.10%
best-practices escience language-guides research-software research-software-engineering

guide's Introduction

Guide

This is a guide to software development at the Netherlands eScience Center. It serves both as a source of information for how we work at the eScience Center, and as a basis for discussions and reaching consensus on this topic.

Read The Turing Way instead

If you are looking for an overall picture of best practices, read The Turing Way first. We joined forces with that guide for most of our generic research software engineering advice. Because The Turing Way is language agnostic, this guide mostly provides additional language-specific guides. Please be aware that most of the remaining language-agnostic content is outdated, so be careful when using it. We plan on removing that content (#286).

If you would like to contribute to this book see CONTRIBUTING.md.

guide's People

Contributors

bencomp, benvanwerkhoven, blootsvoets, bouweandela, bvreede, c-martinez, ctwhome, cwmeijer, egpbos, f-hafner, fdiblen, felipez, goord, hanneoberman, hannospreeuw, jenswehner, jhidding, jiskattema, jspaaks, katrinleinweber, lourensveen, maltelueken, mkuzak, nielsdrost, romulogoncalves, svenvanderburg, sverhoeven, tbkkr, vincentvanhees, wrvhage

guide's Issues

Better Code Hub

Last week during ICTOpen, I spoke with someone from SIG (Software Improvement Group, not special interest group). They've developed The 10 Guidelines for Maintainable Software, and have a tool which checks your GitHub repo to assess how maintainable your code is.

It looks interesting -- maybe it is just yet another code quality tool. But maybe it is worth checking it out! If it turns out to be quite good, maybe we can include it in the guide?

Python virtualenvwrapper

Add something about that, hugely improves usability of virtualenv (though conda is still preferable imho)

C++ chapter header needs improving

The C++ chapter has a few controversial statements. We should have a (probably offline) discussion on what the consensus and/or official viewpoint is at NLeSC and change it accordingly.

Examples:

C++ is portable

C++ can be safe, if you want to.

don't write C.

Python

Before I start a commit-war, let's do a simple religious war in this issue ;)

There are a few things I don't like about the latest version of the Python guide:

  1. Conda seems to be the non-recommended option. Imho Conda is currently far superior to pip + virtualenv. One important thing to note is that one can use pip inside a Conda environment as well. Packages that are not included in Conda can still simply be installed with pip. This is common practice, especially when developing. It makes no sense whatsoever to compile numpy and scipy every time you need a new virtualenv. Until pip gets at least a build cache (preferably just a binary download like Conda, but okay, a build cache would already be a great improvement), I think we should advocate using Conda. For saving on power and the environment, if nothing else.
  2. I know some people like PyCharm, and I guess I can live with that ;) I don't like IDEs at all though. I generally find recommending editors a very tricky subject, actually. Unless there is like 90% agreement within a community that a tool is indispensable (like pip), I'm not sure why we should talk about it (unless it's clearly superior, like conda :D). I don't think anybody will visit our guide to get advice on editors, they will simply Google it (I would); and anyway, the editor is such a basic need that people don't need guidance on it. Package management is different of course, since a beginner wouldn't know that there is such a thing at all, so there it helps to have the "Googleable terms" collected in a single place.
  3. Multiprocessing is a mess in Python. I think we should just put it plainly like that. For sure, recommending just one package totally gives the wrong picture. The alternative to just letting people find out on their own is to list many more packages. For instance, iPython/Jupyter also has parallel stuff built in, there's things like RabbitMQ, we have Noodles, you could build a C module or use Numba or PyCUDA, etc. My preference would be to just state that Python is not really built for it.
  4. About iPython: clearly this is a superior interpreter, but that's about it. What grew out of it is also very interesting: Jupyter notebooks. Jupyter has become language agnostic, but still, it is a very powerful Python tool for interactive analysis, education, communication. Should mention this.

I can write a pull request soon incorporating these aspects. Before doing so, just want to make sure that it's not going to be rejected ;) Shout out to previous contributors: @jvdzwaan @blootsvoets @nielsdrost

Project chapter needs work

The chapter on projects needs some work. For instance, the page on project planning is completely empty.

Nicer landing page

Would it be worth putting some effort into making a nicer landing page (https://guide.esciencecenter.nl/index.html)? Or at least one with clearer information about what one can find in the guide, who the guide is for (us, RSEs, PIs, project members, all?), how one can give input or suggestions, and what the goal of the guide is?

Tools of the Trade chapter?

I recently discovered https://draw.io as a really nice way of drawing diagrams. I of course instantly thought about adding this to the guide!

However, we seem to not have a section for this. :-(

Do we need a new chapter for this? I can imagine that things like Overleaf, SURFdrive, and Figshare all deserve a mention somewhere in the guide.

Sync broken

The GitBook <-> GitHub sync does not work (I get an error message via email). Solving via recommended "fix" methods does not seem to work.

Created a ticket at GitBook.

Python: preferred testing library

Currently, the preferred testing library for Python is nose.

However, the nose website states:
Nose has been in maintenance mode for the past several years and will likely cease without a new person/team to take over maintainership. New projects should consider using Nose2, py.test, or just plain unittest/unittest2.

Do we need to update our preference?
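To make the alternatives named in that quote concrete, here is a minimal sketch of a test written with plain unittest, which pytest can also collect and run unchanged. The `mean` function and the test names are hypothetical and exist only for illustration; they are not taken from the guide.

```python
import unittest


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


class TestMean(unittest.TestCase):
    """Hypothetical test case; runs under both unittest and pytest."""

    def test_mean_of_integers(self):
        self.assertEqual(mean([1, 2, 3]), 2)

    def test_empty_input_raises(self):
        # An empty sequence has no mean, so an error is expected.
        with self.assertRaises(ValueError):
            mean([])
```

Saved in a file such as `test_mean.py` (the file name is an assumption), this could be run with `python -m unittest` or with `pytest`, so a switch between the two runners would not require rewriting tests in this style.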

NOTICE recursively?

By @blootsvoets :
Should we update NOTICE recursively? For example, in pyxenon, we include all dependencies of Xenon. Should we add all these dependencies to NOTICE? We also install packages from PyPI, should we include the licenses of all those dependencies? And recursively?

Data storage options

Question for a project from @mkuzak:

Do we have a place to put big files? I have a 160 GB VM image that I need to archive. Any solution?

We should put the answer in the guide as well.

First attempt at an answer: Zenodo

dev setup in docker image should be moved to docs

Instead of being a separate item in the checklist, this should be a sentence in documentation.md#documented-development-setup. For example:
"If your dev setup is complicated please consider providing a Dockerfile and docker image with dev setup."

R chapter has non existing entries in SUMMARY

In SUMMARY.md, the R chapter has references to non-existing files, leaving "broken" links.

I'll put them in comments for now. Feel free to restore when there is actually a file there (though I do not think we should need subsections for language guides, perhaps this is a hint it is getting too long ;-)
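A check for this kind of dangling entry could be sketched as follows. This is only an illustration under stated assumptions: it assumes SUMMARY.md uses ordinary Markdown links of the form `[title](path)`, and the function name `find_missing_targets` is made up here; it is not part of GitBook or of the guide's tooling.

```python
import re
from pathlib import Path

# Matches Markdown links of the form [title](target).
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def find_missing_targets(summary_text, root):
    """Return local link targets in summary_text that do not exist under root."""
    missing = []
    for target in LINK_PATTERN.findall(summary_text):
        # External URLs are out of scope; only local files are checked.
        if target.startswith(("http://", "https://")):
            continue
        # Drop an optional #anchor before looking for the file.
        path = target.split("#", 1)[0]
        if path and not (Path(root) / path).exists():
            missing.append(target)
    return missing
```

Calling `find_missing_targets(Path("SUMMARY.md").read_text(), ".")` from the repository root would then list the entries that point at files which are not there.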

Best and Good Enough practices in scientific computing

Our guide is all about best practices. However, there is also such a thing as "good enough".

These two (known) sources are great in explaining these:

Good Enough Practices in Scientific Computing https://arxiv.org/abs/1609.00037
Best Practices in Scientific Computing http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001745

We should at least link to these somewhere. I propose to do this in the introduction of the software guide (which is currently a single sentence, so could use some more content anyway)

Link to emergency revert for accidentally uploaded confidential data

By @egpbos

In case the code uses data that cannot be open, an engineer should try to keep sensitive parts outside of the main codebase.

It sometimes happens that sensitive data is accidentally uploaded. It would be nice if the checklist had an emergency "get that stuff out of my history" guide (or a link to a separate guide, e.g. this one: https://help.github.com/articles/remove-sensitive-data/).

comments:
By @mkuzak
Agreed, this is important to have, but should go to documentation.
The actual solution should not be in the checklist, but we should add a note on how to prevent this (adding things to .gitignore).

Broken links in chapters

As per our shiny new broken link checker (see #24), we have a great overview of all the broken links in the guide. We should fix them :-)

See https://travis-ci.org/NLeSC/guide for an overview of all broken links.

Chapters with broken links atm:

  • Software/Communication
  • Software/Code Review
  • Software/Documentation
  • Languages/Java
  • Languages/Javascript
  • Languages/Python
  • Languages/R
  • Contributing

Chapter owners, please fix :-)

Java integration tests

copyright notice/disclaimer at the top of every source code file

By @mkuzak
I'm in favor of getting rid of those, since they are not required. Unless someone thinks we need it.

comments

By @sverhoeven
If the source code file is or can be distributed alone then the notice should be there.
If the source code file is always distributed with a LICENSE file, etc. then the notice can be removed.
@LourensVeen what do you think?


By @jiskattema
if i remember correctly, they are not (legally) necessary
How about only leaving them in if the license is not our preferred license, and redistribution is likely?


By @mkuzak
I agree with @sverhoeven on that, if the files are likely to be distributed separately they should have notice on top, otherwise not.


By @LourensVeen
It's not legally necessary. When you create a work you get to own the copyright, whether you put any notices on it or not (according to the Berne convention, which almost all nations are signatory to). If you don't have a statement in every file and somehow a file gets separated from the rest, then whoever gets it would not know that it's licensed, and therefore should not distribute it. If they can somehow ascertain that it's ASLv2, then they can safely distribute it, as long as they put the license notice back in, since the license requires that.

Of course, having such a notice prevents a lot of hassle in such a case, but I agree that in practice this probably doesn't happen often, and especially in short files it's not nice to have such a big block of text repeated everywhere.

Both the FSF and the ASF recommend putting a rather lengthy notice into every file, with a restatement of the warranty disclaimer. I've never heard of anyone suing anyone over a warranty issue with free software however, and I can't imagine a judge ruling that we're responsible for warranty on a free-of-cost piece of software licensed under a license that explicitly disclaims warranty, just because it wasn't restated in every single file. (If someone pays us to develop software for them, then we will probably be required to provide warranty regardless of what the license says, but that's a separate issue.)

Perhaps we can use a shortened version, something like this:

Copyright 2014, 2016 Netherlands eScience Center.
Licensed under the Apache License, version 2.0. See LICENSING for details.

The LICENSING file could then contain the longer statement, together with the actual license text, or something more complicated if the program consists of multiple modules with different licenses.

Anyone wanting to reuse a single file from our program would then be responsible for doing so according to the license, e.g. by having their own LICENSING file that explains that we own that particular file and contains the license, or by adding the longer statement to the file, or in some other way that the license we used allows.

Since there is at least a statement of who owns the copyright, if someone happens to come across a single file, they at least know that it's copyrighted (and can't claim they thought it was public domain) and they can contact us if there's any doubt as to what happened and what they are and aren't allowed to do.
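If the short notice were adopted, adding it to existing files could be automated. The sketch below is an assumption-laden illustration: the helper names `has_notice` and `add_notice` are hypothetical, the notice text is the two-line proposal above, and the code does not handle shebang lines or encoding declarations, which would need to stay on the first line.

```python
# The two-line notice proposed in this thread.
NOTICE_LINES = [
    "Copyright 2014, 2016 Netherlands eScience Center.",
    "Licensed under the Apache License, version 2.0. See LICENSING for details.",
]


def has_notice(source_text):
    """Return True if the copyright line appears in the first ten lines."""
    head = source_text.splitlines()[:10]
    return any(NOTICE_LINES[0] in line for line in head)


def add_notice(source_text, comment_prefix="# "):
    """Prepend the short notice as comment lines unless it is already there."""
    if has_notice(source_text):
        return source_text
    header = "\n".join(comment_prefix + line for line in NOTICE_LINES)
    return header + "\n\n" + source_text
```

The `comment_prefix` parameter is there because the notice has to be a comment in whatever language the file is written in, for example `"// "` for Java or C++ sources.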


By @larsmans
Agree with @LourensVeen on including that short notice. Program modules get copied around when users want only part of the functionality.


By @mkuzak
What will we do in case of files with external contributions? We'd have to ask people to transfer their copyrights to NLeSC. We should have some standard form for that too.


By @LourensVeen
Not necessarily, the copyright could be shared. The contributors do have to explicitly contribute under Apache 2.0, but the license already contains language that says that all contributions are considered to be under Apache 2.0 as well.

Of course, this will make it more difficult to relicense the code, because all contributors have to give permission. This is more of a governance matter than a licensing matter though.

Add MoSCoW method for requirements

By @LourensVeen
MoSCoW is a simple method which gives a bit of structure to the list of requirements. We should use it.

Comments

By @mkuzak
I think that should be part of the roadmap. Currently we're missing a ROADMAP.md; we should think about it, and in my opinion this would go into that file.


By @blootsvoets
I think what Lourens means is to use this MoSCoW method on the estep-checklist. In my opinion, if we use Github milestones correctly (I don't think any NLeSC project does), there isn't that much need for ROADMAP.md. Of course, in milestones we could also use the MoSCoW method.


By @mkuzak
I think Lourens meant MoSCoW for the software. At least that's what I can tell from the file changes in the pull request. @LourensVeen please correct :).
I do think roadmap becomes important when you want to attract external contributors. People will want to know that the project is going the same direction they are interested in if they're going to invest their time in it.


By @LourensVeen
No, I meant using MoSCoW for the software, not for the checklist. Although of course we could apply the checklist to the checklist and ascend to a meta-recursive state of zen, but then we need to order Mateusz some wireless headphones so he can float through the office unrestrained by cabling.

Anyway, my main point is that MoSCoW is a nice tool for organising requirements, and that we should use it, mainly at the beginning of a project and at any re-plannings along the way.

The list of MoSCoW-prioritised requirements could be on the project manager's laptop, but why not make it part of the project documentation? As Mateusz says, it's nice for (potential) external contributors to have an idea of where we are going.


By @jiskattema

You can easily do it in GitHub with labels; just standardize on the naming. 'Must' = blocker, 'Should' = highly desired / milestone, 'Could' = potential feature, 'Would' = nice to have, but don't wait for it.

Or, an alternative interpretation:
Must: certain
Should: likely
Could: not likely
Won't: definitely not
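Independent of where the list ends up living, the Must/Should/Could/Won't ordering described in this thread can be sketched as a small data structure. The `Priority` enum, the `group_by_priority` helper, and the example requirements are all hypothetical; nothing here corresponds to an existing GitHub feature or NLeSC tool.

```python
from enum import IntEnum


class Priority(IntEnum):
    """MoSCoW categories, ordered from most to least binding."""
    MUST = 1
    SHOULD = 2
    COULD = 3
    WONT = 4


def group_by_priority(requirements):
    """Group (description, Priority) pairs into a dict ordered from Must to Won't."""
    grouped = {priority: [] for priority in Priority}
    for description, priority in requirements:
        grouped[priority].append(description)
    return grouped


# Made-up requirements, only to show the shape of the data.
example = [
    ("export results to CSV", Priority.COULD),
    ("read the input file format", Priority.MUST),
    ("graphical user interface", Priority.WONT),
]
for priority, items in group_by_priority(example).items():
    print(priority.name, items)
```

Keeping the categories as an ordered enum means a requirements list can be sorted or grouped consistently, whether it is stored as labels, in a ROADMAP.md, or elsewhere.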


By @LourensVeen
Well, but then you'd need some rather broad issues, which would be open for the duration of most of the project. Maybe it's better to have that separate from the day-to-day work?

Explain better what should be in the language guides

There is a languages_overview.md that specifies what each chapter should contain. I think there should be more explanation. For example, it wasn't clear to me what available templates means (it now is: default project layout).

C++: linters and profilers

Good linters perhaps should be listed under a separate subheader under Style:

  • cpplint (is mentioned already): checks (mostly?) for style, following the Google style guide.
  • cppcheck: does static analysis, checks for bugs.

These can (and perhaps should, though cpplint can be debated, I guess) be used simultaneously, they are complementary.

Profilers:

  • cachegrind
  • gprof

Good article explaining pitfalls in using profilers:
http://yosefk.com/blog/how-profilers-lie-the-cases-of-gprof-and-kcachegrind.html

checklist divided by events in the project

Example of events:

  • first user
  • first contributor (pull requests)
  • first external collaborator (push access)
  • first release
  • first dissemination

The developer should make sure the relevant checklist items are implemented at the time of those events.

We should create an additional checklist page with items matching those events.
