nlesc / guide Goto Github PK

Software Development Guide

Home Page: https://guide.esciencecenter.nl

License: Creative Commons Attribution 4.0 International

HTML 80.90% CSS 19.10%

best-practices escience language-guides research-software research-software-engineering

guide's Introduction

Guide

This is a guide to software development at the Netherlands eScience Center. It both serves as a source of information for how we work at the eScience Center, and as a basis for discussions and reaching consensus on this topic.

Read The Turing Way instead

If you are looking for an overall picture of best practices, read The Turing Way first. We joined forces with that guide for most of our generic research software engineering advice. Because The Turing Way is language agnostic, this guide mostly provides additional language-specific guides. Please be aware that most of the remaining language-agnostic content is outdated, so be careful when using it. We plan on removing that content (#286).

If you would like to contribute to this book see CONTRIBUTING.md.

guide's Issues

GPGPU

via @isazi

PRACE has released a best practices guide for GPGPU.

Would it make sense to link to this in the OpenCL/CUDA section of the guide?

http://www.prace-ri.eu/best-practice-guide-gpgpu-january-2017/
http://www.prace-ri.eu/IMG/pdf/Best-Practice-Guide-GPGPU-1.pdf

Better Code Hub

Last week during ICTOpen, I spoke with someone from SIG (Software Improvement Group, not special interest group). They've developed "The 10 Guidelines for Maintainable Software", and have a tool that checks your GitHub repo to assess how maintainable your code is.

It looks interesting -- maybe it is just yet another code quality tool. But maybe it is worth checking it out! If it turns out to be quite good, maybe we can include it in the guide?

Python virtualenvwrapper

Add something about it: it hugely improves the usability of virtualenv (though conda is still preferable, imho).

Create guide for Data Review process

C++ chapter header needs improving

The C++ chapter has a few controversial statements. We should have a (probably offline) discussion on what the consensus and/or official viewpoint is at NLeSC and change it accordingly.

Examples:

C++ is portable

C++ can be safe, if you want to.

don't write C.

React is the new Angular

The JavaScript chapter should include something about React.

Explain that you are allowed to have GPL code as dependency

(for Apache 2.0 code)

Python

Before I start a commit-war, let's do a simple religious war in this issue ;)

There are a few things I don't like about the latest version of the Python guide:

Conda seems to be the non-recommended option. Imho, Conda is currently far superior to pip + virtualenv. One important thing to note is that one can use pip inside a Conda environment as well: packages that are not included in Conda can still simply be installed with pip. This is common practice, especially when developing. It makes no sense whatsoever to compile numpy and scipy every time you need a new virtualenv. Until pip gets at least a build cache (preferably just a binary download like Conda, but a build cache would already be a great improvement), I think we should advocate using Conda. For saving on power and the environment, if nothing else.
I know some people like PyCharm, and I guess I can live with that ;) I don't like IDEs at all, though. I generally find recommending editors a very tricky subject. Unless there is something like 90% agreement within a community that a tool is indispensable (like pip), I'm not sure why we should talk about it (unless it's clearly superior, like conda :D). I don't think anybody will visit our guide to get advice on editors; they will simply Google it (I would). In any case, an editor is such a basic need that people don't need guidance on it. Package management is different, of course: a beginner wouldn't know that such a thing exists at all, so it helps to have the "Googleable terms" collected in a single place.
Multiprocessing is a mess in Python. I think we should just put it plainly like that. Recommending just one package certainly gives the wrong picture. The alternative to letting people find out on their own is to list many more packages. For instance, IPython/Jupyter also has parallel computing built in, there are things like RabbitMQ, we have Noodles, you could build a C module or use Numba or PyCUDA, etc. My preference would be to just state that Python is not really built for it.
About IPython: clearly this is a superior interpreter, but that's about it. What grew out of it is also very interesting: Jupyter notebooks. Jupyter has become language agnostic, but it is still a very powerful Python tool for interactive analysis, education and communication. We should mention this.

I can write a pull request soon incorporating these aspects. Before doing so, just want to make sure that it's not going to be rejected ;) Shout out to previous contributors: @jvdzwaan @blootsvoets @nielsdrost

Project chapter needs work

The chapter on projects needs some work. For instance, the page on project planning is completely empty.

Nicer landing page

Would it be worth putting some effort into making a nicer landing page (https://guide.esciencecenter.nl/index.html)? Or at least one with clearer information on what one can find in the guide, whom the guide is for (us, RSEs, PIs, project members? all?), how one can give input or suggestions, and what the goal of the guide is?

api design guide

There should be something in the guide about good API design. I would point to https://geemus.gitbooks.io/http-api-design/content/en/ and swagger.io
The problem I have is I don't know where to put it. It could go together with language guides.

Permanent search bar

I only found out today that there is a search option. How about making it more visible (i.e. permanently visible)? That really helps people reach their goal more quickly. Here's an example of how they did that: https://componently.github.io/componently/.

link to editorconfig file is dead

See title, I can't seem to find the file...

Tools of the Trade chapter?

I recently discovered https://draw.io as a really nice way of drawing diagrams. Of course, I instantly thought about adding it to the guide!

However, we seem to not have a section for this. :-(

Do we need a new chapter for this? I can imagine that things like Overleaf, SURFdrive, and Figshare all deserve a mention somewhere in the guide.

Sync broken

The GitBook <-> GitHub sync does not work (I get an error message via email). Solving via recommended "fix" methods does not seem to work.

Created a ticket at GitBook.

project documentation should have description on how to report bugs

Python: preferred testing library

Currently, the preferred testing library for Python is nose.

However, the nose website states:
Nose has been in maintenance mode for the past several years and will likely cease without a new person/team to take over maintainership. New projects should consider using Nose2, py.test, or just plain unittest/unittest2.

Do we need to update our preference?
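Whichever replacement is chosen, writing tests as plain `unittest` test cases keeps the option open: pytest collects `unittest.TestCase` subclasses without changes, and nose2 is built on `unittest` as well. A minimal sketch; the function under test, `mean`, is hypothetical:

```python
import unittest


def mean(values):
    """Hypothetical function under test: arithmetic mean of a non-empty sequence."""
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


class TestMean(unittest.TestCase):
    def test_simple_average(self):
        self.assertEqual(mean([1, 2, 3]), 2)

    def test_empty_input_raises(self):
        # An empty sequence has no mean, so we expect an explicit error.
        with self.assertRaises(ValueError):
            mean([])
```

Saved as, say, `test_mean.py`, this runs with either `python -m unittest` or `pytest`, so the test code itself does not lock a project into one runner.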

Describe what to do if a GitHub repository is currently not maintained

Describe to visitors of the repository what they can expect.

For example add a sentence to the top of the README.md like:

This repository is currently not maintained. We welcome people to fork this repository for further development and maintenance.

What should be included in CONTRIBUTING.md

More information on what contribution guidelines should include.
We need more examples and NLeSC templates.

Comments

By @sverhoeven
Example by me at https://github.com/3D-e-Chem/3D-e-Chem-VM/blob/master/CONTRIBUTING.md

release check: let a non-committer follow the installation and usage instructions

Links in guide content should be checked

Somewhat like the links in the software site, we should check if the links in the guide are (still) valid.
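A minimal sketch of such a check using only the standard library. It assumes the guide's pages are Markdown and only looks at inline `[text](https://...)` links; the function names are hypothetical:

```python
import http.client
import re
import urllib.request

# Inline Markdown links whose target is an absolute http(s) URL: [text](https://...)
LINK_PATTERN = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")


def extract_links(markdown_text):
    """Return every absolute http(s) link target in a Markdown string, in order."""
    return LINK_PATTERN.findall(markdown_text)


def is_link_alive(url, timeout=10):
    """Send a HEAD request and report whether the server answered with a non-error status."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers urllib.error.URLError and HTTPError (4xx/5xx),
        # DNS failures and timeouts; ValueError covers malformed URLs.
        return False


def broken_links(markdown_text):
    """List the links in a Markdown string that do not respond successfully."""
    return [url for url in extract_links(markdown_text) if not is_link_alive(url)]
```

Some servers answer `HEAD` with an error even though the page works in a browser, so a production checker would probably retry with `GET` before reporting a link as broken.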

R: add link to "Writing better R code" by Laurent Gatto

www.bioconductor.org/help/course-materials/2013/CSAMA2013/friday/afternoon/R-programming.pdf

each chapter should have an owner/maintainer

...and the list should be somewhere in the guide (e.g. as a chapter_owners.md or in the README.md)

NOTICE recursively?

By @blootsvoets :
Should we update NOTICE recursively? For example, in pyxenon, we include all dependencies of Xenon. Should we add all these dependencies to NOTICE? We also install packages from PyPI; should we include the licenses of all those dependencies? And recursively?
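To see how large a recursive NOTICE would get, the installed dependency tree can be walked with the standard library's `importlib.metadata`. This is a rough sketch, not a compliance tool: it only reads the `License` metadata field (newer packages may use `License-Expression` or only a classifier instead), it follows every non-extra requirement regardless of environment markers, and its name parsing is deliberately crude. `collect_licenses` and the other helpers are hypothetical:

```python
import re
from importlib import metadata

# Leading project name of a requirement string, e.g. "numpy" in 'numpy>=1.20; python_version > "3.8"'.
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")
# Requirements guarded by an "extra" marker are optional, so we skip them.
_EXTRA = re.compile(r"\bextra\s*==")


def normalize(name):
    """Normalise a project name the way PyPI does (PEP 503)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement):
    """Extract the project name from a requirement string, or None if there is none."""
    match = _NAME.match(requirement)
    return match.group(1) if match else None


def collect_licenses(root):
    """Map root and its installed transitive dependencies to their declared License field.

    Projects that are not installed map to None.
    """
    licenses = {}
    to_visit = [root]
    while to_visit:
        name = normalize(to_visit.pop())
        if name in licenses:
            continue  # already visited; this also protects against dependency cycles
        try:
            dist = metadata.distribution(name)
        except metadata.PackageNotFoundError:
            licenses[name] = None
            continue
        licenses[name] = dist.metadata.get("License")
        for requirement in dist.requires or []:
            if _EXTRA.search(requirement):
                continue
            dependency = requirement_name(requirement)
            if dependency:
                to_visit.append(dependency)
    return licenses
```

Assuming pyxenon is installed under that name, `collect_licenses("pyxenon")` would list every package it pulls in, which gives a first impression of how long a recursive NOTICE would become.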

Data storage options

Question for a project from @mkuzak:

Do we have a place to put big files? I have a 160 GB VM image that I need to archive. Any solution?

We should put the answer in the guide as well.

First attempt at an answer: Zenodo

Project name spaces

At https://github.com/NLeSC/guide/blob/master/software/code_quality.md#name-spaces it says to use nl.esciencecenter.

For projects, I would like to suggest using the GitHub organization domain.

For example for the 3D-e-Chem project the name space could be 3d-e-chem.github.io

dev setup in docker image should be moved to docs

Instead of being a separate item in the checklist, this should be a sentence in documentation.md#documented-development-setup. For example:
"If your dev setup is complicated, please consider providing a Dockerfile and a Docker image with the dev setup."

Python: preferred documentation style

Dafne updated the Python section about documentation (thank you!), what needs to be added is advice on what documentation style to use for new projects.

The choices are: NumPy style or Google style

Do we have a preference? What are arguments for/against?

See https://github.com/NLeSC/guide/blob/update-python/languages/python.md#writing-documentation (branch update-python)
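To make the choice concrete, here is the same hypothetical function documented in both styles. Sphinx understands both through its `sphinx.ext.napoleon` extension, so the decision is mostly about readability:

```python
def scale_google(values, factor=1.0):
    """Multiply every value by a constant factor.

    Args:
        values (list of float): The numbers to scale.
        factor (float): The multiplier. Defaults to 1.0.

    Returns:
        list of float: The scaled numbers, in the original order.
    """
    return [v * factor for v in values]


def scale_numpy(values, factor=1.0):
    """Multiply every value by a constant factor.

    Parameters
    ----------
    values : list of float
        The numbers to scale.
    factor : float, optional
        The multiplier (default is 1.0).

    Returns
    -------
    list of float
        The scaled numbers, in the original order.
    """
    return [v * factor for v in values]
```

Google style is more compact. NumPy style takes more vertical space but stays readable for long parameter descriptions, and it is the convention used by NumPy and SciPy themselves, which may count in its favour for scientific code.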

release check: dummy proofing

Turn all the knobs and push all the buttons and see if it breaks.

R chapter has non existing entries in SUMMARY

In SUMMARY.md, the R chapter has references to non-existing files, leaving "broken" links.

I'll put them in comments for now. Feel free to restore when there is actually a file there (though I do not think we should need subsections for language guides, perhaps this is a hint it is getting too long ;-)
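Such references could be caught automatically. A sketch, assuming SUMMARY.md uses GitBook's `[Title](relative/path.md)` entries with paths relative to the book root; entries inside HTML comments, as described above, are ignored, and `missing_summary_targets` is a hypothetical helper:

```python
import re
from pathlib import Path

# GitBook-style entry: [Title](relative/path.md), optionally followed by #anchor.
ENTRY_PATTERN = re.compile(r"\[[^\]]*\]\(([^)#\s]+)(?:#[^)\s]*)?\)")
# Entries that were deliberately commented out should not be reported.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def missing_summary_targets(summary_text, base_dir="."):
    """Return local link targets in SUMMARY.md text that do not exist under base_dir."""
    base = Path(base_dir)
    visible_text = HTML_COMMENT.sub("", summary_text)
    return [
        target
        for target in ENTRY_PATTERN.findall(visible_text)
        # External URLs cannot be checked on disk, so skip them here.
        if "://" not in target and not (base / target).is_file()
    ]
```

Run from the repository root, `missing_summary_targets(Path("SUMMARY.md").read_text(encoding="utf-8"))` would return the dangling entries, which a CI job could turn into a failing build.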

Too many options for "GitHub add-ons" in Code Review section

There are 13 options listed for "GitHub Add ons" in the Code Review section.

This is too many. Could we limit the selection to things we have experience with and that work well?

Which email to use in Code of Conduct?

Use lead developer NLeSC email address or project email or corporate email.

Comments

By @sverhoeven
I vote for lead developer email, as it is private and easy to setup.
👍

By @mkuzak
me too

Python visualization: what do people think of plotly?

https://plot.ly/

If we like it, we should add it to the list

Python: Add pyup.io as dep tracker

https://pyup.io/

Best and Good Enough practices in scientific computing

Our guide is all about best practices. However, there is also such a thing as "good enough".

These two (known) sources are great in explaining these:

Good Enough Practices in Scientific Computing https://arxiv.org/abs/1609.00037
Best Practices for Scientific Computing http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001745

We should at least link to these somewhere. I propose to do this in the introduction of the software guide (which is currently a single sentence, so could use some more content anyway)

Python: add link to conda post

https://jakevdp.github.io/blog/2016/08/25/conda-myths-and-misconceptions/

Link to emergency revert for accidentally uploaded confidential data

By @egpbos

In case the code uses data that cannot be made open, an engineer should try to keep the sensitive parts outside of the main codebase.

It sometimes happens that sensitive data is accidentally uploaded. It would be nice if the checklist had an emergency "get that stuff out of my history" guide (or a link to a separate guide, e.g. this one: https://help.github.com/articles/remove-sensitive-data/).

comments:
By @mkuzak
Agreed, this is important to have, but should go to documentation.
Actual solution should not be in the checklist, but we should add a note how to prevent (adding things to .gitignore).

measuring the appreciation of the users

Currently we do not capture that, but that would be very useful.

Broken links in chapters

As per our shiny new broken link checker (see #24), we have a great overview of all the broken links in the guide. We should fix them :-)

See https://travis-ci.org/NLeSC/guide for an overview of all broken links.

Chapters with broken links atm:

Chapter owners, please fix :-)

add reference to choosealicense.com

We should mention https://choosealicense.com/ somewhere

Mention Zenodo in software/releases.md

A software release should have a DOI. This can be achieved using Zenodo.
A DOI and badge can be generated automatically using GitHub release webhooks.

Java integration tests

Use docker-compose junit rule, https://github.com/palantir/docker-compose-rule
Use system-rules to capture stdout, etc, http://stefanbirkner.github.io/system-rules/index.html
See example https://github.com/NLeSC/xenon-cli/blob/master/src/test/java/nl/esciencecenter/xenon/cli/SlurmTest.java

Also mocked web services using http://wiremock.org/docs/junit-rule/
See example https://github.com/3D-e-Chem/knime-kripodb/blob/master/tests/src/java/nl/esciencecenter/e3dchem/kripodb/ws/JavaWsWorkflowTest.java

copyright notice/disclaimer at the top of every source code file

By @mkuzak
I'm in favor of getting rid of those, since they are not required. Unless someone thinks we need it.

comments

By @sverhoeven
If the source code file is or can be distributed alone then the notice should be there.
If the source code file is always distributed with a LICENSE file, etc. then the notice can be removed.
@LourensVeen what do you think?

By @jiskattema
If I remember correctly, they are not (legally) necessary.
How about only leaving them in if the license is not our preferred license and redistribution is likely?

By @mkuzak
I agree with @sverhoeven on that, if the files are likely to be distributed separately they should have notice on top, otherwise not.

By @LourensVeen
It's not legally necessary. When you create a work you get to own the copyright, whether you put any notices on it or not (according to the Berne convention, which almost all nations are signatory to). If you don't have a statement in every file and somehow a file gets separated from the rest, then whoever gets it would not know that it's licensed, and therefore should not distribute it. If they can somehow ascertain that it's ASLv2, then they can safely distribute it, as long as they put the license notice back in, since the license requires that.

Of course, having such a notice prevents a lot of hassle in such a case, but I agree that in practice this probably doesn't happen often, and especially in short files it's not nice to have such a big block of text repeated everywhere.

Both the FSF and the ASF recommend putting a rather lengthy notice into every file, with a restatement of the warranty disclaimer. I've never heard of anyone suing anyone over a warranty issue with free software however, and I can't imagine a judge ruling that we're responsible for warranty on a free-of-cost piece of software licensed under a license that explicitly disclaims warranty, just because it wasn't restated in every single file. (If someone pays us to develop software for them, then we will probably be required to provide warranty regardless of what the license says, but that's a separate issue.)

Perhaps we can use a shortened version, something like this:

The LICENSING file could then contain the longer statement, together with the actual license text, or something more complicated if the program consists of multiple modules with different licenses.

Anyone wanting to reuse a single file from our program would then be responsible for doing so according to the license, e.g. by having their own LICENSING file that explains that we own that particular file and contains the license, or by adding the longer statement to the file, or in some other way that the license we used allows.

Since there is at least a statement of who owns the copyright, if someone happens to come across a single file, they at least know that it's copyrighted (and can't claim they thought it was public domain) and they can contact us if there's any doubt as to what happened and what they are and aren't allowed to do.

By @larsmans
Agree with @LourensVeen on including that short notice. Program modules get copied around when users want only part of the functionality.

By @mkuzak
What will we do in the case of files with external contributions? We'd have to ask people to transfer their copyrights to NLeSC. We should have some standard form for that too.

By @LourensVeen
Not necessarily, the copyright could be shared. The contributors do have to explicitly contribute under Apache 2.0, but the license already contains language that says that all contributions are considered to be under Apache 2.0 as well.

Of course, this will make it more difficult to relicense the code, because all contributors have to give permission. This is more of a governance matter than a licensing matter though.

Packages should be put on PyPI using the nlesc account

Also explain where to find the password.

Add MoSCoW method for requirements

By @LourensVeen
MoSCoW is a simple method which gives a bit of structure to the list of requirements. We should use it.

Comments

By @mkuzak
I think that should be part of the roadmap. Currently we're missing a ROADMAP.md; we should think about it, and in my opinion this would go into that file.

By @blootsvoets
I think what Lourens means is to use this MoSCoW method on the estep-checklist. In my opinion, if we use Github milestones correctly (I don't think any NLeSC project does), there isn't that much need for ROADMAP.md. Of course, in milestones we could also use the MoSCoW method.

By @mkuzak
I think Lourens meant MoSCoW for the software. At least that's what I can tell from the file changes in the pull request. @LourensVeen please correct me :).
I do think roadmap becomes important when you want to attract external contributors. People will want to know that the project is going the same direction they are interested in if they're going to invest their time in it.

By @LourensVeen
No, I meant using MoSCoW for the software, not for the checklist. Although of course we could apply the checklist to the checklist and ascend to a meta-recursive state of zen, but then we need to order Mateusz some wireless headphones so he can float through the office unrestrained by cabling.

Anyway, my main point is that MoSCoW is a nice tool for organising requirements, and that we should use it, mainly at the beginning of a project and at any re-plannings along the way.

The list of MoSCoW-prioritised requirements could be on the project manager's laptop, but why not make it part of the project documentation? As Mateusz says, it's nice for (potential) external contributors to have an idea of where we are going.

By @jiskattema

You can easily do it in GitHub with labels; just standardize on the naming. 'Must' = blocker, 'Should' = highly desired / milestone, 'Could' = potential feature, 'Would' = nice to have, but don't wait for it.

Or, an alternative interpretation:
Must: certain
Should: likely
Could: not likely
Wont: definitely not
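Whichever interpretation is chosen, the prioritised list is small enough to keep as plain data next to the project documentation. A minimal sketch; the `Priority` names and the example requirements are made up for illustration:

```python
from enum import IntEnum


class Priority(IntEnum):
    """MoSCoW categories; lower values are more important."""
    MUST = 1
    SHOULD = 2
    COULD = 3
    WONT = 4  # "won't have (this time)"


def by_priority(requirements):
    """Sort (description, Priority) pairs from Must to Won't.

    sorted() is stable, so the input order is kept within a category.
    """
    return sorted(requirements, key=lambda item: item[1])


def in_scope(requirements):
    """Drop everything explicitly marked as Won't."""
    return [req for req in by_priority(requirements) if req[1] is not Priority.WONT]


# Hypothetical requirements, for illustration only.
backlog = [
    ("Export results as CSV", Priority.COULD),
    ("Read the instrument's raw data format", Priority.MUST),
    ("Web-based dashboard", Priority.WONT),
    ("Command-line interface", Priority.SHOULD),
]

for description, priority in by_priority(backlog):
    print(f"{priority.name:<6} {description}")
```

The four category names could map one-to-one onto GitHub labels if the list is kept in the issue tracker instead.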

By @LourensVeen
Well, but then you'd need some rather broad issues, which would be open for the duration of most of the project. Maybe it's better to have that separate from the day-to-day work?

Python: choose recommended approach for building packages

Explain better what should be in the language guides

There is a languages_overview.md that specifies what each chapter should contain. I think there should be more explanation. For example, it wasn't clear to me what available templates means (it now is: default project layout).

C++: linters and profilers

Good linters perhaps should be listed under a separate subheader under Style:

cpplint (is mentioned already): checks (mostly?) for style, following the Google style guide.
cppcheck: does static analysis, checks for bugs.

These can (and perhaps should, though cpplint can be debated, I guess) be used simultaneously, they are complementary.

Profilers:

cachegrind
gprof

Good article explaining pitfalls in using profilers:
http://yosefk.com/blog/how-profilers-lie-the-cases-of-gprof-and-kcachegrind.html

Guide contains forbidden words

In particular, I count at least 68 occurrences of "NLeSC" [sic] that should be expunged.

checklist divided by events in the project

Example of events:

first user
first contributor (pull requests)
first external collaborator (push access)
first release
first dissemination

The developer should make sure that the relevant checklist items are implemented by the time of those events.

We should create an additional checklist page with items matching those events.
