Comments (7)

shwina commented on June 19, 2024

Can you explain how the src, data and results directory would be laid out? We have to be very careful here - my experience is that the more directories that learners work with, the more issues that come up during the workshop with learners running commands from the wrong directory.

from matlab-novice-inflammation.

gcapes commented on June 19, 2024

Sure - it's hard to draw a directory structure given it will be rendered with markdown, so I'll just explain instead.
Working directory would be matlab-novice-inflammation for the whole lesson.
Subdirectories: src, results, and data. src contains scripts and functions, results contains the plots created, and data contains the inflammation-xx.csv files.

We can add the src directory to the MATLAB path -- this seems like a more logical way to introduce the PATH, and should avoid problems running any scripts and functions created. Optionally we could add the data directory to the PATH, but it seems better practice to use relative paths for data files: data/inflammation-01.csv etc. I would update all the scripts to output the plots into the results directory.

I'll put a call out with a brief summary of the project organisation recommendations from https://swcarpentry.github.io/good-enough-practices-in-scientific-computing/

At any rate, the inconsistency between the setup and the lesson text needs fixing.

Does the above sound ok?


shwina commented on June 19, 2024

@gcapes

Thanks for this issue and for the PR, I think it's very important to get this part right. Personally, I'm not a fan of adding the main script to path in this way. I'd much rather:

  • Have learners work from a project directory, in this case matlab-novice-inflammation
  • Keep their main scripts (e.g., analyze.m) in this directory
  • Have a data and results directory, and refer to files in them in a relative way, e.g., data/inflammation-01.csv and results/inflammation-01.png

So:

matlab-novice-inflammation/
├── analyze.m
├── data/
└── results/

Rather than:

matlab-novice-inflammation/
├── data/
├── results/
└── src/
    └── analyze.m

I think that having an src directory and adding it to the path here is awkward:

  1. If the main script refers to the data and results folders in a relative way, it can only ever run correctly from the matlab-novice-inflammation directory. So why not just have it live there? A good reason to add something to the path is so that it can be used from anywhere, but that's not the case here.

  2. I don't have access to MATLAB right now, but let's say we are editing the script analyze.m (which would live in the src directory), and we clicked the "Run" (green arrow) button from the editor. Wouldn't this run the script in the src directory?
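
The first point can be illustrated with a small sketch. It is written in Python rather than MATLAB purely for illustration, and the layout is a stand-in created in a temporary folder. It shows that a relative path such as data/inflammation-01.csv is resolved against the current working directory, so the same path stops working as soon as the working directory changes.

```python
import os
import tempfile
from pathlib import Path

relative = Path("data/inflammation-01.csv")
original_cwd = os.getcwd()

with tempfile.TemporaryDirectory() as tmp:
    # Throwaway project layout (hypothetical, for illustration only).
    root = Path(tmp) / "matlab-novice-inflammation"
    (root / "data").mkdir(parents=True)
    (root / "src").mkdir()
    (root / "data" / "inflammation-01.csv").write_text("0,1,2\n")

    try:
        # From the project directory the relative path resolves.
        os.chdir(root)
        found_from_root = relative.exists()

        # From src/ the very same relative path no longer resolves.
        os.chdir(root / "src")
        found_from_src = relative.exists()
    finally:
        # Leave the temporary tree before it is cleaned up.
        os.chdir(original_cwd)

print(found_from_root, found_from_src)
```

This only demonstrates how relative paths resolve in general; it says nothing about what MATLAB's "Run" button does with the working directory.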

That being said, I do realize that the path is essential. For example, adding to path is appropriate when the project is composed of several .m files that can be organized into directories. For example:

project/
├── main.m
├── mathFunctions/
│   ├── max.m
│   └── min.m
├── miscFunctions/
│   └── config.m
└── plottingFunctions/
    └── plot.m

In this case, one could add the mathFunctions, miscFunctions and plottingFunctions directories to the path in the script main.m, so that functions like max and plot can be called from the script.
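
In MATLAB this would be done with addpath. As a rough analogue only, here is a Python sketch of the same idea: a directory of helper code is put on the search path so that a main script can call into it. The module name range_helpers and the function value_range are hypothetical, made up for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    func_dir = Path(tmp) / "mathFunctions"
    func_dir.mkdir()

    # Hypothetical helper module standing in for a file of functions.
    (func_dir / "range_helpers.py").write_text(
        "def value_range(values):\n"
        "    return max(values) - min(values)\n"
    )

    # Put the helper directory on the search path, then import from it.
    sys.path.insert(0, str(func_dir))
    importlib.invalidate_caches()
    try:
        range_helpers = importlib.import_module("range_helpers")
        result = range_helpers.value_range([3, 9, 4])
    finally:
        # Undo the path change so nothing leaks out of the sketch.
        sys.path.remove(str(func_dir))

print(result)
```

The point is the same as above: the helpers take their inputs as arguments, so it does not matter which directory they are called from.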

TL;DR: Adding to path is fine, but adding the main script to path is probably not a good idea, as it generally refers to input and output data in a relative way.

Let me know if that sounds reasonable!


This is not at all to disagree with the "best practices" you have referred to. In Python for example, it would feel completely natural for me to organize a project as follows:

project/
├── doc/
├── main.py
├── setup.py
├── src/
│   └── project/
│       ├── __init__.py
│       ├── mathFunctions/
│       │   ├── __init__.py
│       │   ├── max.py
│       │   └── min.py
│       ├── miscFunctions/
│       │   ├── __init__.py
│       │   └── config.py
│       └── plottingFunctions/
│           ├── __init__.py
│           └── plot.py
└── tests/

Here, setup.py sets up project as a package, enabling one to do:

from project.mathFunctions import max

And nothing needs to be done with the PYTHONPATH.


shwina commented on June 19, 2024

BTW: the directory trees above are generated using the tree command, just in case you haven't seen it before! :)


gcapes commented on June 19, 2024

Thanks for the comments - I know I've got a ton of issues and PRs open right now :)

We should definitely take the time to consider any potential problems with this suggestion.

> let's say we are editing the script analyze.m (which would live in the src directory), and we clicked the "Run" (green arrow) button from the editor. Wouldn't this run the script in the src directory?

Actually no: it runs in the current directory, which this PR now sets explicitly in the lesson. So this shouldn't be a problem.

That said, I don't see any fundamental problem with your suggestion of keeping all the scripts in the 'root' project directory, other than:

  1. it appears to go against the recommendations I've just introduced
  2. it would complicate things to explain that certain scripts/functions should be put in a src directory and others not. I guess functions can go in src just fine because any paths should be passed in as arguments.
  3. this then raises the question of whether to have a src folder at all. I can imagine learners being confused if their initial scripts live in the main project directory, but functions live in the src directory which then requires an extra step before they can be called (adding to the path) -- why not just keep all code in the project's root directory? This then takes us back to point (1.)

Do you have any thoughts on how to resolve the above?

I'll look into the tree command - it looks useful! Thanks for the tip!


shwina commented on June 19, 2024

> Thanks for the comments - I know I've got a ton of issues and PRs open right now :)

Edit: I meant to add here: thank you for all your work on this lesson!

I think that the src/ directory can be done without. Having the source code for a project under an src/ directory is one way to organize the project, but it's certainly not the only way. Going back to the example above, I could as easily organize my Python project as:

project/
├── doc/
├── main.py
├── setup.py
├── project/
│   ├── __init__.py
│   ├── mathFunctions/
│   │   ├── __init__.py
│   │   ├── max.py
│   │   └── min.py
│   ├── miscFunctions/
│   │   ├── __init__.py
│   │   └── config.py
│   └── plottingFunctions/
│       ├── __init__.py
│       └── plot.py
└── tests/

I think having a separate data and results folder already captures the idea of "different directories for different things", and is more in the spirit of "Good enough practices". We're only ever writing a single function called run_analysis in this lesson that's relevant to the inflammation analysis, and I don't think that warrants a separate directory.


Rant:

I'm actually not even convinced that this function is good practice - I blame myself for not thinking of this earlier:

  • It is vaguely named; plot_inflammation_data would probably be more appropriate
  • It does too much (reads a file, computes stats, plots them)
  • It doesn't do enough (cannot have a custom output file name)
  • It's hardly reusable, only applies to inflammation data
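
One possible way to address these points is to split the work into small functions that each do one thing and take their paths as arguments. The sketch below is in Python rather than MATLAB, all function names are hypothetical, and the choice of mean, max and min as the statistics is an assumption. To stay self-contained it writes the statistics to a file instead of plotting them.

```python
import csv
import statistics
import tempfile
from pathlib import Path


def load_table(csv_path):
    """Read a CSV file of numbers into a list of rows."""
    with open(csv_path, newline="") as handle:
        return [[float(cell) for cell in row] for row in csv.reader(handle)]


def column_stats(rows):
    """Return per-column mean, max and min for a rectangular table."""
    columns = list(zip(*rows))
    return {
        "mean": [statistics.mean(col) for col in columns],
        "max": [max(col) for col in columns],
        "min": [min(col) for col in columns],
    }


def save_stats(stats, output_path):
    """Write one row per statistic to a caller-chosen output file."""
    with open(output_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for name, values in stats.items():
            writer.writerow([name, *values])


with tempfile.TemporaryDirectory() as tmp:
    # Tiny made-up table, not real inflammation data.
    data_file = Path(tmp) / "example.csv"
    data_file.write_text("0,1,2\n2,3,6\n")

    stats = column_stats(load_table(data_file))
    out_file = Path(tmp) / "example-stats.csv"
    save_stats(stats, out_file)
    written = out_file.read_text()

print(stats["mean"])
```

Because the output location is a parameter and nothing in these functions is specific to inflammation data, each piece can be reused or replaced on its own.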


gcapes commented on June 19, 2024

LOL @ the edit. Thanks! I've held off using this material for ages because I found it a bit too unsatisfactory, but I think there's enough potential in it for it to be useful for me -- hence all the PRs.

I also appreciate it takes time to review all the issues and PRs I've made, so thanks for that.

There are quite a few more things to fix, and I agree with all your points under the rant section. There's no strong argument for a src directory beyond personal preference, so I'll remove all instructions to use a src directory for this lesson, but keep the best practices call out.

Would you prefer I rebase, given this is still just a PR?

Once this PR is merged I'm happy to work on addressing some of the rant issues.

