Comments (18)
Possibly related to #8?
from hpc-intro.
I like the idea. I'd be careful doing so, though. For me, (GNU) `parallel` is a way to replace a shell for-loop. From that perspective, it would be a nice addition to `hpc-shell` (also to differentiate it from `swc-shell`, if that is needed).
It is debatable, however, which HPC user behavior teaching `parallel` would foster. From an admin/dev perspective, it could encourage writing even more difficult shell scripts, compared to doing the same thing in a programming language (Python multi-threading/-processing or any other shared-memory parallelisation technique). In this regard, I'd prefer to give people a more thorough approach to applying parallelisation (profiling, hot spot search, speed-up estimation).
I know there is interest in this topic from some people (mostly from a cloud perspective AFAIK). So my vote would be on potentially adding it as extra material.
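To make the comparison above concrete, here is a minimal sketch of the "same thing in a programming language" option: a serial for-loop and its pooled equivalent, using Python's standard `concurrent.futures`. The function `count_words` and the sample strings are hypothetical stand-ins for a per-file analysis step; they are not taken from any lesson.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(text):
    """Hypothetical stand-in for a per-file analysis step."""
    return len(text.split())


def run_serial(samples):
    # The plain for-loop version: one sample after another.
    return [count_words(s) for s in samples]


def run_parallel(samples, workers=4):
    # The same work handed to a pool of workers.
    # map() returns results in input order, like the loop does.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_words, samples))


samples = ["a b c", "d e", "f"]  # hypothetical inputs
serial = run_serial(samples)
parallel = run_parallel(samples)
```

Threads are used here only to keep the sketch short. For CPU-bound work, `ProcessPoolExecutor` offers the same `map()` interface and is usually the better fit.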
Yes, having `parallel` as an extra, a callout or side note of sorts, is what would make the most sense.
I agree that there is a danger in walking too far down the path toward difficult shell scripts. However, it does have the potential to provide an introduction to thinking about processing in parallel on a single line and that's the core reason that I find it attractive. Anything else and we'd typically be in a mess this early in the arc of a workshop.
Regardless, as I think we also agree, we'd need the right use case.
Likely something to just mull over for the time being and return to once we've got more essential changes looked after.
Here is a `parallel` example we put together using the Nelle Nemo story line for a group of undergraduate students. Maybe something in there will provide some helpful ideas. As @psteinb mentioned, we used it as a replacement for a for-loop.
TBH, I like @pdoehle's parallel Nelle Nemo exercise more than our current "build fastqc" demo. This is nice work that meshes well with (due to building upon) another Carpentries lesson, which helps to reinforce and gently expand on previously covered knowledge.
Yep, I agree. I think, though, that we should have a larger discussion about where such parallel paradigms/tools should go. I like the Nelle Nemo example too, to be frank.
To illustrate my point: one could argue that a similar example could very well tie into the snakemake intro given in hpc-python, see https://github.com/hpc-carpentry/hpc-python/blob/gh-pages/_episodes/11-snakemake-intro.md. So I'd encourage a more conceptual discussion about where an introduction to any parallel paradigm can and should go within hpc-carpentry.
I use an example of GNU parallel (https://github.com/SupercomputingWales/SCW-tutorial/blob/gh-pages/_episodes/07-optimising-for-parallel-processing.md) that is based on Nelle's pipeline from the Software Carpentry Unix Shell Novice lesson. We expanded that example to process 6000 files instead of the original 17. There's also a section on a more complex multi-argument example. I'm happy to integrate this into HPC Carpentry if there's interest in reusing this material.
There is a lesson for this already developed:
https://deapsecure.gitlab.io/deapsecure-lesson01-hpc/
It may be helpful to move the MPI section from HPC-intro to another 4-hour block such as HPC-novice or HPC-programming, as discussed here. GNU parallel, or some other embarrassingly parallel task or throughput application, may be good to have in the intro, as it would build on existing knowledge more smoothly.
GNU parallel is a useful tool, but IMO, it reflects a high throughput computing workflow to a much greater extent than a high performance computing paradigm, discretizing at the task rather than the data structure. This should be mentioned and perhaps covered "somewhere," but not here, at least for now, while we focus on MPI using C, Python, etc.
MPI is very rushed in the last section of the current 4.5-hour hpc-intro. HPC in a day is at least 6.5 hours of material. Most Carpentry workshops are about 16 hours. Moving MPI and other parallel programming models to a separate section of 3 to 4 hours would give a better learning experience. An embarrassingly parallel job script is probably a reasonable ending point for HPC-intro. Should one assume sw-shell as a prerequisite?
As explained by Dursi here, simply focusing on MPI will do a disservice to the many ways in which HPC clusters are used. Introducing a variety of programming models in a more structured way would be beneficial.
A good and valid point, @bkmgit, thank you. I'm coming at HPC from the realm of PDE solvers, where MPI, OpenMP, and CUDA rule, but the umbrella is much broader than my experience.
GNU parallel is essentially a tool for dispatching jobs on the local resource, which is exactly the role of a queuing system on the cluster. Since we spend a bunch of time introducing queuing systems, and not much time at all using them, launching a bunch of jobs from a reconfigurable script, or by creating a job array, would be a great way to demonstrate the core tool and conclude the lesson.
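As a rough illustration of the job-array idea, the sketch below shows how one script can serve every task in an array by picking its input from the task index. It assumes a Slurm scheduler, which exports `SLURM_ARRAY_TASK_ID` to each array task; the thread itself does not name a scheduler, and the file names are hypothetical.

```python
import os


def select_input(inputs, env=os.environ):
    """Return the input assigned to this array task."""
    # Assumption: Slurm sets SLURM_ARRAY_TASK_ID for job-array tasks.
    # Defaulting to 0 lets the script also run outside the scheduler.
    task_id = int(env.get("SLURM_ARRAY_TASK_ID", "0"))
    return inputs[task_id]


# Hypothetical input names, one per array task.
inputs = ["sample-a.txt", "sample-b.txt", "sample-c.txt"]

# Simulate the environment of array task 2 instead of reading the real one.
chosen = select_input(inputs, env={"SLURM_ARRAY_TASK_ID": "2"})
```

Passing the environment as a parameter keeps the selection logic testable on a laptop, without submitting anything to a queue.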
Comments on this issue since August all share a theme: it's a great idea, but hpc-intro is not the right lesson to incorporate GNU parallel. Recommend closing this issue.
I am OK with closing the issue and creating a new one for a reconfigurable script example or job array for https://carpentries-incubator.github.io/hpc-intro/16-parallel/index.html
The assumption is that a typical introduction will have two modules, hpc-intro and hpc-novice/hpc-parallel.
Completely agreed! This issue is superseded by #244.