Comments (16)
It may be worth a footnote that this is common, but it does depend on each site's scheduling policies and is not universal.
At our facility, queue policies are set up to encourage and favor large jobs. While the smallest jobs often run quickly as backfill, there is a middle ground that can lose out to larger jobs, depending on a variety of factors.
from hpc-intro.
OK, I take that positive response as encouragement and will make an SVG (easier to modify, and version-control friendly). I think the restaurant idea is a closer fit, since we have tables with a fixed number of seats. When you mentioned the host/hostess, I guess you were thinking of arriving at the door and being shown in once a table is empty.
@bernhold I agree about the footnote, our site is the same.
I have given the following intro-to-scheduling talk; there are many potential diagrams in that presentation that could be simplified to illustrate these concepts. I can also adapt them into specific examples for addition to HPC Carpentry.
Scheduler: in our current training material we depict the scheduler as a "bouncer" managing the queue for a crowded club (slide 17 of https://www.uio.no/english/services/it/research/events/2018b/abel_intro_march2018.pdf). If this makes sense, I can create a diagram (we do not have a citation for the current one) licensed CC-BY.
I love it! I've definitely compared a scheduler to the host/hostess at a restaurant, which is the same idea.
@Sabryr yes, that's what I meant. I also really like that analogy because (at least on our systems), jobs that request fewer resources will start sooner, just like smaller parties get seated faster at a busy restaurant. ;)
Yes, site-specific configurations and SLURM configuration options for fair usage are important. When users know about these, they have a better understanding of, for example, "why I had to wait longer today". While I support the footnote idea, I suggest elaborating on this in an "optional section" or similar (though I do not want to complicate things at this stage).
@Sabryr and @ChristinaLK, I like the analogy of the host/head waiter/maître d' leading you to an appropriately sized table, once one becomes available.
@bernhold, I think the analogy holds: your facility would be like a restaurant with several very large tables, and few small ones. The medium-sized jobs just have to wait until a suitable table opens up, or until the maître d' can find a complementary group to add so that the composite fills a large table.
Cross-posted from #84
The metaphor seems to break down the further it stretches. In a restaurant, raw material is converted to finished results by the back-of-house staff, usually hidden in the kitchen: this is the parallel workforce. The front-of-house staff carry the results from the workers to the clients, more like an interconnect or intranet linking the HPC facility to the campus or Internet.
Perhaps a better analogy is a shared office space, where the workers are the professionals occupying each office. Reservations and access are managed through the front desk (workload manager). Different offices serve different purposes (architectures/accelerators): accounting jobs go to the accountant, legal to the lawyer, et cetera. A conference room (interconnect) permits efficient collaboration by temporary associations (communicators) of different professionals (nodes).
A linear workflow can be crafted...
- A client comes through the door, carrying with them their notes and reference materials (SSH into the login node).
- The client requests an appointment at the front desk (submit the job to the workload manager).
- The front desk staff reads through the client's portfolio to assess the workload requirements.
- If none have been specified, or the request exceeds this office's resources, then the job is rejected.
- If the job is manageable, the front desk determines the earliest available time and sets the appointment (returns the job ID).
- Appointments are best-estimates. The actual start time may drift relative to the reported "start time."
- If the task requires only one professional, then at the appointed time, the front desk hands off the client's portfolio and time limit and the worker gets to work.
- Once finished, or out of time, the front desk retrieves the updated portfolio.
- If the task requires more than one professional, then at the appointed time, the front desk calls the necessary workers into the conference room and delivers the portfolio and time limit. The pool of workers get to work, communicating as necessary, until the task is complete or the clock runs out.
- Once finished, or out of time, the front desk retrieves the updated portfolio.
- The client may ask at the front desk (check job status), or the front desk may contact the client when the job changes state (start, finish, error).
- Once the job is complete, the client may retrieve the portfolio from the front desk (SSH into or RSYNC from the login node).
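For concreteness, the appointment workflow above can be sketched as a toy Python model. To be clear, nothing here is a real scheduler API; every name is hypothetical, chosen only to mirror the front-desk steps:

```python
from itertools import count

# Toy model of the "front desk" workflow above; every name here is
# hypothetical, just an illustration of the states a job moves through.
_job_ids = count(1)

class Job:
    def __init__(self, script, nodes, time_limit):
        self.script = script          # the client's portfolio
        self.nodes = nodes            # how many professionals are needed
        self.time_limit = time_limit  # how long the appointment may run
        self.state = "PENDING"        # appointment made, waiting for a slot

def submit(job, max_nodes=4):
    """Front desk reads the portfolio; reject requests the office cannot serve."""
    if job.nodes > max_nodes:
        raise ValueError("request exceeds this office's resources")
    return next(_job_ids)             # the appointment slip (job ID)

job = Job("analysis.sh", nodes=2, time_limit="01:00:00")
job_id = submit(job)
print(job_id, job.state)  # 1 PENDING
```

A real workload manager would then move the job through `RUNNING` to `COMPLETED` (or `TIMEOUT`), which is the state clients poll when they "ask at the front desk".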
All that being said, explaining this extended metaphor in detail would be tantamount to describing the real HPC system in detail. I doubt this abstraction helps the learner: it would take a couple of walk-throughs in class to get the facts straight, and it doesn't help anyone actually understand and use an HPC resource. The time would be better spent, in my opinion, on describing increasingly complex computational frameworks:
- You launch a program on the computer at your desk.
- You ask a colleague to run the program on their beefier computer. They let you know when it finishes.
- You modify the program to use all available cores on your colleague's computer.
- ...
- You submit your job to the queuing system, and it runs in no time on the HPC resource.
@tkphd I still think it's useful to present a metaphor (maybe more than one!)
It sounds like, to be helpful, we should keep it fairly simple, to avoid pushing it to the point where it breaks down.
@ChristinaLK, sure, I don't disagree. My argument is that the restaurant metaphor is best suited for explaining the scheduler as the head waiter, only. It has the added benefit that most people are familiar with the concept of a restaurant, so an illustration is not strictly necessary.
Finding additional, better-suited metaphors for workers and resources would be great.
I'm shocked to see #84 closed, which means I've failed to communicate constructively. @Sabryr, please accept my sincere apology for turning discussion of your work to a discouraging or hostile direction. My goal was to encourage further discussion, and eventually to have an adjusted version of your illustration for reference. I hope that you will consider re-engaging, and re-opening your pull request. I will certainly take this exchange as an opportunity to revise my tone and try harder to foster collaboration on this developing curriculum.
I had a couple of fruitful discussions with @guyer and @reid-a about the restaurant metaphor. While it's not the best fit for describing an entire HPC ecosystem, @guyer in particular came up with some useful features of a workload/queue manager that could be discussed:
- Small parties can wait in line for a table with full service, or jump straight to the bar if they're in a hurry. This would be a quick intro to "fast" queues with decreased runtimes or constrained hardware.
- There will be different wait times for parties of 2-3 vs. 6-8, which is also true of queuing systems, if table size is an analog for the number of nodes.
- The restaurant owner decides how many tables there are of each size class. This parallels HPC partitions for small, medium, and large job sizes, with different numbers of nodes assigned to each.
- Overall, the task of the head waiter can be understood and used to outline the purpose and constraints of a queuing system: because there is a reasonable correlation between party size and dining time, reasonable estimates can be made, and dinner rarely takes more than 3 hours. In HPC, however, the queuing system must accommodate "diners" lasting anywhere from a few minutes to a few weeks, which makes it very important to estimate your job's runtime accurately.
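The backfill behavior these points describe (small parties jumping ahead while a large one waits) can be illustrated with a short, purely hypothetical Python sketch; `seat_order` is an invented helper, not any real scheduler's algorithm:

```python
def seat_order(queue, free_seats):
    """FIFO with backfill: seat the first waiting party that fits.
    Parties that don't fit keep waiting, but smaller parties behind
    them in the queue may jump ahead, like backfilled jobs."""
    seated, waiting = [], list(queue)
    progress = True
    while waiting and progress:
        progress = False
        for party, size in waiting:
            if size <= free_seats:
                free_seats -= size
                seated.append(party)
                waiting.remove((party, size))
                progress = True
                break
    return seated

# The big party of 6 must wait for a large table, but the two
# smaller parties backfill the 5 free seats in the meantime.
order = seat_order([("big", 6), ("pair", 2), ("trio", 3)], free_seats=5)
print(order)  # ['pair', 'trio']
```

A real scheduler adds the time dimension (a backfilled job must also finish before the big job's reserved start), which is exactly why accurate runtime estimates matter.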
Again, @Sabryr and @ChristinaLK, thanks for engaging in this discussion, and please accept my humble apology for derailing it. I was wrong.
@tkphd apology not required; the pull request was closed in order to submit a new one. The diff was too large to continue with.
That's a relief, @Sabryr, and I look forward to seeing the new PR.
I still stand by the apology, though, since I need to work on effectively communicating and dialing back dismissive comments. In particular, I fall into the common expertise trap of assuming things are obvious when they are, in fact, very much not.
I still don't see the need for an apology; thank you for the reviews. I will try to reopen the same pull request, to keep the discussions intact.