
Comments (16)

bernhold commented on July 26, 2024

It may be worth a footnote that this is common, but it does depend on each site's scheduling policies and is not universal.

At our facility, queue policies are set up to encourage and favor large jobs. While the smallest jobs often run quickly as backfill, there is a middle ground that can lose out to larger jobs, depending on a variety of factors.

from hpc-intro.

Sabryr commented on July 26, 2024

OK, I take that positive response as encouragement and will make an SVG (easier to modify and version-control friendly). I think the restaurant idea is closer, as we have tables with a fixed number of seats. When you said host/hostess, I guess you were thinking of arriving at the door and someone taking you in when a table is empty.

ocaisa commented on July 26, 2024

@bernhold I agree about the footnote, our site is the same.

kamil963 commented on July 26, 2024

I have given the following intro-to-scheduling talk;
there are many potential diagrams in that presentation to simplify and illustrate
concepts. I can also edit them into specific examples for addition to HPC Carpentry.

https://docs.google.com/presentation/d/e/2PACX-1vQFH3oQL6NAOogswftWy19E1jwIkzO0lFzNKQVXKRPX4QaxCh3SB4EYxg3b7QlXCNEP7k6x6j8DYHDh/pub?start=false&loop=false&delayms=3000

Sabryr commented on July 26, 2024

Scheduler - in our current training material we depict the scheduler as a "bouncer" managing a queue for a crowded club (Slide 17 of https://www.uio.no/english/services/it/research/events/2018b/abel_intro_march2018.pdf). If this makes sense, I can create a diagram under CC-BY (we do not have a citation for the current diagram).

ChristinaLK commented on July 26, 2024

I love it! I've definitely compared a scheduler to the host/hostess at a restaurant, which is the same idea.

ChristinaLK commented on July 26, 2024

@Sabryr yes, that's what I meant. I also really like that analogy because (at least on our systems), jobs that request fewer resources will start sooner, just like smaller parties get seated faster at a busy restaurant. ;)

Sabryr commented on July 26, 2024

Yes, site-specific configurations and SLURM configuration options for fair usage are important. When users know this, they will have a better understanding of, for example, "why I had to wait longer today". While supporting the footnote idea, I suggest elaborating on this further in an "optional section" or similar (I do not want to complicate things at this stage, though).

tkphd commented on July 26, 2024

@Sabryr and @ChristinaLK, I like the analogy of the host/head waiter/maître d' leading you to an appropriately sized table, once one becomes available.

@bernhold, I think the analogy holds: your facility would be like a restaurant with several very large tables, and few small ones. The medium-sized jobs just have to wait until a suitable table opens up, or until the maître d' can find a complementary group to add so that the composite fills a large table.
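To make the table analogy concrete, here is a minimal toy sketch in Python. Everything in it is an assumption for illustration: the function name `seat_parties`, the table sizes, and the "smallest free table that fits" rule are hypothetical and do not describe any real scheduler's policy. It also leaves out the case where the maître d' combines two groups at one large table.

```python
def seat_parties(table_sizes, parties):
    """Seat each party at the smallest free table that fits it.

    table_sizes: list of seat counts, one per table (hypothetical layout).
    parties: list of (name, size) tuples in arrival order.
    Returns (seated, waiting): seated maps party name -> table size,
    waiting lists the parties that found no free table large enough.
    """
    free = sorted(table_sizes)  # smallest tables are tried first
    seated, waiting = {}, []
    for name, size in parties:
        fit = next((t for t in free if t >= size), None)
        if fit is None:
            waiting.append(name)  # no suitable table is free right now
        else:
            free.remove(fit)      # the table is occupied from now on
            seated[name] = fit
    return seated, waiting


# A restaurant with two large tables and a single small one.
seated, waiting = seat_parties(
    [2, 8, 8],
    [("large-a", 8), ("large-b", 7), ("small", 2), ("medium", 4)],
)
print(seated)
print(waiting)
```

In this made-up layout the party of four ends up waiting even though it is smaller than the two large parties, which is the "middle ground" effect @bernhold described.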

edited for spelling, jargon thesaurus, word choice

tkphd commented on July 26, 2024

Cross-posted from #84

The metaphor seems to break down the further it stretches. In a restaurant, raw material is converted to finished results by the back-of-house staff, usually hidden in the kitchen: this is the parallel workforce. The front-of-house staff carry the results from the workers to the clients, more like an interconnect or intranet linking the HPC facility to the campus or Internet.

Perhaps a better analogy could be drawn with a shared office space, where the workers are the professionals occupying each office. Reservations and access are managed through the front desk (workload manager). Different offices serve different purposes (architectures/accelerators): accounting jobs go to the accountant, legal to the lawyer, et cetera. A conference room (interconnect) permits efficient collaboration by temporary associations (communicators) of different professionals (nodes).
A linear workflow can be crafted...

  • A client comes through the door, carrying with them their notes and reference materials (SSH into the login node).
  • The client requests an appointment at the front desk (submit the job to the workload manager).
  • The front desk staff reads through the client's portfolio to assess the workload requirements.
  • If none have been specified, or the request exceeds this office's resources, then the job is rejected.
  • If the job is manageable, the front desk determines the earliest available time and sets the appointment (returns the job ID).
  • Appointments are best-estimates. The actual start time may drift relative to the reported "start time."
  • If the task requires only one professional, then at the appointed time, the front desk hands off the client's portfolio and time limit and the worker gets to work.
  • Once finished, or out of time, the front desk retrieves the updated portfolio.
  • If the task requires more than one professional, then at the appointed time, the front desk calls the necessary workers into the conference room and delivers the portfolio and time limit. The pool of workers get to work, communicating as necessary, until the task is complete or the clock runs out.
  • Once finished, or out of time, the front desk retrieves the updated portfolio.
  • The client may ask at the front desk (check job status), or the front desk may contact the client when the job changes state (start, finish, error).
  • Once the job is complete, the client may retrieve the portfolio from the front desk (SSH into or RSYNC from the login node).
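The front-desk steps above can be sketched as a tiny state machine. This is only a toy under stated assumptions: the class name `FrontDesk`, its methods, and the state labels (`PENDING`, `RUNNING`, `COMPLETED`, `TIMEOUT`) are hypothetical names chosen for illustration and are not taken from any particular workload manager.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class FrontDesk:
    """Toy stand-in for a workload manager (hypothetical, for illustration)."""
    total_nodes: int
    jobs: dict = field(default_factory=dict)               # job ID -> state label
    _ids: count = field(default_factory=lambda: count(1))  # job ID generator

    def submit(self, nodes=None, time_limit=None):
        # Reject requests with no stated requirements or that exceed the office.
        if nodes is None or time_limit is None:
            raise ValueError("no resource request specified")
        if nodes > self.total_nodes:
            raise ValueError("request exceeds this office's resources")
        job_id = next(self._ids)
        self.jobs[job_id] = "PENDING"   # the appointment has been set
        return job_id

    def start(self, job_id):
        self.jobs[job_id] = "RUNNING"   # the portfolio is handed to the workers

    def finish(self, job_id, timed_out=False):
        # Either the task completed or the clock ran out.
        self.jobs[job_id] = "TIMEOUT" if timed_out else "COMPLETED"

    def status(self, job_id):
        return self.jobs[job_id]        # the client asks at the front desk


desk = FrontDesk(total_nodes=4)
job = desk.submit(nodes=2, time_limit=60)
desk.start(job)
desk.finish(job)
print(job, desk.status(job))
```

The sketch deliberately skips the estimated start time and the multi-worker "conference room" step, since those depend on site policy.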

All that being said, explaining this extended metaphor in detail would be tantamount to describing the real HPC system in detail. I doubt this abstraction helps the learner to understand; it would take a couple walk-throughs in the class to get the facts straight; and it doesn't help anyone actually understand and use an HPC resource. The time would be better spent, in my opinion, in describing increasingly complex computational frameworks:

  1. You launch a program on the computer at your desk.
  2. You ask a colleague to run the program on their beefier computer. They let you know when it finishes.
  3. You modify the program to use all available cores on your colleague's computer.
  4. ...
  5. You submit your job to the queuing system, and it runs in no time on the HPC resource.

ChristinaLK commented on July 26, 2024

@tkphd I still think it's useful to present a metaphor (maybe more than one!)

It sounds like to be helpful, we should keep it rather simplified, just to avoid pushing it to the point where it breaks down.

tkphd commented on July 26, 2024

@ChristinaLK, sure, I don't disagree. My argument is that the restaurant metaphor is best suited for explaining the scheduler as the head waiter, only. It has the added benefit that most people are familiar with the concept of a restaurant, so an illustration is not strictly necessary.

Finding additional, better-suited metaphors for workers and resources would be great.

tkphd commented on July 26, 2024

I'm shocked to see #84 closed, which means I've failed to communicate constructively. @Sabryr, please accept my sincere apology for turning discussion of your work to a discouraging or hostile direction. My goal was to encourage further discussion, and eventually to have an adjusted version of your illustration for reference. I hope that you will consider re-engaging, and re-opening your pull request. I will certainly take this exchange as an opportunity to revise my tone and try harder to foster collaboration on this developing curriculum.

I had a couple of fruitful discussions with @guyer and @reid-a about the restaurant metaphor. While it's not the best fit for describing an entire HPC ecosystem, @guyer in particular came up with some useful features of a workload/queue manager that could be discussed:

  • Small parties can wait in line for a table with full service, or jump straight to the bar if they're in a hurry. This would be a quick intro to "fast" queues with decreased runtimes or constrained hardware.
  • There will be different wait times for parties of 2-3 vs. 6-8, which is also true of queuing systems, if table size is an analog for number of nodes.
  • The restaurant owner decides how many tables there are of each size class. This is the same as HPC partitions into small, medium, and large job sizes, with different numbers of nodes assigned to each.
  • Overall, the task of the head waiter can be understood and used to outline the purpose and constraints of a queuing system: because there's a reasonable correlation between party size and dining time, reasonable estimates can be made, and dinner rarely takes more than 3 hours. However, in HPC, the queuing system must accommodate "diners" anywhere from a few minutes to a few weeks, which makes it very important to accurately guess at your job's runtime.
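A short sketch of why the runtime guess matters, using the "bar seat" idea of slipping a small party in while a large table is being held. This is a simplified toy, assuming a single upcoming reservation; the function name `can_backfill` and its parameters are hypothetical, and real schedulers apply more elaborate, site-specific rules.

```python
def can_backfill(requested_nodes, requested_minutes,
                 idle_nodes, minutes_until_reserved):
    """Return True if a job can slip into currently idle nodes.

    Toy rule (an assumption, not a real scheduler policy): the job must
    fit on the idle nodes AND its requested time limit must end before
    those nodes are needed by an already reserved larger job.
    """
    fits_in_space = requested_nodes <= idle_nodes
    fits_in_time = requested_minutes <= minutes_until_reserved
    return fits_in_space and fits_in_time


# Four nodes sit idle for the next 120 minutes before a large job starts.
# The same 2-node job, expected to need about 90 minutes, is submitted twice:
print(can_backfill(2, 100, idle_nodes=4, minutes_until_reserved=120))   # tight estimate
print(can_backfill(2, 1440, idle_nodes=4, minutes_until_reserved=120))  # "just ask for a day"
```

Under this toy rule the tightly estimated request can start right away, while the padded one has to wait, even though the actual work is identical.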

Again, @Sabryr and @ChristinaLK, thanks for engaging in this discussion, and please accept my humble apology for derailing it. I was wrong.

Sabryr commented on July 26, 2024

@tkphd apology not required; the pull request was closed in order to submit a new one. The diff was too large to continue with that one.

tkphd commented on July 26, 2024

That's a relief, @Sabryr, and I look forward to seeing the new PR.
I still stand by the apology, though, since I need to work on effectively communicating and dialing back dismissive comments. In particular, I fall into the common expertise trap of assuming things are obvious when they are, in fact, very much not.

Sabryr commented on July 26, 2024

I still don't see the need for the apology; thank you for the reviews. I will try to reopen the same pull request to keep the discussions intact.
