
hackclub / putting-the-you-in-cpu


A technical explainer by @kognise of how your computer runs programs, from start to finish.

Home Page: https://cpu.land

License: MIT License

Languages: MDX 74.42%, Astro 15.64%, CSS 8.51%, JavaScript 1.11%, TypeScript 0.32%
Topics: cpu, elf, linux, linux-kernel

putting-the-you-in-cpu's Introduction

Putting the "You" in CPU

A technical explainer of how your computer runs programs, from start to finish.

by @kognise and @hackclub


From the beginning...

I've done a lot of things with computers, but I've always had a gap in my knowledge: what exactly happens when you run a program on your computer? I thought about this gap — I had most of the requisite low-level knowledge, but I was struggling to piece everything together. Are programs really executing directly on the CPU, or is something else going on? I've used syscalls, but how do they work? What are they, really? How do multiple programs run at the same time?

[Illustration: A scrawled digital drawing. Someone with long hair is confused as they peer down at a computer ingesting binary. Suddenly, they have an idea! They start researching on a desktop computer with bad posture.]

I cracked and started figuring as much out as possible. There aren't many comprehensive systems resources if you aren't going to college, so I had to sift through tons of different sources of varying quality and sometimes conflicting information. A couple weeks of research and almost 40 pages of notes later, I think I have a much better idea of how computers work from startup to program execution. I would've killed for one solid article explaining what I learned, so I'm writing the article that I wished I had.

And you know what they say... you only truly understand something if you can explain it to someone else.

In a hurry? Feel like you know this stuff already?

Read chapter 3 and I guarantee you will learn something new. Unless you're like, Linus Torvalds himself.


Continue to Chapter 1: The "Basics" »
(cpu.land)

putting-the-you-in-cpu's People

Contributors

chocorho, davidwalschots, dependabot[bot], kognise, nklymok, omerbaddour, remicmacs, terjewiigmathisen, ulfsauer0815, volker-weissmann, wkhere, zachlatta


putting-the-you-in-cpu's Issues

Coop multitasking correction/clarification

I just read chapter 2 and I really like the article so far.

I wanted to add a bit of clarification regarding the coop multitasking:

Rather than the OS deciding when to preempt programs, the programs themselves would choose to yield to the OS. They would trigger a software interrupt to say, “hey, you can let another program run now.” These explicit yields were the only way for the OS to regain control and switch to the next scheduled process.

The OS would use any system call as an opportunity to check whether the process had consumed its time slice and to switch to another process. The yield system call was there for the case where your program doesn't need to make any system calls (e.g. it's doing some number crunching and all the data it needs is already in memory), but you still want to be a good citizen and give other processes a chance to run.
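
For illustration, a minimal sketch of that "good citizen" pattern using sched_yield(), the modern POSIX descendant of an explicit cooperative yield (the function and loop are illustrative, not from the article):

```c
#include <sched.h>

/* Pure in-memory number crunching that makes no system calls on its own,
   but periodically offers the CPU back to the scheduler anyway. */
void crunch_numbers(double *data, long n) {
    for (long i = 0; i < n; i++) {
        data[i] = data[i] * data[i];
        if (i % 1000000 == 0)
            sched_yield();   /* voluntary yield; optional on a preemptive OS */
    }
}
```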

Potentially unclear explanation of register usage in chapter 4 - Becoming an elf lord

The kernel is almost ready to return from the syscall (remember, we’re still in execve). It pushes the argc, argv, and environment variables to the stack for the program to read when it begins.

The registers are now cleared. Before handling a syscall, the kernel stores the current value of registers to the stack to be restored when switching back to user space. Before returning to user space, the kernel zeroes this part of the stack.

Finally, the syscall is over and the kernel returns to userland. It restores the registers, which are now zeroed, and jumps to the stored instruction pointer. That instruction pointer is now the starting point of the new program (or the ELF interpreter) and the current process has been replaced!

When I first read this I was confused as to the order of operations. After a few reads I thought maybe it went like this

  1. execve starts
  2. register values copied onto stack
  3. execve almost finishing up
  4. register values copied back into registers
  5. memory that held those values is zeroed

I was going to open a PR correcting this but then realised I wasn't sure if I was right. Is it actually that the memory is zeroed and then zeroes are copied back into the registers? That didn't seem right to me ("It restores the registers, which are now zeroed").
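
(For context, a minimal sketch of the userland side of the call under discussion; the path and arguments are just illustrative. On success execve never returns, because the calling program has been replaced.)

```c
#include <unistd.h>

int main(void) {
    char *argv[] = { "/bin/echo", "hello from the new program", (char *) 0 };
    char *envp[] = { (char *) 0 };

    execve("/bin/echo", argv, envp);  /* only returns if the call failed */
    return 1;
}
```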

Anyway, would love to be corrected!

P.S. thanks so much for these blog posts, they're awesome

Dark mode

Adding a dark mode feature for a site intended for reading can be useful. I would be happy to take it up and contribute here if needed.

EPUB Version

If possible, I would appreciate it if you could provide an EPUB version of this text as well. I think this format might be more suitable for reading on electronic devices. Thank you for taking the time to make this useful information available to me. I hope you can provide an EPUB copy too.

Should I mention PC?

From dreamcompiler on HN:

"The CPU stores an instruction pointer which points to the location in RAM where it’s going to fetch the next instruction."

This is also called the Program Counter or PC outside the Intel universe. This is confusing as "PC" also stands for "Personal Computer" but people who learned computing in the days before Intel became popular still call it the PC register.

My response:

I know about the Program Counter terminology, and explicitly chose not to use it to be more architecture-independent... but maybe it was a mistake not mentioning it at all, considering it's such absurdly prevalent terminology.
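
(For the curious, a minimal sketch of peeking at that register from a running program; x86-64 with GCC/Clang inline assembly is assumed, and this is not from the article.)

```c
#include <stdio.h>

int main(void) {
    void *ip;
    /* An RIP-relative LEA with zero displacement yields the address of the
       next instruction, i.e. roughly where the instruction pointer (the
       "program counter") points right now. */
    __asm__ volatile ("lea (%%rip), %0" : "=r" (ip));
    printf("currently executing near %p\n", ip);
    return 0;
}
```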

Cooperative multitasking is still a thing

Hi,

Just noted that at the end of Chapter 2 it reads: “For these reasons, the tech world switched to preemptive multitasking a long time ago and never looked back.”
It looks like a bit of an overstatement, because concepts like coroutines and green threads are quite popular nowadays, and we even have quite popular languages built on top of them (Go with its goroutines).
I appreciate that language-runtime-level cooperative multitasking is not the same as OS-level, but it's still worth mentioning, I think.

Overall, a great write-up that I very much liked to read!

thanks!

Questions about Chapter 1

Programs can’t directly switch privilege levels; hardware interrupts are safe because the processor has been preconfigured by the OS with where in the OS code to jump to.

This is the first time a hardware interrupt is mentioned. Does this mean other kinds of interrupts exist too? Are there any differences in how programs use them?

When this kernel code finishes, it tells the CPU to switch back to user mode and return the instruction pointer to where it was when the interrupt was triggered. This is accomplished using an instruction like IRET.

From this, my understanding is that IRET is used by kernel code to transfer control back to user space. But that seems to contradict this paragraph:

Programs can delegate control to the OS with special machine code instructions like INT and IRET.

Can user code call IRET? But it's already in user space, and it can't access kernel space, so how does that work?
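
(For reference, a minimal sketch of the user-mode half of that transition on x86-64 Linux. Modern 64-bit code uses the dedicated `syscall` instruction rather than `INT`, but the idea is the same: user code asks to enter the kernel, and only the kernel can later drop back to user mode. Register assignments follow the x86-64 Linux syscall ABI.)

```c
#include <stddef.h>

/* write(2) performed with a raw `syscall` instruction: the syscall number
   goes in rax, arguments in rdi/rsi/rdx; the kernel clobbers rcx and r11. */
static long raw_write(int fd, const void *buf, size_t len) {
    long ret;
    __asm__ volatile ("syscall"
                      : "=a" (ret)
                      : "a" (1L /* __NR_write */), "D" ((long) fd), "S" (buf), "d" (len)
                      : "rcx", "r11", "memory");
    return ret;
}

int main(void) {
    raw_write(1, "hello from user mode\n", 21);
    return 0;
}
```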

Configure DNS with hackclub/dns

Right now cpu.land has its DNS managed using the [email protected] Google Domains account. We want to switch this to having DNS managed using https://github.com/hackclub/dns.

Here are the steps to do that:

  1. Share the domain with [email protected] on Google Domains (this is not required for hackclub/dns, just so I have access too)
  2. Set up DNSimple to manage cpu.land's records (credentials in 1Password)
  3. Set up hackclub/dns to use the DNSimple API to manage the domain using OctoDNS. 90% of the work should already be done, just look at how other Hack Club domains are managed in that repo.

Thank you!!

Possibility of discussion of ld.so/dyld/etc. behavior

I believe the end of chapter 4 deserves a quick discussion of how control gets from the entry point of the dynamic linker to the entry point of the executable. Including these fun tidbits (for some values of fun, anyway):

  • The dynamic linker needs to manage data structures, allocate memory, and perform an awful lot of string operations in particular. So it needs access to libc functionality. But it can't use the shared libc everyone else uses: it is going to require that functionality prior to being able to load any dynamic library itself! As a result, the dynamic linker has its own copy of (a subset of) the libc statically linked into it: its only dependency is, understandably, the kernel. This is one of the reasons why on Linux the dynamic linker is actually provided by the folks who provide the libc. And this is the reason all static linkers still need to support building fully self-contained, statically linked binaries, where even system libraries are statically linked (which is discouraged for almost all code): in order to build the dynamic linker itself.
  • While the kernel is responsible for interpreting the ELF commands for the executable and the dynamic linker (if applicable), on the other hand it is not in charge of interpreting the dynamic libraries themselves: the only visibility it has into these is the mmap() calls, performed by the dynamic linker, specifying (a subrange of) them as backing, allowing that memory to be shared cross-process. This means the dynamic linker has to have its own ELF parser, independently of the kernel's: everything else with regard to loading dynamic libraries in memory is its responsibility.
  • That a process is provided its own address space for exclusive use enables code in the main executable to be compiled in a position-dependent fashion. At least, in theory: security considerations such as ASLR mean most executables are position-independent these days. But dynamic libraries have no such choice and must consist of position-independent code because, even if there are systems for preferentially loading them at a certain address, there is no guarantee that this virtual address range will be available by the time they are loaded: another dynamic library might have been loaded there first for instance. In which case the bumped dynamic library will need to be loaded at a non-preferred virtual address and work anyway.
  • .init and .fini sections
  • for bonus points, the GOT, the PLT, and relocation entries.
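
To make the dynamic linker's runtime role concrete, here is a minimal sketch that exercises the same machinery through its public API (Linux/glibc assumed; link with -ldl on older glibc versions):

```c
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    /* Ask the dynamic linker to map a shared library and resolve a symbol,
       much as it does at startup for the libraries an executable depends on. */
    void *libm = dlopen("libm.so.6", RTLD_NOW);
    if (!libm) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    double (*cosine)(double) = (double (*)(double)) dlsym(libm, "cos");
    if (!cosine) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        return 1;
    }

    printf("cos(0) = %f\n", cosine(0.0));
    dlclose(libm);
    return 0;
}
```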

Feedback from still_grokking on HN

The only thing that I miss a little bit is a kind of "disclaimer" that what gets presented is "just" the result of a market race, and not how computers necessarily need to work. Without going into the details of possible hardware architectures and implementations, even at the "user-facing" level (the operating system and application layer) things can look very, very different. Just as an example: https://en.wikipedia.org/wiki/Genera_(operating_system)

PDF version

Thanks for this awesome source of information.

Do you think it is feasible to also "release" a PDF version of it?

Is this correct?

The chapter on multitasking says -

"The target latency should be equal to the time it takes for a process to resume execution after being preempted."

Is this correct? AFAIK the target latency only needs to be at least as long as the time it takes for a process to resume execution after being preempted.

PNG image optimization

I'm currently working on optimizing the PNG images used for the website by lossy (not the JPEG kind) means: converting each from RGB to indexed colors and reducing its total unique colors to a third of the original. Since browsers downscale the images to a lower resolution anyway, that interpolation hides the colors lost by this lossy optimization process.

As a trial, I managed to trim off a total of ~128 KB (kilobytes) from all 6 images used in chapter 1.

This is a sample of one of them:
[image: syscall-architecture-differences-indexedlossy-o]

And this is the unoptimized version currently used:
[image: syscall-architecture-differences]

Should I continue ahead and later submit a pull request for that?

Add bookmarks to the PDF edition

Thank you for the PDF version. Is it possible to add bookmarks to help jump to specific topics / a table of contents, in line with the 7 chapters in the article? Thank you.

@ekoome in #11

Faggin made the first *microprocessor*

From dreamcompiler on HN:

"The first mass-produced CPU was the Intel 4004, designed in the late 60s by an Italian physicist and engineer named Federico Faggin."

The first microprocessor (CPU on a single chip) was Faggin's Intel 4004, but mass-produced CPUs existed before that. Earlier CPUs were built from multiple chips, and before that multiple individual transistors, and before that multiple vacuum tubes, and before that multiple relays (although it's fair to say that relay computers were never mass-produced).

Time Slicing Diagram

Chapter 2 discusses how target latency works and provides a diagram for clarification. According to the explanation:

The target latency is the time it takes for a process to resume execution after being preempted...

Based on this explanation, shouldn't the target latency in the diagram start at the beginning of Process 2 and finish at the end of Process 3, as shown in the following image?

[image: linux-scheduler-target-latency]

It also presents an approach to calculate the timeslices based on a specific target latency:

Timeslices are calculated by dividing the target latency by the total number of tasks.

Suppose we have 3 processes and the target latency is 9 ms, meaning there can be a 9-millisecond gap between the bursts of each process. Using the given method, we divide 9 by 3 and find that each process can run for 3 milliseconds and must then wait 6 milliseconds for the other processes. This contradicts the definition of the target latency.
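
(For what it's worth, the arithmetic in question as a tiny sketch; equal task weights are assumed, which the real scheduler does not require.)

```c
#include <stdio.h>

int main(void) {
    double target_latency_ms = 9.0;  /* example value from this issue */
    int    runnable_tasks    = 3;

    double timeslice_ms = target_latency_ms / runnable_tasks;  /* 3 ms */
    double wait_ms      = target_latency_ms - timeslice_ms;    /* 6 ms */

    printf("timeslice: %.1f ms, wait before running again: %.1f ms\n",
           timeslice_ms, wait_ms);
    return 0;
}
```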

in english?

I have many books in English; please write this in Polish or another language (Esperanto).

Reference on the MacOS "split"

https://fahrplan.events.ccc.de/congress/2007/Fahrplan/events/2303.en.html (first attachment; the second attachment is the slides) provides a good summary of the behavior of the MacOS (then known as Mac OS X) kernel, XNU, including the memory space provided to processes. It was current as of 32-bit Mac OS X and support of 64-bit processes by a 32-bit kernel, but not current with regard to the 64-bit-address-space kernel (AKA K64 in MacOS circles).

So I suggest you include the 4/4 "split" of Mac OS X next to the 3/1 and 2/2 splits found in operating systems of that vintage as an illustration, but not necessarily dwell on it any further, since these splits are less impactful than they once were. Indeed, the main point was to avoid significant memory remapping operations when crossing the userspace/kernel border (except for pre-K64 Mac OS X), but all that went out the window anyway with Meltdown, at which point it was realized that keeping kernel memory mapped while in userspace, even with forbidden access, was not hygienic. That meant all operating systems were modified to unmap kernel pages when dropping to userspace (and to remap them upon kernel entry), except for a small set of always-mapped pages from which the kernel mappings can be rebootstrapped upon kernel entry, just like pre-K64 Mac OS X.

A sentence phrasing change

https://github.com/hackclub/putting-the-you-in-cpu/blob/366ef51c7137e824596595a2d56ebcc0c67cef71/src/content/chapters/1-the-basics.mdx#L75C1-L75C106

Is there anything wrong with the phrasing of this statement? Should there be something like "makes sure that" after the closing parenthesis? I don't know; it seemed a little out of place, so I considered raising an issue here.

Also, it's a great read; I really appreciate your writing here. As a beginner to systems programming from a non-CS background, this seems like a perfect resource to start with. Still reading it though, hope to complete it soon :)

fix typo suggestion

- Programs can't directly switch privilege levels; hardware interrupts are safe because the processor has been preconfigured *by the OS* with where in the OS code to jump to. The interrupt vector table can only be configured from kernel mode.

Please correct me if I'm wrong, but I think this should say software interrupts instead of hardware interrupts.
