
os01's Introduction


This book helps you gain the foundational knowledge required to write an operating system from scratch. Hence the title, 0 to 1.

After completing this book, at the very least you will learn:

  • How to write an operating system from scratch by reading hardware datasheets. In the real world, it works like that. You won't be able to consult Google for a quick answer.

  • A big picture of how each layer of a computer is related to the other, from hardware to software.

  • How to write code independently. It's pointless to copy and paste code. Real learning happens when you solve problems on your own. Some examples are given as a kick-start, but most problems are yours to conquer. The solutions are available online for you to examine after you have given each problem a good try.

  • Linux as a development environment and how to use common tools for low-level programming.

  • x86 assembly in-depth.

  • How a program is structured so that an operating system can run.

  • How to debug a program running directly on hardware with gdb and QEMU.

  • Linking and loading on bare metal x86_64, with pure C. No standard library. No runtime overhead.

Download the book

The pedagogy of the book

You give a poor man a fish and you feed him for a day. You teach him to fish and you give him an occupation that will feed him for a lifetime.

This has been the guiding principle of the book while I was writing it. The book does not try to teach you everything, but enough to enable you to learn by yourself. The book itself, at this point, is quite "complete": once you master part 1 and part 2 (which consist of 8 chapters), you can drop the book and continue on your own. For example, you can continue your journey on the OSDev Wiki; in fact, after you study everything in part 1 and part 2, you only meet the minimum requirements of the OSDev Wiki (well, not quite: the book actually goes deeper into the suggested topics). Or, if you consider developing an OS for fun impractical, you can continue with a Linux-specific book, such as the free book Linux Insides, or other popular Linux kernel books. The book tries hard to provide you with a strong foundation, and that's why part 1 and part 2 were released first.

The book teaches you core concepts, such as x86 Assembly, ELF, linking and debugging on bare metal, etc., but more importantly, where such information comes from. For example, instead of just teaching x86 Assembly, it also teaches how to use reference manuals from Intel. Learning to read the official manuals is important because only the hardware manufacturers themselves understand how their hardware works. If you only learn from secondary resources because it is easier, you will never gain a complete understanding of the hardware you are programming for. Have you ever read a book on Assembly, and wondered where all the information came from? How does the author know everything he says is correct? And how does one seem to magically know so much about hardware programming? This book gives pointers toward answering such questions.

As an example, you should skim through chapter 4, "x86 Assembly and C", to see how it makes use of the Intel manual, Volume 2. In the process, it shows you how to use the official manuals.

Part 3 is planned as a series of specifications that a reader will implement to complete each operating system component. It does not contain code aside from a few examples. Part 3 is just there to shorten the reader's time when reading the official manuals by giving hints where to read, explaining difficult concepts and how to use the manuals to debug. In short, the implementation is up to the reader to work on his or her own; the chapters are just like university assignments.

Prerequisites

Know some circuit concepts:

  • Basic Concepts of Electricity: atoms, electrons, protons, neutrons, current flow.
  • Ohm's law

However, if you know absolutely nothing about electricity, you can quickly learn it here: http://www.allaboutcircuits.com/textbook/, by reading chapter 1 and chapter 2.

C programming. In particular:

  • Variable and function declarations/definitions

  • While and for loops

  • Pointers and function pointers

  • Fundamental algorithms and data structures in C

Linux basics:

  • Know how to navigate directories with the command line

  • Know how to invoke a command with options

  • Know how to pipe output to another program

Touch typing. Since we are going to use Linux, touch typing helps. I know typing speed does not relate to problem-solving, but your typing should at least be fast enough not to get in the way and degrade the learning experience.

In general, I assume that the reader has basic C programming knowledge, and can use an IDE to build and run a program.

Status:

  • Part 1

    • Chapter 1: Complete
    • Chapter 2: Complete
    • Chapter 3: Almost. Currently, the book relies on the Intel Manual to fully explain the x86 execution environment.
    • Chapter 4: Complete
    • Chapter 5: Complete
    • Chapter 6: Complete
  • Part 2

    • Chapter 7: Complete
    • Chapter 8: Complete
  • Part 3

    • Chapter 9: Incomplete
    • Chapter 10: Incomplete
    • Chapter 11: Incomplete
    • Chapter 12: Incomplete
    • Chapter 13: Incomplete

    ... and future chapters not included yet ...

In the future, I hope to expand part 3 to cover more than the first 2 parts. But for the time being, I will try to finish the above chapters first.

Sample OS

This repository is the sample OS of the book, intended as reference material for part 3. It covers 10 chapters of the "System Programming Guide" (Intel Manual Volume 3), along with a simple keyboard and video driver for input and output. However, at the moment, only the following features are implemented:

  • Protected mode.
  • Creating and managing processes with TSS (Task State Segment).
  • Interrupts
  • LAPIC.

Paging and I/O are not yet implemented. I will try to implement them as the book progresses.

Contributing

If you find any grammatical issues, please report them using GitHub Issues. Or, if some sentence or paragraph is difficult to understand, feel free to open an issue with the following title format: [page number][type] Descriptive Title.

For example: [pg.9][grammar] Incorrect verb usage.

type can be one of the following:

  • Typo: indicates typing mistake.
  • Grammar: indicates incorrect grammar usage.
  • Style: indicates a style improvement.
  • Content: indicates problems with the content.

Even better, you can make a pull request with the provided book source. The main content of the book is in the file "Operating Systems: From 0 to 1.lyx". You can edit the .txt file, then I will integrate the changes manually. It is a workaround for now, since LyX can cause a huge diff, which makes it impossible to review changes.

The book is in development, so please bear with me if the English irritates you. I really appreciate it.

Finally, if you like the project and if it is possible, please donate to help this project and keep it going.

Got questions?

If you have any questions related to the material or the development of the book, feel free to open a GitHub issue.

os01's People

Contributors

acehreli, archenoth, azillion, battaile, chungy, dlallama, fabiopozzi, homedirectory, huylenq, jameskr97, keizar901, kmlmhnn, kriskras99, manhtai, mtricht, noriyotcp, onlywade, ophilli, ousia, palerdot, ringof, ryangalamb, slobo, sloganking, sv3a, tahodzic, tobsta, tuhdo, vjatcheslavwvvvvvv, xel


os01's Issues

[pg. 44][grammar] Missing word in a phrase

2nd paragraph:

Physically, buses are just electrical wires
that connect all components together and each wire transfer a single
big of data.

It probably should be "big piece of data" or something like that

[pg.50][Typo] Hight -> High

In section 4.1, there is this sentence:

"Now, we use objdump to examine how hight level source code maps..."

Hight should be spelled high.

Assembly Incorrect?

On page 25, we have that this C:

if (argc1) {
  i=i;
} else {
  i=0;
}

And equivalent assembly:

cmp DWORD PTR [ebp+0x8],0x0
je 80483f7 <main+0x1c>
mov DWORD PTR [ebp-0x4],0x1
jmp 80483fe <main+0x23>
mov DWORD PTR [ebp-0x4],0x0

In the "if" portion of the assembly code (line 3), the registry is set to 0x1. This means i is set to hexadecimal 1, aka 1.

However, in the "if" portion of the C code, i is set to itself.

Am I missing something by chance?

[typo][64] Prefix is repeated two times when describing generated code

At the bottom of the page 63 there is code 00000000 67 ff 24 43.

Then at the page 64

First of all, the first byte, 0x67 is not an opcode but a prefix. The
number is a predefined prefix for address-size override prefix. After
the prefix, comes the opcode 0x67 and the ModR/M byte 0x24.

0x67 is repeated, it should be 0xff
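
To make the roles of the bytes concrete, here is a minimal C sketch that splits a ModR/M byte into its three fields, using the layout documented in Intel manual Volume 2 (mod in bits 7-6, reg/opcode in bits 5-3, r/m in bits 2-0). The struct and function names are illustrative and do not come from the book.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a ModR/M byte (names are illustrative). */
struct modrm {
    uint8_t mod; /* bits 7-6: addressing mode */
    uint8_t reg; /* bits 5-3: register number or opcode extension */
    uint8_t rm;  /* bits 2-0: register/memory operand */
};

/* Split a ModR/M byte into its fields. */
static struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m;
    m.mod = (byte >> 6) & 0x3;
    m.reg = (byte >> 3) & 0x7;
    m.rm  = byte & 0x7;
    return m;
}

/* Build a ModR/M byte from its fields (the inverse of decode_modrm). */
static uint8_t encode_modrm(uint8_t mod, uint8_t reg, uint8_t rm)
{
    assert(mod < 4 && reg < 8 && rm < 8);
    return (uint8_t)((mod << 6) | (reg << 3) | rm);
}
```

For the sequence 67 ff 24 43, decode_modrm(0x24) yields mod = 0, reg = 4 and rm = 4. With opcode 0xff, the /4 extension selects the near indirect jmp, and rm = 4 with mod = 0 under 32-bit addressing means that a SIB byte (here 0x43) follows.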

[Pg. 80][Question] Assembly Listing

On Page 80, there's an assembly listing. I've tried using objdump and gcc to replicate it, but none of my approaches reproduce it. How is this assembly listing generated?

question / suggestion: is all the code compatible with / optimized for x86_64 instructions?

Hi,

I am reading your book, and the book seems to be compatible with 32 bit processors rather than 64 bit processors. For example, in Chapter 4 "x86 Assembly and C" you introduce x86 assembly, and in Chapter 6 "Runtime inspection and debug" you intentionally compile C programs with the -m32 flag.

My (limited) understanding is that assembly for x86 and x64 has some (significant) differences, and surely debugging x64 gcc program differs, too. And nowadays (year 2017) almost all new-released personal computers are using 64 bit processors.

Is it unnecessary to notice the differences? Or sufficient to write codes for 32 bit processors in almost all computers (every practical codes are written in this way)?

I am definitely a newbie here, so maybe I am missing something.

Thank you.

Document Links Missing

As specified on page 9, the book's website should have copies of:

  1. Intel® 64 and IA-32 Architectures Software Developer’s Manual (Volume 1, 2, 3)
  2. Intel® 3 Series Express Chipset Family Datasheet
  3. System V Application Binary Interface

However, the book's website doesn't have links to these documents.

I found the documents, but wanted to point out this discrepancy.

Add cross compiler as an exercise with some guidance and invoke linker scripts with the new toolchain

At the moment, the book uses -m32 and -m64 and -m elf_i386 (for ld), which is not a proper way to create an operating system image for a target machine. However, for the purpose of getting to a bootable image as soon as possible, it is the quickest way. However, we cannot ignore the proper way, so it will be integrated as an exercise by:

After that, the Makefiles must be updated to use the new toolchain for invoking the linker scripts in the book.

[pg.62][typo] "... repeated accessed ..."

typo/grammar issue on p. 62, found in the paragraph in the middle of the page:

... if the element is repeated accessed many times...

should be

... if the element is repeatedly accessed many times ...

apologies if this was already spotted

[page 44][content] Transistors aren't capacitors.

(2nd to last paragraph in 3.2.2)

Technically, the charge in a DRAM cell is stored in a capacitor, which is a different type of electrical component than a transistor. I'd suggest something like the following:

"At the physical level, RAM is implemented as a grid of cells that each contain a transistor and an electrical device called a capacitor, which stores charge for short periods of time. The transistor controls access to the capacitor; when switched on, it allows a small charge to be read from or written to the capacitor. The charge on the capacitor slowly dissipates, requiring the inclusion of a refresh circuit to periodically read values from the cells and write them back after amplification from an external power source."

[page 63][content] SIB byte table and example

Hi,

On page 63, you use jmp [eax*2 + ebx] as a simple example to show how the SIB byte works.
The instruction is assembled to 0x67 ff 24 43, and 0x43 is the SIB byte, but when I look for 0x43 in the table, it looks like the final address would be [ebx*2 + eax].
It seems to me, from the table, that the SIB byte for jmp [eax*2 + ebx] would be 0x58, which it clearly isn't.

Also, in the next page, there's a small typo:

After the prefix, comes the opcode 0x67 [...]

It should be 0xff.

I'm loving your book and I'll continue to contribute!
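
As a cross-check of the byte discussed above, here is a small C sketch that decodes a SIB byte by its bit fields (scale in bits 7-6, index in bits 5-3, base in bits 2-0, as documented in Intel manual Volume 2). The names are illustrative, and special cases such as index 4 (no index register) are deliberately ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a SIB byte (names are illustrative). */
struct sib {
    unsigned scale; /* multiplier applied to the index register: 1, 2, 4 or 8 */
    uint8_t index;  /* bits 5-3: index register number */
    uint8_t base;   /* bits 2-0: base register number */
};

/* Split a SIB byte into scale, index and base. */
static struct sib decode_sib(uint8_t byte)
{
    struct sib s;
    s.scale = 1u << ((byte >> 6) & 0x3); /* bits 7-6 hold log2(scale) */
    s.index = (byte >> 3) & 0x7;
    s.base  = byte & 0x7;
    return s;
}

/* 32-bit general-purpose register names in encoding order. */
static const char *reg32_name(uint8_t n)
{
    static const char *const names[8] = {
        "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
    };
    assert(n < 8);
    return names[n];
}
```

decode_sib(0x43) gives scale 2, index 0 (eax) and base 3 (ebx), that is [eax*2 + ebx], which agrees with the instruction that was assembled.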

Grammatical fix (pg: 9)

In the page 9 of the pdf, there is a line which says

e.g. Intel are critical for implementing an operating system or any other software that direct controls the hardware.

which should read

e.g. Intel are critical for implementing an operating system or any other software that directly controls the hardware.

It is better to have the source in this repo, to which people can give pull requests. You should probably want to define some guidelines/format like [page no] [type => grammatical/typo] for you to easily pinpoint the issue and to prevent same issue being raised again.

Translation

Hi,

Do you plan to release translations in different languages when the English version is finished? Because if so, I'd gladly help 😄!

[pg.9][Repetition Error] Reuses last sentence

" Most examples revolve around variants of a“Hello World” program. Most examples revolve around variants of a “Hello World” program, which will acquaint you with core concepts. "

Reference to production implementation

For each chapter in part 3, the readers need to refer to a production implementation to check their code and improve it, so they gain practical knowledge that later can be used elsewhere. The reference code should be extracted from the Linux kernel.

[Page 18][Grammar]

"The software engineer must also select the right programming techniques that are apply to the problem domain he is trying to solve because many techniques that are effective in one domain might not be in another. " It should be " ... the right programming techniques that apply to the problem domain ... ".
Also, this is page 18 out of 313, however it is technically page 4 as the book says. Not sure which one to put

Examples 4.5.2 and 4.5.3

It seems that the example 4.5.2 is generated with 16-bit code and the example 4.5.3 is generated with 32 bit code.

Example 4.5.2

The book says

jmp [0x1234]
Then, the machine code is:
ff 26 34 12

And it works with the following instructions for nasm :

bits 16
jmp [0x1234]

Example 4.5.3

The book says :

add eax, ecx
Then the machine code is:
01 c8

But assembling this :

bits 16
add eax, ecx

Gives 66 01 c8, while assembling :

bits 32
add eax, ecx

gives the expected result.
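
The extra 0x66 byte is the operand-size override prefix: an assembler emits it when the operand size of the instruction differs from the default of the current mode. The following C sketch encodes a register-to-register add under that rule. It is a simplified illustration with hypothetical names, not how NASM is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode "add dst, src" between two registers using opcode 0x01.
 * mode_bits:    default operand size of the code (16 or 32)
 * operand_bits: operand size the instruction actually uses (16 or 32)
 * dst, src:     register numbers (eax/ax = 0, ecx/cx = 1, ...)
 * Returns the number of bytes written to out (2 or 3). */
static size_t encode_add_reg_reg(int mode_bits, int operand_bits,
                                 uint8_t dst, uint8_t src, uint8_t out[3])
{
    size_t n = 0;

    assert(mode_bits == 16 || mode_bits == 32);
    assert(operand_bits == 16 || operand_bits == 32);
    assert(dst < 8 && src < 8);

    if (mode_bits != operand_bits)
        out[n++] = 0x66;                           /* operand-size override */
    out[n++] = 0x01;                               /* ADD r/m, r */
    out[n++] = (uint8_t)(0xC0 | (src << 3) | dst); /* mod=11, reg=src, rm=dst */
    return n;
}
```

With dst = 0 (eax) and src = 1 (ecx), a 16-bit default mode produces 66 01 c8, while a 32-bit default mode produces 01 c8.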

By the way, thanks for this book, it is really useful. I am learning assembly with it in the hope of being able to complete Exercise 7.5.1 (I saw that there is some code in the repos, but I don't want to look at it at the moment).

Source code?

I've taken a glance at the PDF in your repository and it looks interesting. Do you happen to have the source code you used to create it to begin with?

(Note: Storing binary files, particularly large binary files in a Git repository is not a good idea as it increases the overall repository size. You can solve this by not including the PDF as part of the repository, but as a GitHub release download or a file on a webserver or similar.)

[page 207][typo] dd command should have seek=1

Hi,

I believe the dd command should be invoked with seek=1 in this case, since we are writing the 2nd sector.

Original:
$ dd if=sample of=disk.img bs=512 count=1 seek=0

Correction:
$ dd if=sample of=disk.img bs=512 count=1 seek=1
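
The effect of seek can be checked with a little arithmetic: dd starts writing at byte offset bs * seek in the output file. A minimal C sketch of that calculation follows; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset in the output file at which dd starts writing,
 * given its bs= (block size) and seek= (output blocks to skip) operands. */
static uint64_t dd_output_offset(uint64_t bs, uint64_t seek)
{
    assert(bs > 0);
    return bs * seek;
}
```

With bs=512, seek=0 starts at offset 0 (the first sector, where the boot sector lives), while seek=1 starts at offset 512, the beginning of the second sector.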

[page 96][typo] 0 instead of 1

Hi,

At the end of the logical AND explanation, you describe what happens when both operands are not 0:

  1. If both i and j are not 0, the result is certainly 1, or true.
    (a) Set it accordingly with the instruction at 0x80484a7.
    (b) Then jump over the instruction at 0x80484ae to set the
    variable logical_and at [ebp-0x8] to 0.

The last 0 should be a 1.
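
The control flow described in the quoted steps can be mirrored in C with explicit jumps. This is only a sketch of the general compare-and-jump shape a compiler may emit for a logical AND. The labels are illustrative and are not the addresses from the book's listing.

```c
#include <assert.h>

/* Mirrors the compare-and-jump shape of "logical_and = (i && j);". */
static int logical_and_lowered(int i, int j)
{
    int logical_and;

    if (i == 0)
        goto set_false;  /* first operand is 0: result is 0 */
    if (j == 0)
        goto set_false;  /* second operand is 0: result is 0 */
    logical_and = 1;     /* both operands are non-zero: result is 1 */
    goto done;           /* jump over the store of 0 */
set_false:
    logical_and = 0;
done:
    assert(logical_and == (i && j));
    return logical_and;
}
```

When both operands are non-zero, the variable receives 1 and the jump skips the instruction that would store 0.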

What is the error in this line of code? My program ends unexpectedly

#include <iostream>
#include <string>

using namespace std;

struct employee

{
string empID;
char *empName;
float rate, bsal, gsal, netsal;
float dutyAllow, fuelAllow;
float tax, socSec;
int ndayWork;

float someone();
string userinfo();
};
string employee::userinfo()
{
employee::empName= new char [40];
cout<< "Enter Your name"<< endl;
cin>>employee::empName;
cout<<"Enter Your ID Number"<< endl;
cin>>employee::empID;
A:
cout<< "Enter Your Number of Days Of work"<< endl;
cin>>employee::ndayWork;

if (employee::ndayWork>31)
{
   goto A;
}

}

float employee::someone()
{
char choice;
cout<<"WELCOME TO KOFORIDUA NURSES PORTAL*"<< endl;
cout<< "Please enter Your denomination= ";
cout<<"Nurse = 1"<< endl<< "Doctor=2"<< endl;
cin>> choice;

switch (choice)
{
	case 1:
		if(choice<=1)
		{
		
		 cout<<"Daily Mark is 8Ghc"<<endl;
		 
		 
		 employee::netsal=8* employee::ndayWork;
		 cout<<"Your name is "<<employee::empName<<endl;
		 cout<<"Your ID is "<<employee::empID<<endl;
		 cout<<"Your monthly salary is "<< employee::netsal<< endl;
		}
		break;
	case 2:
		if(choice==2)
		{
			
		cout<<"Daily Mark is 10Ghc"<<endl;
		 employee::netsal=10* employee::ndayWork;
		 cout<<"Your name is "<<employee::empName<<endl;
		 cout<<"Your ID is "<<employee::empID<<endl;
		 cout<<"Your monthly salary is "<< employee::netsal<< endl;
		
		}
		break;
	default:
		cout<<"Error"<< endl;
		
	
}

}

int main()
{

int salary;


employee Doctor, Nurse;
Nurse.someone();
Nurse.userinfo();



	
system("PAUSE");
return 0;

}

Question: Page width seems short

Hi,

It seems that the page borders for the text is not well adjusted as the text covers just around 60% of the space. I don't find it pretty to my eye so I was wondering if that was on purpose.

In any case, thanks for doing the book! It's something I have wanted for quite some time :)

Best regards,
Antonio Huete

"Byte sized" vs. "Quadword sized"

In chapter 4.8.1, "fundamental data types", you show a diagram (figure 4.8.1) which shows the different sized integers. On my PDF it's on page 72.

The quadword-sized [un]signed integers in the diagram are incorrectly labeled as "byte-sized [un]signed integer".
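
For reference, Intel's names for the fundamental data types map onto fixed-width C types as in the sketch below. The typedef names and the helper are illustrative, not taken from the book.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Intel's fundamental data types mapped to fixed-width C types. */
typedef uint8_t  byte_t;        /*  8 bits */
typedef uint16_t word_t;        /* 16 bits */
typedef uint32_t doubleword_t;  /* 32 bits */
typedef uint64_t quadword_t;    /* 64 bits */

/* Width in bits of an object whose sizeof() result is size_in_bytes. */
static unsigned bits_of(size_t size_in_bytes)
{
    assert(size_in_bytes > 0);
    return (unsigned)(size_in_bytes * 8);
}
```

bits_of(sizeof(quadword_t)) is 64, so the 64-bit entries in the figure are the ones that should carry the "quadword-sized" label.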

SUGGESTION: Modern Storage Devices

Let me first say that I really like your book and its presentation style that does not lose sight of the forest for the trees. In particular I really like the autodidact approach and starting from first principles. With that in mind I thought the mention of storage devices on pg. 22 could have used a little more of this kind of treatment rather than saying "...the modern devices are so complex that is is impossible and unnecessary to understand every implementation detail.."
Rather than say that, I propose a quick high-level summary along the lines of ..

"Modern Storage Devices are implemented by injection of an electron into a material which is held and retrieved from said material by the opening and closing of potential barriers that exist by virtue of the materials elemental properties and are manipulated by applying a voltage across the two materials. The materials involved will affect the response and retrieval times as well as the memory's fastidiousness in holding the charge over time. The details of this process can be found in any introductory Solid State Theory Textbook."

...While admittedly that is quite terse and probably could use some rephrasing, it does capture the basic gist of it without going into details of quantum mechanics or any talk of valence bands, etc.

Another approach might be to rephrase the above into an analogy along the lines of Hungry Hungry Hippos, where the electrons are the balls traveling along the conductive path / bus, the opening and closing of the hippo's mouth is the barrier / oxide, and the inside of the mouth is the potential well of the receiving substrate. The player's fingers, which activate the hippo's mouth, would be the applied voltage. If you need more detail, I would say the best intro to solid state physics would be https://www.amazon.com/Semiconductor-Device-Fundamentals-Robert-Pierret/dp/0201543931/ref=sr_1_1?s=books&ie=UTF8&qid=1487522457&sr=1-1&keywords=semiconductor+device+fundamentals

Whether or not this is a worthwhile digression (maybe in an appendix?), I think this would help maintain the "from basic principles" approach.

[page 71][content] CS and EIP values are swapped

Hi,

in page 71, when describing jmp far [eax], you say:

The far address consumes total of 6 bytes in size for a 16-bit segment and 32-bit address, which is encoded as m16:32 from the table 4.7.1. As can be seen from the figure above, the blue part is a segment address, loaded into cs register with the value 0x1234; the red part is the memory address within that segment, loaded into eip register with the value 0x5678 and start executing from there.

The two values 0x1234 and 0x5678 are switched.

Also, in figure 4.7.1, shouldn't 0x1234 be laid out as 0x34120000?

In addition to that, could you add a short comment on endianness, explaining why m16:32 is laid out in memory in reverse order?

Thank you.
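
On the endianness question: an m16:32 operand is stored with the 32-bit offset at the lower addresses and the 16-bit segment selector after it, and each of the two values is itself little-endian. The C sketch below decodes such a six-byte operand. The names are illustrative, and the byte values in the example are hypothetical rather than copied from figure 4.7.1.

```c
#include <assert.h>
#include <stdint.h>

/* A far pointer operand of type m16:32 as read from memory. */
struct far_ptr {
    uint32_t offset;   /* loaded into eip */
    uint16_t selector; /* loaded into cs */
};

/* Decode six bytes: the little-endian 32-bit offset occupies the lower
 * addresses, and the little-endian 16-bit selector follows it. */
static struct far_ptr decode_m16_32(const uint8_t mem[6])
{
    struct far_ptr p;
    p.offset = (uint32_t)mem[0]
             | ((uint32_t)mem[1] << 8)
             | ((uint32_t)mem[2] << 16)
             | ((uint32_t)mem[3] << 24);
    p.selector = (uint16_t)(mem[4] | (mem[5] << 8));
    return p;
}
```

For the bytes 78 56 00 00 34 12, the decoded offset is 0x5678 and the selector is 0x1234. The least significant byte of each value comes first, which is why the bytes look reversed when read from left to right.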

[Pg.63][Confusing] SIB Table

On page 63, we have this instruction:

jmp [EAX*2 + EBX]

Which turns into:

00000000 67 ff 24 43

43 is the SIB code.

The lookup table provided says that 43 corresponds with row [EBX*2] and column EAX. This suggests EBX*2 + EAX, which doesn't match up with the original EAX*2 + EBX. I have double checked with OS Wiki, and 43 is indeed the correct SIB code.

As is, the table feels confusing.

Complete the chapter on descriptor that introduces debugging CPU using QEMU and the Intel manuals

The first chapter in part 3 is intended to teach x86 memory descriptors and the guidelines to implement a simple runtime memory model. In the process, guide the readers how to use QEMU logging and various info commands, in combination with the Intel manuals to debug CPU exceptions. This is the first step to build a foundation for working on more complicated features in future chapters.

Creating your own programming language

I have come across this repo by coincidence and I am so happy I found it. I was always fascinated by operating systems and their implementations since college days.

Another thing which always interested me was programming languages. There is a lot of theory and a lot of practical books as well, but there isn't a place where I could find that connects the theory with practical implementation together at the same time and provides a deep understanding to how these concepts relate.

I would love to hear if you know of such a resource or book.
Kindest regards,

String writing does not appear to be working

The code used to write a string, exercise 7.5.1, does not appear to be working. Specifically, the string is not written to the screen, even though multiple manual calls to my PutChar implementation do work. Not even the reference implementation was able to write the string to the display.
Stripped down code:

bits 16
start: jmp boot

boot:
  cli	; no interrupts
  cld	; all that we need to init
  
  call PrintBootMsg
  hlt	; halt the system

   ; dl = x; dh = y
MovCursor:
; [redacted]
   ret

   ; al = chr, cx = repeat
PutChar:
;   [...] (redacted, is functional though)
   ret
   
   ;; ds:si = Zero terminated string
Print:
.loop:
   lodsb
   or al, al
   jz .done
   mov cx, 1
   call PutChar
   jmp .loop
   
   .done:
   ret
   
; Print the boot message
PrintBootMsg:
   ; Reset cursor
   mov bh, 0
   mov bl, 0
   call MovCursor
  
   ; Print
   mov si, bootMsg
   call Print
   ret
   
   ;; constant and variable definitions
bootMsg db "Booting the Operating System!", 10, 13, 0
cursor_X db 0
cursor_Y db 0

   ; We have to be 512 bytes. Clear the rest of the bytes with 0
times 510 - ($-$$) db 0

dw 0xAA55   ; Boot Signature

The string does appear to be compiled in the binary, but not loading in memory. I tried inspecting the system memory with gdb x/512sb but I couldn't find any trace of the string. Moving the db instructions above does have effect on the output file but not on actual program execution.

Using NASM version 2.10.09.
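
One thing worth checking, offered as an assumption rather than a diagnosis: the BIOS loads the boot sector at linear address 0x7C00, while the stripped-down listing has no org directive and never sets ds. In real mode, lodsb reads from ds * 16 + si, so an offset computed from origin 0 may point somewhere other than the string. The C sketch below works through that arithmetic. The function names are hypothetical, and the offset 0x30 is a made-up example rather than the real position of bootMsg.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode address translation: linear = segment * 16 + offset. */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Offset that must be placed in si so that ds:si reaches a label,
 * given where the image is loaded and the value of ds. */
static uint16_t label_offset_for(uint32_t load_address, uint16_t ds,
                                 uint16_t offset_in_image)
{
    uint32_t target = load_address + offset_in_image;
    uint32_t seg_base = (uint32_t)ds << 4;

    assert(target >= seg_base && target - seg_base <= 0xFFFF);
    return (uint16_t)(target - seg_base);
}
```

If the label sits 0x30 bytes into the image and ds is 0, then ds:0x30 refers to linear address 0x30, not 0x7C30. Under this assumption there are two common remedies: assemble with org 0x7c00 and set ds to 0, or keep origin 0 and set ds to 0x07C0.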

k + k = 2^k?

In chapter 2, on page 15, the sentence "a k-input gate uses
k PMOS and k NMOS transistors, a total of 2^k transistors" is bolded. Shouldn't it be 2k instead of 2^k?
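
A quick way to see the difference is to compute both counts. The sketch below assumes a static CMOS gate with k PMOS transistors in the pull-up network and k NMOS transistors in the pull-down network, as the quoted sentence describes. The function names are illustrative.

```c
#include <assert.h>

/* Transistors in a k-input static CMOS gate: k PMOS plus k NMOS. */
static unsigned cmos_gate_transistors(unsigned k)
{
    assert(k >= 1);
    return k + k; /* 2k, linear in the number of inputs */
}

/* The count as printed in the quoted sentence, for comparison. */
static unsigned two_to_the_k(unsigned k)
{
    assert(k < 32);
    return 1u << k;
}
```

The two expressions happen to agree for k = 1 and k = 2 (2 and 4 transistors), which may be how the slip went unnoticed. For k = 3 they differ: 2k gives 6, while 2^k gives 8.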
