browserengineering / book

Web browser engineering (a book)

License: MIT License

CSS 0.59% Python 11.04% HTML 1.63% JavaScript 48.09% Makefile 0.14% Lua 0.21% Awk 0.01% TeX 38.31%

book's Introduction

Web Browser Engineering

This is the source code to Web Browser Engineering, my book on how web browsers work. The best way to read the book is online.

Building the book

If you want to build from source, run:

make book draft blog

The source code contains:

The Markdown source for the book text, in book/
A template and code for converting the book to HTML, in infra/
Chapter-by-chapter implementations of the browser, in src/
Styling for the book's website, in www/
The book's built-in feedback system, in www/, including JavaScript and the Python backend.

We prefer to receive typos and small comments on the text using the book's built-in feedback tools, which you can enable with Ctrl+E.

You can run the book's built-in checks with:

make lint

We're always happy to hear from readers and from educators who want to use the book. Please email us!

Running the browser

Code for the browser developed in the book can be found in src/, in files named lab1.py, lab2.py, and so on, corresponding to each chapter.

To run it, you'll need to install:

A recent Python 3; version 3.9.10 is known to work, but older versions probably will too.
The tkinter package, which is part of the Python standard library but often isn't included in the Python pre-installed on macOS and Linux. You can check by running python3 -m tkinter, which should open a test window.
For Chapter 9+, the dukpy package. Consult that chapter for installation instructions.
For Chapter 11+, the skia and pysdl2 packages. Consult that chapter for installation instructions.

Once you have the above, you can run, say, the browser as of the end of Chapter 3 like so:

cd src/
python3 lab3.py https://browser.engineering

Every chapter can be run in a similar fashion.

For chapters 8 onward, there's also a "guest book" web application, which you can run with:

cd src/
python3 server8.py

Like the browser, there are different versions of the server for different chapters, named server8.py, server9.py, and so on.

You can also run the book's unit tests with:

make test

Rebuilding the quiz JavaScript

The source for the interactive multiple-choice quizzes is taken from mdbook-quiz. To rebuild the JS blob that we include on the quiz pages (located in www/quiz-embed.iife.js) do the following:

Install depot (made by the same people who made mdbook-quiz; tldr: cargo install depot-js --locked).
Ensure you have cargo-make installed.
git clone https://github.com/cognitive-engineering-lab/mdbook-quiz && cd mdbook-quiz
cargo make init-bindings
cd js
depot build

The quiz-embed.iife.js file should be in packages/quiz-embed/dist/.

These instructions are taken from the README.md and CONTRIBUTING.md files in the mdbook-quiz project.

book's Issues

Feedback on /scheduling.html

A reader points out that in TaskRunner.schedule_task, after we append a task to the task queue, we should also notify_all on the condition variable. This makes sense to me—I'm not sure if we don't do this now, or if there's just a mismatch between the book and the code.
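As a minimal sketch of what the reader is suggesting, the task queue below appends under a condition variable and then calls notify_all. The Task class, the run_one method, and the timeout parameter are assumptions made for illustration; they are not taken from the book's code.

```python
import threading


class Task:
    """A callable plus its arguments (hypothetical helper for this sketch)."""

    def __init__(self, task_code, *args):
        self.task_code = task_code
        self.args = args

    def run(self):
        self.task_code(*self.args)


class TaskRunner:
    def __init__(self):
        self.condition = threading.Condition()
        self.tasks = []

    def schedule_task(self, task):
        # Append while holding the lock, then wake any thread blocked in wait_for().
        with self.condition:
            self.tasks.append(task)
            self.condition.notify_all()

    def run_one(self, timeout=1.0):
        # Sleep until a task is queued or the timeout expires.
        with self.condition:
            if not self.condition.wait_for(lambda: self.tasks, timeout=timeout):
                return False
            task = self.tasks.pop(0)
        # Run the task outside the lock so other threads can keep scheduling.
        task.run()
        return True
```

Without the notify_all call, a thread already waiting on the condition would only notice the new task when its wait timed out.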

Cross-frame DOM methods don't always invalidate correctly

Suppose you have two same-origin frames on a page, A and B, and A runs a script that sets the style of an element in frame B. In this case, our browser will set_needs_render on A, not B.

Executing lab13.py: missing 1 required positional argument: 'top_level_url'

In src/lab10.py here:

book/src/lab10.py

Line 37 in 7836c2b

def request(url, top_level_url, payload=None):

the line below seems to be the intended code:

def request(url, top_level_url=None, payload=None):

I would have created a pull request, but this seemed like too small an issue.

In chapter 2, the request threw an exception because the test site's certificate was updated

Python version: 3.10.1
System: macOS 12.3.1

When Python requests the website https://www.zggdwx.com over SSL, an exception is thrown:

ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)

I tried running Install Certificates.command, but it still didn't solve the problem.
I then opened the website in Firefox and found that there is a problem with the certificate.

After investigation, I found that the https://www.zggdwx.com website recently updated its certificate, and the issuer of the new certificate is TrustAsia Technologies, Inc., which is not in the list of certificate authorities trusted by Mozilla (it looks like they have asked Mozilla to add it, so far without success), so Python throws an exception.

So I think it would be better to put the website's "西游记" content on GitHub, and have the test in the course request it from GitHub instead.

Example boxes look strange

Check out this example from Chapter 8

There seems to be a missing newline or something. Perhaps this CSS should contain

.example:before { content: "Example\0a\0a"; color: gray; }

which adds two newline characters.

Add bookmark links

Please add bookmark links to headlines

Example files are copied to www/examples as the target file

With a freshly cloned repository, make examples copies example files into www/examples as a file instead of copying them into www/examples directory.

This is because the repository doesn't contain www/examples directory and cp foo bar behaves differently depending on whether bar is a directory or not.

This is not an issue once you run mkdir www/examples, but it would be good to include it in the repository for new contributors.
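The same pitfall exists in Python's shutil.copy, which also treats the destination as a file name when it is not an existing directory. The sketch below creates the directory first; the copy_examples helper and the temporary paths are hypothetical, and the real fix would presumably live in the Makefile.

```python
import shutil
import tempfile
from pathlib import Path


def copy_examples(sources, target_dir):
    """Copy each source file into target_dir, creating the directory if needed."""
    target = Path(target_dir)
    # Without this step, copying to a missing "www/examples" creates a *file*
    # with that name instead of placing the example inside a directory.
    target.mkdir(parents=True, exist_ok=True)
    for src in sources:
        shutil.copy(src, target)


# Usage on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    example = Path(tmp) / "example1.html"
    example.write_text("<p>hi</p>")
    examples_dir = Path(tmp) / "www" / "examples"
    copy_examples([example], examples_dir)
    copied_names = sorted(p.name for p in examples_dir.iterdir())
```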

Google now uses UTF-8

Chapter 1 includes an "Alternate Encodings" exercise that reads, in part, "Test it on a real site that doesn’t use utf8, like google.com." Unfortunately, Google now uses UTF-8. Moreover, many Asian government and corporate websites that used to be reliably non-UTF-8 now all use UTF-8, as far as I can tell. We probably need to cut this exercise, or host a special page on our own website that uses a non-UTF-8 encoding.
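For context, a solution to this kind of exercise might pick the encoding out of the Content-Type header, roughly as sketched below. This assumes the exercise is about the charset parameter of that header; the function name and the utf8 default are illustrative choices, not the book's code.

```python
def charset_from_content_type(header, default="utf8"):
    """Return the charset named in a Content-Type header value, or a default."""
    # Parameters follow the media type, separated by semicolons.
    for part in header.split(";")[1:]:
        name, _, value = part.strip().partition("=")
        if name.lower() == "charset" and value:
            return value.strip('"').lower()
    return default


# A test page served with a legacy encoding would exercise the non-default path.
body = "café".encode("iso-8859-1")
text = body.decode(charset_from_content_type("text/html; charset=ISO-8859-1"))
```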

Improve "Word By Word" section

This is feedback from a student, and I think I agree. The "Word By Word" section in Chapter 3 is a short section on how to update the text drawing loop to handle multiple words, but it's a bit confusing:

It's not clear where it goes in the book. I think we can clarify that this is in the layout function.
It would probably be nicer to split the code block up, instead of having one code block followed by three paragraphs of commentary.

Filing this as an issue instead of as a PR because I'm not sure whether we should be making changes as large as the second suggestion at this point. If not, we can just do the first and add a def layout header.

Dead links in Ch. 13

Hi! Just a heads up that these links in Chapter 13 point to localhost, and fail on the live site:

[eventloop-ch2]: http://localhost:8001/graphics.html#creating-windows
[eventloop-ch12]: http://localhost:8001/scheduling.html#animating-frames

I am not sure if there's a rendering step which is supposed to strip localhost links and transform them to the deployed site's URL, or if this was just a mistake in the prose?

All the best, and thanks for the wonderful content.

Consider adding an exercise on reusing sockets

On Downloading Web Pages, there is a "Go further" section on supporting redirects. One of the things that becomes "obvious" if you know about the handshake process is that closing and opening a socket is kind of wasteful.

Once you support HTTP 1.1, it seems like you should be able to reuse the socket by not immediately closing it, and then requesting the Location of a redirect. Would that be a useful exercise for readers, or is that just an "obvious" thing that you already expect readers to do?
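One possible shape for such an exercise is a small cache of open sockets keyed by host and port, sketched below. The ConnectionCache class and its connect parameter are hypothetical. A real solution would also need to read exactly Content-Length bytes from each response, so that the socket is left positioned at the start of the next one.

```python
import socket


def open_socket(host, port):
    # Default connector: a plain TCP connection.
    return socket.create_connection((host, port))


class ConnectionCache:
    """Keeps one open socket per (host, port) so later requests can reuse it."""

    def __init__(self, connect=open_socket):
        self.connect = connect
        self.sockets = {}

    def get(self, host, port):
        key = (host, port)
        if key not in self.sockets:
            # Only pay for the handshake the first time we talk to this server.
            self.sockets[key] = self.connect(host, port)
        return self.sockets[key]

    def discard(self, host, port):
        # Call this when the server sends "Connection: close" or an error occurs.
        sock = self.sockets.pop((host, port), None)
        if sock is not None:
            sock.close()
```

A redirect to the same host and port would then call get again and receive the socket that is already open.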

Blog post link paths need fixing

Hi, Pavel and Chris.

Thank you for creating this book!

Heads-up, when navigating to the website blog posts via page links, they all 404 because "/blog" occurs twice in the path (e.g. browser.engineering/blog/blog/why-python.html). Otherwise, those pages load fine after removing one of those "/blog" occurrences.

Dead-ish link to tkinter canvas documentation

The Graphics Chapter has a link to http://infohost.nmt.edu/tcc/help/pubs/tkinter/web/canvas.html which seems to time out now. Is there another resource or a mirror that could be used instead?

RSS feed

There's probably some way to convince Pandoc to make an RSS feed from the blog and from new chapters.

Link to home on every page

As noted on this page: https://browser.engineering/preface.html there is no UI given to navigate back to the home page of the book. If one is reading a chapter and navigates multiple pages into the book using the given buttons, it would require multiple presses of a browser's back button to finally come back to the home page.

Proper debugging advice

Each chapter should have comprehensive debugging advice. For example:

For networking, advice to print the request and response; to try latin1; to compare with curl; and headers to look out for, like Transfer-Encoding
For graphics, instructions on installing Tk and quieting the macOS deprecation warning, and listing out what we actually need from a GUI framework in case you're working on another language or platform
For text, instructions on printing the baseline, ascents, and descents; on flushing the line buffer; on printing the state at each character in the lexer
For HTML, instructions on printing the open tags at each token

Clicking on transformed content doesn't work

Clicking on a link, button, or text entry that has been translated with CSS transforms doesn't work, because our hit testing algorithm doesn't handle transforms (or any other effects, if we had them). #596 adds a failing unit test to test this, but we probably shouldn't merge it until we fix it.
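To illustrate the idea, hit testing under a translation can map the click point back into untransformed layout coordinates before comparing it with the element's box. The Box class below is a hypothetical stand-in, not the browser's layout object, and it handles only translations.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x: float
    y: float
    width: float
    height: float
    translate_x: float = 0.0
    translate_y: float = 0.0

    def contains(self, click_x, click_y):
        # Apply the inverse of the translation to the click point, so the
        # comparison happens in the coordinates that layout computed.
        local_x = click_x - self.translate_x
        local_y = click_y - self.translate_y
        return (self.x <= local_x < self.x + self.width and
                self.y <= local_y < self.y + self.height)


# A button laid out at x=10 but drawn 50 pixels further right.
button = Box(10, 10, 100, 20, translate_x=50)
```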

How does the typo editing tool work?

It doesn't seem to do anything as far as I can tell.

Feedback on /forms.html

A user writes:

I was surprised to find self.get_font(node) here, since this method was not introduced previously (only the global function get_font(...) which has a different signature). It's easy enough to find out what it's supposed to do, but in my opinion a bit unfortunate because the code won't run at first.

Potentially we could fix this by back-porting the new font function all the way back to Chapter 8, or perhaps even earlier.

Clicking on links in Chapter 15 draft doesn't work

Clicking on links was broken by the dispatch change. The core reason is thus:

When you click on something, we resolve it to a /layout/ object
You then walk up and dispatch based on the /element/ tree
This is important because an <a> tag does not have a layout object in our browser

My best suggestion is that we undo the dispatch changes, which is also shorter.
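The steps above can be sketched as follows: hit testing yields a layout object, and the search for a link then walks up the element tree from that object's node. The class and function names are hypothetical simplifications.

```python
class Element:
    def __init__(self, tag, parent=None, attributes=None):
        self.tag = tag
        self.parent = parent
        self.attributes = attributes or {}


class LayoutObject:
    def __init__(self, node):
        # Each layout object points back at the element it was built from.
        self.node = node


def find_link_target(layout_object):
    """Walk up the element tree looking for an enclosing <a href=...>."""
    elt = layout_object.node
    while elt is not None:
        if elt.tag == "a" and "href" in elt.attributes:
            return elt.attributes["href"]
        elt = elt.parent
    return None
```

Walking the element tree matters here because the enclosing link element may have no layout object of its own to walk through.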

Line wrapping wrong in history.html on mobile

Near the http://example.comm/doc.html#link link, per offline discussion with a reader of the book.

Clicking after scrolling broke.

@chrishtr, I'm not sure this is actually the best fix for this problem, but there's a problem with clicking on things after scrolling. Basically, open the browser to a large page (like the browser book), scroll down a few times, and click where a link used to be (before scrolling). We'll click on the link even though it's been scrolled offscreen.

As far as I was able to diagnose it, since we scroll on the browser thread, the Tab doesn't even know that it's been scrolled, and a click event from the Browser doesn't indicate how far it thinks it's been scrolled. We do render before executing the click event, but that doesn't update scroll, so we end up clicking on the wrong thing.

I can't imagine this bug is specific to Chapter 16. How did we not catch it before? Or did I miss something? I bet we broke something when we were changing between Tab and Frame, but I'm not sure what.

I have a branch called fix-scroll that fixes it by sending scroll tasks from the Browser to the Tab, but that's not ideal. Sometimes it causes the scroll position to jerk around, for example. Better would be to figure out what broke it and undo that.
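One alternative, sketched under the assumption that the click event can carry the browser thread's scroll offset, is to have the Tab adopt that offset before hit testing. The browser_scroll parameter and the simplified Tab class are hypothetical; this is not the fix-scroll branch.

```python
class Tab:
    def __init__(self):
        self.scroll = 0
        self.last_click = None

    def click(self, x, y, browser_scroll):
        # Trust the browser thread's view of scrolling for this event, so the
        # click is resolved against what the user actually saw on screen.
        self.scroll = browser_scroll
        page_y = y + self.scroll
        self.last_click = (x, page_y)
        return self.last_click
```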

Undo whitespace hacks once Chrome stable is shipped

And the fix for this bug is in the stable channel:

https://bugs.chromium.org/p/chromium/issues/detail?id=1499290

"User Style" and "User Agent Style" terms mixed up

The styles.md chapter (currently titled Applying User Styles) [1] mentions in its marginalia [2] that they are »Technically called "User Agent" style sheets« and also mentions that »The CSS standard also allows for browser extensions that set custom style sheets for websites.« [3]

This is an unfortunate misinterpretation of the standard cascade (see https://www.w3.org/TR/css-cascade/#cascading), which involves several distinct origins of style sheets with precise names: "User Agent Styles", "User Styles" and "Author Styles". In brief:

the term "User Agent Style" (UA style) denotes "default CSS that's in the browser that has 'weakest' normal declarations but 'strongest' !important declarations" [5];
the term "User Style" denotes "CSS defined by the user to refine UA style normal declarations and possibly override Author styles using !important declarations";
the term "Author Style" denotes CSS written by the page's author, which sits between "User normal" and "User !important" property declarations.

As you can see, "User Styles" coming from the "User Origin" level are something different from styles in HTML attributes (authored by the page Author) and from "Browser Styles".

It's hard to blame you; the original naming is confusing: we have to be taught that "user agent" is a superset of "browsers and bots and generally anything that processes web content on users' behalf" and that "User" is, well, "the actual user who expresses their preferences and enters the cascade in dedicated origin 'slots'". Also, current implementations of extensions for "custom style sheets for websites", as well as native support for real User Styles, are probably not as developed as the authors of CSS imagined [4].
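The ordering described above can be written down as a sort key. This is a simplified sketch that covers only the three origins and the !important flag, ignoring animations, transitions, specificity, and source order; the cascade_rank name is made up for illustration.

```python
# Origins from weakest to strongest for *normal* declarations.
NORMAL_ORDER = ["user-agent", "user", "author"]


def cascade_rank(origin, important):
    """Higher rank wins. Important declarations reverse the origin order."""
    index = NORMAL_ORDER.index(origin)
    if important:
        # All important declarations outrank all normal ones, and among them
        # the origin order flips: author < user < user-agent.
        return len(NORMAL_ORDER) + (len(NORMAL_ORDER) - 1 - index)
    return index


declarations = [
    ("author", True), ("user-agent", False), ("user", True),
    ("author", False), ("user-agent", True), ("user", False),
]
weakest_to_strongest = sorted(declarations, key=lambda d: cascade_rank(*d))
```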

[1]

book/book/styles.md

Line 2 in a052da9

title: Applying User Styles

[2]

book/book/styles.md

Lines 510 to 511 in a052da9

    [^technically-ua]: Technically called a "User Agent" style sheet. User Agent,
    [like the Memex](history.md).

[3]

book/book/styles.md

Lines 724 to 725 in a052da9

    to an alternate style sheet. The CSS standard also allows for [browser
    extensions][userstyles] that set custom style sheets for websites.

[4] Most (all?) extensions for "User Style management" just inject styles after Author styles on the same origin level, because it is reportedly less confusing for users. Speaking of which, I'd recommend not linking to userstyles.org and the current Stylish extension, since they are in very bad shape and neglected or even abused by their current owners. (

book/book/styles.md

Line 729 in a052da9

[userstyles]: https://userstyles.org

)

[5] I assume the statement that !important properties get a priority 10000 higher than normal ones is just for the sake of a simplified exercise?

book/book/styles.md

Lines 1080 to 1081 in a052da9

    selector). Parse and implement `!important`, giving any property-value pairs
    marked this way a priority 10000 higher than normal property-value pairs.

Offline Use

Can this book be read offline?

Wrong events for scrolling in Linux

In the mouse wheel exercise in chapter 2 it says <Mouse-4> and <Mouse-5> are the scroll events in Linux when they should be <Button-4> and <Button-5>.
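A handler that covers both styles of event might look like the sketch below. It assumes a Tk-style event object with num and delta attributes; SCROLL_STEP and the scroll_delta function are names invented for illustration.

```python
from types import SimpleNamespace

SCROLL_STEP = 100  # pixels per wheel notch (an arbitrary choice)


def scroll_delta(event):
    """Translate a Tk wheel event into a signed scroll distance."""
    # On Linux (X11), Tk reports the wheel as <Button-4> (up) and <Button-5> (down).
    num = getattr(event, "num", None)
    if num == 4:
        return -SCROLL_STEP
    if num == 5:
        return SCROLL_STEP
    # On Windows and macOS, <MouseWheel> events carry a signed delta instead,
    # where a positive delta means scrolling up.
    delta = getattr(event, "delta", 0)
    if delta > 0:
        return -SCROLL_STEP
    if delta < 0:
        return SCROLL_STEP
    return 0


# Simulated events stand in for the ones Tk would deliver.
linux_down = scroll_delta(SimpleNamespace(num=5, delta=0))
wheel_up = scroll_delta(SimpleNamespace(num="??", delta=120))
```

In a real window the same function would be called from handlers bound to "<Button-4>", "<Button-5>", and "<MouseWheel>".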

Instructions to run the code (browser)

Firstly, a big thank you to the authors for writing this amazing book!

I understand that this book is a work in progress, but I am really curious to get the code for the browser up and running to see how the current version looks and behaves. I tried running the files in src with the relevant dependencies installed and a URL as an argument. I got it to work somewhat (besides the SSL error and other internal errors), but it's not obvious how to get a web page to render, which is understandable since I haven't read as far into the book as the version I'm running.

But it would be great if there were some setup and usage instructions in README.md. Also, the dependencies could be listed in a requirements.txt file.

Chapter 7: unclosed <a> tag

Hello!

In Chapter 7 (chrome), there is this <a> tag that remains unclosed and seems to have been rendered as-is:

book/book/chrome.md

Lines 13 to 17 in 0a4f06f

    
           looking at. 
        
           <a name="hit-testing"> 
        
           Where are the links?

Which causes underlines when you hover over the body text nearby:

I want to say this is unintentional (?), but since this is also the chapter on handling hyperlinks, I wasn't entirely sure.

Should Chapter 10 print out JS output in browser.load()?

Hi! First of all, thanks for creating this amazing textbook.

I'm one of the TAs for the browser engineering course at UW that is based on this book this quarter, and our class ran into an "expected Script returned: None but got nothing" failure when running the Chapter 10 tests.

From our understanding of Chapter 9 of the textbook, the line print("Script returned: ", dukpy.evaljs(body)) is only intended for trying out JavaScript execution the first time and shouldn't be kept in the final product. Also, as shown later in Chapter 9, that print line is replaced by self.js.run(body) (https://github.com/browserengineering/book/blob/main/book/scripts.md#handling-crashes).

If our understanding is correct, could you update the Chapter 10 lab code and Chapter 10 test accordingly? Thanks a lot!
We've updated our test files on our GitLab.
Also, the later lab code might contain this line as well.

Add explanations of replaced elements

This could be in a "go further" section, an exercise, or a more advanced chapter.

Place a page's github link on the page

I believe providing a link to each page's location on GitHub would make it much easier to contribute and in turn increase the quality of the book.

Chapter 10: can't write to the guestbook after adding the SameSite check

After adding the SameSite cookie check to the toy browser, I can't write messages on the guestbook server running locally, for example http://localhost:8000. I was able to reproduce the issue also with src/server10.py and src/lab10.py in this repository.

In the following code, host doesn't contain a port number while top_level_host does when both url and top_level_url are http://localhost:8000/. As a result, allow_cookie gets False for POST requests.

def request(url, top_level_url, payload=None):
    # ...
    if host in COOKIE_JAR:
        cookie, params = COOKIE_JAR[host]
        allow_cookie = True
        if top_level_url and params.get("samesite", "none") == "lax":
            _, _, top_level_host, _ = top_level_url.split("/", 3)
            allow_cookie = (host == top_level_host or method == "GET")
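One way to make the comparison consistent is to strip the port from both URLs before comparing hosts, as in the sketch below. The helper names are hypothetical, and this is not necessarily the fix adopted in the book; it relies on SameSite being defined in terms of sites, which do not include the port.

```python
def host_without_port(url):
    """Extract the host from a URL, dropping any ":port" suffix."""
    _, rest = url.split("://", 1)
    host = rest.split("/", 1)[0]
    # "localhost:8000" and "localhost" should count as the same host here.
    return host.split(":", 1)[0]


def same_site(url, top_level_url):
    return host_without_port(url) == host_without_port(top_level_url)


# The guest book case from the report: both URLs are on localhost:8000.
allowed = same_site("http://localhost:8000/add", "http://localhost:8000/")
```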

Github Project Link on home page

It is commonplace for documentation to link to its repo on the home page. Currently the link only exists in the preface.

Introduce Browser.load earlier

The early chapters of the book thread together lexing, parsing, and so on in the entry point. This is confusing to some readers and also means that all of that is moved into a method on Browser midway through the book. It seems like instead we should introduce Browser.load in Chapter 2, and pass the page source itself as a field on Browser instead of as an argument to layout.

Change style of <code>

In this text block there are at least three inline <code> elements, but they are really hard to tell apart from the surrounding text.
I think it would be nice to change the color, or add a small border or a background, rather than only switching the font to monospace.

Leaving a comment in the feedback tools won't work in a narrow window

Consider adding testcases to go with exercises

It can be challenging enough to understand what the exercises in the book are getting at. Perhaps we should provide example pages for exercises that the reader can test against.

This feedback appeared in the book next to the Encoding exercise in chapter 1.

We don't support our own iframe

If you use the Chapter 15 browser to load https://browser.engineering/, it crashes when trying to render a button that has HTML inside it:

 <button tabindex="0" class="butto" rightbutton="" primary="" subscribe-btn"="" type="submit">
   <b class="button-tex" "="">
     'Subscribe'

That's because our browser assumes a button only has text inside. We don't need to solve this the "right" way (that's an exercise) but we should at least make it not crash!

Next link is missing in Chapter 10

First of all, thank you for creating this amazing book! I've been having a lot of fun reading it and doing exercises.

I just found that Chapter 10 doesn't have a link to the next chapter.

Reflow chapter does not reflow heights upwards

The reflow chapter splits layout into two phases: layout1 to compute widths/heights and layout2 to compute positions. Most changes require rerunning layout1 on affected elements and layout2 on everything; the split saves time during word wrapping.

But layout1 needs to be rerun on the ancestors of a changed element to recompute their heights if children got taller / shorter. The text also needs to be improved so that readers don't make this mistake conceptually!
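As a rough illustration of the missing step, the sketch below reruns the size phase on a changed node and then on each of its ancestors. The LayoutNode class is a hypothetical simplification in which a node's height is its own content height plus the sum of its children's heights.

```python
class LayoutNode:
    def __init__(self, own_height=0, parent=None):
        self.own_height = own_height
        self.parent = parent
        self.children = []
        self.height = own_height
        if parent is not None:
            parent.children.append(self)

    def layout1(self):
        # Size phase for one node; assumes the children's heights are current.
        self.height = self.own_height + sum(c.height for c in self.children)


def relayout_upwards(node):
    """Rerun layout1 on a changed node and then on every ancestor."""
    while node is not None:
        node.layout1()
        node = node.parent


root = LayoutNode()
section = LayoutNode(parent=root)
paragraph = LayoutNode(own_height=10, parent=section)
relayout_upwards(paragraph)

# The paragraph grows; its ancestors must grow with it.
paragraph.own_height = 25
relayout_upwards(paragraph)
```

Rerunning layout1 on the paragraph alone would leave section and root at their old heights.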

In widgets, after it's done, hitting "Back" goes back two steps.

Define algorithm to create anonymous wrapper boxes for inlines

Per offline discussion: to avoid confusion in the block layout chapter, add a section near the beginning about how inlines with block siblings are wrapped in anonymous blocks. These anonymous blocks receive default styles, plus inherited properties, plus a display used style that depends on the context. See code for many examples.
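A minimal sketch of the wrapping step is shown below: consecutive inline children are grouped into one anonymous block whenever they have block siblings. The tuple representation and the is_inline predicate are assumptions for illustration, and style inheritance is left out.

```python
from itertools import groupby


def wrap_inline_runs(children, is_inline):
    """Wrap each run of consecutive inline children in an anonymous block."""
    # If every child is inline there are no block siblings, so nothing is wrapped.
    if all(is_inline(child) for child in children):
        return list(children)
    result = []
    for inline, run in groupby(children, key=is_inline):
        run = list(run)
        if inline:
            result.append(("anonymous-block", run))
        else:
            result.extend(run)
    return result


# Strings stand in for nodes; these three are treated as inline.
INLINE = {"text", "b", "i"}
wrapped = wrap_inline_runs(["text", "p", "b", "i", "div"], lambda c: c in INLINE)
```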

Stop using <base>

I thought using <base> would make it easier to maintain the draft, blog, and other pages on the site, but it has a major problem: those pages don't work well in our toy browser, which doesn't support <base>. I think we'll just need to not use <base>.

Verify locking behavior during load of or cycle through tabs

Consider adding clarity around default HTML form method

Hi there,

First off, thank you for your time and effort on this project! I wish I had this as a course in college :)

I've noticed that the book uses a default <form> method of POST, which is contrary to the apparent [recommendation](https://www.w3.org/TR/html401/interact/forms.html#submit-format) (spec?) from the W3C of using the "get" method.

Not a big deal, but in practice it caused me brief confusion. In Firefox, I opened a file://form.html with the contents:

<html>
<head>
</head>
<body>
    <div>
        <form action="/submit">
            <p>Name: <input name=name value="hello"></p>
            <p>Comment: <input name=comment value="world"></p>
            <p><button>Submit!</button></p>
        </form>
    </div>
</body>
</html>

I was attempting to use the Firefox devtools to check the contents of the POST request. Firefox, however, made a GET request with URL params instead. I wonder if this is a bit of implicitness and complexity that could be removed for students? Either by:

Changing the content to describe a GET request for the form submission
Changing the HTML provided to explicitly specify <form action="/submit" method="post">
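For comparison, the behavior Firefox showed can be sketched as follows: a missing method attribute falls back to GET, and the fields are encoded into the URL rather than the request body. The function names are hypothetical, and the sketch ignores other method values such as dialog.

```python
from urllib.parse import urlencode


def form_method(attributes):
    """Return "get" or "post"; missing or unknown values fall back to "get"."""
    method = attributes.get("method", "get").lower()
    return method if method in ("get", "post") else "get"


def build_submission(action, attributes, fields):
    """Return (url, body) for submitting a form with the given fields."""
    encoded = urlencode(fields)
    if form_method(attributes) == "post":
        return action, encoded
    # GET submissions put the encoded fields in the query string.
    return action + "?" + encoded, None


fields = {"name": "hello", "comment": "world"}
default_submission = build_submission("/submit", {}, fields)
post_submission = build_submission("/submit", {"method": "post"}, fields)
```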

Feedback on /visual-effects.html

A reader writes, referring to the paragraph that starts "Now, in paint_visual_effects, we can use ClipRRect instead of destination-in blending":

Why was or needs_opacity removed from should_save like above? Isn't this required if I have an element with only an opacity attribute?

I think the actual reason is that the needs_opacity was moved into SaveLayer, but we should confirm and then add a footnote or other clarification.

Skia-python can't be installed on Python 3.12

I'm not sure why this is, but when I try to install skia-python, version 87.5 (which continues to be the current version, I think!), I get this:

ERROR: Could not find a version that satisfies the requirement skia-python==87.5 (from versions: 119.0b4, 120.0b5, 121.0b6)
ERROR: No matching distribution found for skia-python==87.5

Updating the requirements.txt to ask for one of the above versions is doable and works, but it skeeves me out a bit that they're marked "beta".

Ch2 add additional info for linux user on installing tkinter

https://realpython.com/python-gui-tkinter/

Broken link to attribute grammars

Hi,

In https://browser.engineering/layout.html, there's a note-box about attribute grammars with a broken link to https://browser.engineering/wiki-atgram

Maybe it was supposed to link https://en.wikipedia.org/wiki/Attribute_grammar? (from the wiki part in the link?)

Thanks!
Bruno

Feedback on /intro.html

This sentence:

What’s amazing is that, despite the scale and the pace and the complexity, there is still plenty of room to contribute.

in intro.md repeats, almost word for word, a sentence from two paragraphs earlier. That's definitely a bug; we should fix it.

