Giter Site home page Giter Site logo

akkartik / mu Goto Github PK

View Code? Open in Web Editor NEW
1.4K 39.0 46.0 90.09 MB

Soul of a tiny new machine. More thorough tests → More comprehensible and rewrite-friendly software → More resilient society.

Home Page: http://akkartik.name/akkartik-convivial-20200607.pdf

License: Other

C++ 6.30% Shell 0.54% C 0.31% Emacs Lisp 0.02% Assembly 70.45% Forth 22.14% Less 0.01% Python 0.05% HTML 0.01% Vim Script 0.18%
programming-literacy white-box-testing linux x86 unit-tested machine-code

mu's Issues

running on arm

I'm just curious if mu can be potentially be targeted for arm processors and how much work that might take.

Mu: Can't build due to linker error

willow% CFLAGS="-O0" ./mu
updating mu_bin
-- /home/max/Documents/Dev/mu/.build
-- /home/max/Documents/Dev/mu/termbox
-- /home/max/Documents/Dev/mu
/usr/bin/ld: termbox/libtermbox.a(utf8.o): relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status
willow% uname -a
Linux willow 4.15.0-12-generic #13-Ubuntu SMP Thu Mar 8 06:24:47 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
willow% lsb_release -irc
Distributor ID:	Ubuntu
Release:	18.04
Codename:	bionic
willow% 

Reduce idle cpu consumtion (using the hlt instruction)

This is a feature request: Currently, when waiting for keyboard input, Mu uses a busy loop (which you can tell by the high CPU usage in qemu, for example in the shell app).

On x86, idling is pretty simple, by using the HLT instruction.

Typo in mu/opcode

Hello,

I think there is a typo in mu/opcode :

opcode:

0f 86: jump disp8 bytes away if lesser or equal (unsigned), if ZF is set or CF is set (jcc/jbe/jna)

should be:

0f 86: jump disp32 bytes away if lesser or equal (unsigned), if ZF is set or CF is set (jcc/jbe/jna)

as every opcode around it. Maybe a missed modification from a copy/paste of opcodes 72 to 7f?

Support the x86 carry flag `CF`

When I selected the x86 instructions SubX would support I excluded the carry flag on the basis that higher-level languages didn't really provide access to it. However, it turns out it is needed for comparing unsigned integers. The current opcodes for jumping on lesser/greater rely on the sign flag which is set purely by treating comparison operands as signed integers. When comparing unsigned integers (say two addresses), we need to check the carry flag.

http://unixwiz.net/techtips/x86-jumps.html makes this obvious, whereas the doc I've been consulting so far (https://c9x.me/x86/html/file_module_x86_id_146.html) refers to the distinction in only a single sentence:

The terms "less" and "greater" are used for comparisons of signed integers and the terms "above" and "below" are used for unsigned integers.

Highish-level syntax for dependency-injected abort

SubX currently allows one to test the exit() syscall. It does so using a dependency-injected wrapper called stop that takes an exit-descriptor as an argument. If the exit-descriptor is null the program really exits. If it is created using tailor-exit-descriptor stop unwinds the stack until the frame that called tailor-exit-descriptor.

But tailor-exit-descriptor is klunky. The way it currently works is, you pass in the number of args of the function that's going to get passed in the exit-descriptor, and it computes where on the stack the return address for the current stack frame is going to be, saving it to the exit-descriptor. That way all stop has to do is set ESP to the return address in the exit-descriptor and then call c3/return.

If we moved to a more HLL syntax where function calls were all in a single line, we'd like to make tailor-exit-descriptor cleaner or maybe do away with it altogether.

One easier way to explain tailor-exit-descriptor is that it is equivalent to a special function call. Now for background, regular calls in SubX look like this:

  • push all args
  • call
  • discard args (say by adding a constant to ESP)

The special call to a function that may want to call stop looks like this:

  • push all the args (which must include the address to the exit-descriptor)
  • save ESP-4 to the exit-descriptor
  • call
  • discard args

Only the second step is new. But we now get to replace all of tailor-exit-descriptor with a single instruction (assuming the address of the exit-descriptor is always in a register).

Now my question becomes: what does a syntax for this special call look like?

If regular calls look like f(arg1, arg2, ... argn), then some possibilities:

ed = tailor-exit-descriptor(...total size of args to f...)
f(arg1, arg2, ...ed, ...argn)
w/exit-descriptor(ed) f(arg1, arg2, ...ed, ...argn)

I don't like either, but they do have some benefits. The first allows for a single exit-descriptor to be reused across function calls in the same stack frame (totally safe). The second is all on one line to indicate that conceptually it's all a single call.

On the other hand, the second now looks like two operations on a single line, which is confusing and potentially sets a bad precedent. So I wonder what sort of approach we may take that makes it look less like two calls in a single line.

  1. Baroque:
w/exit-descriptor{|ed| f(arg1, arg2, ...ed, ...argn) }

That's a lot of grammar for just one construct.

try f(arg1, arg2, ...ed, ...argn)

We'd need to somehow figure out where ed is.

protect[ed] f(arg1, arg2, ...ed, ...argn)
f<ed>(arg1, arg2, ...ed, ...argn)
  1. Some other random ideas: perhaps we should try to generalize the syntax with any other operations that involve munging the stack. Maybe closures or more general exceptions or first-class continuations.

Anyways, that's my brain dump.

cc @charles-l

undefined reference to...

First of all, neat project. Second of all, I get the following list of errors:

ld: common.o: in function sprintf_va': common.c:(.text+0x836): undefined reference to __stack_chk_fail_local'
ld: common.o: in function sprintf': common.c:(.text+0x89b): undefined reference to __stack_chk_fail_local'
ld: common.o: in function printkf': common.c:(.text+0x96e): undefined reference to __stack_chk_fail_local'
ld: common.o: in function getCpuFlags': common.c:(.text+0xa4d): undefined reference to __stack_chk_fail_local'
ld: fatfilesystem.o: in function initializeFatFileSystem': fatfilesystem.c:(.text+0x96): undefined reference to __stack_chk_fail_local'
ld: fatfilesystem.o:fatfilesystem.c:(.text+0x2ef): more undefined references to __stack_chk_fail_local' follow ld: kernel.bin: hidden symbol __stack_chk_fail_local' isn't defined
ld: final link failed: bad value
make: *** [Makefile:10: kernel.bin] Error 1

In response to running the command: sudo ./gen_soso_iso init.soso examples/ex6.subx

Operating system is arch derivative.

translate_emulated produces no image on MacOS

Following the directions in the README on a MacOS machine, I get:

./translate_emulated life.mu

cat $* [0-9]*.mu  |linux/bootstrap/bootstrap run linux/mu                 > a.subx

./translate_subx_emulated boot.subx mu-init.subx [0-9]*.subx a.subx

cat $*            |linux/bootstrap/bootstrap run linux/braces                         > a.braces

cat a.braces      |linux/bootstrap/bootstrap run linux/calls                          > a.calls

cat a.calls       |linux/bootstrap/bootstrap run linux/sigils                         > a.sigils

cat a.sigils      |linux/bootstrap/bootstrap run linux/tests                          > a.tests

# no assort since baremetal SubX doesn't have segments yet

cat a.tests       |linux/bootstrap/bootstrap run linux/dquotes                        > a.dquotes

cat a.dquotes     |linux/bootstrap/bootstrap run linux/pack                           > a.pack

cat a.pack        |linux/bootstrap/bootstrap linux/survey_baremetal   > labels
nothing to do

... and no code.img is produced.

Opcode mnemonics for SubX

Giving mnemonics to x86 opcodes might make SubX easier for newcomers to read. This issue is intended to track the pros and cons.

Previous discussion: https://news.ycombinator.com/item?id=21268252#21293301

Initial list of pros and cons:

  • Pro: names give some indication (albeit imprecise) of the operation being performed.
  • Pro: names can be checked by tooling. Typoing 8d for 8f seems easier than typoing copy for pop.
  • Con: per-opcode mnemonics are an additional set of things for the new reader to understand and navigate. And cross-correlate with the Intel manual, since it only gives names to sets of opcodes.
  • Con: tooling may have a hard time giving a good error message when a user typos one variant of say add for another.

SubX: Tip in `subx help opcodes` doesn't work

willow% ./subx help opcodes     
Opcodes currently supported by SubX:
  01: add r32 to rm32
  03: add rm32 to r32
... SNIP ...
  0f 8e: jump disp16 bytes away if lesser or equal (ZF is set or SF != OF)
  0f 8f: jump disp16 bytes away if greater (ZF is unset, SF == OF)
Run `subx help instructions` for details on words like 'r32' and 'disp8'.
willow% ./subx help instructions
No help found for 'instructions'
Available top-level topics:
  usage
  registers
  opcodes
  syntax
Please check your command for typos.
willow%

Static checks for function outputs

Background

Mu (this repo) contains a compiler built in machine code, which converts a memory-safe high-level language into machine code. Since it's built in machine code, the compiler is intended to be as simple as possible. In particular, it performs no optimizations.

Most Mu statements translate to a single instruction of machine code. One consequence of this 1:1 constraint is that the language exposes registers, and requires programmers to manage them manually. However, the compiler will check registers, and raise errors if you read a variable after a different variable has clobbered its register.

Current design

Function headers currently specify the names, types and registers for their outputs. For example:

# example 1
fn foo -> result/eax: int {
  result <- copy 0x34
}

There is no return instruction.

Problems with the current design (Plan 0)

Failing to write outputs

We need some way to statically ensure that outputs are always written. Otherwise a caller may end up with access to random temporaries of any possible type, which violates the memory-safety guarantees I want:

# example 2
fn foo -> result/eax: int {
  {
    break-if-=
    result <- copy 0x34
  }
}

Detecting errors like this will require generating the control-flow graph for functions, and checking every possible path through them. I've built such analysis passes before, but never in machine code.

Outputs getting over-written

There's an even bigger problem: tracking variable lifetimes. Mu is designed so variable lifetimes nest perfectly like Russian dolls:

fn foo {
  var x/eax: int <- copy 0
  ...
  {
    var x2/eax: int <- copy 1
    ...
    {
      var x3/eax: int <- copy 2
      ...
    }  # x3 gets reclaimed here
    ...  # eax guaranteed to contain x2 here
    {
      var x4/eax: int <- copy 2
      ...
    }  # x4 gets reclaimed here
    ...  # eax guaranteed to contain x2 here
  }  # x2 gets reclaimed here
  ...  # eax guaranteed to contain x here
}

It's been relatively easy to do this in machine code. I just maintain a stack of variables and blocks, and I clean up all variables for a block when I encounter its }.

However, this nice property currently has a loophole due to function outputs. Consider this example:

# example 3
fn foo -> out/eax: int {
  var x/eax: int <- 0
  ...
  {
    var x2/eax: int <- 0  # should we push `x` to the stack here?
    ...
    break-if-=
    out <- copy 0x34
  }
  # Should `x` be accessible here?
}

Currently we save the state of registers when defining a new variable -- only if it wouldn't be written to by a function output later in the block. In example 3 above, we currently don't save x. However this strategy has two problems:

  1. Blocks can conditionally write function outputs. In this situation we lose access to x even in some situations where we could preserve it.
  2. Errors become hard to explain. "x isn't available here, because it may have been overwritten by out."

Back in March, my best attempt at clean error messages was to speak of live ranges in terms of static ranges of lines, ignoring control flow. Once an output is defined, you can't use other existing variables in that register in lower lines. (You can still shadow the output in new variables in nested blocks.) However, I've lately found myself breaking this rule as I write code. Fixed ranges of lines are easy in themselves, but they're also a very different rule from regular variables with shadowing, so it's been easy to forget the rule for outputs. So I need to add the static check to remind myself when I violate the rule. And I've been loath to get into that just because tracking possible control-flow paths seems so hard in machine code. Perhaps it's time to go back to the drawing board on this plan.

Redo how SubX lays out code/data segments in memory

Background

Right now SubX has two ways to determine the address of a code/data segment.

  • Programs can provide a starting address for segments:
== 0x0a0001000
...
  • Programs can name segments, in which case segments get hard-coded addresses assigned by SubX:
== code
...

Either way, the first segment defined must be code and contain instructions.

This approach has long seemed a mess. It's hard to explain, and there's a certain amount of historical evolution that led to this mess. I started out expecting programs to specify addresses because that made writing tests easier, and later found use in naming segments so that I could keep code and related data close together. However I couldn't easily retire the old approach because so many tests relied on specific instructions being at specific addresses.

The named segments approach 'works' for a simple ELF binary with two code/data segments, but hasn't really been used with more than two segments even though it seems to support it. As I revise my mental model of how the heap works in the presence of ASLR, this mess has been slowing me down.

New plan

Programs always specify the starting address of segments. But they only have to do so once, the first time the segment is encountered.

== code 0x0a0001000
...
== code
...

This is a single uniform syntax that should help write small tests and also deal with ASLR. Code segments no longer have to come first. The name code is special.

Migration

I've been reluctant to make changes here because we have to change the C++ version and also change how we bootstrap SubX in SubX. But I think it's doable if we first start with SubX in SubX. Currently assort.subx and dquotes.subx fail tests if I add starting addresses to segment headers. First task is to make them pass. In particular, assort.subx will require some changes to the table data structure.

Now that I've made this plan, it looks like it's not needed for building allocate out of mmap. So I'm going to set it aside and hopefully deal with that issue first.

Keyboard input stops working after moving the mouse

Reproduced as follows:

./translate shell/*.mu
qemu-system-i386 code.img -m 256 -snapshot -display gtk,zoom-to-fit=on

Initially the keyboard works. After clicking on the screen to lock the mouse and moving it, keyboard input stops working.

I've narrowed it down (while debugging another issue in v86): Mu turns off the mouse interrupts, but doesn't poll from the IO port. When a mouse event happens, the PS/2 controller sets the output buffer status bit, which prevents further interrupts (including keyboard) until port 60h is read.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.