oriansj / stage0

A set of minimal dependency bootstrap binaries

License: GNU General Public License v3.0

Assembly 59.08% C 35.97% Shell 0.05% Python 3.01% Makefile 1.19% Forth 0.49% Scheme 0.20%
bootstrap-process bootstrappable hex0

stage0's Introduction

The master repository for this work is located at: https://savannah.nongnu.org/projects/stage0/

If you wish to contribute:

  • Pull requests can be made at https://github.com/oriansj/stage0 and https://gitlab.com/janneke/stage0
  • Patches/diffs can be sent via email to Jeremiah (at) pdp10 [dot] guru
  • Join us on libera.chat’s #bootstrappable
  • Update the wiki at https://bootstrapping.miraheze.org/wiki/Stage0

Those wishing to work on POSIX ports of stage0 can do so here: https://github.com/oriansj/stage0-posix

Those wishing to work on CPM/DOS porting, let me know (I have some easy work for you).

Goal

This is a set of manually created hex programs in a Cthulhu Path to madness fashion. Their only goal is to create a bootstrapping path to a C compiler capable of compiling GCC, with the only explicit requirement being a single binary of 1 KByte or less.

Additionally, all code must be understandable by 70% of the population of programmers. If the code cannot be understood by that share, it needs to be altered until it satisfies the above requirement.

Also found within

This repo contains a few of my false start pieces that may be of interest to people who want to independently create the root binary. I welcome all bug fixes and code that aids in the above stated goal.

The POSIX submodule links to the successful POSIX ports for x86, AMD64 and AArch64.

FYI

I’ll be adding more code and documentation as I build pieces. ALL code in this REPO is under the GPLv3 or Later.

In order to build stage0 and all the pieces, one only needs to run `make all`. Each individual piece can be built by running `make $piece`, with $piece replaced by the actual part you want to make.

The only pieces that have any external dependencies are the Web IDE (Python3+CherryPy), libvm (GCC) and vm (GCC+GNU getopt). Those wishing to work in Python, please check out https://github.com/markjenkins/knightpies. He does an amazing job.

Future

Software

Add more ports to more hardware platforms.

Hardware

Implement the Knight processor in FPGA and then convert into TTL.

Need to know information

This repository utilizes submodules, so you need to clone it using `git clone --recursive`. If you have already cloned it, run `git submodule update --init`, or after a pull be sure to run `git submodule update --recursive`.

stage0

The stage0 is the lowest level of the bootstrap. It is useful for systems without firmware, an operating system or any other provided software functionality. Systems that have such capabilities can skip this stage, as it requires human input.

Hex0_monitor

The Hex0_monitor provides dual functionality:

  1. It assembles hex0 programs that are manually typed in.
  2. It writes out the characters typed, providing minimal text input functionality.

The first is essential for creating the root binaries. The second is essential for creating source files before you have an editor. The distinction is important because only the Hex0 assembler in stage1 is built by the Hex0_monitor; from that point onwards the monitor is used as a minimal text editor until a more advanced text editor can be bootstrapped.

stage1

The stage1 is dependent on the availability of text source files and at least a hex0 monitor or assembler. The steps in this stage can be fully automated should one trust their automation or performed manually on any hardware they trust.

Regardless of which method is selected, the resulting binaries MUST be identical.

Hex0

The Hex0 assembler or stage1_assembler-0 is the head node of the stage1 bootstrap. Its functionality is reduced compared to the stage0 monitor simply because it only performs half of the required functions; that of generating binaries from hex0 source files.

Its most important features of note are ; line comments and # line comments, as careful notes are essential for this stage.
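As a rough illustration of what a hex0 assembler does, the following Python sketch turns pairs of hex digits into bytes and drops line comments. It is not the project's implementation. It assumes that both `;` and `#` start a comment, that upper and lower case hex digits are accepted, and that every other character is ignored. The name `assemble_hex0` is made up for this example.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def assemble_hex0(source: str) -> bytes:
    """Turn hex0 text into bytes: hex digit pairs become bytes, comments are dropped."""
    out = bytearray()
    high = None          # first nibble of a pair, held until the second arrives
    in_comment = False
    for ch in source:
        if in_comment:
            # A comment runs to the end of the line (or the end of the input).
            if ch == "\n":
                in_comment = False
            continue
        if ch in ";#":
            in_comment = True
        elif ch in HEX_DIGITS:
            nibble = int(ch, 16)
            if high is None:
                high = nibble
            else:
                out.append((high << 4) | nibble)
                high = None
        # ASSUMPTION: anything else (whitespace and so on) is silently ignored.
    return bytes(out)


if __name__ == "__main__":
    print(assemble_hex0("48 65 ; two bytes\n6C 6c # two more\n"))
```

Because the loop simply runs out of input, a comment that is not followed by a newline still terminates in this sketch.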

Hex1

The Hex1 assembler or stage1_assembler-1 is the next logical extension of the Hex0 assembler: it adds single character labels and relative displacements using a prefix. Labels start with : so the label a must be written :a, and the prefix for relative offsets is @ so the pointer must be written @a. Further, because of the mescc-tools standardization of syntax, @label indicates a 16bit relative displacement.

Alternative architectures porting this need not limit themselves to 16bit displacements should they so choose; rather, they must provide at least 1 size of displacement. If they so desire, they may skip Hex1 and write their Hex2 assembler in Hex0, but as it is a much larger program, I recommend against it.
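The label and displacement rules can be sketched as a two pass assembler in Python. This is an illustration under stated assumptions, not the stage0 code: the displacement is assumed to be measured from the address just after the 2-byte pointer field, and it is assumed to be written big-endian. The real base and byte order are architecture specific. The names `_scan` and `assemble_hex1` are invented here.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def _scan(source):
    """Yield ('hex', digit), ('label', name) or ('ref', name), skipping comments."""
    i = 0
    while i < len(source):
        ch = source[i]
        if ch in ";#":
            # Skip to the end of the line or the end of the input.
            while i < len(source) and source[i] != "\n":
                i += 1
        elif ch in ":@":
            if i + 1 >= len(source):
                raise ValueError("prefix without a label name")
            yield ("label" if ch == ":" else "ref", source[i + 1])
            i += 1  # step over the single label character as well
        elif ch in HEX_DIGITS:
            yield ("hex", ch)
        i += 1


def assemble_hex1(source: str) -> bytes:
    items = list(_scan(source))

    # Pass 1: record the address of every :label.
    labels = {}
    half_bytes = 0  # position counted in hex digits, two per byte
    for kind, value in items:
        if kind == "hex":
            half_bytes += 1
        elif kind == "ref":
            half_bytes += 4  # a 16bit pointer occupies two bytes
        else:
            labels[value] = half_bytes // 2

    # Pass 2: emit bytes and resolve every @label.
    out = bytearray()
    high = None
    for kind, value in items:
        if kind == "hex":
            nibble = int(value, 16)
            if high is None:
                high = nibble
            else:
                out.append((high << 4) | nibble)
                high = None
        elif kind == "ref":
            # ASSUMPTION: offset is relative to the end of the pointer field.
            base = len(out) + 2
            displacement = (labels[value] - base) & 0xFFFF
            out += displacement.to_bytes(2, "big")  # ASSUMPTION: big-endian
    return bytes(out)
```

Two passes are needed because a pointer may refer to a label that is only defined later in the file.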

Hex2

The Hex2 assembler or stage1_assembler-2 or hex2_linker is as complex a hex language as is both meaningful and worth the effort.

Hex2’s important advances over Hex1 are as follows:

  • Support for long labels (minimum 42 chars, ideally unlimited)
  • Support for absolute addressing ($label for 16bit absolute addresses)
  • Support for alternative pointer sizes (%label for 32bit relative and &label for 32bit absolute addresses)

Optionally support for !label (8bit relative addressing) and ?label (Architecture specific size/properties) and/or @label1>label2 %label1>label2 displacements may be implemented should the specific architecture require it for human readable hex2 source files (such as ELF headers).
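The pointer prefixes described above can be summarized as a small lookup table. The Python sketch below encodes one pointer token given a table of label addresses. It is hypothetical: the function name, the choice of measuring relative offsets from the end of the pointer field, and the big-endian byte order are all assumptions made for illustration, and the architecture specific ?label form is left out.

```python
# prefix -> (size in bytes, addressing mode), as listed for Hex2
POINTER_PREFIXES = {
    "!": (1, "relative"),   # optional 8bit relative
    "@": (2, "relative"),   # 16bit relative
    "$": (2, "absolute"),   # 16bit absolute
    "%": (4, "relative"),   # 32bit relative
    "&": (4, "absolute"),   # 32bit absolute
}


def encode_pointer(token: str, labels: dict, field_address: int) -> bytes:
    """Encode a pointer token such as '%main' located at field_address."""
    size, mode = POINTER_PREFIXES[token[0]]
    target = labels[token[1:]]
    if mode == "relative":
        # ASSUMPTION: displacement is measured from the end of the field.
        value = target - (field_address + size)
    else:
        value = target
    mask = (1 << (8 * size)) - 1           # wrap negatives into two's complement
    return (value & mask).to_bytes(size, "big")  # ASSUMPTION: big-endian


if __name__ == "__main__":
    labels = {"main": 0x1234}
    print(encode_pointer("$main", labels, 0).hex())
    print(encode_pointer("&main", labels, 0).hex())
```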

M0

M0 or M0-macro or M1-macro is the minimal string replacement program with string processing functionality required to convert an Assembly like syntax into Hex2 programs that can be compiled. Its rules are merely an extension of Hex2 with the goal of reducing the amount of hex that one would need to write.

The 3 essential pieces are:

  1. DEFINE STRING1 HEX_CHARACTERS (no extra whitespace nor \t or \n inside the definition)
  2. “Raw strings” allow every character except ”, as there is no support for string escapes (including NULL). They are converted to hex chars for Hex2, which converts them back to the chars inside of the “quotes” with the addition of a trailing NULL character, or as many as desired (must be at least 1, no upper bound). Restrictions such as padding to word boundaries are acceptable.
  3. ‘Raw char strings’ pass through anything inside of them (except ’ which terminates the string).

Thus by combining :label, @label, DEFINE SYSCALL 0F05, Raw strings and chars; one has created a rather flexible and powerful Assembler capable of building far more ambitious pieces in “Macro Assembly”.
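To make the three pieces concrete, here is a minimal Python sketch of an M0-style expansion step. It is an illustration only and makes several assumptions: ASCII quote characters are used, a DEFINE must appear before its first use, exactly one trailing NULL is appended to a raw string, comments are not handled, and the output is a single space separated line. The name `expand_m0` is invented for this example.

```python
import re

# A token is a "raw string", a 'raw char string', or a run of non-whitespace.
TOKEN = re.compile(r'''"[^"]*"|'[^']*'|\S+''')


def expand_m0(source: str) -> str:
    """Expand DEFINEs and strings into text that a hex2-style tool could read."""
    tokens = TOKEN.findall(source)
    defines = {}
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "DEFINE":
            # DEFINE NAME HEX_CHARACTERS
            defines[tokens[i + 1]] = tokens[i + 2]
            i += 3
            continue
        if tok.startswith('"'):
            # Raw string: emit its bytes as hex plus one trailing NULL.
            data = tok[1:-1].encode("ascii") + b"\x00"
            out.append(data.hex().upper())
        elif tok.startswith("'"):
            # Raw char string: pass the contents through untouched.
            out.append(tok[1:-1])
        else:
            # Defined names are replaced; labels and pointers pass through.
            out.append(defines.get(tok, tok))
        i += 1
    return " ".join(out)


if __name__ == "__main__":
    print(expand_m0('DEFINE SYSCALL 0F05\n:start SYSCALL @start "Hi"'))
```

Tokens such as :label and @label are left alone on purpose, since resolving them is the job of the Hex2 stage.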

stage2

The stage2 is dependent on the availability of text source files and at least a functional macro assembler and can be used to build operating systems or other “Bootstrap” functionality that might be required to enable functional binaries; such as programs that set execute bits or generate dwarf stubs.

FORTH

Because a great many people stated FORTH would be an ideal bootstrapping language, the time and effort was put forth by Caleb and Jeremiah to provide a framework for those people to contribute immediately; thus the FORTH was born.

Several efforts were made to make the FORTH more standard, but ultimately it was determined that Assembly was preferable, as the underlying architecture wasn’t total garbage.

It now sits waiting for any FORTH programmer who wishes to prove FORTH is a real bootstrapping language.

Lisp

The next recommendation in bootstrapping was Lisp, so efforts were taken to design the most minimal Lisp with all of the functionality described in the original Lisp papers. The task was completed relatively quickly compared to the FORTH and even had enhancements such as a compacting garbage collector.

Ultimately it was found that the Lisp that many rave about isn’t entirely compatible with modern Lisps or Schemes; thus it was shelved for any Lisper who wishes to pick it up.

C

After being told for months that there is no way to write a proper C compiler in assembly, and after months of research that found no human written C compilers in assembly, Jeremiah decided to prove the point. The first C compiler on the bootstrap would actually be a cross-compiler for x86, such that everyone would be able to verify it did exactly what it was supposed to and see it self-host its C version.

stage3

The stage3 is dependent on the availability of text source files and at least a functional M2-Planet level C compiler, FORTH and a Minimal Garbage collecting Lisp and can be used to build more advanced tools that can be used in bootstrapping whole operating systems with modern tool stacks.

initial_library

A library collection of very useful FORTH functionality designed to make the lives of any FORTH programmer easier.

It now sits waiting for any FORTH programmer who wishes to build upon it.

ascension

A library collection of useful Lisp functionality designed to make the lives of any Lisp programmer easier.

As it depends on an archaic Lisp dialect, it will likely need to be replaced should the Lisp be properly fixed.

blood-elf_x86

The x86 program for a dwarf stub generator used in mescc-tools bootstrapping, specifically in mescc-tools-seed generation, which can be used to build M2-Planet and thus complete the circle.

get_machine_x86

The trivial x86 program that allows one to skip tests or scripts that will not run on that specific platform or run alternative commands depending upon the architecture.

hex2_linker_x86

The program that allows one to build the hex2 programs for any hardware platform on x86 and thus verify software builds for hardware one does not even have.

M1-macro_x86

The program that allows one to build the M1 program for any hardware platform on x86 and thus verify software builds for hardware one does not even have.

M2-Planet_x86

The x86 port of the M2-Planet C compiler v1.0 used as one of the paths in bootstrapping M2-Planet on x86 hardware.

Inspirations

This work wouldn’t have come so far without the inspirational work of others. They are listed in alphabetical order of the authors’ last names.

  • GRIMLEY EVANS, Edmund - bcompiler [http://homepage.ntlworld.com/edmund.grimley-evans/bcompiler.html] :: The inspiration for hex0, hex1 and hex2
  • GRIMLEY EVANS, Edmund - cc500 [http://homepage.ntlworld.com/edmund.grimley-evans/cc500] :: The inspiration for M2-Planet
  • Jones, Richard W.M. - jonesforth [http://git.annexia.org/?p=jonesforth.git] :: The inspiration for stage2 FORTH and initial_library
  • Piner, Steve and Deutsch, L. Peter - Expensive Typewriter [http://archive.computerhistory.org/resources/text/DEC/pdp-1/DEC.pdp_1.1972.102650079.pdf] :: The inspiration for SET
  • kragensitaker - The Monitor [https://old.reddit.com/r/programming/comments/9x15g/programming_thought_experiment_stuck_in_a_room/c0ewj2c/] :: The inspiration for the hex0-monitor

stage0's People

Contributors

0xflotus, banyc, bmwiedemann, dgpv, exonorid, fgeorgatos, janneke, manzonigiuseppe, markjenkins, no-identd, oriansj, siraben, stikonas


stage0's Issues

Forth

No Forth interpreter or compiler has been implemented yet

stage0 and the c++ abomination

More and more people are starting to realize that we need "stage0s" and alternative C compilers (I mean C89 with benign bits of C99/C11)... but the fact that g++ switched to C++98 instead of being implemented in a simple dialect of C is probably one of the biggest mistakes in software, ever. Instead of having only a few steps to bootstrap the gcc compilers, you will need 1 billion steps in order to reach a gcc recent enough to fit the tantrums of many recent pieces of software:
From the canonical gcc 4.7.4, how many gccs do I have to compile to reach 11.2?

segfault

it does not run here. Debian10. amd64. 8-(
latest checkout: 48f098c

user@box:~/software/stage0$ make clean
rm -f libvm.so libvm-production.so bin/vm bin/vm-production
rm -rf test/stage0_test_scratch

user@box:~/software/stage0$ make all
cc -ggdb -DVM32=true -Dtty_lib=true -shared -Wl,-soname,libvm.so -o libvm.so -fPIC wrapper.c vm_instructions.c vm_decode.c tty.c
cc -ggdb -DVM32=true -Dtty_lib=true vm.c vm_instructions.c vm_decode.c tty.c -o bin/vm
./bin/vm --rom seed/NATIVE/knight/hex0-seed --tape_01 stage0/stage0_monitor.hex0 --tape_02 roms/stage0_monitor
make: *** [makefile:50: stage0_monitor] Segmentation fault

user@box:~/software/stage0$ file bin/vm
bin/vm: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=b92ea1b5d6d109c76dbe52c8029813d431386cd3, for GNU/Linux 3.2.0, with debug_info, not stripped

Is there anything I can debug further?

Code duplication

Unnecessary code duplication exists between the VM library and the example VM

LISP

No lisp interpreter or compiler has been implemented yet

C

No C compiler has been implemented yet

ascii_comment from stage1_assembler-0.s expects newline prior to EOF

The ascii_comment routine from stage1/stage1_assembler-0.s expects a stream of characters followed by a newline. It doesn't anticipate the possibility of end of stream happening before a final newline.

For example

$ echo -n '#' > tape_01
$ ./bin/vm-minimal roms/stage1_assembler-0

causes an infinite loop as the -1 returned by fgetc never matches the condition of being a newline.

I found this by doing testing with the technique of fuzzing for my knightpies project.

This could be argued to be a non-bug, as always ending in a newline could be considered part of the hex0 file format. For now I'm going to incorporate that assumption into my fuzzing. More graceful handling of this edge case would be ideal, but I recognize that would increase the amount of instructions used.

I figured I'd file a bug report so this could at least be on the record as "won't fix" if that's the right stance to keep the hex and assembler minimal.
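For illustration only, the more graceful handling mentioned above amounts to one extra comparison in the comment loop. The Python sketch below mimics `fgetc` with a hypothetical `read_byte` helper and stops at either a newline or end of stream. It models the idea and is not a patch for the assembly routine.

```python
import io

EOF = -1


def read_byte(stream) -> int:
    """Mimic fgetc: return the next byte value, or -1 at end of stream."""
    data = stream.read(1)
    return data[0] if data else EOF


def skip_comment(stream) -> int:
    """Consume the rest of a line comment; stop at newline or end of stream."""
    while True:
        c = read_byte(stream)
        # Checking only for the newline would loop forever once -1 is returned.
        if c == ord("\n") or c == EOF:
            return c


if __name__ == "__main__":
    tape = io.BytesIO(b" comment without a final newline")
    print(skip_comment(tape))
```

The cost in the real assembler would be at least one additional compare and branch, which is the trade-off the report describes.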

stage0_monitor.s has a similar issue. It's pretty safe to assume an interactive session is going to end with newline prior to end of stream with a normal user. But, if we set that assumption aside then there is an inconsistency worth noting:

$ ./bin/vm-minimal roms/stage0_monitor > output
abc^d^d
$ hexdump -C output
00000000  61 62 63 ff
Computer Program has Halted
After Executing 118 instructions

where ^d is control+d, which I have to press twice to actually exit. The 0xFF is -1 (from fgetc) encoded in 8 bits.

So that at least exits. Whereas:

$ ./bin/vm-minimal roms/stage0_monitor > output
#^d^d^d^d....^d

never exits and continues to fputc the -1 from fgetc indefinitely.
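The 0xFF seen in the hexdump follows from truncating the end of stream value to a single byte. A small Python check of that arithmetic, using a made up helper name:

```python
def to_output_byte(value: int) -> int:
    """Keep only the low 8 bits, as writing a single byte would."""
    return value & 0xFF


if __name__ == "__main__":
    # 'a', 'b', 'c' followed by the -1 that fgetc returns at end of stream
    written = bytes([0x61, 0x62, 0x63, to_output_byte(-1)])
    print(written.hex(" "))
```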

Assembler

Full assembler not yet implemented nor has the bootstrap assembler reached sufficient level of complexity

high level prototype stage1_assembler-1.c has wrong relative offset

The C prototype hex1 encoder/assembler in stage1/High_level_prototypes/stage1_assembler-1.c adds 4 bytes to the relative addresses output by the @ symbol; see lines 42 and 43.

This isn't consistent with stage1/stage1_assembler-1.s, the implementation of vm_instructions.c/vm_decode.c, the checksum in test/SHA256SUMS, and running stage1_assembler-2.hex1 as input to roms/stage1_assembler-1:

$ ln -s stage1/stage1_assembler-2.hex1 tape_01
$ ./bin/vm-minimal roms/stage1_assembler-1
$ sha256sum tape_02
2c02c50958f489a660a4915d2a9e207a0c61f411d42628bdaf4dcf6bf7149a9d  tape_02
$ grep stage1_assembler-2 test/SHA256SUMS
2c02c50958f489a660a4915d2a9e207a0c61f411d42628bdaf4dcf6bf7149a9d  roms/stage1_assembler-2

I discovered this while implementing my own hex1 encoder/assembler in Python (markjenkins/knightpies@bc7b571). First I copied the behavior of stage1_assembler-1.c, and I discovered the output was off by 4 exactly where @ is used. Once I corrected that, I was able to get my Python implementation consistent with the stage0 assembly.

Pull request with a fix to the C prototype coming right up.

EDITOR

The minimal editor hasn't been ported to assembly yet

M0-macro.s NULL pointer dereference in 16 bit mode

In developing knightpies I found issues running M0-macro.s in 16 bit and 64 bit mode. I used stage0/High_level_prototypes/defs plus smaller .s files such as stage_monitor.s as input. In 32 bit mode I was able to process these with 48k of memory.

I suspected the issue was that program memory was being written to. So, similar to the outside_of_world guard, I started implementing inside_of_world to test for write inside of program memory or instruction reads outside of program memory.
https://github.com/markjenkins/knightpies/compare/readonlyprogrammem

This showed null pointer dereference at places such as STORE32 R2 R4 12 (0x2b0 ) in setExpression when combined with stage0 Release_0.4.0 .
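A guard of this kind could look roughly like the Python sketch below. It is a hypothetical model, not the knightpies or stage0 code: the class and method names are invented, and the program is assumed to occupy memory from address 0 up to its own length. A store through a NULL base register with a small offset, such as offset 12, lands inside that region and is rejected.

```python
class GuardedMemory:
    """Memory whose program area is read-only and the only executable area."""

    def __init__(self, program: bytes, size: int):
        if len(program) > size:
            raise ValueError("program does not fit in memory")
        self.data = bytearray(size)
        self.data[:len(program)] = program
        self.program_end = len(program)  # ASSUMPTION: program starts at 0

    def store8(self, address: int, value: int) -> None:
        if not 0 <= address < len(self.data):
            raise MemoryError("write outside of world")
        if address < self.program_end:
            raise MemoryError("write inside of program memory")
        self.data[address] = value & 0xFF

    def fetch_instruction_byte(self, address: int) -> int:
        if not 0 <= address < self.program_end:
            raise MemoryError("instruction read outside of program memory")
        return self.data[address]


if __name__ == "__main__":
    mem = GuardedMemory(bytes(32), 64)
    base_register = 0  # a NULL pointer
    try:
        mem.store8(base_register + 12, 0x2A)
    except MemoryError as err:
        print("caught:", err)
```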

I've found the same result by checking for NULL pointer when STORE32 is called in the stage0 C emulator.
master...markjenkins:M0nullpointerbug

A simple JUMP.Z guard at the start of setExpression can allow M0-macro to run in 16 bit mode, but the output ends up corrupted, and in any event, R13 shouldn't be NULL on the first iteration through Line_Macro_0 .

I'm going to be looking into this. Part of what I'll do is improve my optional program area guard code for inappropriate read and write accesses in both the knightpies and stage0 implementations. At this point I can't yet say if this is an assembler bug or an emulator implementation bug common to knightpies and stage0. I am seeing fewer issues with stage0 Release_0.4.0 than Release_0.2.0.

I'm interested in contributing to 16 bit support. It helps me test knightpies more vigorously and challenges my assumptions about these instructions, especially when I write register size optimized versions of the instructions and compare. I also like the idea of seeing a 16 bit knight machine constructed at some point due to the lower TTL count vs 32 bit. I appreciate this will require a rewrite of M0-macro.s and cc_knight-native.s to support larger inputs, and a low memory rewrite will be more challenging for people to read. In my mind, better to support this with a separate M0-macro-lowmem.s and cc_knight-native_lowmem.s than not at all. That this is possible in a 16bit address space in two passes, with only things like symbol and struct definition tables being there in the first pass, is something that makes C a cool language.

But in any event, M0-macro.s as it is currently written should be able to operate in a 16 bit address space with small inputs seeing how it works fine in 32 bit mode with 48K memory and small inputs.

M0 doesn't work for me

I'm using musl-based, x86_64 VoidLinux, and compiled hex and vm to produce stage0_monitor, stage1_assembler-0, stage1_assembler-1, stage1_assembler-2, and M0 all with the correct SHA256 sums. But then I run into this problem:

cat ../stage0/High_level_prototypes/defs ../stage0/stage2/lisp.s > lisp.S
./vm --rom ./M0 --tape_01 ./lisp.S --tape_02 lisp.hex2 --memory 1M

yields

Invalid instruction was recieved at address:00000052
After 207093 instructions
Unable to execute the following instruction:
        31100100

No experience at all in assembly or any low-level programming, so I'm not sure if I'm doing something wrong or if it's a bug. Though if you're willing to guide me I'm happy to explore. Really impressed by this project, so I hope this is helpful.
