
Comments (14)

akkartik commented on May 24, 2024

I'm starting to grow less excited about this whole thread. For multiple reasons:

a) The new stack syntax adds a new gotcha to compensate for the gotcha it protects us from. You have to make sure you never exit except through the }. Otherwise the stack gets mismatched.

b) It seems to increase the reader's burden to have an additional 'language' that code in the repo may be written in. The alternative would be to treat the new syntax sugar as part of core SubX, support it in the C++ version, rewrite all our SubX code to use it, and treat any new phases as part of the core. That seems like a lot of work for unclear benefit, since the amount of progress we've made is a sort of existence proof that maybe SubX without the extra sugar isn't so bad after all.

c) Rather than attack gotchas one by one, we should just start on a new language. A memory-safe statement-oriented language implemented in SubX where each statement maps to a single x86 instruction.

In other words, I'd rather this be the next syntax:

var x : slice
...

than this:

{ 0 0 ->%ecx
...
}
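
The gotcha in (a) can be illustrated with a toy simulation. This is only a sketch: the stack is modeled as a Python list, and `run`, `RETURN_ADDRESS` and its value are hypothetical names made up for the illustration, not anything from SubX or Mu.

```python
def run(early_exit: bool) -> int:
    """Return the value that the final c3/return would pop as its return address."""
    RETURN_ADDRESS = 0x1234  # hypothetical address left on the stack by the caller
    stack = [RETURN_ADDRESS]  # last element is the top of the stack

    # '{ 0 0' pushes two words, e.g. for a local slice.
    depth_at_open = len(stack)
    stack.append(0)
    stack.append(0)

    if not early_exit:
        # '}' restores the stack to the depth it had at the matching '{'.
        del stack[depth_at_open:]

    # c3/return pops whatever is on top and jumps to it.
    return stack.pop()
```

Leaving through the `}` returns to the caller (`run(False)` gives back `RETURN_ADDRESS`), while jumping out early makes the return pop one of the local words instead.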

from mu.

akkartik commented on May 24, 2024

Ok, I have a broader proposal: create syntax not for function calls but for stack management.

Stack management is a crucial part of the book-keeping involved in Assembly programming, and it would be great if explicit push instructions become code smells if not utterly disallowed.

Currently we use push for three kinds of things:

a) Defining local variables (which we then must remember to clean up before c3/return, because otherwise we lose our return address)
b) Calling functions with arguments (which we must then remember to clean up after the callee returns)
c) Spilling registers to be reused later.

Here's an example syntax to support all 3: create a rudimentary stack-based language for lines beginning with some special token, say {. Such lines can have two kinds of expressions:

  1. push something (rm32, imm32) to the stack
  2. save the value of ESP to some rm32

Later a line with a } would restore the stack to the same level as before the corresponding {.

For example:

{ 0 0 ->%ecx
...
}

This is equivalent to:

68/push 0/imm32
68/push 0/imm32
89/copy %ecx 4/ESP/r32
...
81 0/subop/add %esp, 8

Which is basically what you need to define a local variable (say a slice).
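
A minimal sketch of how such a translation phase might work, written in Python only for illustration. The function name `desugar` is hypothetical, the output reuses the instruction spellings from this comment, and pushes of registers are left as pseudocode (`push %reg`). It assumes every word is a single whitespace-free token and that lines carry no trailing comments.

```python
WORD = 4  # bytes per 32-bit stack slot


def desugar(lines):
    """Expand '{ ...' and '}' lines; pass every other line through unchanged."""
    out = []
    open_blocks = []  # bytes pushed by each '{' that has not been closed yet
    for line in lines:
        tokens = line.split()
        if tokens and tokens[0] == "{":
            pushed = 0
            for tok in tokens[1:]:
                if tok.startswith("->"):
                    # save the current value of ESP to the given rm32
                    out.append(f"89/copy {tok[2:]} 4/ESP/r32")
                elif tok.startswith("%"):
                    out.append(f"push {tok}")  # pseudocode for a register push
                    pushed += WORD
                else:
                    out.append(f"68/push {tok}/imm32")
                    pushed += WORD
            open_blocks.append(pushed)
        elif tokens == ["}"]:
            if not open_blocks:
                raise ValueError("'}' without a matching '{'")
            # restore the stack to its level before the matching '{'
            out.append(f"81 0/subop/add %esp, {open_blocks.pop()}")
        else:
            out.append(line)
    if open_blocks:
        raise ValueError("'{' without a matching '}'")
    return out
```

Fed the three lines of the local-variable example, this produces the same five lines as the hand expansion. Because the cleanup constant is computed from the number of pushes, it cannot drift out of sync with them.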

A function call:

{ %ecx "foo" %edx
e8/call foo/disp32
}

which is equivalent to the pseudocode:

push %ecx
push "foo"/imm32
push %edx
e8/call foo/disp32
81 0/subop/add %esp, 12
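
Each push moves ESP by 4 bytes, so the cleanup constant has to be 4 times the number of pushed words; three pushes need 12. Below is a rough sketch of a checker for hand-written sequences like this one. It is written in Python for illustration, `leftover_bytes` is a hypothetical helper, and it assumes the callee does not pop its own arguments.

```python
WORD = 4  # bytes per 32-bit stack slot


def leftover_bytes(instructions):
    """Net bytes left on the stack by the sequence; 0 means it is balanced."""
    net = 0
    for ins in instructions:
        words = ins.replace(",", " ").split()
        if not words:
            continue
        if words[0] in ("push", "68/push"):
            net += WORD
        elif words[:3] == ["81", "0/subop/add", "%esp"]:
            net -= int(words[3], 0)  # accepts decimal or 0x-prefixed hex
    return net
```

For the call above, a cleanup of 12 leaves nothing behind, whereas a cleanup of 8 would leave one word (4 bytes) on the stack after every call.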

akkartik commented on May 24, 2024

Hmm, this syntax is interesting, but it makes the original tailor-exit-descriptor scenario pretty terse and awfully hard to spot. For example, assuming the address of the exit descriptor is currently in ECX:

# call f(x, y, ed, z) that may call stop(ed) at some point
{ z ed y x ->*ed  # last word tailors 'ed'
e8/call f/disp32
}

The only difference between a local variable and tailoring is that % turns into *.

charles-l commented on May 24, 2024

> create a rudimentary stack-based language for lines beginning with some special token, say {

I really like this idea. Keeping the stack balanced is error-prone, and this gives us more control over function calls than the normal function call syntax (i.e. whether we use disp32/disp8).

In terms of syntax, I think being able to reference the stack frame/scope by name would be handy:

{|stack1| 1 2 3 {|stack2| 3 *(stack1+4) 5   call blah}}
{|stack1| 1 (stack1+12) call some-func-that-uses-ed}
{|stack1| 1 (stack1 + stack1.retaddr) call some-function-that-uses-ed} # if we calculate the return address for every stack frame, just pass the return addr

I think the stack location can then be computed from the lexical location in the file (i.e. we know how many words are on the stack, so we can determine statically what the offset is to the address).
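
A minimal sketch of that static computation, in Python for illustration. The tokenizer, the name `frame_offsets`, and the rule that `call` and its target are not pushed are all assumptions made for this sketch; the syntax itself is only the proposal above.

```python
import re

WORD = 4  # bytes per 32-bit stack slot
TOKEN = re.compile(r"\{\|[\w-]+\||\}|[^\s{}]+")


def frame_offsets(source):
    """Map each frame name to (bytes already pushed when it opens, words it pushes)."""
    depth = 0          # bytes pushed so far by all open frames
    open_frames = []   # stack of [name, depth at open, own word count]
    result = {}
    tokens = TOKEN.findall(source)
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("{|"):
            open_frames.append([tok[2:-1], depth, 0])
        elif tok == "}":
            name, start, count = open_frames.pop()
            result[name] = (start, count)
            depth = start  # closing a frame releases its words
        elif tok == "call" or tok.endswith("/call"):
            i += 1         # skip the call target; neither token is pushed
        else:
            if not open_frames:
                raise ValueError(f"word {tok!r} outside any frame")
            open_frames[-1][2] += 1
            depth += WORD
        i += 1
    return result
```

For the first example line, this reports that `stack2` opens 12 bytes below the start of `stack1` and that each frame pushes 3 words of its own. Everything is derived from token positions alone, with nothing stored at run time.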

akkartik commented on May 24, 2024

That is interesting, but where would these stack1 variables be stored? This may be harder than it seems at first glance.

I like how you've put the entire {...} on a single line. If it's not too hard I'd like to provide that single-line alternative. But we still need to support multiple lines between the {...}.

charles-l commented on May 24, 2024

I was imagining something like this:

{|stack1| 1 2 3 {|a-call| %ebx *(stack1-4) e8/call somefunc}}

=>
# Stack #
| 0x01          | <- stack1 is a pointer to this
| 0x02          |
| 0x03          | 
| %ebx val      | <- a-call points here
| *((ebp+32)-4) | # we know this is ebp-32 since there are only 4 words on the stack at this point in the program

Essentially it's just a label for the stack, which I think works(?).

akkartik commented on May 24, 2024

Yeah, mostly makes sense. The question in my mind is: where is stack1 allocated? Is it a purely translation-time variable? Your first example also seemed to use it in complex ways like stack1 + stack1.retaddr.

charles-l commented on May 24, 2024

> Is it a purely translation-time variable?

Yeah. That's what I was thinking.

stack1.size or stack1.len is probably a better name. It just becomes the length of the stack (which can be evaluated at compile time). This of course doesn't work with vararg functions and dynamic stack manipulation, but I feel like those are less common cases.

If a dynamic stack is required, I think something like this would work:

{|stack1| %esp 1 2 3 *dynamic pushes* ...
# length is required now
%esp - (*stack1) # calculate dynamically by subtracting the old esp with the current esp
}
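
A toy model of the dynamic case, in Python for illustration. The `Stack` class, its starting address and `dynamic_frame_bytes` are hypothetical. Since the x86 stack grows downwards, the sketch subtracts the current esp from the saved one so that the length comes out positive.

```python
WORD = 4  # bytes per 32-bit stack slot


class Stack:
    """Toy downward-growing stack: each push lowers esp by one word."""

    def __init__(self, top=0x1000):  # hypothetical starting address
        self.esp = top
        self.memory = {}

    def push(self, value):
        self.esp -= WORD
        self.memory[self.esp] = value


def dynamic_frame_bytes(stack, values):
    """Open a frame whose first word is the old esp, push values, return its size."""
    stack.push(stack.esp)  # '%esp' as the first word: saves esp as it was at '{'
    stack1 = stack.esp     # the frame label points at that saved word
    for value in values:   # stands in for the dynamic pushes
        stack.push(value)
    old_esp = stack.memory[stack1]
    frame_bytes = old_esp - stack.esp  # length known only at run time
    stack.esp = old_esp    # '}' restores the stack
    return frame_bytes
```

With five dynamically pushed values, the frame holds six words (the saved esp plus the five values), so the computed length is 24 bytes and esp ends up back where it started.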

akkartik commented on May 24, 2024

Ok, I see.

It looks like your examples have expressions on multiple lines? Maybe this is a completely new language rather than just sugar?

charles-l commented on May 24, 2024

Yeah, I was looking at the syntax I came up with the other day and I realized that it attempts to solve two problems: memory labeling (poorly) and stack balancing.

With the new approach are you thinking it’ll do stack balancing automatically since it should be memory safe?

akkartik commented on May 24, 2024

You know, that's a good question. It's my top priority, and I think I'll have to violate my "1 instruction per line" design constraint to achieve it.

But yes, that's the plan.

akkartik commented on May 24, 2024

Today, though, I'm enamored with the idea of a tiny Lisp interpreter. It's not going to be the final goal, but it would just be so cool to be able to type commands at init. And it should be fairly quick. We're due for some fun.

Any fun little projects you want to try?

charles-l commented on May 24, 2024

> Today, though, I'm enamored with the idea of a tiny Lisp interpreter. It's not going to be the final goal, but it would just be so cool to be able to type commands at init. And it should be fairly quick. We're due for some fun.

That definitely could be fun. I've not implemented Lisp in asm before (though I guess it has been done before, which might be a handy reference).

I've been interested in Forth recently (particularly because of how simple it is to implement in asm, and because it allows mixing asm code with interpreted code). I started implementing a Forth in nasm a few months back (https://git.sr.ht/~nch/onward/tree/master/onward.s), but never quite finished it. Now might be the time for me to port it to SubX and finish it off :)

akkartik commented on May 24, 2024

Excellent idea!
