robertbendun / stacky

Stack based programming language

License: Boost Software License 1.0

Makefile 1.53% Shell 2.54% C++ 93.43% Vim Script 2.51%
programming-language compiler cpp concatenative-language stack-based-language

stacky's Introduction

stacky

WIP stack-based compiled concatenative programming language.

Development is currently postponed due to other projects (and university) with higher priority.

It's somewhere between the B and C programming languages, in a universe where B was a stack-based concatenative language.

Features

  • Turing completeness
  • Optional type checking
  • Low level OS access ability

Example

hello, world

# hello.stacky
"io" import

"hello, world" puts nl
$ stacky run hello.stacky
hello, world

puts current date

# date.stacky
"io"   import
"time" import

"Date is " puts

now  # returns seconds in Unix time
date # returns year, month, day, hours, minutes
	   putu "-" puts
	2 aputu "-" puts
	2 aputu " " puts
	2 aputu ":" puts
	2 aputu nl
$ stacky run date.stacky
Date is 2021-10-17 23:13

Language reference

Control flow

if ... else ... end

Consumes the top of the stack. If the value is nonzero, executes the if branch, otherwise the else branch.

"io" import
10 if "value is 10!" else "value is not 10 :c" end puts nl

while ... do ... end

While the condition part (between while and do) is nonzero, repeatedly executes the loop part (between do and end).

# puts numbers from 0 to 10
"io" import

0 while dup 11 = ! do
	dup .
	1 +
end

Functions

<name> fun <code> end creates a function with the given name and code block.

"io" import

say-hello fun "Hi, " puts puts "!\n" puts end

"Mark" say-hello # prints "Hi, Mark!"

Address of functions

&<name> pushes the address of <name> onto the stack, for example: &foo

Call operator

call calls the function whose address is on top of the stack.

"Mark" &say-hello call

Standard library

algorithm

  • uniform32 - (a b -- n) returns random integer n in range [a, b]

io

  • nl - prints a newline to stdout
  • puts - (pointer --) - prints a null-terminated string to stdout
  • . - (uint --) - prints an unsigned integer to stdout with a newline at the end

limits

  • Max_Digits10 - (-- uint) - returns maximum number of decimal digits that unsigned integer may have

time

  • now - (-- seconds) - returns the number of seconds since the Unix epoch

  • sleep - (seconds --) - sleeps given number of seconds

  • date - (seconds -- year month day hours minutes) - decomposes seconds since the Unix epoch into human-readable form

  • +minutes - (minutes seconds -- seconds) - adds minutes to seconds

  • +hours - (hours seconds -- seconds) - adds hours to seconds

  • +days - (days seconds -- seconds) - adds days to seconds

  • +weeks - (weeks seconds -- seconds) - adds weeks to seconds

  • -minutes - (minutes seconds -- seconds) - subtracts minutes from seconds

  • -hours - (hours seconds -- seconds) - subtracts hours from seconds

  • -days - (days seconds -- seconds) - subtracts days from seconds

  • -weeks - (weeks seconds -- seconds) - subtracts weeks from seconds
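
As an illustration of what date has to compute, here is a minimal C++ sketch of the decomposition. It is not taken from Stacky's sources: the names Date and decompose are made up for this example, it works in UTC only, and it uses the commonly used era-based days-to-civil-date conversion for the Gregorian calendar.

```cpp
#include <cstdint>

// Hypothetical result type; field names are illustrative, not from Stacky.
struct Date {
	std::uint64_t year, month, day, hours, minutes;
};

// Splits seconds since 1970-01-01 00:00 UTC into calendar fields.
inline Date decompose(std::uint64_t seconds)
{
	std::uint64_t const days = seconds / 86400;
	std::uint64_t const rem  = seconds % 86400;

	std::uint64_t const z   = days + 719468;       // days since 0000-03-01
	std::uint64_t const era = z / 146097;          // number of 400-year cycles
	std::uint64_t const doe = z % 146097;          // day within the cycle
	std::uint64_t const yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
	std::uint64_t const doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
	std::uint64_t const mp  = (5 * doy + 2) / 153; // month index with March = 0
	std::uint64_t const day   = doy - (153 * mp + 2) / 5 + 1;
	std::uint64_t const month = mp < 10 ? mp + 3 : mp - 9;
	// January and February belong to the next civil year in this scheme
	std::uint64_t const year  = yoe + era * 400 + (month <= 2 ? 1 : 0);

	return { year, month, day, rem / 3600, rem % 3600 / 60 };
}
```

For example, decompose(1634512380) yields year 2021, month 10, day 17, hours 23, minutes 13.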

posix

Contains definitions of constants related to POSIX-compatible operating systems.

  • SYS_<syscall> - constants holding syscall numbers, like SYS_exit = 60
  • stdin, stdout, stderr
  • CLOCK_<type> - definitions of clock constants for the clock_gettime syscall

Makefile

  • make install-nvim - installs Stacky's syntax highlighting for Neovim
  • make stacky - makes only compiler
  • make test - runs all tests
  • make clean - cleans all intermediate files

Requirements

For Arch-based users:

pacman -S gcc binutils nasm boost

See also

  • Tsoding Porth (main inspiration for starting this project)
  • Classic stack based language Forth
  • Modern stack based functional language Factor


stacky's Issues

`static` keyword and compile time code execution

The idea is that static would enable compile time code execution in Stacky. This will enable code generation, conditional code inclusion etc. Useful when starting to support other platforms - same library file will support both x86 and ARM.

  • static if ... end
  • static while ... do ... end
  • static ... end

16, 32 and 64 bit memory declarations

Currently supported: []byte

  • Rename []byte to []uint8?
  • Add []uint16, []uint32, []uint64
  • Add pointer sized arrays and declarations isize (in C ptrdiff_t) and usize (in C size_t)

Should follow convention in #28

Dead code elimination: loops, returns and conditions

For infinite loops and return statements, all code between the loop / return and the nearest following jump may be deleted.

If an if condition is always true, the else branch may be removed; similarly, if it is always false, the then branch may be removed.

Output binary file padding directive

A boot sector has to have a specific size, so a padding directive is needed, e.g. <desired-size> define-padding.

512 define-padding

would be equivalent to this NASM line:

times 512 - ($-$$) db 0
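
If the compiler ever emits the binary itself rather than delegating to NASM, the directive boils down to zero-filling the output image. A minimal sketch, assuming the image is held in a byte vector; pad_to is a hypothetical helper name, not part of Stacky:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Pads `image` with zero bytes up to `desired_size`, mirroring
// NASM's `times N - ($-$$) db 0`. An image that is already larger
// than the requested size is reported as an error.
inline void pad_to(std::vector<std::uint8_t> &image, std::size_t desired_size)
{
	if (image.size() > desired_size)
		throw std::length_error("image exceeds requested padding size");
	image.resize(desired_size, 0);
}
```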

Basic type system

The program is evaluated at compile time. Evaluation starts with an empty type stack. All operations consume and/or produce values and place their types onto the type stack. The following are illegal:

  • ending the program with types left on the type stack
  • trying to pop a value from an empty type stack
  • violating the rules for control flow
  • passing types that do not match those required by an operation

Operations can have different meanings and results depending on the types that are on the stack. Stack manipulation operations (like drop or swap) are supported by all types.
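
A minimal C++ sketch of the straight-line part of this check follows. It is an assumption about how such a checker could look, not the planned implementation: Op and typecheck are hypothetical names, every operation is described only by the types it consumes and produces, and the control flow rules are left out.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

enum class Type { Bool, U64, Ptr };

// Hypothetical description of an operation: types it consumes
// (top of stack last) and types it produces.
struct Op {
	std::vector<Type> in, out;
};

// Returns an error message, or std::nullopt when the program type checks.
inline std::optional<std::string> typecheck(std::vector<Op> const &program)
{
	std::vector<Type> stack; // evaluation starts with an empty type stack
	for (auto const &op : program) {
		if (stack.size() < op.in.size())
			return "pop from empty type stack";
		// compare the expected inputs against the top of the stack
		std::size_t const base = stack.size() - op.in.size();
		for (std::size_t i = 0; i < op.in.size(); ++i)
			if (stack[base + i] != op.in[i])
				return "type mismatch";
		stack.resize(base);
		stack.insert(stack.end(), op.out.begin(), op.out.end());
	}
	if (!stack.empty())
		return "types left on type stack at end of program";
	return std::nullopt;
}
```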

Types

Type system should support following types:

  • bool (result of comparison operations)
  • u64 (integer literals, syscall result)
  • ptr (string literals, memory declarations, address of function)

Casting syntax is as(bool), as(ptr) ...

Control flow

if blocks

Value consumed by if for determining which branch to execute known as condition should be bool.

  • with else, data stack types must match at the end of both branches: typeof(then_branch) == typeof(else_branch). The condition and its type are consumed
  • without else data stack must match stack before if with condition value consumed.
    For program <code> <condition> if <then_branch> end must hold typeof(code) == typeof(then_branch)

while blocks

The stack after the condition is consumed in do should match the stack before while. The stack after the loop body executes should also match the stack before while.

functions

On each return (including the function's end) the state of the stack should be the same. If return is embedded in a while or if block, it changes how that block is checked:

  • An if branch that ends with return does not have to produce the same stack as the other branch or the code before it.
  • Inside while, the stack at return does not have to match the stack before while.

Operations and their types

Operations on u64

Basically almost all. Excluded are:

  • or, and, call
  • store and load only accepts ptr as address
  • do and if requires bool as conditions

Operations on ptr

  • ptr - ptr -> u64 - (assumes first ptr is the bigger one, otherwise we are in trouble)
  • ptr - u64 -> ptr
  • ptr + u64 -> ptr and u64 + ptr -> ptr
  • equality with ptr and u64
  • casting to bool meaning ptr 0 !=
  • ptr load<n> -> u64
  • u64 ptr store<n> -> void
  • syscall<n> should support ptr for its data arguments (function wrappers will provide stronger typing for syscalls in the future)
  • ptr call -> ? - such type system cannot express result of call

Related

Different size and "signedness" of types: #46
Function type specification: #47

Move standard library from C++ to Stacky

All definitions below should be implementable in Stacky

stacky/stdlib.cc

Lines 42 to 91 in 4a5e50a

static inline unsigned digits10(std::uint64_t v)
{
	unsigned result = 1;
	for (;;) {
		if (v < 10u) return result;
		if (v < 100u) return result + 1;
		if (v < 1'000u) return result + 2;
		if (v < 10'000u) return result + 3;
		v /= 10'000u;
		result += 4;
	}
}

extern "C" {
void _stacky_print_u64(std::uint64_t v)
{
	char buffer[std::numeric_limits<decltype(v)>::digits10 + 1]{};
	unsigned const len = digits10(v);
	unsigned pos = len - 1;
	while (v >= 10) {
		std::uint64_t const q = v / 10;
		unsigned const r = v % 10;
		buffer[pos--] = '0' + r;
		v = q;
	}
	*buffer = v + '0';
	buffer[len] = '\n';
	posix::syscall(posix::Write, posix::Stdout, (uint64_t)buffer, (uint64_t)len + 1);
}

void _stacky_exit(int exit_code)
{
	posix::syscall(posix::Exit, exit_code);
}

void _stacky_newline()
{
	char nl = '\n';
	posix::syscall(posix::Write, posix::Stdout, (size_t)&nl, 1);
}

void _stacky_print_cstr(char const *cstr)
{
	auto p = cstr;
	for (; *p != '\0'; ++p) {}
	posix::syscall(posix::Write, posix::Stdout, (size_t)cstr, p - cstr);
}
}

Write and read operations other than 8-bit

Add support for 16, 32 and 64 bit writes and reads.

stacky/stacky.cc

Lines 613 to 626 in 880ae23

case Word::Kind::Read8:
	asm_file << " ;; read8\n";
	asm_file << " pop rax\n";
	asm_file << " xor rbx, rbx\n";
	asm_file << " mov bl, [rax]\n";
	asm_file << " push rbx\n";
	break;
case Word::Kind::Write8:
	asm_file << " ;; write8\n";
	asm_file << " pop rbx\n";
	asm_file << " pop rax\n";
	asm_file << " mov [rax], bl\n";
	break;

Strings should be in .rodata

Currently strings are placed in .data, but they should be allocated in .rodata. That way a single copy can be shared by every identical string literal without fear of it being corrupted by a write operation.
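
Sharing one allocation per distinct literal amounts to interning the strings at compile time. A minimal sketch, assuming the compiler refers to strings by assembly labels; String_Pool and the _stacky_string_<n> label scheme are invented for this example:

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical string pool: maps each distinct literal to one label.
class String_Pool {
	std::map<std::string, std::size_t> ids_;
public:
	// Returns the label for `literal`; repeated literals get the same label.
	std::string intern(std::string const &literal)
	{
		// try_emplace leaves the map unchanged when the key already exists
		auto const it = ids_.try_emplace(literal, ids_.size()).first;
		return "_stacky_string_" + std::to_string(it->second);
	}

	// Number of distinct literals that need to be emitted into .rodata.
	std::size_t size() const { return ids_.size(); }
};
```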

Compilation modes

Add compilation modes listed below:

  • build - compile program for 64-bit Linux
  • build-os - compile program for 64-bit mode that is bootable
  • run - same as build but runs program after compilation

arguments parser: Boost Program_Options

Functions explicit typing

Introduces 2 new keywords

  • -- - marks split between input and output stack parameters
  • is - marks end of function type definition

Examples:

add2 fun u64 -- u64 is 2 + end

pswap32 fun ptr ptr -- is ... end

seconds2ymd fun u64 -- u64 u64 u64 is ... end

3dup fun 1 -- 1 1 1 is 2dup dup end

swap fun 1 2 -- 2 1 is ... end

When numbers are used inside of type definition they refer to generic type parameters known at compile time.

2021-11-05: Don't add type variables for now
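
A sketch of how the part between fun and the function body might be split into input and output types. The names Signature and parse_signature are hypothetical, and tokens are simply taken as whitespace-separated words without validating that they name real types.

```cpp
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical parsed form of `<inputs> -- <outputs> is`.
struct Signature {
	std::vector<std::string> inputs, outputs;
};

// Parses a type definition such as "u64 -- u64 is".
// Returns std::nullopt when `--` or `is` is missing, or `--` is repeated.
inline std::optional<Signature> parse_signature(std::string const &text)
{
	std::istringstream tokens(text);
	Signature sig;
	bool seen_split = false;
	for (std::string tok; tokens >> tok; ) {
		if (tok == "is") {
			if (!seen_split) return std::nullopt;
			return sig;
		}
		if (tok == "--") {
			if (seen_split) return std::nullopt;
			seen_split = true;
			continue;
		}
		(seen_split ? sig.outputs : sig.inputs).push_back(tok);
	}
	return std::nullopt; // no terminating `is`
}
```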

More stack manipulation operations

  • dropn where n >= 0
  • drop - drop element from the stack
  • over - ( a b -- a b a )
  • rot - ( a b c -- b c a )
  • tuck - ( x1 x2 -- x2 x1 x2 ) - copy the first (top) stack item below the second stack item
  • 2drop
  • 2swap - ( a b c d -- c d a b )
  • 2over - ( a b c d -- a b c d a b )
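
The stack effects above can be modelled on a plain vector, which is handy for checking the notation. This is a reference model only, not how the compiler would generate code; the function names swap2 and over2 stand in for 2swap and 2over because C++ identifiers cannot start with a digit.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using Stack = std::vector<std::uint64_t>; // back() is the top of the stack

// Every function assumes the stack already holds enough elements.

// ( a b -- a b a )
inline void over(Stack &s) { auto const a = s[s.size() - 2]; s.push_back(a); }

// ( a b c -- b c a )
inline void rot(Stack &s) { std::rotate(s.end() - 3, s.end() - 2, s.end()); }

// ( x1 x2 -- x2 x1 x2 )
inline void tuck(Stack &s) { auto const top = s.back(); s.insert(s.end() - 2, top); }

// ( a b c d -- c d a b )
inline void swap2(Stack &s) { std::rotate(s.end() - 4, s.end() - 2, s.end()); }

// ( a b c d -- a b c d a b )
inline void over2(Stack &s)
{
	auto const a = s[s.size() - 4];
	auto const b = s[s.size() - 3];
	s.push_back(a);
	s.push_back(b);
}

// drops the top n elements; n may be 0
inline void dropn(Stack &s, std::size_t n) { s.resize(s.size() - n); }
```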

Call word and function address

  • &<function_name> - push address of function_name onto stack
  • &fun <code> end - declare function and push its address onto stack (anonymous functions)
  • call - jumps to function that is pointed by top of the stack

Empty string does not parse

"" produces error "Missing terminating " character". std::adjacent_find is not the best choice after all?

Time & date functionality

New words

  • now - pushes number of seconds since 00:00 1970-01-01
  • +minutes, +hours, +days, +months, +years
  • -minutes, -hours, -days, -months, -years
  • date - (seconds -- year month day hour minutes)

Implementation details

#include <cstdint>
#include <iostream>
#include <unistd.h>

// Syscall number of clock_gettime on x86-64 Linux
constexpr unsigned Clock_Gettime = 228;
constexpr unsigned
	Realtime_Clock = 0,
	Monotonic_Clock = 1;

int main()
{
	// The kernel fills in a timespec: seconds first, then nanoseconds
	std::uint64_t buf[2];
	syscall(Clock_Gettime, Realtime_Clock, (std::uint64_t)&buf);
	std::cout << "sec: " << buf[0] << " nsec: "<< buf[1] << '\n';
}

Max, min

Using conditional move instruction.

Generated asm from C for min function:

cmp     rsi, rdi
mov     rax, rdi
cmovbe  rax, rsi
ret
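
For reference, C++ source of the following shape is the kind of input from which optimizing compilers commonly produce the cmp / cmov sequence above on x86-64, although the exact output depends on the compiler and flags. The names min_u64 and max_u64 are illustrative.

```cpp
#include <cstdint>

// Unsigned minimum: returns b when b <= a, otherwise a.
inline std::uint64_t min_u64(std::uint64_t a, std::uint64_t b)
{
	return b <= a ? b : a;
}

// Unsigned maximum, the mirrored comparison.
inline std::uint64_t max_u64(std::uint64_t a, std::uint64_t b)
{
	return b >= a ? b : a;
}
```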

`noreturn` keyword and type

Type related construct. noreturn keyword should behave similar to return in function given types. Useful for typing exit since it cannot return and code like 0 SYS_exit syscall1 drop is silly

Include statement

Includes file directly, like C #include.

  • "stdio.stacky" include-local for inclusion of files relative to source file
  • "stdio" include similar to include-local but includes also files from standard library and tries to append .stacky if file is not found

Better string parsing

Currently string is any sequence within " without any escape sequences parsing. This probably requires change to treatment of sval field in struct Word or offloading parsing to NASM.

NASM support for escape sequences: 3.4.2 Character Strings.

Current broken string support:

stacky/stacky.cc

Lines 206 to 212 in 880ae23

if (ch == '"') {
	word.kind = Word::Kind::String;
	auto str_end = std::find(std::cbegin(file) + i + 1, std::cend(file), '"');
	if (str_end == std::cend(file))
		error(word, "Missing terminating \" character");
	word.sval = { std::cbegin(file) + i, str_end + 1 };
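
If escape handling stays in the Stacky lexer rather than being offloaded to NASM, the search for the closing quote has to skip escaped characters. A minimal sketch of that search; find_string_end is a hypothetical helper, and it only locates the end of the literal without decoding the escapes.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Given file[start] == '"', returns the index of the matching closing
// quote, treating a backslash as escaping the character after it.
// Returns std::nullopt when the literal is not terminated.
inline std::optional<std::size_t> find_string_end(std::string_view file, std::size_t start)
{
	for (std::size_t i = start + 1; i < file.size(); ++i) {
		if (file[i] == '\\') {
			++i; // skip the escaped character as well
			continue;
		}
		if (file[i] == '"')
			return i;
	}
	return std::nullopt;
}
```

With this search an empty literal "" is found as well, since the closing quote may directly follow the opening one.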

Sized string literal

$"<str>" produces stack: ( address(str) len(str) )

Hello world with this syntax:

$"hello, world" 1 1 syscall3

Character literals

  • Add character literals similar to C syntax '<character>'
  • Character literals can accept escape sequences (maybe same solution for this and #4)
  • Character literals can be multibyte: 'ab' = little_endian('a', 'b') = 0x6261. This makes it possible to print 'hello\n\0'. Each character is one byte. The literal is padded with 0 to be 1, 2, 4 or 8 bytes long
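
The packing rule from the last point can be sketched as below. Char_Literal and pack_char_literal are hypothetical names, the input is assumed to be the literal's bytes after escape processing, and rejecting empty or longer-than-8-byte literals is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

struct Char_Literal {
	std::uint64_t value; // bytes packed little-endian, zero padded
	unsigned size;       // 1, 2, 4 or 8 bytes
};

// Packs up to 8 bytes into an integer, first byte in the lowest bits.
inline std::optional<Char_Literal> pack_char_literal(std::string_view bytes)
{
	if (bytes.empty() || bytes.size() > 8)
		return std::nullopt;
	std::uint64_t value = 0;
	for (std::size_t i = 0; i < bytes.size(); ++i)
		value |= std::uint64_t(static_cast<unsigned char>(bytes[i])) << (8 * i);
	// round the length up to the next power of two
	unsigned size = 1;
	while (size < bytes.size())
		size *= 2;
	return Char_Literal{ value, size };
}
```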

Stderr and exit code assertions, and general output compare in test runner

Boost.Process allows for easy stderr and stdout capturing

#include <iterator>
#include <string>
#include <boost/process.hpp>

namespace bp = boost::process;

int main()
{
	bp::ipstream standard_out, standard_err;

	// Run the command with stdout and stderr redirected into pipes
	bp::system("<command>", bp::std_out > standard_out, bp::std_err > standard_err);

	// Read both pipes to the end so the test runner can compare them
	std::string out { std::istreambuf_iterator<char>(standard_out), {} };
	std::string err { std::istreambuf_iterator<char>(standard_err), {} };
}

Sized and (un)signed integer types

Removes ptr

Introduces

  • New types
    • Unsigned: u8, u16, u32, u64, usize
    • Signed: i8, i16, i32, i64, isize
    • Pointers: written with * before type, for example: *u32
  • Typed integer literals (like 1u64, 0xdead_i32)

usize and isize are type aliases for the fixed-width type that matches their bit length. For example isize == i64 <=> sizeof(isize) == sizeof(i64)

Memory declarations produce pointers with the correct type. For Foo 1 []u32 holds typeof(Foo) == *u32

Conversion from a smaller type to a bigger type is implicit, given that both are signed or both are unsigned. (Not sure about this, time will tell)

Pointers

Addition, subtraction, load, store have different meaning according to what pointer is pointing to.
For example: *i32 + u64 is val(*i32) + 4 * val(u64) (same as C)

User defined static arrays

Currently we have predefined heap array. Better would be syntax for compile time array word definition. For example:

1024 buffer define-bytes
buffer 1 poke
buffer 1 + 255 poke
buffer peek buffer 1 + peek + .