S-Assembler v2.0 Release

The Arcadian State from The Course of Empire by Thomas Cole

In these colleges, the professors contrive new rules and methods [...] whereby, as they undertake, one man shall do the work of ten [...]. The only inconvenience is, that none of these projects are brought to perfection; and, in the mean time, the whole country lies miserably waste;

Yes, hello, welcome. Today I am announcing S-Assembler (sasm) version 2.0, codename “The Arcadian State”. Sasm is a self-hosting RISC-V assembler; the bootstrap assembler is written in Rust. You can find the code on GitHub. This release is an almost complete rewrite of the assembler.


Here is an example, a “Hello, world” program:

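;; bring STDOUT, SYS_WRITE and SYS_EXIT into scope
(import (syscalls *))
;; defcon data is placed in .rodata
(defcon message "Hello, world\n")
;; write(stdout, message, message.len)
;; Linux RISC-V syscall ABI: arguments in x10-x12 (a0-a2), syscall number in x17 (a7)
(addi x10 x0 STDOUT)
(la x11 message)
(addi x12 x0 (len message))
(addi x17 x0 SYS_WRITE)
(ecall)
;; exit(0)
(addi x10 x0 0)
(addi x17 x0 SYS_EXIT)
(ecall)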

The major feature of this release is a module system, including the ability to import modules. It was modeled after Rust's module system, although it works differently in practice. Rather than attempt to explain it here, I'll point you to the source code to see it in action. include! statements are still supported.

The other major piece of work in this release, although not a feature as such, was writing a fault-tolerant parser. I am very pleased with the results: I think it produces better error messages than some C compilers do, and the difference from v0.1 is night and day.
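
One common way to make an S-expression parser fault-tolerant is to record an error and keep going instead of aborting: a stray ")" is reported and skipped, and any "(" still open at the end of input is reported and closed implicitly, so a single run can report several errors. The Rust sketch below illustrates that idea only. It is not sasm's parser; the names Node, ParseError and parse are hypothetical, and string literals and block comments are left out for brevity.

```rust
// Hypothetical sketch of error recovery in an S-expression parser.
// Not sasm's actual parser; names and error messages are illustrative only.

#[derive(Debug, PartialEq)]
enum Node {
    Atom(String),
    List(Vec<Node>),
}

#[derive(Debug, PartialEq)]
struct ParseError {
    line: usize,
    message: String,
}

/// Parses all top-level forms, collecting errors instead of stopping at the first one.
/// String literals are not handled (atoms simply end at whitespace or parentheses).
fn parse(src: &str) -> (Vec<Node>, Vec<ParseError>) {
    let mut errors = Vec::new();
    // Stack of open lists; the bottom entry holds the top-level forms.
    // Each entry remembers the line of its '(' for error reporting.
    let mut stack: Vec<(usize, Vec<Node>)> = vec![(0, Vec::new())];
    let mut line = 1;
    let mut chars = src.chars().peekable();

    while let Some(c) = chars.next() {
        match c {
            '\n' => line += 1,
            c if c.is_whitespace() => {}
            ';' => {
                // Line comment: skip up to (but not including) the newline.
                while let Some(&n) = chars.peek() {
                    if n == '\n' {
                        break;
                    }
                    chars.next();
                }
            }
            '(' => stack.push((line, Vec::new())),
            ')' => {
                if stack.len() > 1 {
                    let (_, items) = stack.pop().unwrap();
                    stack.last_mut().unwrap().1.push(Node::List(items));
                } else {
                    // Recovery: report the stray ')' and continue parsing.
                    errors.push(ParseError { line, message: "unmatched ')'".into() });
                }
            }
            _ => {
                let mut atom = c.to_string();
                while let Some(&n) = chars.peek() {
                    if n.is_whitespace() || n == '(' || n == ')' || n == ';' {
                        break;
                    }
                    atom.push(n);
                    chars.next();
                }
                stack.last_mut().unwrap().1.push(Node::Atom(atom));
            }
        }
    }

    // Recovery: close every list still open at end of input, reporting each one.
    while stack.len() > 1 {
        let (open_line, items) = stack.pop().unwrap();
        errors.push(ParseError { line: open_line, message: "unclosed '('".into() });
        stack.last_mut().unwrap().1.push(Node::List(items));
    }
    (stack.pop().unwrap().1, errors)
}

fn main() {
    // Deliberately broken input: an extra ')' on line 1 and an unclosed '(' on line 2.
    let src = "(addi x10 x0 1))\n(ecall\n(addi x17 x0 93)";
    let (forms, errors) = parse(src);
    println!("{} forms", forms.len());
    for e in &errors {
        println!("line {}: {}", e.line, e.message);
    }
}
```

On the broken input in main, both the stray ")" on line 1 and the unclosed "(" on line 2 are reported in a single pass, and parsing still yields two top-level forms.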

Some minor features are:

  • Support for block comments #| ... |# with nesting.
  • Improved integer parsing: integers can now be written in binary, octal, or hexadecimal, and may contain underscores. I followed Rust's conventions, so octal integers use the 0o prefix, which means leading zeros in decimal integers are fine (unlike in C, a leading 0 does not make a number octal).
  • The define statement has been split into three: define for constants, defvar for global variables, and defcon for global constants. This adds support for the ELF .rodata section, which is where defcon definitions are placed. No more "\0\0\0\0\0\0\0\0" for global ints!
  • I also wrote a debugging emulator, sasm-emu, which I've found gives a much better debugging experience than qemu-user plus gdb-multiarch over GDB Remote. It supports breakpoints at a particular line; for more complex breakpoints I simply modify the emulator's source. This keeps the emulator small while letting me write arbitrarily complex breakpoints.
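
Two of the lexer changes listed above, nestable #| ... |# comments and Rust-style integer literals, can be sketched in a few lines of Rust. This is an illustration under stated assumptions, not sasm's lexer: the function names strip_block_comments and parse_int are hypothetical, comment markers inside string literals are not special-cased, and signs are left to the caller.

```rust
// Hypothetical sketch of two lexer features: nestable #| ... |# block comments
// and Rust-style integer literals. Not sasm's actual implementation.

/// Removes #| ... |# block comments, honouring nesting.
/// Returns None if a comment is never closed.
fn strip_block_comments(src: &str) -> Option<String> {
    let mut out = String::new();
    let mut depth = 0usize; // current comment nesting depth
    let mut rest = src;
    while let Some(c) = rest.chars().next() {
        if rest.starts_with("#|") {
            depth += 1;
            rest = &rest[2..];
        } else if depth > 0 && rest.starts_with("|#") {
            depth -= 1;
            rest = &rest[2..];
        } else {
            if depth == 0 {
                out.push(c); // keep text that is outside every comment
            }
            rest = &rest[c.len_utf8()..];
        }
    }
    (depth == 0).then_some(out)
}

/// Parses an unsigned integer literal with an optional 0x/0o/0b prefix and
/// '_' separators. No prefix means decimal, even with a leading zero.
/// Values that overflow i64 are rejected.
fn parse_int(lit: &str) -> Option<i64> {
    // Literals must start with a decimal digit, so "_1" is not a number.
    if !lit.starts_with(|c: char| c.is_ascii_digit()) {
        return None;
    }
    let (radix, digits): (u32, &str) = if let Some(d) = lit.strip_prefix("0x") {
        (16, d)
    } else if let Some(d) = lit.strip_prefix("0o") {
        (8, d)
    } else if let Some(d) = lit.strip_prefix("0b") {
        (2, d)
    } else {
        (10, lit)
    };
    // Underscores are separators only; drop them before converting.
    let cleaned: String = digits.chars().filter(|&c| c != '_').collect();
    // Reject empty bodies and non-digits (from_str_radix would accept a sign).
    if cleaned.is_empty() || !cleaned.chars().all(|c| c.is_digit(radix)) {
        return None;
    }
    i64::from_str_radix(&cleaned, radix).ok()
}

fn main() {
    let src = "(addi x10 x0 0x_ff) #| outer #| inner |# still outer |#";
    println!("{:?}", strip_block_comments(src));
    for lit in ["1_000", "0x_ff", "0o17", "0b1010_1010", "017", "0o8"] {
        println!("{lit} -> {:?}", parse_int(lit));
    }
}
```

Because only the 0o prefix selects octal, 017 parses as decimal 17, while 0o8 is rejected since 8 is not an octal digit.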

For version 3.0 I will be adding a macro system in the form of a Scheme interpreter.


F.A.Q.

  • Q. Why would you do that?

    A. Fun things are fun.

  • Q. How do I use this?

    A. Download the repository and make sure that Rust is installed on your system. Build the bootstrap version by running cargo b -r (short for cargo build --release) in the bootstrap directory, then run ../bootstrap/sasm/target/release/sasm sasm.sasm sasm in the sasm directory to build the self-hosting version. The result should run under Linux on RISC-V machines, although I don't have one to test on. On AMD64 Linux, install qemu-user and you can run the RISC-V binaries as if they were native executables.


  • Q. How do the versions compare?

    A. They are functionally equivalent, and both are self-contained: the bootstrap assembler has no dependencies beyond the Rust standard library. On AMD64 the Rust version will be much faster; I'm not sure how they would compare on RISC-V.


  • Q. How big is this?

    A. The bootstrap version is 2421 SLOC and produces a 471 KB executable (not including libc). The self-hosting version is 7062 SLOC and produces a 29 KB executable. Compared with version 0.1, the assembly version has more than tripled in both SLOC and executable size: proper data structures and, especially, error handling require a lot of code! Despite all of the extra features, the bootstrap didn't grow much. This is because version 0.1 was complete crap and I did a complete rewrite for version 2.0.


  • Q. How is the performance?

    A. To build the self-hosting assembler, the bootstrap executes 34M instructions and takes 4.1 ms on my machine. The self-hosting version, building itself under qemu-user on the same machine, executes 26M instructions, takes 19.1 ms, and needs a maximum heap size of 3.5 MB. I was pretty surprised by how many more instructions the bootstrap required, especially since a single AMD64 instruction can do much more than a RISC-V one.
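
As a back-of-the-envelope check, the figures above work out to the following rates. This is derived only from the numbers quoted; the bootstrap count is of AMD64 instructions and the self-hosting count is presumably of RISC-V guest instructions under qemu-user, so the two rates do not measure like-for-like work.

```latex
\begin{aligned}
\text{bootstrap (native AMD64):}\quad & \frac{34\times10^{6}\ \text{instr}}{4.1\times10^{-3}\ \text{s}} \approx 8.3\times10^{9}\ \text{instr/s} \\
\text{self-hosting (qemu-user):}\quad & \frac{26\times10^{6}\ \text{instr}}{19.1\times10^{-3}\ \text{s}} \approx 1.4\times10^{9}\ \text{instr/s} \\
\text{ratios:}\quad & \frac{34}{26} \approx 1.31\ \text{(instructions)}, \qquad \frac{19.1}{4.1} \approx 4.7\ \text{(wall-clock time)}
\end{aligned}
```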