ACCU Home page ACCU Conference Page
Search Contact us ACCU at Flickr ACCU at GitHib ACCU at Facebook ACCU at Linked-in ACCU at Twitter Skip Navigation

pinA Small Universe Part II

Overload Journal #149 - February 2019 + Design of applications and programs   Author: Deák Ferenc
Writing a programming language is a hot topic. Deák Ferenc shows us how he wrote a compiler for bytecode callable from C++.

[Web Editor's Note: Article is continued from here.]

The output of the compiler

The compiled application has a very simple format. The bytecode starts with .P10 representing the version of the script, 4 bytes follow as reserved, 4 more bytes representing the number of global variables and 4 more bytes to indicate the start of the string table of the application (or, depending on how we view it, the size of the compiled bytecode).

The constant string table, which is populated by the compiler upon encountering a string in the source, is to be found in the generated code after the application bytes. The string representation in this part always starts with an 8-bit value, representing the length of the string followed by the ASCII characters themselves.

This output is then read and interpreted by the virtual machine, which sets up its structures and starts executing the application code from the first application bytecode.

Adding a new keyword to the language

In its current incarnation, the script is quite … primitive, and lacks support for most of the keywords and provided functionality we are used to from other scripting languages. But extending the script is actually quite easy. When introducing a new keyword, we will need to provide the parsing implementation and the compilation ourselves. Let’s consider implementing the while keyword as an exercise.

We can get some inspiration from the if keyword, since the difference between while and if is just a jump back to testing again the trueness of an expression, so let’s start and declare the header file for while in the keywords directory of the compiler, with the content in Listing 5 (I have omitted things like include guard and base class includes).

namespace primal
  class kw_while : public sequence, public keyword
    static constexpr const char* N= "WHILE";
    explicit kw_while(source>amp; src) : sequence(src)
      prepare(std::vector<token>>amp; tokens) override;
    bool compile(compiler* c) override;
    std::string name() override { return N; }
Listing 5

And some explanation:

  • A keyword must inherit from sequence in order to have access to the prepare and compile methods that actually perform the parsing of the keyword and the compilation of the keyword.
  • A keyword must inherit from the keyword class in order to be included in the keyword store of the application.
  • The kw_while::N must have the name of the keyword as present in the scripting language’s syntax in uppercase. At some point it is used to compare in a cases insensitive comparison, which means the scripting language itself is case insensitive. Just like Pascal.
  • Since this keyword will have a body, we need a list of sequences that will represent the body of it, hence the m_while_body.

Now the implementation of the while keyword can commence with the parsing of the new syntax (see Listing 6).

sequence::prepared_type kw_while::prepare(std::vector<token> >amp;tokens)
    return sequence::prepared_type::PT_INVALID;
  parser p;
  std::string last;
  auto seqs = p.parse(m_src,[>amp;](std::string s)
    return util::to_upper(s) == kw_end::N;
  m_while_body = std::get<0>(seqs);
  return sequence::prepared_type::PT_NORMAL;
Listing 6

The prepare of the while keyword is very similar to the prepare of the if keyword, with the difference being that since we have no then in the while, that part is missing. Parsing out the body of the while is identical to the if, ie. read and parse lines till we encounter the END keyword. At this level, you don’t have to worry about the expression of the while (like in: while expression) as parsing it is taken care of by the layer calling it; however, if you wish to inspect it, feel free to use and consume the tokens parameter. If you wish to notify the parsing layer not to try to parse the expressions after your keyword has done its work on it, simply return sequence::prepared_type::PT_CONSUMED instead of PT_NORMAL as we do here.

And the compilation of the while keyword is shown in Listing 7.

bool kw_while::compile(compiler* c)
  label lbl_while = label::create(c->get_source());
  (*c->generator()) << declare_label(lbl_while);
  label lbl_after_while = 
  label lbl_while_body = 
  comp* comparator =
    throw syntax_error("Invalid WHILE statement
      condition. Nothing to compare");
  (*c->generator()) << comparator->jump 
    << lbl_while_body;
  (*c->generator()) << DJMP() << lbl_after_while;
    << declare_label(lbl_while_body);
  for(const auto>amp; seq : m_while_body)
  (*c->generator()) << DJMP() << lbl_while;
    << declare_label(lbl_after_while);
  return false;
Listing 7

The compilation begins with declaring a label to the location where the evaluation of the while’s expression will be performed. Then we do the actual compilation of the expression via sequence::compile. The rest till (*c->generator()) << DJMP() << lbl_while; is actually the looping in the while loop itself, and is identical to the if keyword, as presented above.

Now, only one step remains: writing a unit test for it. The project uses Catch2 as its unit test framework, so into the tests.cpp found in the tests directory add the contents of Listing 8.

TEST_CASE("Compiler compiles, while test", "[compiler]")
  auto c = primal::compiler::create();
             var a,b
             let a = 5
             let b = 0
             while a > 0
             let a = a - 1
             let b = b + 1
  auto vm = primal::vm::create();
  REQUIRE(vm->get_mem(0) == 0);
  REQUIRE(vm->get_mem(word_size) == 5);
Listing 8

The virtual machine

The general design and architecture of the virtual machine is loosely based on the workings of a Turing machine: it relentlessly behaves in a similar manner to a computer adhering to the von Neumann architecture, it feels more like a real mode x86 machine thus suffering from all its deficiencies and efficiencies, it has a dedicated assembly level programming language and it contains the following important components:

  • A set of 256 registers, integers, 32 bit by default represented in the source code by the type word_t found in numeric_decl.h.
    • Reg 255 is reserved for the stack pointer
    • Reg 254 is frequently used by the compiler
    • Reg 253 holds the value of the LOF flag (see below)
    • Reg 252 is initialized to the start of the stack segment
    • Reg 251 is initialized to the entry point of the application upon the start of the application
    • Reg 250 is reserved for future endeavours

    All other registers are freely available for you to play with.

  • A flag (called LOF – Last Operation Flag) which is set to a non-zero value by the last operation if the value after the operation evaluated to non-zero and is cleared to zero by the first conditional jump command.
  • A code segment which is the application’s compiled byte-code. The size of this is flexible and depends on the application compiled. This is a read-only section and its content populated upon start-up by the virtual machine ‘loader’. For the sake of completeness, we have to mention that the virtual machine places the code segment in memory after the memory segments 1MB limit, continuously.

    The machine has an Instruction Pointer to track the next instruction that needs to be executed. Initially, this is placed on the very first byte-code, and on stepping through the code, this can increase or decrease. When the virtual machine encounters errors in byte-code being executed, it issues a PANIC call and exits.

  • A free-memory segment, which is used by the virtual machine and the code of the programmer. Usually it’s 1MB in size; however, it can be modified by specifying the size in the hal.h header. At this moment, only assembly level access (r/w) is granted to this area via the MOV and COPY instructions, since the compiled script makes no use of it. The virtual machine will allocate this memory segment on the heap of the real machine.
    • The memory can be accessed by loading the desired address value into a register and then performing the required memory access operation: either a full 32 bit value will be loaded from the memory into the required register either a byte value will be moved in the required register target.
    • A memory area (usually beginning after the end of the global variables) is designated to be functioning as a stack which can be controlled via the PUSH/POP/MOV commands. The assembly instructions CALL and RET are also utilizing the stack, CALL pushes the address where the code should continue when a RET was issued. POP or RET in case of the stack pointer being 0 will result in PANIC.

      There is a Stack Pointer (same variable type as the registers, referenced in register with index 255) that is used to access the stack. Upon a push, the stack pointer increases its value, and upon pop, it decreases it. The assembler syntax for the stack pointer is: $sp

    • An important compatibility note: due to the way the direct jump addresses are calculated, both the compiler and vm need to be compiled using the same memory size.
  • A set of 256 interrupts, but only one of them is used at the moment to handle writing to the screen, the other ones are available for adventurous programmers to experiment with.

The opcodes of the virtual machine

In order to perform the lowest possible level of operations, the primitive virtual machine has a set of opcodes that are acted upon by the VM. Following is a brief presentation of the opcodes which for the technically versed should be immediately familiar due to the heavy influence of the Intel family of opcodes.

Target specifiers

Please note that in the binary stream (some of) the values (the parameters to the assembly commands) are prepended by a type specifier which indicates how to interpret the bytes following the specifier. There is a specific assembly syntax for the various specifiers.

The specifier is always 1 byte, and it can be:

  • 0x00 – the following 4 bytes are to be interpreted as an immediate number, negative or positive. Assembler syntax: 12345 (ie: just a normal number)
  • 0x01 – the following 1 byte is the index of a given register. Assembler syntax: $r123 for register 123.
  • 0x02 – this represents the value of the memory at the register’s value in the following byte. The syntax: [$r123] will give the byte value stored at the value of register 123 in the memory.
  • 0x4X – the following 1 byte is the index of a register, and the nibble marked with X is the index of the byte in the given register. Assembler syntax: $r12@1 meaning byte 1 from register 12. For 32 bit virtual machines @0, @1, @2, @3 are valid byte indexes.
  • 0x05 – the coming 4 bytes represent a number, and its value represents an address in the memory. Syntax: [1234]. This is mostly used by the compiler to index in variables from the memory.
  • 0x06 – the coming 4 bytes represent a number, and its value represents the address of 1 byte in the memory. Syntax: [@1234]
  • 0x07 – this represents the value of the memory at the register’s value in the following byte + the added offset following that. The syntax: [$r123+4] will give the byte value stored at the value of register 123 + 4 in the memory and [$r123-4] will give the byte value stored at the value of register 123 - 4 in the memory. The four basic operations (addition, multiplication, division, subtraction) are possible in this context.
  • 0x08 represents the value of the memory at the register’s value in the following byte + the following register’s value. The syntax: [$r123+$r245] will give the byte value stored at the value of register 123 + value of register 245. The four basic operations (addition, multiplication, division, subtraction) are possible in this context.


The MOV opcode moves data from the given source to the given destination.

The following is the syntax of the command:


where the syntax of <DST> and <SRC> correspond to specific syntax in the target specifier list. MOV makes sure that only sane operations are permitted, and the machine will stop execution with a PANIC command if an invalid operation is attempted for execution (for example trying to load a value into an immediate number). One of <DST> or <SRC> must be a register. For example, see Listing 9.

MOV $r1 12       # initialize register 1 to 12
MOV [0] $r1      # load into memory, at location 0 the value of register 1
MOV $r6@0 12     # load 12 into register 6’s lowest byte
MOV $r1@1 [@$r2] # load into byte 1 of reg 1 the value in memory at register 2’s value
MOV $r3@0 [@0]   # load into byte 0 of reg 3 the byte value in memory at addr 0
Listing 9


The assembly opcodes with these very descriptive names perform the requested operation on two parameters following them. The syntax of the parameters is identical to the parameters of MOV, so you can just refer to that section when writing your assembly code should you require it.

NOT requires only one parameter and it will do a binary negation of it, which cannot be an immediate value. If any of the operations results in a zero value, the LOF flag will be set to false, otherwise to true.


COPY is used to copy data between two addresses in the memory segment. COPY can be imagined as multiple MOV commands where both the destination and source are at increasing locations.

The syntax of the command is:


where DEST is interpreted as a numerical value representing the address of the area in the memory, SRC represents the source address and COUNT is the number of bytes to copy. If DEST + COUNT > MEM_SIZE, the virtual machine will issue a PANIC call and not perform the operation. At the implementation level, std::memmove is used so it is possible to have overlapping memory areas for this command. This operation does not modify the content of the LOF flag.


The comparison operators perform two operations. The first one is to compare the value of the two parameters they get, and the second one is to set the value of the LOF flag to be either true or false.


The direct jump commands will set the IP of the virtual machine to the specified parameter and will continue the execution of the code from the new location.

  • The JMP command will jump to the given address regardless of any surrounding conditions.
  • The JT will jump only if the VM’s LOF flag was set to true by the previous comparison operation, and it will clear the flag simultaneously.
  • The JNT command will jump to the given location only if the flag was set to false, and it also will clear the flag to false.

The syntax of the commands is:


where ADDRESS will be interpreted as the address where the execution will be continued. The programmer needs to take into consideration that ADDRESS is actually a valid destination, where valid, executable code is to be found.


The delta jump commands, such as the ‘direct jump’, ‘jump if last comparison was true’ and ‘jump if last comparison was not true’ commands, jump from the current location in various directions and the new address determined as being at a specific bytes away from the current location. The syntax is:


where DELTA is to be interpreted as the difference between the current IP (instruction pointer) and the upcoming new location. At this point of execution (when DST was evaluated by the jump command) the IP points to the next executable operation, this needs to be taken into consideration when manually calculating jump addresses.

As a side note, the jump commands cannot jump to labels since they all expect a number. In order to jump to a specific label, you will have to use goto.


As guessed from the name, PUSH and POP work with the stack. PUSH always pushes an immediate value onto it, with the type modifier 0 for numeric data or 7 for string indexes, and POP pops the value into the parameter it was required to pop into. The syntax is: PUSH <SOMETHING> and POP <SOMETHING>.

Internals of the VM

Like every other code interpreter out there, our VM is performing very similar operations:

  1. Load the next instruction
  2. The code for the instruction loads its parameters
  3. The code for the instruction executes the instruction
  4. Check for failure
  5. Repeat from step 1.

A big part of the virtual machine’s code (specifically automatic handling of the registered opcodes) is generated by the build system (see below at the extending the VM) and the remaining of the operations are distributed between the data types representing the VM’s data structures and the actual operations the VM can perform.

The virtual machine’s basic data structures are declared in the header registers.h. This header has a basic data type primal::valued. The class acts as a common aggregate for the entities that can be handled by the virtual machine (see Figure 7).

Figure 7

Upon each execution step, the instruction (opcode) needs to fetch its parameters from the memory of the virtual machine (the opcode can have zero, 1, 2 or 3 parameters for now) and perform the most basic of operations on the data types. The overloaded operators in the valued class more act as commodities, the most important operations are the word_t value() and the void set_value(word_t). This class hierarchy makes it very easy to implement basic operations, for example this is the implementation of the ADD opcode (found in impl_ADD.h):

  bool primal::impl_ADD(primal::vm* v)
    primal::valued* dest = v->fetch();
    primal::valued* src  = v->fetch();
    *dest += *src;
    v->set_flag(dest->value() != 0);
    return true;

Extending the virtual machine

The project was created with extendibility in mind, and nothing is easier than adding a new binary opcode and the corresponding implementation, or providing your own interrupt to deal with tasks.

Adding a new opcode

As you might have observed, we have not introduced the INC operation when designing the virtual machine (along with other missing operations, which hopefully will be implemented at a later stage), but no worries – let’s read through the following paragraph in order to see how you can implement your own opcodes to extend the virtual machine.

We will need to do modifications to the build system and also add a new file to it in order to have the new opcode up and running. You have to open the CMakeLists.txt from the opcodes directory around line 35 to find the registration of the opcodes section.

Go to the end of the section, and after the last register_opcode call insert yours:

  register_opcode("INC" 0xEE 1 OF_ARITH)

With this you have told the build system the following facts:

  • It will have an opcode called INC
  • When in the binary stream, it will be represented by the value 0xEE
  • It expects one parameter, ie. the one which will be incremented.
  • It is of type arithmetic

What happens behind the scenes is that CMake generates two files for you:

  • One is the opcodes header file INC.h in the build/opcodes directory (see Listing 10), which has the declaration of the class INC and also two functions.
  • The other file is compile_INC.cpp, generated in the same directory, which will be used by the compiler when encountering the INC opcode in the assembly code of your source file.
struct INC final : public opcode
  uint8_t bin() const override {return 0xEE; }
  std::string name() const override {return "INC";}
  size_t paramcount() const override { return 1; }
  virtual opcode_family family() const override 
  { return
    primal::opcodes::opcode_family::OF_ARITH; };

bool impl_INC(vm*);
Listing 10

The two functions in the header file are:

  • compile_INC, which is responsible for compiling the assembly statements into corresponding (virtual) machine code and it will be provided by the build system in the compile_INC.cpp file
  • impl_INC, which actually tells the virtual machine what to do upon encountering the INC in the binary stream, needs to be provided by us in the opcodes/impl/impl_INC.cpp file, because this is where the build system will look for it.

    So create that file and paste in the following:

          #include <INC.h>
          #include <vm.h>
          bool impl_INC(vm* v)
            valued* dest = v->fetch();
            dest->set_value(dest->value() + 1);
            return true;

After you regenerate your CMake cache, and once the build of the system is ready, everything should be up and running, and you can use INC in your assembly commands. So maybe it’s time to write a unit test for this purpose too (Listing 11).

TEST_CASE("ASM compiler - INC opcode", "[asm-compiler]")
  auto c = compiler::initalize();
              asm MOV $r1 10
              asm INC $r1
  auto vm = vm::create();
  REQUIRE(vm->r(1).value() == 11);
Listing 11

Now if you run make test, your newly created opcode should be up, running and incrementing values you want to.

How the opcodes are registered into the VM

When you have declared a new opcode, with the register_opcode function in the build system in the background a CPP file (opcode-impl.cpp) is generated with the contents similar to Listing 12 for each of the opcodes.

namespace primal {
  void register_opcodes() {
        [>amp;](primal::vm* v) -> bool 
        { return primal::impl_MOV(v); }
        [>amp;](primal::vm* v) -> bool
        { return primal::impl_ADD(v); }
Listing 12

In the VM’s implementation class (vm_impl.h) this boils down Listing 13.

struct executor
  std::function<bool(vm*)> runner;
template<class OPC, class EXECUTOR>
  static void register_opcode(OPC>amp;>amp; o, 
  EXECUTOR>amp;>amp; ex)
  auto f = [>amp;](vm* machina) -> bool {
    return ex(machina);};
  executor t;
  t.runner = std::function<bool(vm*)>(f);
  opcode_runners[o.bin()] = t;
static std::map<uint8_t, executor> opcode_runners;
Listing 13

Now in its turn, the VM’s opcode runner has the code in Listing 14.

  // read in an opcode
  uint8_t opc = ms[static_cast<size_t>(m_ip++)];
    // is there a registered opcode runner for 
    // the given opcode?
    if(!opcode_runners[ opc ].runner(v))
  if(ms[static_cast<size_t>(m_ip)] == 0xFF)
    return true;
Listing 14

And in short, this sums up how the VM is handling the execution of the various opcodes.

Writing a new interrupt

The interrupts are pieces of code that are compiled C++ code, and they can be used by the VM to communicate with the world outside the sandbox.

For the moment, only one interrupt is in use, which is INT 1 used to write to the screen. In order to implement a new interrupt, you will have to modify the CMakeLists.txtof the virtual machine in the section where it states ‘Register the interrupts’ add the interrupt with a new call to enable_interrupt, for example: enable_interrupt(2).

Now, create the implementation file for the interrupt in vm/intr and call it intr_2.cp As a start, the content of it can be:

  #include <vm.h>
  namespace primal
    bool intr_2(vm* v)
      return true;

And from this point on, you can provide the implementation for your interrupt. You can use the interface exposed by vm.h in order to get access to the VM internals and memory.

Binding it together

As I mentioned in the introduction, our ultimate goal in this article is to have it all in one place, ie: in one CPP file we compile a piece of primal script code and let it run in the primal VM.

Using the compiler to compile and the vm in your own source files is pretty easy, you just need a few instructions.

To properly use it in your own code, you will need a sequence of instructions like:

  std::shared_ptr<primal::compiler> c 
    = primal::compiler::create();

This will create for you the object c which is a primal script compiler that you can use to compile code in the following manner:

      let x = 40

The next step you need to do is to create a virtual machine:

  std::shared_ptr<primal::vm> vm 
    = primal::vm::create();

It is possible to use auto for the return type: I just wanted to show the actual return type. Now that you have the compiled code in the compiler and the virtual machine up and running, nothing is easier than to run it:


After this point, you have full access to the memory and registers of the virtual machine with the proper function calls found in class primal::vm located in vm.h:

  • word_t get_mem(word_t address) – to get the 32/64 bit numeric value from the memory of the virtual machine from the specified address.
  • uint8_t get_mem_byte(word_t address) – to get the 8 bit value from the memory of the virtual machine from the specified address.
  • const reg& r(uint8_t i) const – to get the object representing the ith. register of the VM.


Working on this project was really fun, and I have learned a lot during the implementation. However, progress does not stop here. There are a lot of features in the compiler to be implemented, new keywords, proper register handling, and – why not – even some optimizations that can be introduced. The VM needs good profiling in order to pinpoint the bits that can be optimized, and a lot of other features can be included too in the entire ecosystem. While designing this whole universe, I made it as modular as possible with the thought that some day, someone might have time to work on it, and include the features he/she considers nice to have. So, if you feel that you could contribute in any way, feel free to get in touch and start coding on it in the spirit of open source.


[CompOpt] Compiler optimizations:

[Dijkstra61] EdsgerW.Dijkstra (1961) ALGOL Bulletin Supplement no. 10:

[Dragon] Alfred V. Aho, Monica S. Lam, Ravi Sethi and Jeffrey D. Ullman (2006) Compilers: Principles, Techniques, and Tools

[GCCArchitecture] GNU C Compiler Architecture:

[Guntli11] Christopher Guntli (2011) ‘Architecture of clang’,

[Knuth65] Donald E. Knuth (July 1965). ‘On the translation of languages from left to right’. Information and Control 8(6): 607 - 639. doi: 10.1016/S0019 - 9958(65)90426-2

[LLVM] The LLVM Compiler Infrastructure:

[Wikipedia-1] Compiler optimizations:

[Wikipedia-2] Lexical analysis:

[Wikipedia-3] Multi-pass compiler:

[Wikipedia-4] Parser generators:

[Wikipedia-5] Regular expression:

Appendix – The write function

As promised, Listing 15 is the write function you can use to display data. Save it under the name write.prim and place it in the directory where your compiler and source file is, for the moment. Don’t forget to include it in your project.

fun write(...)
  asm MOV $r249 $r255
  # Decrease the stack pointer to skip the pushed
  # R254 and the return address.
  # This is for 32 bit builds. For 64 bit builds,
  # use 16
  asm SUB $r255 8

  # First: the number of parameters that came in
  asm POP $r10

  # fetch the value that needs to be printed
  asm POP $r2

  # This $r1 will contain the type of the variable:
  # 1 for string, 0 for number
  asm POP $r1

  # Is this a numeric value we want to print?
  asm EQ $r1 0

  # If yes, goto the print number location
  asm JT print_number

  # else goto the print string location
  asm JMP print_string

  # print it out
  asm INTR 1

  # Move to the next variable
  asm SUB $r10 1

  # JT is logically equivalent to JNZ
  asm JT next_var

  # Done here, just return
  asm MOV $r255 $r249
  asm JMP leave

  # Here $r2 contains the address of the string,
  #first character is the length

  # Initialize $r1 with the length
  asm MOV $r1 0
  asm MOV $r1@0 [$r2+$r251]

  # Get the address of the actual character data
  asm ADD $r2 1

  # Print it
  asm INTR 1

  # Move to the next variable
  asm SUB $r10 1

  # JT is logically equivalent to JNZ
  asm JT next_var

  # Done here, just return
  asm MOV $r255 $r249
Listing 15

Deák Ferenc Ferenc has wanted to be a better programmer for the last 15 years. Right now he tries to accomplish this goal by working at Maritime Robotics as a system programmer, and in his free time, by exploring the hidden corners of the C++ language in search of new quests.

Overload Journal #149 - February 2019 + Design of applications and programs