    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: A Small Universe Part II</title>
        <link>https://members.accu.org/index.php/articles/2637</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">Design of applications and programs + Overload Journal #149 - February 2019</span></div>

<table border="0" cellpadding="1" cellspacing="0">
    <tbody>
    <tr>
        <td valign="top">
            Browse in :
       </td>
       <td valign="top">

                                            <a href="https://members.accu.org/index.php/articles/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c13/">Topics</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c67/">Design</a>
<br />

                                            <a href="https://members.accu.org/index.php/articles/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c76/">Journals</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c78/">Overload</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c395/">o149</a>
<br />

                                            <a href="https://members.accu.org/index.php/articles/c67-395/">Any of these categories</a>

                    -                        <a href="https://members.accu.org/index.php/articles/c67+395/">All of these categories</a>
<br />
</td>
   </tr>
   </tbody>
</table>




<div class="xar-error">
   <p>
 <strong>Note:</strong> when you create a new publication type,
the articles module will automatically use the templates
<em>user-display-[publicationtype].xt</em>
and <em>user-summary-[publicationtype].xt</em>.
If those templates do not exist when you try to preview or display a new article,
you'll get this warning :-)  Please place your own templates in themes/<em>yourtheme</em>/modules/articles . The templates will get the extension .xt there. </p>
</div>
<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;A Small Universe Part II</h1>
<p><strong>Author:</strong>&nbsp;Bob Schmidt</p>
<p>
<strong>Date:</strong> 05 February 2019 16:36:55 +00:00 or Tue, 05 February 2019 16:36:55 +00:00</p>
<p><strong>Summary:</strong>&nbsp;Writing a programming language is a hot topic. DeÃ¡k Ferenc shows us how he wrote a compiler for bytecode callable from C++.</p>
<p><strong>Body:</strong>&nbsp;<p>[Web Editor's Note:  Article is continued from <a href="/index.php/journals/2622" >here</a>.]</p>

<h2>The output of the compiler</h2>

<p>The compiled application has a very simple format. The bytecode starts with <code>.P10</code> representing the version of the script, 4 bytes follow as reserved, 4 more bytes representing the number of global variables and 4 more bytes to indicate the start of the string table of the application (or, depending on how we view it, the size of the compiled bytecode).</p>

<p>The constant string table, which is populated by the compiler upon encountering a string in the source, is to be found in the generated code after the application bytes. The string representation in this part always starts with an 8-bit value, representing the length of the string followed by the ASCII characters themselves.</p>

<p>This output is then read and interpreted by the virtual machine, which sets up its structures and starts executing the application code from the first application bytecode.</p>

<h2>Adding a new keyword to the language</h2>

<p>In its current incarnation, the script is quite â€¦ primitive, and lacks support for most of the keywords and provided functionality we are used to from other scripting languages. But extending the script is actually quite easy. When introducing a new keyword, we will need to provide the parsing implementation and the compilation ourselves. Letâ€™s consider implementing the <code>while</code> keyword as an exercise.</p>

<p>We can get some inspiration from the <code>if</code> keyword, since the difference between <code>while</code> and <code>if</code> is just a jump back to testing again the trueness of an expression, so letâ€™s start and declare the header file for <code>while</code> in the <span class="filename">keywords</span> directory of the compiler, with the content in Listing 5 (I have omitted things like include guard and base class includes).</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
namespace primal
{
  class kw_while : public sequence, public keyword
  {
  public:
    static constexpr const char* N= &quot;WHILE&quot;;
    explicit kw_while(source&gt;amp; src) : sequence(src)
    {}
    sequence::prepared_type
      prepare(std::vector&lt;token&gt;&gt;amp; tokens) override;
    bool compile(compiler* c) override;
    std::string name() override { return N; }
  private:
    std::vector&lt;std::shared_ptr&lt;sequence&gt;&gt;
      m_while_body;
  };
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 5</td>
	</tr>
</table>

<p>And some explanation:</p>

<ul>
	<li>A keyword must inherit from sequence in order to have access to the <code>prepare</code> and <code>compile</code> methods that actually perform the parsing of the keyword and the compilation of the keyword.</li>
	
	<li>A keyword must inherit from the <code>keyword</code> class in order to be included in the keyword store of the application.</li>
	
	<li>The <code>kw_while::N</code> must have the name of the keyword as present in the scripting languageâ€™s syntax in uppercase. At some point it is used to compare in a cases insensitive comparison, which means the scripting language itself is case insensitive. Just like Pascal.</li>
	
	<li>Since this keyword will have a body, we need a list of sequences that will represent the body of it, hence the <code>m_while_body</code>.</li>
</ul>

<p>Now the implementation of the <code>while</code> keyword can commence with the parsing of the new syntax (see Listing 6).</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
sequence::prepared_type kw_while::prepare(std::vector&lt;token&gt; &gt;amp;tokens)
{
  if(tokens.empty())
  {
    return sequence::prepared_type::PT_INVALID;
  }
  parser p;
  std::string last;
  auto seqs = p.parse(m_src,[&gt;amp;](std::string s)
  {
    return util::to_upper(s) == kw_end::N;
  },
  last);
  m_while_body = std::get&lt;0&gt;(seqs);
  return sequence::prepared_type::PT_NORMAL;
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 6</td>
	</tr>
</table>

<p>The <code>prepare</code> of the <code>while</code> keyword is very similar to the <code>prepare</code> of the <code>if</code> keyword, with the difference being that since we have no <code>then</code> in the <code>while</code>, that part is missing. Parsing out the body of the <code>while</code> is identical to the <code>if</code>, ie. read and parse lines till we encounter the <code>END</code> keyword. At this level, you donâ€™t have to worry about the expression of the <code>while</code> (like in: <code>while expression</code>) as parsing it is taken care of by the layer calling it; however, if you wish to inspect it, feel free to use and consume the <code>tokens</code> parameter. If you wish to notify the parsing layer not to try to parse the expressions after your keyword has done its work on it, simply return <code>sequence::prepared_type::PT_CONSUMED</code> instead of <code>PT_NORMAL</code> as we do here.</p>

<p>And the compilation of the <code>while</code> keyword is shown in Listing 7.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
bool kw_while::compile(compiler* c)
{
  label lbl_while = label::create(c-&gt;get_source());
  (*c-&gt;generator()) &lt;&lt; declare_label(lbl_while);
  sequence::compile(c);
  label lbl_after_while = 
    label::create(c-&gt;get_source());
  label lbl_while_body = 
    label::create(c-&gt;get_source());
  comp* comparator =
    dynamic_cast&lt;comp*&gt;
    (operators[m_root-&gt;data.data()].get());
  if(!comparator)
  {
    throw syntax_error(&quot;Invalid WHILE statement
      condition. Nothing to compare&quot;);
  }
  (*c-&gt;generator()) &lt;&lt; comparator-&gt;jump 
    &lt;&lt; lbl_while_body;
  (*c-&gt;generator()) &lt;&lt; DJMP() &lt;&lt; lbl_after_while;
  (*c-&gt;generator()) 
    &lt;&lt; declare_label(lbl_while_body);
  for(const auto&gt;amp; seq : m_while_body)
  {
    seq-&gt;compile(c);
  }
  (*c-&gt;generator()) &lt;&lt; DJMP() &lt;&lt; lbl_while;
  (*c-&gt;generator()) 
    &lt;&lt; declare_label(lbl_after_while);
  return false;
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 7</td>
	</tr>
</table>

<p>The compilation begins with declaring a label to the location where the evaluation of the <code>while</code>â€™s expression will be performed. Then we do the actual compilation of the expression via <code>sequence::compile</code>. The rest till <code>(*c-&gt;generator()) &lt;&lt; DJMP() &lt;&lt; lbl_while;</code> is actually the looping in the <code>while</code> loop itself, and is identical to the <code>if</code> keyword, as presented above.</p>

<p>Now, only one step remains: writing a unit test for it. The project uses Catch2 as its unit test framework, so into the <span class="filename">tests.cpp</span> found in the <span class="filename">tests</span> directory add the contents of Listing 8.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
TEST_CASE(&quot;Compiler compiles, while test&quot;, &quot;[compiler]&quot;)
{
  auto c = primal::compiler::create();
  c-&gt;compile(R&quot;code(
             var a,b
             let a = 5
             let b = 0
             while a &gt; 0
             let a = a - 1
             let b = b + 1
             end
           )code&quot;
         );
  auto vm = primal::vm::create();
  REQUIRE(vm-&gt;run(c-&gt;bytecode()));
  REQUIRE(vm-&gt;get_mem(0) == 0);
  REQUIRE(vm-&gt;get_mem(word_size) == 5);
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 8</td>
	</tr>
</table>

<h2>The virtual machine</h2>

<p>The general design and architecture of the virtual machine is loosely based on the workings of a Turing machine: it relentlessly behaves in a similar manner to a computer adhering to the von Neumann architecture, it feels more like a real mode x86 machine thus suffering from all its deficiencies and efficiencies, it has a dedicated assembly level programming language and it contains the following important components:</p>

<ul>
	<li>A set of 256 registers, integers, 32 bit by default represented in the source code by the type <code>word_t</code> found in <span class="filename">numeric_decl.h</span>.
		<ul>
			<li>Reg 255 is reserved for the stack pointer</li>
			<li>Reg 254 is frequently used by the compiler</li>
			<li>Reg 253 holds the value of the LOF flag (see below)</li>
			<li>Reg 252 is initialized to the start of the stack segment</li>
			<li>Reg 251 is initialized to the entry point of the application upon the start of the application</li>
			<li>Reg 250 is reserved for future endeavours</li>
		</ul>
		<p>All other registers are freely available for you to play with.</p>
	</li>
	
	<li>A flag (called LOF â€“ <strong>L</strong>ast <strong>O</strong>peration <strong>F</strong>lag) which is set to a non-zero value by the last operation if the value after the operation evaluated to non-zero and is cleared to zero by the first conditional jump command.</li>

	<li>A code segment which is the applicationâ€™s compiled byte-code. The size of this is flexible and depends on the application compiled. This is a read-only section and its content populated upon start-up by the virtual machine â€˜loaderâ€™. For the sake of completeness, we have to mention that the virtual machine places the code segment in memory after the memory segments 1MB limit, continuously.
		<p>The machine has an Instruction Pointer to track the next instruction that needs to be executed. Initially, this is placed on the very first byte-code, and on stepping through the code, this can increase or decrease. When the virtual machine encounters errors in byte-code being executed, it issues a PANIC call and exits.</p>
	</li>
	
	<li>A free-memory segment, which is used by the virtual machine and the code of the programmer. Usually itâ€™s 1MB in size; however, it can be modified by specifying the size in the <span class="filename">hal.h</span> header. At this moment, only assembly level access (r/w) is granted to this area via the <code>MOV</code> and <code>COPY</code> instructions, since the compiled script makes no use of it. The virtual machine will allocate this memory segment on the heap of the real machine.
		<ul>
			<li>The memory can be accessed by loading the desired address value into a register and then performing the required memory access operation: either a full 32 bit value will be loaded from the memory into the required register either a byte value will be moved in the required register target.</li>
			
			<li>A memory area (usually beginning after the end of the global variables) is designated to be functioning as a stack which can be controlled via the <code>PUSH</code>/<code>POP</code>/<code>MOV</code> commands. The assembly instructions <code>CALL</code> and <code>RET</code> are also utilizing the stack, <code>CALL</code> pushes the address where the code should continue when a <code>RET</code> was issued. <code>POP</code> or <code>RET</code> in case of the stack pointer being 0 will result in <code>PANIC</code>.
			
				<p>There is a Stack Pointer (same variable type as the registers, referenced in register with index 255) that is used to access the stack. Upon a push, the stack pointer increases its value, and upon pop, it decreases it. The assembler syntax for the stack pointer is: <code>$sp</code></p>
			</li>
			
			<li>An important compatibility note: due to the way the direct jump addresses are calculated, both the compiler and vm need to be compiled using the same memory size.</li>
		</ul>
	</li>
	
	<li>A set of 256 interrupts, but only one of them is used at the moment to handle writing to the screen, the other ones are available for adventurous programmers to experiment with.</li>
</ul>

<h2>The opcodes of the virtual machine</h2>

<p>In order to perform the lowest possible level of operations, the primitive virtual machine has a set of opcodes that are acted upon by the VM. Following is a brief presentation of the opcodes which for the technically versed should be immediately familiar due to the heavy influence of the Intel family of opcodes.</p>

<h3>Target specifiers</h3>

<p>Please note that in the binary stream (some of) the values (the parameters to the assembly commands) are prepended by a type specifier which indicates how to interpret the bytes following the specifier. There is a specific assembly syntax for the various specifiers.</p>

<p>The specifier is always 1 byte, and it can be:</p>

<ul>
	<li><code>0x00</code> â€“ the following 4 bytes are to be interpreted as an immediate number, negative or positive. Assembler syntax: <code>12345</code> (ie: just a normal number)</li>
	
	<li><code>0x01</code> â€“ the following 1 byte is the index of a given register. Assembler syntax: $<code>r123</code> for register 123.</li>
	
	<li><code>0x02</code> â€“ this represents the value of the memory at the registerâ€™s value in the following byte. The syntax: <code>[$r123]</code> will give the byte value stored at the value of register 123 in the memory.</li>
	
	<li><code>0x4X</code> â€“ the following 1 byte is the index of a register, and the nibble marked with X is the index of the byte in the given register. Assembler syntax: <code>$r12@1</code> meaning byte 1 from register 12. For 32 bit virtual machines <code>@0</code>, <code>@1</code>, <code>@2</code>, <code>@3</code> are valid byte indexes.</li>
	
	<li><code>0x05</code> â€“ the coming 4 bytes represent a number, and its value represents an address in the memory. Syntax: <code>[1234]</code>. This is mostly used by the compiler to index in variables from the memory.</li>
	
	<li><code>0x06</code> â€“ the coming 4 bytes represent a number, and its value represents the address of 1 byte in the memory. Syntax: <code>[@1234]</code></li>
	
	<li><code>0x07</code> â€“ this represents the value of the memory at the registerâ€™s value in the following byte + the added offset following that. The syntax: <code>[$r123+4]</code> will give the byte value stored at the value of register 123 + 4 in the memory and <code>[$r123-4]</code> will give the byte value stored at the value of register 123 - 4 in the memory. The four basic operations (addition, multiplication, division, subtraction) are possible in this context.</li>
	
	<li><code>0x08</code> represents the value of the memory at the registerâ€™s value in the following byte + the following registerâ€™s value. The syntax: <code>[$r123+$r245]</code> will give the byte value stored at the value of register 123 + value of register 245. The four basic operations (addition, multiplication, division, subtraction) are possible in this context.</li>
</ul>

<h3>MOV</h3>

<p>The <code>MOV</code> opcode moves data from the given source to the given destination.</p>

<p>The following is the syntax of the command:</p>

<pre class="programlisting">
  MOV &lt;DST&gt; &lt;SRC&gt;</pre>
  
<p>where the syntax of <code>&lt;DST&gt;</code> and <code>&lt;SRC&gt;</code> correspond to specific syntax in the target specifier list. <code>MOV</code> makes sure that only sane operations are permitted, and the machine will stop execution with a <code>PANIC</code> command if an invalid operation is attempted for execution (for example trying to load a value into an immediate number). One of <code>&lt;DST&gt;</code> or <code>&lt;SRC&gt;</code> must be a register. For example, see Listing 9.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
MOV $r1 12       # initialize register 1 to 12
MOV [0] $r1      # load into memory, at location 0 the value of register 1
MOV $r6@0 12     # load 12 into register 6â€™s lowest byte
MOV $r1@1 [@$r2] # load into byte 1 of reg 1 the value in memory at register 2â€™s value
MOV $r3@0 [@0]   # load into byte 0 of reg 3 the byte value in memory at addr 0
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 9</td>
	</tr>
</table>

<h3>ADD, SUB, MUL, DIV, MOD, OR, XOR, AND and NOT</h3>

<p>The assembly opcodes with these very descriptive names perform the requested operation on two parameters following them. The syntax of the parameters is identical to the parameters of <code>MOV</code>, so you can just refer to that section when writing your assembly code should you require it.</p>

<p><code>NOT</code> requires only one parameter and it will do a binary negation of it, which cannot be an immediate value. If any of the operations results in a zero value, the <code>LOF</code> flag will be set to false, otherwise to true.</p>

<h3>COPY</h3>

<p><code>COPY</code> is used to copy data between two addresses in the memory segment. <code>COPY</code> can be imagined as multiple <code>MOV</code> commands where both the destination and source are at increasing locations.</p>

<p>The syntax of the command is:</p>

<pre class="programlisting">
  COPY &lt;DEST&gt; &lt;SRC&gt; &lt;COUNT&gt;</pre>
  
<p>where <code>DEST</code> is interpreted as a numerical value representing the address of the area in the memory, <code>SRC</code> represents the source address and <code>COUNT</code> is the number of bytes to copy. If <code>DEST</code> + <code>COUNT</code> &gt; <code>MEM_SIZE</code>, the virtual machine will issue a <code>PANIC</code> call and not perform the operation. At the implementation level, <code>std::memmove</code> is used so it is possible to have overlapping memory areas for this command. This operation does not modify the content of the <code>LOF</code> flag.</p>

<h3>EQ, NEQ, LT, GT, LTE and GTE</h3>

<p>The comparison operators perform two operations. The first one is to compare the value of the two parameters they get, and the second one is to set the value of the <code>LOF</code> flag to be either <code>true</code> or <code>false</code>.</p>

<h3>JMP, JT and JNT</h3>

<p>The direct jump commands will set the IP of the virtual machine to the specified parameter and will continue the execution of the code from the new location.</p>

<ul>
	<li>The <code>JMP</code> command will jump to the given address regardless of any surrounding conditions.</li>
	
	<li>The <code>JT</code> will jump only if the VMâ€™s <code>LOF</code> flag was set to <code>true</code> by the previous comparison operation, and it will clear the flag simultaneously.</li>
	
	<li>The <code>JNT</code> command will jump to the given location only if the flag was set to <code>false</code>, and it also will clear the flag to <code>false</code>.</li>
</ul>

<p>The syntax of the commands is:</p>

<pre class="programlisting">
  JMP | JT | JNT &lt;ADDRESS&gt;</pre>
  
<p>where <code>ADDRESS</code> will be interpreted as the address where the execution will be continued. The programmer needs to take into consideration that <code>ADDRESS</code> is actually a valid destination, where valid, executable code is to be found.</p>

<h3>DJMP, DJT and DJNT</h3>

<p>The delta jump commands, such as the â€˜direct jumpâ€™, â€˜jump if last comparison was trueâ€™ and â€˜jump if last comparison was not trueâ€™ commands, jump from the current location in various directions and the new address determined as being at a specific bytes away from the current location. The syntax is:</p>

<pre class="programlisting">
  DJMP | DJT | DJNT &lt;DELTA&gt;</pre>

<p>where <code>DELTA</code> is to be interpreted as the <em>difference</em> between the current IP (instruction pointer) and the upcoming new location. At this point of execution (when <code>DST</code> was evaluated by the jump command) the IP points to the next executable operation, this needs to be taken into consideration when manually calculating jump addresses.</p>

<p>As a side note, the jump commands cannot jump to labels since they all expect a number. In order to jump to a specific label, you will have to use <code>goto</code>.</p>

<h3>PUSH and POP</h3>

<p>As guessed from the name, <code>PUSH</code> and <code>POP</code> work with the stack. <code>PUSH</code> always pushes an immediate value onto it, with the type modifier 0 for numeric data or 7 for string indexes, and <code>POP</code> pops the value into the parameter it was required to pop into. The syntax is: <code>PUSH &lt;SOMETHING&gt;</code> and <code>POP &lt;SOMETHING&gt;</code>.</p>


<h3>Internals of the VM</h3>

<p>Like every other code interpreter out there, our VM is performing very similar operations:</p>

<ol>
	<li>Load the next instruction</li>
	
	<li>The code for the instruction loads its parameters</li>
	
	<li>The code for the instruction executes the instruction</li>
	
	<li>Check for failure</li>
	
	<li>Repeat from step 1.</li>
</ol>

<p>A big part of the virtual machineâ€™s code (specifically automatic handling of the registered opcodes) is generated by the build system (see below at the extending the VM) and the remaining of the operations are distributed between the data types representing the VMâ€™s data structures and the actual operations the VM can perform.</p>

<p>The virtual machineâ€™s basic data structures are declared in the header <span class="filename">registers.h</span>. This header has a basic data type <code>primal::valued</code>. The class acts as a common aggregate for the entities that can be handled by the virtual machine (see Figure 7).</p>

<table class="sidebartable">
	<tr>
		<td><img src="/content/images/journals/ol149/Deak/Deak-07.svg" /></td>
	</tr>
	<tr>
		<td class="title">Figure 7</td>
	</tr>
</table>

<p>Upon each execution step, the instruction (opcode) needs to fetch its parameters from the memory of the virtual machine (the opcode can have zero, 1, 2 or 3 parameters for now) and perform the most basic of operations on the data types. The overloaded operators in the valued class more act as commodities, the most important operations are the <code>word_t value()</code> and the <code>void set_value(word_t)</code>. This class hierarchy makes it very easy to implement basic operations, for example this is the implementation of the <code>ADD</code> opcode (found in <span class="filename">impl_ADD.h</span>):</p>

<pre class="programlisting">
  bool primal::impl_ADD(primal::vm* v)
  {
    primal::valued* dest = v-&gt;fetch();
    primal::valued* src  = v-&gt;fetch();
  
    *dest += *src;
  
    v-&gt;set_flag(dest-&gt;value() != 0);
  
    return true;
  }</pre>

<h2>Extending the virtual machine</h2>

<p>The project was created with extendibility in mind, and nothing is easier than adding a new binary opcode and the corresponding implementation, or providing your own interrupt to deal with tasks.</p>

<h3>Adding a new opcode</h3>

<p>As you might have observed, we have not introduced the INC operation when designing the virtual machine (along with other missing operations, which hopefully will be implemented at a later stage), but no worries â€“ letâ€™s read through the following paragraph in order to see how you can implement your own opcodes to extend the virtual machine.</p>

<p>We will need to do modifications to the build system and also add a new file to it in order to have the new opcode up and running. You have to open the <span class="filename">CMakeLists.txt</span> from the <span class="filename">opcodes</span> directory around line 35 to find the registration of the opcodes section.</p>

<p>Go to the end of the section, and after the last <code>register_opcode</code> call insert yours:</p>

<pre class="programlisting">
  register_opcode(&quot;INC&quot; 0xEE 1 OF_ARITH)</pre>

<p>With this you have told the build system the following facts:</p>

<ul>
	<li>It will have an opcode called <code>INC</code></li>
	
	<li>When in the binary stream, it will be represented by the value <code>0xEE</code></li>
	
	<li>It expects one parameter, ie. the one which will be incremented.</li>
	
	<li>It is of type arithmetic</li>
</ul>

<p>What happens behind the scenes is that <code>CMake</code> generates two files for you:</p>

<ul>
	<li>One is the opcodes header file <span class="filename">INC.h</span> in the <span class="filename">build/opcodes</span> directory (see Listing 10), which has the declaration of the class INC and also two functions.</li>

	<li>The other file is <span class="filename">compile_INC.cpp</span>, generated in the same directory, which will be used by the compiler when encountering the <code>INC</code> opcode in the assembly code of your source file.</li>
	
</ul>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
struct INC final : public opcode
{
  uint8_t bin() const override {return 0xEE; }
  std::string name() const override {return &quot;INC&quot;;}
  size_t paramcount() const override { return 1; }
  virtual opcode_family family() const override 
  { return
    primal::opcodes::opcode_family::OF_ARITH; };
};

bool impl_INC(vm*);
std::vector&lt;uint8_t&gt;
  compile_INC(std::vector&lt;token&gt;&gt;amp;);
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 10</td>
	</tr>
</table>

<p>The two functions in the header file are:</p>

<ul>
	<li><code>compile_INC</code>, which is responsible for compiling the assembly statements into corresponding (virtual) machine code and it will be provided by the build system in the <span class="filename">compile_INC.cpp</span> file</li>
	
	<li><code>impl_INC</code>, which actually tells the virtual machine what to do upon encountering the <code>INC</code> in the binary stream, needs to be provided by us in the <span class="filename">opcodes/impl/impl_INC.cpp</span> file, because this is where the build system will look for it.
		
		<p>So create that file and paste in the following:</p>
		
		

<pre class="programlisting">
      #include &lt;INC.h&gt;
      #include &lt;vm.h&gt;
      bool impl_INC(vm* v)
      {
        valued* dest = v-&gt;fetch();
        dest-&gt;set_value(dest-&gt;value() + 1);
        return true;
      }</pre>
	</li>
</ul>
	  
<p>After you regenerate your CMake cache, and once the build of the system is ready, everything should be up and running, and you can use <code>INC</code> in your assembly commands. So maybe itâ€™s time to write a unit test for this purpose too (Listing 11).</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
TEST_CASE(&quot;ASM compiler - INC opcode&quot;, &quot;[asm-compiler]&quot;)
{
  auto c = compiler::initalize();
  c-&gt;compile(R&quot;code(
              asm MOV $r1 10
              asm INC $r1
            )code&quot;
  );
  auto vm = vm::create();
  REQUIRE(vm-&gt;run(c-&gt;bytecode()));
  REQUIRE(vm-&gt;r(1).value() == 11);
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 11</td>
	</tr>
</table>

<p>Now if you run <code>make test</code>, your newly created opcode should be up, running and incrementing values you want to.</p>

<h3>How the opcodes are registered into the VM</h3>

<p>When you have declared a new opcode, with the <code>register_opcode</code> function in the build system in the background a CPP file (<span class="filename">opcode-impl.cpp</span>) is generated with the contents similar to Listing 12 for each of the opcodes.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
namespace primal {
  void register_opcodes() {
    vm_impl::register_opcode
      (primal::opcodes::MOV(),
        [&gt;amp;](primal::vm* v) -&gt; bool 
        { return primal::impl_MOV(v); }
      );
    vm_impl::register_opcode
      (primal::opcodes::ADD(),
        [&gt;amp;](primal::vm* v) -&gt; bool
        { return primal::impl_ADD(v); }
       );
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 12</td>
	</tr>
</table>

<p>In the VMâ€™s implementation class (<span class="filename">vm_impl.h</span>) this boils down Listing 13.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
struct executor
{
  std::function&lt;bool(vm*)&gt; runner;
};
template&lt;class OPC, class EXECUTOR&gt;
  static void register_opcode(OPC&gt;amp;&gt;amp; o, 
  EXECUTOR&gt;amp;&gt;amp; ex)
{
  auto f = [&gt;amp;](vm* machina) -&gt; bool {
    return ex(machina);};
  executor t;
  t.runner = std::function&lt;bool(vm*)&gt;(f);
  opcode_runners[o.bin()] = t;
};
static std::map&lt;uint8_t, executor&gt; opcode_runners;
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 13</td>
	</tr>
</table>

<p>Now in its turn, the VMâ€™s opcode runner has the code in Listing 14.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
while(opcode_runners.count(ms[static_cast&lt;size_t&gt;
  (m_ip)]))
{
  // read in an opcode
  uint8_t opc = ms[static_cast&lt;size_t&gt;(m_ip++)];
  try
  {
    // is there a registered opcode runner for 
    // the given opcode?
    if(!opcode_runners[ opc ].runner(v))
    {
      panic();
    }
  }
  catch(...)
  {
    panic();
  }
  if(ms[static_cast&lt;size_t&gt;(m_ip)] == 0xFF)
  {
    return true;
  }
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 14</td>
	</tr>
</table>

<p>And in short, this sums up how the VM is handling the execution of the various opcodes.</p>

<h3>Writing a new interrupt</h3>

<p>The interrupts are pieces of code that are compiled C++ code, and they can be used by the VM to communicate with the world outside the sandbox.</p>

<p>For the moment, only one interrupt is in use, which is <code>INT 1</code> used to write to the screen. In order to implement a new interrupt, you will have to modify the <span class="filename">CMakeLists.txt</span>of the virtual machine in the section where it states â€˜Register the interruptsâ€™ add the interrupt with a new call to <code>enable_interrupt</code>, for example: <code>enable_interrupt(2)</code>.</p>

<p>Now, create the implementation file for the interrupt in <span class="filename">vm/intr</span> and call it <span class="filename">intr_2.cp</span> As a start, the content of it can be:</p>

<pre class="programlisting">
  #include &lt;vm.h&gt;
  namespace primal
  {
    bool intr_2(vm* v)
    {
      return true;
    }
  }</pre>
  
<p>And from this point on, you can provide the implementation for your interrupt. You can use the interface exposed by <span class="filename">vm.h</span> in order to get access to the VM internals and memory.</p>

<h2>Binding it together</h2>

<p>As I mentioned in the introduction, our ultimate goal in this article is to have it all in one place, ie: in one CPP file we compile a piece of primal script code and let it run in the primal VM.</p>

<p>Using the compiler to compile and the vm in your own source files is pretty easy, you just need a few instructions.</p>

<p>To properly use it in your own code, you will need a sequence of instructions like:</p>

<pre class="programlisting">
  std::shared_ptr&lt;primal::compiler&gt; c 
    = primal::compiler::create();</pre>
	
<p>This will create for you the object <code>c</code> which is a primal script compiler that you can use to compile code in the following manner:</p>

<pre class="programlisting">
  c-&gt;compile(R&quot;code(
      let x = 40
    )code&quot;
  );</pre>
  
<p>The next step you need to do is to create a virtual machine:</p>

<pre class="programlisting">
  std::shared_ptr&lt;primal::vm&gt; vm 
    = primal::vm::create();</pre>
	
<p>It is possible to use <code>auto</code> for the return type: I just wanted to show the actual return type. Now that you have the compiled code in the compiler and the virtual machine up and running, nothing is easier than to run it:</p>

<pre class="programlisting">
  vm-&gt;run(c-&gt;bytecode())</pre>
  
<p>After this point, you have full access to the memory and registers of the virtual machine with the proper function calls found in class <code>primal::vm</code> located in <span class="filename">vm.h</span>:</p>

<ul>
	<li><code>word_t get_mem(word_t address)</code> â€“ to get the 32/64 bit numeric value from the memory of the virtual machine from the specified address.</li>
	
	<li><code>uint8_t get_mem_byte(word_t address)</code> â€“ to get the 8 bit value from the memory of the virtual machine from the specified address.</li>
	
	<li><code>const reg&amp; r(uint8_t i) const</code> â€“ to get the object representing the ith. register of the VM.</li>
</ul>

<h2>Conclusion</h2>

<p>Working on this project was really fun, and I have learned a lot during the implementation. However, progress does not stop here. There are a lot of features in the compiler to be implemented, new keywords, proper register handling, and â€“ why not â€“ even some optimizations that can be introduced. The VM needs good profiling in order to pinpoint the bits that can be optimized, and a lot of other features can be included too in the entire ecosystem. While designing this whole universe, I made it as modular as possible with the thought that some day, someone might have time to work on it, and include the features he/she considers nice to have. So, if you feel that you could contribute in any way, feel free to get in touch and start coding on it in the spirit of open source.</p>

<h2>References</h2>

<p class="bibliomixed"><a id="[CompOpt]"></a>[CompOpt] Compiler optimizations: <a href="http://compileroptimizations.com/">http://compileroptimizations.com/</a></p>

<p class="bibliomixed"><a id="[Dijkstra61]"></a>[Dijkstra61] EdsgerW.Dijkstra (1961) <em>ALGOL Bulletin Supplement no. 10</em>: <a href="http://www.cs.utexas.edu/~EWD/MCReps/MR35.PDF">http://www.cs.utexas.edu/~EWD/MCReps/MR35.PDF</a></p>

<p class="bibliomixed"><a id="[Dragon]"></a>[Dragon] Alfred V. Aho, Monica S. Lam, Ravi Sethi and Jeffrey D. Ullman (2006) <em>Compilers: Principles, Techniques, and Tools</em></p>

<p class="bibliomixed"><a id="[GCCArchitecture]"></a>[GCCArchitecture] GNU C Compiler Architecture: <a href="https://en.wikibooks.org/wiki/GNU_C_Compiler_Internals/GNU_C_Compiler_Architecture">https://en.wikibooks.org/wiki/GNU_C_Compiler_Internals/GNU_C_Compiler_Architecture</a></p>

<p class="bibliomixed"><a id="[Guntli11]"></a>[Guntli11] Christopher Guntli (2011) â€˜Architecture of clangâ€™, <a href="https://wiki.ifs.hsr.ch/SemProgAnTr/files/Clang_Architecture_-_ChristopherGuntli.pdf">https://wiki.ifs.hsr.ch/SemProgAnTr/files/Clang_Architecture_-_ChristopherGuntli.pdf</a></p>

<p class="bibliomixed"><a id="[Knuth65]"></a>[Knuth65] Donald E. Knuth (July 1965). â€˜On the translation of languages from left to rightâ€™. <em>Information and Control</em> 8(6): 607 - 639. doi: 10.1016/S0019 - 9958(65)90426-2</p>

<p class="bibliomixed"><a id="[LLVM]"></a>[LLVM] The LLVM Compiler Infrastructure:<a href="https://llvm.org/">https://llvm.org/</a></p>

<p class="bibliomixed"><a id="[Wikipedia-1]"></a>[Wikipedia-1] Compiler optimizations: <a href="https://en.wikipedia.org/wiki/Category:Compiler_optimizations">https://en.wikipedia.org/wiki/Category:Compiler_optimizations</a></p>

<p class="bibliomixed"><a id="[Wikipedia-2]"></a>[Wikipedia-2] Lexical analysis: <a href="https://en.wikipedia.org/wiki/Lexical_analysis">https://en.wikipedia.org/wiki/Lexical_analysis</a></p>

<p class="bibliomixed"><a id="[Wikipedia-3]"></a>[Wikipedia-3] Multi-pass compiler: <a href="https://en.wikipedia.org/wiki/Multi-pass_compiler">https://en.wikipedia.org/wiki/Multi-pass_compiler</a></p>

<p class="bibliomixed"><a id="[Wikipedia-4]"></a>[Wikipedia-4] Parser generators: <a href="https://en.wikipedia.org/wiki/Comparisonofparser_generators">https://en.wikipedia.org/wiki/Comparisonofparser_generators</a></p>

<p class="bibliomixed"><a id="[Wikipedia-5]"></a>[Wikipedia-5] Regular expression: <a href="https://en.wikipedia.org/wiki/Regular_expression">https://en.wikipedia.org/wiki/Regular_expression</a></p>


<h2>Appendix â€“ The write function</h2><a id="[Appendix]"></a>
<p>As promised, Listing 15 is the write function you can use to display data. Save it under the name <span class="filename">write.prim</span> and place it in the directory where your compiler and source file is, for the moment. Donâ€™t forget to include it in your project.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
fun write(...)
  asm MOV $r249 $r255
  # Decrease the stack pointer to skip the pushed
  # R254 and the return address.
  # This is for 32 bit builds. For 64 bit builds,
  # use 16
  asm SUB $r255 8

  # First: the number of parameters that came in
  asm POP $r10

:next_var
  # fetch the value that needs to be printed
  asm POP $r2

  # This $r1 will contain the type of the variable:
  # 1 for string, 0 for number
  asm POP $r1

  # Is this a numeric value we want to print?
  asm EQ $r1 0

  # If yes, goto the print number location
  asm JT print_number

  # else goto the print string location
  asm JMP print_string

:print_number
  # print it out
  asm INTR 1

  # Move to the next variable
  asm SUB $r10 1

  # JT is logically equivalent to JNZ
  asm JT next_var

  # Done here, just return
  asm MOV $r255 $r249
  asm JMP leave

:print_string
  # Here $r2 contains the address of the string,
  #first character is the length

  # Initialize $r1 with the length
  asm MOV $r1 0
  asm MOV $r1@0 [$r2+$r251]

  # Get the address of the actual character data
  asm ADD $r2 1

  # Print it
  asm INTR 1

  # Move to the next variable
  asm SUB $r10 1

  # JT is logically equivalent to JNZ
  asm JT next_var

  # Done here, just return
  asm MOV $r255 $r249
:leave
end
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 15</td>
	</tr>
</table>

<p class="bio"><span class="author"><b>DeÃ¡k Ferenc</b></span> Ferenc has wanted to be a better programmer for the last 15 years. Right now he tries to accomplish this goal by working at Maritime Robotics as a system programmer, and in his free time, by exploring the hidden corners of the C++ language in search of new quests.</p>

</p>
<p><strong>Notes:</strong>&nbsp;</p>
<p><em>More fields may be available via dynamicdata ..</em></p>
</div>
</channel>
</rss>
