    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: How to Write a Programming Language: Part 3, The Evaluator</title>
        <link>https://members.accu.org/index.php/journals/2565</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>


        <h2>Journal Articles</h2>


<div class="xar-mod-head"><span class="xar-mod-title">Overload Journal #147 - October 2018 + Programming Topics</span></div>

<table border="0" cellpadding="1" cellspacing="0">
    <tbody>
    <tr>
        <td valign="top">
            Browse in :
       </td>
       <td valign="top">

                                            <a href="https://members.accu.org/index.php/journals/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c76/">Journals</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c78/">Overload</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c391/">o147</a>
                    (5)
<br />

                                            <a href="https://members.accu.org/index.php/journals/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c13/">Topics</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c65/">Programming</a>
                    (877)
<br />

                                            <a href="https://members.accu.org/index.php/journals/c391-65/">Any of these categories</a>

                    -                        <a href="https://members.accu.org/index.php/journals/c391+65/">All of these categories</a>
<br />
</td>
   </tr>
   </tbody>
</table>




<div class="xar-error">
   <p>
 <strong>Note:</strong> when you create a new publication type,
the articles module will automatically use the templates
<em>user-display-[publicationtype].xt</em>
and <em>user-summary-[publicationtype].xt</em>.
If those templates do not exist when you try to preview or display a new article,
you'll get this warning :-)  Please place your own templates in themes/<em>yourtheme</em>/modules/articles . The templates will get the extension .xt there. </p>
</div>
<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;How to Write a Programming Language: Part 3, The Evaluator</h1>
<p><strong>Author:</strong>&nbsp;Bob Schmidt</p>
<p>
<strong>Date:</strong> 04 October 2018 18:22:24 +01:00 or Thu, 04 October 2018 18:22:24 +01:00</p>
<p><strong>Summary:</strong>&nbsp;Weâ€™ve parsed our tokens: now we need turn them into values. Andy Balaam continues writing a programming language with the evaluator.</p>
<p><strong>Body:</strong>&nbsp;<p>This is the third part of our series on writing a programming language. In part one we broke up the code into chunks like numbers, strings and symbols (lexing), and in part two we assembled those parts into a tree structure (parsing). Now we are ready to start understanding and evaluating the parts of that tree structure to produce values and behaviour.</p>

<p>
    <img src="/content/images/journals/ol147/Balaam/Balaam-01.png" />
</p>

<p>By the end of this article you will have seen all the most important parts of a programming language, and be ready to write your own!</p>

<h2>Recap â€“ lexing and parsing</h2>


<p>The lexer and parser take some text like this:</p>

<pre class="programlisting">
  print( x + 2 );</pre>

<p>and break it into parts, and then assemble it into a tree structure like this:</p>

<pre class="programlisting">
  (&quot;call&quot;,
    (&quot;symbol&quot;, &quot;print&quot;),
    [
      (&quot;operation&quot;,
        &quot;+&quot;,
        (&quot;symbol&quot;, &quot;x&quot;),
        (&quot;number&quot;, &quot;2&quot;)
      )
    ]
  )</pre>

<p>Cell is written in Python and uses Python tuples to hold all the structures it represents. Each tuple contains a string representing its type as the first element, and then other information in the other elements of the tuple.</p>

<p>So far, we have seen tuples representing tokens coming out of the lexer, and tuples representing syntax trees coming out of the parser. This time we will see more tuples, representing values that have been worked out by the evaluator.</p>

<h2>The evaluator calculates values</h2>

<p>
    <img src="/content/images/journals/ol147/Balaam/Balaam-02.png" />
</p>

<p>The evaluator starts at the leaves of the syntax tree and calculates the values of the leaves, then combines together leaves and branches until it ends up with a single value. On the way it may have produced some side effects such as printing something out.</p>

<p>
    <img src="/content/images/journals/ol147/Balaam/Balaam-03.png" />
</p>

<h2>Scope</h2>

<p>Before we look at how the evaluator works out values, we must look at the idea of scope. Scope describes what names can be seen where in our code. Cell, like almost all modern programming languages, uses â€˜lexicalâ€™ scope, which means the values you can see are dictated by the position of the code in the text on the screen.</p>

<p>So, for example, this snippet of Cell code:</p>

<pre class="programlisting">
  x = &quot;World!&quot;;
  myfn = {
    x = &quot;Hello,&quot;;
    print( x );
  };
  myfn();
  print( x );</pre>

<p>prints:</p>

<pre class="programlisting">
  Hello,
  World</pre>

<p>because the value of <code>x</code> inside the function <code>myfn</code> is set to <code>&quot;Hello,&quot;</code> within the function definition, but it reverts to <code>&quot;World!&quot;</code> in code that is outside that block.</p>

<p>This more complicated example:</p>

<pre class="programlisting">
  outerfn = {
    x = 12;
    innerfn = {
      print(x);
    };
    innerfn;
  };

  thing = outerfn();
  thing();</pre>

<p>prints <code>&quot;12&quot;</code> because the function <code>innerfn</code> carries the values it knows about with it, meaning that when we call the function returned by <code>outerfn()</code>, which is actually <code>innerfn</code> because that is what is returned by <code>outerfn</code> when we call it, it runs the<code> print(x)</code> line and it still knows what <code>x</code> is. Functions that are carrying their values with them are called Closures, and the set of values that is passed around is called an Environment. Environments are key to the way the evaluator works.</p>

<h2>Environments</h2>

<p>An environment is a namespace that holds all the symbols that are defined in your program. As illustrated here, each piece of code operates inside a local environment (such as the current function) but can also access symbols from outer environments (such as an outer function) and the global environment, that contains important symbols such as the <code>if</code> function.</p>

<p>
    <img src="/content/images/journals/ol147/Balaam/Balaam-04.png" />
</p>

<p>This structure is provided in Cell by a class called <code>Env</code>. It takes a parent environment as a constructor argument, which it holds in <code>self.parent</code>. To look up a name we call the <code>get()</code> method:</p>

<pre class="programlisting">
  class Env:
    # ...
    def get(self, name):
      if name in self.items:
        return self.items[name]
      elif self.parent is not None:
        return self.parent.get(name)
      else:
        return None</pre>

<p>This method checks whether a symbol is defined locally, and if not, it asks the parent environment. It gives up when it gets to the global environment, which has <code>None</code> for its <code>self.parent</code> value.</p>

<p>Defining a symbol means calling <code>set()</code>, which is simpler:</p>

<pre class="programlisting">
  class Env:
    # ...
    def set(self, name, value):
      self.items[name] = value</pre>

<p>So newly defined symbols are always defined in the local environment, and donâ€™t leak out into wider scopes.</p>

<h2>The evaluator</h2>

<p>Listing 1 shows the main logic of the evaluator. You can see the full code at <a href="https://github.com/andybalaam/cell/blob/master/pycell/eval_.py">https://github.com/andybalaam/cell/blob/master/pycell/eval_.py</a>, but here we can see the main structure is very similar to the code we saw in the previous two parts â€“ a large <code>if</code>-<code>elif</code> block responding differently to the various possible structures.</p>

<table class="sidebartable">
    <tr>
        <td>
            <pre class="programlisting">
def eval_expr(expr, env):
  typ = expr[0]
  if typ == &quot;number&quot;:
    return (&quot;number&quot;, float(expr[1]))
  elif typ == &quot;string&quot;:
    return (&quot;string&quot;, expr[1])
  elif typ == &quot;none&quot;:
    return (&quot;none&quot;,)
  elif typ == &quot;operation&quot;:
    return _operation(expr, env)
  elif typ == &quot;symbol&quot;:
    name = expr[1]
    ret = env.get(name)
    if ret is None:
      raise Exception(
        &quot;Unknown symbol '%s'.&quot; % name)
    else:
      return ret
  elif typ == &quot;assignment&quot;:
    var_name = expr[1][1]
    val = eval_expr(expr[2], env)
    env.set(var_name, val)
    return val
  elif typ == &quot;call&quot;:
    return _function_call(expr, env)
  elif typ == &quot;function&quot;:
    return (&quot;function&quot;, expr[1], expr[2],
      Env(env))
  else:
    raise Exception(
      &quot;Unknown expression type: &quot; + str(expr))
            </pre>
        </td>
    </tr>
    <tr>
        <td class="title">Listing 1</td>
    </tr>
</table>

<p>In the evaluator, the <code>eval_expr()</code> function takes in an expression to evaluate, and the environment in which to work. Without an environment we canâ€™t do anything since we donâ€™t know where to look up the symbols that are being used in the code. The environment is an instance of the <code>Env</code> class we saw earlier, and the expression is a Python tuple representing part of a syntax tree.</p>

<p>The first part of the tuple tells us the type of syntax tree section we have. We place this into a variable called <code>typ</code>, and use it in the <code>if</code> block.</p>

<h2>Ordinary values</h2>

<p>If we are evaluating a number, we use Pythonâ€™s <code>float()</code> function to convert the string form that was captured in the lexer token (as <code>expr[1]</code>, the second value in the tuple) into a Python number. This means we can do arithmetic with it later if we need to, and illustrates the fact that the evaluator is where the textual and structural forms of the code are converted into â€˜meaningâ€™ such as finding actual numbers and looking up the values of symbols.</p>

<p>If we find a string, we have very little to do, since the value of a string looks identical to its form as a lexer token â€“ i.e. it is a tuple of two values, the first of which is <code>&quot;string&quot;</code> and the second is the contents of that string.</p>

<p>Next we deal with a special case â€“ there is a special type of value in Cell that is called <code>None</code> â€“ we inject this value into the global environment with the name <code>None</code>, and the Python tuple to represent its value is <code>(&quot;none&quot;,)</code> i.e. a tuple with just one value in it to represent a special none type. In Cell, <code>None</code> is used to describe a missing or empty value. If we find a <code>None</code> value like this, we simply return a similar <code>None</code> value from <code>eval_expr</code>.</p>

<p>The next type of syntax tree we handle is <code>&quot;operation&quot;</code> â€“ this represents an arithmetic operation like <code>&quot;+&quot;</code> or <code>&quot;*&quot;</code>. We call a dedicated function <code>_operation()</code> to deal with this, which is shown in listing 2.</p>

<table class="sidebartable">
    <tr>
        <td>
            <pre class="programlisting">
def _operation(expr, env):
  arg1 = eval_expr(expr[2], env)
  arg2 = eval_expr(expr[3], env)
  if expr[1] == &quot;+&quot;:
    return (&quot;number&quot;, arg1[1] + arg2[1])
  elif expr[1] == &quot;-&quot;:
    return (&quot;number&quot;, arg1[1] - arg2[1])
  elif expr[1] == &quot;*&quot;:
    return (&quot;number&quot;, arg1[1] * arg2[1])
  elif expr[1] == &quot;/&quot;:
    return (&quot;number&quot;, arg1[1] / arg2[1])
  else:
    raise Exception(
      &quot;Unknown operation: &quot; + expr[1])
            </pre>
        </td>
    </tr>
    <tr>
        <td class="title">Listing 2</td>
    </tr>
</table>

<p>The <code>_operation()</code> function takes in an expression and environment just like <code>eval_expr</code>, and it looks at <code>expr[1]</code> to find out what kind of operation is being asked for. As we saw in the previous article, this was populated by the parser and can be <code>&quot;+&quot;</code>, <code>&quot;-&quot;</code>, <code>&quot;*&quot;</code> or <code>&quot;/&quot;</code>. In each case, we evaluate the expressions on the left and right of the operator and place them into variables <code>arg1</code> and <code>arg2</code>, and then combine them together using the appropriate Python arithmetic operator. If the rules of arithmetic in Cell were different from those in Python, this is where we would see the difference. Similarly, if we chose to use some other class to represent numbers, we would have seen that being used when we found a <code>&quot;number&quot;</code> type, instead of the <code>float()</code> function. In fact, because Cell is designed to be simple to implement, we choose to use Pythonâ€™s float type and built-in arithmetic for all the numeric operations.</p>

<p>Back in the main code (listing 1) we see the next type is <code>&quot;symbol&quot;</code>. This means we found a symbol like <code>my_function</code> in the code, or a built-in symbol like <code>print</code> or <code>if</code>. To evaluate it, we simply look it up in the environment using <code>Env</code>â€™s <code>get()</code> method that we saw earlier. If we canâ€™t find it, we throw an exception, producing a crude error message for the user.</p>

<p>Similarly, if the type is <code>&quot;assignment&quot;</code>, we have found code like <code>&quot;x = 3&quot;</code>, and we use the environmentâ€™s <code>set()</code> method to store the value inside the symbol we were given. We make sure to evaluate the value before storing it, by calling <code>eval_expr()</code> again. In this case, and in other cases, we call <code>eval_expr</code> from inside <code>eval_expr</code>. This is called recursion, and makes perfect sense if you donâ€™t think about it too much, or, alternatively, if you think about it a lot.</p>

<h2>Functions</h2>

<p>The last two types to deal with both concern functions. First, <code>&quot;call&quot;</code> means we are looking at a function call, something like <code>&quot;my_fn(3)&quot;</code>. We deal with this in a separate function called <code>_function_call</code>, shown in listing 3.</p>

<table class="sidebartable">
    <tr>
        <td>
            <pre class="programlisting">
def _function_call(expr, env):
  fn = eval_expr(expr[1], env)
  args =
    list((eval_expr(a, env) for a in expr[2]))
  if fn[0] == &quot;function&quot;:
    params = fn[1]
    fail_if_wrong_number_of_args(expr[1],
      params, args)
    body = fn[2]
    fn_env = fn[3]
    new_env = Env(fn_env)
    for p, a in zip(params, args):
      new_env.set(p[1], a)
    return eval_list(body, new_env)
  elif fn[0] == &quot;native&quot;:
    py_fn = fn[1]
    params = inspect.getargspec(py_fn).args
    fail_if_wrong_number_of_args(expr[1],
      params[1:], args)
    return fn[1](env, *args)
  else:
    raise Exception(
      &quot;Attempted to call something that is not a function: %s&quot; % str(fn)
    )
            </pre>
        </td>
    </tr>
    <tr>
        <td class="title">Listing 3</td>
    </tr>
</table>

<p>The <code>_function_call</code> function (did I mention that writing a programming language can get confusing when you start writing functions about functions, or variables containing variables?) evaluates the function object that is being called and checks its type.</p>

<p>The type will be either <code>&quot;function&quot;</code> or <code>&quot;native&quot;</code>. The <code>&quot;function&quot;</code> type means that this is a normal function written in Cell. In order to run it, we check we have been given the right number of arguments, and then create a temporary environment based on the environment carried around by the function itself (recall the discussion of scope and closures earlier). Next it puts the argument values into that environment using the names provided in the function definition, and then calls <code>eval_list</code>. It is not shown here, but <code>eval_list</code> just evaluates each line of the function one by one. It actually ignores the values of all those lines except the last one, which it uses as the functionâ€™s return value.</p>

<p>A <code>&quot;native&quot;</code> type is a function that is not written in Cell, but instead is provided as part of Cellâ€™s implementation. This means the function is written in Python (because Cell is written in Python). In this case, <code>fn[1]</code> is a Python function. We check the number of arguments again, and call the function, passing in the environment and the arguments. Python functions that provide native Cell functions actually take one more argument in Python than you see in Cell, because the first argument is the environment in which to run. We donâ€™t create a sub-environment in which to run in this case, because native functions can do all kinds of magic, like modifying the environment in which they are running. Writing these functions is slightly odd because the arguments passed in are tuples representing Cell values, rather than simple Python types, and any symbols etc. need to be looked up in the environment provided.</p>

<p>Back in listing 1, the last type we deal with is <code>&quot;function&quot;</code>. So far weâ€™ve only dealt with calling functions, but this is about the definition of a function â€“ in Cell that means code inside curly braces. In fact, most of the hard work of defining the function has been done by the parser, which made us a list of argument names and expressions that make up the body of the function. All we need to do in the evaluator is wrap all that up with a new <code>Env</code> object that is the environment passed around with the function. This means if we return a function definition from another function, it can still access the variables it could see when it was defined because its environment (and the parent environments) are held with it. We rely on Pythonâ€™s object references to make sure the values we are interested in are still available when we use them.</p>

<p>The <code>else</code> part of listing 1 throws an exception because we have found a syntax tree that we donâ€™t recognise (in the famous last words of all programmers, this should never happen) and we are done.</p>

<h2>Side effects</h2>

<p>With all this discussion of finding values, it seems strange to say that most programming languages, including Cell, actually do nothing with the values they find. In order to make a useful program, the programmer must use the values to produce â€˜side effectsâ€™ â€“ things that make something happen in the world outside the program. In Cell, we have a native function called <code>&quot;print&quot;</code> that prints out values we have calculated. Most other languages have lots of available side effects such as creating and modifying files, displaying windows, and making sounds.</p>

<h2>Summary</h2>

<p>Weâ€™ve completed our journey: this time we saw how to take a syntax tree and turn it into meaningful values. When we combine this with being able to break code into separate chunks (lexing) and building those chunks into a syntax tree (parsing), weâ€™ve covered all the basic building blocks needed to write an interpreter. Are you ready to design your own language?</p>

<p>You can find all the code for Cell at <a href="https://github.com/andybalaam/cell">https://github.com/andybalaam/cell</a>, and I would to hear from you if you have made your own language â€“ let me know through GitHub or Twitter on @andybalaam or in the fediverse on andybalaam@mastodon.social, and check out a video series about Cell on my YouTube page at <a href="https://www.youtube.com/user/ajbalaam">https://</a>www.youtube.com/user/ajbalaam.</p>

<p class="bio"><span class="author"><b>Andy Balaam</b></span>  Andy is happy as long as he has a programming language and a problem. He finds over time he has more and more of each. You can find his open source projects at artificialworlds.net</p>
</p>
<p><strong>Notes:</strong>&nbsp;</p>
<p><em>More fields may be available via dynamicdata ..</em></p>
</div>
</channel>
</rss>
