    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: Programming Your Own Language in C++</title>
        <link>https://members.accu.org/index.php/journals/2252</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>


        <h2>Journal Articles</h2>


<div class="xar-mod-head"><span class="xar-mod-title">Overload Journal #133 - June 2016  + Programming Topics</span></div>

<table border="0" cellpadding="1" cellspacing="0">
    <tbody>
    <tr>
        <td valign="top">
            Browse in :
       </td>
       <td valign="top">

                                            <a href="https://members.accu.org/index.php/journals/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c76/">Journals</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c78/">Overload</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c362/">o133</a>
                    (8)
<br />

                                            <a href="https://members.accu.org/index.php/journals/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c13/">Topics</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c65/">Programming</a>
                    (877)
<br />

                                            <a href="https://members.accu.org/index.php/journals/c362-65/">Any of these categories</a>

                    -                        <a href="https://members.accu.org/index.php/journals/c362+65/">All of these categories</a>
<br />
</td>
   </tr>
   </tbody>
</table>




<div class="xar-error">
   <p>
 <strong>Note:</strong> when you create a new publication type,
the articles module will automatically use the templates
<em>user-display-[publicationtype].xt</em>
and <em>user-summary-[publicationtype].xt</em>.
If those templates do not exist when you try to preview or display a new article,
you'll get this warning :-)  Please place your own templates in themes/<em>yourtheme</em>/modules/articles . The templates will get the extension .xt there. </p>
</div>
<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;Programming Your Own Language in C++</h1>
<p><strong>Author:</strong>&nbsp;Martin Moene</p>
<p>
<strong>Date:</strong> 03 June 2016 14:42:20 +01:00 or Fri, 03 June 2016 14:42:20 +01:00</p>
<p><strong>Summary:</strong>&nbsp;Scripting languages allow dynamic features easily. Vassili Kaplan writes his own in C++ allowing keywords in any human language.
</p>
<p><strong>Body:</strong>&nbsp;<p class="quote">To iterate is human, to recurse divine.<br />
   ~ L. Peter Deutsch</p>
   
<p>Why is yet another scripting language needed? Not only are there already quite a few scripting languages present but there are also a few frameworks to build new languages and parsers, notably ANTLR Framework [<a href="#[ANTLR]">ANTLR</a>] and the Boost Spirit library [<a href="#[Spirit]">Spirit</a>]. Nevertheless, I see two main advantages in writing a language like the one presented in this article: the possibility of adding a new function, a control flow statement or a new data structure on the fly and to have keywords translated to any language with just a few configuration changes (or to add synonyms to existing keywords, so that, for instance, <code>ls</code> and <code>dir</code> could be used interchangeably). Also, as you will see from this article, the learning curve is much shorter to get started adding your own code in C++.</p>

<p>In <em>C Vu</em> [<a href="#[Kaplan15a]">Kaplan15a</a>, <a href="#[Kaplan15b]">Kaplan15b</a>] I published the split-and-merge algorithm to parse a string containing a mathematical expression. Later, in <em>MSDN Magazine</em> [<a href="#[Kaplan15c]">Kaplan15c</a>, <a href="#[Kaplan16]">Kaplan16</a>] I implemented this algorithm in C# and showed how one can implement a scripting language in C# based on the split-and-merge algorithm.</p>

<p>In this article, I am going to extend the split-and-merge algorithm presented earlier and show how one can use it to write a scripting language in C++. In <em>MSDN Magazine</em> [<a href="#[Kaplan16]">Kaplan16</a>] the scripting language was called CSCS (Customized Scripting in C#). For simplicity, I will continue calling the language I discuss here CSCS (Customized Scripting in C++ using the split-and-merge algorithm) as well.</p>

<p>The CSCS language, as described in <em>MSDN Magazine</em> [<a href="#[Kaplan16]">Kaplan16</a>], was still not very mature. In particular, there is a section towards the end of the article mentioning some of the important features usually present in a programming language that CSCS was still missing. In this article Iâ€™ll generalize the CSCS language and show how to implement most of those missing features and a few others in C++.</p>

<p>All the code discussed here is available for download [<a href="#[Downloads]">Downloads</a>].</p>

<h2>The split-and-merge algorithm to parse a language statement</h2>

<p>Here weâ€™ll generalize the split-and-merge algorithm to parse not only a mathematical expression but any CSCS language statement. A separation character must separate all CSCS statements. It is defined in the <span class="filename">Constants.h</span> file as <code>const char END_STATEMENT = ';'</code>.</p>

<p>The algorithm consists of two steps.</p>

<p>In the first step, we split a given string into a list of objects, called â€˜Variablesâ€™. Each <code>Variable</code> consists of an intermediate result (a number, a string, or an array of other <code>Variable</code>s) and an â€˜actionâ€™ that must be applied to this <code>Variable</code>. Previously we called this <code>Variable</code> a â€˜Cellâ€™ and it could hold only a numerical value.</p>

<p>The last element of the created list of <code>Variable</code>s has so called â€˜null actionâ€™, which, for convenience, we denote by the character <code>&quot;)&quot;</code>. It has the lowest priority of 0.</p>

<p>For numbers, an action can be any of <code>+</code>, <code>-</code>, <code>*</code>, <code>/</code>, <code>%</code>, <code>&amp;&amp;</code>, <code>||</code>, and so on. For strings, only <code>+</code> (concatenation), and logical operators <code>&lt;</code>, <code>&lt;=</code>, <code>&gt;</code>, <code>&gt;=</code>, <code>==</code>, <code>!=</code> are defined.</p>

<p>Listing 1 contains an excerpt from the <code>Variable</code> class definition.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
class Variable {
public:
  Variable(): type(Constants::NONE) {}
  Variable(double val): numValue(val),
    type(Constants::NUMBER) {}
  Variable(string str): strValue(str),
    type(Constants::STRING) {}

  string toString() const;
  bool canMergeWith(const Variable&amp; right);
  void merge(const Variable&amp; right);
  void mergeNumbers(const Variable&amp; right);
  void mergeStrings(const Variable&amp; right);

private:
  double numValue;
  string strValue;
  vector&lt;Variable&gt; tuple;
  string action;
  string varname;
  Constants::Type type;
};
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 1</td>
	</tr>
</table>

<p>The separation criteria for splitting a string into a list of <code>Variable</code>s are: an action, an expression in parentheses, or a function (including a variable, which is also treated as a function), previously registered with the parser. We are going to talk how to register a function with the parser in the next section. In Listing 2 you can see all of the actions defined for numbers and their priorities.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
unordered_map&lt;string, int&gt; prio;
prio[&quot;++&quot;] = 10;
prio[&quot;--&quot;] = 10;
prio[&quot;^&quot;]  = 9;
prio[&quot;%&quot;]  = 8;
prio[&quot;*&quot;]  = 8;
prio[&quot;/&quot;]  = 8;
prio[&quot;+&quot;]  = 7;
prio[&quot;-&quot;]  = 7;
prio[&quot;&lt;&quot;]  = 6;
prio[&quot;&gt;&quot;]  = 6;
prio[&quot;&lt;=&quot;] = 6;
prio[&quot;&gt;=&quot;] = 6;
prio[&quot;==&quot;] = 5;
prio[&quot;!=&quot;] = 5;
prio[&quot;&amp;&amp;&quot;] = 4;
prio[&quot;||&quot;] = 3;
prio[&quot;+=&quot;] = 2;
prio[&quot;-=&quot;] = 2;
prio[&quot;*=&quot;] = 2;
prio[&quot;/=&quot;] = 2;
prio[&quot;%=&quot;] = 2;
prio[&quot;=&quot;]  = 2;
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 2</td>
	</tr>
</table>

<p>In the case of an expression in parentheses or a function, we apply recursively the whole split-and-merge algorithm to that expression in parentheses or the function argument in order to get a <code>Variable</code> object as a result. At the end of the first step we are going to have a list of <code>Variable</code>s, each one having an action to be applied to the next <code>Variable</code> in the list. Thanks to the recursion there will be no functions left after step 1, just numbers, strings or lists of numbers, strings or lists of numbers, strings, â€¦ and so on recursively. We call these lists â€˜tuplesâ€™. Internally they are implemented as vectors (see Listing 1). </p>

<p>The data structure holding the string with the expression to parse and a pointer to the character currently being parsed is <code>ParsingScript</code>. An excerpt from its definition is shown in Listing 3.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
class ParsingScript
{
public:
  ParsingScript(const string&amp; d): data(d),
    from(0) {}
  inline char operator()(size_t i) const {
    return data[i]; }
  inline size_t size() const             {
    return data.size(); }
  inline bool stillValid() const         {
    return from &lt; data.size(); }
  inline size_t getPointer() const       {
    return from; }
  inline char current() const            {
    return data[from]; }
  inline char currentAndForward()        {
    return data[from++]; }
  inline char tryNext() const            {
    return from+1 &lt; data.size() ?
           data[from+1] : Constants::NULL_CHAR; }
  inline void forward(size_t delta  = 1) {
    from += delta; }
private:
  string data; // contains the whole script
  size_t from; // a pointer to the 
               // script data above
};
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 3</td>
	</tr>
</table>

<p>The main parsing cycle of the first part of the algorithm is shown in Listing 4.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
do // Main processing cycle of the first part.
{
  char ch = script.currentAndForward (); 
  // get the next character and move one forward

  bool keepCollecting = stillCollecting(script,
    parsingItem, to, action);
  if (keepCollecting)
  { // The char still belongs to the 
    // previous operand.
    parsingItem += ch;
    // Parse until the next extracted character is
    // in the &quot;to&quot; string.
    bool goForMore = script.stillValid() &amp;&amp;
      !Utils::contains(to, script.current()));
    if (goForMore) {
      continue;
    }
  }
  // Done getting the next token. The getValue()
  // call below may recursively call this method if
  // extracted item is a function or is starting
  // with a '('.
  ParserFunction func(script, parsingItem, ch,
    action);
  Variable current = func.getValue(script);

  if (action.empty()) { // find the next &quot;action&quot;
                     // token or a null action '('
    action = updateAction(script, to);
  }

  char next = script.current(); // we've already
                                // moved forward
  bool done = listToMerge.empty() &amp;&amp; 
    (next == END_STATEMENT ||
    (action == Constants::NULL_ACTION &amp;&amp;
     current.getType() != NUMBER);
  if (done) {
    // If there is no numerical result, we are not
    // in a math expression.
    listToMerge.push_back(current);
    return listToMerge;
  }
  current.action = action;
  listToMerge.push_back(current);
  parsingItem.clear();
} while (script.stillValid() &amp;&amp;
    !Utils::contains(to, script.current());
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 4</td>
	</tr>
</table>

<p>The second step consists in merging the list of <code>Variable</code>s created in the first step according to their priorities. Two <code>Variable</code> objects can be merged together only if the priority of the action of the <code>Variable</code> on the left is greater or equal than the priority of the action of the <code>Variable</code> on the right. If not, we jump to the <code>Variable</code> on the right and merge it with the <code>Variable</code> on its right first, and so on, recursively. As soon as the <code>Variable</code> on the right has been merged with the <code>Variable</code> next to it, we return back to the original <code>Variable</code>, the one we werenâ€™t able to merge before, and try to remerge it with the newly created <code>Variable</code> on its right. Note that eventually we will be able to merge the list since the last <code>Variable</code> in this list has a null action with zero priority.</p>

<p>Check out the implementation of the second step of the algorithm in Listing 5. The function <code>merge()</code> is called from outside with the <code>mergeOneOnly</code> parameter set to <code>false</code>. That parameter is set to <code>true</code> when the function calls itself recursively, because we need to merge the <code>Variable</code> on the right with the <code>Variable</code> on its right just once, and return back in order to remerge the current <code>Variable</code>.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
Variable Parser::merge(Variable&amp; current, 
  size_t&amp; index, vector&lt;Variable&gt;&amp; listToMerge,
  bool mergeOneOnly)
{
  while (index &lt; listToMerge.size()) {
    Variable&amp; next = listToMerge[index++];
    while (!current.canMergeWith(next)) {
      merge(next, index, listToMerge,
            true/*mergeOneOnly*/);
    }
    current.merge(next);
    if (mergeOneOnly) {
      break;
    }
  }
  return current;
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 5</td>
	</tr>
</table>

<p>Letâ€™s see an example of applying the split-and-merge algorithm to the following CSCS language statements:</p>

<pre class="programlisting">
  a = &quot;blah&quot;; b = 10; c = a == &quot;blue&quot; || b == 1;
  print(c);</pre>

<p>We have four statements above, separated by the <code>;</code> character. </p>

<p>The parsing of the first two statements is analogous â€“ as soon as the parser gets the <code>=</code> token, it will register the parameters on the left (<code>a</code> and <code>b</code>), mapping them to their corresponding values (<code>&quot;blah&quot;</code> and <code>10</code>). We will see how to register variables and functions with the parser in the next section (actually for the parser it doesnâ€™t matter whether it is a variable or a function â€“ they are all treated the same). Note that there is no need to declare any variables â€“ they are all deduced from the context.</p>

<p>More interesting is the third statement:</p>

<pre class="programlisting">
  c = a == &quot;blue&quot; || b == 1;</pre>
  
<p>When parsing the right part of this statement (after the assignment), as soon as the parser gets a function or a variable (this can be anything that is not in quotes and is not a number) it will try to find if there is already a corresponding variable or a function registered with the parser. If not, a <code>ParsingException</code> will be thrown, but in our case we have already defined <code>a</code> and <code>b</code> in the previous two statements, so their values will be substituted, transforming the right side of the statement above to:</p>

<pre class="programlisting">
  &quot;blah&quot; == &quot;blue&quot; || 10 == 1;</pre>

<p>The first step of the split-and-merge algorithm produces the following list of <code>Variable</code>s from the above statements:</p>

<pre class="programlisting">
Split(&quot;blah&quot; == &quot;blue&quot; || 10 == 1) â†’
1. Variable(strValue = &quot;blah&quot;, action = &quot;==&quot;)
2. Variable(strValue = &quot;blue&quot;, action = &quot;||&quot;)
3. Variable(numValue = 10, action = &quot;==&quot;)
4. Variable(numValue = 1, action = &quot;)&quot;)</pre>

<p>In the second part of the algorithm we merge the list of <code>Variable</code>s above. The <code>Variable</code>s 1 and 2 can be merged on the first pass since the priority of the <code>&quot;==&quot;</code> action is higher than the priority of the <code>&quot;||&quot;</code> action according to the Listing 2. The merging of <code>Variable</code>s 1 and 2 consists in applying the action <code>&quot;==&quot;</code> of the first variable to the values of both variables, leading to a new <code>Variable</code> having the action of the <code>Variable</code> on its right:</p>

<pre class="programlisting">
  Merge(Variable(strValue = &quot;blah&quot;, action = &quot;==&quot;),
  Variable(strValue = &quot;blue&quot;, action = &quot;||&quot;)) = 
  Variable(&quot;blah&quot; == &quot;blue&quot;, 
    action = &quot;||&quot;) = Variable(numValue = 0, 
    action = &quot;||&quot;)</pre>
	
<p>Next, we merge <code>Variable(numValue = 0, action = &quot;||&quot;)</code> with the next <code>Variable</code> in the list, <code>Variable(numValue = 10, action = &quot;==&quot;)</code>. But now the action <code>= &quot;||&quot;</code> has a lower priority than the action <code>= &quot;==&quot;</code>, therefore we need to merge first the <code>Variable(numValue = 10, action = &quot;==&quot;)</code> with the next <code>Variable(numValue = 1, action = &quot;)&quot;)</code>. Since the null action <code>&quot;)&quot;</code> has the lowest priority, the merge can be done, producing a new <code>Variable</code>:</p>

<pre class="programlisting">
  Merge(Variable(numValue = 10, 
    action = &quot;==&quot;),Variable(numValue = 1,
    action = &quot;)&quot;)) =
  Variable(10 == 1, 
    action = &quot;)&quot;) = Variable(numValue = 0, 
    action = &quot;)&quot;)</pre>
	
<p>Since this merge was successful we must now get back to the <code>Variable(numValue = 0, action = &quot;||&quot;)</code> and remerge it with the newly created <code>Variable(numValue = 0, action = &quot;)&quot;)</code>, producing:</p>

<pre class="programlisting">
  Merge(Variable(numValue = 0,
    action = &quot;||&quot;),Variable(numValue = 0, 
    action = &quot;)&quot;)) =
  Variable(0 || 0,
    action = &quot;)&quot;) = Variable(numValue = 0,
    action = &quot;)&quot;)</pre>

<p>Since there are no more <code>Variable</code>s left in the list, the result of merging is 0. This value will be assigned to the variable <code>&quot;c&quot;</code> and registered with the parser.</p>

<p>The last statement is <code>&quot;print(c);&quot;</code>. After extracting the token <code>print</code> the parser will look if there is a function named <code>print</code> already registered with the parser. Since there is one, the parser will recursively call the whole split-and-merge algorithm on the argument of the <code>print()</code> function, <code>&quot;c&quot;</code>. Since <code>&quot;c&quot;</code> was registered with the parser in the previous step, the parser will return back its value, 0, to the print function. Letâ€™s see more closely how variables and functions can be implemented and registered with the parser taking as an example the <code>print()</code> function.</p>

<h2>Registering functions and variables with the parser</h2>

<p>All the functions that can be added to the parser must derive from the <code>ParserFunction</code> class.</p>

<p>The <code>Identity</code> is a special function which is called when we have an argument in parentheses. It just calls the main entry method of the split-and-merge algorithm to load the whole expression in parentheses:</p>

<pre class="programlisting">
 class IdentityFunction : public ParserFunction
 {
 public:
   virtual Variable evaluate(ParsingScript&amp; script)
   {
     return Parser::loadAndCalculate(script);
   }
 };</pre>

<p>All split-and-merge functions and variables are implemented similarly.</p>

<p>There are three basic steps to register a function with the parser:</p>

<ul>
	<li>Define a function keyword token, i.e. the name of the function in the scripting language, CSCS, e.g.:
	
<pre class="programlisting">
    static const string PRINT; // in Constants.h

    const string Constants::PRINT = &quot;print&quot;; 
                               // in Constants.cpp</pre>
	</li>
	
	<li>Implement the class to be mapped to the keyword from the previous step. Basically just the <code>evaluate()</code> method must be overridden. E.g. for the <code>print()</code> function:

<pre class="programlisting">
    Variable 
      PrintFunction::evaluate(ParsingScript&amp;
        script)
    {
      vector&lt;Variable&gt; args =
      Utils::getArgs(script, Constants::START_ARG,
      Constants::END_ARG);
      for (size_t i = 0; i &lt; args.size(); i++) {
        cout &lt;&lt; args[i].toString();
      }
      cout &lt;&lt; endl;
      return Variable::emptyInstance;
    }</pre>
	</li>

	<li>Map an object of the class implemented in the previous step with the previously defined keyword as follows:

<pre class="programlisting">
    ParserFunction::addGlobalFunction
           (Constants::PRINT, new  PrintFunction());</pre>
	</li>
</ul>

<p><code>Utils::getArgs()</code> auxiliary function parses the arguments of the print function and first gets a list of strings, separated by commas from <code>print(string1, string2, ..., stringN)</code>. Then it recursively applies the split-and-merge algorithm to each of these <code>string1, string2, ..., stringN</code>. Finally, it prints them out to the terminal.</p>

<p>The <code>addGlobalFunction()</code> method (see Listing 6) just adds a new entry to the global dictionary <code>s_functions</code> (implemented as an <code>unordered_map</code>) used by the parser to map keywords to functions.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
Variable IfStatement::evaluate(ParsingScript&amp; script)
{
  size_t startIfCondition = script.getPointer();
  
  Variable result = Parser::loadAndCalculate
    (script, Constants::END_ARG_STR);
  bool isTrue = result.numValue != 0;
    
  if (isTrue) {
    result = processBlock(script);
        
    if (result.type == 
          Constants::BREAK_STATEMENT ||
        result.type ==
          Constants::CONTINUE_STATEMENT) {
        // Got here from the middle of the if-block.
       // Skip it.
         script.setPointer(startIfCondition);
      skipBlock(script);
    }
    skipRestBlocks(script);
    return result;
  }
    
  // We are in Else. Skip everything in the 
  // If statement.
  skipBlock(script);
    
  ParsingScript nextData(script.getData(),
    script.getPointer());
  string nextToken =
    Utils::getNextToken(nextData);
    
  if (Constants::ELSE_IF_LIST.find(nextToken) !=
      Constants::ELSE_IF_LIST.end()) {
    script.setPointer(nextData.getPointer() + 1);
    result = processIf(script);
  }
  if (Constants::ELSE_LIST.find(nextToken) !=
      Constants::ELSE_LIST.end()) {
    script.setPointer(nextData.getPointer() + 1);
    result = processBlock(script);
  }
  return Variable::emptyInstance;
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 6</td>
	</tr>
</table>

<pre class="programlisting">
  void ParserFunction::addGlobalFunction
    (const string&amp; name, ParserFunction* function)
  {
    auto tryInsert = 
      s_functions.insert({name, function});
    if (!tryInsert.second) {
      // The variable or function already exists.
      // Delete it and replace with the new one.
      delete tryInsert.first-&gt;second;
      tryInsert.first-&gt;second = function;
    }
  }</pre>

<p>As weâ€™ve mentioned before, we treat a variable as a function: as soon as the parser gets an expression like <code>&quot;a = something&quot;</code>, it will register a function with the keyword <code>&quot;a&quot;</code> and map it to the <code>Variable</code> <code>&quot;something&quot;</code> (which will be calculated applying recursively the split-and-merge algorithm). In C++ the code for this is:</p>

<pre class="programlisting">
  ParserFunction::addGlobalFunction(&quot;a&quot;,
    something /*Variable*/
);</pre>

<h2>Implementing the if â€“ else if â€“ else control flow statements</h2>

<p>Letâ€™s see how to implement the <code>if()</code> â€“ <code>else if()</code> â€“ <code>else()</code> functionality in CSCS.</p>

<p>The first and the third steps are clear: define the <code>if</code> constant and register a class implementing the <code>if</code> control flow statement with the parser, the same way we registered the <code>print()</code> function above.</p>

<p>The second step, the implementation of the <code>if</code> statement, is shown in Listing 6.</p>

<p>First we evaluate the condition inside of <code>if()</code> by recursively calling the split-and-merge algorithm on that condition. If the condition is true we process the whole <code>if()</code> block, recursively calling the split-and-merge algorithm on each statement inside of the <code>processBlock()</code> method. If the condition is false we first skip the whole <code>if()</code> block in the <code>skipBlock()</code> method. Then we evaluate the <code>else()</code> and <code>else if()</code> statements. The evaluation of <code>else if()</code> is basically same as the evaluation of the <code>if()</code> statement itself, so for <code>else if()</code> we recursively call the <code>if()</code> statement evaluation.</p>

<p>Note that we enhanced the execution of the <code>if</code>-statement here â€“ as soon as there is a <code>break</code> or a <code>continue</code> statement, we get out of the <code>if ()</code> block â€“ same way we get out from the <code>while()</code> block. This can be useful in case of nested <code>if</code>s.</p>

<p>Similarly, we can register any function with the parser, e.g. <code>while()</code>, <code>for()</code>, <code>try()</code>, <code>throw()</code>, <code>include()</code>, etc. We can also define local or global variables in the same way. In the next section we are going to see how to define functions in CSCS and add passed arguments as local variables to CSCS.</p>

<h2>Implementing custom functions in CSCS</h2>

<p>To write a custom function in the scripting language, letâ€™s introduce two functions in C++, <code>FunctionCreator</code> and <code>CustomFunction</code>, both deriving from the <code>ParserFunction</code> base class. A <code>FunctionCreator</code> object is registered with the parser in the same way we registered <code>if()</code> and <code>print()</code> functions above.</p>

<p>As soon as the parser gets a token with the <code>&quot;function&quot;</code> keyword, an instance of the <code>FunctionCreator</code> will be executed, namely, its <code>evaluate()</code> method, see Listing 7.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
Variable FunctionCreator::evaluate(ParsingScript&amp;
    script) {
  string funcName = Utils::getToken(script,
     TOKEN_SEPARATION);
  vector&lt;string&gt; args =
     Utils::getFunctionSignature(script);
  string body = Utils::getBody(script, '{', '}');

  CustomFunction* custFunc = 
     new CustomFunction(funcName, body, args);
  ParserFunction::addGlobalFunction(funcName,
     custFunc);

  return Variable(funcName);
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 7</td>
	</tr>
</table>

<p>Basically, it just creates a new object, an instance of the <code>CustomFunction</code>, and initializes it with the extracted function body and the list of parameters. It also registers the name of the custom function with the parser, so the parser maps that name with the new <code>CustomFunction</code> object which will be called as soon as the parser encounters the function name keyword.</p>

<p>So all of the functions that we implement in the CSCS code correspond to different instances of the <code>CustomFunction</code> class. The custom function does primarily two things, see Listing 8. First, it extracts the function arguments and adds them as local variables to the parser (they will be removed from the parser as soon as the function execution is finished or an exception is thrown). It also checks that the number of actual parameters is equal to the number of the registered ones (this part is skipped for brevity).</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
Variable CustomFunction::evaluate(ParsingScript&amp; script)
{
  // 0. Extract function arguments.
  vector&lt;Variable&gt; args = Utils::getArgs(script);
  // 1. Add passed arguments as local variables.
  StackLevel stackLevel(m_name);
  for (size_t i = 0; i &lt; m_args.size(); i++) {
    stackLevel.variables[m_args[i]] = args[i];
  }
  ParserFunction::addLocalVariables(stackLevel);
  // 2. Execute the body of the function.
  Variable result;
  ParsingScript funcScript(m_body);
  while (funcScript.getPointer() &lt;
     funcScript.size()-1 &amp;&amp; !result.isReturn) {
    result =
       Parser::loadAndCalculate(funcScript);
    Utils::goToNextStatement(funcScript);
  }
  // 3. Return the last result of the execution.
  ParserFunction::popLocalVariables();
  return result;
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 8</td>
	</tr>
</table>

<p>Second, the body of the function is evaluated, using the main parser entry point, the <code>loadAndCalculate()</code> method.</p>

<p>If the function body contains calls to other functions, or to itself, the calls to the <code>CustomFunction</code> can be recursive.</p>

<p>Letâ€™s see this with an example in CSCS. It calculates recursively the Fibonacci numbers, see Listing 9.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
cache[&quot;fib&quot;] = 0;
function fibonacci(n) {
  if (!isInteger(n)) {
    exc = &quot;Fibonacci is for integers only&quot;
      &quot;(n=&quot;+ n +&quot;)&quot;;
    throw (exc);
  }
  if (n &lt; 0) {
    exc = &quot;Negative number (n=&quot;+ n +&quot;) supplied&quot;;
    throw (exc);
  }
  if (n &lt;= 1) {
    return n;
  }
  if (contains(cache[&quot;fib&quot;], n)) {
    result = cache[&quot;fib&quot;][n];
    print(&quot;  found in cache fib(&quot;, n, &quot;)=&quot;,
      result);
    return result;
  }
  result = fibonacci(n - 2) + fibonacci(n - 1);
  cache[&quot;fib&quot;][n] = result;
  return result;
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 9</td>
	</tr>
</table>

<p>The Fibonacci function above uses an auxiliary <code>isInteger()</code> function, also implemented in CSCS:</p>

<pre class="programlisting">
  function isInteger(candidate) {
    return candidate == round(candidate);
  }</pre>

<p>The <code>isInteger()</code> function calls yet another, <code>round()</code> function. The implementation of the <code>round()</code> function is already in the C++ code and is analogous to the implementation of the <code>print()</code> function that we saw in the previous section.</p>

<p>To execute the Fibonacci function with different arguments we can use the following CSCS code:</p>

<pre class="programlisting">
  n = ...;
  print(&quot;Calculating Fibonacci(&quot;, n, &quot;)...&quot;);
  try {
    f = fibonacci(n);
    print(&quot;fibonacci(&quot;, n, &quot;)=&quot;, f);
  } catch(exc) {
    print(&quot;Caught: &quot; + exc);
  }</pre>

<p>We get the output in Figure 1 for different values of <code>n</code>.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
Calculating Fibonacci(10)...
  found in cache fib(2)=1
  found in cache fib(3)=2
  found in cache fib(4)=3
  found in cache fib(5)=5
  found in cache fib(6)=8
  found in cache fib(7)=13
  found in cache fib(8)=21
Fibonacci(10)=55

Calculating Fibonacci(-10)...
Caught: Negative number (n=-10) supplied at
  fibonacci()

Calculating Fibonacci(1.500000)...
Caught: Fibonacci is for integers only
(n=1.500000) at
  fibonacci()
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Figure 1</td>
	</tr>
</table>

<p>Since the exceptions happened at the global level, the exception stacks printed consisted only of the <code>fibonacci()</code> function itself. To keep track of the execution stack, i.e. CSCS functions being called, internally we use a C++ stack data structure, where we add every executing <code>CustomFunction</code> object as soon as we start function execution and remove it as soon as the execution is over. In case of an exception we just print out the contents of this stack. Then we clear the execution stack up to the level where the exception was caught. The implementation of the execution stack and of the <code>try()</code> and <code>throw()</code> functions can be consulted in [<a href="#[Downloads]">Downloads</a>].</p>

<p>The implementation of the Fibonacci function in Listing 9 uses caching of already calculated Fibonacci numbers by using dictionaries â€“ we will see how one can implement dictionaries next.</p>

<h2>Arrays and other data structures</h2>

<p>To declare an array and initialize it with some data we use the same CSCS language statement. The array elements for the initialization are declared between the curly braces. Here is an example in CSCS:</p>

<pre class="programlisting">
  a = 10;
  arr = {++a-a--, ++a*exp(0)/a--, -2*(--a - ++a)};
  i = 0;
  while(i &lt; size(arr)) {
    print(&quot;a[&quot;, i, &quot;]=&quot;, arr[i], &quot;, expecting &quot;,
      i);
    i++;
  }</pre>

  <p>The number of elements in the array is not explicitly declared since it can be deduced from the assignment.</p>

<p>The function <code>size()</code> is implemented in C++. It returns the number of elements in an array. In case the passed argument is not an array, it will return the number of characters in it.</p>

<p>Internally an array is implemented as a vector, so you can add elements to it on the fly. In CSCS we access elements of an array, or modify them, by using the squared brackets. As soon as the parser gets a token containing an opening squared bracket, it knows that it is for an array index, so it applies recursively the whole split-and-merge algorithm to the string between the squared brackets to extract the index value. There can be unlimited number of dimensions of an array (well, limited by the system resources) because the array is implemented as a <code>vector&lt;Variable&gt;</code>: the <code>Variable</code> class has a member called <code>&quot;tuple&quot;</code> and declared as <code>vector&lt;Variable&gt;</code>. For instance, accessing <code>a[i][j][k]</code> in the CSCS code means <code>a.tuple[i].tuple[j].tuple[k]</code> in the C++ underlying code (<code>&quot;a&quot;</code> is a <code>Variable</code>, see Listing 1 for <code>Variable</code> definition).</p>

<p>In other words, for each consequent index in squared brackets the parser will create a new <code>Variable</code> of type array or use an existing <code>&quot;tuple&quot;</code> member.</p>

<p>If we access an element of an array and that element has not been initialized yet, an exception will be thrown by the parser. However, itâ€™s possible to assign a value to just one element of an array, even if the index used is greater than the number of elements in the array and even if the array has not been initialized yet. In this case the non-existing elements of the array will be initialized with the empty values. The CSCS code below is legal, even if the array has not been initialized before:</p>

<pre class="programlisting">
  i = 10;
  while(--i &gt; 0) {
    array[i] = 2*i;
  }
  print(&quot;array[9]=&quot;, array[9]);       &lt;CodeComment&gt;
// prints 18&lt;/CodeComment&gt;

  print(&quot;size(array)=&quot;, size(array)); &lt;CodeComment&gt;
// prints 10&lt;/CodeComment&gt;

</pre>

<p>We can also add other data structures to the CSCS language. Letâ€™s see an example of adding a dictionary, implemented internally as an <code>unordered_map</code>. We add the following member to the <code>Variable</code> class:</p>

<pre class="programlisting">
  unordered_map&lt;string, size_t&gt; dictionary;</pre>

<p>This is the mapping between a dictionary key and an index of already existing member <code>vector&lt;Variable&gt;</code> tuple, where the actual dictionary value will be stored. So every time a new key-value pair is added to the dictionary the following code is executed:</p>

<pre class="programlisting">
  tuple.emplace_back(var);
  dictionary[key] = tuple.size() - 1;</pre>

<p>Every time an existing key is accessed the following code is executed (a check for existence is skipped):</p>

<pre class="programlisting">
  auto it = dictionary.find(key);
  size_t ptr = it-&gt;second;
  return tuple[ptr];
</pre>

<p>With a few changes one can use not only strings, but anything else as a dictionary key. Similarly, we can add other data structures to CSCS â€“ as long as a data structure exists, or can be implemented in C++, it can be added to CSCS as well. </p>

<p>In Listing 6 we saw the implementation of the <code>if()</code> - <code>else if()</code> - <code>else</code> control flow statements. Towards the end of the listing you might have scratched your head, asking why we didnâ€™t compare the extracted token with the <code>&quot;else&quot;</code> string, but we did a comparison with the <code>ELSE_LIST</code>? The reason is that the <code>ELSE_LIST</code> contains all possible synonyms and translations of the <code>&quot;else&quot;</code> keyword to any of the languages that the user might have supplied in the configuration file. How is a keyword translation added to the parser? </p>

<h2>How to add keyword synonyms and language translations</h2>

<p>One of the advantages of writing a custom programming language is a possibility to have the keywords in any language (besides the â€˜baseâ€™ language, understandably chosen to be English). Also we can create our language in such a way, that there will be no need to remember that one must use <code>dir</code>, <code>copy</code>, <code>move</code>, <code>findstr</code> on Windows, but <code>ls</code>, <code>cp</code>, <code>mv</code>, and <code>grep</code> on Unix; and that a very nice shortcut <code>cd..</code> works only on Windows: in our language we can have both! And with just a few configuration changes.</p>

<p>Here is how we can add keyword synonyms and translations to the CSCS language.</p>

<p>First, we define them in a configuration file; Figure 2 is an incomplete example of a configuration file with Spanish translations. The same configuration file may contain an arbitrary number of languages. For example, we can also include the keyword synonyms in the same file (see Figure 3).</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
  function    = funciÃ³n
  include     = incluir
  if          = si
  else        = sino
  elif        = sino_si
  return      = regresar
  print       = imprimir
  size        = tamaÃ±o
  while       = mientras
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Figure 2</td>
	</tr>
</table>


<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
  ls          = dir
  cp          = copy
  mv          = move
  rm          = rmdir
  grep        = findstr
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Figure 3</td>
	</tr>
</table>

<p>After reading the keyword translations we add them to the parser one by one (see Listing 10).</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
void addTranslation(const string&amp; origName,
                    const string&amp; translation)
{
  ParserFunction* origFunction =
    ParserFunction::getFunction(origName);
  if (origFunction != 0) {
    ParserFunction::addGlobalFunction
       (translation, origFunction);
  }
  tryAddToSet(originalName, translation, CATCH,
    CATCH_LIST);
  tryAddToSet(originalName, translation, ELSE,
    ELSE_LIST);
  //â€¦ other sets
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 10</td>
	</tr>
</table>

<p>First, we try to add a translation to one of the registered functions (like <code>print()</code>, <code>sin()</code>, <code>cos()</code>, <code>round()</code>, <code>if()</code>, <code>while()</code>, <code>try()</code>, <code>throw()</code>, etc.). Basically, we map the new keyword to the <code>ParserFunction</code>, corresponding to the original keyword. Therefore, as soon as the parser extracts either the original keyword (say <code>&quot;cp&quot;</code>), or the one added from the configuration file (e.g. <code>&quot;copy&quot;</code>), the same C++ function <code>CopyFunction</code>, deriving from the <code>ParserFunction</code> class, will be called.</p>

<p>Then we try to add new keywords to the sets of additional keywords, that are not functions (e.g. a <code>&quot;catch&quot;</code> is processed only together with the try-block, <code>&quot;else&quot;</code> and <code>&quot;else if&quot;</code> are processed together with the <code>if</code>-block, etc.) The <code>tryAddToSet()</code> is an auxiliary template function that adds a translation to a set, in case the original keyword name belongs to that set (e.g. <code>CATCH = &quot;catch&quot;</code> belongs to the <code>CATCH_LIST</code>).</p>

<p>Listing 11 is an example of the CSCS code using Spanish keywords and functions.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
palabras = {&quot;Este&quot;, &quot;sentido&quot;, &quot;es&quot;, &quot;en&quot;,
 &quot;EspaÃ±ol&quot;};
tam = tamaÃ±o(palabras);
i = 0;
mientras(i &lt; tam) {
  si (i % 2 == 0) {
    imprimir(palabras[i]);
  }
  i++;
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 11</td>
	</tr>
</table>

<h2>Conclusions</h2>

<p>Using the techniques presented in this article and consulting the source code in [<a href="#[Downloads]">Downloads</a>] you can develop your own fully customized language using your own keywords and functions. The resulting language will be interpreted at runtime directly, statement by statement.</p>

<p>We saw that implementing a printing function and a control flow statement is basically the same: one needs to write a new class, deriving from the <code>ParserFunction</code> class and override its <code>evaluate()</code> method. Then one needs to register that function with the parser, mapping it to a keyword. The <code>evaluate()</code> method will be called by the parser as soon as the parser extracts the keyword corresponding to this function. For the lack of space we didnâ€™t show how to implement the <code>while()</code>, <code>try</code>, <code>throw</code>, <code>break</code>, <code>continue</code>, and <code>return</code> control flow statements but they are all implemented analogously. The same applies to the prefix and postfix <code>++</code> and <code>--</code> operators that we did not have space to show but you can consult in [<a href="#[Downloads]">Downloads</a>].</p>

<p>Using the above approach of adding a new function to the parser, anything can be added to the CSCS language as long as it can be implemented in C++.</p>

<p>You can also use this custom language as a shell language (like bash on Unix or PowerShell on Windows) to perform different file or operating system commands (find files, list directories or running processes, kill or start a new process from the command line, and so on). Stay tuned to see how to do that in our next article.</p>

<h2>References</h2>

<p class="bibliomixed"><a id="[ANTLR]"></a>[ANTLR]  ANTLR Framework, <a href="http://www.antlr.org">http://www.antlr.org</a> </p>

<p class="bibliomixed"><a id="[Downloads]"></a>[Downloads]  Implementation of the CSCS language in C++,<a href="http://www.ilanguage.ch/p/downloads.html">http://www.ilanguage.ch/p/downloads.html</a></p>

<p class="bibliomixed"><a id="[Kaplan15a]"></a>[Kaplan15a]  V. Kaplan, â€˜Split and Merge Algorithm for Parsing Mathematical Expressionsâ€™, ACCU <em>CVu</em>, 27-2, May 2015, <a href="http://accu.org/var/uploads/journals/CVu272.pdf">http://accu.org/var/uploads/journals/CVu272.pdf</a> </p>

<p class="bibliomixed"><a id="[Kaplan15b]"></a>[Kaplan15b]  V. Kaplan, â€˜Split and Merge Revisitedâ€™, ACCU <em>CVu</em>, 27-3, July 2015, <a href="http://accu.org/var/uploads/journals/CVu273.pdf">http://accu.org/var/uploads/journals/CVu273.pdf</a></p>

<p class="bibliomixed"><a id="[Kaplan15c]"></a>[Kaplan15c]  V. Kaplan, â€˜A Split-and-Merge Expression Parser in C#â€™, <em>MSDN Magazine</em>, October 2015, <a href="https://msdn.microsoft.com/en-us/magazine/mt573716.aspx">https://msdn.microsoft.com/en-us/magazine/mt573716.aspx</a></p>

<p class="bibliomixed"><a id="[Kaplan16]"></a>[Kaplan16]  V. Kaplan, â€˜Customizable Scripting in C#â€™, <em>MSDN Magazine</em>, February 2016, <a href="https://msdn.microsoft.com/en-us/magazine/mt632273.aspx">https://msdn.microsoft.com/en-us/magazine/mt632273.aspx</a></p>

<p class="bibliomixed"><a id="[Spirit]"></a>[Spirit]  Boost Spirit Library, <a href="http://boost-spirit.com">http://boost-spirit.com</a> </p>


<h2>Acknowledgements</h2>

<p>Iâ€™d like to thank Frances Buontempo and the <em>Overload</em> review team for providing their feedback which enabled me to elevate the content presented in this article.</p>
</p>
<p><strong>Notes:</strong>&nbsp;</p>
<p><em>More fields may be available via dynamicdata ..</em></p>
</div>
</channel>
</rss>
