    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: Stream-Based Parsing in C++</title>
        <link>https://members.accu.org/index.php/articles/346</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">Programming Topics + Overload Journal #56 - Aug 2003</span></div>

<table border="0" cellpadding="1" cellspacing="0">
    <tbody>
    <tr>
        <td valign="top">
            Browse in :
       </td>
       <td valign="top">

                                            <a href="https://members.accu.org/index.php/articles/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c13/">Topics</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c65/">Programming</a>
<br />

                                            <a href="https://members.accu.org/index.php/articles/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c76/">Journals</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c78/">Overload</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c156/">56</a>
<br />

                                            <a href="https://members.accu.org/index.php/articles/c65-156/">Any of these categories</a>

                    -                        <a href="https://members.accu.org/index.php/articles/c65+156/">All of these categories</a>
<br />
</td>
   </tr>
   </tbody>
</table>




<div class="xar-error">
   <p>
 <strong>Note:</strong> when you create a new publication type,
the articles module will automatically use the templates
<em>user-display-[publicationtype].xt</em>
and <em>user-summary-[publicationtype].xt</em>.
If those templates do not exist when you try to preview or display a new article,
you'll get this warning :-)  Please place your own templates in themes/<em>yourtheme</em>/modules/articles . The templates will get the extension .xt there. </p>
</div>
<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;Stream-Based Parsing in C++</h1>
<p><strong>Author:</strong>&nbsp;</p>
<p>
<strong>Date:</strong> 02 August 2003 22:56:32 +01:00 or Sat, 02 August 2003 22:56:32 +01:00</p>
<p><strong>Summary:</strong>&nbsp;</p>
<p><strong>Body:</strong>&nbsp;<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e18" id="d0e18"></a></h2>
</div>
<p>This paper shows how to implement general parsers as a family of
streams. This allows for very readable, maintainable and flexible
parsers. The method is illustrated with a parser for simple
arithmetic expressions, but can easily be extended to a parser for
a full-fledged programming language. Moreover, the same technique
can be applied to the entire process from lexing to execution,
since actions can be associated with each sub-parser.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e22" id=
"d0e22"></a>Introduction</h2>
</div>
<p>The parsing of input is a very important problem appearing in
many different parts of software development - parsing user input
in the form of command-line options, the parsing of arithmetic
expressions in a calculator, parsing values in a user-defined
configuration file or compiling some programming language.</p>
<p>This makes it important to have different approaches. What we
will present here in this paper, is a method of parsing inspired by
what is done in functional programming (FP). The paper is
<span class="emphasis"><em>not</em></span> about functional
programming in C++Moreover, the presence of too many parentheses
makes the code harder to read and is hence also more error
prone.<sup>[<a name="d0e32" href="#ftn.d0e32" id=
"d0e32">1</a>]</sup>, rather it is about how to implement a
particularly elegant idea from FP in an object oriented
context.</p>
<p>The entire process, from lexical analysis to actual execution of
a program can be divided into a number of individual steps:</p>
<pre class="programlisting">
source -&gt; lexer -&gt; parser -&gt; optimiser
       -&gt; execution -&gt; output
</pre>
<p>In FP, this could be represented by a family of functions:</p>
<pre class="programlisting">
output = execution <sup>o</sup> optimiser <sup>o</sup> parser
         <sup>o</sup> lexer(source)
</pre>
<p>The advantage of this approach is <span class=
"emphasis"><em>flexibility</em></span>: it is easy to omit or
modify individual steps, which is important for playing around with
different approaches in language design or compiler
construction.</p>
<p>One could attempt to implement the different sub-parsers as
functions or (better) function objects in C++. The disadvantage of
doing so, however, is the proliferation of parentheses this would
entail. In C++ there is no operator for function (or functor)
composition corresponding to <sup>o</sup> above (which may be
called many different things in a functional language - e.g. a
period or simply the letter 'o'). Nor can such an operator be
defined in a natural way - none of the overloadable operators are
well suited for becoming composition operators<sup>[<a name="d0e63"
href="#ftn.d0e63" id="d0e63">2</a>]</sup>.</p>
<p>Moreover, the presence of too many parentheses makes the code
harder to read and is hence also more error prone.</p>
<p>One can consider a function as a stream, however. Instead of
writing <tt class="literal">x=f(y)</tt> one could try to write
<tt class="literal">x &lt;&lt; f &lt;&lt; y</tt>. Here, we do have
a natural operator for composition, namely the stream operator
<tt class="literal">&lt;&lt;</tt>. This suggests considering the
entire process as a collection of streams:</p>
<pre class="programlisting">
output &lt;&lt; execution &lt;&lt; optimiser
       &lt;&lt; parser &lt;&lt; lexer &lt;&lt; source
</pre>
<p>This won't work, however, since <tt class=
"literal">&lt;&lt;</tt> is left-associative. Hence, either one
needs to introduce parentheses once more, e.g., write <tt class=
"literal">x &lt;&lt; (f &lt;&lt; y)</tt> and so on, or one will
need to use another, right-associative operator. The former case
will automatically suffer from the abovementioned problems with
using function objects.</p>
<p>Consequently, we will have to use a right-associative operator
instead. Unfortunately, C++ does not allow one to declare an
operator to have a user-defined associativity, so instead we will
have to replace <tt class="literal">&lt;&lt;</tt> by one of the
right-associative operators defined by C++. There are very, very
few of these. In fact, the only binary rightassociative operators
are the assignment operators <tt class="literal">+=</tt>,
<tt class="literal">*=</tt>,.... Thus, the simplest possible change
is to use <tt class="literal">operator&lt;&lt;=</tt>. This also has
the added advantage of resembling an arrow, more showing the
direction of the flow of data.</p>
<p>Now, even though strictly speaking the sub-parsers won't be
stream objects, we will still refer to them as such by analogy to
ordinary streams (I/O stream, file streams, string streams) present
in C++. The reason being that the defining feature of a stream is
its ability to process input and to be pipelined, which is
precisely what our generalised stream will do.</p>
<p>In this paper, we will concentrate on the <span class=
"emphasis"><em>parsing</em></span> step, but in such a way that the
remaining steps of the process could be implemented in a similar
fashion. In fact, we will see how to make a simple modification to
our parser and turn it into an expression calculator. For
concreteness, we will consider a particular example, which is
simple enough not to introduce unnecessary complications yet
complex enough to be non-trivial. The chosen example is the parsing
of arithmetic expressions like <tt class="literal">1+(2-3)*4</tt>.
We will only consider integers<sup>[<a name="d0e117" href=
"#ftn.d0e117" id="d0e117">3</a>]</sup>. Furthermore, we will not
worry about the precedence levels of the standard arithmetic
operators - this could be done by slightly modifying the grammar as
shown in for instance, [<a href="#Martin">Martin</a>]. It will be
shown later how to accommodate this with very few changes to our
framework.</p>
<p>Hence, the basic ingredients in our language are:</p>
<pre class="programlisting">
&lt;digit&gt;    ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 |
               7 | 8 | 9
&lt;operator&gt; ::= + | - | * | /
&lt;lpar&gt;     ::= (
&lt;rpar&gt;     ::= )
</pre>
<p>To this we add the definition of a number as a sequence of one
or more digits:</p>
<pre class="programlisting">
&lt;num&gt; ::= &lt;digit&gt;+
</pre>
<p>Now, a simple expression is either a number on its own or it is
two numbers separated by an operator:</p>
<pre class="programlisting">
&lt;simexp&gt; ::= &lt;num&gt; | &lt;num&gt; &lt;operator&gt; &lt;num&gt;
</pre>
<p>Such an expression is the smallest string composed from the
symbols which makes sense in this simple language. A general
expression can then be built like this:</p>
<pre class="programlisting">
&lt;exp&gt; ::= &lt;simexp&gt; | &lt;lpar&gt; &lt;exp&gt; &lt;rpar&gt;
                   | &lt;exp&gt; &lt;operator&gt; &lt;exp&gt;
</pre>
<p>Actually, the simple expression is redundant and could be
omitted by replacing the first option in the production for
<tt class="literal">&lt;exp&gt;</tt>. In any case, the set of rules
above constitute the entire grammar.</p>
<p>The basic idea in the FP-style of parsing is to split-up the
parser into a family of <span class=
"emphasis"><em>sub-parsers</em></span> each corresponding to a term
in the grammar, with special <span class=
"emphasis"><em>combinators</em></span> corresponding to ways of
combining these sub-parsers (there will be one for combining them -
often just standard function composition in an FP-language - and
one for representing options in the BNF grammar), see [<a href=
"#Hutton">Hutton</a>, <a href="#Fokker">Fokker</a>, <a href=
"#Peyton-Jones-">Peyton-Jones-</a>]. For a way to implement part of
this in another multi-paradigm language, Perl, see [<a href=
"#Antonsen">Antonsen</a>].</p>
<p>In the functional language Miranda, for instance, one could
write a parser for this simple language like this (following
[<a href="#Hutton">Hutton</a>] and ignoring problems with
left-recursive grammars for sake of illustration):</p>
<pre class="programlisting">
exp = (exp $then operator $xthen exp) $alt
      (literal '(' $then exp $xthen
      literal ')') $alt simexp
</pre>
<p>Here, <tt class="literal">exp</tt>, <tt class=
"literal">then</tt>, <tt class="literal">xthen</tt>, <tt class=
"literal">alt</tt>, <tt class="literal">literal</tt> and <tt class=
"literal">simexp</tt> are all sub-parsers (the prefix '<tt class=
"literal">$</tt>' turns any function into an infix operator in
Miranda, and brackets around arguments are optional in Miranda as
well as in Haskell and ML). The sub-parser <tt class=
"literal">operator</tt> simply parses operators, while the
sub-parser <tt class="literal">literal</tt> parses the literal
string given as the argument, and alt is a sub-parser representing
alternatives as given by the symbol '<tt class="literal">|</tt>' in
the BNF grammar. We won't be able to reach the same level of
conciseness in this C++ implementation though, but we will try to
get as close as possible.</p>
<p>Such parsers may not be as efficient as the ones generated by
general high-quality parser-writing tools such as <tt class=
"literal">lex</tt> and <tt class="literal">yacc</tt> (and their
various relatives such as <tt class="literal">flex</tt> and
<tt class="literal">bison</tt> from GNU), but they have other
advantages:</p>
<div class="variablelist">
<dl>
<dt><span class="term">Readability:</span></dt>
<dd>
<p>By splitting the parser up into a number of subparsers each
corresponding to a term in the BNF grammar, there will be a much
closer relationship between the structure of the entire parser and
the original BNF grammar.</p>
</dd>
<dt><span class="term">Maintainability</span></dt>
<dd>
<p>Since the syntax is fairly straightforward and closely mirrors
the corresponding BNF grammar, such parsers are easy to
maintain.</p>
</dd>
<dt><span class="term">Flexibility:</span></dt>
<dd>
<p>The same splitting-up also implies that productions can be added
or omitted very easily. Hence, such parsers are extendable.
Furthermore, different versions of the various subparsers can be
tried out.</p>
</dd>
</dl>
</div>
<p>For simplicity, we will assume that the lexing has been done and
resulted in an array of single characters. This is of course a
rather trivial lexing (which, moreover, is easy to implement) -
most lexers would return not a list of characters but a list of
tokens. In order to show the power of the technique, however, it is
advantageous to consider such trivial lexers. Such a lexer, for
instance, could be implemented by simply reading from a file, one
character at a time, returning a list of characters read when
done.</p>
<p>This paper will be structured as follows: First, the abstract
base class is defined. This is a very basic class skeleton, but
will form the foundation of all the richer sub-parsers to be
defined later. At the same time we define the basic parse tree
class and other related datatypes. Secondly, we introduce some
simple general utilities (a Boolean function for testing for digits
and two list processing functions found in all functional
languages). Next, we define our first generic sub-parsers for
numbers and operators. The fourth step is to define general parser
combinators, allowing us to combine subparsers to generate new
types of sub-parsers.</p>
<p>All of these steps are completely general and form a basic
parsing library. We then turn to using these general tools to
actually parse integer arithmetic expressions. This turns out to be
very easy and there is a very, very close relationship between the
BNF grammar and the actual implementation of the sub-parsers as
promised.</p>
<p>Finally, we discuss various ways to refine the framework.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e245" id="d0e245"></a>The basic
stream class</h2>
</div>
<p>We will begin by defining a general <span class=
"emphasis"><em>parse stream</em></span> or <span class=
"emphasis"><em>pstream</em></span>. The various sub-parsers will
then all be derived from this base class. The parser will need to
keep track of a list which is passed on between consecutive parses.
Even though, we will only consider lists of single characters here,
it is worthwhile to work with a more general set-up, namely that of
lists of strings.</p>
<p>Our pstreams will have a state, containing the parse tree
constructed so far. In order to be able to pipe pstreams together,
thereby building more complex parsers from simple sub-parsers, we
will also need the pstream to keep track of the remaining
tokens.</p>
<p>Hence, we define a new data type:</p>
<pre class="programlisting">
typedef std::pair&lt;Ptree, List&gt; Presult;
</pre>
<p>where <tt class="classname">Ptree</tt> is the class defining
parse trees and where <tt class="classname">List</tt> is defined
by:</p>
<pre class="programlisting">
typedef std::list&lt;std::string&gt; List;
</pre>
<p>which will be our basic data structure. Similarly, <tt class=
"classname">Ptree</tt> is a specialisation of a more general parse
tree:</p>
<pre class="programlisting">
typedef ParseTree&lt;std::string&gt; Ptree;
</pre>
<p>The <tt class="classname">ParseTree</tt> class is a simple
binary tree:</p>
<pre class="programlisting">
template &lt;class T&gt;
class ParseTree {
private:
  ParseTree *left; // left sub tree
  ParseTree *right; // right sub tree
  T root;
public:
  ... // constructors &amp; destructor
  bool isEmpty() const { return root==T(); }
  bool isLeaf() const
               { return left==0 &amp;&amp; right==0; }
  T getRoot() const { return root; }
  ParseTree* getLeft() const { return left; }
  ParseTree* getRight() const
               { return right; }
  void setLeft(ParseTree&amp; lft) { left = &amp;lft; }
  void setRight(ParseTree&amp; rgh)
               { right = &amp;rgh; }
  void setRoot(T rt) { root = rt; }
  void update(T); // used for operators
  void insert(T); // used for numbers
  void insert(ParseTree&amp;);
               // used for parentheses
};
</pre>
<p>The member functions <tt class="methodname">update</tt> and
<tt class="methodname">insert</tt> are there to allow parsers to
manipulate the parse tree. The former adds a node as the root,
copying the old tree to the left sub-tree, this is to be used when
parsing binary operators. The latter inserts a node at the first
empty sub-tree it finds. This is used for parsing either numbers or
parentheses.</p>
<p>Using just recursion these methods could be implemented simply
like this:</p>
<pre class="programlisting">
template&lt;class T&gt;
void ParseTree&lt;T&gt;::update(T val) {
  if(isEmpty())
    throw std::logic_error(
           &quot;Syntax error: Missing operand&quot;);
  ParseTree* newLeftSubtree
           = new ParseTree(*this);
  root = val;
  left = newLeftSubtree;
  right = 0;
}

template&lt;class T&gt;
void ParseTree&lt;T&gt;::insert(T val) {
  if(isEmpty()) {
    setRoot(val);
    return;
  }
  if(getLeft() == 0) {
    setLeft(*new ParseTree(val));
    return;
  }
  else {
    if(getRight() == 0) {
      setRight(*new ParseTree(val));
      return;
    }
    getRight()-&gt;insert(val);
               // use right recursion
  }
}

template&lt;class T&gt;
void ParseTree&lt;T&gt;::insert(ParseTree&lt;T&gt; &amp;pt) {
  if(isEmpty()) {
    setRoot(pt.getRoot());
    setLeft(*pt.getLeft());
    setRight(*pt.getRight());
    return;
  }
  if(getLeft() == 0) {
    setLeft(pt);
    return;
  }
  else {
    if(getRight() == 0) {
      setRight(pt);
      return;
    }
    getRight()-&gt;insert(pt);
               // use right recursion
  }
}
</pre>
<p>The <tt class="literal">if</tt>-statements in the above
implementations are the C++ analogue of the argument pattern
matching of functional languages<sup>[<a name="d0e303" href=
"#ftn.d0e303" id="d0e303">4</a>]</sup>, they are needed to control
the recursive steps and to ensure termination. The code above can
be shortened a little bit and most of the <tt class=
"literal">return</tt>-statements can be omitted if on instead uses
nested <tt class="literal">if</tt>-clauses. While this will make
the code a few lines shorter, I think the present coding style has
the advantage of clearly showing the reader that the function
returns after having handled each special case.</p>
<p>Both methods first handle the case of an empty tree. For
<tt class="methodname">insert</tt>, the value passed simply becomes
the new root, whereas for update an exception is thrown. Next comes
the case of a leaf, i.e., a node without children. Here, <tt class=
"methodname">update</tt> inserts the passed value as a new root
moving the leaf to the left subtree, while <tt class=
"methodname">insert</tt> merely inserts its value as the left
subtree rooted at the original leaf node. If the left subtree is
occupied, <tt class="methodname">insert</tt> first tries the right
one, if this is also non-empty it uses recursion to find the first
empty subtree.</p>
<p>The abstract base class, <tt class="classname">pstream</tt> is
now:</p>
<pre class="programlisting">
class pstream {
protected:
  Presult pres; // contents so far
public:
  ... // constructors &amp; virtual destructor
  List const&amp; getList() const
           { return pres.second; } // accessor method
  Presult const&amp; getPres() const
           { return pres; } // accessor method
  virtual Presult operator&lt;&lt;=
           (const List &amp;) = 0;
  virtual Presult operator&lt;&lt;=
           (const Presult&amp;) = 0;
};
</pre>
<p>Notice that only the accessor methods, <tt class=
"methodname">getList</tt> and <tt class="methodname">getPres</tt>,
are non-trivial and that the streaming operator, <tt class=
"function">operator&lt;&lt;=</tt> is a pure virtual function.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e351" id="d0e351"></a>Writing the
parser</h2>
</div>
<p>We will now turn to the question of actually writing the parser
by splitting it up into a number of sub-parsers as stated in the
introduction.</p>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e356" id="d0e356"></a>Utility
functions</h3>
</div>
<p>It is advantageous to define a number of utility functions
representing the basic ingredients of the grammar. This can be done
by defining a few boolean functions like this:</p>
<pre class="programlisting">
bool isDigit(std::string c) {
  return c == &quot;0&quot; || c == &quot;1&quot; || c == &quot;2&quot;
         || c == &quot;3&quot; || c == &quot;4&quot; || c == &quot;5&quot;
         || c == &quot;6&quot; || c == &quot;7&quot; || c == &quot;8&quot;
         || c == &quot;9&quot;;
}
</pre>
<p>with a similar function <tt class="function">isOperator</tt> for
testing whether a character is an operator. In general, one should
write one such Boolean utility function for each type of symbol in
the language. Hence, we define two such functions, <tt class=
"function">isDigit</tt> and <tt class="function">isOperator</tt>.
It is, of course, also possible to define a Boolean function
testing for brackets but since we only have one type of these it is
much easier to simply insert a test <tt class="literal">c ==
&quot;(&quot;</tt> or <tt class="literal">c == &quot;)&quot;</tt> directly in the code
than define special functions <tt class="function">isLPar</tt> and
<tt class="function">isRPar</tt> respectively.</p>
<p>Of course, a similar result could be obtained by using built-in
functions such as isdigit but the advantage of writing them
ourselves is the ability to illustrate the general principle.</p>
<p>We will also need a pair of basic list processing functions
available in all FP languages. The first is for extracting the
<span class="emphasis"><em>head</em></span> of a list, i.e., its
first element. This function is called <tt class=
"function">first</tt>, <tt class="function">car</tt> or <tt class=
"function">hd</tt> in various functional languages (Common Lisp,
old Lisp, and ML respectively). Here we will settle for the name
used in Haskell:</p>
<pre class="programlisting">
string head(List ls) {
  return ls.front();
}
</pre>
<p>Similarly, for extracting the <span class=
"emphasis"><em>tail</em></span> of a list, i.e., the remaining
elements, we will write a wrapper around one of the STL list member
functions:</p>
<pre class="programlisting">
List tail(List ls) {
  ls2.pop_front(); // remove head
  return ls2;
}
</pre>
<p>The tail function also goes under various names in the
FPcommunity, e.g. <tt class="function">rest</tt>, <tt class=
"function">cdr</tt> or <tt class="function">tl</tt> in Common Lisp,
old Lisp and ML respectively. Once more, we have settled for the
Haskell name. Note, by the way, that none of these actually change
the list passed to them - we could have declared the arguments
const, but we would then have to use <tt class=
"literal">const_cast&lt;&gt;</tt> all the time, whenever we want to
call them, and this would be rather tedious.</p>
<p>These are the only two list utility functions we will be
needing.</p>
</div>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e427" id="d0e427"></a>The sub-parser
for numbers</h3>
</div>
<p>The entire code for the sub-parser for numbers is very simple,
illustrating the ease of the functional-style parsing. It is
simply:</p>
<pre class="programlisting">
class pNum : public pstream {
public:
  pNum() {};
  pNum(const List &amp; ls)
         { pres = make_pair(Ptree(),ls); };
  pNum(const Presult &amp;pt) { pres = pt; };
  pNum() {};
  inline Presult operator&lt;&lt;=(const List &amp;);
  inline Presult operator&lt;&lt;=(const Presult &amp;);
};

inline Presult pNum::operator&lt;&lt;=(
                           const List &amp;ls) {
  return operator&lt;&lt;=(make_pair(Ptree(),ls));
}

inline Presult pNum::operator&lt;&lt;=(
                           const Presult &amp;pr) {
  List ls = pr.second;
  if(ls.empty())
    return pr; // nothing to parse
  std::string c = head(ls);
  if(!isDigit(c)) {
    return pr; // not a number, do nothing
  }
  else {
    std::string val(c);
    ls = tail(ls);
    while(isDigit(c=head(ls)) &amp;&amp; !ls.empty()) {
      val += c; // construct number
      ls = tail(ls);
    }
    Ptree pt(pr.first);
    pt.insert(val);
    return make_pair(pt,ls);
  }
}
</pre>
<p>This sub-parser illustrates the basic idea of FP-style parsing
using streams: To parse a particular expression-type, write a very
simple parser (basically just a copy of the abstract base class)
and implement the corresponding <tt class=
"function">operator&lt;&lt;=</tt> for acting upon a reference to a
<tt class="classname">Presult</tt>-object.</p>
</div>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e442" id="d0e442"></a>The sub-parser
for operators</h3>
</div>
<p>The sub-parser for operators is almost identical, with the only
significant change taking place in <tt class=
"function">operator&lt;&lt;=</tt>:</p>
<pre class="programlisting">
inline Presult pOp::operator&lt;&lt;=
                    (const Presult &amp;pt) {
  List ls = pt.second;
  if(ls.empty())
    return pt; // nothing to parse
  std::string c = head(ls);
  if(!isOperator(c))
    return pt;
  Ptree ptt(pt.first);
  ptt.update(c);
  return make_pair(ptt,tail(ls));
}
</pre>
<p>The main difference between parsing numbers and operators is the
way the parse tree is updated. Parsing numbers simply inserts the
number at the first available empty place, whereas operators have
to identify their operands first and hence use the update method
instead. Numbers are typically leaves while operators are roots of
subtrees.</p>
</div>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e454" id="d0e454"></a>Parser
combinators</h3>
</div>
<p>It is convenient to be able to handle combinations of
sub-parsers as well, e.g. to indicate that a particular token must
always come after another (<span class=
"emphasis"><em>sequencing</em></span>) or to indicate a choice
between two tokens (<span class=
"emphasis"><em>alternatives</em></span>). In the BNF grammar, such
<span class="emphasis"><em>combinators</em></span> are represented
by concatenation and by |, respectively.</p>
<p>Combinators are higher-order parsers, taking parsers as
arguments and returning new parsers. In typical FP-implementations,
in Haskell, say, these would often be represented by currying,
i.e., by partial binding of arguments<sup>[<a name="d0e470" href=
"#ftn.d0e470" id="d0e470">5</a>]</sup>. In C++, however, it is more
convenient to define new objects. As an illustration, we will
define the following combinators <tt class="classname">pAlt</tt>,
for handling alternatives, <tt class="classname">pThen</tt>, for
handling sequencing, and finally <tt class="classname">pMore</tt>
for handling one or more occurrences of a token. We will also
define some syntactic sugar by overloading the operators <tt class=
"literal">||</tt>, <tt class="literal">&amp;&amp;</tt> and
<tt class="literal">++</tt>, respectively, to be alternative
methods to use these combinators.</p>
<p>The combinators are just parsestreams, although higher-order,
hence they are defined by deriving from the abstract base class and
by implementing <tt class="function">operator&lt;&lt;=</tt>. Since
they have to be able to deal with all kinds of parsers, they are
defined using templates. In a sense, one can think of templates as
C++'s analogue of such higher order constructs. One might call them
&quot;higher order objects&quot;.</p>
<p>For instance, the combinator for alternatives can be written
like this:</p>
<pre class="programlisting">
template &lt;class T, class S&gt;
class pAlt : public pstream {
private:
  T p1;
  S p2; // component pstreams
public:
  ... // constructors &amp; destructor
      // &amp; accessors &amp; operator&lt;&lt;=
};
</pre>
<p>This class differs from the previous parsestreams in having two
pstreams as internal data members (together with the appropriate
accessor methods). These are just defined, for simplicity, as
general classes of types T and S respectively, relying on
compiletime polymorphism to ensure that these can be used as
pstreams.</p>
<p>To actually be able to use this combinator, we must implement
<tt class="function">operator&lt;&lt;=</tt>. This can be done as
follows:</p>
<pre class="programlisting">
template&lt;class T, class S&gt;
inline pAlt&lt;T, S&gt; operator||
                (const T&amp; p1, const S&amp; p2) {
  return pAlt&lt;T, S&gt;(p1,p2);
            // alternative way of creating pAlt objects
}
// definition of how pAlt works is given
// by operator&lt;&lt;=
template&lt;class T, class S&gt;
inline Presult pAlt&lt;T, S&gt;::operator&lt;&lt;=
                (const Presult&amp; pr) {
  Presult lp1(p1 &lt;&lt;= pr), lp2(p2 &lt;&lt;= pr);
            // use component parsers
  List l1=lp1.second, l2=lp2.second;
  return (l1.size() &lt;= l2.size())? lp1 : lp2;
            // return best parse
}
</pre>
<p>where we have also taken the opportunity to add some syntactic
sugar by providing <tt class="literal">p1 || p2</tt> as an
alternative syntax for constructing a <tt class=
"classname">pAlt</tt>-object out of two sub-parsers.</p>
<p>The implementation of <tt class=
"function">operator&lt;&lt;=</tt> is a bit naive, but will suffice
for the present case. In general, one may need a more complicated
criterion for &quot;best parse&quot; than simply returning the parse
resulting in the smallest list of remaining tokens, but in our case
it will suffice.</p>
<p>Similarly, the code for <tt class="classname">pThen</tt>, the
sequencing combinator, is:</p>
<pre class="programlisting">
template&lt;class T, class S&gt;
class pThen : public pstream {
private:
  T p1;
  S p2; // component parsers
public:
  ... // usual stuff
};

template&lt;class T, class S&gt;
inline pThen&lt;T, S&gt; operator&amp;&amp;
                 (const T&amp; p1, const S&amp; p2) {
  return pThen&lt;T,S&gt;(p1,p2);
    // alternative syntax
}
template&lt;class T, class S&gt;
inline Presult pThen&lt;T, S&gt;::operator&lt;&lt;=
                 (const Presult&amp; pr) {
  return p2 &lt;&lt;= p1 &lt;&lt;= pr;
}
</pre>
<p>Finally, the combinator for one or more occurrences of a token,
which would be expressed by a regular expression in BNF, <tt class=
"literal">(token)+</tt>, can be written as:</p>
<pre class="programlisting">
template&lt;class T&gt;
class pMore : public pstream {
private:
  T p; // component parser
public:
  ... // usual stuff
};

template&lt;class T&gt;
inline Presult pMore&lt;T&gt;::operator&lt;&lt;=
                   (const Presult&amp; pr) {
  Presult pr2(p &lt;&lt;= pr);
  List ls = pr.second;
  List ls2 = pr2.second;
  while(!ls2.empty() &amp;&amp; (ls2.size()
                 &lt; ls.size())) { // something was parsed
    pr2 = (p &lt;&lt;= pr2); // continue parsing
    ls = ls2;
    ls2 = pr2.second;
  } 
  return pr2;
}
template&lt;class T&gt;
inline pMore&lt;T&gt; operator++(const T pp) {
  return pMore&lt;T&gt;(pp);
}
</pre>
<p>Here, we have settled for a straightforward &quot;procedural&quot;
implementation, one could also have used recursion (as one would
almost certainly have done in FP) in the definition of <tt class=
"function">operator&lt;&lt;=</tt>.</p>
<p>In a sense, these combinators together with a parser for literal
substrings, <tt class="classname">pLit</tt>, which we haven't given
but which is almost identical to pNum and pOp earlier, are all we
need to parse anything. Parsing numbers, for instance, could
symbolically be written as:</p>
<pre class="programlisting">
pNum = ++(pLit(&quot;0&quot;) || pLit(&quot;1&quot;) || pLit(&quot;2&quot;)
       || pLit(&quot;3&quot;) || pLit(&quot;4&quot;) || pLit(&quot;5&quot;)
       || pLit(&quot;6&quot;) || pLit(&quot;7&quot;) || pLit(&quot;8&quot;)
       || pLit(&quot;9&quot;));
</pre>
<p>which is more or less what one would have written in a
functional language such as Haskell or Miranda. C++, however, does
not allow one to define new classes from old ones in this manner,
hence one would have to put this definition of pNum into the
implementation of <tt class="function">operator&lt;&lt;=</tt>. In
the next section we will see that this is precisely what we will do
to parse general expressions.</p>
<p>There is one more reason, however, one cannot simply use the
above trick to write <tt class="classname">pNum</tt>. Parsing
numbers, as well as operators, one needs to perform certain actions
(inserting or updating the parse tree). We will later see how to
associate <span class="emphasis"><em>actions</em></span> to
parsers, but for now we will have to make do with custom-written
parsers <tt class="classname">pNum</tt> and <tt class=
"classname">pOp</tt> explicitly taking care of the necessary parse
tree manipulations.</p>
</div>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e580" id="d0e580"></a>The sub-parser
for bracketed expressions</h3>
</div>
<p>This is slightly more complicated. It turns out to be
advantageous to consider bracketing as a kind of parser combinator.
Hence, given a parse stream <tt class="varname">p</tt> we will
define a new <tt class="classname">pstream</tt> type just like we
did for <tt class="classname">pAlt</tt> and the other combinators.
This, by the way, is one way in which our implementation differs
from the normal functional approach.</p>
<p>The definition of the bracketing combinator, <tt class=
"classname">pBrack</tt>, is very similar to the previous
combinators:</p>
<pre class="programlisting">
template&lt;class T&gt;
class pBrack : public pstream {
private:
  T p; // internal sub-parser
  Presult pr2;
public:
  ... // usual stuff
};
</pre>
<p>As usual, all of the work is done in the overloaded <tt class=
"function">operator&lt;&lt;=</tt>:</p>
<pre class="programlisting">
template&lt;class T&gt;
inline Presult pBrack&lt;T&gt;::operator&lt;&lt;=
                   (const Presult &amp;pr) {
  if( (pr.second).empty() )
    throw logic_error(&quot;Syntax error.
                      Closing bracket expected!&quot;);
  std::string c = head(pr.second);
  if(c == &quot;(&quot;) {
    Presult prr( p &lt;&lt;= make_pair
                   (pr.first, tail(pr.second)) );
    // apply internal parser
    (pr2.first).insert(prr.first);
    // update internal parse tree
    return operator&lt;&lt;=
                  (make_pair(pr.first, prr.second));
  }
  if(c == &quot;)&quot;) {
    Ptree pt(pr.first);
    pt.insert(pr2.first);
    // insert parsed subtree
    return make_pair(pt,tail(pr.second));
    //skip closing bracket &amp; terminate recursion
  }
  return pr; // do nothing
}
</pre>
<p>The only subtlety is in maintaining an internal, temporary parse
tree. Upon returning, this internal parse tree will hold the parse
tree for the entire bracketed expression, which can then be
inserted into the full parse tree. This pstream mimics the way
human beings often read parenthesised expressions - one sees an
opening bracket and immediately one begins to parse the internal
expression until one sees a matching end bracket. If nested
brackets are found, one resorts to recursion.</p>
</div>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e610" id="d0e610"></a>Putting it
all together</h2>
</div>
<p>We now have all the ingredients needed for parsing integer
arithmetic expressions. One could put the previous classes and
functions into general header files to be used for all parsers, and
then proceed to write parsers for the specific language at hand,
which is what we will turn to now.</p>
<p>Now, the grammar is recursive and while C++ is certainly capable
of handling recursion (in fact, we have used it frequently in this
paper), the language is not really capable of handling recursive
definitions such as the grammar in its present state. Hence, we
will have to introduce an extra layer of indirection. Rewrite the
grammar as:</p>
<pre class="programlisting">
&lt;aexp&gt; ::= &lt;num&gt; | &lt;num&gt; &lt;op&gt; &lt;num&gt;
&lt;sexp&gt; ::= &lt;aexp&gt; | &lt;lpar&gt; &lt;aexp&gt; &lt;rpar&gt;
&lt;exp&gt;  ::= &lt;sexp&gt; | &lt;lpar&gt; &lt;sexp&gt; &lt;rpar&gt;
           | &lt;sexp&gt; &lt;op&gt; &lt;sexp&gt;
</pre>
<p>The idea is now simply to write three parsers <tt class=
"classname">pAExp</tt>, <tt class="classname">pSExp</tt> and
<tt class="classname">pExp</tt>. These classes are completely
trivial, with all the work being done in <tt class=
"function">operator&lt;&lt;=</tt> as usual.</p>
<p>The first sub-parser is:</p>
<pre class="programlisting">
inline Presult pAExp::operator&lt;&lt;=
                     (const Presult &amp;pr) {
  pOp po;
  pNum pn;
  return pn || (pn &amp;&amp; po &amp;&amp; pn) &lt;&lt;= pr;
}
</pre>
<p>Notice the simple expression in the last line. It is a simple,
faithful reflection of the definition of the corresponding term in
the BNF grammar.</p>
<p>The remaining sub-parser, <tt class="classname">pSExp</tt>, and
final parser, <tt class="classname">pExp</tt>, are very simple and
only their non-trivial <tt class="function">operator&lt;&lt;=</tt>
function will be given.</p>
<p>For <tt class="classname">pSExp</tt> the code is simply:</p>
<pre class="programlisting">
inline Presult pSExp::operator&lt;&lt;=
                    (const Presult &amp;pr) {
  pAExp pa;
  return pa || pBrack&lt;pAExp&gt;(pa) &lt;&lt;= pr;
}
</pre>
<p>Which, once more, is a faithful representation of the
corresponding definition in the BNF grammar.</p>
<p>Finally, the full parser, parsing general integer arithmetic
expressions is simply:</p>
<pre class="programlisting">
inline Presult pExp::operator&lt;&lt;=
                   (const Presult &amp;pr) {
  pSExp ps;
  pOp po;
  return ps || pBrack&lt;pSExp&gt;(ps)
            || (ps &amp;&amp; po &amp;&amp; ps) &lt;&lt;= pr;
}
</pre>
<p>Hence, with the generic sub-parser tools defined, writing a
functional style parser using pstreams is an easy task, which makes
it well suited for language experiments and for rapid
prototyping.</p>
<p>With these definitions in place, to parse a list of characters,
<tt class="literal">ls</tt>, using this parser, one simply
writes:</p>
<pre class="programlisting">
pExp pe;
result = pe &lt;&lt;= ls;
</pre>
<p>where <tt class="varname">result</tt> is of type <tt class=
"classname">Presult</tt>. For a complete parse, this would contain
a parse tree as first component and an empty list as second.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e680" id=
"d0e680"></a>Refinements</h2>
</div>
<p>The previous sections have shown one way to implement
functional-style parsing in C++ using a generalisation of streams
together with operator overloading. Incidentally, the heavy usage
of operator overloading shows one advantage of C++ over a language
like Java that does not support the overloading of operators. Most
of the above could also be carried out in Java, but one would then
have to use functions instead of operators resulting in harder to
read code. Java also lacks C++'s features for generic programming
(the templates), although a library providing some of this support
is available. Some of the same effects could be mimicked in Java,
however, by judicious use of the universal base class <tt class=
"classname">Object</tt> and frequent casting. Such an approach will
not be as easy to read and will, moreover, be more error prone than
the one presented here. Although a direct translation of the above
C++ code into Java would not be satisfactory, one could instead use
Java Beans - in some sense, these are able to model a behaviour
similar to that of functions in a functional language.</p>
<p>On the other hand, C++ is not perfect in this respect either,
since it lacks the possibility of user defined operators as well as
a method to specify the associativity of an operator and whether it
is to be applied as infix, postfix or prefix. Such possibilities
are often available in FP languages, e.g. in ML and Haskell, and
they are also likely to be available in future versions of existing
languages such as Perl6.</p>
<p>The necessity of writing the execution from right to left lies
in the standard OO-convention that <tt class=
"literal">y&lt;&lt;=f</tt> is really short for <tt class=
"literal">y.operator&lt;&lt;=(f)</tt> and hence treats the left
hand operand differently from the right hand one. This is a
consequence of OOP's dispatching rules. If one wanted to make
left-to-right parsing work instead, one would have to define,
inside all classes, methods for handling the different
parsestreams. For instance, one would have to define <tt class=
"literal">List::operator&gt;&gt;=(T &amp;p)</tt>, where the
typename <span class="type">T</span> can be any valid parsestream
subclass. This could of course be done with templates much the same
way as was done for the combinators, but it would also necessitate
the writing of a wrapper class for the List-object in order to be
able to extend it this way. A much cleaner solution would be to use
multi-methods. However, unlike Lisp's CLOS, these are not directly
available in C++. There are ways around this, of course, C++ being
after all a very flexible language, [<a href=
"#Alexandrescu">Alexandrescu</a>].</p>
<p>A problem not addressed at all in this paper is the problem of
optimisation. Clearly, the parsestreams as developed here are
optimised neither for speed of execution nor for space, but rather
for <span class="emphasis"><em>speed of definition</em></span>. By
this phrase, it is to be understood that the method is intended for
rapid prototyping and for experimenting with language features.
Although the program isn't slow, it would certainly be advantageous
to have the parsestreams run faster in real-life applications. One
simple way of doing this is to pass a tree-iterator around keeping
track of where the last insert/update of the parsetree occurred,
for instance by pointing to the first empty subtree. This would
speed up the <tt class="function">insert</tt> and <tt class=
"function">update</tt> operations.</p>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e718" id="d0e718"></a>Handling
ambiguity and precedence</h3>
</div>
<p>For simplicity, we did not consider operator precedence in the
example above, i.e., <tt class="literal">1+2*3</tt> will be parsed
as <tt class="literal">(1+2)*3</tt> and not <tt class=
"literal">1+(2*3)</tt>. Of course, this could always be enforced by
appropriate use of parentheses, but it would clearly be
advantageous to conform closer to the likely expectations of the
end-user.</p>
<p>It is well-known that operator precedence can be handled by
slightly modifying the grammar (see e.g. [<a href=
"#Martin">Martin</a>]):</p>
<pre class="programlisting">
&lt;expr&gt;     ::= &lt;term&gt; + &lt;term&gt; | &lt;term&gt; - &lt;term&gt;
               | &lt;term&gt;
&lt;term&gt;     ::= &lt;factor&gt; * &lt;factor&gt;
               | &lt;factor&gt; / &lt;factor&gt; | &lt;factor&gt;
&lt;factor&gt;   ::= &lt;number&gt; | ( &lt;expr&gt; )
</pre>
<p>Thus, at the cost of adding new terms to the grammar, one can
handle operator precedence in the expected way. It is trivial to
change our family of sub-parsers to accommodate this, in particular
since the recursive definitions have been removed in the same
step.</p>
<p>Our parser stream framework worked well for the simple case of
basic arithmetic expressions, but a general grammar is likely to be
ambiguous with the parser effectively having to make certain
choices at various stages of the parsing. Clearly, it would be of
great interest to be able to handle this case too.</p>
<p>The standard way this is dealt with in FP is to replace the
<tt class="classname">Presult</tt> data type with another one,
[<a href="#Hutton">Hutton</a>, <a href="#Fokker">Fokker</a>,
<a href="#Peyton-Jones-">Peyton-Jones-</a>].</p>
<p>In the face of ambiguity, what we have to deal with is not a
single, unique parse, but rather a family of <span class=
"emphasis"><em>possible parses</em></span>. Hence, the proper way
of dealing with ambiguity is to introduce a new data type, let's
call it <tt class="classname">LPresult</tt> (for List of Parse
results). This is defined as:</p>
<pre class="programlisting">
typedef std::list&lt; Presult &gt; LPresult;
</pre>
<p>All the parsers must then return this data type instead. This,
of course, necessitates some modifications. Each sub-parser must
now act on <span class="emphasis"><em>all</em></span> the elements
of the list of possible parses. Consequently, the individual parse
trees will have different sizes, in general with the largest one
corresponding to an empty list of remaining tokens, and thus to a
complete parse.</p>
<p>With these quite simple changes, our parse stream framework will
be able to deal with ambiguous grammars as well.</p>
</div>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e774" id="d0e774"></a>Adding
actions</h3>
</div>
<p>The FP-style of parsing discussed in this paper opens up further
modifications. In the arithmetic expression example, we saw how one
of the types (the operators) had to perform some non-trivial
operations on the parse tree.</p>
<p>This can be generalised. Instead of just allowing the simple
manipulations involved in parsing binary operators and parentheses,
one could allow more general actions to be associated with steps in
a parse.</p>
<p>Various examples could be:</p>
<div class="orderedlist">
<ol type="a">
<li>
<p>a pretty printer, printing out the code in a nicely formatted
manner as the parse goes along;</p>
</li>
<li>
<p>cross-translation, translating each element of a parse into some
code in another language, in the same way as yacc and similar tools
do;</p>
</li>
<li>
<p>execution of the result as it goes along, in the case of
arithmetic expressions this would be one very natural action to
add, effectively turning the parser into a calculator.</p>
</li>
</ol>
</div>
<p>To associate an action with a step in a parse, i.e., with a
particular sub-parser, we need two things. First of all, we need a
function to perform. This is done by using a function object, as
this is the best way of passing around function definitions in C++.
Secondly, we need an operator associating an action with a given
sub-parser. This latter step will be done by overloading the
<tt class="literal">&gt;&gt;=</tt> operator. Hence, given a
sub-parser p and a functor <tt class="literal">f</tt>, we will
define <tt class="literal">p&gt;&gt;=f</tt> to be the sub-parser
<tt class="literal">p</tt> with the actions given by <tt class=
"literal">f</tt> added to it<sup>[<a name="d0e810" href=
"#ftn.d0e810" id="d0e810">6</a>]</sup>. This is consistent with
interpreting the sub-parsers as streams: the output of one pstream
is sent to the associated function object for further
processing.</p>
<p>The definition of the corresponding sub-parser type, <tt class=
"classname">pUse</tt>, is just like the definitions of all the
other combinators. Explicitly<sup>[<a name="d0e822" href=
"#ftn.d0e822" id="d0e822">7</a>]</sup>:</p>
<pre class="programlisting">
template&lt;class T, class S&gt;
class pUse : public pstream {
private:
  T p; // component parser
  S f; // function to apply
public:
  ... // usual stuff
};
template&lt;class T, class S&gt;
inline Presult pUse&lt;T,S&gt;::operator&lt;&lt;=(const
                 Presult&amp; pr) {
  Presult pr2(p &lt;&lt;= pr); // apply parser
  return f(pr2); // apply function - MUST return Presult
}
template&lt;class T, class S&gt;
inline pUse&lt;T,S&gt; operator&gt;&gt;=
                 (const T &amp;pp, const S &amp;ff) {
  return pUse&lt;T,S&gt;(pp,ff);
                     // alternative syntax
}
</pre>
<p>To use this, one must then define a function object. For
instance:</p>
<pre class="programlisting">
class printer {
public:
  Presult operator() (Presult &amp;pr) const {
    std::cout &lt;&lt; &quot;Parse tree: &quot;;
    (pr.first).print();
    std::cout &lt;&lt; std::endl &lt;&lt; &quot; and &quot;;
    prList(pr.second);
    return pr;
  }
};
</pre>
<p>assuming that a <tt class="methodname">print</tt> method has
been added to the <tt class="classname">ParseTree</tt> class and
that the function <tt class="function">prList</tt> prints a list in
some appropriate format.</p>
<p>Let <tt class="varname">ls</tt> be a list of characters, one can
then write:</p>
<pre class="programlisting">
pUse&lt;pExp,printer&gt; pu(pe,prn);
pu &lt;&lt;= ls;
</pre>
<p>where <i class="parameter"><tt>pe</tt></i>, <i class=
"parameter"><tt>prn</tt></i> are instances of <tt class=
"classname">pExp</tt> and <tt class="classname">printer</tt>
respectively. This would then print the parse tree together with
the list of unparsed characters. In fact, precisely such a
construction was used during the test phase of developing the
programs presented here.</p>
<p>Alternatively, one could write:</p>
<pre class="programlisting">
(pe &gt;&gt;= prn) &lt;&lt;= ls;
</pre>
<p>leaving the creation of the proper classes to C++. This latter
syntax is probably as close as one will be able to get to the
FPsyntax in C++.</p>
<p>Similarly, one could define a function object, <tt class=
"classname">executor</tt>, which executes the code found in the
parse tree.</p>
<p>Finally, once <tt class="classname">pUse</tt> has been defined,
one could redefine <tt class="classname">pNum</tt> and <tt class=
"classname">pOp</tt> in terms of <tt class="varname">it</tt> and
<tt class="classname">pLit</tt>. The former would have to extract
the inserted individual digits, combine them to form a number and
re-insert this at the proper place in the parse tree, while the
latter would have to extract the operator and call update.</p>
</div>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e898" id=
"d0e898"></a>Conclusion</h2>
</div>
<p>We were able to extend the functional-style parsing using
subparsers to C++ by defining a generalised class of streams,
called pstreams. With this, we could define combinators allowing us
to build new sub-parsers by combining old ones. With these general
tools out of the way, it turned out to be very easy, once recursion
had been handled properly, to implement the BNF grammar for integer
arithmetic expressions. Moreover, similarly to the situation in FP,
there was a very intimate relationship, in fact more or less an
isomorphism, between the actual BNF production rule and the
corresponding sub-parser, making it very easy to write such
sub-parsers - it would even be feasible to write a general
sub-parser generator to do so automatically.</p>
<p>It was also shown how to handle operator precedence and, perhaps
even more importantly, ambiguous grammars. None of this involved
major changes to the framework, thus the proposed framework scales
well to more complicated grammars such as those of typical
programming languages.</p>
<p>Moreover, we showed how one could associate actions to
individual sub-parsers thereby dramatically extending the
possibilities of stream-based parsing. The associated actions could
be used to create a pretty printer, or even to translate or execute
the code.</p>
</div>
<div class="bibliography">
<div class="titlepage">
<h2><a name="d0e907" id="d0e907"></a>References</h2>
</div>
<div class="bibliomixed"><a name="Martin" id="Martin"></a>
<p class="bibliomixed">[Martin] J. C. Martin, <span class=
"citetitle"><i class="citetitle">Introduction to languages and the
theory of computation, 2nd ed</i></span>, McGraw-Hill (1997).</p>
</div>
<div class="bibliomixed"><a name="Hutton" id="Hutton"></a>
<p class="bibliomixed">[Hutton] G. Hutton, &quot;Higher-order functions
for parsing&quot;, <span class="citetitle"><i class="citetitle">J.
Funct. Prog. 2 (3)</i></span>, p323 (1992).</p>
</div>
<div class="bibliomixed"><a name="Fokker" id="Fokker"></a>
<p class="bibliomixed">[Fokker] J. Fokker, &quot;Functional parsers&quot;,
<span class="citetitle"><i class="citetitle">Lect. Notes of the
Baastad Spring School on Funct. Prog.</i></span> (1995).</p>
</div>
<div class="bibliomixed"><a name="Peyton-Jones-" id=
"Peyton-Jones-"></a>
<p class="bibliomixed">[Peyton-Jones-] S. L. Peyton-Jones, D.
Lester, <span class="citetitle"><i class="citetitle">Implementing
functional languages. A tutorial</i></span>, Prentice-Hall
(1992).</p>
</div>
<div class="bibliomixed"><a name="Antonsen" id="Antonsen"></a>
<p class="bibliomixed">[Antonsen] F. Antonsen, &quot;Functional
programming in Perl&quot;, to appear in <span class=
"citetitle"><i class="citetitle">The Perl Review</i></span>.</p>
</div>
<div class="bibliomixed"><a name="Alexandrescu" id=
"Alexandrescu"></a>
<p class="bibliomixed">[Alexandrescu] A. Alexandrescu <span class=
"citetitle"><i class="citetitle">Modern C++ Design : Generic
Programming and Design Patterns Applied</i></span>, Addison-Wesley,
2001.</p>
</div>
<div class="bibliomixed"><a name="Josuttis" id="Josuttis"></a>
<p class="bibliomixed">[Josuttis] N. M. Josuttis, <span class=
"citetitle"><i class="citetitle">The C++ Standard Library. A
tutorial and reference</i></span>, Addison-Wesley, 1999.</p>
</div>
</div>
<div class="footnotes"><br>
<hr class="c2" width="100">
<div class="footnote">
<p><sup>[<a name="ftn.d0e32" href="#d0e32" id=
"ftn.d0e32">1</a>]</sup> Although, C++ being a multi-paradigm
language this would be a worthwhile topic.</p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.d0e63" href="#d0e63" id=
"ftn.d0e63">2</a>]</sup> That being said, it ought to be mentioned
too that the Standard Template Library (STL) has introduced many FP
features, among these a limited support for partial binding and the
ability to write function composition operators, [<a href=
"#Josuttis">Josuttis</a>], although this still cannot be done by
overloading a natural operator.</p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.d0e117" href="#d0e117" id=
"ftn.d0e117">3</a>]</sup> Floating point numbers could be
accommodated with small changes, but this will introduce an
unnecessary level of complexity</p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.d0e303" href="#d0e303" id=
"ftn.d0e303">4</a>]</sup> This &quot;pattern matching&quot; is really an
advanced dispatch mechanism allowing the user to &quot;overload&quot;
functions not just on basis of different <span class=
"emphasis"><em>types</em></span> of arguments but also on different
<span class="emphasis"><em>values</em></span>.</p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.d0e470" href="#d0e470" id=
"ftn.d0e470">5</a>]</sup> C++ can also do this in a limited way
using <tt class="literal">bind1st</tt> and <tt class=
"literal">bind2nd</tt> on function objects. These are available in
the <tt class="literal">&lt;functional&gt;</tt> header file and
provide one more example of FP-concepts being introduced into C++.
See e.g. [<a href="#Josuttis">Josuttis</a>]</p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.d0e810" href="#d0e810" id=
"ftn.d0e810">6</a>]</sup> In monadic parsing in Haskell one often
uses the sequencing operator <tt class="literal">&gt;&gt;=</tt> for
precisely this purpose, hence we will use that here too.</p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.d0e822" href="#d0e822" id=
"ftn.d0e822">7</a>]</sup> In most FP languages one would say that
<tt class="literal">f</tt> must be of type <tt class=
"literal">Presult -&gt; Presult</tt>.</p>
</div>
</div>
</p>
<p><strong>Notes:</strong>&nbsp;</p>
<p><em>More fields may be available via dynamicdata ..</em></p>
</div>
</channel>
</rss>
