    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: Split and Merge Revisited</title>
        <link>https://members.accu.org/index.php/articles/2122</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">Programming Topics + CVu Journal Vol 27, #3 - July 2015</span></div>

<table border="0" cellpadding="1" cellspacing="0">
    <tbody>
    <tr>
        <td valign="top">
            Browse in :
       </td>
       <td valign="top">

                                            <a href="https://members.accu.org/index.php/articles/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c13/">Topics</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c65/">Programming</a>
<br />

                                            <a href="https://members.accu.org/index.php/articles/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c76/">Journals</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c77/">CVu</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c351/">273</a>
<br />

                                            <a href="https://members.accu.org/index.php/articles/c65-351/">Any of these categories</a>

                    -                        <a href="https://members.accu.org/index.php/articles/c65+351/">All of these categories</a>
<br />
</td>
   </tr>
   </tbody>
</table>




<div class="xar-error">
   <p>
 <strong>Note:</strong> when you create a new publication type,
the articles module will automatically use the templates
<em>user-display-[publicationtype].xt</em>
and <em>user-summary-[publicationtype].xt</em>.
If those templates do not exist when you try to preview or display a new article,
you'll get this warning :-)  Please place your own templates in themes/<em>yourtheme</em>/modules/articles . The templates will get the extension .xt there. </p>
</div>
<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;Split and Merge Revisited</h1>
<p><strong>Author:</strong>&nbsp;Martin Moene</p>
<p>
<strong>Date:</strong> 07 July 2015 21:49:04 +01:00 or Tue, 07 July 2015 21:49:04 +01:00</p>
<p><strong>Summary:</strong>&nbsp;Vassili Kaplan makes improvements to the Expression Parser.</p>
<p><strong>Body:</strong>&nbsp;<p>Silas S. Brown (ssb22@cam.ac.uk) spotted a defect in the Merge part of the Split and Merge algorithm published in <em>CVu</em>  <a href="#[1]">[1]</a> â€“ you can see Silasâ€™s letter <a href="http://accu.org/index.php/journals/2125">here</a>. We fix this defect here and also provide a slightly different C++ implementation of the Split and Merge algorithm, using the â€˜Virtual Constructorâ€™ idiom <a href="#[2]">[2]</a>.</p>

<h2>Right associativity defect</h2>

<p>As Silas Brown pointed out, â€œ<em>the Split and Merge algorithm, as described in the article, does not consistently treat operators as right-associative either. If the operator is preceded by another of the same or lower precedence level, then it will be left-associative: 3-12-1 would get -10. But if the operator is preceded by one of a higher precedence level, it will become right-associative with the operators before that: 3-2*6-1 would get -8.</em>â€</p>

<p>To fix this, we need to break out of the merge loop and go back to where we were after any successful merge.</p>

<h2>Split and Merge revisited</h2>

<p>The split part remains the same: the result of the first step of our algorithm is a list of structures. Each structure consists of two elements â€“ a real number and an action, e.g.</p>
<p style="margin-left:1em">Splitting (â€œ3 - 2 * 6 - 1â€) &#8594; [3, â€˜-â€™], [2, â€˜*â€™], [6, â€˜-â€™], [1, â€˜)â€™]</p>

<p>In the merge step the actual calculation is done by merging the elements of the list of structures created in the first step. We attempt to merge each structure in the list with the next, taking into account the priority of actions.</p>

<p>Merging can be done if and only if the priority of the action in the first structure is not lower than the priority of the second structure. If this is the case merging consists in applying the action of the first structure to the numbers of both structures, thus creating a new structure, e.g. merging [5, â€˜-â€™] and [4, â€˜+â€™] will produce [5 - 4, â€˜+â€™] = [1, â€˜+â€™].</p>

<p>What if the second structure priority is higher than the first one?</p>

<p>Then we merge the next (second) structure with the structure next to it, and so on, recursively. We break out of the merge loop and go back where we were after any successful merge.</p>

<p>Continuing with our example of â€œ3 - 2 * 6 - 1â€:</p>
<p style="margin-left:1em">Merging ([3, â€˜-â€™], [2, â€˜*â€™], [6, â€˜-â€™], [1, â€˜)â€™]) &#8594;</p>
<p style="margin-left:1em">Merging ([3, â€˜-â€™], Merging ([2, â€˜*â€™], [6, â€˜-â€™]), [1, â€˜)â€™]) &#8594;</p>
<p style="margin-left:1em">Merging ([3, â€˜-â€™], [12, â€˜-â€™], [1, â€˜)â€™]) &#8594;</p>
<p style="margin-left:1em">Merging ([-9, â€˜-â€™], [1, â€˜)â€™]) &#8594; [-9 - 1, â€˜)â€™] = [-10, â€˜)â€™].</p>

<p>Therefore, 3 - 2 * 6 - 1 = -10, which is now correct.</p>

<p>Listing 1 contains the implementation of the fixed Merge part in C++ (the parts different from the earlier version [1] are in bold).</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
<span class="dimmed">double EZParser::merge(Cell&amp; current, 
  size_t&amp; index, vector&lt;Cell&gt;&amp; listToMerge,</span>
  bool mergeOneOnly)
<span class="dimmed">{
  if (index &gt;= listToMerge.size())
  {
    return current.value;
  }

  while (index &lt; listToMerge.size())
  {
    Cell&amp; next = listToMerge.at(index++);
    while (!canMergeCells(current, next))
    { // If we cannot merge cells yet, go to the
      // next cell and merge next cells first. 
      // E.g. if we have 1+2*3, we first merge 
      // next cells, i.e. 2*3, getting 6, and
      // then we can merge 1+6.
      merge(next, index, listToMerge, true 
        /* mergeOneOnly */);
    }

    mergeCells(current, next);</span>
    if (mergeOneOnly) {
      return current.value;
    }
<span class="dimmed">  }

  return current.value;
}</span>
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 1</td>
	</tr>
</table>

<h2>Implementation using the â€˜Virtual Constructorâ€™ idiom</h2>

<p>The implementation of the Split and Merge algorithm in <a href="#[1]">[1]</a> was using pointers to functions. As James Coplien points out, a better, more object-oriented way, would be using the â€˜Virtual Constructorâ€™ idiom [2]:</p>

<p class="blockquote">â€œThe virtual constructor idiom is used when the type of an object needs to be determined from the context in which the object is constructed.â€</p>

<p>In our case these objects are actual functions to be called (e.g. sine, cosine, logarithm, etc.). They are determined only at run time. The advantage of this idiom is that a new function can be added easily to the framework without modifying the main algorithm implementation code.</p>

<p>Listing 2 is how the base class of such functions would look.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
class EZParserFunction
{
public:
  static double calculate(const string&amp; data,
      size_t&amp; from, const string&amp; item, char ch) {
    EZParserFunction func(data, from, item, ch);
    return func.getValue(data, from);
  }
  static void addFunction(const string&amp; name,
     EZParserFunction* function) {
    m_functions_[name] = function;
  }

protected:
  EZParserFunction() : impl_(0) {}
  virtual ~EZParserFunction() {}
  virtual double getValue(const string&amp; data,
     size_t&amp; from) {
    return impl_-&gt;getValue(data, from);
  }
  double loadArg(const string&amp; data, 
    size_t&amp; from)
    {
      return EZParser::loadAndCalculate(data,
        from, ')');
    }

private:
  EZParserFunction(const string&amp; data, 
    size_t&amp; from, const string&amp; item, char ch);

  EZParserFunction* impl_;
  static map&lt;string, 
    EZParserFunction*&gt; m_functions_;
};
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 2</td>
	</tr>
</table>

<p>Here is what a derived class for a sine function would look like:</p>

<pre class="programlisting">
  class SinFunction : public EZParserFunction
  {
    public:
      double getValue(const string&amp; data,
      size_t&amp; from)
    {
      return ::sin(loadArg(data, from));
    }
  }</pre>

<p>The functions derived from the <code>EZParserFunction</code> class are used in the first, splitting process. The split process itself will remain the same, just the call to process the last extracted token will be now</p>

<pre class="programlisting">
  double value = EZParserFunction::calculate(data,
     from, item, ch);</pre>

<p>instead of</p>

<pre class="programlisting">
  double value = m_allFunctions.getValue(data,
     from, item, ch);</pre>

<p>Listing 3 is the (virtual) constructor of the base class.</p>
<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
EZParserFunction::EZParserFunction
   (const string&amp; data, size_t&amp; from,
    const string&amp; item, char ch)
{
  if (item.empty() &amp;&amp; ch == '(') 
  {
    impl_ = identityFunction;
    return;
  }
  map&lt;string, EZParserFunction*&gt;::
  const_iterator it
  = m_functions_.find(item);

  if (it == m_functions_.end())
  {
    strtodFunction-&gt;setItem(item);
    impl_ = strtodFunction;
    return;
  }

  impl_ = it-&gt;second;
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 3</td>
	</tr>
</table>

<p>As we can see the two functions, <code>StrtodFunction</code> and <code>IdentityFunction</code>, have a special meaning and must be implemented (see Listing 4).</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
class StrtodFunction : public EZParserFunction
{
public:
  double getValue(const string&amp; data,
    size_t&amp; from)
  {
    char* x;
    double num = ::strtod(item_.c_str(), &amp;x);
    if (::strlen(x) &gt; 0)
    {
      throw CalculationException(&quot;Could not parse
         token [&quot; + item_ + &quot;]&quot;);
    }
    return num;
  }
  void setItem(const string&amp; item) {
    item_ = item; }

private:
  string item_;
};

class IdentityFunction : public EZParserFunction
{
public:
  double getValue(const string&amp; data, 
  size_t&amp; from)
  {
    return loadArg(data, from);
  }
};

static StrtodFunction* 
  strtodFunction = new StrtodFunction();

static IdentityFunction*
  identityFunction = new IdentityFunction();
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 4</td>
	</tr>
</table>

<p>The rest of the functions do not have to be implemented for the algorithm to work and can be added ad hoc (e.g. <code>log</code>, <code>exp</code>, <code>pi</code>, etc.)</p>

<p>Listing 5 is how the user can add functions that she implemented somewhere in her program initialization code:</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
EZParserFunction::addFunction(&quot;exp&quot;,
  new ExpFunction());
EZParserFunction::addFunction(&quot;log&quot;,
  new LogFunction());
EZParserFunction::addFunction(&quot;sin&quot;,
  new SinFunction());
EZParserFunction::addFunction(&quot;cos&quot;,
  new CosFunction());
EZParserFunction::addFunction(&quot;fabs&quot;,
  new FabsFunction());
EZParserFunction::addFunction(&quot;pi&quot;,
  new ExpFunction());
EZParserFunction::addFunction(&quot;sqrt&quot;,
  new SqrtFunction());
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 5</td>
	</tr>
</table>

<h2>References</h2>

<p class="bibliomixed"><a id="[1]"></a>[1]	Vassili Kaplan, â€˜Split and Merge â€“ another Algorithm for Parsing Mathematical Expressionsâ€™, <em>CVu</em>, Volume 27, Issue 2, May 2015.</p>

<p class="bibliomixed"><a id="[2]"></a>[2]	James Coplien, <em>Advanced C++ Programming Styles and Idioms</em> (p. 140), Addison-Wesley, 1992.</p>
</p>
<p><strong>Notes:</strong>&nbsp;</p>
<p><em>More fields may be available via dynamicdata ..</em></p>
</div>
</channel>
</rss>
