    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: A Little String Thing</title>
        <link>https://members.accu.org/index.php/journals/1178</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">CVu Journal Vol 14, #4 - Aug 2002 + Programming Topics</span></div>





<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;A Little String Thing</h1>
<p>
<strong>Date:</strong> Sat, 03 August 2002 13:15:52 +01:00</p>
<p><strong>Body:</strong>&nbsp;<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e20" id="d0e20"></a></h2>
</div>
<p>It is sometimes useful to tidy up a string before further
processing in a software system, e.g. at the point where the user
has entered the string via the user interface. From that point on,
the rest of the system can safely assume that the string contains no
formatting &quot;surprises&quot;. Perhaps one of the hardest things to spot
with strings is leading or trailing white space. Unlikely as it
may seem, this can cause systems to fail, e.g. a table in a database
cannot be found because the table name has leading white space.
Trying to find such issues really can be like looking for
a needle in a haystack.</p>
<p>By removing all non-significant white space (by that I mean
leading and trailing white space) at the point where the string has
just been entered by a user, such problems should not occur.
Performing the operation at the point of entry of data into a
system means that no further checks for leading/trailing white
space need to be made in other parts of the code and provides for a
single point of maintenance.</p>
<p>The project I have in mind uses the C++ standard library string
class. Despite criticisms from some quarters that the string class
has too many methods (a &quot;Swiss army knife&quot; of a string class, if
you will) it certainly doesn't have any methods for removing
leading and/or trailing white space. This means we have to code it
ourselves. Well, the string class does provide &quot;find&quot; type methods
so we could use those to locate specific portions of the string and
extract the &quot;real&quot; part of the string. Below is some sample
code:</p>
<pre class="programlisting">
#include &lt;string&gt;
#include &lt;iostream&gt;

namespace {
void rem_space(std::string&amp; str) {
  typedef std::string::size_type pos_t;
  pos_t start = str.find_first_not_of(' ');
  pos_t end = str.find_first_of(' ', start);
  str = str.substr(start, end - start);
}

void show(const std::string&amp; str) {
  std::cout &lt;&lt; &quot;Text is: *&quot; &lt;&lt; str &lt;&lt; '*' &lt;&lt; std::endl;
}

void test_it(std::string&amp; str) {
  show(str);
  rem_space(str);
  show(str);
  std::cout &lt;&lt; std::endl;
}
} // anonymous namespace

int main() {
  using std::string;

  string test1(&quot;   abc   &quot;);
  test_it(test1);

  string test2(&quot;   abc&quot;);
  test_it(test2);
    
  string test3(&quot;abc   &quot;);
  test_it(test3);
    
  string test4(&quot;abc&quot;);
  test_it(test4);

  return 0;
}
</pre>
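<p>On the face of it, the code seems to do the trick. The function
<tt class="methodname">rem_space</tt> in the anonymous namespace is
really the heart of it and I guess its few lines of code speak
for themselves (note that the clear names of the methods on the
<tt class="classname">std::string</tt> class make the code almost
self-documenting here). The start and end positions are
<span class="bold"><b>not</b></span> checked against <tt class=
"varname">std::string::npos</tt> here as such checks are
unnecessary for the four test strings: when <tt class=
"varname">end</tt> is <tt class="varname">npos</tt>,
<tt class="methodname">substr</tt> simply takes the rest of the
string. An empty or all-space string, however, would leave
<tt class="varname">start</tt> at <tt class="varname">npos</tt> and
make <tt class="methodname">substr</tt> throw
<tt class="classname">std::out_of_range</tt>, so real code needs a
guard (and a check for <tt class="literal">start &gt; end</tt> -
more later).</p>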
<p>Still, there's plenty wrong with the code above! For a start, it
only checks for a ' ' as white space. What about tab (<tt class=
"literal">\t</tt>) and newline (<tt class="literal">\n</tt>) and
all other such characters? A more subtle problem with the code is
if you give it a string such as &quot; <tt class="literal">abc abc</tt>
&quot;. What would you expect the result to be? What would you want the
result to be? (Unfortunately, the two don't always coincide!)</p>
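<p>The second problem can easily be fixed by changing the line</p>
<pre class="programlisting">
pos_t end = str.find_first_of(' ', start);
</pre>
<p>to</p>
<pre class="programlisting">
pos_t end = str.find_last_not_of(' ');
</pre>
<p>Because <tt class="varname">end</tt> is now the index of the last
non-space character rather than of the character after it, the
length passed to <tt class="methodname">substr</tt> must also become
<tt class="literal">end - start + 1</tt>.</p>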
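<p>As a rough sketch, the position-based function with
<tt class="methodname">find_last_not_of</tt> might end up looking
like this. The name <tt class="methodname">rem_space_pos</tt> and
the explicit <tt class="varname">npos</tt> guard are assumptions
made for this illustration, not part of the listing above; like the
original, it only treats ' ' as white space.</p>

```cpp
#include <string>

// Hypothetical helper (the name is not from the article): trims leading
// and trailing ' ' characters using only std::string's own find methods.
void rem_space_pos(std::string& str) {
  typedef std::string::size_type pos_t;

  const pos_t start = str.find_first_not_of(' ');
  if (start == std::string::npos) {
    // Empty or all-space input: nothing significant remains, and
    // calling substr(npos, ...) would throw std::out_of_range.
    str.clear();
    return;
  }

  // A non-space character exists, so end is a valid index >= start.
  const pos_t end = str.find_last_not_of(' ');

  // end is the index OF the last kept character, hence the + 1.
  str = str.substr(start, end - start + 1);
}
```

<p>With this version, embedded spaces survive: only the leading and
trailing runs are removed.</p>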
<p>The work to fix the check for white space becomes a little more
involved. We could use a string of characters to look for (a
character class for those of you into regular expressions) e.g.</p>
<pre class="programlisting">
str.find_first_not_of(&quot; \t\n&quot;);
</pre>
<p>Are we sure we've got all our white space characters? A quick look
at an ASCII table turns up several more (vertical tab, form feed and
carriage return, for instance) so we're definitely
lacking here. Alternatively, we could use an already-provided
function called <tt class="methodname">isspace()</tt> which - guess
what - checks to see if the character passed to it is a white space
character. It even does more than that - it will also use locale
information to interpret what should be classed as white space. I
won't go into locales here as it's beyond the scope of this
article<sup>[<a name="d0e78" href="#ftn.d0e78" id=
"d0e78">1</a>]</sup>. We end up with a slight problem here: you
can't pass a function or functor to the <tt class=
"classname">std::string</tt> &quot;<tt class="methodname">find</tt>&quot;
methods. Looks like we need to iterate over the string and provide
some means of processing it on a character by character basis.
<tt class="function">std::find_if</tt> is one way of iterating over
a container and applying an arbitrary &quot;find&quot; predicate to each of
the values in the container. Fortunately, the <tt class=
"constant">std::string</tt> behaves like a container class in that
it supports iterators - in fact, random access iterators - so we
can use the generic algorithms such as <tt class=
"function">std::find_if</tt>. So here's a slightly modified
version:</p>
<pre class="programlisting">
void rem_space(std::string&amp; str) {
  typedef std::string::iterator str_it;

  str_it str_start =
    std::find_if(str.begin(),
                 str.end(),
                 std::not1(isspace));

  str_it str_end = 
    std::find_if(str.rbegin(),
                 str.rend(),
                 std::not1(isspace)).base();

  str = (str_start &lt;= str_end) ?
         std::string(str_start, str_end) : &quot;&quot;;
}
</pre>
<p>The additional includes are:</p>
<pre class="programlisting">
#include &lt;locale&gt; // for the two-argument isspace()
#include &lt;cctype&gt; // for the one-argument isspace()
#include &lt;algorithm&gt; // for std::find_if
#include &lt;functional&gt; // for std::not1
</pre>
<p>Before you try it: no, it won't compile, but first let's take a
look at what we've got here.</p>
<p>Instead of using <tt class=
"methodname">std::string::find_nnn()</tt> we are now using
<tt class="function">std::find_if</tt>. This is so that we can
apply an arbitrary predicate - in our case, <tt class=
"function">isspace()</tt>. Note that the <tt class=
"function">isspace()</tt> declared in <tt class=
"literal">&lt;locale&gt;</tt> is one of a set of global convenience
functions that use facets and locales under the covers; it takes the
locale as a second argument, whereas the one-argument form inherited
from C consults the current C locale.</p>
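<p>To make the locale connection a little more concrete, here is a
small sketch of the two-argument <tt class=
"function">std::isspace</tt> from <tt class=
"literal">&lt;locale&gt;</tt>. The wrapper names
<tt class="function">is_space_in</tt> and <tt class=
"function">is_space_classic</tt> are invented for this
illustration.</p>

```cpp
#include <locale>

// Hypothetical wrapper: asks a specific locale whether c is white space.
// std::isspace(c, loc) is the convenience function from <locale>; it
// consults the std::ctype<char> facet of loc under the covers.
bool is_space_in(char c, const std::locale& loc) {
  return std::isspace(c, loc);
}

// The same question for the classic "C" locale, where white space is
// space, \t, \n, \v, \f and \r.
bool is_space_classic(char c) {
  return is_space_in(c, std::locale::classic());
}
```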
<p>We are now using <tt class=
"classname">std::string::iterator</tt> types instead of the
<tt class="type">std::string::size_type</tt> as <tt class=
"function">find_if</tt> works with iterators and the <tt class=
"methodname">std::string::find_nnn</tt> methods return you numeric
positions rather than iterators. In brief, the algorithm looks like
this:</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>find the first location in the string where we do not have a
space</p>
</li>
<li>
<p>find the last location in the string which is non white
space</p>
</li>
<li>
<p>construct a temporary string from the start and end iterators
and assign this to the input string</p>
</li>
</ol>
</div>
<p>Again, no checks need to be made on whether the iterators
returned from <tt class="function">std::find_if</tt> are equivalent
to <tt class="methodname">str.end()</tt> as the code will work fine
if the start and end iterators are equivalent to <tt class=
"methodname">str.end()</tt>. However, there is a possibility that
the start iterator will end up with a value greater than the end
iterator: consider the case of a string containing nothing but
white space. The forward search will go all the way to the end of the
string and return <tt class="methodname">str.end()</tt>. The
reverse search will start at the end and go all the way to the
beginning, returning <tt class="methodname">str.rend()</tt>, whose
<tt class="methodname">base()</tt> is <tt class=
"methodname">str.begin()</tt>! The
ternary operator is used to cope with this case.</p>
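<p>The crossing case can be seen in isolation with a small sketch. To
keep it compilable on its own it uses a hand-written negated
predicate, <tt class="function">not_space</tt>, instead of
<tt class="function">std::not1</tt>; that predicate and the name
<tt class="function">iterators_cross</tt> are assumptions made for
this example only.</p>

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical predicate: true for any character that is NOT white space.
// The cast keeps negative char values away from the C isspace().
bool not_space(char c) {
  return std::isspace(static_cast<unsigned char>(c)) == 0;
}

// Returns true when the forward search result lies beyond the (converted)
// reverse search result, i.e. when the two iterators have crossed.
bool iterators_cross(const std::string& s) {
  std::string::const_iterator first =
      std::find_if(s.begin(), s.end(), not_space);
  std::string::const_iterator last =
      std::find_if(s.rbegin(), s.rend(), not_space).base();
  return first > last;
}
```

<p>Only a non-empty string made up entirely of white space makes the
iterators cross; for an empty string both results are equal.</p>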
<p>So, <tt class="classname">str_start</tt> is set to the position of
the first character that is not white space, searching
from the beginning of the string.
<tt class="literal">str_end</tt> is derived from the last character
that is not white space, searching from the end of the string - but
what is
the <tt class="methodname">.base()</tt> tacked on to the <tt class=
"literal">str_end</tt> line of code? Well, to start from the end of
the string we used the <tt class=
"classname">std::string::reverse_iterator</tt>s given by <tt class=
"methodname">rbegin()</tt> and <tt class="methodname">rend()</tt>.
The type of iterator returned by <tt class=
"function">std::find_if</tt> will also be a <tt class=
"classname">reverse_iterator</tt> in this case. We want a forward
iterator here so we can easily construct our sub-string object from
the start and end iterators using the range constructor form of
<tt class="classname">std::string</tt>. Calling &quot;<tt class=
"methodname">base()</tt>&quot; on a <tt class=
"classname">reverse_iterator</tt> performs the conversion of a
<tt class="classname">reverse_iterator</tt> into a regular iterator
for you<sup>[<a name="d0e201" href="#ftn.d0e201" id=
"d0e201">2</a>]</sup>. Conveniently, that regular iterator refers to
the element one past the one the <tt class=
"classname">reverse_iterator</tt> designates, which is exactly the
end of the half-open range we need.</p>
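<p>The relationship between a <tt class=
"classname">reverse_iterator</tt> and its <tt class=
"methodname">base()</tt> is easy to get wrong by one, so here is a
minimal sketch that turns it into an index. The helper name
<tt class="function">index_past_last</tt> is hypothetical.</p>

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: returns the index one past the last occurrence of
// ch in s, or 0 when ch does not occur at all.
std::string::size_type index_past_last(const std::string& s, char ch) {
  // Search backwards; rit designates the last occurrence, or is rend().
  std::string::const_reverse_iterator rit =
      std::find(s.rbegin(), s.rend(), ch);

  // base() refers to the element AFTER the one rit designates, and
  // rend().base() is begin(), so the difference is never negative.
  return static_cast<std::string::size_type>(rit.base() - s.begin());
}
```

<p>For "abcabc" and 'b' the reverse search stops at index 4, and
<tt class="methodname">base()</tt> yields index 5.</p>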
<p>Even so, it still won't compile. The problem is to do with the
fact that we are passing a &quot;normal&quot; function, via <tt class=
"function">std::not1</tt>, into the <tt class=
"function">find_if</tt> as its predicate. Apart from an efficiency
consideration<sup>[<a name="d0e210" href="#ftn.d0e210" id=
"d0e210">3</a>]</sup>, the issue is that the free function
<tt class="function">isspace()</tt> is not adaptable. This means
you cannot apply function adapters to it - such as <tt class=
"function">not1</tt>, <tt class=
"function">bind1st</tt>, <tt class="function">bind2nd</tt> etc.
What we need to do is provide our own function object (aka
&quot;functor&quot;) that is adaptable. Here's the code for the functor:</p>
<pre class="programlisting">
struct is_space : public
  std::unary_function&lt;std::string::value_type, bool&gt; {
  result_type operator()(const argument_type&amp; val) const {
    // The one-argument isspace() (from &lt;cctype&gt;) returns int rather
    // than bool, so compare against 0. The cast avoids undefined
    // behaviour for negative char values.
    return isspace(static_cast&lt;unsigned char&gt;(val)) != 0;
  }
};
</pre>
<p>Inheriting from <tt class="function">std::unary_function</tt>
doesn't do much other than give the <tt class="literal">struct</tt>
a few <tt class="literal">typedef</tt>s - yet, these are the
<tt class="literal">typedef</tt>s that make the functor adaptable.
I've used a couple of the <tt class="literal">typedef</tt>s
provided by <tt class="function">std::unary_function</tt> in my own
<tt class="methodname">operator()</tt> such as &quot;<tt class=
"type">result_type</tt>&quot; and &quot;<tt class="type">argument_type</tt>&quot;.
These just pick up the types I specified in the template parameters
to the base <tt class="function">unary_function</tt> class - if I
change the types later on, the rest of the code will follow which
means less typing for me! Perhaps more importantly, though, it also
means once again a single point of maintenance: I don't need to
change the argument types in two - albeit closely physically
related - places if I use the <tt class="literal">typedef</tt>s
provided for me by <tt class="function">unary_function</tt>.</p>
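<p>To show what &quot;adaptable&quot; buys you, here is a sketch in which
both the <tt class="literal">typedef</tt>s and a negating adapter are
written out by hand. The names <tt class="classname">is_space_fn</tt>
and <tt class="classname">negated</tt> are invented for this
illustration; the adapter is a simplified stand-in for what
<tt class="function">std::not1</tt> produces, not the library's own
code. Writing the <tt class="literal">typedef</tt>s by hand also
keeps the sketch usable with C++17 and later, where
<tt class="function">std::unary_function</tt> is no longer part of
the standard library.</p>

```cpp
#include <cctype>

// Hypothetical functor with the typedefs std::unary_function would supply.
struct is_space_fn {
  typedef char argument_type;
  typedef bool result_type;

  result_type operator()(const argument_type& c) const {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
  }
};

// Simplified negating adapter. It can only be written generically because
// Pred publishes argument_type; a plain function has no such member,
// which is why it cannot be adapted this way.
template <typename Pred>
struct negated {
  typedef typename Pred::argument_type argument_type;
  typedef bool result_type;

  explicit negated(const Pred& p) : pred(p) {}

  result_type operator()(const argument_type& x) const {
    return !pred(x);
  }

  Pred pred;
};
```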
<p>Modifying the code yet again to use my <tt class=
"function">is_space</tt> functor we end up with this:</p>
<pre class="programlisting">
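void rem_space(std::string&amp; str) {
  typedef std::string::iterator str_it;

  // First character that is not white space
  str_it str_start =
    std::find_if(str.begin(),
                 str.end(),
                 std::not1(is_space()));

  // One past the last character that is not white space
  str_it str_end = 
    std::find_if(str.rbegin(),
                 str.rend(),
                 std::not1(is_space())).base();

  // The iterators cross when the string is all white space
  str = (str_start &lt;= str_end) ?
         std::string(str_start, str_end) : &quot;&quot;;
}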
</pre>
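<p>For readers with a C++11 or later compiler, the same idea can be
sketched with a lambda in place of the functor and
<tt class="function">std::not1</tt> (which was deprecated in C++17
and removed in C++20). This is an alternative under those
assumptions, not the version described in this article, and the name
<tt class="function">rem_space_lambda</tt> is hypothetical.</p>

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical C++11 variant: trims white space in place using erase().
void rem_space_lambda(std::string& str) {
  // Taking unsigned char keeps negative char values away from isspace().
  auto not_space = [](unsigned char c) { return std::isspace(c) == 0; };

  // Trim the tail first: base() gives one past the last kept character.
  // For an all-white-space string this erases everything.
  str.erase(std::find_if(str.rbegin(), str.rend(), not_space).base(),
            str.end());

  // Then trim the head, up to the first kept character.
  str.erase(str.begin(),
            std::find_if(str.begin(), str.end(), not_space));
}
```

<p>Erasing the tail before the head means the iterators can never
cross, so no ternary check is needed in this form.</p>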
<p>A full version of the final source code is attached as a zip
file which the ACCU are free to publish on their website for
download if they so wish.</p>
<p>Of course, if you can do it better or have suggestions to
improve this version then write in! I'm sure James would welcome
your input.</p>
<p>(<i><span class="remark">Absolutely! - ed</span></i>)</p>
</div>
<div class="footnotes"><br>
<hr class="c2" width="100">
<div class="footnote">
<p><sup>[<a name="ftn.d0e78" href="#d0e78" id=
"ftn.d0e78">1</a>]</sup> If you're interested in locales,
Josuttis's &quot;C++ Standard Library&quot; has a good section on the
subject, whilst the pi&egrave;ce de r&eacute;sistance, at least in my opinion, is
Klaus Kreft and Angelika Langer's &quot;C++ I/O Streams and Locales&quot;.</p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.d0e201" href="#d0e201" id=
"ftn.d0e201">2</a>]</sup> More on this in, amongst other works,
Scott Meyers's &quot;Effective STL&quot;.</p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.d0e210" href="#d0e210" id=
"ftn.d0e210">3</a>]</sup> A function pointer passed as a predicate
is called by de-referencing the function pointer, whereas a call to a
function object (a.k.a. &quot;functor&quot;) can usually be inlined.</p>
</div>
</div>
</p>
</div>
</channel>
</rss>
