    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: A Little Class</title>
        <link>https://members.accu.org/index.php/articles/1133</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">Programming Topics + CVu Journal Vol 13, #5 - Oct 2001</span></div>

<table border="0" cellpadding="1" cellspacing="0">
    <tbody>
    <tr>
        <td valign="top">
            Browse in :
       </td>
       <td valign="top">

                                            <a href="https://members.accu.org/index.php/articles/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c13/">Topics</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c65/">Programming</a>
<br />

                                            <a href="https://members.accu.org/index.php/articles/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c76/">Journals</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c77/">CVu</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c118/">135</a>
<br />

                                            <a href="https://members.accu.org/index.php/articles/c65-118/">Any of these categories</a>

                    -                        <a href="https://members.accu.org/index.php/articles/c65+118/">All of these categories</a>
<br />
</td>
   </tr>
   </tbody>
</table>




<div class="xar-error">
   <p>
 <strong>Note:</strong> when you create a new publication type,
the articles module will automatically use the templates
<em>user-display-[publicationtype].xt</em>
and <em>user-summary-[publicationtype].xt</em>.
If those templates do not exist when you try to preview or display a new article,
you'll get this warning :-)  Please place your own templates in themes/<em>yourtheme</em>/modules/articles . The templates will get the extension .xt there. </p>
</div>
<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;A Little Class</h1>
<p><strong>Author:</strong>&nbsp;</p>
<p>
<strong>Date:</strong> 03 October 2001 13:15:47 +01:00 or Wed, 03 October 2001 13:15:47 +01:00</p>
<p><strong>Summary:</strong>&nbsp;</p>
<p><strong>Body:</strong>&nbsp;<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e18" id="d0e18"></a></h2>
</div>
<p>For a recent small administration utility, I had to label a lot
of items uniquely. The label had to be some string. One option
would have been to use a simple <tt class="literal">unsigned
long</tt> and convert it to a <tt class="literal">string</tt> (e.g.
using <tt class="literal">stringstream</tt>), but that would have
been a waste of <tt class="literal">string</tt> space: if you need
to label more than a million items, you have to reserve at least 7
digits. So the idea came up to use a <tt class=
"literal">string</tt> counter using the lower case letters from 'a'
to 'z'. So you start with &quot;aaaaa&quot; and end with &quot;zzzzz&quot;. Here, 5
digits are enough for more than 10 million items. So I started out
to program a class for it. I wanted to use it like this:</p>
<pre class="programlisting">
void stamp(std::string
  &amp; item) { static StringCounter label(5, &quot; Stamp:&quot;); // use 5
  digits and prefix them with &quot;Stamp:&quot; item += label++; }
</pre>
<p>Of course, in my real application stamp is a member class and
label a class member.</p>
<p>With this, I could define my class interface:</p>
<pre class="programlisting">
namespace utilities { class StringCounter{ public: explicit
  StringCounter(int length, std::string prefix = &quot;&quot;); operator
  std::string() const; StringCounter operator++(int); }; } // namespace
  utilities
</pre>
<p>Of course, we put the class into an appropriate namespace, which
is <tt class="literal">utilities</tt> in my case (which in turn is
a sub-namespace of my company-specific namespace; &quot;utilities&quot; is
much too common a name for a top-level namespace).</p>
<p>For the class itself, we have a constructor that takes the
number of digits and an optional prefix. As the second parameter is
optional, we make the constructor explicit to avoid non-intended
conversion. Then, for our usage in <tt class="literal">stamp</tt>,
we need the postfix increment. It returns the current value
(therefore a return type of <tt class="literal">StringCounter</tt>)
and afterwards increments itself. But as we want to append the
result to a <tt class="literal">string</tt>, we need also a
conversion function. Of course, it would be possible to just return
<tt class="literal">string</tt> from <tt class=
"literal">operator++</tt>, but I hate to do so: if we use an
operator, we use predefined semantics. The rule of least surprise
forbids us from violating them, and a return type different from
the class itself would be such a violation. And if you care about
performance: if we make the functions inline, a decent compiler
should be able to optimise to the same effect (and completely
eliminate the temporary return object).</p>
<p>Now for the implementation: we clearly need to store the current
counter and the prefix, and it might be handy to store the length
as well. So we get:</p>
<pre class="programlisting">
class StringCounter { // ... private:
  std::string prefix; std::string counter; int length; };
  StringCounter::StringCounter(int len, string prfx) : prefix(prfx),
  counter(len, 'a'), length(len) { }
</pre>
<p>This already shows our main tool for this class: the standard
class <tt class="literal">string</tt>. A lot of people say that
<tt class="literal">string</tt> suffers from featurism: there is
too much in it. But exactly this helps us with our task. The first
feature is the <tt class="literal">string</tt> constructor that
takes, a number and a character and initializes the <tt class=
"literal">string</tt> with so many occurrences of the character. We
use this in <tt class="literal">counter(len, 'a')</tt> and get
&quot;aaaaa&quot; (with <tt class="literal">len==5</tt> from our stamp
example).</p>
<p>Now we come to the main member function of our class, the
increment:</p>
<pre class="programlisting">
StringCounter
  StringCounter::operator++(int) { StringCounter tmp(*this); string::size_type
  idx = counter.find_last_not_of('z'); if (idx == string::npos) { //
  silent overflow counter.assign(length, 'a'); } else { // there are
  possibly 'z's at the end ++counter[idx++]; counter.replace(idx,
  length-idx, length-idx, 'a'); } return tmp; }
</pre>
<p>We don't name the <tt class="literal">int</tt> parameter, as it
is a dummy parameter to distinguish between prefix and postfix
increment. And not naming it avoids the &quot;unused parameter&quot; warning
on most compilers.</p>
<p>As we have the postfix operation here, we need to store the
current value to return it later, which we do in <tt class=
"literal">tmp</tt>. This creates a full-blown copy of the current
object, but again, if you really care about performance, make
everything inline and let the compiler do the optimisation.</p>
<p>You might ask why I use the postfix increment instead of prefix,
which has fewer performance problems. That's true, but <tt class=
"literal">item += label++;</tt> is quite common in my code, and the
use of such idioms makes my code to look more uniform and therefore
easier to understand. And I'm definitely not willing to trade
understandability of my code for performance, if there is not an
urgent need for it.</p>
<p>Now we come to the real incrementing. The basic algorithm
is:</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>Start with the last digit.</p>
</li>
<li>
<p>If it's not a 'z' increment it, and you're done.</p>
</li>
<li>
<p>If it is a 'z', go to the previous digit and redo 3)</p>
</li>
<li>
<p>If you found a non-'z', increment it</p>
</li>
<li>
<p>Set all remaining digits back to 'a'.</p>
</li>
<li>
<p>If all digits are 'z', you have an overflow.</p>
</li>
</ol>
</div>
<p>If we look at the algorithm again, we see that 1) and 2) are
just a special case of 3), 4) and 5), so we can reformulate the
algorithm:</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>nothing</p>
</li>
<li>
<p>nothing</p>
</li>
<li>
<p>find the first non-'z' digit, starting from the end.</p>
</li>
<li>
<p>same as before</p>
</li>
<li>
<p>same as before</p>
</li>
<li>
<p>same as before</p>
</li>
</ol>
</div>
<p>Now, if the last digit is not a 'z', in step 5) the number of
remaining digits is just zero.</p>
<p>To implement that algorithm, we start to leverage the power (or
featurism) of string. The <tt class="literal">find_last_not_of</tt>
is just our step 3) and the test and if-part is our 6). For my
application it's fine to silently restart, but you might consider
throwing an exception. The assign used is just the counterpart of
the constructor originally used for counter.</p>
<p>The line <tt class="literal">++counter[idx++];</tt> implements
step 4, and the <tt class="literal">replace</tt> implements step 5.
This was the most difficult part of the implementation: to find out
what replace actually does. I couldn't find any example for it's
use, and the explanation in my usual reference [<a href=
"#Josuttis">Josuttis</a>] didn't really help. But in [<a href=
"#Stroustrup">Stroustrup</a>] I found the hint that the length of
the string can change with <tt class="literal">replace</tt>, and
with that hint I was able to interpret the original Standard text:
essentially <tt class="literal">replace</tt> takes an index where
to start with the replacement, then a number to define how many
characters should be removed, and the rest is about what should be
inserted. In my case it's first the length of the insertion and
then the character that is inserted so often, in the other forms it
might be a second string with again an index and how many
characters to be taken of that string.</p>
<p>Well, that's it. If you really care about performance you might
speculate whether the call of the <tt class="literal">replace</tt>
function with a length argument of zero isn't too expensive.
Frankly, I don't know and I don't really care. In most
implementations <tt class="literal">replace</tt> is inline anyway,
and therefore the additional test in our function that would be
required is just the same cost as the test inside of replace, but
to say exactly you have to do some tests.</p>
<p>Finally, the implementation of our conversion operator is
straightforward: just return the counter.</p>
<p>So, to put it all together, here is the full code:</p>
<pre class="programlisting">
// strcounter.h
  #ifndef UTILITIES_STRING_COUNTER_H_ #define UTILITIES_STRING_COUNTER_H_
  #include &lt;string&gt; namespace utilities { class StringCounter {
  public: explicit StringCounter(int length, std::string prefix = &quot;&quot;);
  operator std::string() const; StringCounter operator++(int); private:
  std::string prefix; std::string counter; int length; }; } // namespace
  utilities #endif // UTILITIES_STRING_COUNTER_H_
</pre>
<pre class="programlisting">
// strcounter.cpp #include &quot;strcounter.h&quot; using
  std::string; namespace utilities { StringCounter::StringCounter(int len,
  string prfx) : prefix(prfx), counter(len, 'a'), length(len) {}
  StringCounter::operator string() const { return prefix + counter; }
  StringCounter StringCounter::operator++(int) { StringCounter tmp(*this);
  string::size_type idx = counter.find_last_not_of('z'); if (idx ==
  string::npos) { // silent overflow counter.assign(length, 'a'); }
  else { // there are possibly 'z's at the end ++counter[idx++];
  counter.replace(idx, length-idx, length-idx, 'a'); } return tmp; } }
  // namespace utilities
</pre>
<p>This class is sufficiently small to leave me quite confident
that it has no major flaw. Even if the constructor is (erroneously)
called with a length argument of zero, everything is fine (only
empty strings are returned).</p>
<p>Nevertheless, there is a portability problem with this
implementation that is not a problem for my environment, but must
be considered when using this class elsewhere: the expression
<tt class="literal">++counter[...]</tt> to increment the character
will not necessarily do what you expect. I'm not even sure that
repeated increments of 'a' will eventually reach 'z' (without
overflow). Actually, I always believed that such a guarantee
exists, but could not find it in the Standard and therefore must
assume that it doesn't exist. But at least on EBCDIC, the sequence
of characters between 'a' and 'z' will include non-letter
characters, and this is probably not what you want. A portable
re-implementation would not be really difficult, but I'll leave it
to you. So you are kindly invited to provide one for the next
issue.</p>
<p>In doing this, you might consider to extend the character set to
include numbers, '$', '_', etc. It would be best if that character
set could be given to the constructor as additional (optional)
argument, e.g. <tt class="literal">StringCounter(5, &quot;&quot;,
&quot;abcdefghijklmnopqrstuvw
xyzABCDEFGHIJKLMNOPQRSTUVWXYZ$_0123456789&quot;);</tt></p>
<p>Are there other things to add? We could add a postfix string,
but on second thought we might decide to omit even the prefix: The
Golden Rule of object-orientation is that one class should do one
thing, and so the addition of prefix, postfix, and maybe some other
formatting should be done outside this class, which then does only
one thing: counting <tt class="literal">stringwise</tt>. But that
could include an <tt class="literal">operator+ (StringCounter,
unsigned int)</tt> as in multi-threaded environments it might be
useful to reserve full blocks of unique labels. That could finally
lead to a complete different implementation: not based on string as
the current one, but based on <tt class="literal">unsigned
long</tt> without any internal <tt class="literal">string</tt>s. In
this case, the conversion to string takes most of the work. But
whatever is decided for future implementations, except for the
possible omission of the prefix parameter in the constructor the
interface of the current class should be simple and straightforward
enough to remain stable.</p>
<p>Another completely different issue could be the return type of
the conversion. As you probably know, <tt class=
"literal">std::string</tt> is a <tt class="literal">typedef</tt>
for a template, so should the conversion be a member template on
the return type? Definitely not. <tt class=
"literal">StringCounter</tt> uses string (i.e. <tt class=
"literal">basic_string&lt;char&gt;</tt>) for implementation and
returns just that, and any conversions to a different string type
(e.g. <tt class="literal">wstring</tt>) is for the user of
<tt class="literal">StringCounter</tt>. And also, I don't see an
easy way to implement such a general conversion.</p>
<p>But on second thought, I might be wrong. Not because of the
return type of the conversion, but the basic implementation type
might be some other instantiation of <tt class=
"literal">basic_string</tt>. If we really provide a character set
parameter for the constructor, a user might want to include
non-chars into that character set. And in that case <tt class=
"literal">StringCounter</tt> should be built indeed on that
non-char character set. Therefore a templatisation of the complete
<tt class="literal">StringCounter</tt> on the character type might
be useful. And probably nothing of the implementation would have to
change. That's the bright magic of templates.</p>
<p>Though this looks really nice, I must admit that I don't feel
completely comfortable with such a template solution. Though I'm
pretty sure that my implementation would work as template as well
(except those things that have to go anyway with the introduction
of a character set) I'm not really confident that any future
maintainer of the class would keep in mind that <tt class=
"literal">sizeof</tt> of the underlying character type is not
necessarily 1. It's too easy to forget that and to rely on that
wrong assumption. So the least to do if we really want to
templatise is to provide a set of test cases with more exotic
character types.</p>
<p>But for now, as I'm happy with my class as it is, I am reminded
of one of the basic rules of XP (&quot;eXtreme Programming&quot;): Don't do
now what you don't need now.</p>
<p>But you are welcome to come up with a better version of my
class, and I would like to read about it at this place in the next
issue.</p>
</div>
<div class="bibliography">
<div class="titlepage">
<h2><a name="d0e260" id="d0e260"></a>References:</h2>
</div>
<div class="bibliomixed"><a name="Josuttis" id="Josuttis"></a>
<p class="bibliomixed">[Josuttis] Nicolai M. Josuttis: <span class=
"citetitle"><i class="citetitle">The C++ Standard
Library</i></span>, Addison-Wesley 1999</p>
</div>
<div class="bibliomixed"><a name="Stroustrup" id="Stroustrup"></a>
<p class="bibliomixed">[Stroustrup] Bjarne Stroustrup: <span class=
"citetitle"><i class="citetitle">The C++ Programming
Language</i></span>, 3rd ed, Addison-Wesley 1997</p>
</div>
</div>
</p>
<p><strong>Notes:</strong>&nbsp;</p>
<p><em>More fields may be available via dynamicdata ..</em></p>
</div>
</channel>
</rss>
