Title: A Little Class

Author:

Date: 03 October 2001 13:15:47 +01:00 or Wed, 03 October 2001 13:15:47 +01:00

Summary:

Body:

For a recent small administration utility, I had to label a lot of items uniquely. The label had to be some string. One option would have been to use a simple unsigned long and convert it to a string (e.g. using stringstream), but that would have been a waste of string space: if you need to label more than a million items, you have to reserve at least 7 digits. So the idea came up to use a string counter using the lower case letters from 'a' to 'z'. So you start with "aaaaa" and end with "zzzzz". Here, 5 digits are enough for more than 10 million items. So I started out to program a class for it. I wanted to use it like this:

void stamp(std::string
  & item) { static StringCounter label(5, " Stamp:"); // use 5
  digits and prefix them with "Stamp:" item += label++; }

Of course, in my real application stamp is a member class and label a class member.

With this, I could define my class interface:

namespace utilities { class StringCounter{ public: explicit
  StringCounter(int length, std::string prefix = ""); operator
  std::string() const; StringCounter operator++(int); }; } // namespace
  utilities

Of course, we put the class into an appropriate namespace, which is utilities in my case (which in turn is a sub-namespace of my company-specific namespace; "utilities" is much too common a name for a top-level namespace).

For the class itself, we have a constructor that takes the number of digits and an optional prefix. As the second parameter is optional, we make the constructor explicit to avoid non-intended conversion. Then, for our usage in stamp, we need the postfix increment. It returns the current value (therefore a return type of StringCounter) and afterwards increments itself. But as we want to append the result to a string, we need also a conversion function. Of course, it would be possible to just return string from operator++, but I hate to do so: if we use an operator, we use predefined semantics. The rule of least surprise forbids us from violating them, and a return type different from the class itself would be such a violation. And if you care about performance: if we make the functions inline, a decent compiler should be able to optimise to the same effect (and completely eliminate the temporary return object).

Now for the implementation: we clearly need to store the current counter and the prefix, and it might be handy to store the length as well. So we get:

class StringCounter { // ... private:
  std::string prefix; std::string counter; int length; };
  StringCounter::StringCounter(int len, string prfx) : prefix(prfx),
  counter(len, 'a'), length(len) { }

This already shows our main tool for this class: the standard class string. A lot of people say that string suffers from featurism: there is too much in it. But exactly this helps us with our task. The first feature is the string constructor that takes, a number and a character and initializes the string with so many occurrences of the character. We use this in counter(len, 'a') and get "aaaaa" (with len==5 from our stamp example).

Now we come to the main member function of our class, the increment:

StringCounter
  StringCounter::operator++(int) { StringCounter tmp(*this); string::size_type
  idx = counter.find_last_not_of('z'); if (idx == string::npos) { //
  silent overflow counter.assign(length, 'a'); } else { // there are
  possibly 'z's at the end ++counter[idx++]; counter.replace(idx,
  length-idx, length-idx, 'a'); } return tmp; }

We don't name the int parameter, as it is a dummy parameter to distinguish between prefix and postfix increment. And not naming it avoids the "unused parameter" warning on most compilers.

As we have the postfix operation here, we need to store the current value to return it later, which we do in tmp. This creates a full-blown copy of the current object, but again, if you really care about performance, make everything inline and let the compiler do the optimisation.

You might ask why I use the postfix increment instead of prefix, which has fewer performance problems. That's true, but item += label++; is quite common in my code, and the use of such idioms makes my code to look more uniform and therefore easier to understand. And I'm definitely not willing to trade understandability of my code for performance, if there is not an urgent need for it.

Now we come to the real incrementing. The basic algorithm is:

Start with the last digit.
If it's not a 'z' increment it, and you're done.
If it is a 'z', go to the previous digit and redo 3)
If you found a non-'z', increment it
Set all remaining digits back to 'a'.
If all digits are 'z', you have an overflow.

If we look at the algorithm again, we see that 1) and 2) are just a special case of 3), 4) and 5), so we can reformulate the algorithm:

nothing
nothing
find the first non-'z' digit, starting from the end.
same as before
same as before
same as before

Now, if the last digit is not a 'z', in step 5) the number of remaining digits is just zero.

To implement that algorithm, we start to leverage the power (or featurism) of string. The find_last_not_of is just our step 3) and the test and if-part is our 6). For my application it's fine to silently restart, but you might consider throwing an exception. The assign used is just the counterpart of the constructor originally used for counter.

The line ++counter[idx++]; implements step 4, and the replace implements step 5. This was the most difficult part of the implementation: to find out what replace actually does. I couldn't find any example for it's use, and the explanation in my usual reference [Josuttis] didn't really help. But in [Stroustrup] I found the hint that the length of the string can change with replace, and with that hint I was able to interpret the original Standard text: essentially replace takes an index where to start with the replacement, then a number to define how many characters should be removed, and the rest is about what should be inserted. In my case it's first the length of the insertion and then the character that is inserted so often, in the other forms it might be a second string with again an index and how many characters to be taken of that string.

Well, that's it. If you really care about performance you might speculate whether the call of the replace function with a length argument of zero isn't too expensive. Frankly, I don't know and I don't really care. In most implementations replace is inline anyway, and therefore the additional test in our function that would be required is just the same cost as the test inside of replace, but to say exactly you have to do some tests.

Finally, the implementation of our conversion operator is straightforward: just return the counter.

So, to put it all together, here is the full code:

// strcounter.h
  #ifndef UTILITIES_STRING_COUNTER_H_ #define UTILITIES_STRING_COUNTER_H_
  #include <string> namespace utilities { class StringCounter {
  public: explicit StringCounter(int length, std::string prefix = "");
  operator std::string() const; StringCounter operator++(int); private:
  std::string prefix; std::string counter; int length; }; } // namespace
  utilities #endif // UTILITIES_STRING_COUNTER_H_

// strcounter.cpp #include "strcounter.h" using
  std::string; namespace utilities { StringCounter::StringCounter(int len,
  string prfx) : prefix(prfx), counter(len, 'a'), length(len) {}
  StringCounter::operator string() const { return prefix + counter; }
  StringCounter StringCounter::operator++(int) { StringCounter tmp(*this);
  string::size_type idx = counter.find_last_not_of('z'); if (idx ==
  string::npos) { // silent overflow counter.assign(length, 'a'); }
  else { // there are possibly 'z's at the end ++counter[idx++];
  counter.replace(idx, length-idx, length-idx, 'a'); } return tmp; } }
  // namespace utilities

This class is sufficiently small to leave me quite confident that it has no major flaw. Even if the constructor is (erroneously) called with a length argument of zero, everything is fine (only empty strings are returned).

Nevertheless, there is a portability problem with this implementation that is not a problem for my environment, but must be considered when using this class elsewhere: the expression ++counter[...] to increment the character will not necessarily do what you expect. I'm not even sure that repeated increments of 'a' will eventually reach 'z' (without overflow). Actually, I always believed that such a guarantee exists, but could not find it in the Standard and therefore must assume that it doesn't exist. But at least on EBCDIC, the sequence of characters between 'a' and 'z' will include non-letter characters, and this is probably not what you want. A portable re-implementation would not be really difficult, but I'll leave it to you. So you are kindly invited to provide one for the next issue.

In doing this, you might consider to extend the character set to include numbers, '$', '_', etc. It would be best if that character set could be given to the constructor as additional (optional) argument, e.g. StringCounter(5, "", "abcdefghijklmnopqrstuvw xyzABCDEFGHIJKLMNOPQRSTUVWXYZ$_0123456789");

Are there other things to add? We could add a postfix string, but on second thought we might decide to omit even the prefix: The Golden Rule of object-orientation is that one class should do one thing, and so the addition of prefix, postfix, and maybe some other formatting should be done outside this class, which then does only one thing: counting stringwise. But that could include an operator+ (StringCounter, unsigned int) as in multi-threaded environments it might be useful to reserve full blocks of unique labels. That could finally lead to a complete different implementation: not based on string as the current one, but based on unsigned long without any internal strings. In this case, the conversion to string takes most of the work. But whatever is decided for future implementations, except for the possible omission of the prefix parameter in the constructor the interface of the current class should be simple and straightforward enough to remain stable.

Another completely different issue could be the return type of the conversion. As you probably know, std::string is a typedef for a template, so should the conversion be a member template on the return type? Definitely not. StringCounter uses string (i.e. basic_string<char>) for implementation and returns just that, and any conversions to a different string type (e.g. wstring) is for the user of StringCounter. And also, I don't see an easy way to implement such a general conversion.

But on second thought, I might be wrong. Not because of the return type of the conversion, but the basic implementation type might be some other instantiation of basic_string. If we really provide a character set parameter for the constructor, a user might want to include non-chars into that character set. And in that case StringCounter should be built indeed on that non-char character set. Therefore a templatisation of the complete StringCounter on the character type might be useful. And probably nothing of the implementation would have to change. That's the bright magic of templates.

Though this looks really nice, I must admit that I don't feel completely comfortable with such a template solution. Though I'm pretty sure that my implementation would work as template as well (except those things that have to go anyway with the introduction of a character set) I'm not really confident that any future maintainer of the class would keep in mind that sizeof of the underlying character type is not necessarily 1. It's too easy to forget that and to rely on that wrong assumption. So the least to do if we really want to templatise is to provide a set of test cases with more exotic character types.

But for now, as I'm happy with my class as it is, I am reminded of one of the basic rules of XP ("eXtreme Programming"): Don't do now what you don't need now.

But you are welcome to come up with a better version of my class, and I would like to read about it at this place in the next issue.

References:

[Josuttis] Nicolai M. Josuttis: The C++ Standard Library, Addison-Wesley 1999

[Stroustrup] Bjarne Stroustrup: The C++ Programming Language, 3rd ed, Addison-Wesley 1997

Notes:

More fields may be available via dynamicdata ..