
Overload Journal #55 - Jun 2003 + Programming Topics

Title: A bin Manipulator For IOStreams

Author: Administrator

Date: 02 June 2003 22:57:04 +01:00


The standard stream classes support different bases when doing formatted I/O with integers: there are manipulators std::oct, std::dec, and std::hex for octal, decimal, and hexadecimal I/O, respectively. There is, however, no manipulator for writing and reading integers using other bases, although something like base two seems to be a natural choice, too.

The question is thus what manipulators are and what a manipulator for formatting integers using base two would look like. For this discussion, it is sufficient to concentrate on manipulators without arguments. These are quite simple: A manipulator without argument is (at least normally) just a function with a certain signature. For example, std::hex looks something like this:

namespace std {
  std::ios_base& hex(std::ios_base& ib) {
    ib.setf(std::ios_base::hex,
            std::ios_base::basefield);
    return ib;
  }
}

This function just clears a bunch of bits (namely those which are set in basefield) in the formatting flags and then sets a few of them again (namely those which are set in hex). The standard's formatting functions for integers interpret these flags to determine how integers are to be formatted and act accordingly. However, these functions only work correctly for the bases decimal, octal, and hexadecimal (well, at least these are the only bases for which they are guaranteed to work).

Before going more into the formatting flags let's discuss how these manipulator functions actually work. The above function is used to manipulate the stream with an expression like this:

std::cout << std::hex;

What happens is actually pretty simple: the shift operator is overloaded to take arguments with the signature std::ios_base&(*)(std::ios_base&) and this overload just calls the corresponding function, i.e. something like this (actually, this function is implemented as a template but this would only obfuscate the issue):

std::ostream&
std::ostream::operator<<
   (std::ios_base&(*m)(std::ios_base&)) {
  m(*this);
  return *this;
}

That is, if you want to implement a manipulator, you just implement a function with the appropriate signature: it takes a stream (or one of its base classes: std::ios_base or std::ios) as argument and returns this argument again (the argument and return type have to be identical but there is some choice in the argument types you can use). The function does the manipulation on the argument and then returns the argument.
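As a minimal sketch of this recipe, the following manipulator selects hexadecimal output with upper-case digits. The names upper_hex and upper_hex_text are made up for this illustration and are not part of the standard library; only the setf() calls and the flag names are standard.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical manipulator: hexadecimal output with upper-case digits.
// Its signature is std::ios_base&(std::ios_base&), so the stream's
// operator<< overload for manipulators can call it.
std::ios_base& upper_hex(std::ios_base& ib) {
    ib.setf(std::ios_base::hex, std::ios_base::basefield);
    ib.setf(std::ios_base::uppercase);
    return ib;
}

// Helper for illustration only: formats a value using the manipulator.
std::string upper_hex_text(int value) {
    std::ostringstream out;
    out << upper_hex << value;
    return out.str();
}
```

With this, upper_hex_text(255) should yield the text FF.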

To implement a manipulator which modifies all integer output to become binary is, however, non-trivial because it requires interfering with how integers are formatted and the standard routines for this are not prepared to support bases other than 8, 10, and 16. However, it is doable because it is possible to supply the formatting code for integers by implementing a class derived from the std::num_put facet which is then installed (when necessary) by the manipulator.

Formatting Integers

My guess is that talking about facets is somewhat confusing so let's walk through this whole thing, although most of the stuff will not be related to manipulators directly (some additional stuff on manipulators will, however, come up below).

Facets are a means to adapt certain stuff to local conventions. For example, in Germany we use "," as a decimal point and "." as a thousands separator while in other (weird) places, "." is used as a decimal point and "," as a thousands separator. To adapt output (and other stuff) to the conventions the user is used to, the C++ library uses "facets" which are just classes obeying a few requirements (essentially, each object has to have a public member of type std::locale::id named id and all public functions should be const, i.e. the objects should be immutable). There are several of them but the interesting one here is num_put: the facet doing numerical formatting. Actually, this is not a class but a class template. I will use a template here because it is less confusing than using the specialization we will need to install later.
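To make the idea of a facet a bit more concrete, here is a small sketch that installs German-style punctuation by deriving from the standard std::numpunct facet. The class name german_numpunct and the helper german_text are assumptions made for this example; the overridden member functions are the ones std::numpunct provides for this purpose.

```cpp
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: "," as decimal point, "." as thousands separator.
class german_numpunct: public std::numpunct<char> {
protected:
    char do_decimal_point() const { return ','; }
    char do_thousands_sep() const { return '.'; }
    // "\3" requests groups of three digits, repeated as needed.
    std::string do_grouping() const { return "\3"; }
};

// Helper for illustration only: formats a value with two decimals.
std::string german_text(double value) {
    std::ostringstream out;
    // The locale takes ownership of the facet object.
    out.imbue(std::locale(out.getloc(), new german_numpunct));
    out << std::fixed << std::setprecision(2) << value;
    return out.str();
}
```

The stream itself is unchanged; only the locale it consults for punctuation has been replaced, which is the same mechanism used for num_put below.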

To replace the functions for formatting integers, a class is derived from num_put and a few functions are overridden:

template <typename cT, typename OutIt>
class bin_num_put: public
std::num_put<cT, OutIt> {
  OutIt do_put(OutIt to,
               std::ios_base& fmt,
               cT fill,
               long v) const;

  OutIt do_put(OutIt to,
               std::ios_base& fmt,
               cT fill,
               unsigned long v) const;
};

There are just two functions dealing with integer values, one for signed and one for unsigned ones. Each function gets an output iterator as argument to write individual characters to. After writing the characters the iterator is returned as the result of the function. The second parameter is a reference to an object holding formatting information. The third parameter is the character to be used for padding a value to a specific length. The last parameter is the value to be formatted.
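The four parameters are easier to picture when the facet is called by hand. The following sketch does roughly what a stream's output operator does internally: it obtains the std::num_put facet from the stream's locale and calls its public put() function, which forwards to do_put(). The helper name put_directly is made up for this illustration.

```cpp
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Helper for illustration only: formats a long by calling the
// num_put facet directly instead of using operator<<.
std::string put_directly(long value) {
    typedef std::ostreambuf_iterator<char> iterator;
    typedef std::num_put<char, iterator>   facet;

    std::ostringstream out;
    facet const& np = std::use_facet<facet>(out.getloc());
    np.put(iterator(out),  // where the characters are written to
           out,            // formatting information (flags, width, ...)
           out.fill(),     // character used for padding
           value);         // the value to be formatted
    return out.str();
}
```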

Since I'm mostly interested in getting the principle right, I will just stick to a rather simple implementation using a fixed width of digits. A "real" implementation might want to omit leading zeros (this isn't really hard to implement either). Effectively, just one formatting function is needed because formatting of signed values can be delegated to formatting unsigned values in this case. The formatting functions will use another facet, ctype, to convert the characters 0 and 1 to values of the appropriate character type:

template <typename cT, typename OutIt>
OutIt
bin_num_put<cT, OutIt>::do_put(OutIt to,
                               std::ios_base& fmt,
                               cT fill,
                               long v) const {
  return do_put(to, fmt, fill,
                static_cast<unsigned long>(v));
}

template <typename cT, typename OutIt>
OutIt
bin_num_put<cT, OutIt>::do_put(OutIt to,
                               std::ios_base& fmt,
                               cT fill,
                               unsigned long v) const {
  char narrow[] = "01";
  cT wide[2] = { 0 };
  std::use_facet<std::ctype<cT> >(
    fmt.getloc()).widen(begin(narrow),
                        end(narrow) - 1,
                        begin(wide));
  cT buffer[std::numeric_limits<unsigned long>::digits];
  std::fill(begin(buffer),
            end(buffer),
            wide[0]);
  // Fill in the digits starting from the end of the buffer. The
  // pointer is called "it" so that it does not hide the end() helper.
  for (cT* it = end(buffer); v != 0; v /= 2)
    *--it = wide[v % 2];
  return std::copy(begin(buffer),
                   end(buffer), to);
}

The above code uses the following two auxiliary functions to get an iterator (in this case actually just a pointer) to the beginning and the end of a statically sized array:

namespace {
  template <typename T, int sz>
  T* begin(T (&a)[sz]) {
    return a;
  }

  template <typename T, int sz>
  T* end(T (&a)[sz]) {
    return a + sz;
  }
}

If you don't understand these two functions, just don't worry about them: I'm using them to conveniently get iterators to the beginning and the end of a statically sized array.

The above code explicitly excludes the last element of the array narrow (this is what the -1 is good for). The reason for this is that the array narrow has the size three: the null character at the end of the string is included in the array. That is, the line initializing narrow is equivalent to this one:

char narrow[] = { '0', '1', 0 };

The actual formatting of the binary number is trivial: an array with sufficient zeros is obtained and it is filled with digits starting from the end until there are no further digits. This approach also works for formatting integers for bases other than two and this will be used below. Once all digits are available, the array is copied to its destination. The only somewhat tricky part is getting the appropriate characters representing 0 and 1 because we don't really know the character type. This is done by using the facet std::ctype<cT> which can "widen" a narrow character to the corresponding character type.
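The same fill-from-the-end idea can be sketched independently of streams and facets. The function below is a hypothetical helper (the name digits_in_base is not from the article) that works on plain char, supports bases 2 to 16, and omits leading zeros instead of using a fixed width.

```cpp
#include <limits>
#include <string>

// Hypothetical helper: converts v to text in the given base (2 to 16),
// filling a buffer from the end and skipping leading zeros.
std::string digits_in_base(unsigned long v, unsigned base) {
    static const char digit[] = "0123456789abcdef";
    if (base < 2 || base > 16)
        return std::string();   // unsupported base: empty result
    // One character per bit is enough even for base 2.
    char buffer[std::numeric_limits<unsigned long>::digits];
    char* const last = buffer + sizeof buffer;
    char* first = last;
    do {                        // do/while so that 0 yields "0"
        *--first = digit[v % base];
        v /= base;
    } while (v != 0);
    return std::string(first, last);
}
```

Because only the characters between first and last are copied, no leading zeros end up in the result.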

The bin Manipulator

OK, now that the routines for formatting integers as binary are implemented, they need to be installed into our stream to have them used. Before implementing a corresponding manipulator, let's do this in a small test program. The facets are objects held by a "locale" and it is necessary to construct a locale object with the default num_put facet replaced by the new facet. To provide this, it is necessary to instantiate the bin_num_put class template with appropriate types, i.e. with char as character type and std::ostreambuf_iterator<char> as iterator type: these are the types used when doing numeric formatting using std::cout. If a wide character stream like std::wcout is used, char has to be replaced by wchar_t, of course. Once the new locale is constructed from an existing locale and the bin_num_put class, it is installed into the corresponding stream using the imbue() function:

int main() {
  typedef std::ostreambuf_iterator<char>
                               iterator;
  std::locale loc =
    std::locale(std::cout.getloc(),
                new bin_num_put<char, iterator>());
  std::cout.imbue(loc);
  std::cout << 10 << "\n";
}

We can just put the code installing the binary formatting into a manipulator using an appropriate template to make it feasible with all kinds of streams:

template <typename cT, typename traits>
std::basic_ios<cT, traits>&
bin(std::basic_ios<cT, traits>& ios) {
  typedef std::ostreambuf_iterator<cT>
                             iterator;
  std::locale loc =
             std::locale(ios.getloc(),
             new bin_num_put<cT, iterator>());
  ios.imbue(loc);
  return ios;
}

... and use it in the expected way:

std::cout << bin << 10 << "\n";

Well, except that there is no easy way to turn binary formatting off again! Since we have replaced the formatting routine we can use the std::hex manipulator as often as we want: there will be no change at all.

std::cout << bin << 10 << std::hex
          << 10 << "\n"; // does not work

To do something like this, it is necessary to take the value of the formatting flags into account and act correspondingly. Before supporting use of the standard manipulators it is, however, useful to adapt the code to cope with arbitrary bases.

Storing Formatting Information

To do this, the selected base should be stored with the stream such that it can be used by the formatting function. The obvious place to store such information is in an iword() of the stream, which the formatting function can reach through its fmt parameter. Here are corresponding manipulators which also install the needed facet only if it is not yet present:

static int base_index =
                std::ios_base::xalloc();
template <typename cT, typename traits>
std::basic_ios<cT, traits>&
install_bin(std::basic_ios<cT,traits>& ios,
            int base) {
  ios.iword(base_index) = base;
  typedef std::ostreambuf_iterator<cT>
                                iterator;
  if(!dynamic_cast
      <bin_num_put<cT, iterator> const*>(
        &std::use_facet<std::num_put<cT,
          iterator> >(ios.getloc())))
    ios.imbue(std::locale(ios.getloc(),
    new bin_num_put<cT, iterator>()));
  return ios;
}

template <typename cT, typename traits>
std::basic_ios<cT, traits>&
bin(std::basic_ios<cT, traits>& ios) {
  return install_bin(ios, 2);
}

template <typename cT, typename traits>
std::basic_ios<cT, traits>&
oct(std::basic_ios<cT, traits>& ios) {
  return install_bin(ios, 8);
}

template <typename cT, typename traits>
std::basic_ios<cT, traits>&
dec(std::basic_ios<cT, traits>& ios) {
  return install_bin(ios, 10);
}

template <typename cT, typename traits>
std::basic_ios<cT, traits>&
hex(std::basic_ios<cT, traits>& ios) {
  return install_bin(ios, 16);
}

The function xalloc() "allocates" a new index for formatting information in the stream objects. This index can be used with the iword() function of streams: this function returns a reference to an integer. This integer is associated with the stream. Initially, the value returned is set to zero but the above code does not take advantage of this feature. If an integer is not sufficient, a pointer to the formatting information can be stored using the pword() function.
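The behaviour of xalloc() and iword() can be tried out in isolation. In the sketch below all function names are made up for the illustration; the index is obtained lazily inside a function rather than through a namespace-scope variable, which is merely one possible choice.

```cpp
#include <ios>
#include <sstream>

// Hypothetical accessor: allocates the index once, on first use.
inline int demo_base_index() {
    static const int index = std::ios_base::xalloc();
    return index;
}

// Store a base in the stream's slot.
void store_base(std::ios_base& s, int base) {
    s.iword(demo_base_index()) = base;
}

// Read the slot back; a slot that was never written holds zero.
long stored_base(std::ios_base& s) {
    return s.iword(demo_base_index());
}

// Helper for illustration only: each stream has its own slot.
bool slots_are_per_stream() {
    std::ostringstream a, b;
    store_base(a, 2);
    return stored_base(a) == 2 && stored_base(b) == 0;
}
```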

There is a function called by the various manipulators which sets the corresponding base and checks whether the appropriate facet is installed. This is done by obtaining the currently installed facet and testing whether it is an instantiation of bin_num_put. If it is not, this facet is installed. What remains to be done is to use the base in the facet, too. The modified code looks like this:

template <typename cT, typename OutIt>
OutIt
bin_num_put<cT, OutIt>::do_put(OutIt to,
                  std::ios_base& fmt,
                  cT fill,
                  unsigned long v) const {
  char narrow[] = "0123456789abcdef";
  cT wide[16] = { 0 };
  std::use_facet<std::ctype<cT> >(
        fmt.getloc()).widen(begin(narrow),
        end(narrow) - 1, begin(wide));
  cT buffer[
    std::numeric_limits<unsigned long>::digits];
  std::fill(begin(buffer), end(buffer),
            wide[0]);
  int base = fmt.iword(base_index);
  // Fill in the digits starting from the end of the buffer.
  for (cT* it = end(buffer)
       ; v != 0
       ; v /= base)
    *--it = wide[v % base];
  return std::copy(begin(buffer),
                   end(buffer),
                   to);
}

Now the manipulators can be tested. For example:

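#include <cstdlib>   // std::atoi
#include <iostream>  // std::cout

int main(int ac, char* av[]) {
  // Use the first command line argument if present, 10 otherwise.
  int val = ac == 1 ? 10
                    : std::atoi(av[1]);
  std::cout << "bin: " << bin << val
            << "\n";
  std::cout << "oct: " << oct << val
            << "\n";
  std::cout << "dec: " << dec << val
            << "\n";
  std::cout << "hex: " << hex << val
            << "\n";
}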

This code is not yet perfect. Actually, several things need to be handled but these are relatively simple and don't need specific new knowledge of the standard library. In particular, the following aspects are not yet addressed but would need handling in a reasonable implementation:

  • Negative values conventionally use a minus sign followed by the absolute value rather than the two's complement. That is, the function taking a long as argument cannot directly use the unsigned long version, at least not for negative decimal values.

  • Although quite usual for binary values, leading zeros are normally stripped for other bases. To get leading zeros for binary values while omitting them for other bases, the width() currently installed in the stream could be used.

  • The formatting has to take care of padding, i.e. it has to add fill characters: if width() is non-zero, there should be at least that many characters written to the sequence. Padding is a little bit tricky because there are three possible places where padding, i.e. copies of the fill argument, should go:

    • to the left of the value

    • to the right of the value

    • between a leading sign and the value or to the left if there is no sign

This is specified by fmt.flags() & std::ios_base::adjustfield: the corresponding values are left, right, and internal. In any case, after the formatting, the width() should be set to 0.
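The padding rules can be sketched as a small function operating on strings. This is only an illustration under simplifying assumptions: the sign and the digits are already available as narrow strings, and the names pad_number and padded_demo are made up. A real facet would write to the output iterator instead of building a string.

```cpp
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: pads sign + digits to fmt.width() characters
// according to the adjustfield flags and resets the width afterwards.
std::string pad_number(std::string const& sign,
                       std::string const& digits,
                       std::ios_base& fmt, char fill) {
    std::size_t const length = sign.size() + digits.size();
    std::streamsize const width = fmt.width();
    std::size_t count = 0;
    if (width > 0 && static_cast<std::size_t>(width) > length)
        count = static_cast<std::size_t>(width) - length;
    std::string const padding(count, fill);
    std::ios_base::fmtflags const adjust =
        fmt.flags() & std::ios_base::adjustfield;
    fmt.width(0);  // the width applies to one output operation only
    if (adjust == std::ios_base::left)
        return sign + digits + padding;
    if (adjust == std::ios_base::internal)
        return sign + padding + digits;
    return padding + sign + digits;  // right, or no adjustment set
}

// Helper for illustration only: pads "-101" to six characters.
std::string padded_demo(std::ios_base::fmtflags adjust) {
    std::ostringstream fmt;
    fmt.width(6);
    fmt.setf(adjust, std::ios_base::adjustfield);
    return pad_number("-", "101", fmt, '*');
}
```

With an empty sign, the internal case degenerates to padding on the left, which matches the third bullet above.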

Arbitrary Bases

Of course, most of the formatting issues could be taken care of by the base class: the do_put() function could check whether the base is 2 and if it is not delegate processing to the base class. On the other hand, the above facet is capable of formatting integers according to arbitrary bases as long as the base is bigger than 1 and there are sufficient different characters configured to represent the digits. A manipulator setting an arbitrary base would, however, require a parameter. The approach to manipulators with parameters is to just provide a class with a suitable constructor and a shift operator:

struct setbase {
  setbase(int base): mBase(base) {}
  int mBase;
};

template <typename cT, typename traits>
std::basic_ostream<cT, traits>&
operator<< (std::basic_ostream<cT,
                            traits>& os,
            setbase const& sb) {
  install_bin(os, sb.mBase);
  return os;
}

This manipulator is obviously used identically to the std::setw or std::setprecision manipulators:

std::cout << setbase(3) << 10 << "\n";

The only problem with this manipulator is that the user can set bases which are outside the supported range, which is [2, 16] with the code above.
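One conceivable way to deal with this, shown here only as a sketch, is to validate the base in the shift operator and to set std::ios_base::failbit for unsupported values. The names checked_base, checked_base_index, and base_accepted are assumptions made for this example, and reporting the error via failbit is a design choice, not something the approach above prescribes.

```cpp
#include <ios>
#include <ostream>
#include <sstream>

// Hypothetical slot index, standing in for base_index above.
inline int checked_base_index() {
    static const int index = std::ios_base::xalloc();
    return index;
}

// Hypothetical manipulator with an argument that validates the base.
struct checked_base {
    explicit checked_base(int base): mBase(base) {}
    int mBase;
};

template <typename cT, typename traits>
std::basic_ostream<cT, traits>&
operator<< (std::basic_ostream<cT, traits>& os,
            checked_base const& cb) {
    if (cb.mBase < 2 || cb.mBase > 16)
        os.setstate(std::ios_base::failbit);  // reject, keep old base
    else
        os.iword(checked_base_index()) = cb.mBase;
    return os;
}

// Helper for illustration only: was the base stored without error?
bool base_accepted(int base) {
    std::ostringstream out;
    out << checked_base(base);
    return !out.fail()
        && out.iword(checked_base_index()) == base;
}
```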

Now let's get back to supporting the standard manipulators: it would be useful if the standard manipulators could still be used, e.g. for mixed binary and hexadecimal output:

std::cout << "binary: " << bin << i
          << "\n" << "hexadecimal: "
          << std::hex << i << "\n";

To do so, the formatting code has to become aware of the use of std::hex. This can be detected if our own manipulators clear all bits in the basefield: the standard manipulators always set some bit there, because the case where no bits are set is treated specially for integer input (it is equivalent to the %i format specifier of scanf(), i.e. the base of the integer read is determined by its leading characters). Thus, the binary formatting code can be rewritten to take special action if the basefield is nonzero. A simple approach is delegating processing to the base class in this case. This is achieved by adding these two lines to the start of the do_put() function:

if(fmt.flags() & std::ios_base::basefield)
  return std::num_put<cT,
        OutIt>::do_put(to, fmt, fill, v);

The change to the manipulator is even simpler: it just takes the following line to clear the bits in the basefield:

ios.unsetf(std::ios_base::basefield);

Of course, the overall semantics of using the base class version change the behavior to some extent. At least the open issues noted above are covered. Also, the standard do_put() functions take care of thousands separators (if these are configured for the locale) and some special formatting like upper and lower case letters for hexadecimal values.
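Putting the pieces of this section together, a condensed sketch for plain char streams could look as follows. It is not the code developed above but a simplified variant under stated assumptions: it is not a template, it skips the ctype widening, it omits leading zeros, and it also falls back to the base class when the stored base is outside [2, 16]. All names starting with sketch_ and the helper mixed_demo are made up.

```cpp
#include <algorithm>
#include <ios>
#include <iterator>
#include <limits>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical accessor for the slot holding the selected base.
inline int sketch_base_index() {
    static const int index = std::ios_base::xalloc();
    return index;
}

class sketch_num_put: public std::num_put<char> {
    // True if our own formatting applies; stores the base in "base".
    static bool custom(std::ios_base& fmt, unsigned long& base) {
        long const stored = fmt.iword(sketch_base_index());
        if ((fmt.flags() & std::ios_base::basefield)
            || stored < 2 || stored > 16)
            return false;  // a standard manipulator is in effect
        base = static_cast<unsigned long>(stored);
        return true;
    }
protected:
    iter_type do_put(iter_type to, std::ios_base& fmt,
                     char fill, long v) const {
        unsigned long base = 0;
        if (!custom(fmt, base))
            return std::num_put<char>::do_put(to, fmt, fill, v);
        // As in the article: negative values are not treated specially.
        return do_put(to, fmt, fill, static_cast<unsigned long>(v));
    }
    iter_type do_put(iter_type to, std::ios_base& fmt,
                     char fill, unsigned long v) const {
        unsigned long base = 0;
        if (!custom(fmt, base))
            return std::num_put<char>::do_put(to, fmt, fill, v);
        static const char digit[] = "0123456789abcdef";
        char buffer[std::numeric_limits<unsigned long>::digits];
        char* const last = buffer + sizeof buffer;
        char* first = last;
        do {
            *--first = digit[v % base];
            v /= base;
        } while (v != 0);
        fmt.width(0);  // padding is ignored in this sketch
        return std::copy(first, last, to);
    }
};

// Stores the base, clears the basefield, installs the facet if needed.
std::ios& sketch_install(std::ios& ios, int base) {
    ios.iword(sketch_base_index()) = base;
    ios.unsetf(std::ios_base::basefield);
    if (!dynamic_cast<sketch_num_put const*>(
            &std::use_facet<std::num_put<char> >(ios.getloc())))
        ios.imbue(std::locale(ios.getloc(), new sketch_num_put));
    return ios;
}

std::ios& sketch_bin(std::ios& ios) { return sketch_install(ios, 2); }

// Helper for illustration only: mixes binary and hexadecimal output.
std::string mixed_demo() {
    std::ostringstream out;
    out << sketch_bin << 10 << ' ' << std::hex << 255
        << ' ' << sketch_bin << 5;
    return out.str();
}
```

Here std::hex switches back to the standard formatting simply by setting a bit in the basefield, and sketch_bin switches to binary again by clearing it.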

Stream Callbacks

As a final round-off to the IOStream manipulator discussion let's deal with those funny callbacks defined in std::ios_base. Streams support registration of callbacks which are called in case of certain events. The main use of these callbacks is support for resource management when associating pointers with streams via the pword() function. There are three events defined in std::ios_base:

erase_event:

This event is notified when resources associated with the stream should be released. This event is called when the stream is destroyed and prior to copying when copyfmt() is called.

imbue_event:

This event is notified when a new locale is imbue()ed into the stream. Since we modified the locale to take care of binary formatting, the code below demonstrates how this event is caught to modify the new locale, too.

copyfmt_event:

This event is notified when copyfmt() is called, after copying all formatting data to the stream. The intent of this event is to either do a deep copy of objects pointed to (the stream merely copies the pointers) or maintain a reference count.

Stream callbacks are rather primitive: only functions with the signature void(*)(std::ios_base::event, std::ios_base&, int) are supported. The first parameter identifies the event being notified, the second identifies the stream object for which the event is notified, and the third parameter is a user parameter passed when registering an event. The callback just handles the imbue_event and imbues a modified locale if the corresponding num_put facet is not a modified one (note that it has to be checked whether the facet is already there to prevent an infinite recursion):

template <typename cT, typename traits>
void
bin_callback(std::ios_base::event ev,
             std::ios_base& ios, int) {
  typedef std::ostreambuf_iterator<cT>
                              iterator;
  if(ev == std::ios_base::imbue_event
     && !dynamic_cast<bin_num_put<cT,
                    iterator> const*>(
        &std::use_facet<std::num_put<cT,
            iterator> >(ios.getloc())))
    ios.imbue(std::locale(ios.getloc(),
      new bin_num_put<cT, iterator>()));
}

template <typename cT, typename traits>
std::basic_ios<cT, traits>&
install_bin(std::basic_ios<cT,
               traits>& ios, int base) {
  typedef std::ostreambuf_iterator<cT>
                               iterator;
  if (!dynamic_cast<bin_num_put<cT,
                      iterator> const*>(
        &std::use_facet<std::num_put<cT,
           iterator> >(ios.getloc()))) {
    ios.imbue(std::locale(ios.getloc(),
       new bin_num_put<cT, iterator>()));
    ios.register_callback(
    bin_callback<cT, traits>, 0);
  }
  ios.iword(base_index) = base;
  return ios;
}

The callback is registered when a new locale is installed. Since this basically inhibits reinstalling the original locale without using copyfmt() (copyfmt() copies the locale without triggering the imbue_event), it is not necessarily the best design. On the other hand, it might be a reasonable thing to do anyway and the best thing I could think of for demonstrating stream callbacks with this example.
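The callback mechanism itself can be observed with a much smaller example. The sketch below registers a callback which merely counts imbue_event notifications in an iword() slot; the names imbue_counter_index, count_imbue, and imbue_events_seen are made up for this illustration.

```cpp
#include <ios>
#include <locale>
#include <sstream>

// Hypothetical accessor for the slot used as event counter.
inline int imbue_counter_index() {
    static const int index = std::ios_base::xalloc();
    return index;
}

// Callback with the required signature. The last parameter is the
// user value passed to register_callback(), used here as slot index.
void count_imbue(std::ios_base::event ev,
                 std::ios_base& ios, int index) {
    if (ev == std::ios_base::imbue_event)
        ++ios.iword(index);
    // erase_event and copyfmt_event are ignored in this sketch
}

// Helper for illustration only: imbues a locale several times and
// reports how many notifications the callback has seen.
long imbue_events_seen(int imbues) {
    std::ostringstream out;
    out.register_callback(count_imbue, imbue_counter_index());
    for (int i = 0; i < imbues; ++i)
        out.imbue(std::locale::classic());
    return out.iword(imbue_counter_index());
}
```

The callback is also invoked with erase_event when the stream is destroyed; since nothing was allocated, there is nothing to release in that case.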

Conclusions

  • Manipulators are just functions with certain possible signatures. The possible signatures are

    std::ios_base& (*)(std::ios_base&)
    
    template <typename cT, typename traits>
    std::basic_ios<cT, traits>&
      (*)(std::basic_ios<cT, traits>&)
    
    template <typename cT, typename traits>
    std::basic_ostream<cT, traits>&
      (*)(std::basic_ostream<cT, traits>&)
    
    template <typename cT, typename traits>
    std::basic_istream<cT, traits>&
      (*)(std::basic_istream<cT, traits>&)
    
  • Manipulators can use the functions xalloc(), iword(), and pword() to associate data with a stream.

  • Numeric formatting used by the stream classes is done via facets which can be customized to suit specific needs.
