    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: An alternative to wchar_t</title>
        <link>https://members.accu.org/index.php/articles/601</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">Programming Topics + Overload Journal #6 - Mar 1995</span></div>





<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;An alternative to wchar_t</h1>
<p>
<strong>Date:</strong> Mon, 27 March 1995 18:22:18 +01:00</p>
<p><strong>Body:</strong>&nbsp;<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e18" id="d0e18"></a></h2>
</div>
<p>Not all useful integers on a given machine are necessarily
represented by C's int type; so there is long int with a minimum
range of a 32-bit one's complement integer. Likewise, not all
character sets may be represented by char, so there is a need for a
wider character type. What should its name and representation be?
There appear to be three possibilities for implementation: typedef
an implementation dependent integral type; define a standard
struct, or class in C++, that appropriately describes the wide
character; or add a new primitive type to the language.</p>
<p>The question boils down to what criteria should be used in
deducing whether something is genuinely a new language type:</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>portability is often a driving requirement for both new types
and new type names;</p>
</li>
<li>
<p>meeting a previously unfulfillable need is often the indicator
for a new type, built-in or otherwise;</p>
</li>
<li>
<p>a literal constant form seems to indicate a new built-in
type;</p>
</li>
<li>
<p>miscibility with existing primitives indicates a new built-in
type in C, but not necessarily in C++;</p>
</li>
<li>
<p>the need to strongly distinguish between types is an indication
of a new type, especially in C++.</p>
</li>
</ul>
</div>
<p>Portability defined the need to add the types ptrdiff_t and
size_t. The opaque fpos_t type was added to &lt;stdio.h&gt; to
allow portable positioning within very large files using the
fgetpos and fsetpos functions. Portability was also a reason for
adding the third char type, signed char. This move also plugged an
obvious type gap in the language. The addition of long double as a
type met the demands for higher precision numerical computation. A
primitive bool type has recently been added to C++ to allow
differentiation from int for function overloading. It will also
reduce the countless roll-your-own Boolean enums, typedefs and
classes littering application and library code today. Many have
tried, but it is impossible to create a useful Boolean enumeration
or class in C++.</p>
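<p>The overloading point can be sketched as follows. This is only an
illustration: Describe and Boolean are hypothetical names of mine, and
the functions return a label so that the chosen overload is
visible.</p>

```cpp
#include <string>

// Hypothetical overload set; the names are illustrative only.
std::string Describe(bool) { return "bool"; }
std::string Describe(int)  { return "int"; }

// A roll-your-own Boolean of the typedef variety. It is only another
// name for int, so it cannot select an overload of its own.
typedef int Boolean;
```

<p>A relational expression such as 1 &lt; 2 has type bool in C++ and so
picks Describe(bool), whereas a Boolean variable is simply an int and
picks Describe(int).</p>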
<p>Before ANSI the need for wide characters was not explicitly
catered for in C. Programmers of truly international software were
forced to use raw integers for wide characters or a multi-byte
representation. Widespread use of the language meant that, with
standardisation, internationalisation was a top priority. This has
led to the addition of locales as well as basic support for wide
and multi-byte characters to represent non-western character sets.
The number and scope of these functions are sure to be extended in
the next revision of the ISO C standard; it is a shame that with
the exception of locales they all ended up in
&lt;stdkitchensink.h&gt;.</p>
<p>The ANSI C committee added wchar_t, a synonym for an existing
integral type, to &lt;stddef.h&gt; and &lt;stdlib.h&gt;. This makes
wide characters easier to use than a struct such as XChar2b, used
for representing 16-bit characters in X. The committee also added
manifest constant forms to the language for wide characters and
strings:</p>
<pre class="programlisting">
wchar_t Char = L'a';
wchar_t String[] = L&quot;a&quot;;
</pre>
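<p>As a small sketch of how these literal forms behave, the following
repeats the two declarations (made const here) and measures the
string. StringElements and StringLength are helper names I have
assumed for the illustration; std::wcslen is the standard wide
counterpart of strlen.</p>

```cpp
#include <cstddef>
#include <cwchar>

const wchar_t Char = L'a';
const wchar_t String[] = L"a";

// Number of wchar_t elements in the array, including the terminating L'\0'.
std::size_t StringElements() { return sizeof(String) / sizeof(String[0]); }

// Number of wide characters before the terminator.
std::size_t StringLength() { return std::wcslen(String); }
```

<p>The array holds two elements, the character and its terminator,
just as a narrow string literal would.</p>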
<p>One would have thought that any type that had a literal form was
obviously primitive: adding a new language type, rather than simply
aliasing an integer, would appear to be the correct approach.
However, C's already confused notion of char and int sets a
precedent:</p>
<pre class="programlisting">
sizeof('a') == sizeof(int)
</pre>
<p>The new literal form for wide characters is effectively just
another form of integer constant. In C++ the notion of exact type
rather than coercible type plays a more fundamental role. Much of
this nonsense has been sorted out:</p>
<pre class="programlisting">
sizeof('a') == sizeof(char)
</pre>
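<p>The C++ rule can be checked at compile time. The sketch below
assumes a compiler with static_assert and decltype, which are much
later additions to the language than anything discussed here; the
constant names are mine.</p>

```cpp
#include <type_traits>

// In C++ a character literal has type char, and a wide character
// literal has type wchar_t, so neither is just an int constant.
constexpr bool kNarrowLiteralIsChar =
    std::is_same<decltype('a'), char>::value;
constexpr bool kWideLiteralIsWide =
    std::is_same<decltype(L'a'), wchar_t>::value;

static_assert(sizeof('a') == sizeof(char), "narrow literal has size of char");
static_assert(sizeof(L'a') == sizeof(wchar_t), "wide literal has size of wchar_t");
```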
<p>The joint C++ standardisation committees have also recognised
that wide characters deserve a type of their own, adding wchar_t as
a new keyword and integral type. To understand this decision
consider the problem of overloading output functions:</p>
<pre class="programlisting">
void Print(char);
void Print(wchar_t);
void Print(int);
</pre>
<p>This is not portable if wchar_t is a typedef or a macro because
it will be a synonym of an integral type. An alias for int will
cause a number rather than a character to be printed out. The
compiler would also object to encountering a second definition of
Print(int), assuming that all Print functions were defined in the
same translation unit; otherwise the ball gets passed to the
linker. A first cut solution is to introduce wchar_t as a standard
library class. However, what type does that make literals like
L'a'? The only solution in this case is to add a new language type.
I would hasten to add that this is not just a solution for hacking
C++, but a retrospective correction of what should originally have
happened in C.</p>
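<p>With wchar_t as a distinct built-in type, the three overloads can
coexist. In this sketch the functions return a label instead of
printing, an assumption made purely so that the selected overload can
be observed.</p>

```cpp
#include <string>

// The overload set from the text, returning a label rather than
// writing output so that the choice of overload is visible.
std::string Print(char)    { return "char"; }
std::string Print(wchar_t) { return "wchar_t"; }
std::string Print(int)     { return "int"; }
```

<p>Here Print('a') selects the char version, Print(L'a') the wchar_t
version and Print(97) the int version. Were wchar_t merely a typedef
for int, the second and third definitions would collide.</p>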
<p>The only thing that remains for me to say against wchar_t is
that the name is dreadful. Adding new keywords is always a problem,
but wchar_t must count as one of the clumsiest - especially since
the _t suffix has traditionally indicated a typedef<sup>[<a name=
"d0e64" href="#ftn.d0e64" id="d0e64">1</a>]</sup>. I will, however,
grant you that this new keyword is not likely to break many
programs. (I am still surprised to see C programmers using class
and try as identifiers. Where have they been? More to the point,
where are they going?)</p>
<p>Given that long int and long double are the wider versions of
int and double, what is wrong with long char? This requires no new
keywords and it is also more obviously a character type.
Interestingly the syntax of C and C++ does not exclude this
formation. One criterion that Bjarne Stroustrup has used in
deciding between features is to gauge how easy it would be to teach
and learn them. That long char is unmistakably a character type
goes a long way to achieving this.</p>
<p>I do not feel the necessity to further complicate this type with
sign, but signedness could obviously follow the char model if
required. For compatibility long char must have the same
implementation as one of the standard integral types:</p>
<pre class="programlisting">
sizeof(long char) == sizeof(char) ||
sizeof(long char) == sizeof(short) ||
sizeof(long char) == sizeof(int) ||
sizeof(long char) == sizeof(long)
</pre>
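<p>Since long char is not accepted by any compiler, the condition can
only be tried out with wchar_t standing in for it. The sketch below
makes that substitution; the function name is mine, and the result is
what I would expect on common implementations rather than something
the condition above guarantees.</p>

```cpp
// wchar_t stands in for the proposed long char, which is not valid C++.
// Returns true if its size matches one of the four standard integral
// types listed in the text.
constexpr bool WideCharMatchesStandardInteger()
{
    return sizeof(wchar_t) == sizeof(char)
        || sizeof(wchar_t) == sizeof(short)
        || sizeof(wchar_t) == sizeof(int)
        || sizeof(wchar_t) == sizeof(long);
}
```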
<p>The retrofit for both C and C++ would be to add the new type to
the language and simply mandate that it is the synonym type for
wchar_t. The decision to include wchar_t as a built-in type in C++
is not so old and widely implemented that it cannot be reversed to
be replaced by long char. I recognise that the schedule for
creating the C++ standard is already pressed and that this
suggestion is not simply a global replacement of long char for
wchar_t in the forthcoming draft. However, I do not believe it to
be complex - unlike run-time type identification, exception
handling and namespaces, for instance - and in many senses it is a
reduction and not an extension. It is certainly more in the spirit
of the language.</p>
<p>With this in mind, I have submitted a proposal to ISO for such a
change (for ISOlogists the proposal is numbered WG21/N0507). My
thanks to Sean Corfield for his feedback and for agreeing to
propose it - read his column, The Casting Vote, in the coming
months to find out which way this and a number of other issues go.
The feedback has generally been good, but Francis informed me that
at a recent ISO C meeting Plauger was less than impressed with the
hidden implication that the C standard is anything less than
perfect! Oh well, you can't please all of the people...</p>
</div>
<div class="footnotes"><br>
<hr class="c2" width="100">
<div class="footnote">
<p><sup>[<a name="ftn.d0e64" href="#d0e64" id=
"ftn.d0e64">1</a>]</sup> This said, Francis pointed out to me that
none of the types ending in _t in the standard are required to be
typedefs; macros also satisfy the spec.</p>
</div>
</div>
</p>
</div>
</channel>
</rss>
