    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: String Theory</title>
        <link>https://members.accu.org/index.php/articles/728</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">Programming Topics + CVu Journal Vol 8, #2 - Apr 1996</span></div>





<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;String Theory</h1>
<p>
<strong>Date:</strong> Wed, 03 April 1996 13:15:27 +01:00</p>
<p><strong>Body:</strong>&nbsp;<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e21" id="d0e21"></a></h2>
</div>
<p>In the previous article I looked at the use of string literals
for initialising arrays of char. These literals do not have any
existence as detectable and measurable runtime entities. However,
there do appear to be times that literals do have some role as
actual runtime objects, as demonstrated by the familiar</p>
<pre class="programlisting">
#include &lt;stdio.h&gt;

int main(void)
{
  puts(&quot;hello world&quot;);
  return 0;
}
</pre>
<p>Here the literal is clearly being used as a runtime object. So
how does this differ from, say</p>
<pre class="programlisting">
#include &lt;stdio.h&gt;

int main(void)
{
  char message[] = &quot;hello world&quot;;
  puts(message);
  return 0;
}
</pre></div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e31" id="d0e31"></a>Welcome to the
machine</h2>
</div>
<p>To appreciate the subtleties of C strings, it helps to
understand a little about the underlying C view of the world, in
particular storage and duration.</p>
<p>When used as an aggregate initialiser, i.e. when initialising an
array of char, the string literal is merely a convenient shorthand
for the longer, non-text-oriented aggregate initialiser syntax. As I
illustrated last time, the following</p>
<pre class="programlisting">
char message[] = &quot;hello&quot;;
</pre>
<p>is equivalent to</p>
<pre class="programlisting">
char message[6] = {'h', 'e', 'l', 'l', 'o', '\0'};
</pre>
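<p>The equivalence can be checked directly. The following is a minimal
sketch rather than anything from the original listing: the function name
<tt class="literal">same_initialiser</tt> is hypothetical, and it simply
compares the size and contents of the two arrays.</p>

```c
#include <string.h>

/* Returns 1 when the two initialiser forms yield identical arrays. */
int same_initialiser(void)
{
    char from_literal[] = "hello";
    char from_list[6] = {'h', 'e', 'l', 'l', 'o', '\0'};

    /* Same size (terminator included) and same bytes. */
    return sizeof from_literal == sizeof from_list
        && memcmp(from_literal, from_list, sizeof from_list) == 0;
}
```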
<p>The literal has no meaningful existence at runtime, and the
message object is declared as having the default storage class for
its scope. Within a function this means that it is an <tt class=
"literal">auto</tt> variable, i.e. it lives on the
stack<sup>[<a name="d0e49" href="#ftn.d0e49" id=
"d0e49">1</a>]</sup>. Every time that function is entered enough
space for the message variable and any other locals is pushed onto
the stack and initialisation occurs. On exiting the function the
space is popped off the stack as control returns to the caller: any
pointers to this space are now invalid. For example, consider the
following relatively common coding error:</p>
<pre class="programlisting">
const char *bool_name(int truth) 
{
  const char false_name[] = &quot;false&quot;;
  const char true_name[] = &quot;true&quot;;
  return truth ? true_name : false_name;
}
</pre>
<p>The two variables declared, and hence all their elements, are
defined on the stack. The function returns a pointer to an area of
the stack that is no longer valid. Technically the behaviour
resulting from the use of the return value is <span class=
"emphasis"><em>undefined</em></span>. The oft-quoted Chinese curse
&quot;may you live in interesting times&quot; is a good plain-language
translation of the standardese terminology!</p>
<p>The following code, however, is just what is intended:</p>
<pre class="programlisting">
const char *bool_name(int truth) 
{
  const char *false_name = &quot;false&quot;;
  const char *true_name = &quot;true&quot;;
  return truth ? true_name : false_name;
}
</pre>
<p>In this case the variables are pointers not arrays, and they
point to the literals themselves. In answer to the question posed
in the title of the article: &quot;<tt class="literal">this</tt>&quot; is an
anonymous array of <tt class="literal">char</tt> with <tt class=
"literal">static</tt> duration.</p>
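<p>A third variation, offered here as a sketch rather than as the
article's recommendation, keeps the array declarations of the faulty
version but gives them static duration. The name <tt class=
"literal">bool_name_static</tt> is an assumption made for this
illustration.</p>

```c
/* Hypothetical variant: the arrays are static, so they outlive the call. */
const char *bool_name_static(int truth)
{
    static const char false_name[] = "false";
    static const char true_name[] = "true";

    /* Returning the address of a static array is well defined. */
    return truth ? true_name : false_name;
}
```

<p>Because the arrays now exist for the whole run of the program, the
returned pointer remains valid after the function returns.</p>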
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e78" id="d0e78"></a>Static</h2>
</div>
<p>The string literal survives the duration of the program, like
any other static variable. Also, like a file-scope variable declared
<tt class="literal">static</tt>, it exports no name to the linker. It
is truly anonymous, so there is no way to refer to the storage used
by name in a portable manner.</p>
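<p>Both properties, being an array and having static duration, can be
observed from code. The following is a small sketch; the function names
<tt class="literal">literal_size</tt> and <tt class=
"literal">greeting</tt> are hypothetical.</p>

```c
#include <stddef.h>

/* The literal is an array of char, terminator included. */
size_t literal_size(void)
{
    return sizeof "hello";
}

/* Returns the address of the literal's storage, which stays
   valid after the function has returned. */
const char *greeting(void)
{
    return "hello";
}
```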
<p>The compiler is entitled to optimise use of storage by providing
only one definition for a string literal that is repeated within a
single translation unit, i.e. a conventional C or C++ source file.
For example, the repeated string &quot;<tt class="literal">%s%s\n</tt>&quot;
in the following code need not imply separate implementation arrays
for every occurrence of the literal:</p>
<pre class="programlisting">
#include &lt;stdio.h&gt;

void prefixed(const char *base, const char *prefix)
{
  printf(&quot;%s%s\n&quot;, prefix, base);
}
void suffixed(const char *base, const char *suffix) 
{
  printf(&quot;%s%s\n&quot;, base, suffix);
}
</pre>
<p>On many compilers it is an option to merge duplicate strings. A
compiler using merged strings can be detected easily using</p>
<pre class="programlisting">
int unique = &quot;hello&quot; != &quot;hello&quot;;
</pre>
<p>This compares the pointers, and <span class=
"emphasis"><em>not</em></span> the contents; for that, use the
<tt class="literal">strcmp</tt> library function. There is not a
lot you can, or should, do with this information, but it might be
interesting to know.</p>
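<p>The distinction between comparing pointers and comparing contents
can be sketched as follows. The helper names <tt class=
"literal">same_text</tt> and <tt class="literal">same_storage</tt> are
hypothetical, introduced only for illustration.</p>

```c
#include <string.h>

/* Compares contents, which is what is usually wanted. */
int same_text(const char *lhs, const char *rhs)
{
    return strcmp(lhs, rhs) == 0;
}

/* Compares addresses only: true when both point at the same storage. */
int same_storage(const char *lhs, const char *rhs)
{
    return lhs == rhs;
}
```

<p>An array initialised from &quot;hello&quot; and the literal
&quot;hello&quot; have the same text but always occupy different
storage; whether two identical literals share storage is up to the
compiler.</p>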
<p>Given that the strings themselves effectively refer to storage,
we can simplify our original example:</p>
<pre class="programlisting">
const char *bool_name(int truth) 
{
  return truth ? &quot;true&quot; : &quot;false&quot;;
}
</pre>
<p>It is as if the compiler generated something like the
following:</p>
<pre class="programlisting">
static const char __false[] = &quot;false&quot;;
static const char __true[] = &quot;true&quot;;
const char *bool_name(int truth) 
{
  return truth ? __true : __false;
}
</pre></div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e110" id="d0e110"></a>Look but
don't touch</h2>
</div>
<p>Alas, history tends to always interfere with simplicity.
Although a string literal is intuitively read-only, i.e. <tt class=
"literal">const</tt>, strings existed in C before the idea of
<tt class="literal">const</tt>-ness [I looked at this issue in a
little more detail in &quot;Literally yours&quot;, Overload 11]. Thus there
is a lot of legacy code of the form</p>
<pre class="programlisting">
char *message = &quot;hello&quot;;
</pre>
<p>If string literals were made <tt class="literal">const</tt>
overnight a remarkable amount of code would break, so the declared
type remains non-<tt class="literal">const</tt>. However, just to
confuse the issue, this does not mean that you are entitled to
modify the string. Although the following will compile, the runtime
behaviour is undefined:</p>
<pre class="programlisting">
&quot;help&quot;[3] = 'l';
</pre>
<p>It is as if the string is itself <tt class="literal">const</tt>,
but the compiler casts away this <tt class=
"literal">const</tt>-ness during compilation. The compiler is
entitled to place strings in write-protected memory (such as the
program's code segment on some systems), and some do. Overwriting a
string literal might cause your application to fall over.</p>
<p>Although that is an obvious incentive to always treat literals
as <tt class="literal">const</tt>, we can see that it also makes sense
intuitively. Given that literals are static, once changed they would
not simply revert to their former content next time through a function.
That duplicate strings might have been optimised together merely
adds to the list of possible sources of bugs. The true read-only
nature of literals should be honoured, and company coding standards
as well as personal practice should enforce this:</p>
<pre class="programlisting">
char *no_no = &quot;don't do this&quot;; /* disallow */
const char *fine = &quot;do this&quot;; /* OK */
</pre></div>
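<p>When a modifiable string really is needed, one approach is to
initialise an array from the literal and change the array instead. The
following is a sketch under that assumption; the function name
<tt class="literal">edited_copy</tt> and its return convention are
hypothetical.</p>

```c
#include <stddef.h>
#include <string.h>

/* Copies an edited version of "help" into the caller's buffer.
   Returns 0 on success, -1 if the buffer is too small. */
int edited_copy(char *buffer, size_t size)
{
    char word[] = "help";   /* writable array initialised from the literal */

    word[3] = 'l';          /* well defined: changes the array, not the literal */
    if (size < sizeof word)
        return -1;
    memcpy(buffer, word, sizeof word);
    return 0;
}
```

<p>The literal itself is never written to, so the code behaves the same
whether or not the compiler places literals in write-protected
memory.</p>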
<div class="footnotes"><br>
<hr class="c3" width="100">
<div class="footnote">
<p><sup>[<a name="ftn.d0e49" href="#d0e49" id=
"ftn.d0e49">1</a>]</sup> Some compilers and compile-time checking
tools will detect and warn you about non-<tt class=
"literal">const</tt> usage of string literals.</p>
</div>
</div>
</p>
</div>
</channel>
</rss>
