    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: Reading Integers Revisited</title>
        <link>https://members.accu.org/index.php/journals/1036</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">CVu Journal Vol 12, #4 - Jul 2000 + Programming Topics</span></div>





<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;Reading Integers Revisited</h1>
<p>
<strong>Date:</strong> Mon, 03 July 2000 13:15:38 +01:00</p>
<p><strong>Body:</strong>&nbsp;<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e18" id="d0e18"></a></h2>
</div>
<p>Like your Editor, I too rush to my contribution in a magazine
that falls through my letter box: I want to see how the Editor has
corrected my English and made the item comply with House style so
that next time I can send him something better. All right, it's
vanity and I like to see my name in print.</p>
<p>But unlike your Editor, I find that a week spent writing,
testing, checking and cross-checking my work has failed to detect a
fault that leaps out at me from the printed page. So please let me
try to correct one now.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e24" id="d0e24"></a>The
Problem</h2>
</div>
<p>This all concerns my piece entitled Reading Integers in the
March 2000 issue of C Vu.</p>
<p>It is to do with the call to <tt class="function">strtol</tt>,
which returns a <tt class="type">long int</tt>, and what happens
when that type is cast to an <tt class="type">int</tt>. It also
depends on what your compiler does, and, to some extent, on what
your computer (or the microprocessor in it) does as well.</p>
<p>Let me assume, as happens on my computer, that the compiler
generates code for 16-bit <tt class="type">int</tt>s and 32-bit
<tt class="type">long</tt>s. So an <tt class="type">int</tt> can
hold all values between -32768 and +32767 while a <tt class=
"type">long int</tt> can hold values between -2147483648 and
+2147483647.</p>
<p>You can see how your compiler implements <tt class=
"type">int</tt>s and <tt class="type">long</tt>s by looking in the
standard header file <tt class="filename">&lt;limits.h&gt;</tt>.
This should contain definitions of <tt class=
"constant">INT_MIN</tt>, <tt class="constant">INT_MAX</tt>,
<tt class="constant">LONG_MIN</tt> and <tt class=
"constant">LONG_MAX</tt>, which are the most negative and most
positive legal <tt class="type">int</tt>s and the most negative and
most positive legal <tt class="type">long int</tt>s respectively.
You will probably find the values I gave above for these symbols,
but you do need to know what your compiler does.</p>
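<p>As a minimal sketch of that check, assuming a hosted C compiler,
the four limits can be printed directly. The function names
<tt class="function">print_integer_limits</tt> and
<tt class="function">long_is_wider_than_int</tt> are illustrative
only and are not part of the original listing.</p>

```c
#include <limits.h>   /* INT_MIN, INT_MAX, LONG_MIN, LONG_MAX */
#include <stdio.h>

/* Illustrative helper: print the limits this compiler uses. */
void print_integer_limits (void) {
  printf ("int : %d to %d\n", INT_MIN, INT_MAX);
  printf ("long: %ld to %ld\n", LONG_MIN, LONG_MAX);
}

/* Illustrative helper: returns 1 when a 'long' can hold values
   that an 'int' cannot, which is the case that matters here.  */
int long_is_wider_than_int (void) {
  return (LONG_MAX > INT_MAX) || (LONG_MIN < INT_MIN);
}
```

<p>If <tt class="function">long_is_wider_than_int</tt> returns zero,
the demotion problem described in this article cannot arise with that
compiler.</p>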
<p>Keeping the above in mind, let us see what my original code
does.</p>
<p>After the message prompting for user input is written,
<tt class="function">strtol</tt> is called. This reads in a number,
which let us say is 1000, and returns that value as a <tt class=
"type">long int</tt>.</p>
<p>Somehow, this is cast (the type of the number is changed) to an
<tt class="type">int</tt>, and that value is stored in the variable
<tt class="varname">i</tt>. Now 1000 is a legal value for an
<tt class="type">int</tt>, so all is well, and my code can then
safely check that something was read from the line, and that it is
in the intended range.</p>
<p>The key to the fault I have created is that 'somehow' at the
beginning of the previous paragraph.</p>
<p>Let us see what occurs if a number like 67000 is entered. This
is a perfectly legal value for a <tt class="type">long int</tt>, so
that is what <tt class="function">strtol</tt> returns.</p>
<p>But this is most certainly not a legal value for an <tt class=
"type">int</tt>, so what happens when the value is cast to an
<tt class="type">int</tt> and an attempt is made to save it in <tt class="varname">i</tt>? My
compiler merely takes the bottom part of the <tt class="type">long
int</tt> value, calls it an <tt class="type">int</tt> and stores
that. And so the value 1464 is written to <tt class=
"varname">i</tt>. (To see why, work out 67000 as a 32-bit binary
number, take just the 16 least significant bits, ignoring the high
order bits, and convert back to a decimal number. You should get
1464.)</p>
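<p>The arithmetic can be reproduced on any machine with the sketch
below. It does not perform a real demotion; it only imitates, with
portable arithmetic, what a compiler with 16-bit
<tt class="type">int</tt>s commonly does, and it assumes a two's
complement result. The name
<tt class="function">demote_to_16_bits</tt> is hypothetical.</p>

```c
/* Illustrative only: imitate the demotion of a 'long' to a 16-bit
   'int' by keeping the 16 least significant bits.  Assumes the
   16-bit result is interpreted as two's complement.             */
long demote_to_16_bits (long value) {
  unsigned long low = (unsigned long) value & 0xFFFFUL; /* low 16 bits */
  /* Bit 15 set means a negative 16-bit value. */
  return (low >= 0x8000UL) ? (long) low - 65536L : (long) low;
}
```

<p>Under those assumptions,
<tt class="literal">demote_to_16_bits (67000L)</tt> gives 1464, which
is 67000 less 65536, while 1000 passes through unchanged.</p>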
<p>So <tt class="varname">i</tt> has had a value stored in it that
is nothing like the intended value, and my subsequent range
checking fails to spot this.</p>
<p>I cannot say what your compiler does (though you should find
out). If your <tt class="type">int</tt> is the same size as a
<tt class="type">long int</tt> (as sometimes
happens), nothing will go wrong.</p>
<p>But the Standard says that when one converts from an integral
type (which includes all the integer types, of whatever length) to
a signed integer of shorter length (which is what the Standard
calls 'demotion' and is what I am doing here), the result is
'implementation defined'. That is, the compiler can do what it
likes (<i><span class="remark">but the implementor must document
it. FG</span></i>).</p>
<p>I suspect most compilers will do as mine does, and merely store
the lower part of the long in the int variable. But some compilers
might notice the problem, refuse to complete the cast, and stop the
program with a run-time error. Indeed some languages, such as Ada,
specify that this is exactly what must happen (though Ada also
allows the programmer to recover control and attempt to repair the
problem).</p>
<p>More interestingly (read 'worryingly'), the Standard does say
what to do when an integral type is demoted to an unsigned integer.
It says you get the least significant bits. At least, that is what
it implies in the sort of long sentence that gives Standards
Documents their well-deserved reputation for being unreadable.</p>
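<p>A small sketch of the unsigned case follows. It checks that
converting to <tt class="type">unsigned char</tt> gives the value
modulo one more than <tt class="constant">UCHAR_MAX</tt>, which is
the same as keeping the least significant bits. The function name is
illustrative, and the sketch assumes
<tt class="type">unsigned long</tt> is wider than
<tt class="type">unsigned char</tt>.</p>

```c
#include <limits.h>   /* UCHAR_MAX */

/* Illustrative check: returns 1 if demoting 'value' to 'unsigned char'
   yields value modulo (UCHAR_MAX + 1).  Assumes 'unsigned long' is
   wider than 'unsigned char', so the modulus does not wrap to zero. */
int unsigned_demotion_is_modular (unsigned long value) {
  unsigned char narrow = (unsigned char) value;
  return (unsigned long) narrow == value % ((unsigned long) UCHAR_MAX + 1UL);
}
```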
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e147" id="d0e147"></a>The
Solution</h2>
</div>
<p>This is not a very useful state of affairs for my original
function. How can I correct this?</p>
<p>Well, the first step is that the range checking will have to be
carried out with <tt class="type">long int</tt>s. And that means
that the type of <tt class="varname">i</tt> must be changed to a
<tt class="type">long</tt>. Now, I do not like <tt class=
"type">long</tt>s called <tt class="varname">i</tt> as a matter of
principle, so I have also changed the name of the variable to
<tt class="varname">l</tt>. And of course, the cast of the value
returned by <tt class="function">strtol</tt> has to come out.</p>
<p>The range check should be changed to account for the fact that I
am now comparing a <tt class="type">long</tt> (the number entered)
with an <tt class="type">int</tt> (the user's limiting values). The
compiler will do this implicitly, though I prefer to do it
explicitly.</p>
<p>But one thing I do not want to do is to change the interface to
my function and make the limits <tt class="type">long int</tt>s.
That would annoy all the people who are using it even more because
not only have I got the function wrong, but they would have to do
some work to fix it. And fortunately that is not necessary.</p>
<p>The upper and lower bounds can remain as <tt class=
"type">int</tt>s in the function prototype: they need only be cast
to <tt class="type">long int</tt>s when the time comes to perform
the range checking. And anyway, it is reasonable for the user to
expect the limits to be specified in the same type as he is trying
to read in.</p>
<p>There is one other necessary change. The format, or second,
parameter in the call to <tt class="function">fprintf</tt> which is
used when the number read in is out of range must be changed to
print a <tt class="type">long int</tt> rather than merely an
<tt class="type">int</tt>.</p>
<p>Finally, not necessary, but advisable, is to cast the type of
<tt class="varname">l</tt> to an <tt class="type">int</tt> in the
return statement. The compiler will do the necessary casting for
you implicitly, but by doing it explicitly you tell the reader that
whilst you have calculated the function result as one type, you
know a different type is to be returned.</p>
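<p>Taken on their own, the range check and the final cast might be
sketched as follows. The helper
<tt class="function">narrow_if_in_range</tt> is hypothetical; in the
listing the same comparisons sit inline in
<tt class="function">get_int</tt>.</p>

```c
/* Illustrative helper: compare as 'long', then narrow explicitly.
   Returns 1 and stores the value in '*result' when it lies in
   [lower, upper]; returns 0 and leaves '*result' alone otherwise. */
int narrow_if_in_range (long value, int lower, int upper, int *result) {
  if ((value >= (long) lower) && (value <= (long) upper)) {
    *result = (int) value;  /* Safe: both limits are themselves ints. */
    return 1;
  }
  return 0;
}
```

<p>Because both limits are <tt class="type">int</tt>s, any value that
passes the comparison is known to fit in an <tt class="type">int</tt>,
so the explicit cast cannot lose information.</p>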
<p>And how did I come to miss this bug in the first place?</p>
<p>Well, I have been doing a lot of work over the past year in a
language other than 'C' on a machine where <tt class=
"type">int</tt> is the longest available integer type and there is
no speed penalty to be paid for using that instead of one of the
several available shorter integer types. So because all the system
calls use the <tt class="type">int</tt> type, and to avoid having
to worry about casting between different integer types, all my
integers are <tt class="type">int</tt>s. And in writing the piece
for you, I forgot about the trap and fell into it.</p>
<p>Writing for one machine in the day and another in the evening is
not as easy as it sounds.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e230" id="d0e230"></a>Some Other
Points</h2>
</div>
<p>The Editor raised a couple of points at the end of my piece that
I should also address.</p>
<p>On checking the Standard, I now see that <tt class=
"function">strtol</tt> accepts all consecutive characters which
might form a legal integer of the given final base argument without
regard for the largest number that can be saved in a <tt class=
"type">long int</tt>. So your Editor is right and I was wrong.
Having accepted the characters, <tt class="function">strtol</tt>
then attempts to form that integer and sets the error flag
<tt class="varname">errno</tt> if it does not fit a <tt class=
"type">long</tt>.</p>
<p>The reason for my using <tt class="function">strcpy</tt> to
overwrite the comment character is that in addition to the
'<tt class="literal">\n</tt>' character, <tt class=
"function">strcpy</tt> also writes a '<tt class="literal">\0</tt>'.
So after I have deleted the comment, the string in <tt class="varname">line</tt> continues
to look as though it could have been returned by <tt class=
"function">fgets</tt>.</p>
<p>The point is not really important in this example, but I suggest
that in large programs, keeping the style of an object (like a
string) the same when one changes a bit in the middle can avoid
causing problems elsewhere in the program.</p>
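<p>The comment-masking step can be sketched in isolation as below.
The function name <tt class="function">strip_comment</tt> is
illustrative; the listing performs the same two calls inline.</p>

```c
#include <string.h>   /* strchr, strcpy */

#define COMMENT '#'   /* Comment character, as in the listing. */

/* Illustrative helper: replace any trailing comment with "\n" so the
   buffer still looks like a line returned by 'fgets'.              */
void strip_comment (char *line) {
  char *com_posn = strchr (line, COMMENT);
  if (com_posn != NULL)
    strcpy (com_posn, "\n");  /* Writes '\n' followed by '\0'. */
}
```

<p>A line without a comment character is left untouched, so callers
see a newline-terminated string either way.</p>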
<p>If you do not believe this, do you think that <tt class=
"function">gets</tt> and <tt class="function">fgets</tt> reading
from <tt class="literal">stdin</tt> should do different things?
Quickly now, which one does what?</p>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e282" id="d0e282"></a>More Error
Handling</h3>
</div>
<p>I said above that <tt class="function">strtol</tt> indicates in
<tt class="varname">errno</tt> if the number it tried to read would
not fit in a <tt class="type">long</tt>. It does this by setting
<tt class="varname">errno</tt> to the value of the macro <tt class=
"constant">ERANGE</tt>.</p>
<p>Now the use of <tt class="varname">errno</tt> is probably worth
an article in its own right (<i><span class="remark">definitely.
FG</span></i>), if only because the Standard says that if a
function is not documented as using <tt class="varname">errno</tt>,
then that function can set <tt class="varname">errno</tt> to any
value it likes, even if there is no error. And if a function that
is documented by the Standard as using <tt class=
"varname">errno</tt> does not find an error, it must not clear any
previously recorded error.</p>
<p>So how should <tt class="varname">errno</tt> be used? Well,
firstly, include the standard header <tt class=
"filename">&lt;errno.h&gt;</tt>. Then, immediately before you want
to check the operation of a function that is documented as
potentially setting <tt class="varname">errno</tt>, set <tt class=
"varname">errno</tt> to zero. Call the function and then check if
<tt class="varname">errno</tt> is still zero, which implies that no
error has been detected. If it is not, then something has gone
wrong.</p>
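<p>Applied to <tt class="function">strtol</tt>, that sequence might
look like the sketch below. The wrapper name
<tt class="function">checked_strtol</tt> and its parameter list are
assumptions made for illustration.</p>

```c
#include <errno.h>    /* errno, ERANGE */
#include <limits.h>   /* LONG_MAX, returned on positive overflow */
#include <stdlib.h>   /* strtol */

/* Illustrative wrapper: clear 'errno', call 'strtol', and report the
   value of 'errno' straight afterwards.  Returns 0 if no error was
   recorded, otherwise the non-zero 'errno' value.                  */
int checked_strtol (const char *text, char **next, long *value) {
  errno = 0;                         /* Force errno to no error.     */
  *value = strtol (text, next, 10);
  return errno;                      /* Read before any other call.  */
}
```

<p>With a string of digits too long for a <tt class="type">long</tt>,
the wrapper returns <tt class="constant">ERANGE</tt>, the value is
<tt class="constant">LONG_MAX</tt>, and <tt class="varname">next</tt>
still points past every digit, because all of them were accepted.</p>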
<p>How can one determine what has gone wrong, and so decide how to
fix it? If <tt class="varname">errno</tt> has been set to the value
the Standard says might be returned, then the Standard will also
say what has gone wrong.</p>
<p>But the Standard does not preclude some other non-zero value
from being returned. And to find what that means you must rely on
either the compiler reference manual, or the <tt class=
"function">perror</tt> function which prints a diagnostic message
based on the value of <tt class="varname">errno</tt>.</p>
<p>This all means that I need two steps to ensure that <tt class=
"function">strtol</tt> worked as anticipated. First, I must check
for <tt class="varname">errno</tt> being non-zero, and not equal to
<tt class="constant">ERANGE</tt>, which means that something
unexpected has gone wrong. In that case, I use <tt class=
"function">perror</tt> to print a diagnostic message and stop the
program.</p>
<p>Secondly, when I check the range of the returned number in <tt class="varname">l</tt>, I
must also check for <tt class="varname">errno</tt> being zero. If
<tt class="varname">errno</tt> is non-zero, the previous step means
it must have the value of <tt class="constant">ERANGE</tt>, and I
proceed as if the returned number is out of range.</p>
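<p>The two steps can be summarised as one decision, sketched below.
The enumeration and the function name
<tt class="function">classify</tt> are hypothetical; the listing
makes the same tests inline and acts on them at once.</p>

```c
#include <errno.h>    /* ERANGE */

/* Illustrative outcome codes, not part of the original listing. */
enum parse_status { PARSE_OK, PARSE_OUT_OF_RANGE, PARSE_UNEXPECTED };

/* 'err' is the value of errno read straight after 'strtol'. */
enum parse_status classify (int err, long value, int lower, int upper) {
  /* Step 1: any error other than ERANGE is unexpected. */
  if (err && (err != ERANGE)) return PARSE_UNEXPECTED;
  /* Step 2: accept only if no error and the value is in range. */
  if ((err == 0) && (value >= (long) lower) && (value <= (long) upper))
    return PARSE_OK;
  /* Either ERANGE was set or the value lies outside the limits. */
  return PARSE_OUT_OF_RANGE;
}
```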
<p>All in all, there are now a number of changes to my original
function, and so I reproduce the new listing of my <tt class=
"function">get_int</tt> function below.</p>
<p>Which brings me back to my original point. Starting from the
point of implementing the apparently simple task of safely reading
in an integer, we have come quite a way and visited several complex
areas of the 'C' language and its run-time library. No wonder that
original, fictional, tutor preferred to leave his students with
<tt class="function">scanf</tt> and hope nothing would go
wrong.</p>
<pre class="programlisting">
/*: get_int.c   Prompt for, read and check, an integer from 'stdin'.  */
/* Version 2.0  Corrected use of result from 'strtol'.      */
#include  &lt;stdio.h&gt;           /* Standard I-O header.    */
#include  &lt;stdlib.h&gt;          /* Standard library header.  */
#include  &lt;string.h&gt;          /* Standard string header.  */
#include  &lt;errno.h&gt;           /* Error number header.    */
#include  &lt;ctype.h&gt;           /* Character class header ('isspace'). */
#define    TRY_LIMIT  5       /* Retry limit for reading int.  */
#define    LINE_SIZE  256     /* Maximum line length allowed.  */
#define    COMMENT    '#'      /* Comment character.    */

/*: get_int Writes the 'prompt' message to the terminal, reads a line of text as the reply  */
/* and attempts to find an integer from it. Any integer found is range checked such that    */
/* 'lower' &lt;= i &lt;= 'upper' is true. This is returned as the function value. In the event of */
/* problems, a message is printed, and the number is re-requested. After the retry limit is */
/* exhausted, the function prints another message and aborts the program.       */

int get_int (char * prompt, int lower, int upper) {
  char  line[LINE_SIZE];      /* Input line buffer.    */
  char  *com_posn;            /* Position of any comment.  */
  char  *next;                /* Where 'strtol' stopped.    */
  long  l;                    /* Number read from the user.  */
  int  try;                   /* Retry counter.      */
/* Loop per attempt to read the number from the user.    */
  for (try = 0; try &lt; TRY_LIMIT; ++try) {
    fputs (prompt, stdout);
    if (fgets (line, LINE_SIZE, stdin) == NULL) {
/* Something has gone badly wrong, possibly  End of File.    */
      fputs (&quot;'get_int' found End-of-File.\n&quot;, stdout); exit (EXIT_FAILURE);
    }
    if ((com_posn = strchr (line, COMMENT)) != NULL) /* Delete any trailing comment */
      strcpy (com_posn, &quot;\n&quot;);          /* A comment is present.  Mask it out.*/
/* Attempt to read an integer from the start of the line.  */
    errno = 0;    /* force errno to no error */
    l = strtol (line, &amp;next, 10);
    if (errno &amp;&amp; (errno != ERANGE)) {   /* check for reported error */
      perror (&quot;Unexpected error from 'strtol'.&quot;); exit (EXIT_FAILURE);
    }
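    /* 'next == line' means no digits were converted, e.g. on an empty line. */
    if ((next == line) || ! isspace ((unsigned char) *next)) fputs (&quot;Failed to read anything from line.&quot;, stdout);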
    else {
      if ((errno == 0) &amp;&amp; (l &gt;= (long) lower) &amp;&amp; (l &lt;= (long) upper)) return (int) l; 
      else   fprintf (stdout, &quot;%s %ld %s%d%s %d%s&quot;, &quot;The number&quot;, l, &quot;is not in the range [&quot;
                            , lower, &quot;,&quot;, upper, &quot;].&quot; );
    }
/* If this is not the last try, ask for another line.  */
    if (try &lt; (TRY_LIMIT - 1)) fputs (&quot;  Please try again.\n&quot;, stdout);
  }
/* When here, we have exhausted the retry limit without obtaining a suitable response. */
  fputs (&quot;\n'get_int' aborting program: retry limit reached.\n&quot;, stdout);
  exit (EXIT_FAILURE);
}
=======================================================================
/* A test program for 'get_int' function.    */
#include  &lt;stdio.h&gt;
int  get_int (char *, int, int);
int  i;        /* Number read from user. Global OK in test harness   */
int main (void) {
  fputs (&quot;Use [* Break *] to terminate run.\n&quot;, stdout);
  while (1) {      /* Loop forever.    */
    i = get_int (&quot;Integer: &quot;, -1000, 2345);
    fprintf (stdout, &quot;Read %d.\n&quot;, i);
  }
}
</pre>
<p class="c2"><span class="remark">I have changed Posul's K&amp;R
style function declarations to prototypes, and reduced or eliminated
many comments to reduce line count. FG</span></p>
</div>
</div>
</p>
</div>
</channel>
</rss>
