    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: Questions and Answers</title>
        <link>https://members.accu.org/index.php/articles/1082</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">Programming Topics + CVu Journal Vol 12, #6 - Dec 2000</span></div>





<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;Questions and Answers</h1>
<p><strong>Author:</strong>&nbsp;</p>
<p>
<strong>Date:</strong> Tue, 05 December 2000 13:15:41 +00:00</p>
<p><strong>Summary:</strong>&nbsp;</p>
<p><strong>Body:</strong>&nbsp;<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e18" id="d0e18"></a>Answers</h2>
</div>
<p class="c2"><span class="remark">Question 6 in C Vu 12.5 created
a considerable response, with a couple of long and very detailed
answers. The result is almost an article in itself. Fortunately I
do not have to follow traditional periodical rules, where a fixed number of
pages is allocated to regular columns, and so can publish a range of
alternative answers.</span></p>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e24" id="d0e24"></a>Answers
(tentative) to Questions 1-4</h3>
</div>
<p><span class="bold"><b>from: Silas S. Brown <tt class=
"email">&lt;<a href=
"mailto:ssb22@cam.ac.uk">ssb22@cam.ac.uk</a>&gt;</tt></b></span></p>
<p>I will try to answer Jun Woong's questions in C Vu, but I am not
entirely sure about my answers.</p>
<p><span class="emphasis"><em>1) If the declaration of an
identifier for an object has file scope and no storage-class
specifier, its linkage is external.</em></span></p>
<p>Yes, except that the &quot;declaration&quot; will then be a definition.
For example:</p>
<p><tt class="filename">file1.c</tt>:</p>
<pre class="programlisting">
int this_is_global = 3; /* definition at file scope and
      no storage-class specifier (static, extern, etc) */
</pre>
<p><tt class="filename">file2.c</tt>:</p>
<pre class="programlisting">
#include &lt;stdio.h&gt;
extern int this_is_global;
int main() {
    printf(&quot;%d\n&quot;, this_is_global);
    return 0;
}
</pre>
<p><tt class="computeroutput">Output: 3</tt></p>
<p><span class="emphasis"><em>2) The declaration of an identifier
for a function that has block scope shall have no explicit
storage-class specifier other than <tt class=
"type">extern</tt>?</em></span></p>
<p><span class="emphasis"><em>3) Why does the standard disallow
storage-class specifiers other than <tt class="type">extern</tt> on
a function declaration inside a block?</em></span></p>
<p>The other storage-class specifiers (e.g. <tt class=
"type">static</tt>) would not make sense. For example:</p>
<pre class="programlisting">
int my_function() {
    int your_function(); /* declaration at block scope */
    return your_function();
}
</pre>
<p>Here, <tt class="function">your_function</tt> is declared at
block scope (within <tt class="function">my_function</tt>), but it
must be defined elsewhere. The declaration cannot be <tt class=
"type">static</tt>, since the function cannot be defined within the
current scope (which is <tt class="function">my_function</tt>).</p>
<p>4) Regarding arguments that are subject to macro expansion on
<tt class="type">#pragma</tt> and <tt class="type">#error</tt>:
&quot;<i class="citetitle">C: A Reference Manual (H&amp;S)</i>&quot; says
that &quot;<span class="emphasis"><em>The argument to <tt class=
"type">#pragma</tt> is subject to macro expansion. The <tt class=
"type">#error</tt> directive produces a compile-time error message
that will include the argument tokens, which are subject to macro
expansion.</em></span>&quot; Please give me examples of these. I
would like to know how the <tt class="type">#error</tt> directive
can have arguments subject to macro expansion.</p>
<p>I don't think this is true. I don't have the standard, but none
of the compilers I tried expanded the arguments of the #error
directives. The code I tried was:</p>
<pre class="programlisting">
#define shi4yan4 experiment
#error shi4yan4
</pre>
<p>The compilers gave an error message like:</p>
<p><tt class="computeroutput"><tt class="filename">test.c</tt>:2:
#error shi4yan4</tt></p>
<p>If it had done macro expansion, it should have said &quot;<tt class=
"computeroutput">#error experiment</tt>&quot;.</p>
</div>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e126" id="d0e126"></a>Answers to
Question 5</h3>
</div>
<p><span class="bold"><b>from James Curran <tt class=
"email">&lt;<a href=
"mailto:James@NovelTheory.com">James@NovelTheory.com</a>&gt;</tt></b></span></p>
<p><span class="bold"><b><span class="bold"><b>Question #5: What is
the correct behaviour of</b></span></b></span></p>
<pre class="programlisting">
int x =10;
for (int y=0, x=0; y &lt; 2; x++; y++) {}
       
</pre>
<p><span class="bold"><b><span class=
"bold"><b>Answer:</b></span></b></span></p>
<p>First of all, as published, that code is a syntax error, as
there can only be two semi-colons in the for statement. The correct
version would be:</p>
<pre class="programlisting">
   for (int y=0, x=0; y &lt; 2; x++, y++) {}       
</pre>
<p>which uses the comma operator to join the two parts into a
single expression.</p>
<p>To best understand the behaviour of a for statement, you can
always treat a statement like &quot;<tt class=
"function">for(A;B;C)...</tt>&quot; as if it were written</p>
<pre class="programlisting">
    {
        A;
        while(B)
        {...
            C;
        }
    }   
</pre>
<p>Substituting your code into that format, we get:</p>
<pre class="programlisting">
    int    x = 10;
    {
        int y=0, x=0;
        while (y&lt;2)
        {
            ....
            x++, y++;
        }
    }       
</pre>
<p>From that, it should be obvious that a new x is being defined as well
as initialised inside the for statement. It hides the outer x, which
keeps its value of 10.</p>
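<p>The scoping described above can be checked with a small test. This is only a sketch: the function name <tt class="function">outer_x_after_loop</tt> is invented for illustration, and it assumes a compiler that accepts declarations inside the for statement (C99 or later, or C++).</p>

```c
#include <stdio.h>

/* Hypothetical helper: returns the value of the outer x after a loop
 * whose for statement declares its own x. */
int outer_x_after_loop(void)
{
    int x = 10;                          /* outer x */
    int iterations = 0;

    for (int y = 0, x = 0; y < 2; x++, y++) {
        /* the x incremented here is the one declared in the for statement */
        iterations++;
    }
    printf("loop ran %d times\n", iterations);
    return x;                            /* the outer x is visible again */
}
```

<p>If the comma in the first clause were the comma operator, the outer x would have been set to 0 and then incremented twice. Because the clause is a declaration, the outer x is untouched and the function returns 10.</p>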
<p>[<i><span class="remark">I think there is still a bit more to
say on this subject. FG</span></i>]</p>
<p><span class="bold"><b>from Catriona O'Connell <tt class=
"email">&lt;<a href=
"mailto:catriona38@hotmail.com">catriona38@hotmail.com</a>&gt;</tt></b></span></p>
<p>In C Vu 12.5, Dave Midgley (p17) asks about the scope of x in
his two examples. The C++ standard in Section 6.5.3 answers this
question and shows that his compiler is conforming. His explanation
is correct.</p>
<p>Confusion might arise because of the distinction between the two
uses of the comma: as an operator in an expression and as a separator
in a declaration. The case which actually occurs is that
the int y causes the whole clause to become a
declaration, thus creating x with local scope, rather than creating
y with local scope and then setting the x declared outside the for
statement to 0 as independent actions.</p>
<p>If you try to compile Dave's first example with <span class=
"productname">MS Visual C++</span>&trade; <span class=
"productnumber">6.0</span> (even at SP4) it will fail with a C2374
error code because the compiler moves the declaration outside the for-loop,
causing a redeclaration. The &quot;solutions&quot; offered by Microsoft
are:</p>
<p>1. Compile with <tt class="literal">/Za</tt> - breaking most
Windows code.</p>
<p>2.</p>
<pre class="programlisting">
#define for if(0);else for
</pre>
<p>The confusion exemplifies one of my programming rules of thumb:
it is unwise to use the same variable name in different scopes
within a single module. Doing so introduces a potential lack of
clarity and a maintenance overhead. While scopes are clear in
Dave's examples, in more complex code (spread over several pages)
it might be less clear which instance is in scope.</p>
</div>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e198" id="d0e198"></a>Answers to
Question 6</h3>
</div>
<p class="c2"><span class="remark">I think the first answer reviews
the question in sufficient detail so I will not repeat the question
here.</span></p>
<p><span class="bold"><b>Answer from R.Butler</b></span></p>
<p>The Questioner has a file of statistical data in columnar
format, and wishes to count the number of values on the first line
of the file, assuming this to be typical, and hence determine the
number of columns. The Questioner has already discovered the
difficulties involved in trying to use <tt class=
"function">fscanf()</tt> to do this. These problems arise because
that function does not distinguish between the end-of-line
character and other &quot;white space&quot; characters. It is therefore
difficult to make it read one line from a file and then stop. For
example, given a file containing only columns of integers,
successive calls of <tt class="function">fscanf(fp, &quot;%d %d&quot;,
&amp;xValue, &amp;yValue)</tt> will happily keep reading and
assigning values until the file is exhausted. Incidentally, notice
that <i class="parameter"><tt>&amp;xValue</tt></i> and <i class=
"parameter"><tt>&amp;yValue</tt></i> must be addresses, indicated
by '<tt class="literal">&amp;</tt>'.</p>
<p>The Questioner is confident about what will happen if you give
the aforementioned function a 3-column file; nevertheless it might
be worth considering an example. Given a file like this:</p>
<pre class="literallayout">
1 2 3
4 5 6
7 8 9
       
</pre>
<p>successive calls of <tt class="function">fscanf(fp, &quot;%d %d&quot;,
&amp;xValue, &amp;yValue)</tt> will assign values to <i class=
"parameter"><tt>xValue</tt></i> and <i class=
"parameter"><tt>yValue</tt></i> thus:</p>
<pre class="literallayout">
xValue yValue
   1      2
   3      4
   5      6
   7      8
   9      ?
       
</pre>
<p>The value of the last <i class="parameter"><tt>yValue</tt></i>
will depend on whether or not it has been assigned another value
since the previous call to <tt class="function">fscanf()</tt>. If
not, it will be 8 again.</p>
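<p>The table above can be reproduced with a short sketch. To avoid needing a data file it uses <tt class="function">sscanf()</tt> on an in-memory copy of the example, on the assumption that it follows the same white-space rules as <tt class="function">fscanf()</tt>. The function name <tt class="function">read_in_pairs</tt> and its parameters are invented for illustration.</p>

```c
#include <stdio.h>

/* Reads the 3 x 3 example two values at a time and records, for each call,
 * the values of xValue and yValue afterwards and the value returned.
 * Returns the number of calls that assigned at least one value. */
int read_in_pairs(int xs[], int ys[], int results[], int max_calls)
{
    const char *p = "1 2 3\n4 5 6\n7 8 9\n";
    int xValue = 0, yValue = 0, consumed = 0, calls = 0;

    while (calls < max_calls) {
        int r = sscanf(p, "%d %d%n", &xValue, &yValue, &consumed);
        if (r < 1) break;        /* 0 or EOF: nothing was assigned */
        xs[calls] = xValue;
        ys[calls] = yValue;      /* left over from before if only xValue was read */
        results[calls] = r;
        calls++;
        if (r < 2) break;        /* the input ran out after xValue */
        p += consumed;           /* %n is only stored when both values matched */
    }
    return calls;
}
```

<p>The fifth call returns 1 rather than 2, which is the only sign that <i class="parameter"><tt>yValue</tt></i> still holds the 8 from the previous call. Checking the return value is therefore essential.</p>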
<p>Returning to the original problem, my solution would be to read
the first line into a character array using <tt class=
"function">fgets()</tt>, which does stop when it reaches the end of
the line, and then pass a pointer to this array to a function which
does the following:-</p>
<p>Initialise a counter to 0.</p>
<p>Start at the beginning of the array.</p>
<p>Repeat until reaching the end of the array:-</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>Step over any spaces until encountering something which is not a
space, or the end of the array;</p>
</li>
<li>
<p>If it's not the end of the array, it must be a column, so add 1
to the counter;</p>
</li>
<li>
<p>Step over the subsequent characters until encountering a space,
or the end of the array.</p>
</li>
</ul>
</div>
<p>On reaching the end of the array, return the value of the
counter.</p>
<p>This function could be coded as follows:</p>
<pre class="programlisting">
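#include &lt;ctype.h&gt;

/* Declared first so that count_columns() can call them. */
char *step_spaces(char *p);
char *step_non_spaces(char *p);

int count_columns(char *line) {
    int columns = 0;
    char *p = line;
    while (*p != '\0') {
        p = step_spaces(p);
        if (*p != '\0') columns++;
        p = step_non_spaces(p);
    }
    return columns;
}
/* isspace() expects a value in the range of unsigned char, hence the casts. */
char *step_spaces(char *p) {
    while (*p != '\0' &amp;&amp; isspace((unsigned char)*p)) p++;
    return p;
}
char *step_non_spaces(char *p) {
    while (*p != '\0' &amp;&amp; !isspace((unsigned char)*p)) p++;
    return p;
}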
</pre>
<p>The <tt class="function">scanf()</tt> family of functions are
not alone in regarding the end-of-line character as a sort of
space. The <tt class="function">isspace()</tt> function, which I
use in my suggested solution, also has this characteristic. That is
why the function steps over space characters first. Doing so
ensures that it returns the right number of columns (zero) if the
line contains nothing but an end-of-line character.</p>
<p><span class="bold"><b>Answer from Graham
Patterson</b></span></p>
<p>The question concerned extending a basic <tt class=
"function">fscanf(fp, &quot;%f %f&quot;, &amp;arg1, &amp;arg2)</tt> function
call to handle a generalised case of a multiple column file of
floating point values (since the question mentioned statistical
data, although the example contained integers, I am assuming that
floating point will be used for the purpose of example). Since the
function cited is from the C library, I am using C as the
programming language.</p>
<p>There are a number of problems with the formatted input
functions in the C Standard Library. Some of these problems are to
do with their implementation, and some are inherent in the data
storage model used by the language.</p>
<p>Most authorities would advise using <tt class=
"function">fscanf()</tt> with caution, if at all. The main problem
with it is that it consumes its input during the decoding process,
which makes it impossible to re-try the conversion without
returning to the start of the file. Issuing a <tt class=
"function">rewind()</tt> or <tt class="function">fseek()</tt> is
possible on a file, but may not succeed on a pipe. Even if it is
possible, the overhead of file system interaction may be too much
for the application. The alternative is to grab a line at a time
(ignoring for the moment what constitutes a 'line'), and then use
<tt class="function">sscanf()</tt> or some other tokenising scheme
to decode the line.</p>
<p>Where the input consists of tabulated data it is possible to use
<tt class="function">sscanf()</tt> to determine a suitable format
string. One method (example code provided) is to start with an
atomic conversion specifier, such as <tt class="literal">%*lf</tt>,
which will parse a double value without assignment. We can take
advantage of the default white-space separation of numeric input in
this example. A working format string can be built by concatenating
such specifiers and testing the return of <tt class=
"function">sscanf()</tt> with the line under investigation. A
<span class="returnvalue">-1</span> return indicates that the last
specifier is either one too many for the data, or is of unsuitable
type. There is an alternative approach using the <i class=
"parameter"><tt>%n</tt></i> conversion specifier that is discussed
later in this article.</p>
<p>It is thus possible to create a custom format string with a
reasonable knowledge of the input data (general structure and data
content). The number of fields involved is now known. We obviously
need to watch for buffer overflow both in our input line and our
dynamic format string. The format string we construct is going to
be potentially at least 4 times as long as the input string e.g.
<i class="parameter"><tt>%*lf</tt></i>. The input must not only fit
within the allocated buffer, but we need to consider the
possibility of only reading a partial line. The library function
<tt class="function">fgets()</tt> conveniently includes the newline
character if it occurs before the buffer length is exceeded. This
is a useful sentinel for partial line input. If such an event
occurs the input buffer must be inspected and adjusted. The last
valid conversion may not have read the entire token. E.g.
'<tt class="literal">123\0</tt>' is valid as <tt class=
"literal">123.0</tt>, even if the file actually contained
'<tt class="literal">1234.567\n</tt>'. It is rare to need a program
that can read arbitrarily long input lines, but limitations in this
respect must be documented.</p>
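<p>A minimal sketch of the newline sentinel check follows. The helper names <tt class="function">strip_newline</tt> and <tt class="function">discard_rest_of_line</tt> are invented for illustration; the first is meant to be called on the buffer straight after a successful <tt class="function">fgets()</tt>.</p>

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 and removes the newline if buffer holds a complete line.
 * Returns 0 if no newline was stored, which means either the line was
 * longer than the buffer or the file ended without a final newline. */
int strip_newline(char *buffer)
{
    size_t len = strlen(buffer);

    if (len > 0 && buffer[len - 1] == '\n') {
        buffer[len - 1] = '\0';
        return 1;
    }
    return 0;
}

/* One possible reaction to a partial line: skip what is left of it. */
void discard_rest_of_line(FILE *fp)
{
    int c;

    while ((c = getc(fp)) != EOF && c != '\n')
        ;   /* nothing to do: just consume the characters */
}
```

<p>When <tt class="function">strip_newline()</tt> returns 0 the last token in the buffer may be truncated, so it should not be trusted without further inspection.</p>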
<p>So at this stage we have a reasonable format string to decode
our line, and it would work if we removed the assignment
suppression characters. Since C is a compiled language, it expects
that we know the number of arguments to the function when we write
it - we do not have the run-time evaluation options of PERL or PHP,
for example. So if we had data in columns that could range from 2
to many more, how do we write our input?</p>
<p>How about:</p>
<pre class="programlisting">
int result = sscanf(buffer, format_string, &amp;double_arg1, &amp;double_arg2, &amp;double_arg3);
</pre>
<p>This would work for up to three arguments, but would break for
four or more. If the problem was finite we could code one line with
enough arguments for the worst case. And then along comes a file
that is bigger!</p>
<p>The answer is to read one value at a time. We know how many to
read from the previous work. We should also know what sequence of
types we require. This decomposes to the basic construct:</p>
<pre class="programlisting">
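int offset = 0;
int consumed = 0;
for(i = 0; i &lt; number_of_arguments; i++) {
   result = sscanf(buffer + offset, &quot;%lf%n&quot;, &amp;double_args[i], &amp;consumed);
   if(result != 1) break; /* error checks, other processing */
   offset += consumed; /* %n counts from buffer + offset, so accumulate */
}
</pre>
<p>The offset is derived via the <i class=
"parameter"><tt>%n</tt></i> format specifier, which records how many
characters of the input the current call has consumed. Because each call
starts at <tt class="literal">buffer + offset</tt>, that count is added to
the running offset rather than replacing it. The specifier has to
come after the data conversion specifier to record the consumed
characters.</p>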
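<p>Wrapped in a function, the same construct can also discover the number of values on a line instead of being told it. This is a sketch only; the name <tt class="function">read_doubles</tt> and its parameters are assumptions made for the example.</p>

```c
#include <stdio.h>

/* Decodes up to max_values doubles from line into values[].
 * Returns the number of values decoded. */
int read_doubles(const char *line, double values[], int max_values)
{
    int offset = 0;
    int count = 0;

    while (count < max_values) {
        int consumed = 0;

        /* stop at the end of the line or at the first non-numeric field */
        if (sscanf(line + offset, "%lf%n", &values[count], &consumed) != 1)
            break;
        offset += consumed;   /* %n is relative to line + offset */
        count++;
    }
    return count;
}
```

<p>A caller that only wants the column count can pass a scratch array and use the return value.</p>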
<p>We could have used a similar construct when determining the
format string. However, building the string automatically keeps a
record of the process. There are times when it would pay to use the
format string as a base for the conversion. Typically you may have
to do this if <i class="parameter"><tt>%n</tt></i> is not available
as a conversion specifier, which is the case with some older
compiler libraries. It might also be a good choice if you have a
variety of data types in the input, since you build a control
string while you investigate the content and number of fields in
the input file. In this case the format string would be duplicated
for each pass through the loop, less one assignment suppression
character ('<tt class="literal">*</tt>'). By walking the actual
assignment along the format string we can decode the input buffer
one value at a time.</p>
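<p>Walking the assignment along the format string might look like the following sketch, which reads a single field by suppressing assignment for all the fields before it. The function name <tt class="function">read_nth_double</tt> and the limit <tt class="constant">MAX_FIELDS</tt> are invented for illustration, and every field is assumed to be a floating point value.</p>

```c
#include <stdio.h>
#include <string.h>

#define MAX_FIELDS 64   /* hypothetical upper limit on the field index */

/* Reads field number index (counting from 0) from line into *value.
 * Returns 1 on success and 0 if the field is missing or unreadable. */
int read_nth_double(const char *line, int index, double *value)
{
    char format[MAX_FIELDS * 4 + 4];   /* room for "%*lf" per field plus "%lf" */
    int i;

    if (index < 0 || index >= MAX_FIELDS) return 0;

    format[0] = '\0';
    for (i = 0; i < index; i++)
        strcat(format, "%*lf");        /* parse and discard the earlier fields */
    strcat(format, "%lf");             /* assign only the field we want */

    return sscanf(line, format, value) == 1;
}
```

<p>This rescans the line from the start for every field, so it trades speed for independence from <i class="parameter"><tt>%n</tt></i>.</p>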
<p>There is no best answer to the general problem. A lot depends on
how variable the input data may be in a particular application.</p>
<p>This example reads a row of data into an array. In many cases we
are going to have data in row === record, column === variable
format. C uses a very low-level implementation of arrays that does
not support dynamic re-sizing. We can re-allocate the memory
occupied by an array (providing it was dynamically allocated
originally via a <tt class="function">malloc()</tt> / <tt class=
"function">calloc()</tt> / <tt class="function">realloc()</tt>
call). What we cannot do is have an array,</p>
<pre class="programlisting">
double array[10];
</pre>
<p>and automatically extend it with something like:</p>
<pre class="programlisting">
int more = 12;
array[more]; /* will not work - array bounds exceeded */
</pre>
<p>So the input decoding is only the first part of the problem. To
read in a table of numbers, we would probably allocate an array of
pointers to arrays of <tt class="type">double</tt>, dimensioned by
the number of columns in the data. Each column array will have to
be dimensioned by the number of rows in the data. If we know (or
can pre-scan the input to find out) the number of rows, it is easy.
If we have to do it during the read process the algorithm becomes
one of allocating an initial space, and then re-sizing upwards when
the space is used. Unless we know how many lines are in the file in
advance, we can expect to become intimately acquainted with the
<tt class="function">realloc()</tt> function!</p>
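<p>The allocate-then-grow algorithm for one column could be sketched as below. The <tt class="type">Column</tt> structure, the function names and the doubling policy are assumptions chosen for the example, not something prescribed by the text above.</p>

```c
#include <stdlib.h>

/* Hypothetical growable column of doubles. Start with all members zero. */
typedef struct {
    double *data;
    size_t  used;
    size_t  capacity;
} Column;

/* Appends value, growing the storage when it is full.
 * Returns 1 on success, 0 if memory could not be obtained. */
int column_append(Column *col, double value)
{
    if (col->used == col->capacity) {
        size_t new_capacity = col->capacity ? col->capacity * 2 : 16;
        /* realloc(NULL, n) behaves like malloc(n), so the first call works too */
        double *grown = realloc(col->data, new_capacity * sizeof *grown);

        if (grown == NULL) return 0;   /* col->data is still valid here */
        col->data = grown;
        col->capacity = new_capacity;
    }
    col->data[col->used++] = value;
    return 1;
}

void column_free(Column *col)
{
    free(col->data);
    col->data = NULL;
    col->used = col->capacity = 0;
}
```

<p>Assigning the result of <tt class="function">realloc()</tt> to a temporary first means the existing data is not lost if the call fails.</p>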
<p>If we use another language we may avoid a lot of this processing
overhead. The C++ STL can handle the indeterminate data size more
easily than hand-coding our own memory management routines in C.
For that matter the dynamic array handling of PERL and its field
separation facilities make this sort of data parsing comparatively
trivial. But these may not be the best languages for the processing
of the data.</p>
<p>If I was just interested in re-ordering or formatting data of
this type I would consider AWK or PERL. If I was looking to do
extensive mathematical processing I would look to a compiled
language, possibly with extensive large number and numeric function
support.</p>
<pre class="programlisting">
/* module title  : fscanfex.c
 * author        : Graham Patterson (G.A.Patterson@btinternet.com)
 * revision history : [<i><span class=
"remark">Editorial note: for reasons of space, I snipped this</span></i>]
 * problems      : Demonstration of concept. Not production code.
 * description   : Demonstrates a method for determining the field composition of a numeric data 
 *                            file with a view to constructing a scan format string.    */
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#ifndef MAX_LEN
#define MAX_LEN 1024
#endif
int double_fields(const char *buffer) {
/* Worst case: 4 format characters per input character, plus room for the terminator */
    char decode_format[MAX_LEN * 4 + 4];
    const char *format = &quot;%*lf&quot;;
    int conversions = 0;      int result = 0;
    decode_format[0] = '\0';
    do     {
        strcat(decode_format, format);
        result = sscanf(buffer, decode_format);
        printf(&quot;Testing %s with %s, obtained %d\n&quot;, buffer, decode_format, result);
/* With assignment suppression we don't get the number of conversions, 
 * so we count them ourselves. */
        if(result != -1) conversions++;
    }
/* Continue only while another 4-character specifier and the terminator still fit */
    while(result != -1 &amp;&amp; strlen(decode_format) &lt; (MAX_LEN * 4));
    return conversions;    
}
/* It is assumed that this function is only called when a field is expected.
 * However, an error flag or exception is still advisable! Setting an invalid
 * offset works, but has some stylistic implications. */
double read_double(const char *buffer, int *offset) {
    double value = 0.0;   int conversion_length = 0;
    int result = sscanf(buffer + *offset, &quot;%lf%n&quot;, &amp;value, &amp;conversion_length);
    printf(&quot;Read %f at %d\n&quot;, value, *offset);
/* update the offset into the line buffer */
    *offset += conversion_length; 
    if(result &gt; 0) return value;
    else { *offset = -1;  return 0.0; }
}
int main(void) {
    FILE *fp = fopen(&quot;test.dat&quot;, &quot;r&quot;);
    if(fp != NULL) {
        char buffer[MAX_LEN];      int conversions = 0;
/* Loop on the result of fgets(): feof() only becomes true after a read has
 * failed, so testing it first would skip a last line that has no newline. */
        while(fgets(buffer, sizeof buffer, fp) != NULL) {
            int offset = 0;     int i;
            conversions = double_fields(buffer);
            printf(&quot;Data file contains %d columns\n&quot;, conversions);
            for(i = 0; i &lt; conversions; i++) {
                double value = read_double(buffer, &amp;offset);
                if(offset != -1) printf(&quot;%f\n&quot;, value);
                else { puts(&quot;Error reading double&quot;); break; } /* offset is no longer usable */
            }
        }
        fclose(fp);
    }
    else puts(&quot;Unable to open test data file&quot;);
    return 0;
}
</pre>
<p><span class="bold"><b>Answer from Chris Main</b></span></p>
<p>If you are using C then you can use the library function
<tt class="function">strtok()</tt> to work out how many columns are
present in the first line. It takes two arguments. The first
argument (<tt class="type">char *</tt>) should be the line the
first time <tt class="function">strtok()</tt> is called and
<tt class="literal">NULL</tt> on subsequent calls. The second
argument should be a string containing the characters separating
the values. Call it until it returns <tt class="literal">NULL</tt>
to determine how many columns there are in the first line:</p>
<pre class="programlisting">
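int column_count = 0;
/* strtok() overwrites the separators in first_line. The tab and the newline
 * left by fgets() are listed so that they are not mistaken for values. */
if(strtok(first_line, &quot; \t\n&quot;)) {
    ++column_count;
    while(strtok(NULL, &quot; \t\n&quot;)) ++column_count;
}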
</pre>
<p>Using this technique, I don't think there is any way of
generalising the call of <tt class="function">fscanf()</tt> because
it takes a variable number of arguments. Instead for each row of
the file there needs to be a loop so that <tt class=
"function">fscanf()</tt> is called the same number of times as
there are columns, reading one value each time:</p>
<pre class="programlisting">
while(!feof(file)) {
    int i, n;
    for(i = 0; i &lt; column_count; i++) {
        if(fscanf(file, &quot;%d&quot;, &amp;n) != 1) break;
    }
    if(i == column_count) {
/* Another row read successfully */
    }
    else if(i &gt; 0) {
/* Error condition - incomplete row */
        break;
    }
    else /* i == 0 */ {
/* End of file, or a value that is not a number */
        break;
    }
}
</pre>
<p>Unfortunately it is not possible to rely entirely on <tt class=
"function">feof()</tt> when doing this. The return value of
<tt class="function">fscanf()</tt> must be checked each time to
detect the end of the file as well as an incomplete last row.</p>
<p>A full listing is given in <tt class="filename">scan.c</tt> and, for comparison, a C++
version in <tt class="filename">scan.C</tt>. As will be seen, the
C++ version relieves the programmer of a lot of error handling and
memory management.</p>
<p>[<i><span class="remark">I have renamed the second of those
files <tt class="filename">scan.cpp</tt>. Note that these together
with other source code files will go on our web site.
FG</span></i>]</p>
<p><span class="bold"><b>from David Stone <tt class=
"email">&lt;<a href=
"mailto:dfstone@lithoi.demon.co.uk">dfstone@lithoi.demon.co.uk</a>&gt;</tt></b></span></p>
<p>My first reaction on reading this question is: why does the
questioner want to do this? C is a low-level language, and many
packages and languages are more suitable for dealing with tables of
numbers. What is the larger task to be done with these numbers? The
data are statistical, apparently; there are many packages available
to do statistical analysis. I know that the free R package
(<a href="http://cran.r-project.org" target=
"_top">http://cran.r-project.org/</a>) [<i><span class=
"remark">would anyone like to review that for us? FG</span></i>]
can handle this format; so can many other packages. If the analysis
is simple enough, you can probably do it by reading the numbers
into a spreadsheet.</p>
<p>If the word 'statistical' is a red herring, and it really is
necessary to use a general-purpose programming language, I should
still avoid C if possible. Perl can cope with the desired input
format fairly easily, certainly more easily than C; it may be
suitable, depending on what is to be done to the numbers. The best
language in the circumstances depends on what the questioner knows,
and what is available to him, as well as the nature of the full
problem.</p>
<p>If the questioner insists that C must be used, then the first
thing I should say would be to discourage the use of <tt class=
"function">fscanf()</tt>. It is inflexible because the conversion
of the input is mixed up with reading the input. This makes error
handling difficult (for example, coping with a line with too few
values). Much better would be to read a line with <tt class=
"function">fgets()</tt>, and then start taking apart the line with
<tt class="function">strtol()</tt>. (Incidentally, it is possible
that the questioner intended to program in C++; I assume not,
because <tt class="function">fscanf()</tt> is part of the C i/o
library.)</p>
<p>The question is not clear as to whether there can be an
arbitrary number of columns and lines, or whether they have upper
bounds. Especially since the questioner is a novice, I should
suggest setting bounds if possible, because there are several
tricky points in dealing with dynamically allocated arrays,
necessary both for the input line and for the arrays if there are
no bounds.</p>
<p>If we assume that upper bounds <tt class="constant">MAX_LINES</tt> and <tt class=
"constant">MAX_COLUMNS</tt> on the numbers of lines and columns can be set,
then the questioner could use an array of arrays</p>
<pre class="programlisting">
long iValue[MAX_LINES][MAX_COLUMNS];
</pre>
<p>and read the data with two nested loops. The outer would loop
through the lines, calling <tt class="function">fgets()</tt> each
time, and then using an inner loop which called <tt class=
"function">strtol()</tt> until end-of-line, filling in the array
<tt class="literal">iValue[iCurrentLine]</tt>. The code would check
that no more than <tt class="constant">MAX_LINES</tt> lines were
read, and no more than <tt class="constant">MAX_COLUMNS</tt>
numbers found on each line. Other necessary checks are that no line
was longer than the line buffer passed to <tt class=
"function">fgets()</tt>, that each line had the same number of
values as the first, that each number fitted into a long, and that
each line had no extraneous rubbish. These checks will all fit
inside such a structure. They do, however, require the designer to
think through how to do error reporting.</p>
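<p>The inner loop with its checks might be sketched as follows. The function name <tt class="function">parse_longs</tt> and the convention of returning -1 for any problem are assumptions for the example; a real program would want to report which check failed.</p>

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Takes one line apart with strtol(), storing at most max_values numbers.
 * Returns the number of values found, or -1 if the line holds too many
 * values, a value that does not fit into a long, or extraneous rubbish. */
int parse_longs(const char *line, long values[], int max_values)
{
    const char *p = line;
    int count = 0;

    for (;;) {
        char *end;
        long v;

        while (isspace((unsigned char)*p)) p++;   /* also skips the newline */
        if (*p == '\0') return count;             /* end of line reached */
        if (count == max_values) return -1;       /* too many columns */

        errno = 0;
        v = strtol(p, &end, 10);
        if (end == p) return -1;                  /* not a number at all */
        if (errno == ERANGE) return -1;           /* does not fit into a long */
        if (*end != '\0' && !isspace((unsigned char)*end))
            return -1;                            /* rubbish such as "12abc" */

        values[count++] = v;
        p = end;
    }
}
```

<p>The outer loop would call this once per line read by <tt class="function">fgets()</tt>, passing <tt class="literal">iValue[iCurrentLine]</tt> and <tt class="constant">MAX_COLUMNS</tt>, and compare each result with the count obtained for the first line.</p>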
<p>At this point I should hope that the questioner was beginning to
think that coding it in C was more complex than he had originally
thought, and I should re-introduce the ideas I started with, about
using another package or language.</p>
</div>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e513" id=
"d0e513"></a>Questions</h2>
</div>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e516" id="d0e516"></a>Q1. (from Simon
W. Day) <tt class="email">&lt;<a href=
"mailto:106161.304@compuserve.com">106161.304@compuserve.com</a>&gt;</tt></h3>
</div>
<p>Bug that took three evenings to find</p>
<p>Strictly this is not a question, but I would welcome comments on
it.</p>
<pre class="programlisting">
long k,n;
float q,p;
n=4096;             //or other power of 2
p=n;
q=log(p)/log(2.0);  //gives 12.00000
k=q;                //gives 11 !!!!
</pre>
<p>QC and Symantec C gave k=12, as we would hope.</p>
<p>VC++ version 5 gets k=11. The reason seems clear (12.0000 must
really be something like 11.99999), but your readers' comments on how
to avoid such pitfalls might be of general interest.</p>
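<p>The suspected mechanism is easy to show in isolation: converting a
floating-point value to an integer type discards the fraction, so
11.99999 becomes 11. The sketch below is not offered as the answer to
the question, only as two commonly used ways of sidestepping the
truncation. Both function names are hypothetical, and the second
assumes, as in the example, that the argument really is a power of
two.</p>

```c
#include <assert.h>

/* Converts a non-negative value to the nearest long instead of
   truncating it. Adding 0.5 before the conversion only rounds
   correctly for x >= 0, hence the precondition. */
long nearestLong(double x)
{
    assert(x >= 0.0);
    return (long)(x + 0.5);
}

/* Base-2 logarithm of a power of two, using integer arithmetic
   only, so that no floating-point rounding can intervene. */
int log2OfPowerOfTwo(unsigned long n)
{
    int k = 0;

    assert(n != 0 && (n & (n - 1)) == 0);   /* n must be a power of two */
    while (n > 1) {
        n >>= 1;
        ++k;
    }
    return k;
}
```

<p>With these, <tt class="literal">k=nearestLong(q);</tt> would
replace the plain assignment, or the logarithms could be avoided
altogether with <tt class="literal">k=log2OfPowerOfTwo(n);</tt>.</p>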
</div>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e532" id="d0e532"></a>Q2. (from Silas
Brown) <a href="www.flatline.org.uk/~silas" target=
"_top">www.flatline.org.uk/~silas</a></h3>
</div>
<p>Spot the Bug</p>
<p>I recently found a bug in my code, which I have now fixed, but I
thought it would make an interesting exercise. In fact there are
lots of things wrong with it, but one in particular caused a
problem. Here is the relevant part of the code. It has been in my
program (and I have been using it) for nearly three years; if I
were starting again then I would do it rather differently now.</p>
<pre class="programlisting">
int InString::addLineFromFile(FILE* f) {
    char temp[TEMP_BUFLEN+1];
    if(!fgets(temp,TEMP_BUFLEN,f)) return(EOF);
    do {
        int i=strlen(temp)-1;
        if(temp[i]&lt;' ') {
            while(i&gt;=0 &amp;&amp; temp[i]&lt;' ') temp[i--]=0;
            addString(temp); return('\n');
        } else addString(temp);
    } while(fgets(temp,TEMP_BUFLEN,f));
    return('\n');
}
void HttpHeader::readMimeHeader(FILE* fp) {
    InString s;
    while(!feof(fp)) {
        s.clear();
        s.addLineFromFile(fp);
        const char* str=s.getString();
        if(!str[0]) return;
        ...
    }
}
</pre>
<p>The file that is being read may have been generated on another
operating system.</p>
</div>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e545" id="d0e545"></a>Q3. (from
comp.lang.c++.moderated)</h3>
</div>
<p>Is there a header file I need to include before I can use:</p>
<pre class="programlisting">
using namespace std;
</pre>
<p>I cannot figure out which header file (or combination of files)
makes the std namespace available.</p>
<p><span class="emphasis"><em>This looks like a simple question
from a raw novice, but looks can be deceptive. When answering, think
carefully about any problems you might get if you tried to compile
a file with nothing in it other than:</em></span></p>
<pre class="programlisting">
using namespace std;
int main() { return 0; }        
</pre>
<p><span class="emphasis"><em>Now ask yourself if there is a tidy
solution that avoids including any specific file. Finally, you might
like to comment on why the original question is unlikely to be from
a competent programmer with experience in correct use of
C++.</em></span></p>
</div>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e562" id="d0e562"></a>Q4. (from Silas
Brown) <a href="www.flatline.org.uk/~silas" target=
"_top">www.flatline.org.uk/~silas</a></h3>
</div>
<p>In the expression <tt class="literal">f(a(),b(),c())</tt>, the
order of evaluation of the functions <tt class="function">a</tt>,
<tt class="function">b</tt> and <tt class="function">c</tt> is not
defined. But what about the expression <tt class=
"literal">f(g(a()),g(b()),g(c()))</tt> - is each call of <tt class=
"function">a</tt>, <tt class="function">b</tt> and <tt class=
"function">c</tt> guaranteed to be followed by a call of <tt class=
"function">g</tt>, or is the compiler free to call <tt class=
"function">a</tt>, <tt class="function">b</tt> and <tt class=
"function">c</tt> first and then <tt class="function">g</tt> three
times on the three results?</p>
<p>This can make a difference if, for example, <tt class=
"function">a</tt>, <tt class="function">b</tt> and <tt class=
"function">c</tt> all write to the same static buffer, and the
function <tt class="function">g</tt> copies it into a new area of
memory.</p>
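<p>To make the scenario concrete, here is a minimal sketch with
hypothetical bodies for <tt class="function">a</tt>, <tt class=
"function">b</tt>, <tt class="function">c</tt> and <tt class=
"function">g</tt>. It does not settle what the language guarantees;
it spells out the two orderings in question as separate statements,
purely to show why the answer matters. The helper names <tt class=
"function">copyEachAtOnce()</tt> and <tt class=
"function">copyAfterAll()</tt> are invented for the illustration.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static char buffer[16];     /* shared by a(), b() and c() */

const char *a(void) { strcpy(buffer, "from a"); return buffer; }
const char *b(void) { strcpy(buffer, "from b"); return buffer; }
const char *c(void) { strcpy(buffer, "from c"); return buffer; }

/* Copies the string into a new area of memory; the caller frees it.
   Returns NULL if the allocation fails. */
char *g(const char *s)
{
    char *copy;

    assert(s != NULL);
    copy = malloc(strlen(s) + 1);
    if (copy != NULL) strcpy(copy, s);
    return copy;
}

/* Ordering 1: each of a, b and c is followed at once by g. */
void copyEachAtOnce(char *out[3])
{
    out[0] = g(a());
    out[1] = g(b());
    out[2] = g(c());
}

/* Ordering 2: a, b and c all run first, then g runs three times.
   All three pointers refer to the same buffer, which by then holds
   only what the last writer left in it. */
void copyAfterAll(char *out[3])
{
    const char *pa = a();
    const char *pb = b();
    const char *pc = c();

    out[0] = g(pa);
    out[1] = g(pb);
    out[2] = g(pc);
}
```

<p>Under the first ordering the three copies differ; under the second
all three hold whatever the last function wrote to the buffer.</p>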
</div>
</div>
</p>
</div>
</channel>
</rss>
