    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU :: File Positioning</title>
        <link>https://members.accu.org/index.php/journals/928</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>


        <h2>Journal Articles</h2>


<div class="xar-mod-head"><span class="xar-mod-title">CVu Journal Vol 11, #6 - Oct 1999 + Programming Topics</span></div>





<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;File Positioning</h1>
<p><strong>Author:</strong>&nbsp;</p>
<p>
<strong>Date:</strong> Wed, 06 October 1999 13:15:33 +01:00</p>
<p><strong>Summary:</strong>&nbsp;</p>
<p><strong>Body:</strong>&nbsp;<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e18" id="d0e18"></a>An Introduction
to File Positioning</h2>
</div>
<p>I want to talk about file positioning, and the ways you can do
it in C. Having read this piece, I hope you will feel confident
enough to read the descriptions of the functions covered in your
compiler's Library Reference Manual to answer any remaining
questions on their use. I will end with a question about file
positioning that I hope someone can help me answer.</p>
<p>Now, what is file positioning? Simply put, it determines which
part of a file will be read (or written) next.</p>
<p>Even if all you have ever done is write files using <tt class="function">fputs</tt> and
read them using <tt class="function">fgets</tt>, you've done file
positioning. Why? Well, when one call to <tt class=
"function">fputs</tt> completes, the next call will write to the
next bit of the file. Similarly, with <tt class=
"function">fgets</tt>, a second call will start reading where the
first call finished. So both <tt class="function">fputs</tt> and
<tt class="function">fgets</tt> position the file so that the next
operation on it will start where the present operation
completed.</p>
<p>This probably sounds obvious. After all, if you used <tt class=
"function">fputs</tt> to write the lines of an important letter to
a file, you would not be best pleased if <tt class=
"function">fgets</tt> did not retrieve them for you in the same
order.</p>
<p>But perhaps you do want to be able to read the lines of that
letter in a random order. You might if you were writing a text
editor.</p>
<p>More likely, the things in the file are not lines of text, but
records of the resistors held in stores, or books in the library.
You might occasionally want to ask about the resistors in the next
tray on the shelf, or the book to the right of the one you just
read. But it is more probable that you would ask about a resistor
(or book) entirely unconnected with the one you have just looked
at.</p>
<p>You need to be able to position the file so that the next access
(call to <tt class="function">fputs</tt> or <tt class=
"function">fgets</tt>) processes the record you want. And, of
course, C provides a number of ways of doing this.</p>
<p>Let me get one bit of administration out of the way here. To use
any of the following functions, you must have first said:</p>
<pre class="programlisting">
#include &lt;stdio.h&gt;
</pre>
<p>to provide prototypes for (that is, tell the compiler about)
these functions. As this header file also contains the prototypes
for the file open, reading, writing and closing functions, I hope
it has already been used.</p>
<p>Perhaps the most basic file positioning operation you can do is
to go to the file's beginning. Then the first thing in the file is
the next object to be read (or overwritten). The function to do
this is called <tt class="function">rewind</tt>, because going to
the start is what happens if you <tt class="function">rewind</tt> a
file held on a magnetic tape. <tt class="function">rewind</tt> has
the prototype:</p>
<pre class="programlisting">
void  rewind (FILE *stream);
</pre>
<p>It takes one parameter, <i class=
"parameter"><tt>stream</tt></i>, (whose value is returned by
<tt class="function">fopen</tt> when it opens the file) and
repositions the associated file to its start. (Strictly, it sets
the file position indicator for the stream pointed to by its
parameter to the beginning of file, but I don't want to get into
the question of streams today). <tt class="function">rewind</tt>
doesn't return anything, so you just have to hope it worked.</p>
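<p>As a minimal sketch of <tt class="function">rewind</tt> in use, the following function writes two lines to a scratch file, rewinds it, and reads the first line back. The function name <tt>read_first_line_again</tt> and the use of <tt class="function">tmpfile</tt> to create the scratch file are assumptions made only for this illustration.</p>

```c
#include <stdio.h>

/* Hypothetical helper: writes two lines to a temporary file, rewinds it,
   and reads the first line back into buf (at most size - 1 characters).
   Returns 0 on success, -1 on any failure. */
int read_first_line_again(char *buf, int size)
{
    FILE *stream = tmpfile();   /* scratch file, opened for update */
    int ok;

    if (stream == NULL)
        return -1;

    ok = fputs("first line\n", stream) != EOF
      && fputs("second line\n", stream) != EOF;
    if (ok) {
        rewind(stream);         /* next read starts at the beginning */
        ok = fgets(buf, size, stream) != NULL;
    }
    fclose(stream);
    return ok ? 0 : -1;
}
```

<p>Called with a buffer of, say, 32 characters, the function should leave the text of the first line in the buffer, because the read after <tt class="function">rewind</tt> starts from the beginning of the file rather than from where the last write finished.</p>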
<p>Usually, of course, you want to go to a record in the middle of
the file. You could do this by using <tt class=
"function">rewind</tt> and then reading through the file to get to
the record you actually want, but that could be very slow. That
might in fact be the only way to find a record in a file, but let
the operating system decide that for you.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e97" id="d0e97"></a>The Original
File Positioning Functions</h2>
</div>
<p>C provides two methods of positioning a file, and two
corresponding methods of finding out where you are within a file.
The first approach is a modification of the method used by UNIX,
the system on which C was developed; the second approach tries to
avoid some problems of the first.</p>
<p>To understand the first approach, you have to know a little
about files in UNIX, which is that whatever the file type, and
whatever the file contents, it can be read a byte at a time. Text
files use the character '<tt class="literal">\n</tt>' to mark the
end of line and look exactly like binary files. And the UNIX file
system, being simple, lets you position directly to any byte in the
file. The demonstration that this approach could work efficiently
is regarded as one of the major early successes for UNIX.</p>
<p>The C library provides the <tt class="function">ftell</tt>
function, based on the function of the same name in UNIX, with the
prototype:</p>
<pre class="programlisting">
long int ftell (FILE *stream);
</pre>
<p>This inspects the file pointed to by the parameter <i class=
"parameter"><tt>stream</tt></i> and either returns -1L if something
has gone wrong, or some other value which is the file position.</p>
<p>On a UNIX system, that position is always the number of bytes
from the beginning of the file.</p>
<p>For non-UNIX systems the same holds only for binary files. For
non-binary files (i.e., text files) the returned value exactly
specifies where you are in the file, but it may not be a byte count
because non-UNIX systems represent the lines in a text file in many
different ways. But don't worry. Ensuring that you, as a C
programmer, see in your program what the ISO standard says, despite
what the operating system does, is the potentially complex task of
the run time library that came with your compiler.</p>
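<p>The byte-count behaviour for binary files can be checked with a short sketch like the one below. The function name <tt>position_after_writing</tt> is hypothetical, and the scratch file comes from <tt class="function">tmpfile</tt>, which opens a binary stream.</p>

```c
#include <stdio.h>

/* Hypothetical helper: writes `count` bytes to a temporary binary file and
   returns what ftell reports afterwards, or -1L on failure. */
long position_after_writing(long count)
{
    FILE *stream = tmpfile();   /* binary stream opened for update */
    long i, where;

    if (stream == NULL)
        return -1L;

    for (i = 0; i < count; ++i) {
        if (fputc('x', stream) == EOF) {
            fclose(stream);
            return -1L;
        }
    }
    where = ftell(stream);      /* -1L here would signal an error */
    fclose(stream);
    return where;
}
```

<p>Because the stream is binary, the value returned should equal the number of bytes written. For a text stream no such arithmetic meaning can be relied upon.</p>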
<p>To match <tt class="function">ftell</tt>, there is <tt class=
"function">fseek</tt> which moves to a specified place in the file.
The prototype is:</p>
<pre class="programlisting">
int fseek (FILE *stream, long int offset, int whence);
</pre>
<p><i class="parameter"><tt>stream</tt></i> says which file is to
be repositioned. The parameter <i class=
"parameter"><tt>whence</tt></i> says how the <i class=
"parameter"><tt>offset</tt></i> is to be interpreted. For now, use
only the macro <tt class="literal">SEEK_SET</tt> for <i class=
"parameter"><tt>whence</tt></i>, which says that <i class=
"parameter"><tt>offset</tt></i> has been obtained from an earlier
call to <tt class="function">ftell</tt>.</p>
<p><i class="parameter"><tt>offset</tt></i> must be used by <tt class="function">fseek</tt>
on the same stream as was used with <tt class="function">ftell</tt>
to find it. Closing a file with <tt class="function">fclose</tt>
causes the stream to disappear and opening the file again creates a
new stream to which the old values of <i class=
"parameter"><tt>offset</tt></i> cannot be applied. It is certainly
not permitted to open the same file on two different streams by
calling <tt class="function">fopen</tt> twice and then use one of
the streams to obtain file positions from <tt class=
"function">ftell</tt> which are then given to <tt class=
"function">fseek</tt> in an attempt to position the other
stream.</p>
<p>In outline, these two calls might be used like this:</p>
<pre class="programlisting">
#include     &lt;stdio.h&gt;

FILE    *stream;
long int        offset;

/* Open the file. */
stream = fopen (&lt;whatever&gt;);
/* Do something with the file.*/
fgets (s, n, stream);
/* Find out where we are in the file.*/
offset = ftell (stream);
/* Do some more work on the file.*/
fgets (s, n, stream);
/* Return to the marked point in the file.*/
fseek (stream, offset, SEEK_SET);
/* And re-read the file from that point.*/
fgets (s, n, stream);
/* Finally, close the stream.*/
fclose (stream);
</pre>
<p>(I leave it to you to complete the example and insert checking
of all the error codes.)</p>
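<p>One possible completion of the outline, with the return values checked, is sketched below. It is not the only way to finish the example: the function name <tt>mark_and_reread</tt>, the three-line scratch file created with <tt class="function">tmpfile</tt>, and the two caller-supplied buffers are all assumptions made for illustration.</p>

```c
#include <stdio.h>

/* Hypothetical completion of the outline. Reads the second line of a
   scratch file, returns to the marked position, and reads it again.
   On success returns 0 and both buffers hold the second line;
   on any failure returns -1. */
int mark_and_reread(char *first_read, char *second_read, int size)
{
    FILE *stream = tmpfile();
    long offset;
    int result = -1;

    if (stream == NULL)
        return -1;

    if (fputs("line 1\nline 2\nline 3\n", stream) != EOF
        && fseek(stream, 0L, SEEK_SET) == 0          /* back to the start  */
        && fgets(first_read, size, stream) != NULL   /* consume line 1     */
        && (offset = ftell(stream)) != -1L           /* mark line 2        */
        && fgets(first_read, size, stream) != NULL   /* read line 2        */
        && fseek(stream, offset, SEEK_SET) == 0      /* return to the mark */
        && fgets(second_read, size, stream) != NULL) /* read line 2 again  */
        result = 0;

    fclose(stream);
    return result;
}
```

<p>Each call is tested as it is made, so the first failure stops the sequence and the function reports it. Note that the value from <tt class="function">ftell</tt> is handed back to <tt class="function">fseek</tt> on the same stream, as the rules above require.</p>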
<p>Because we know, for binary files at least, that the value of
<i class="parameter"><tt>offset</tt></i> is actually the file
position in bytes, a variation on the usage is to add the size of
the record to <i class="parameter"><tt>offset</tt></i> before
calling <tt class="function">fseek</tt>. Continuing the previous
example, this looks something like:</p>
<pre class="programlisting">
long int         offset2;
offset2 = offset + (long int) sizeof (struct ... );
fseek (stream, offset2, SEEK_SET);
</pre>
<p>and means that instead of re-reading the record whose position
was marked, the following record is read instead.</p>
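<p>A self-contained sketch of this variation follows. The record layout <tt>struct part</tt>, its contents, and the function name <tt>id_after_mark</tt> are invented for the illustration. The file is a binary scratch file from <tt class="function">tmpfile</tt>, so the offset arithmetic is in bytes.</p>

```c
#include <stdio.h>

struct part {               /* hypothetical record layout */
    int  id;
    char name[12];
};

/* Hypothetical helper: writes three records, marks the position of the
   first, then reads the record that follows the mark.
   Returns that record's id, or -1 on failure. */
int id_after_mark(void)
{
    static const struct part parts[3] = {
        { 100, "resistor" }, { 101, "capacitor" }, { 102, "diode" }
    };
    struct part found;
    FILE *stream = tmpfile();
    long offset, offset2;
    int id = -1;

    if (stream == NULL)
        return -1;

    if (fwrite(parts, sizeof parts[0], 3, stream) == 3
        && fseek(stream, 0L, SEEK_SET) == 0
        && (offset = ftell(stream)) != -1L) {        /* mark record 0 */
        offset2 = offset + (long) sizeof (struct part);
        if (fseek(stream, offset2, SEEK_SET) == 0    /* skip one record */
            && fread(&found, sizeof found, 1, stream) == 1)
            id = found.id;
    }
    fclose(stream);
    return id;
}
```

<p>The records are written and read back by the same program, so the in-memory layout of the structure matches what is on disk. That assumption would not necessarily hold for a file produced by another compiler or machine.</p>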
<p>One could also use the <tt class="function">offsetof</tt>
macro to position into the middle of a structure, but I think that
that is asking for trouble. There are too many difficult questions
to be answered as to how structures are held in memory, and what
they look like when written to disk for this to be reliable. It
would be far better to read the whole structure into a temporary
object, and then extract the wanted component.</p>
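<p>The safer approach of reading the whole structure and then picking out one member might look like the sketch below. The record type <tt>struct bin</tt> and the function <tt>read_stock</tt> are hypothetical, and the stream is assumed to be a binary file of such records written by the same program.</p>

```c
#include <stdio.h>

struct bin {                /* hypothetical stores record */
    char label[8];
    int  stock;
};

/* Hypothetical helper: reads record number rec_no (counting from 0) into a
   temporary object and copies out only the stock member.
   Returns 0 on success, -1 on failure. */
int read_stock(FILE *stream, long rec_no, int *stock)
{
    struct bin temp;        /* temporary object for the whole record */

    if (fseek(stream, rec_no * (long) sizeof temp, SEEK_SET) != 0)
        return -1;
    if (fread(&temp, sizeof temp, 1, stream) != 1)
        return -1;

    *stock = temp.stock;    /* extract the wanted component */
    return 0;
}
```

<p>The file is only ever positioned at the start of a record, so no knowledge of padding or member order inside the structure is needed.</p>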
<p>There is another point about <tt class="function">ftell</tt> and
<tt class="function">fseek</tt>. Because the file position is given
as a <tt class="type">signed long</tt> integer, it can never
reference a file longer than 2 GBytes on systems that use 32-bit
long integers.</p>
<p>This was not a problem when UNIX was written, for the largest
disk drives then had 300 Mbytes capacity and were as big as a
washing machine (and sounded like one as well). But now 36 GByte
disk drives are being advertised, whose capacity is well beyond the
indexing range of a 32-bit integer. Some computers now use 64-bit
<tt class="type">long</tt> integers which would cope with this, but
clearly some better method of specifying the file position is
needed if we are not to be caught out by this numbers game.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e213" id="d0e213"></a>The New File
Positioning Functions</h2>
</div>
<p>The C standard has thought of this point, and introduced the
second pair of file positioning functions. <tt class=
"function">fgetpos</tt> lets you find out where you are in a file,
and <tt class="function">fsetpos</tt> positions the file at a given
point. Their prototypes are:</p>
<pre class="programlisting">
int fgetpos (FILE * stream, fpos_t  * pos); 
int fsetpos (FILE * stream, const fpos_t  * pos);
</pre>
<p>This has introduced the object type <tt class="type">fpos_t</tt>
which holds a file position. This type is defined in stdio.h. The
standard says only that it is capable of recording all the
information needed to specify uniquely every position within a
file. It does not say how the compiler author must do this. And if
you try looking in stdio.h, you may find that the definition there
is designed to avoid telling you the actual definition. This
prevents you from getting too clever for your own good!</p>
<p>Both functions operate on the file opened on <i class=
"parameter"><tt>stream</tt></i>. They require in <i class=
"parameter"><tt>pos</tt></i> the address of the file position
variable to be used. Requiring a pointer to the file position
allows both these functions to have the same call sequence which
makes their use easier to remember.</p>
<p>The only way the standard lets you set a value of type
<tt class="type">fpos_t</tt> is by calling <tt class=
"function">fgetpos</tt>. You can only use it by giving it to
<tt class="function">fsetpos</tt> to set a file position. As with
<tt class="function">fseek</tt> and <tt class="function">ftell</tt>
the object pointed to by <i class="parameter"><tt>pos</tt></i> can
only be used to position the stream from which it was obtained.
Using a different stream opened on the same file is not good
enough.</p>
<p>You can, of course, also copy the file position to other
variables of type <tt class="type">fpos_t</tt>, or use it as the
parameter to one of your own functions.</p>
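<p>The following sketch shows a position being recorded, copied to a second <tt class="type">fpos_t</tt> variable, and passed to a user-written function, always on the stream from which it was obtained. The names <tt>read_line_at</tt> and <tt>reread_second_line</tt>, and the scratch file from <tt class="function">tmpfile</tt>, are assumptions for the illustration.</p>

```c
#include <stdio.h>

/* Hypothetical user function taking a file position as a parameter:
   repositions the stream and reads one line. Returns 0 or -1. */
static int read_line_at(FILE *stream, const fpos_t *where,
                        char *buf, int size)
{
    if (fsetpos(stream, where) != 0)
        return -1;
    return fgets(buf, size, stream) != NULL ? 0 : -1;
}

/* Hypothetical driver: marks the start of the second line, reads on to the
   end of the file, then goes back through a copy of the mark.
   Returns 0 with the second line in buf, or -1 on failure. */
int reread_second_line(char *buf, int size)
{
    FILE *stream = tmpfile();
    fpos_t mark, copy;
    int result = -1;

    if (stream == NULL)
        return -1;

    if (fputs("alpha\nbeta\ngamma\n", stream) != EOF) {
        rewind(stream);
        if (fgets(buf, size, stream) != NULL        /* skip "alpha"      */
            && fgetpos(stream, &mark) == 0) {       /* mark "beta"       */
            copy = mark;                            /* copy the position */
            if (fgets(buf, size, stream) != NULL    /* read "beta"       */
                && fgets(buf, size, stream) != NULL) /* read "gamma"     */
                result = read_line_at(stream, &copy, buf, size);
        }
    }
    fclose(stream);
    return result;
}
```

<p>Nothing is ever done to the <tt class="type">fpos_t</tt> values except obtaining, copying and handing them back, which is all the standard promises will work.</p>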
<p>What this means is that one thing you cannot do is mimic the
previous example by adding something to the object pointed to by
<i class="parameter"><tt>pos</tt></i> so as to access the file
somewhere other than where you recorded the position. This makes it
easier for the compiler author to implement <tt class=
"type">fpos_t</tt>. And he is able to change his implementation of
<tt class="type">fpos_t</tt> to cope with bigger disk drives as
they are introduced. The cost is the loss of a facility that is
potentially non-portable and difficult to implement.</p>
<p>Look at the following example of how to use these functions:</p>
<pre class="programlisting">
#include &lt;stdio.h&gt;

#ifndef FALSE
#define FALSE   0
#define TRUE    1
#endif

/* Program configuration. */
#define N_RECS          100
#define STR_LEN         36
#define ADDR_LINES      4

typedef char    string[STR_LEN];

typedef struct {
        string  name;
        string  addr_1[ADDR_LINES];
        string  telephone;
        } user_record_type;

int     main (int argc, char * argv[]) {
        FILE    *dbase;
        fpos_t  rec_posn[N_RECS];
        int     rec_no;
        user_record_type        user_details;

/* Open database file. */
        if ((dbase = fopen (argv[1], &quot;rb&quot;)) == NULL) return 1;  /* Program failed. */
/* Build an array of record positions. */
        for     (rec_no = 0; rec_no &lt; N_RECS; ++rec_no) {
                fgetpos (dbase, &amp;rec_posn[rec_no]);
                fread (&amp;user_details, sizeof (user_record_type), 1, dbase);
        }
/* Loop per record requested. */
        while (TRUE) {
                fputs (&quot;Record number? &quot;, stdout);
                fscanf (stdin, &quot;%i&quot;, &amp;rec_no);
                if ((rec_no &lt; 0) || (rec_no &gt;= N_RECS))
                        break; /* Usual loop exit. */
/* Position to selected record. */
                fsetpos (dbase, &amp;rec_posn[rec_no]);
                fread (&amp;user_details, sizeof (user_record_type), 1, dbase);
/* Print 'user_details'.        Omitted. */
        }
/* Close the database file. */
        fclose (dbase);

        return 0;       /* Successful completion. */
}
</pre>
<p>Again, I've omitted all the error checking and many of the
comments I would normally include.</p>
<p>The program opens the database whose name is taken from the
command line. It then builds a table giving the position of each
record in the file. Finally, it repeatedly asks for a record
number, finds that record in the file, reads it and prints it out.
The program stops when an out of range record number is given.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e283" id="d0e283"></a>My
Question</h2>
</div>
<p>The preceding program is somewhat cumbersome, as it has to read
the entire file each time the program is started, merely to find
the positions of every record in the file.</p>
<p>Which brings me to my question.</p>
<p>I would like to write out the file positions of all the records
in the file to avoid having to generate the file positions list
every time I start the program. But how can I do this? And I don't
want to be told to use:</p>
<pre class="programlisting">
fwrite (rec_posn, sizeof (fpos_t), N_RECS, dbase);
</pre>
<p>The standard says the file positions are valid only for the
stream on which they are requested. As soon as I stop this program
run, that stream disappears. When I open the file again, whether in
this program or another, I have got a different stream. And so my
previously saved list of file positions is invalid. How can I get
round this? How can I use the functions in the standard C library
to write an indexed sequential access file?</p>
<p>I suspect that the practical answer is that a useful operating
system would ensure that the file positions were the same,
irrespective of whether the file position was found, and used, on
the same, or separate, streams. And I suspect the file positions
would be the same irrespective of whether it was this program run,
or an earlier run of a different program, which generated the list
of file positions.</p>
<p>So I suspect I should just get on with writing my program. But
that requires ignoring the standard and being prepared to accept
the consequences.</p>
<p>Now, who can help me?</p>
<p class="c2"><span class="remark">I am not convinced that the
writer's expectations can be met. I think this is very much system
dependent. For example, efficient finding of data at system level
requires knowledge of the exact physical location on the
hard-drive. There are many things that would result in a file being
relocated (just think about what happens when you defrag a disk.)
The C programming language attempts to give as much freedom to
implementers as it can. If I am still editor I will be happy to
publish follow-ups to this article. As the material was provided in
hardcopy only (requiring me to scan in the material) because the
author does not have access to standard disk formats, all
communication, answers etc. will have to be through the pages of
ACCU publications.</span></p>
<p class="c2"><span class="remark">By the way I consider this
article would be more suited in future to the retargeted Overload
but the editorial call would be close.</span></p>
</div>
</p>
</div>
</channel>
</rss>
