Journal: CVu 116
Title: File Positioning
Author: Administrator
Date: 06 October 1999 13:15:33 +01:00
I want to talk about file positioning, and the ways you can do it in C. Having read this piece, I hope you will feel confident enough to read the descriptions of the functions covered in your compiler's Library Reference Manual to answer any remaining questions on their use. I will end with a question on file positioning that I hope someone can help me answer.
Now what is file positioning? The answer, simply, is that it determines which bit of a file will be read (or written) next.
Even if all you've ever done is written files using fputs, and read them using fgets, you've done file positioning. Why? Well, when one call to fputs completes, the next call will write to the next bit of the file. Similarly, with fgets, a second call will start reading where the first call finished. So both fputs and fgets position the file so that the next operation on it will start where the present operation completed.
This probably sounds obvious. After all, if you used fputs to write the lines of an important letter to a file, you would not be best pleased if fgets did not retrieve them for you in the same order.
But perhaps you do want to be able to read the lines of that letter in a random order. You might if you were writing a text editor.
More likely, the things in the file are not lines of text, but records of the resistors held in stores, or books in the library. You might occasionally want to ask about the resistors in the next tray on the shelf, or the book to the right of the one you just read. But it is more probable that you would ask about a resistor (or book) entirely unconnected with the one you have just looked at.
You need to be able to position the file so that the next access (call to fputs or fgets) processes the record you want. And, of course, C provides a number of ways of doing this.
Let me get one bit of administration out of the way here. To use any of the following functions, you must have first said:
#include <stdio.h>
to provide prototypes for (that is, tell the compiler about) these functions. As this header file also contains the prototypes for the file open, reading, writing and closing functions, I hope it has already been used.
Perhaps the most basic file positioning operation you can do is to go to the file's beginning. Then the first thing in the file is the next object to be read (or overwritten). The function to do this is called rewind, because going to the start is what happens if you rewind a file held on a magnetic tape. rewind has the prototype:
void rewind (FILE *stream);
It takes one parameter, stream, (whose value is returned by fopen when it opens the file) and repositions the associated file to its start. (Strictly, it sets the file position indicator for the stream pointed to by its parameter to the beginning of file, but I don't want to get into the question of streams today). rewind doesn't return anything, so you just have to hope it worked.
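As a minimal sketch of rewind in use, the function below writes two lines, rewinds, and reads the first line back. It assumes the stream was opened for update (for example by tmpfile or with mode "w+"); the function name and the sample text are purely illustrative.

```c
#include <assert.h>
#include <stdio.h>

/* Writes two lines to an update-mode stream, rewinds it, and reads the
   first line back into buf. Returns 0 on success, -1 on failure.
   The name and the sample text are illustrative only. */
int first_line_after_rewind(FILE *stream, char *buf, int size)
{
    assert(stream != NULL && buf != NULL && size > 0);

    if (fputs("Dear Sir,\n", stream) == EOF)
        return -1;
    if (fputs("Yours faithfully\n", stream) == EOF)
        return -1;

    /* Back to the start. rewind reports nothing, so the only check
       available here is whether the following read succeeds. */
    rewind(stream);

    if (fgets(buf, size, stream) == NULL)
        return -1;
    return 0;
}
```

After a successful call, buf holds the first line written, not the second, which shows that the next read began at the start of the file.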
Usually, of course, you want to go to a record in the middle of the file. You could do this by using rewind and then reading through the file to get to the record you actually want, but that could be very slow. That might in fact be the only way to find a record in a file, but let the operating system decide that for you.
C provides two methods of positioning a file, and two corresponding methods of finding out where you are within a file. The first approach is a modification of the original method used by UNIX, the system on which C was developed; the second approach tries to avoid some problems of the first.
To understand the first approach, you have to know a little about files in UNIX, which is that whatever the file type, and whatever the file contents, it can be read a byte at a time. Text files use the character '\n' to mark the end of line and look exactly like binary files. And the UNIX file system, being simple, lets you position directly to any byte in the file. The demonstration that this approach could work efficiently is regarded as one of the major early successes for UNIX.
The C library provides the ftell function, based on the function of the same name in UNIX, with the prototype:
long int ftell (FILE *stream);
This inspects the file pointed to by the parameter stream and either returns -1L if something has gone wrong, or some other value which is the file position.
On a UNIX system, that position is always the number of bytes from the beginning of the file.
For non-UNIX systems the same holds only for binary files. For non-binary files (i.e., text files) the returned value exactly specifies where you are in the file, but it may not be a byte count because non-UNIX systems represent the lines in a text file in many different ways. But don't worry. Ensuring that you, as a C programmer, see in your program what the ISO standard says, despite what the operating system does, is the potentially complex task of the run-time library that came with your compiler.
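A small sketch of the binary case: the function below writes some text and then asks ftell where the stream now stands. It assumes a stream opened in binary mode (tmpfile provides one); the function name is made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Writes text at the current position and returns the position that
   ftell reports afterwards, or -1L if either call fails.
   The stream is assumed to be open in binary mode. */
long int position_after_write(FILE *stream, const char *text)
{
    assert(stream != NULL && text != NULL);

    if (fputs(text, stream) == EOF)
        return -1L;
    return ftell(stream);
}
```

On a fresh binary stream, writing "abc" should give a position of 3, and writing a further three characters should give 6, since the value counts bytes from the start of the file. On a text stream the returned numbers need not be byte counts.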
To match ftell, there is fseek which moves to a specified place in the file. The prototype is:
int fseek (FILE *stream, long int offset, int whence);
stream says which file is to be repositioned. The parameter whence says how the offset is to be interpreted. For now, use only the macro SEEK_SET for whence, which says that offset has been obtained from an earlier call to ftell.
offset must be used by fseek on the same stream as was used with ftell to find it. Closing a file with fclose causes the stream to disappear and opening the file again creates a new stream to which the old values of offset cannot be applied. It is certainly not permitted to open the same file on two different streams by calling fopen twice and then use one of the streams to obtain file positions from ftell which are then given to fseek in an attempt to position the other stream.
The sort of way these two calls might be used, in outline, is:
#include <stdio.h>

FILE *stream;
long int offset;

/* Open the file. */
stream = fopen (<whatever>);
/* Do something with the file. */
fgets (s, n, stream);
/* Find out where we are in the file. */
offset = ftell (stream);
/* Do some more work on the file. */
fgets (s, n, stream);
/* Return to the marked point in the file. */
fseek (stream, offset, SEEK_SET);
/* And re-read the file from that point. */
fgets (s, n, stream);
/* Finally, close the stream. */
fclose (stream);
(I leave it to you to complete the example and insert checking of all the error codes.)
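For readers who would like to compare notes, here is one way the outline might be completed, written as a function so that opening and closing the file stay with the caller. The function name, the two buffers and the return convention are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Skips one line, marks the position, reads the next line into first,
   returns to the mark and reads the same line again into again.
   Returns 0 on success, -1 on any failure. */
int reread_from_mark(FILE *stream, char *first, char *again, int size)
{
    long int offset;

    assert(stream != NULL && first != NULL && again != NULL && size > 0);

    /* Do something with the file. */
    if (fgets(first, size, stream) == NULL)
        return -1;

    /* Find out where we are in the file; -1L signals an error. */
    offset = ftell(stream);
    if (offset == -1L)
        return -1;

    /* Do some more work on the file. */
    if (fgets(first, size, stream) == NULL)
        return -1;

    /* Return to the marked point; fseek returns nonzero on failure. */
    if (fseek(stream, offset, SEEK_SET) != 0)
        return -1;

    /* And re-read the file from that point. */
    if (fgets(again, size, stream) == NULL)
        return -1;

    return 0;
}
```

If the file holds at least two lines, first and again end up containing the same text, namely the second line.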
Because we know, for binary files at least, that the value of offset is actually the file position in bytes, a variation on the usage is to add the size of the record to offset before calling fseek. Continuing the previous example, this looks something like:
long int offset2;
offset2 = offset + (long int) sizeof (struct ... );
fseek (stream, offset2, SEEK_SET);
and means that instead of re-reading the record whose position was marked, the following record is read instead.
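The same arithmetic extends to skipping several records at once. The sketch below assumes a binary stream of fixed-size records; struct stock_record and the function name are hypothetical, chosen only to make the example complete.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical fixed-size record, for illustration only. */
struct stock_record {
    int part_no;
    int quantity;
};

/* Reads the record lying n records beyond the byte position base.
   base is assumed to be an ftell value taken at the start of a record
   on a binary stream. Returns 0 on success, -1 on failure. */
int read_nth_record(FILE *stream, long int base, long int n,
                    struct stock_record *out)
{
    long int offset;

    assert(stream != NULL && out != NULL && base >= 0 && n >= 0);

    offset = base + n * (long int) sizeof (struct stock_record);
    if (fseek(stream, offset, SEEK_SET) != 0)
        return -1;
    if (fread(out, sizeof *out, 1, stream) != 1)
        return -1;
    return 0;
}
```

With base set to 0 and n set to 3, this reads the fourth record in the file. The trick relies on the offset being a byte count, so it should not be expected to work on text streams.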
One could also use the offsetof macro (from <stddef.h>) to position into the middle of a structure, but I think that that is asking for trouble. There are too many difficult questions to be answered about how structures are held in memory, and what they look like when written to disk, for this to be reliable. It would be far better to read the whole structure into a temporary object, and then extract the wanted component.
There is another point about ftell and fseek. Because the file position is given as a signed long integer, it can never reference a file longer than 2 GBytes on systems that use 32 bit long integers.
This was not a problem when UNIX was written, for the largest disk drives then had 300 Mbytes capacity and were as big as a washing machine (and sounded like one as well). But now 36 GByte disk drives are being advertised, whose capacity is well beyond the indexing range of a 32-bit integer. Some computers now use 64-bit long integers which would cope with this, but clearly some better method of specifying the file position is needed if we are not to be caught out by this numbers game.
The C standard has thought of this point, and introduced the second pair of file positioning functions. fgetpos lets you find out where you are in a file, and fsetpos positions the file at a given point. Their prototypes are:
int fgetpos (FILE * stream, fpos_t * pos);
int fsetpos (FILE * stream, const fpos_t * pos);
This has introduced the object type fpos_t which holds a file position. This type is defined in stdio.h. The standard says only that it is capable of recording all the information needed to specify uniquely every position within a file. It does not say how the compiler author must do this. And if you try looking in stdio.h, you may find that the definition there is designed to avoid telling you the actual definition. This prevents you from getting too clever for your own good!
Both functions operate on the file opened on stream. They require in pos the address of the file position variable to be used. Requiring a pointer to the file position allows both these functions to have the same call sequence which makes their use easier to remember.
The only way the standard lets you set a value of type fpos_t is by calling fgetpos. You can only use it by giving it to fsetpos to set a file position. As with fseek and ftell the object pointed to by pos can only be used to position the stream from which it was obtained. Using a different stream opened on the same file is not good enough.
You can, of course, also copy the file position to other variables of type fpos_t, or use it as the parameter to one of your own functions.
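A short sketch of the pair in use follows. It marks a position, copies the mark to a second fpos_t variable by plain assignment, and then uses the copy to return to the marked point. The function name and buffers are assumptions of this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Marks the current position with fgetpos, reads a line into first,
   then returns to a copy of the mark with fsetpos and reads the same
   line into again. Returns 0 on success, -1 on failure. */
int reread_with_fpos(FILE *stream, char *first, char *again, int size)
{
    fpos_t mark;
    fpos_t copy;

    assert(stream != NULL && first != NULL && again != NULL && size > 0);

    /* Both functions return nonzero on failure. */
    if (fgetpos(stream, &mark) != 0)
        return -1;

    /* Copying the position is fine; doing arithmetic on it is not. */
    copy = mark;

    if (fgets(first, size, stream) == NULL)
        return -1;

    if (fsetpos(stream, &copy) != 0)
        return -1;

    if (fgets(again, size, stream) == NULL)
        return -1;

    return 0;
}
```

Unlike rewind, both fgetpos and fsetpos report failure through their return value, so there is no need just to hope they worked.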
What this means is that one thing you cannot do is mimic the earlier ftell example by adding something to the object pointed to by pos so as to access the file somewhere other than where you recorded the position. This makes it easier for the compiler author to implement fpos_t, and leaves him free to change his implementation of fpos_t to cope with bigger disk drives as they are introduced. The cost is the loss of a facility that is potentially non-portable and difficult to implement.
Look at the following example of how to use these functions:
#include <stdio.h>

#ifndef FALSE
#define FALSE 0
#define TRUE 1
#endif

/* Program configuration. */
#define N_RECS 100
#define STR_LEN 36
#define ADDR_LINES 4

typedef char string[STR_LEN];

typedef struct
{
    string name;
    string addr_1[ADDR_LINES];
    string telephone;
} user_record_type;

int main (int argc, char * argv[])
{
    FILE *dbase;
    fpos_t rec_posn[N_RECS];
    int rec_no;
    user_record_type user_details;

    /* Open database file. */
    if ((dbase = fopen (argv[1], "rb")) == NULL)
        return 1; /* Program failed. */

    /* Build an array of record positions. */
    for (rec_no = 0; rec_no < N_RECS; ++rec_no)
    {
        fgetpos (dbase, &rec_posn[rec_no]);
        fread (&user_details, sizeof (user_record_type), 1, dbase);
    }

    /* Loop per record requested. */
    while (TRUE)
    {
        fputs ("Record number? ", stdout);
        fscanf (stdin, "%i", &rec_no);
        if ((rec_no < 0) || (rec_no >= N_RECS))
            break; /* Usual loop exit. */

        /* Position to selected record. */
        fsetpos (dbase, &rec_posn[rec_no]);
        fread (&user_details, sizeof (user_record_type), 1, dbase);
        /* Print 'user_details'. Omitted. */
    }

    /* Close the database file. */
    fclose (dbase);
    return 0; /* Successful completion. */
}
Again, I've omitted all the error checking and many of the comments I would normally include.
The program opens the database whose name is taken from the command line. It then builds a table giving the position of each record in the file. Finally, it repeatedly asks for a record number, finds that record in the file, reads it and prints it out. The program stops when an out of range record number is given.
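To give an idea of what the omitted checking might look like, here is a sketch of the two central steps as functions that test every return value. The cut-down record type and both function names are hypothetical; the index builder also stops at end of file rather than assuming the file holds a fixed number of records.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical cut-down record, for illustration only. */
typedef struct {
    char name[36];
    char telephone[36];
} user_record;

/* Records the position of each record in dbase, up to max_recs of them.
   Stops at end of file or on the first error.
   Returns the number of positions stored. */
size_t build_index(FILE *dbase, fpos_t *positions, size_t max_recs)
{
    size_t count = 0;
    user_record scratch;
    fpos_t here;

    assert(dbase != NULL && positions != NULL);

    while (count < max_recs) {
        if (fgetpos(dbase, &here) != 0)
            break;
        if (fread(&scratch, sizeof scratch, 1, dbase) != 1)
            break;              /* End of file or read error. */
        positions[count++] = here;
    }
    return count;
}

/* Positions to record rec_no and reads it into out.
   Returns 0 on success, -1 if rec_no is out of range or a call fails. */
int fetch_record(FILE *dbase, const fpos_t *positions, size_t count,
                 size_t rec_no, user_record *out)
{
    assert(dbase != NULL && positions != NULL && out != NULL);

    if (rec_no >= count)
        return -1;
    if (fsetpos(dbase, &positions[rec_no]) != 0)
        return -1;
    return (fread(out, sizeof *out, 1, dbase) == 1) ? 0 : -1;
}
```

A caller would open the file in binary mode, call build_index once, and then call fetch_record for each record number the user asks for.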
The preceding program is somewhat cumbersome, as it has to read the entire file each time it is started, merely to find the position of every record in the file.
Which brings me to my question.
I would like to write out the file positions of all the records in the file to avoid having to generate the file positions list every time I start the program. But how can I do this? And I don't want to be told to use:
fwrite (rec_posn, sizeof (fpos_t), N_RECS, dbase);
The standard says the file positions are valid only for the stream on which they are requested. As soon as I stop this program run, that stream disappears. When I open the file again, whether in this program or another, I have got a different stream. And so my previously saved list of file positions is invalid. How can I get round this? How can I use the functions in the standard C library to write an indexed sequential access file?
I suspect that the practical answer is that a useful operating system would ensure that the file positions were the same, irrespective of whether the file position was found, and used, on the same, or separate, streams. And I suspect the file positions would be the same irrespective of whether it was this program run, or an earlier run of a different program, which generated the list of file positions.
So I suspect I should just get on with writing my program. But that requires ignoring the standard and being prepared to accept the consequences.
Now, who can help me?
I am not convinced that the writer's expectations can be met. I think this is very much system dependent. For example, efficient finding of data at system level requires knowledge of the exact physical location on the hard drive. There are many things that would result in a file being relocated (just think about what happens when you defrag a disk). The C programming language attempts to give as much freedom to implementers as it can. If I am still editor I will be happy to publish follow-ups to this article. As the material was provided in hardcopy only (requiring me to scan it in) because the author does not have access to standard disk formats, all communication, answers etc. will have to be through the pages of ACCU publications.
By the way, I consider this article would be better suited in future to the retargeted Overload, but the editorial call would be close.