Programming Topics + CVu Journal Vol 8, #1 - Feb 1996

Title: String Theory

Author: Administrator

Date: Sat, 03 February 1996 13:15:26 +00:00

Strings in C are at once both a major convenience and one of the greatest sources of misunderstanding, and hence of bugs. C is fairly free and easy with a number of its features, but there can be a price to pay for living in a free state.

In this series I hope to introduce the fundamentals of C strings, focus on some of the pitfalls and answer a few questions.

Begin at the beginning

So what is a string in C? It is a sequence of non-null characters terminated by a null character. Although it is implemented as an array, a random access data structure, for many purposes we must treat a string as a sequential, delimited data structure. For example, to determine the length of a string we start at the zeroth character and move forward until we hit the null.

So if, in principle, a string is an array, albeit one with a special interpretation, we should be able to create one from scratch and print it out:

#include <stdio.h>
int main(void)
{
  char message[6] = { 'h', 'e', 'l', 'l', 'o', '\0' };
  puts(message);
  return 0;
}

The null delimiter at the end means that we must always remember that the space occupied by a string is one more than its effective length. Forgetting this provides one of the most fruitful sources of errors in C.

Arrays can be declared in C without specifying their length. In this case an initialiser must be present so the compiler can determine the length for itself. The following is equivalent to the code above:

#include <stdio.h>
int main(void) 
{
  char message[] = { 'h', 'e', 'l', 'l', 'o', '\0' };
  puts(message);
  return 0;
}

Letting the compiler do the counting is less error prone and eases change:

#include <stdio.h>
int main(void)
{
  char message[] =
  {
    'h', 'e', 'l', 'l', 'o', ' ',
    'w', 'o', 'r', 'l', 'd', '\0'
  };
  puts(message);
  return 0;
}

Comfortably null

Something worth clearing up before we go any further is the concept of nullness. In the context I have discussed so far, the null character is the character with an integer value of zero. In C this is true regardless of the character set used. The standard way of writing this is '\0', as above. I could equally well have written an integer 0, but from the reader's point of view this is not as clear.

In text you will often see the null character referred to as NUL. This is its ASCII name, and should not be confused with NULL, which is the standard C macro for a null pointer constant. Note also that NULL is not itself the null pointer: it is a constant that is converted to a null pointer when used in a pointer context. You should never, ever write:

char wrong_headed = NULL;

Although it may compile on some systems, this is coincidence and is not guaranteed. NULL is intended to be used as a pointer, so do not assume anything else about it: it may be defined as 0, 0L or, in C but not C++, (void*)0.

In case you are tempted to write the following:

#define NUL '\0'

Don't bother: it buys you nothing. It is more likely to confuse others than help them. Using the literal '\0' is far clearer as it is unambiguous, cannot be redefined, and says the same thing to all programmers.

Taking it literally

Clearly, initialising arrays a character at a time is tedious. Recognising the frequent use of strings, C provides a convenient shorthand for initialising an array of char as a string:

char message[] = "hello world";

We are not assigning the literal on the right hand side to the array on the left: the literal in this context is simply a shorthand for the aggregate initialiser we were using before. Note also that initialising in this way guarantees you a null terminating character: it's implied with the double quotes; you get it for free. Occasionally you see people write rubbish like:

char message[] = "hello world\0";

Apparently this is done "just to be safe". This is nonsense, and confers no more safety on the code than putting a St Christopher's in the compiler's original packaging box. Although there are certainly uses for double null terminating a string, the code above advertises a lack of C knowledge rather than a clever data structure.

Next time

There are many topics to cover: arrays versus pointers, string lifetime, const-ness, comparison, manipulation, dynamic allocation, and a number of popular string myths, to name but a few. I won't commit myself just yet as to what the next article will be on: I will await any comments, questions or requests you might have (kevlin@two-sdg.demon.co.uk). In the absence of any feedback, whim will dictate the content of the next "String Theory".
