Title: A Critique of Some Code Revisited

Author:

Date: 03 November 1998 13:15:28 +00:00 or Tue, 03 November 1998 13:15:28 +00:00

Summary:

Body:

Dear Francis & Spencer,

Please feel free to forward this e-mail to the Harpist & do anything you like with the sources provided - including throwing it away, if you wish!

extern char *strtok(char * s1, const char * s2);
        strcpy(inifilename, chr_ptr);
        if (chr_ptr = strtok(inifilename, ".")) {
          chr_ptr = chr_ptr + (strlen(chr_ptr)+1);
        }
        strcat(inifilename, ".ini");

The Harpist is incorrect to say that strtok returns the address of the character after the dot if there is one else it returns NULL. The only circumstance (see addendum/corrigendum at end) in which strtok can return NULL when s1 is non-NULL, is if s1 contains only characters from s2 (which includes the case where s1 is a zero-length string). Therefore, inifilename would have to be a string consisting of just 0 or more "." characters for a NULL return.

strtok is a complex function that has potentially undesirable output unless you know precisely what it does, including behaviour regarding such things as consecutive separator characters in the source string and which characters it overwrites in the source string. strtok also has the disadvantage of requiring static storage to remember pointers for future calls which can cause threading problems, so my personal preference is to avoid strtok and use the other related library calls directly: strspn, strcspn and strpbrk - which many people I've spoken to don't appear to have heard of.

I've thrown away the entire function as it stands and replaced it with the following collection of functions that should do what the programmer originally wanted except for the method of storing the contents of the file in memory. I don't like strupr as it infringes on the C library's namespace, so I've changed the name used. This set of functions should be more portable, with the filename-style specific code containing in just the single set of declarations at the top of validate_ini_filename(). As the Harpist found too, the format of the file is in indeterminate - some of the comment in the original source seems to apply to the file and some to the memory buffer, and then there's the magic lAsTeNtRy which is completely undocumented.

I realise that Spencer said that his manager would rewrite everything totally and not give understandable explanations, but what I hope that my version of validate_ini_filename() demonstrates is that code can be written in such a way that it follows a natural language version of the algorithm directly. For example, the statements:

  spn = strcspn(ptr, extn_separator_str);
  if (spn > max_baselen) return name_too_long;

would match with "if the base length is longer than maximum allowed, then it should be rejected with a 'too long' error.". Hopefully, it should be obvious that the code matches that description rather better than:

  strtok(ptr, ".");
  if (strlen(ptr) > 8) return name_too_long;

(although this does rely on the person reading the code knowing what the strcspn function does!)

The removal of magic numbers and of complex pointer arithmetic should make it easier to read through the functions and understand what each one is doing, even in the absence of many of the comments.

I've added a short main() function to drive the rest of the code and dump out the table at the end of the run.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stddef.h>
#include <ctype.h>

typedef struct lookup_table_entry {
  struct lookup_table_entry *next;
  const char *value;
  char key[1];
} lookup_table_entry;

typedef struct lookup_table {
  lookup_table_entry *head;
} lookup_table;

enum errors {
  success,
  name_too_long,
  name_missing,
  file_missing,
  insufficient_RAM,
  line_too_long
};

lookup_table *new_lookup_table(void) {
  lookup_table *t = malloc(sizeof(lookup_table));
  if (t != NULL) t->head = NULL;
  return t;
}

enum errors add_lookup_pair(lookup_table *t, 
                  char *key, char *value) {
  const size_t key_len = strlen(key) + 1;
  const size_t value_len = strlen(value) + 1;
  const size_t te_len = 
            offsetof(lookup_table_entry, key);
  lookup_table_entry *te = 
        malloc(te_len + key_len + value_len);

  if (te == NULL) return insufficient_RAM;
  te->next = NULL;
  te->value = te->key + key_len;
  memcpy(te->key, key, key_len);
  memcpy(te->key + key_len, value, value_len);
  if (t->head == NULL) t->head = te;
  else {
    lookup_table_entry *te_srch = t->head;
    while (te_srch->next != NULL)
                   te_srch = te_srch->next;
    te_srch->next = te;
  }
  return success;
}

static void strupr(char *ptr) {
  for (;*ptr; ++ptr) if (islower(*ptr)) *ptr = toupper(*ptr);
}

/* Caller must always free the value placed into 
     *filename.  This code  guarantees that the value
      will always be valid to pass to free()
 */
enum errors validate_ini_filename
                     (const char *arg, char **filename) {
  /* Filename structure definitions and limits */
  const size_t max_baselen       = 8;
  const size_t max_extnlen       = 3;
  const char   dir_sep           = '\\';
  const char   extn_sep          = '.';
  static const char extn_separator_str = ".";
  static const char dflt_ext[]   = ".ini";

  size_t spn;
  char *buffer, *ptr;

  /* Allocate enough memory, set return value to 
        point to result  or NULL so that *filename is 
       always valid for safety.  Ensure  that there is 
       room for the extension we may want to add.
   */
  *filename = buffer = 
        malloc(strlen(arg) + sizeof(dflt_ext));
  if (buffer == NULL) return insufficient_RAM;
  strcpy(buffer, arg);

  /* Locate leafname */
  ptr = strrchr(buffer, dir_sep);
  if (ptr != NULL) ++ptr; else ptr = buffer;

  spn = strcspn(ptr, extn_separator_str);  
/* find length of base name */
  /* Check part before the extension separator - is it too long? */
  if (spn > max_baselen) return name_too_long;
  /* Was it zero length? eg.  ".ini" */
  if (spn == 0) return name_missing;

  if (ptr[spn] != extn_sep) {
    /* There wasn't an extension, so add one */
    strcpy(ptr + spn, dflt_ext);
  }
  else if (strlen(ptr + spn + 1) > max_extnlen) {
    /* There was an extension already but was too long */
    return name_too_long;
  }
  return success;
}

enum errors read_ini_file(FILE *f,
               lookup_table *table){
  char buf[80];
  enum errors result = success;

  while (result == success 
    && fgets(buf, sizeof(buf), f)) {
    char *statement, *value;
    /* Test for truncated line */
    char *ptr = strpbrk(buf, "\n\r");
    if (ptr == NULL && !feof(f)) {
      result = line_too_long;
      continue;
    }
    /* Strip unnecessary stuff from end of value.
          Search for  any line or value terminators. */
    ptr = strpbrk(buf, "\n\r#;");
    if (ptr != NULL) *ptr = '\0';
    /* delimit the statement */
    statement = strtok(buf, "=, ");
    if (statement == NULL 
        || !isalpha(*statement)) continue;
    strupr(statement); 
              /* XXX: non ANSI str[a-z].*  */
    /* Find value - if present */
    value = strtok(NULL, "");
    if (value != NULL) {
      char *ptr;
      /* Skip leading whitespace */
      value += strspn(value, "=, ");
      /* Chop off trailing whitespace */
      ptr=value;
      while (*ptr  && !(isspace(*ptr)
             ||iscntrl(*ptr)))  ++ptr;
      *ptr = '\0';
      result = add_lookup_pair(table, statement, value);
    }
  }
  return result;
}


int ini_proc(const char *arg,
                 lookup_table *table) {
  enum errors result = name_missing;

  if (arg != NULL) {
    char *filename;
    result = validate_ini_filename(arg, &filename);
    if (result == success) {
      FILE *inifile = fopen(filename, "r");
      if (inifile != NULL) {
        result = read_ini_file(inifile, table);
        fclose(inifile);
      }
      else {
        result = file_missing;
      }
    }
    free(filename);
  }

  return result;
}

int main(int argc, char *argv[]) {
  int result = insufficient_RAM;
  lookup_table *table = new_lookup_table();
  if (table != NULL) {
    lookup_table_entry *lte;
    result = ini_proc(argv[1], table);
    for (lte=table->head; lte; lte=lte->next) {
      printf("%s --> %s\n"
                  , lte->key, lte->value);
    }
  }
  return result;
}

Addendum & Corrigendum

At first glance you are both wrong ☺ The Harpist's assertion is incorrect as is your assertion that strtok can only return a null pointer if s1 is non-null. By my reading it can also do so if s2 points to the end of the string (I think that means that it points one beyond the end)

Did I really say that? ☺ What I meant was:

strtok(s1,s2) can return NULL under two circumstances:

s1 != NULL and s1 contains only 0 or more characters from s2.
s1 == NULL and the remainder of the tokenised string is only 0 or more chars from s2.

If I understand it, the first call (with a non-null pointer) sets it up and subsequent calls return tokens until there are no more at which stage it returns a null pointer.

The initial call also returns the first token, if there are any. I think this just underlines that complexity of strtok! This, together with the static storage and the obliteration of the source string, is the reason why I prefer to use strspn, strcspn & strpbrk and do any overwriting myself. After all, all the implementation of strtok on my machine does is call the first two with bits of glue to remember points and write out the zero bytes.

The other issue which I found was not addressed to my satisfaction by many of the books I have read is what happens when you change the separator characters. For example, the sequence:

strtok("f1:f2::,f3,", ":");
strtok(NULL, ":");
strtok(NULL, ",");
strtok(NULL, ",");
strtok(NULL, ",");

returns what sequence of results? Certainly at first, I was unsure what I would get - but it should return "f1", "f2", ":", "f3", NULL.

Notes:

More fields may be available via dynamicdata ..

Journal Articles

Title: A Critique of Some Code Revisited

Addendum & Corrigendum