Title: ISBNs Revisited

Author:

Date: 04 October 1999 13:15:33 +01:00 or Mon, 04 October 1999 13:15:33 +01:00

Summary:

Body:

This 'competition proved to be the most popular that I have ever run. It seems to have captured the imaginations of a lot of you. It encouraged several to dip their toes in the water. But it has also presented me with a problem in that I cannot find space for all your submissions. This is sad because I like to publish what you have made the effort to write. Every entry had merit and the approaches were varied. I think there is still room for something better. I shall be tackling this problem in my workshop on the Friday afternoon of JACC. The final result from that session will go up on our web site. To squeeze as much in as possible I have taken some liberties with the code layout. If you hate that, please put up with it.

Anything that was submitted will be on this issue's disk. Incidentally this will be the last regular disk with issues of C Vu. In future Code and other relevant material will be placed on our web site. I am not sure whether this should be in a members only area or not. What do you think?

Another issue arises out of dropping the disk and that is how we should cater to members without Internet access. I hope that a few of you will volunteer to provide such material to Internet-less members local to you.

Finally, I hope you will all agree that David Beard's entry is a deserving winner. If David contacts me I will see what would be most suitable as a prize book for him.

From the Harpist

Francis consulted me about the various entries to this competition. One thing struck me as curious. Every case processes the putative ISBN left to right. Now given the existence of SBNs (9 characters) and ISSNs (8 characters) this seems to be lacking in thought. In addition, no-one seems to have computed the last symbol from the rest. As the last symbol is a special case it would seem to make sense to treat it differently. Actually if you process right to left, you might as well deal with the exception right away. But it is always worth thinking about computing a check digit and then comparing. Note these comments apply generally though they maybe less suited to some languages than to others.

Consider the following function:

int checksns(const char source[]){
  int end = strlen(source);
  int multplier = 2;    /* start with penultimate character */
  int total = 0;
  if(source[end-1] == 'X' || 
        source[end-1] == 'x') total =10;
  else total = source[end-1]-'0';
  int i;
  for(i = end-multiplier; i>=0;++i)) {
     total += (source[i] - '0') * multiplier;
    ++multiplier;
  }
  return (total % 11);
}

This assumes that the relevant code (ISBN, SBN, ISSN) has already been compacted by stripping out dashes, spaces etc. but it does not require that you know which of the codes you are processing.`

An awk solution from Peter S Tillier `<petert@eq1152.demon.co.uk>`

This is in response to your request for other language solutions to the ISBN validation problem I have often thought that the programmers that I meet don't think about using different languages to solve a particular problem. For example, awk and perl are very useful for the purpose of validation of data and, depending on the type of data, for reformatting data.

This code for ISBN.awk took about 20 minutes to type initially - then I added comments.

# ISBN.awk - check a file of ISBN/SBN nos for validity (PST 19990711)
#            syntax: awk -f ISBN.awk [[filename] ...]
# An ISBN is formed of 9 digits followed by one digit  or an upper- or lower-case letter "X" representing ten.
# The ISBN is valid if:    mod(ISBN[1]*10 + ISBN[2]*9 .. +ISBN[10]*1,11) == 0
# When used the ten characters of the ISBN are separated  in different ways by either hyphens  or spaces .
# An SBN is formed of nine characters and can be coverted  to an ISBN by adding an initial zero, "0"
# Rule 1 - save original value
    {
    ISBN = $0; isSBN = 0
}
# Rule 2 - replace "-" and spaces by nulls.
/[ -]/ {
    gsub("[ -]","")
}
# Rule 3 - check length & determine ISBN or SBN
    {
    if ((length > 10) || (length < 9)) {
        print ISBN " is invalid. The length is wrong."
        next
    }
    else if (length == 9) { # it's an SBN, so add leading "0"
        $0 = "0" $0
        isSBN = 1
    }
}
# Rule 4 - check for nine digits plus digit, "x" or "X".     NB for POSIX awk the pattern [0-9]{9}[0-9xX]
#          would be more explicit, but unnecessary as  we  know from Rule 3 that we have exactly 10 chars.
/^[0-9]+[0-9xX]$/ {
    total = 0
    for( i = 1; i < 10; i++ )
        total += (11 - i)*substr($0,i,1)
    if ((substr($0,10,1) == "x") || (substr($0,10,1) == "X"))
        total += 10
    else
        total += substr($0,10,1)
    if ((total % 11) != 0) {    # not a valid ISBN or SBN
        print ISBN " is invalid.  The check-sum is wrong."
        next
    }
    if (isSBN)
        print ISBN " is a valid SBN."
    else
        print ISBN " is a valid ISBN."
    next
}
# Rule 5 - invalid by virtue of incorrect formation
    {
    print ISBN " is improperly formed."
}

The ISBN.awk program puts all its validation messages out on stdout. In a "real life" situation I would direct the error outputs to stderr and the valid outputs to stdout using redirection on the appropriate print statements.

Peter also provided a solution for ISSN's which is on this issue's disk.

From James Holland

I have taken the fragments of the student's code and assembled them into a working program. I have tried to keep as much of the original code as possible and to keep modifications to a minimum. I will not comment on the more minor changes I have made, instead I will outline what I consider to be the more important issues.

There is a problem with the first if-statement of the validate() function. As Francis pointed out, the function does not exclude the 'X' from the first nine characters. I have attempted to alter the student's code to remedy this. I have introduced the idea of a digit counter that keeps track of the number of characters accepted. The 'X' is only permitted when the digit counter has a value of 9 (i.e. the tenth character). I have modified the if-statement accordingly as shown in file modified\isbn.c. This part of the program is becoming rather cumbersome and difficult to follow. When this happens it is worth investigating to see if there is a simpler way to tackle the problem. Francis provided a clue in has original commentary. I will, however, persevere for the time being.

Another problem with the code is that any dashes ('-' characters) the user cares to enter are stored in an array. The problem is that we do not know how many dashes the user will enter. The size of the input buffer will have to be finite and so a limit will have to be placed on the number of dashes accepted. I have increased the size of the input buffer to something quite large in the hope that the user will never enter such a large number of dashes. If the user does enter a large number the validate() function will terminate by means of the outer for-statement. The loop will terminate when the number of characters in the buffer is one less than the size of the buffer. The last element of the array has to be reserved for the null terminator.

The program is becoming very messy. There must be a simpler way. The method I have adopted (in my version of the program) is not to store the dashes at all. They are not wanted. The ISBN can be written back to the user without the dashes. This will also remove the need for a function to strip out the dashes. While we are on the subject, there is a slight deficiency in the students dashes() function.

The dashes() function will only remove every other dash from a contiguous block of dashes. The loop must not increment if a dash is detected as it may be replaced by another dash during the string copy. I have removed the incrementing of the loop variable from the for-statement and made it part of the if-statement. I have also made the loop exit when the null terminator character is encountered. This is a more natural way of handling strings.

As far as I can make out the function for checking the numbers for ISBN compliance, isbntest(), works perfectly. It is, however, somewhat long winded. I suggest a shorter method in shown in new_c\isbn.c.

When comparing ASCII characters with constants it is better to express the constants in the form of characters, i.e., '0' and '9' rather than their ASCII values. This will make the code easier to read. There is a standard macro, isdigit() that can de used to detect an ASCII digit character. This would be simpler and less error prone than testing for a range of values.

After returning from isbntest(), the student's code obtains the remainder after dividing the returned value by ISBN (14). This is not required. The remainder after division by 11 has already been found. The line result = (number % ISBN); is not required. The variable in the following if-statement should then be number as opposed to result.

I now offer my solution to the problem of verifying ISBNs. The problem is broken down into functional blocks, each represented by a separate function. Prototypes of these functions are provided. Function main() is pretty simple. The program flow is controlled by the do-while loops. All the hard work is done in the functions.

The function get_ISBN() is responsible for obtaining a potential ISBN from the user. To start, it displays a message explaining what the user is expected to do. The business of accepting the various parts of the ISBN is divided into three stages and controlled by three do-while loops. The first loop is used for the first nine characters of the ISBN and accepts only the digits '0' to '9'. Notice the use of the standard macro isdigit() for this purpose.

When the ninth character has been entered the second do-while loop takes over. This loop is practically identical to the first except it accepts the 'x' along with the digits '0' to '9'. As soon as the last character is entered the program flow drops into the third and last do-while loop.

The third loop simply waits until the user finishes the ISBN by entering a carriage return. Notice that all three loops accept and display dashes and spaces but do not transfer them to the digits buffer.

It could be argued that the common parts of these loops could be combined in some way. This may be possible but to keep everything simple and obvious has advantages for reliability and maintenance.

The program now returns to main() where user_confirm() executes. This displays the ISBN and asks the user if the number is correct and awaits a reply. The function get_reply() is called upon to ensure the user answers appropriately.

The program now executes verify_ISBN() which performs the task of checking ISBN conformance. Despite the apparent difficulty of the calculations this routine turns out to be quite simple. The function first checks to see if the last character is an 'x'. If it is an 'x' it is replaced by a character that is numerically one greater than '9'. What character this value represents is not important. The next part of the function scans the elements of the digits array. For each character it subtracts the value '0'. This leaves the numerical value of the character. This value is then multiplied by its positional value. The result is accumulated in the check_sum variable. The final task of verify_ISBN() is to determine if there is any remainder after division by eleven.

The program returns to the display_result() function. The user is informed as to the outcome of verify_ISBN().

The outer do-while loop is controlled by the function another_go(). A prompt is displayed and the user replies. If the user enters a 'Y' or 'y' the program goes around the outer loop one more time. If the user types a 'N' or 'n' the program terminates. Any other response results in a warning tone and a request to enter a 'Y or 'N'.

This completes the detailed description of the program. There are, however, a couple of features that may be of some interest. There is only one global variable, the array that holds the ISBN. It is generally a good idea to keep the number of global variables as low as possible. This keeps functions as self contained as possible.

It is also considered a good idea not to have too many preprocessor #defines. It turned out that my program does not need any, but this has left some 'magic' numbers scattered about. I have not tried to remove them as I think their use is reasonably obvious. In fact I think, in this case, it is best to keep them. Replacing the 'magic' numbers with names can be even more obscure. The only problem with 'magic' numbers is that when the program is changed a search has to be made for each one. Whereas, if #defines had been used only one place in the program need be changed.

I also provide a version of the program written in Pascal, new_pas\isbn.pas. I must admit that it is a direct copy of the C version with the syntax changed as required. There are, however, some points of interest. The digits array needs to be 10 elements long. This is because Pascal has length encoded strings and so does not require a null terminator character.

C has do-while, Pascal has repeat-util. This meant that logical expressions had to be negated. Pascal has Boolean types and these were used where appropriate throughout the program. Pascal does not have a break statement. To keep the same structure of the C version of the get_reply() function I have cheated a bit by introducing a variable named break and modifying the while loop slightly. I think it worked reasonably well in this case but had the code been more complicated I may have had to rewrite the function using proper Pascal constructs.

Pascal is more particular about assigning variables of one type to variables of another type (it is more strongly typed than C). I have had to circumvent this type checking with a few type casts here and there. This probably means I have not been programming in the true spirit of Pascal. Had I to write the program in Pascal from scratch things may have been different. I have not programmed in Pascal for quite a few years and I found it difficult to suddenly swap languages like this. I even had to look up some of the syntax in the manual. Still, it was a good exercise.

/******************************************************************
*  Author: James Holland                                                     
*  Date: 18th July 1999                                                      
*  Compiler: Borland C/C++ version 5.02.                   
*  This program requests an International Standard Book Number from the user and then confirms that the ISBN *   is valid. The program was written for the Code Review Competition, C Vu volume 11 No 5 (page 36).               
******************************************************************/

void get_ISBN(void);      int user_confirm(void);      void display_result(const int);
int verify_ISBN(void);       int another_go(void);    int get_reply(void);
char digits[11];

int main(void) {
  do  {  /* While the user wants to enter another ISBN. */
    do  {   /* While the user does not accept the number. */  get_ISBN(); } while (!user_confirm());
    display_result(verify_ISBN());
  } while (another_go());
  return 0;
}


void get_ISBN(void) {
  char character;
  int digit_counter = 0;
  clrscr();
  puts("Please enter an International Standard Book Number.");
  do  { /* Go around this loop until nine digits have been entered. */
    character = getch();
    if (isdigit(character))  {   putch(character);  digits[digit_counter++] = character;  }
    else  if (character == '-' || character == ' ') putch(character);
  } while (digit_counter < 9);
  do  { /* Go around this loop until a character or an X has been entered. */
    character = getch();
    if (isdigit(character) || character == 'x' || character == 'X') {
      putch(character);
      digits[9] = character;
      digit_counter++;
    }
    else  if (character == '-' || character == ' ') putch(character);
  } while (digit_counter < 10);
  do  {  /* Loop until the user enters carriage return. */
    character = getch();
    if (character == '-' || character == ' ') putch(character);
  } while (character != 13);
}

int user_confirm(void) {
  clrscr();       printf("Is this number correct? %s", digits);
  return get_reply();
}
int verify_ISBN(void) {
  int digit_index;      int check_sum = 0;
  /* If the last digit is an x change it to the value 1 greater that '9'. */
  if (digits[9] == 'x' || digits[9] == 'X') digits[9] = '9' + 1;
  /* Calculate the sum of products of the ISBN */
  for (digit_index = 0; digit_index < 10; digit_index++) 
    check_sum += (10 - digit_index) * (digits[digit_index] - '0');
  return check_sum % 11 == 0;
}
void display_result (const int result) {
  clrscr();
  if (result)  puts("You have entered a valid book number.");
  else  puts("You have entered an invalid book number.");
}
int another_go(void) { printf("Would you like to enter another ISBN?"); return get_reply(); }

int get_reply(void) {
  char reply;
  while (1)   {
    reply = getch();
    if (reply != 'N' && reply != 'n' && reply != 'Y' && reply != 'y')  
          printf("\a\nPlease enter Y or N...");
    else  break;
  }
  return reply == 'Y' || reply == 'y';
}

From: "Catriona O'Connell" `<catriona38@hotmail.com>`

Here is a critique of Phil Collings' ISBN code published in C Vu 11.5. I hope that Phil finds the comments useful.

The variable names in the #defines are not very informative. If something is defining a maximum length then maybe it should be made clear in its name. MAX may also be a less than optimal name as many programmers use MAX to return the maximum of two values.
The validate() function takes length as an argument but has fixed size buffers internally. This may lead to buffer overflow problems. The sizes of objects should be determined in one place consistently. Here we have length and ISBN potentially determining the number of characters that will be processed, but no checks are made to ensure consistency.
Magic numbers are used extensively in the code that validates the characters. It would be better to use the standard isxxxx functions. This would also provide greater portability and trans-parency. Not everyone works with ASCII character sets. I would also put brackets around some of the test expressions just to make it clear to future readers what the test intentions were.
Using printf() to beep at the user is a little bit heavy especially when later puts() is used to print more text to the screen.
The function dashes() is declared to return an int, yet it is neither set nor checked by the caller. Either define as void or do something with the return value.
I am very concerned about the use of strcpy() to move parts of a string within itself. This amounts to moving overlapping strings and strcpy() is not the function for this because the result may be undefined. You have no guarantee about how the characters will be moved. If you really do have to save memory then use the memmove() function, which is guaranteed to move overlapping storage correctly. Failing that, be a devil and use a few more bytes of storage to create a temporary string and then copy it back into start.
The line result = (number % ISBN) is erroneous and pointless. The number returned by isbntest() is already modulo 11, therefore taking it modulo 14 (ISBN) makes no difference. However, there is no requirement to take any number modulo 14 in the algorithm.
getch() returns an int, not a char but type conversion will handle the difference.
I do not see why the end[] array is initialised since its elements are not modified, but replaced. In fact there is no requirement to store the values in the end[] array as at each stage the value could be added to sum directly.
The 11 lines of if-statements should either be replaced by a select structure or a more algorithmic approach.
No checks are in place to actually check that enough valid characters have been entered. In the worst case a user can enter a string of dashes and this will be accepted by the code, although it will eventually report that it was a bad value. Similarly nothing is done to enforce the rule that X may only appear as a check digit, which is even more dangerous as a malformed ISBN may then be accepted as valid.

The following are my attempts at improving the code as presented in C Vu 11.5. Three routines are presented as I thought of different techniques for handling the input to the function in a secure way. The actual calculation of the ISBN is the same in each routine.

I have tried to make this a simple, robust and secure solution. In my current job, I am responsible for ensuring that we identify and apply the patches to software problems that may be exploited by the less ethical members of humanity. Buffer overflows seem to be one of the more common problems as is not validating the input. I have restricted my attentions to a string validation func-tion and have not considered an interactive input.

The first solution (checkISBN) allocates a 10 byte buffer and copies into it a maximum of 10 valid ISBN characters, excluding separators and extra white space. The technique of defining the valid characters is one I picked up from writing secure CGI scripts. The length of an ISBN, (excluding separators) is 10 characters and I have defined this with an enum to reduce the number of "magic numbers" in the code. I have taken your suggestion that the final digit (checksum) is treated separately from the main body of the number.

The second solution (checkISBN2) dynamically allocates a buffer to hold the processed input string. Subsequent processing is as before.

The third solution (checkISBN3 - there is a pattern here) also dynamically allocates a buffer, but we use the knowledge that the buffer will not contain more characters than there are in the input string. This reduces the (slight) overhead in the calls to realloc().

int main(){
  /*  Check ISBN.
   * Simple driver code to test the ISBN routines.  checkISBN2 may be replaced by checkISBN but it is less      * accurate.  Catriona Siobhan O'Connell 10 July 1999    */
  int checkISBN(char *strISBN);    int checkISBN2(char *strISBN);
  int checkISBN3(char *strISBN);  
int RetCode = checkISBN3("0-13-110362-8");
  printf("ISBN = 0-13-110362-8 Return Code = %d\n",RetCode);   /* Valid */
/* a set of similar tests snipped to save space */
  return(0);
}

int checkISBN(char *strISBN) {

   /*   Validate an ISBN number.
    *  Input: Null terminated string representing the ISBN.
    *  Output: Integer return code.
    *         0   Success
    *         >0  Correct structure but not a valid number.
    *         -1  Incorrect number of valid ISBN characters.
    *         -2  X used in a non-terminal position.
    * Catriona Siobhan O'Connell 10 July 1999  */
   
  char *allowed_characters = "0123456789Xx";
  char arrISBN[10];
  int ch;   int sum = 0;  int multiplier = 10;  int i=0;   int character_count = 0;
  enum ISBN_CONSTANTS {ISBN_LENGTH=10};

  /*  Extract the valid characters from the ISBN passed as an argument.  This technique allows for the separator to be any character - including  none. Only the first ten valid characters will be copied into the   arrISBN[] array.            */  

  while ((ch = (*strISBN++)) != '\0') 
    if (strchr(allowed_characters,ch) != NULL)    {
      if (i < ISBN_LENGTH)  arrISBN[i] = toupper(ch);

I am a little uncomfortable with the logic of this section. Francis

      i++;
    }  }
  /*  First validity check - is it long enough?  */
  if (i != ISBN_LENGTH) {
    fprintf(stderr,"ISBN does not contain exactly 10 permitted characters\n");
    return(-1);

It would have been nice to have had the magic return values hidden by some such device as an enum. And it is a good idea to adopt a coding style that minimises multiple return statements. Francis

  }
  /*  Assume that we are working with a valid ISBN.   This loop is over the first nine characters which should     *   all be numeric.  If they are not numeric then we have an invalid ISBN.  */
  for (i=0;i<ISBN_LENGTH - 1;++i)  {
    if (isdigit(arrISBN[i]))   sum += (arrISBN[i] - '0') * (multiplier--);
    else {
      fprintf(stderr,"X is only permitted as a final character\n");
      return(-2);
    }
  }

  /*   So far so good. The last character of an ISBN may be 0..9 or X (10).   Add the check digit to the sum      *  already calculated. */
  if (isdigit(arrISBN[ISBN_LENGTH -1 ]))sum += arrISBN[ISBN_LENGTH - 1] - '0';
  else if (arrISBN[ISBN_LENGTH -1 ] == 'X')   sum += 10;
  /*   The final sum should be divisble by 11. We return the sum  modulo 11, which will give 0 for a correct     *  ISBN and a positive  number for an incorrect ISBN.  */
  return (sum % 11);
}

I snipped the code for checkISBN2 checkISBN3 but you can find it on this issue's disk. In general I would not use dynamic storage for this kind of problem because the amount of storage is relatively small and should not vary.

However I have extracted the following line because it highlights a couple of coding problems:

  arrISBN = (char *)realloc(arrISBN, BufferSize * sizeof(char));

The first is a minor one that the compiler will handle by removing it. The sizeof(char) is 1 by definition. and so asking the compiler to calculate it (sizeof is evaluated at compile time) is unnecessary and just obscures the code.

More serious is the cast. I know that realloc() (along with malloc and calloc) returns a void* but not only does C not require that a void* be cast to other pointer types (unlike C++) but doing so could obscure other problems. In simple terms, be wary of unnecessary casts. (Generally in C++ you should use new because it provides a typesafe return type.) Francis

From: "R. Dixon" `<p.m.dixon@btinternet.com>`

Life is never simple. I thought that from the sample numbers provided, and by looking in several books, that ISBN numbers always had the length of 23 with 10 digits and three spaces or hyphens and wrote a program that included that as a check. I then thought that perhaps there is other information embedded in the number, perhaps there is, such as the subject. I therefore looked to see if I could find this. No Luck! In doing this I found an ISBN number with two spaces and a length of 12. As I do not know, and have not been able to find out, any information about the structure of SBN numbers, apart from the fact that they only contain 9 digits, I have not been able to make the checks for these numbers as stringent as I would otherwise.

The right hand digits of ISBN numbers encode the publisher/imprint (though how many digits apply varies. So 0-201 are all Addison Wesley books, 0-471 are John Wiley, 1-56592 are O'Reilly.) I believe that there was originally a plan to have the leading digit give a geographical code but I do not think it ever came to anything. You can convert an SBN into an ISBN by prefixing a zero. The format of an ISBN is entirely optional though most publishers use: d-publisher code-book code-checkdigit.

I attach my effort in C and also one in Pascal. The latter because it seemed very suitable for a problem of this type because of sets. It also has the merit of making a smaller executable file, 5568, against 6028 using the Watcom C++ compiler, 6699 using the Microsoft C compiler and 7050 using Turbo C++! I wondered whether this had anything to do with the different program structures so wrote two programs as near identical as the two languages would allow and still the Pascal executable was smaller.

If you are interested Rob's code is on the disk. Before you test the code with your compiler you will need to remove a couple of coding errors. I also wonder to what extent the metrics of the C/C++ versions reflect the way Rob has coded this problem. And the optimisation switches may have a considerable influence. In practice neither code size - within reason - nor speed is going to matter much with this program. Clarity of code, time to completion and ease of maintenance are going to be more important.

Mick Farmer `<mick@dcs.bbk.ac.uk>`

Here's my hack at the ISBN numbers program presented by Paul Collings. I agree that the validation should be separate from the input. I've also divided the validation into two parts for simplicity. It's a bit lax about multiple hyphens and/or spaces, but I wanted to keep it short.

#define MAX 256
/* Given a 10-digit number, verify checkdigit by
  * returning TRUE or FALSE.  */
int isisbn(char *s) {
  int i;
  int sum = 0;
  for (i = 0; i < 9; i++) {
    sum += (10-i)*(s[i]-'0');
  }
  if (s[i] == 'x')   sum += 10;
  else  sum += (s[i]-'0');
  return !(sum%11);
}
/* Given a string, validate characters and put into canonical form by removing hyphens and spaces. A final 'X' is replaced by 'x'.  Return length.  */
int canonical(char *s) {
  int i;
  int j = 0;
  for (i = 0; s[i]; i++) {
    if (isdigit(s[i])) {
      s[j++] = s[i];
    } else if (s[i] != '-' && s[i] != ' ') break;
  }
  if (!s[i]) {
    s[j] = 0;
    return j;
  }
  if (!s[i+1] && (s[i] == 'X' || 
                s[i] == 'x')) {
    s[j++] = 'x';
    s[j] = 0;
    return j;
  }
  return -1;
}
/* Test harness. */
int main(void) {
  char buf[MAX];
  while (1) {
    char *p;
    printf("ISBN: ");
    if (!fgets(buf, MAX, stdin)) {
      fprintf(stderr, "Error in fgets\n");
      continue;
    }
    if (!(p = strchr(buf, '\n'))) {
      fprintf(stderr, "Missing newline\n");
      continue;
    }
    *p = 0;
    if (canonical(buf) != 10) {
      fprintf(stderr, 
          "Invalid ISBN (chars or length)\n");
      continue;
    }
    if (!isisbn(buf)) {
      fprintf(stderr, 
            "Invalid ISBN (checkdigit)\n");
      continue;
    }
  }
}

A ramble around ISBN numbers by David Beard

I am a Communications Engineer or, as some would have it, once was. These days I lecture, mainly on data networking topics, mainly to computing students. This, coupled with a mild dyslexia as far as mnemonics are concerned, caused Paul Colling's article, "ISBN Numbers" in the July 1999 C Vu issue to catch my attention. Having determined it wasn't about ISDN(!), Paul's article and Francis's comments nevertheless retained my interest and provoked a number of thoughts some of which, to avoid rambling too far afield, I will save for a possible, later, article.

I started learning C so long ago - at least 16 years- that I can't remember precisely when I started. I'm still learning though, and I don't imagine it will ever be otherwise. Starting out, I can recollect particular problems with the for statement and Paul's code suggests he had similar difficulties at the time he wrote it. Looking back though, contrasting Paul's efforts with my own at a similar stage, he rather puts me to shame. I should also bow my head low because I confess I'd never previously given any thought to ISBN numbers or their structure. Now I know, so thanks to Paul for that, for the reminiscences he unwittingly provoked, for an interesting article and the for the stimulation it and Francis's comments provided.

Enough of the personal waffle, let me map out the general route for the rest of this article. I detour around some, though not all, of the invitations Francis made in his footnotes and confine myself to offering criticism of Paul's code, providing a rewrite - still in C - and making a few observations of a slightly more scenic nature here and there. The first observation of this kind I would make is that I agree wholeheartedly with Francis that separation of data entry and validation is desirable. I would go further and argue that task separation in general is almost always desirable.

Paul's overall design suffers largely because it does not separate tasks and is rather monolithic as a result. This a inevitable consequence of the manner in which his validate() function calls - directly or indirectly - the other functions. The design of these assumes they will only be called via validate() and this limits both their usefulness and that of the code overall. The use of getch() within validate() might be viewed, according to taste, as bonding the monolith to raw keyboard data entry or as a further extension of the monolith - what results is not entirely to mine.

There is a problem with the usage of getch() relating to behaviour if "non-ASCII keys" - i.e. extended, or extended key combinations - are pressed. If this occurs, getch() first returns 0x0 and then the next call returns a key scan code identifying the physical key or key combination. The problem is that scan codes may correspond to one of the "normal" keystrokes expected. Thus, on my keyboard, Paul's validate() responds to a ALT-N keypress, for example, by beeping as a result of reading the 0x0 and then displaying a '1' - the scan code in this case - is read. Users sharing my miss-typing skills might well regard such behaviour as beeping confusing.

There's a simple fix as my alternative to validate() - I've called it isbkbin() - demonstrates. Scan code tables can be found in e.g.: Thom, "The PC Programmers Sourcebook", Microsoft Press. ISBN 1-55615-321-X - and yes, it's a valid ISBN code, although that of my 1991, 2nd edition which, like myself, is a little dog-eared and not quite up to date in all respects.

A more serious problem with Paul's validate() is that the user can press the enter or return key before inserting a full 13 characters. I infer that check() presents the input data to the user and allows selection from some "proceed" or "cancel" option? But, suppose the user selected the "proceed" option in instances where the input data from validate() did not constitute a legally formed code-word because insufficient data were entered? Even if my guesses regarding check() are wrong, those uninitialised buffers littering the code suggest potential for "illegal" characters to reach isbntest() and its design simply does not allow for this. On the same theme, and also underscoring the previous observation, the intervening dashes() function has a bug as it fails to fully remove '-' occurrences unless these occur in isolation. Thus, one '-' would be left if a pair occurred in the string passed to it and this would, of course, reach isbntest().

One might contemplate modifying validate() to ensure the data it returns is always a legally formed code-word but, as the "grammar" involved here is not context free, this is not without problems. Suppose, for example, enter or return were disallowed as keystrokes. Suppose also that separators were forbidden to occur in "runs" and that these were also disallowed following entry of the 10th code symbol. What now happens when the user proceeds to enter 10 valid code symbols without separators? The user cannot proceed to enter a full 13 characters as there's nothing legal left to enter and validate() can't return as it now needs a full 13 character entry before it can. We have created a deadlock and a reboot would be required. Of course, edit facilities might be added to validate() to avoid deadlocks, but other additions would be needed too. Following this path would inevitably tend to make validate() an almost, "free standing", platform-specific, separate solution, with data from other sources being dealt with in some other way. Such a divergent approach and the modifications involved suggest much pain for rather dubious gains.

Taking all of the above into account, the obvious way to handle the possibility of keyboard entered data being in error also provides for handling the same problems with data from a plurality of sources. What this entails is some replacement for isbntest() with the added ability of being able to test for legality of input - after separator removal that is - whilst retaining the existing capability to perform a check sum if the data proves legal.

Some detailed aspects of Paul's existing validate() function give concern. It appears the input buffer size sets a upper limit on what the user can be allowed to enter. This size is determined using ISBN which is #defined to 14. Given this, the length argument appears superfluous. It also strikes me that the outer for loop is not only strange but also redundant. Elsewhere, I'm unsure why a role was not found for the standard library function, isdigit(). Possibly this is because the tests performed rather "accentuate the negative" by checking for what things are not. This is fine but a seemingly odd way of doing things given there are fewer valid than invalid keystrokes.

Paul's dashes() function, aside from the bug identified earlier, is limited by its assumption that the '-' character is the separator. Some other points regarding dashes() occur to me. Firstly, the return it obtains from isbntest() - a modulo 11 number - is subject to a "modulo ISBN" operation via:

result=(number%ISBN);     /*why the brackets incidentally?*/

Shurely shum mishtake? ISBN is #defined to 14 so the operation is both redundant and misleading. Secondly - for loop problems again - Paul introduces a superfluous variable index used for array indexing within the for loop that kicks off dashes(). The indexing can, of course, be performed using the loop variable.

The calls to clrscr() appearing in dashes() and in isbntest() are unfortunate. Clearing the screen has no logical association with either function and they don't need be there anyway. Whilst no more logical in absolute terms, within the context of Paul's design, a single call to clrscr() from validate() - immediately preceding the call to dashes() - would make more sense. As things stand, one unnecessary screen clear occurs due to clrscr() appearing in both isbntest() and dashes().

Moving on to dashes(), a more powerful function is possible if the call to isbntest() and also, of course, the bug I identified earlier, are removed. My alternative, str_rem(), strips occurrences of not just one character but any occurrences of an arbitrarily specified set of characters from a string and its usefulness transcends the present application.

And so to isbntest() itself. As mentioned, this has fundamental weaknesses. It not only assumes the string it handles contains exactly 10 "legal" characters, it also doesn't appear to take account of positional constraints on appearances of 'X' or 'x' character(s). Moreover, it is - as Paul appears to recognise - rather obfuscated. Leaving aside the fact that this doesn't address the weaknesses, what's wrong with doing the obvious:-

result= ((start[0]-'0')*10) +(start[1]-'0')*9) + .........+((start[8]-'0')*2);
  if( (start[9] =='X') || (start[9]=='x' )) result+=10;
  else  result+=start[9];
return result%11;

Or we could write

result += ((start[9] =='X') || (start[9] =='x'))? 10 : start[9];

or, if all that typing doesn't appeal:-

for(loop=0;loop<MAX-1;loop++) result+=(10-loop)*(start[loop]);
if( (start[loop] =='X')|| (start[loop]=='x') ) result+=10;
else  result+=start[loop];
return result%11;

My version of isbntest() - I've called it isbnchk() - uses something of the latter approach but adds checks to ensure it is dealing with "legal" input - returning negative error values if it finds it isn't.

I've referred to my alternative solution in piecemeal fashion. An overall description might be useful. I rather liked the interactive user interface Paul provided via getch() for keyboard input and this has been retained. Somewhat tighter con-straints on keyboard entries have been applied whilst, at the same time, some freedom is allowed with separators. However, isbnkbin() does not directly call str_rem() and neither does the latter call isbnchk() - my replacement for isbntest(). These latter functions are free-standing and neither assume the data they handle is "legal" - hence these can handle input from any source. The constraints on data entry set by isbnkbin() play the role of providing feedback and guidance to the user whilst the role of isbnkbin() itself is relegated to one of providing one of many possible sources of input data strings.

I'll conclude by presenting the detailed code. I've thrown in a simple-minded main() function to test/exercise everything and there are a few comments which may help to use the functions in other contexts than the one my main() function creates. This was all put together in rather a hurry so everything's in one file and I've dispensed with a few things - e.g. sentinels - as a result. No matter, it can stand as is and perhaps leave exposed bad habits I've let creep in over the years. If anyone would care to point out the errors of my ways, I would appreciate criticism of my efforts too. Here they are:-

#define ISBN  14
#define MAX   10
#define DONE  ISBN
#define SEPMAX 3
#define CRET  13
int isbnkbin(char *);    int  isbnchk(const char *);  void str_rem(char *, const char *);
enum badchk{illen=-3, illchar=-2, null_arg=-1};
/*************************************************************************
isbnkbin()   - roughly equivalent to validate() of original program.
Contrains data entry:-   ' ' or '-' allowed as separator.
      Separator disallowed as first entry or following entry of 10 code symbols. 
      Contiguous runs of separators disallowed.
      Valid code symbols '0'-'9' or 'X'/'x'. 'X'/'x' only accepted as 10th symbol.
      Separators can be ommitted.
User terminates input via ENTER/RETURN key and may do this at any point. (see also comments in code).
Returns: total number of code digits accepted.
*********************************************************************/
int  isbnkbin(char *input) {
int  keystroke, digits=0, sep_cnt=0,  previous;    int  inchar, retval=0,  state=0;
   while(state<DONE){ 
/*actually, state can't reach DONE. Could change  to  force an immediate return once full 10  code symbols with 3 separators entered. Seemed a little confusing to use and consistency of always requiring enter or return to terminate user input seems  preferable. Change #define for DONE to ISBN-1 if you disagree.  */
  if( (keystroke=getch()) == 0 ){   getch();       /*Absorb scan code*/  putchar('\a');  continue;  }
  if(keystroke==CRET)  break;
  inchar=keystroke; /*Save actual keystroke*/
  if(keystroke=='x')/*so we can do some mapping*/ keystroke='X'; /*to save a little on testing*/
  if(keystroke==' ')  keystroke='-';
  if(state!=0){   /*get previous accepted char -if any*/
     previous=input[state-1];
     if(previous==' ') /*more mapping*/ previous='-';
  }
  if( isdigit(keystroke) && (digits < MAX ) ) digits++; 
  /*X is acceptable if digits==MAX-1*/
  else if( (keystroke=='X') && (digits==MAX-1) )  digits++;
  /* '-' ok provided we haven't reached full quota AND its not the first entry AND  previous entry not '-' AND we havent already  got 10 code symbols entered and accepted.                    */
  else if((sep_cnt < SEPMAX)&& keystroke=='-') && (state!=0)&&(previous !='-')&& (digits!=MAX))
     sep_cnt++;
  else{ /* unnaceptable keystrokes*/ putchar('\a'); /*beep*/   continue;      /*and get out of way*/   }
  input[state++]=inchar;   retval=digits;  putchar(inchar);
   }/*end of while loop*/
putchar('\n'); input[state]='\0';
return retval;
}
/********************************************************************
This function removes occurrences of characters which are in the 2nd argument string from the first string. A NULL  pointer for either/both arguments is valid though not very useful. If the caller passes such thats the caller's problem!
*********************************************************************/
void str_rem(char *str,const char *strip) {
  char *tok1,*cp;
/*Can't strip anything from nothing and can't strip nothing from anything.*/
   if((str!=NULL)&&(strip!=NULL)){
  cp=str;    tok1=strtok(cp,strip);   if(tok1!=str)  /*leading chars to remove*/ strcpy(str,tok1);
  while(tok1!=NULL){  tok1=strtok(NULL,strip);  strcat(cp,tok1); }
 }
}
/*********************************************************************
isbnchk().
Returns:  a ISBN check sum if passed a string containing MAX (ie 10) legal code characters.
The value returned will be 0 if the string contains a valid ISBN code or some other value in the range 1-10 if the code fails the check sum but is otherwise legally formed. Other returned values are:-
nullarg  (-1)  argument is a NULL pointer, illchar  (-2)  argument string of acceptable length but contains illegal chars, 'X'/'x' is regarded as illegal if it not the MAXth (ie 10th) char.
illen    (-3)  argument string not of length MAX (ie 10) chars.
Note: isbnchk() has no notion of separators or other characters the caller may regard as "innocuous"-eg leading or trailing space. The caller must use the str_rem() function provided -or other means - to ensure such characters are not present in the string passed to isbnchk() for it to be able to compute a checksum.  Failure to do so will cause a return  of -2 or -3 in circumstances where the caller regards the string as containing a "legally" formed ISBN code-word.
Thinks... I'm not too convinced using MAX rather than 10 aids clarity.
*************************************************************************/
int isbnchk(const char *ptr) {
  int len, loop, retval, sum=0;
  if(ptr==NULL) retval=null_arg;
  else if ( (len=strlen(ptr)) !=MAX ) retval=illen;
  else {
    for(loop=0;loop< len;loop++){
      if( !isdigit(ptr[loop])){  /*accept only '0'-'9' */
        if( loop < (MAX-1) ){   /*unless last position*/ retval=illchar;  break; }  
                     /*in which case we can accept X or x*/
        else if((ptr[loop]!='X') && ( ptr[loop] !='x') ){retval=illchar;  break; }
      }
/*At this point, characters we are dealing with are legal. The contribution each makes to the check sum is the product of decimal value it represents in the code and its "place" value.*/
      if((ptr[loop]=='x')||(ptr[loop]=='X'))  sum += 10*(10-loop);
      else  sum+=( ptr[loop]-'0' )*(10-loop);
     }/*end of loop*/
    if(loop==len) /*went right through so all input encountered was legal*/ retval= sum%11;
  }
  return retval;
}
/***** A simple main() function to exercise/test things ********/
int main() {
  int val;   const char *seperats=" -";    char input[ISBN], raw[ISBN];
  puts("Enter ISBN code to check:");   val=isbnkbin(input);
  printf("iskbin() returned %d as digit length\n",val);
  strcpy(raw,input);  /*save input as str_rem() may alter it*/
  str_rem(input,seperats);    val=isbnchk(input);
  if(val==0) printf("entered code: %s is valid\n",raw);
  if(val>0)   printf("entered code: %s failed check sum\n",raw);
  if((val==illen)||(val==illchar)) printf("entered code: %s is illegal\n",raw);
  getchar(); return val;
}

Notes:

More fields may be available via dynamicdata ..

Title: ISBNs Revisited

From the Harpist

An awk solution from Peter S Tillier <petert@eq1152.demon.co.uk>

From James Holland

From: "Catriona O'Connell" <catriona38@hotmail.com>

From: "R. Dixon" <p.m.dixon@btinternet.com>

Mick Farmer <mick@dcs.bbk.ac.uk>

A ramble around ISBN numbers by David Beard

An awk solution from Peter S Tillier `<petert@eq1152.demon.co.uk>`

From: "Catriona O'Connell" `<catriona38@hotmail.com>`

From: "R. Dixon" `<p.m.dixon@btinternet.com>`

Mick Farmer `<mick@dcs.bbk.ac.uk>`