Title: Tools and Un-smart Pointers

Author:

Date: 03 June 1999 13:15:31 +01:00 or Thu, 03 June 1999 13:15:31 +01:00

Summary:

Body:

There are numerous tools that generate C code, ranging from lex/flex and yacc/Bison (for generating lexical analysers and parsers) to KL/1 (a parallel logic programming language developed on the Japanese Fifth Generation project - www.icot.or.jp/). There are often limits to what you can practically do when writing your own C/C++ code around these tools.

KL/1, for example, was designed as a complete language in itself rather than a tool, and as such it assumes that your program will be written in KL/1 with only a tiny amount of inline C when you really want it (like inline assembler in C). In some applications I would prefer to use C or C++ with a minimal amount of KL/1 where this is useful. The KL/1 paradigm is parallel components exchanging data through streams (like Prolog lists, and the paradigm can be useful even on a sequential processor), and it is quite conceivable that you'd want to use this only for the top level of an algorithm and not for every detail.

The problem is that KL/1, like many tools, only really allows data to be exchanged with C in the form of integers (unless you rely on undocumented implementation-specific features). It would be nice to handle arbitrary objects as well, even if only the C code can interpret them. (For brevity I use the term "object" to refer to both classes and structs, whether or not they fit a pure OOP definition of object, and I'm thinking of C programmers as well as C++ ones.)

The natural way to achieve this is to create the objects on the heap and store pointers to them in the integers, and many do this in yacc by casting a pointer to an integer. However, this makes the assumption that sizeof(int) >= sizeof(void*) (this assumption is especially dangerous if the integers get converted to a different form by the tool, as in KL/1), and it also becomes difficult to free objects that are no longer needed. Especially if the tool can backtrack, you have to know when it really has finished with something, but even then you need to get hold of the pointer if you want to free an object.

In C Vu 8.3 I wrote about a multi-user text adventure I had hacked out at a college for the blind. Every object was indirectly referenced from an integer by a small group of functions that used it as an offset into an array, and this was used for marshalling data by the virtual memory (I was using DOS), the passing objects around between machines, and of course the saving to disk. (Java calls marshalling "serializing" but some don't like that because it's the same as an unrelated OS term.) Francis suggested I read a book about smart pointers, which were of course more elegant than what I was doing because they were all wrapped up in classes that could do interesting things. Incidentally I lost the code when I left the college in an accident of the "I think I've got this at home already" type.

For tools like KL/1 (I hope I don't offend anyone by treating it as a tool), such "un-smart" indirection from array offsets could be useful (and possibly encapsulated if desired). Integers do not have to store actual memory addresses, and the addresses of all objects are available for freeing (although you still have to make sure you don't free things too soon). If appropriate, the marshalling for various reasons is still possible, although it's not often you would use it in an application like this.

Perhaps more useful for some tools is the ability to make objects copy-on-write. Simply copy the pointer to a different element in the master array and somehow mark both as copy-on-write, then whenever you want to change the object, call something that checks this and copies the object if necessary. KL/1 won't need this often because variables are fixed once they're defined, but general breadth-first tree searching might benefit. This can be useful in some applications, e.g. a top-down parser that lists all possible interpretations of a phrase in an ambiguous grammar. Breadth-first algorithms (i.e. ones that check several branches of possibilities at once) have the advantage that solutions are found sooner if they are not far from the start (a recursive backtracking algorithm may waste a lot of time exploring the wrong branch), as well as possibilities of merging branches if they are common after a while (very conceivable in ambiguous parsing) and cancelling branches that for some reason can be decided to be futile by other branches. Breadth-first algorithms don't need nearly so much stack (because they don't need to backtrack), but they might need numerous copies of the data structures that are nearly identical, and here copy-on-write can be useful.

Aside: One way to implement a breadth-first search is using BCPL-style co-routines - branch new ones when necessary. These are like multithreading except that all context switching is under the programmer's control, and although they were not carried over into C (there is a C++ co-routine library somewhere) one can still think in terms of them. It can get messy though. KL/1 is also interesting because it runs by taking a clause from a heap of them, evaluating it to give further clauses and throwing them back into the heap - the only real control you have on the order of evaluation is that it won't execute clauses whose "guard" conditions test variables that aren't yet available. And you can of course use the classic method of keeping a queue of nodes to look at (if it doesn't get too big) and putting their sub-nodes on the end of the queue - to do this you need to write your algorithm in such a way that it can look at only one node each time it's called.

The main disadvantage of pointers via integers is that it throws type checking to the wind, and imagine what would happen if you accidentally tried to do pointer arithmetic. (The speed overhead in the extra indirection is slight.) Run-time checks are a poor substitute for compile-time checks. If you are dealing with objects of more than one type, you would either have to be very careful or use yet another tool to help you write the program.

Here is some C code that implements the most basic functionality of the method, slightly obfuscated for space:

#include <assert.h>
#include <stdlib.h>
enum { initSize=50 };
void** master; int mSize;
int mPtr=1; /* not 0 so number 0 can mean null */
void zero(int x, int y) { for(;x<y;x++) master[x]=NULL; }
void grow(int newSize) { 
    assert((master=(void**)realloc(master,sizeof(void*)*newSize))!=0);
     zero(mSize,newSize); mSize=newSize; 
}
void init() {
     assert((master=(void**)malloc(sizeof(void*)*(mSize=initSize)))!=0);
     zero(0,mSize); 
}
/* inline */ void readObj(int n) { return(assert(master[n]), master[n]); }
/* (readObj could be a macro but note that n would be evaluated twice) */
int newObj(void* p) { if(mPtr==mSize) grow(mSize<<1);  master[mPtr++]=p; return(mPtr-1); }
void freeObj(int n) { free(readObj(n));  master[n]=NULL; }

I have of course been assuming that only a single processor is to be used (or at least the memory is shared); KL/1 supports distributed memory parallelism and if this is used then provision should be made or it. If you must do this, the safest method (although slow) is probably to marshal objects and pass them through KL/1's own mechanism; that adventure game's code would be of little use here as although it was distributed it didn't exploit concurrency - there was a token passing system for changing the objects. In a real situation you would also want to re-use old array elements and perhaps add more asserts to the code (the most obvious being a bounds check).

Notes:

More fields may be available via dynamicdata ..