Journal Articles

CVu Journal Vol 14, #4 - Aug 2002 + Francis' Scribbles from CVu journal

Title: Francis' Scribbles

Author: Administrator

Date: 07 August 2002 13:15:53 +01:00

General Issues

When I write this column I try to provoke readers into thinking. I do not mind if you completely disagree with me; I do mind if you do not think about what you read. From my perspective the great failure of modern democracy is that the masses abdicate responsibility by handing it over to a small number of individuals. Too often we seem to ignore that individual politicians have their private agendas that may have little if anything to do with the well-being of the wider communities they claim to represent. Those of us with relevant technical knowledge have a duty to criticize (constructively) and in a way that allows others to understand the issues. Readers of C Vu are way above average in their understanding of information technology. I will come back to this later in this column, but for now think about it.

Personal Views

I was playing Bridge a couple of evenings ago when a long-term acquaintance who makes a living writing educational software arrived at my table. As we finished the round slightly quicker than most of the room, I enquired how things were going. He said fine. In response to my query he added that he was still programming in C++. (Now to give you a little background, when he first converted from Visual Basic to C++ I had asked him why he had chosen to use Visual C++. His response was that his friends/colleagues had told him that was the best choice and everyone used it.)

This time I asked if he wasn't tempted to look at C#. His response was that he did not want to tie himself down to an MS proprietary language. MS were a company driven by their marketing department and what they had done to Java was a disgrace. I asked what books on software development he had read recently. He looked at me as if that was a very rude question and opined that real software developers had enough to do without reading books. There seemed no point in asking him about attending conferences or consulting others via relevant newsgroups.

There is a consistency about his views in that he does not find it necessary to consult with others who may have some genuine input. He works alone and acquires programming skills by the seat of his pants. He is far from being the only one to work like this. His programming is considerably above average and the educational software he produces is much better than average. Neither of those statements says very much other than to highlight the abysmal state that general software development is in and most particularly the pathetic state of too much educational software.

We live in a World where new technology floods us. What would you think of a five year old graphics card, motherboard, hard drive? You might find some use for them but not for much longer. So what do you think about someone whose development skills were last upgraded five years ago? Just as the cutting edge hardware circa 1997 is badly dated today, so is your 1997 skill set. The only reason that so many get away with it is that many others are still using their 1992 skill set. The fact that many readers of this column may think I am exaggerating is indicative of the problem.

In my exit interview from school I was asked if I thought I had learned how to learn. When I said I thought I had, my headmaster said 'Good, we have been successful.' He was right then, and his view of the principal objective of education is even truer today. If you did not learn something new yesterday, the day was wasted. If you did not upgrade your programming skills last year then you have stopped being a programmer and become, if only temporarily, a software hack (in the journalistic sense).

A Book

Some of you know that my first intellectual love was mathematics (indeed I was once caned for reading a book on mathematics when I should have been doing my French prep - homework if you are unfamiliar with the boarding school term). I was in Blackwells the other day browsing through the mathematics section - noting that it had rebounded from its low point five years ago. My eye fell on a book titled 'The Maths Gene'. A quick look convinced me that this was a book I wanted to read, so I bought a copy.

I haven't finished it yet because keeping up with my review reading has got in the way, but even so I know that it is a book that should be widely read. It should be read by students of mathematics so that they understand why others find maths difficult, it should be read by parents so they understand why maths should not be hard and it should be read by political commentators so that they understand why simple arithmetic is easier for a Chinese child than it is for an English or German one.

If you find mathematics (no, not arithmetic, but the science of patterns) easy, read this book to understand why, and why others find it hard. If you find it hard, read this book to discover why you are not alone.

However, what struck me as I was reading this morning was how similar the author's description of doing mathematics was to the way I program. First immerse yourself in the problem and the context (he describes it as building a house) and then you will find the solution somehow surfaces. I program the same way. I think all round the problem, explore different approaches in my mind, and then it seems to gel and I get down to cutting code. I wonder how many of you do something similar.

An Issue of Privacy

Time to get back to politics.

I wonder how you would respond to a law that required the Royal Mail (US Post or whatever) to keep a copy of all your letters, postcards and the contents of all your parcels. How would you respond to a government that required your telco to keep a recording of all your telephone conversations? What about keeping a record of all your journeys?

The reason that this has never been done has little to do with your government being more considerate of your privacy and a great deal to do with feasibility. Suppose that back in 1951 the then government had required recording of all phone conversations and copying of all correspondence. Everyone would have fallen about laughing because it would just have been impossible and everyone would have ignored it as pure political fantasy. Furthermore even if successful, it would be useless because it would have been impossible to extract any information from such a heap of data.

Of course these days it is still impractical to copy every letter but it is far from so when it comes to electronic communication.

The second great defense of our liberty was that even given all the data it used to be practically impossible to mine it for the information you wanted. Stealing the tax records for 1960 would not get the thief very far because the sheer bulk of information would have made it useless. Stealing the tax records for 2001 is both easier and more useful. Once you have it electronically you can start mining it for profitable data.

Look back at your email. Can you? Do you have an archive of all your email? If not, how are you going to demonstrate that an apparently incriminating message wasn't, once the context of the other messages is taken into account? Even worse, how about the implications of steganography (hiding information, for example in the low bits of a graphic)? Of course no one is silly enough to hide critical information without also encrypting it. Now try this scenario. A search of your email records leads the authorities to think you were involved in some crime they are trying to solve. They now look at all those pictures of your grandchildren you received electronically. Throw a few tera-computers at them and the authorities should be able to find a few hidden messages whether they are there or not.

Face recognition is not far away and then all those CCTV systems become so much more useful.

Identity cards really ought to have some chip technology buried in them so that we no longer have to worry about carrying grubby cash around with us; just let the bus, toll booth etc. recognize you. Swiping the card is such a bore, so let the systems do direct interrogation untouched by human hand. Once we can do that, how long before the police point out how useful travel information would be in tracking criminals?

But worse, like tax records, once such data is stored electronically it becomes possible to steal it and search it electronically. Even for the completely law abiding citizen there are many things that they would not want known by their boss, parents etc. And that leaves them open to blackmail.

Am I being paranoid? Probably, but that does not mean they aren't out to get us. The greatest protection of our privacy is the anonymity of the masses. We are close to being able to strip that away. So what do we do about it? I wish I had an answer. Just remember that the proposal for road pricing that surfaced a few months ago (actually I think the idea is a good one) relies on being able to track every vehicle in the country at all times. The problem is that we are not far from being able to do that. That scares me.

Separate Compilation of Templates

Many of you know that there has been a long standing promise that templates will be separately compilable. Some of you may even know that a keyword, export, was added to C++ to provide extra support for such, or at least so it was claimed.

However I have recently come to realize that the very way that templates are specified in the C++ Standard means that separate compilation in the sense most of us think of it is impossible for templates.

Non-template classes and functions can be implemented without the compiler having to have any knowledge of the context in which they will be used. That is exactly the purpose of placing implementation code in separate files and providing the information necessary for use in header files. We decouple application code from implementation code. One of the weaknesses of using inline functions is that we overrule that decoupling. (Actually as optimizers get better, the need for explicit inlining steadily goes away. There is a mode in the latest release of VC++ that delays code generation till link time and so supports even better optimization if you are willing to pay the price of longer link times.)

Now as we currently have templates specified we can NOT decouple the implementation from the point of use. All that export does is to delay the instantiation to some more convenient stage such as a pre-linker. This comes at a heavy cost, both to the user and to the compiler implementor because (potentially a great deal of) context information must be saved at each point of instantiation. export removes the need to recompile the user files when a template implementation is changed but that is not sufficient for separate compilation.
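One way to see why the implementation cannot be decoupled from the point of use is name lookup for dependent calls. The sketch below is illustrative only: report, describe, demo and app::Widget are hypothetical names, not taken from any library. The template body calls describe() on its argument, and which describe() that is can only be settled where the template is instantiated, because the function lives in the user's namespace and is declared after the template.

```cpp
#include <string>

// "Library" side: written with no knowledge of Widget.
// describe(value) is a dependent call, so it is resolved by
// argument-dependent lookup at the point of instantiation.
template <class T>
std::string report(const T& value) {
    return "report: " + describe(value);
}

// "User" side: declared after the template definition.
namespace app {
    struct Widget { int id; };

    std::string describe(const Widget& w) {
        return w.id == 7 ? "lucky widget" : "widget";
    }
}

// Instantiation happens here, where app::describe is visible.
std::string demo() {
    app::Widget w = {7};
    return report(w);
}
```

A compiler that instantiated report() somewhere else, such as in a pre-linker, would still need to know that app::describe was visible at this point, which is the kind of context information that has to be saved.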

Now to my question: do you want genuine separate compilation of templates to be added? By this I mean a mechanism whereby the implementation need know nothing about the context of use. I need to know your feelings on this issue to decide if it is worth putting in the effort to get a suitable change adopted in the next C++ Standard.

Problem 3

As I know that many of you cannot put your hands on the last issue quite as easily as I might like, I have to re-publish the last little problem to give you a chance to understand my comments. So here is what I gave you last time:

My dictionary defines percentile as "the value below which fall a specified percentage of a large number of statistical units (e.g. scores in an examination)". The quartiles, lower and upper, are the 25th and 75th percentiles. The median is the 50th percentile. For the latter, statistics has special rules for small samples, where it is often necessary to select a representative value that is not a value from the sample.

Now look at the following code for computing a percentile in the form of a supposedly STL conformant algorithm.

template<class RandomIterator, 
       class ValueType>
ValueType percentile(
  RandomIterator start,
  RandomIterator end,
  ValueType percent)
{
  typedef typename 
    std::iterator_traits<RandomIterator>::
        difference_type DifferenceType;
  DifferenceType n=end-start;
  ValueType rank=(n+1)*percent/100;
  DifferenceType
    intRank=static_cast<DifferenceType>(rank);
  ValueType fraction=rank-intRank;
  RandomIterator pos=start+intRank;
  ValueType result;
  result = *(pos-1) * (1.0-fraction) 
              +*pos * fraction;
  return result;
} 

What is the fundamental flaw? Let me help you:

#include <iostream>
using std::cout;

int main(){
  int array[10] = {0,1,2,3,4,5,6,7,8,9};
  cout << percentile(array, array+10, 25) << '\n';
  cout << percentile(array, array+10, 25.0) << '\n';
  return 0;
}

For those not familiar with the STL the above mumbo jumbo is a fairly typical piece of template code. It starts by declaring a function template with two template parameters. The names are a clue to correct use. The first template type parameter is going to be used as a random iterator and this will be deduced from the first two arguments of a function call. The second type parameter is going to be for a value and will be deduced from the third argument.

The first incantation of the function body says: 'take the first template parameter type and look up in the std namespace to find a specialisation of iterator_traits for that type. There you will find a typedef that gives meaning to difference_type. Rename that type DifferenceType.' The purpose of that is simply to make the subsequent code more readable (but it helps to know the STL naming idioms).
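As a small check of that lookup, the sketch below asks the compiler directly what iterator_traits reports for a plain pointer such as int*. It assumes a C++11 or later compiler (for static_assert and <type_traits>), which postdates this column; the helper names pointer_difference_is_ptrdiff_t and ptrdiff_is_int are invented for the illustration.

```cpp
#include <cstddef>
#include <iterator>
#include <type_traits>

// The same typedef dance as in percentile(), applied to int*.
typedef std::iterator_traits<int*>::difference_type DifferenceType;

// For pointers the standard library defines difference_type as std::ptrdiff_t.
static_assert(std::is_same<DifferenceType, std::ptrdiff_t>::value,
              "difference_type of int* is std::ptrdiff_t");

// Run-time view of the same fact, convenient for printing or testing.
bool pointer_difference_is_ptrdiff_t() {
    return std::is_same<DifferenceType, std::ptrdiff_t>::value;
}

// Whether ptrdiff_t is int is implementation-defined; this reports it
// rather than assuming it.
bool ptrdiff_is_int() {
    return std::is_same<std::ptrdiff_t, int>::value;
}
```

Which built-in type std::ptrdiff_t names is left to the implementation; on many 64-bit platforms it is long or long long rather than int.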

So let us look at the first function call:

percentile(array, array+10, 25)

The compiler deduces that RandomIterator is a synonym for int* because that is the type to which array decays when passed as an argument, and confirms that this is correct because that is also the type of array+10. It looks up the difference_type for int* and finds that it is ptrdiff_t. It then looks that up and will probably find that it is int. So our DifferenceType is int. Next it uses the third parameter to deduce that ValueType is a synonym for int (the type of 25). Now let me rewrite that instance of a template function in a non-template form.

int percentile(int* start, int* end,
                       int percent) {
  int n=end-start;
  int rank=(n+1)*percent/100;
  int intRank=static_cast<int>(rank);
  int fraction=rank-intRank;
  int* pos=start+intRank;
  int result;
  result = *(pos-1) * (1.0-fraction) 
              +*pos * fraction;
  return result;
}

Now do you see the first nasty problem lurking inside that smart template code? Let me be honest, I completely missed it until I manually instantiated the template; writing robust template code is hard work. Have you seen it yet? What on Earth is that magic 1.0 doing in code that has been carefully typed almost everywhere else? And when you notice that, what about that magic 100? Actually the code works, but more by chance than by design. Rewrite (n+1)*percent/100 as percent/100*(n+1) and it fails.
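The difference between the two orderings is purely integer division. A minimal sketch, with function names made up for the illustration, using n = 10 and percent = 25 as in the first call:

```cpp
// Multiply first: (10 + 1) * 25 = 275, then 275 / 100 truncates to 2.
int rank_multiply_first(int n, int percent) {
    return (n + 1) * percent / 100;
}

// Divide first: 25 / 100 truncates to 0, so the product is 0 as well.
int rank_divide_first(int n, int percent) {
    return percent / 100 * (n + 1);
}
```

With a rank of 0 the instantiated code would go on to evaluate *(pos-1) with pos equal to start, reading one element before the beginning of the array.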

Now look at the second instantiation of the template. Short-circuiting to save space, we get the following code:

double percentile(int* start, int* end,
                    double percent) {
  int n=end-start;
  double rank=(n+1)*percent/100;
  int intRank=static_cast<int>(rank);
  double fraction=rank-intRank;
  int* pos=start+intRank;
  double result;
  result = *(pos-1) * (1.0-fraction) 
              +*pos * fraction;
  return result;
}

Now let us try to understand what was going on in the mind of the writer (I have an advantage because I discussed it with him). He wants to select the type of the return value even though we might expect that to be the type of an array element. This is intended to allow for percentiles coming in the interval between specific elements (e.g. the 50th percentile - median - of a list of ten objects will be halfway between the fifth and sixth). However, the way he has passed this type is by using a parameter whose type is also used internally. The consequence is that intermediate values are wrongly calculated in the case where the ValueType is int. Check it out and you will find that fraction is always zero in this case, which means that no interpolation is done. That is incorrect, even if you simply want an integer answer.
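To make the loss of the fractional part concrete, here is a cut-down sketch of just the rank arithmetic, parameterised on ValueType in the same way as the original. The name fraction_of_rank is hypothetical, and n is passed as a plain int for brevity.

```cpp
// Mirrors the first lines of percentile(): the rank, its integer part,
// and what is left over for interpolation.
template <class ValueType>
ValueType fraction_of_rank(int n, ValueType percent) {
    ValueType rank = (n + 1) * percent / 100;
    int intRank = static_cast<int>(rank);
    return rank - intRank;
}
```

For ten elements, fraction_of_rank(10, 25) deduces ValueType as int and returns 0, whereas fraction_of_rank(10, 25.0) deduces double and returns 0.75.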

If you hijack a genuine parameter to manage the return type your code will not be reliable. And if you are still unconvinced, try reworking the code for an array of float or double.
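One possible repair, offered as a sketch rather than as the original writer's or my definitive solution: let the caller name the result type explicitly and do all the interpolation in double. The name percentile_as, the choice of double for intermediate values, and the clamping of out-of-range ranks to the end elements are all assumptions made for this illustration.

```cpp
#include <cassert>
#include <iterator>

// Sketch only: Result is named explicitly by the caller, while the
// interpolation is always carried out in double.
template <class Result, class RandomIterator>
Result percentile_as(RandomIterator start, RandomIterator end, double percent) {
    typedef typename std::iterator_traits<RandomIterator>::difference_type
        DifferenceType;
    const DifferenceType n = end - start;
    assert(n > 0);  // an empty range has no percentile

    const double rank = (n + 1) * percent / 100.0;  // 1-based, as in the original
    DifferenceType intRank = static_cast<DifferenceType>(rank);
    double fraction = rank - static_cast<double>(intRank);

    // Assumption: ranks outside [1, n] are clamped to the end elements.
    if (intRank < 1)  { intRank = 1; fraction = 0.0; }
    if (intRank >= n) { intRank = n; fraction = 0.0; }

    const double lower = static_cast<double>(*(start + (intRank - 1)));
    if (fraction == 0.0) return static_cast<Result>(lower);

    const double upper = static_cast<double>(*(start + intRank));
    return static_cast<Result>(lower * (1.0 - fraction) + upper * fraction);
}
```

With the ten-element array from earlier, percentile_as<double>(array, array+10, 25) gives 1.75, and percentile_as<int>(array, array+10, 25) gives 1 because the final conversion truncates; a caller wanting rounding would have to apply it explicitly.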

Problem 4

When doing code inspections you need to cultivate a suspicious mind. In that light consider the following simple function and comment on what you would check and what minimal changes you would require.

void foo(){
  mytype* mt_ptr = new mytype;
  bar(mt_ptr);
  delete mt_ptr;
}
