CVu Journal Vol 12, #4 - Jul 2000 + Letters to the Editor

Title: The Wall

Author: Administrator

Date: Sat, 08 July 2000 13:15:38 +01:00

Errata

Dear Colin,

Thank you for copying me into your e-mail to Francis. I've just looked at the code and realised that there was an error in the algorithm: the EBCDIC character set will provoke it, but other character selections can trigger it too.

The line

map[*these >> 5] |= (1 << (*these & 7));

should be replaced with

map[*these >> 3] |= (1 << (*these & 7));

and

if (map[*src >> 5] & (1 << (*src & 7))) src++;

should be replaced with

if (map[*src >> 3] & (1 << (*src & 7))) src++;

I believe this should fix the problems you have seen. My thanks for your feedback.

Catriona O'Connell
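The corrected lines treat map as a 256-bit set packed into 32 bytes. The expression code >> 3 picks one of the 32 bytes, and code & 7 picks one of that byte's 8 bits.

The sketch below assumes 8-bit chars. The names map_set, map_test and buggy_bit are hypothetical and do not come from eliminate(). buggy_bit shows why the >> 5 version misbehaves: it only reaches bytes 0 to 7, so four different codes share each bit. In ASCII, for example, 'A', 'I', 'Q' and 'Y' all collide.

```c
#include <assert.h>

#define MAP_BYTES 32  /* 32 bytes x 8 bits = 256 bits, one per 8-bit code */

/* Hypothetical helper: record code c in the set (corrected indexing). */
static void map_set(unsigned char map[MAP_BYTES], unsigned char c)
{
    assert((c >> 3) < MAP_BYTES);  /* always true for 8-bit codes */
    map[c >> 3] |= (unsigned char)(1u << (c & 7));
}

/* Hypothetical helper: non-zero if code c is in the set. */
static int map_test(const unsigned char map[MAP_BYTES], unsigned char c)
{
    return (map[c >> 3] & (1u << (c & 7))) != 0;
}

/* Bit number (0..63) that the erroneous ">> 5" version would use for c:
   byte c >> 5 (only 0..7) and bit c & 7, so four codes share each bit. */
static unsigned buggy_bit(unsigned char c)
{
    return (unsigned)(c >> 5) * 8u + (c & 7u);
}
```

The same indexing covers EBCDIC codes such as 0xC1 ('A'). The one condition is that the code reaches map_set() as an unsigned value.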

Dear Paul,

Oh dear. I am now very confused. I checked the published code and it is correct; I had been looking at a previous version of the code on another server (dark mutterings about version control inserted here...). As far as I can tell, the algorithm maps each value of a char to a unique bit in the array of 32 bytes. Unfortunately I do not have a C compiler on the mainframes available to me, so I cannot reproduce your results. It would be useful if you could produce some diagnostics: for example, a dump of which bits are set by a selection of characters.

Catriona O'Connell
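The kind of diagnostic asked for here might look like the following. dump_bits() is a hypothetical function, and the sketch assumes 8-bit chars. It applies the corrected indexing and prints the byte and bit that each character of a selection lands on. It then returns how many distinct bits ended up set, so a collision shows up as a count lower than the number of distinct characters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAP_BYTES 32

/* Hypothetical diagnostic: set one map bit per character of sel, print
   where each one landed, and return the number of distinct bits set. */
static int dump_bits(const char *sel)
{
    unsigned char map[MAP_BYTES];
    int i, b, count = 0;

    assert(sel != NULL);
    memset(map, 0, sizeof map);
    for (; *sel != '\0'; sel++) {
        unsigned char c = (unsigned char)*sel;  /* code as 0..255 */
        map[c >> 3] |= (unsigned char)(1u << (c & 7));
        printf("code 0x%02X -> byte %2u, bit %u\n",
               (unsigned)c, (unsigned)(c >> 3), (unsigned)(c & 7));
    }
    for (i = 0; i < MAP_BYTES; i++)  /* count the bits now set */
        for (b = 0; b < 8; b++)
            if (map[i] & (1u << b))
                count++;
    return count;
}
```

Running it on the same selection on both machines would show directly whether the bytes and bits differ.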

And a response

Dear Catriona,

I had just sent you an email when I saw that you had sent me another one earlier today, before I replied. That you don't have access to a compiler on a machine which uses EBCDIC shouldn't really matter. A text file is attached (its contents are replicated below your email) which lists the EBCDIC codes for the characters common to EBCDIC and (strict seven-bit) ASCII. There are one hundred and twenty-nine lines (excluding extra blank lines at the end): the first contains a C comment, and each of the rest contains just a code and a newline character. They are in ASCII order, so the last code is the EBCDIC code of the last ASCII character, and so on. I will not be very reachable by email for a few weeks.

Colin Paul Gloster

For obvious reasons I am not going to publish 128 single-character lines. The following is Paul's list of the EBCDIC codes (in hex) equivalent to the 7-bit ASCII coding, 16 per row in ASCII order: row n, column m gives the EBCDIC code for ASCII 16n + m. FG.

/*ASCII from zero to 127 in EBCDIC*/

00 01 02 03 37 2D 2E 2F 16 05 25 0B 0C 0D 0E 0F 
10 11 12 13 3C 3D 32 26 18 19 3F 27 1C 1D 1E 1F 
40 4F 7F 7B 5B 6C 50 7D 4D 5D 5C 4E 6B 60 4B 61 
F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 7A 5E 4C 7E 6E 6F 
7C C1 C2 C3 C4 C5 C6 C7 C8 C9 D1 D2 D3 D4 D5 D6 
D7 D8 D9 E2 E3 E4 E5 E6 E7 E8 E9 4A E0 5A 5F 6D 
79 81 82 83 84 85 86 87 88 89 91 92 93 94 95 96 
97 98 99 A2 A3 A4 A5 A6 A7 A8 A9 C0 6A D0 A1 07
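Paul's table can be transcribed directly into a C lookup array, 16 entries per row in ASCII order. In the sketch below, the names ascii_to_ebcdic, ebcdic_of and high_bit_count are hypothetical.

Counting the entries with the top bit (0x80) set shows how many of the shared characters would turn negative in a signed 8-bit char. All the letters and digits are among them.

```c
#include <assert.h>

/* Paul's table: index = 7-bit ASCII code, value = EBCDIC code. */
static const unsigned char ascii_to_ebcdic[128] = {
    0x00,0x01,0x02,0x03,0x37,0x2D,0x2E,0x2F,0x16,0x05,0x25,0x0B,0x0C,0x0D,0x0E,0x0F,
    0x10,0x11,0x12,0x13,0x3C,0x3D,0x32,0x26,0x18,0x19,0x3F,0x27,0x1C,0x1D,0x1E,0x1F,
    0x40,0x4F,0x7F,0x7B,0x5B,0x6C,0x50,0x7D,0x4D,0x5D,0x5C,0x4E,0x6B,0x60,0x4B,0x61,
    0xF0,0xF1,0xF2,0xF3,0xF4,0xF5,0xF6,0xF7,0xF8,0xF9,0x7A,0x5E,0x4C,0x7E,0x6E,0x6F,
    0x7C,0xC1,0xC2,0xC3,0xC4,0xC5,0xC6,0xC7,0xC8,0xC9,0xD1,0xD2,0xD3,0xD4,0xD5,0xD6,
    0xD7,0xD8,0xD9,0xE2,0xE3,0xE4,0xE5,0xE6,0xE7,0xE8,0xE9,0x4A,0xE0,0x5A,0x5F,0x6D,
    0x79,0x81,0x82,0x83,0x84,0x85,0x86,0x87,0x88,0x89,0x91,0x92,0x93,0x94,0x95,0x96,
    0x97,0x98,0x99,0xA2,0xA3,0xA4,0xA5,0xA6,0xA7,0xA8,0xA9,0xC0,0x6A,0xD0,0xA1,0x07
};

/* Hypothetical helper: EBCDIC code for a 7-bit ASCII code. */
static unsigned char ebcdic_of(unsigned char ascii)
{
    assert(ascii < 128);
    return ascii_to_ebcdic[ascii];
}

/* Number of table entries with the top bit set, i.e. the codes that a
   signed 8-bit char would hold as negative values. */
static int high_bit_count(void)
{
    int i, n = 0;
    for (i = 0; i < 128; i++)
        if (ascii_to_ebcdic[i] & 0x80)
            n++;
    return n;
}
```

For this table the count is 66: the 52 letters, the 10 digits, and the four characters \, {, } and ~.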

Days Later

Dear Paul and Francis,

I suspect that the effect we are seeing with eliminate() results from a char-to-int conversion. The standard does not specify whether plain char is signed or unsigned. So when a char whose bit pattern has the high bit set is converted to an int, the result may be a negative integer.

However, according to K&R (p. 44), "the standard requires that any character in the machine's standard printing character set will never be negative". It then goes on to say "But arbitrary bit patterns stored in character variables may appear to be negative on some machines, yet positive on others".

A right-shift of an unsigned quantity always fills vacated bits with zero. Right-shifting a negative signed quantity, by contrast, fills them with sign bits (an arithmetic shift) on some machines and with zeros (a logical shift) on others; the standard leaves the choice to the implementation. All the alphameric characters in EBCDIC have the highest-value bit set, so that bit may be being interpreted as a sign bit. If this is the case, I would expect storage outside the map[32] array to be accessed, giving rise to the spurious effects you reported. With an arithmetic shift, the index is negative, so the accesses fall just before the start of the array. The offset is quite small, so it probably falls either in the program storage or within a 4K page allocated by the program (hence no Abend 0C4).
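The suspected fault can be imitated on any compiler by passing a signed char explicitly. The sketch below assumes two's-complement 8-bit chars, in which the EBCDIC code 0xC1 ('A') is held as -63. The names naive_index and safe_index are hypothetical.

The usual portable remedy, shown in safe_index(), is to convert to unsigned char before shifting. That gives a code from 0 to 255 whatever the signedness of plain char. It is an alternative to the masking change proposed below.

```c
#include <assert.h>

/* Mimics map[*these >> 3] when plain char is signed: c is promoted to int
   and a negative value stays negative. Right-shifting a negative int is
   implementation-defined; with an arithmetic shift -63 >> 3 is -8, an
   index before the start of map[32]. */
static int naive_index(signed char c)
{
    return c >> 3;
}

/* Converting to unsigned char first gives a code in 0..255, so the
   index is always 0..31: -63 becomes 0xC1 (193) and 193 >> 3 is 24. */
static int safe_index(signed char c)
{
    int idx = (unsigned char)c >> 3;
    assert(idx >= 0 && idx < 32);
    return idx;
}
```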

The following code changes should resolve the problem:

map[*these >> 3] |= (1 << (*these & 7));

becomes

map[(*these >> 3) & ~(~0 << 5)] |= 
                    (1 << (*these & 7));

and

if (map[*src >> 3 ] & (1 << (*src & 7)))

becomes

if (map[(*src >> 3) & ~(~0 << 5) ] & 
                     (1 << (*src & 7)))

Could you let me know if this resolves the problem?

Catriona O'Connell
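The masking fix can be checked exhaustively. The sketch below assumes 8-bit two's-complement chars, and fix_matches_unsigned is a hypothetical name. For every value from -128 to 127, it compares the letter's masked index, (v >> 3) & 31, and its bit, v & 7, against the same computation on the unsigned code.

One caveat concerns the mask. Here it is spelled ~(~0u << 5). The letter's ~0 is a negative int, and C99 and later leave left-shifting a negative value undefined. The unsigned spelling gives the same mask, 31, without that concern.

```c
#include <assert.h>

#define LOW5 (~(~0u << 5))  /* 0x1F: keeps the 5 bits that index map[32] */

/* Hypothetical check: for every value a signed 8-bit char can hold, the
   masked index and bit agree with those computed from the unsigned code.
   Assumes two's complement, where either an arithmetic or a logical >>
   leaves the low 5 bits of v >> 3 equal to bits 3..7 of the code. */
static int fix_matches_unsigned(void)
{
    int v;
    assert(LOW5 == 31u);
    for (v = -128; v <= 127; v++) {
        unsigned char u = (unsigned char)v;           /* the code 0..255 */
        unsigned masked = (unsigned)(v >> 3) & LOW5;  /* the letter's fix */
        if (masked != (unsigned)(u >> 3))
            return 0;
        if ((unsigned)(v & 7) != (unsigned)(u & 7))  /* bit within byte */
            return 0;
    }
    return 1;
}
```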

And that is as far as the story goes. On a system where the normal character set includes 8-bit codes with the high bit set, char must be either unsigned or wider than 8 bits. However, there will still be a problem if you write code, compile it for an ASCII system (where plain char may be signed) and then use it to read an EBCDIC-coded file. FG.
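Two standard-library facts help with that last case:

- CHAR_MIN from <limits.h> is negative exactly when plain char is signed.
- getc() returns each byte as an unsigned char converted to int, so bytes read into an int never go negative.

In the sketch below, plain_char_is_signed and read_codes are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Non-zero when plain char is signed on this implementation. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: read up to n bytes from fp into out. getc() yields
   each byte as an unsigned char converted to int (or EOF), so an EBCDIC
   'A' (0xC1) arrives as 193 even where plain char is signed. */
static size_t read_codes(FILE *fp, int *out, size_t n)
{
    size_t i = 0;
    int ch;

    assert(fp != NULL && out != NULL);
    while (i < n && (ch = getc(fp)) != EOF)
        out[i++] = ch;
    return i;
}
```

Codes read this way can index the 32-byte map directly with code >> 3 and code & 7.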
