    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: A Tale of Old Java</title>
        <link>https://members.accu.org/index.php/journals/996</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>


        <h2>Journal Articles</h2>


<div class="xar-mod-head"><span class="xar-mod-title">CVu Journal Vol 12, #3 - May 2000 + Programming Topics</span></div>





<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;A Tale of Old Java</h1>
<p><strong>Date:</strong> Wed, 03 May 2000 13:15:36 +01:00</p>
<p><strong>Body:</strong>&nbsp;<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e18" id="d0e18"></a></h2>
</div>
<p>I'd venture to suggest that C and/or C++ has featured in the past
of all but the very youngest Java programmers. Even
if you are primarily a C/C++ programmer, you may at some point come
into contact with Java. If so, you may be in for one or two nasty
surprises, one of which I hope you'll be able to avoid after
reading about the following real-life example.</p>
<p>This is a snippet of Java I was recently trying to fix (apart
from that array declaration in the first line, it could be C++,
couldn't it?):</p>
<pre class="programlisting">
  byte[] cbuf = new byte[destBuf.length];
  int k, i;
  for(i = k = 0; i &lt; destIndex; i++, k++) {
    cbuf[k] = destBuf[i];
    // skip the run-length codes
    // between 0x80 and 0xC0
    if(cbuf[k] &gt;= 0x80 &amp;&amp; cbuf[k] &lt; 0xC0) {
      cbuf[++k] = destBuf[++i];
    }
  ...
  ... etc.
</pre>
<p>There is a problem here, which is at least partly due to
confusion over the difference between value and representation. See
if you can spot what the problem is. While you ponder, I'll tell
you a little more about the application it was taken from.</p>
<p>Like many of my colleagues, I have a Palm Pilot. This is a handy
little device; one of the many useful things it can do is store and
display documents. If you enjoy reading a little fiction just
before you go to sleep, and your latest bedtime material is a
download from Project Gutenberg [<a href=
"#Gutenberg">Gutenberg</a>], you're far more likely to persuade
your spouse to share the bed with a PDA than a laptop! Now unused
RAM isn't necessarily abundant on Pilots, so there is a de facto
standard document file format which includes compression [<a href=
"#DOC">DOC</a>]. A freeware Java tool [<a href="#Brisk">Brisk</a>]
is available to prepare such documents from plain text files - the
problematic snippet above comes from that tool's compression
routine, hence the reference in the comment to run-length
codes.</p>
<p>Spotted it yet? If not, here's a little digression which may
point you in the right direction. What did I mean earlier by the
difference between value and representation? It's very simple, and
it's at the heart of our programming activities, so naturally we
tend to forget about it! When we see something like <tt class=
"literal">0x80</tt>, we happily interpret <tt class=
"literal">0x80</tt> as meaning 80<sub>16</sub> , which is
128<sub>10</sub>. We also tend to interpret <tt class=
"literal">0x80</tt> as the bit pattern 10000000, and it is then all
too easy to assume that this represents 10000000<sub>2</sub> - they
do, after all, look remarkably similar! Well, a bit pattern means
precisely what I want it to: nothing more, and nothing less. It all
depends upon the precise definition of the relationship between bit
pattern and the value it represents, but much of the time it is
convenient to blur the distinction between value and
representation. Of course, there are occasions when it is crucial
that we should avoid this blurring. Later on, I will try and make
it clear when I want <tt class="literal">0x80</tt> to mean the
numeric value 80<sub>16</sub> , and when I want it to mean the bit
pattern 10000000.</p>
<p>Now it would also be nice if the languages we use helped us to
make this distinction when we need to. Well I don't know about you,
but to me, &quot;byte&quot; means a concrete unit of storage, whereas &quot;short&quot;
and &quot;long&quot; represent more abstract concepts (after all, the first
is a noun and the latter two are adjectives we use as nouns). When
I see the types <tt class="type">short</tt> and <tt class=
"type">long</tt> in Java, I think &quot;short integer&quot; and &quot;long
integer&quot;. When I see <tt class="type">byte</tt>, I don't think
&quot;integer stored in a byte&quot;, rather I tend towards thinking &quot;8-bit
unit of storage&quot;, and then I slide down the slippery slope to
thinking bit pattern. I wonder if anyone else has this &quot;problem&quot;?
Why wasn't the type called a <tt class="type">tiny</tt>? At least
we have an unambiguous nomenclature in the field of communications,
where we use &quot;octet&quot; to refer to an 8-bit segment of a stream. This
octet may at some stage end up in a byte of storage, but what it
represents is neither specified nor implied by its name alone.</p>
<p>Enough digression, and enough clues. Time to put you out of your
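misery if you're not there yet: Java's integer types <tt class="type">byte</tt>,
<tt class="type">short</tt>, <tt class="type">int</tt> and <tt class="type">long</tt>
are all signed (only <tt class="type">char</tt> is unsigned), so a
variable of type <tt class="type">byte</tt> can only take values from -128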
to +127. This means that <tt class="literal">cbuf[k] &gt;=
0x80</tt> is never true! (And coincidentally, <tt class=
"literal">cbuf[k] &lt; 0xC0</tt> is always true, although it will
never be evaluated.)</p>
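<p>If you'd rather not take my word for it, it is easy enough to try
every possible byte value. The following is only a minimal sketch; the
class and method names are mine and do not come from the tool:</p>

```java
class NeverTrueDemo {
    // Counts how many of the 256 possible byte values satisfy the
    // original (buggy) test: b >= 0x80 && b < 0xC0.
    static int countMatches() {
        int matches = 0;
        for (int v = Byte.MIN_VALUE; v <= Byte.MAX_VALUE; v++) {
            byte b = (byte) v;
            if (b >= 0x80 && b < 0xC0) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println("matches: " + countMatches()); // prints "matches: 0"
    }
}
```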
<p>Let us look at this more closely, because I might have convinced
you that it isn't immediately obvious what <tt class=
"literal">0x80</tt> means in this context. You might expect
<tt class="literal">0x80</tt> to be implicitly cast to a byte for
the purposes of the comparison, and the effect of that casting
might be to do precisely nothing (other than take the least
significant byte...) You might then expect the comparison to be the
logical equivalent of <tt class="literal">bytevalue &gt;= -128</tt>
which you might reasonably expect to be always true.</p>
<p>OK - I'll stop trying to confuse you and tell you what is
actually happening. There are two aspects of Java which we need to
understand, the first of which is the nature of literals. In Java
source code, <tt class="literal">0x80</tt> is (for the purposes of
the Java Language Specification [<a href="#Gosling-">Gosling-</a>])
a <tt class="type">HexIntegerLiteral</tt>, which is one of the
types of <tt class="type">IntegerLiteral</tt>. An <tt class=
"type">IntegerLiteral</tt> is of type <tt class="type">int</tt>,
unless it has the suffix <tt class="literal">l</tt> or <tt class=
"literal">L</tt>, in which case it is of type <tt class=
"type">long</tt>. So <tt class="literal">0x80</tt> is effectively
of type <tt class="type">int</tt>, with value 128. (The <tt class=
"type">HexIntegerLiteral</tt> with value -128 is <tt class=
"literal">0xFFFFFF80</tt>.)</p>
<p>The second aspect of Java which is relevant here is the
application of binary numeric promotion. The full details are too
lengthy to reproduce in their entirety [<a href=
"#Promotion">Promotion</a>]; all we need to know is that in this
case, both operands of the &gt;= comparison operator are promoted
to type <tt class="type">int</tt> by &quot;widening conversion.&quot;
Widening conversion preserves value, in this case by sign-extension
of the twos-complement representation [<a href=
"#Widening">Widening</a>]. This means that when the application
runs, the comparison operator &quot;sees&quot; an integer value in the range
-128 to 127 on the left hand side, and an integer value of 128 on
the right. It's now obvious why <tt class="literal">cbuf[k] &gt;=
0x80</tt> is never true.</p>
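<p>Both aspects can be checked with a few lines of Java. This is a
sketch under my own naming (<tt class="literal">PromotionDemo</tt>,
<tt class="literal">typeOf</tt> and <tt class="literal">widen</tt> are
illustrative only). The <tt class="literal">(byte)</tt> cast is used
here simply to store the bit pattern 10000000 in a byte:</p>

```java
class PromotionDemo {
    // Overloads let the compiler report the static type of an expression.
    static String typeOf(int value)  { return "int"; }
    static String typeOf(long value) { return "long"; }

    // Assigning a byte to an int is a widening conversion; the value is preserved.
    static int widen(byte b) {
        return b;
    }

    public static void main(String[] args) {
        System.out.println(typeOf(0x80));    // int
        System.out.println(typeOf(0x80L));   // long
        System.out.println(0xFFFFFF80);      // -128

        byte b = (byte) 0x80;                // stores the bit pattern 10000000
        int widened = widen(b);
        System.out.println(widened);                       // -128
        System.out.println(Integer.toHexString(widened));  // ffffff80
        System.out.println(b >= 0x80);                     // false
    }
}
```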
<p>(Suppose that this was C++, and we'd used the type <tt class=
"type">unsigned char</tt> instead of <tt class="type">byte</tt>,
just to make life a little easier. Are you sure you know how the
analogous rules would apply to the evaluation of this expression?
What about if we'd used <tt class="type">signed char</tt>, or even
just <tt class="type">char</tt>?)</p>
<p>So, how do we fix the code so that it does what we want it to?
Well, we want to trigger the execution of the body of the if block
if the representation of <tt class="varname">cbuf[k]</tt> is in the
range <tt class="literal">0x80</tt> up to, but not including, <tt class=
"literal">0xC0</tt>, which means that the byte value is in the
range -128 to -65 inclusive. (After what I've said about distinguishing value
and representation, you might take offence at my referring to a
representation as having a range! I suppose I ought to characterise
the representation as a mapping, where the byte values 0 to
127<sub>10</sub> map to the bit pattern representations 00000000 to
01111111, and the values -128<sub>10</sub> to -1 map to the bit
pattern representations 10000000 to 11111111, the bit patterns in
each half of the mapping incrementing as though they were binary
numbers. I hope that makes things a bit more palatable.) How can we
cleanly change the code, bearing this in mind? One possible fix is
to change the comparison line to</p>
<pre class="programlisting">
if(cbuf[k] &lt; (byte)0xC0) {
</pre>
<p>I wonder what you think about this? I can't decide whether it's
evil or cute, and here's why. That cast on the right of the
comparison might trick you into thinking that <tt class=
"type">byte</tt> values are being compared; after all, the left
hand side is a <tt class="type">byte</tt>, isn't it? Don't forget
binary numeric promotion. Oh, and I didn't mention narrowing
conversion, did I? [<a href="#Narrowing">Narrowing</a>] That
comparison operator is ultimately going to see two integer
operands; the left hand side undergoes widening conversion as
before, so that it is notionally an <tt class="type">int</tt> in
the range -128 to 127, but what happens on the right hand side?
Recall that <tt class="literal">0xC0</tt> is a <tt class=
"type">HexIntegerLiteral</tt>, and so is notionally the 32-bit
<tt class="literal">0x000000C0</tt>. When it is cast to <tt class=
"type">byte</tt>, it undergoes narrowing conversion, which does not
necessarily preserve value. In a narrowing conversion, the relevant
number of most significant bits are dropped so that the result fits
into the new representation. In our case we end up with the byte
value with representation 8-bit <tt class="literal">0xC0</tt>,
which we now know is a value of -64. Finally, it undergoes widening
conversion to the <tt class="type">int</tt> required for the
comparison operator, and it ends up with the 32-bit representation
<tt class="literal">0xFFFFFFC0</tt>, but still value -64.</p>
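<p>The narrowing step is easy to observe in isolation. Again this is
just a sketch, and the names <tt class="literal">NarrowingDemo</tt> and
<tt class="literal">narrow</tt> are my own:</p>

```java
class NarrowingDemo {
    // Narrowing int -> byte keeps only the low 8 bits of the int.
    static byte narrow(int value) {
        return (byte) value;
    }

    public static void main(String[] args) {
        byte limit = narrow(0xC0);                 // 0x000000C0 -> 8-bit 0xC0
        System.out.println(limit);                 // -64
        int promoted = limit;                      // widened again, as for the comparison
        System.out.println(Integer.toHexString(promoted)); // ffffffc0
        System.out.println(narrow(0x1C0));         // -64 (the higher bits are dropped)
    }
}
```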
<p>This is evil because &quot;hidden&quot; conversions are occurring, but
cute because the effect is the same as if two bytes were being
compared. Another &quot;simple&quot; fix is</p>
<pre class="programlisting">
if(cbuf[k] &lt; 0xFFFFFFC0) {
</pre>
<p>or, equivalently,</p>
<pre class="programlisting">
if(cbuf[k] &lt; -64) {
</pre>
<p>I don't like either of these, because they begin to obscure what
we are trying to achieve. How about adding a bit of redundancy</p>
<pre class="programlisting">
if(cbuf[k] &gt;= (byte)0x80 
          &amp;&amp; cbuf[k] &lt; (byte)0xC0) {
</pre>
<p>on the grounds that it looks like the preceding comment, and
that the first comparison may be optimized out anyway? Again, I
don't like this. What would you do? (No, don't recode it in C or
your favourite language - I want engineering Java solutions!)</p>
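<p>Whatever you think of their style, the candidate fixes above should
at least agree with one another. Here is a rough check over all 256
byte values; the class and method names are hypothetical, chosen only
for this illustration:</p>

```java
class CandidateFixes {
    static boolean castUpperBound(byte b)  { return b < (byte) 0xC0; }
    static boolean literalBound(byte b)    { return b < -64; } // same as b < 0xFFFFFFC0
    static boolean redundantBounds(byte b) { return b >= (byte) 0x80 && b < (byte) 0xC0; }

    // True if all three variants give the same answer for every byte value.
    static boolean allAgree() {
        for (int v = Byte.MIN_VALUE; v <= Byte.MAX_VALUE; v++) {
            byte b = (byte) v;
            boolean expected = castUpperBound(b);
            if (literalBound(b) != expected || redundantBounds(b) != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allAgree()); // true
    }
}
```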
<p>The end of this tale is that just before I sent my discovery to
the code's author, I discovered (too late) that I was looking at
old Java (well old code, at least). Somehow I had gotten hold of a
rather dated version - the latest version came with a list of bug
fixes which included the bug I'd been looking at. (As an aside, you
will know that bugs in compression and decompression can be what I
call &quot;loud&quot;. The tiniest error can have a huge effect, but that's a
story for another time!) I was quite keen to see this particular
fix, and I think you'll find it interesting:</p>
<pre class="programlisting">
destBuf = new byte[(buf.length*3)/2];
int k, i, destTemp;
for(i = k = 0; i &lt; destIndex; i++, k++){
  destBuf[k] = destBuf[i];
  destTemp = destBuf[k] &amp; 0xff;
  // skip the run-length codes
  if(destTemp &gt;= 0x80 &amp;&amp; destTemp &lt; 0xc0){
    destBuf[++k] = destBuf[++i];
  }
...
... etc.
</pre>
<p>I will come clean. When I first saw this, though I knew it
worked, I could not quite see how. We have an extra line,
<tt class="literal">destTemp = destBuf[k] &amp; 0xff</tt>, which
you might think performs a redundant mask of the <tt class=
"literal">byte destBuf[k]</tt>, and then performs a widening
conversion to an <tt class="type">int</tt> before assigning a value
in the range -128 to 127 to <tt class="varname">destTemp</tt>. Hmmm
- that would put us almost back where we started. It helps to know
that binary numeric promotion also applies to integer bitwise
operators. So <tt class="literal">destBuf[k] &amp; 0xff</tt>
actually has the effect of promoting the <tt class="literal">byte
destBuf[k]</tt> by sign extension to 32 bits, and then zeroing the
top 24 bits. That mask isn't redundant after all - it's vital to
zero those sign extension bits. So <tt class=
"varname">destTemp</tt>, an <tt class="type">int</tt>, has a value
in the range 0 to 255 when we come to that familiar <tt class=
"literal">if</tt> statement. This time, we really do need that
extra <tt class="literal">destTemp &gt;= 0x80</tt> comparison.</p>
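<p>The effect of the mask can be seen directly. This sketch wraps the
expression <tt class="literal">destBuf[k] &amp; 0xff</tt> in a helper;
<tt class="literal">MaskDemo</tt>, <tt class="literal">unsignedValue</tt>
and <tt class="literal">isRunLengthCode</tt> are names I have made up
and are not part of the tool:</p>

```java
class MaskDemo {
    // Promote to int (sign extension), then clear the top 24 bits.
    static int unsignedValue(byte b) {
        return b & 0xff;
    }

    // The test from the fixed code, applied to the masked value.
    static boolean isRunLengthCode(byte b) {
        int temp = unsignedValue(b);
        return temp >= 0x80 && temp < 0xc0;
    }

    public static void main(String[] args) {
        byte b = (byte) 0x80;
        System.out.println((int) b);                       // -128
        System.out.println(unsignedValue(b));              // 128
        System.out.println(isRunLengthCode(b));            // true
        System.out.println(isRunLengthCode((byte) 0xC0));  // false
        System.out.println(isRunLengthCode((byte) 0x41));  // false
    }
}
```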
<p>I'm not going to comment on what I think is the best fix; I
think you should, though! A final foray into &quot;interesting&quot; code.
What do you think Java will make of the following code
fragment?</p>
<pre class="programlisting">
  byte b = -128;
  int i = -2147483648;
  b = (byte)-b;
  i = -i;
</pre>
<p>Remember that a byte variable is unable to hold the value 128,
and an int is unable to hold 2147483648. There are a number of
options here. Perhaps the code won't compile. (It certainly won't
if you take away that (byte) cast. Why?) Perhaps there'll be an
overflow at runtime, and an ArithmeticException will be thrown.
Perhaps the code will run quite happily, and i and b will end up
with values you might not expect. Since there are plenty of free
JDKs around, why don't you spend a few moments investigating this?
If after you do this you come to the conclusion that Java's
behaviour is odd, then I should remind you that at least this is
defined behaviour. What value should this fragment of C++ leave in
<tt class="varname">i</tt>?</p>
<pre class="programlisting">
  char c = 255;
  int i = c;
</pre>
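<p>Returning to the Java fragment for a moment: if you would like to
run it rather than reason about it, a small harness along these lines
should do. The class and method names are mine, and I have deliberately
left the results out of the comments:</p>

```java
class NegationDemo {
    static byte negate(byte b) {
        // -b is computed as an int, so the cast back to byte is required.
        return (byte) -b;
    }

    static int negate(int i) {
        return -i;
    }

    public static void main(String[] args) {
        System.out.println(negate((byte) -128));
        System.out.println(negate(-2147483648));
    }
}
```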
<p>I think that's enough for now. Although there are further
interesting aspects to this topic, I hope I've aroused your
interest enough for you to go and investigate them yourselves. When
I first realised that I'd spent some valuable time fixing code that
had already been fixed, I must admit I felt I'd been wasting my
time. But I decided to share the experience because I recognised
that here were some important lessons. I hope you agree.</p>
<div class="bibliography">
<div class="titlepage">
<h2><a name="d0e284" id="d0e284"></a>References</h2>
</div>
<div class="bibliomixed"><a name="Gutenberg" id="Gutenberg"></a>
<p class="bibliomixed">[Gutenberg] Project Gutenberg Official Home
Site: <span class="bibliomisc"><a href="http://promo.net/pg/"
target="_top">http://promo.net/pg/</a></span> (Project Gutenberg is
generating a &quot;plain&quot; text archive of as many books as it can,
copyright issues permitting. By the time you read this, more than
2,500 texts should be available. From the Official Home Site, you
should be able to find your nearest mirror site.)</p>
</div>
<div class="bibliomixed"><a name="DOC" id="DOC"></a>
<p class="bibliomixed">[DOC] The DOC Format: <span class=
"bibliomisc"><a href="http://pyrite.linuxbox.com/" target=
"_top">http://pyrite.linuxbox.com/</a></span> (Unless a particular
item is difficult to find on a particular website, I try to give
the URL for the home page only, as so many sites seem to be
re-arranged at regular and too-frequent intervals, invalidating
specific URLs. On this home page, you should be able to find a
direct link to &quot;The DOC Format&quot;.)</p>
</div>
<div class="bibliomixed"><a name="Brisk" id="Brisk"></a>
<p class="bibliomixed">[Brisk] Brisk Software Home Page:
<span class="bibliomisc"><a href="http://www.qni.com/~brisk/"
target="_top">http://www.qni.com/~brisk/</a></span> (Look for
MakeDocJ in Pilot Software.)</p>
</div>
<div class="bibliomixed"><a name="Gosling-" id="Gosling-"></a>
<p class="bibliomixed">[Gosling-] James Gosling, Bill Joy &amp; Guy
Steele: The Java Language Specification. Addison-Wesley, Reading,
Mass., 1996. - Section 3.10.1 Integer Literals. (This book is
written in a quite readable way, especially for a reference work,
but for me it has a major irritation. Perhaps to try and &quot;lighten&quot;
the text (a worthy aim), or perhaps for less worthy reasons, the
authors have thought fit to include numerous quotations. A good
quotation is always apposite, and often carries a succinct message.
Practically none of the quotations in this book have either of
these qualities - they seem to have been largely selected on the
basis of containing a particular keyword, the meaning or message of
the quotation being irrelevant. This unfortunately seems to be a
growing tendency amongst authors, and to my mind is the opposite of
wit. However, there is humour (of a kind) in this book. Look up
&quot;prime&quot; in the index.)</p>
</div>
<div class="bibliomixed"><a name="Promotion" id="Promotion"></a>
<p class="bibliomixed">[Promotion] Ibid. - Section 5.6.2 Binary
Numeric Promotion</p>
</div>
<div class="bibliomixed"><a name="Widening" id="Widening"></a>
<p class="bibliomixed">[Widening] Ibid. - Section 5.1.2 Widening
Primitive Conversions</p>
</div>
<div class="bibliomixed"><a name="Narrowing" id="Narrowing"></a>
<p class="bibliomixed">[Narrowing] Ibid. - Section 5.1.3 Narrowing
Primitive Conversions</p>
</div>
</div>
</div>
</p>
</div>
</channel>
</rss>
