    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU :: A Letter on Java</title>
        <link>https://members.accu.org/index.php/articles/745</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">CVu Journal Vol 11, #1 - Nov 1998</span></div>





<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;A Letter on Java</h1>
<p><strong>Author:</strong>&nbsp;Silas S. Brown</p>
<p>
<strong>Date:</strong> Tue, 03 November 1998 13:15:28 +00:00</p>
<p><strong>Body:</strong>&nbsp;<div class="section" lang="en">
<div class="titlepage">
<h2><a name="d0e20" id="d0e20"></a></h2>
</div>
<p>Dear Francis,</p>
<p>I find George Wendle's dislike of the Java Date class
interesting. According to the 1.1.6 API specification, Date is not
deprecated as he heard, but many of its methods are because they
did not work well with internationalisation. Now, a Date is used to
store a time in milliseconds, and a Calendar (which can be extended
to support Chinese calendars etc as well as the Gregorian one) is
used to view a Date. The relationship is quite clearly explained in
the specification. The deprecated Date methods do handle the year
as an integer minus 1900, so this does leave scope for Y2K problems
in poor implementations (although there is nothing in the
specification that limits the year to two digits), but this is only
in the deprecated methods.</p>
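<p>A minimal sketch of the split described above, in which a
<tt class="classname">Date</tt> only stores milliseconds and a
<tt class="classname">Calendar</tt> is used to view it. The class name
<tt class="classname">DateVersusCalendar</tt>, its method names and the
choice of UTC are assumptions made for illustration; the instant used is
this letter's own timestamp.</p>

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

final class DateVersusCalendar {
    // 03 November 1998 13:15:28 UTC, as milliseconds since 1 January 1970.
    static final long LETTER_MILLIS = 910_098_928_000L;

    /** Views an instant through a Gregorian calendar in the given zone. */
    static int calendarYear(Date instant, TimeZone zone) {
        Calendar calendar = new GregorianCalendar(zone);
        calendar.setTime(instant);
        return calendar.get(Calendar.YEAR);
    }

    /** Calls the deprecated accessor, which reports the year minus 1900. */
    @SuppressWarnings("deprecation")
    static int deprecatedYear(Date instant) {
        return instant.getYear();
    }

    public static void main(String[] args) {
        Date instant = new Date(LETTER_MILLIS);
        // The Date itself holds nothing but the millisecond count.
        System.out.println(instant.getTime());
        // The Calendar interprets that count as a four-digit year.
        System.out.println(calendarYear(instant, TimeZone.getTimeZone("UTC"))); // 1998
        // The deprecated method uses the default time zone; this instant
        // falls in 1998 in every zone, so the result is 98.
        System.out.println(deprecatedYear(instant)); // 98
    }
}
```

<p>The deprecated accessor returns a full integer rather than two digits,
so a caller that adds 1900 gets the right year; the Y2K risk lies in
callers that print the raw value.</p>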
<p>One thing that I don't like about Java is its handling of
international characters. The intention is admirable, but why are
all those &quot;byte to character converters&quot; only briefly mentioned in
the specification and hidden away in the sun.io package without
support for anybody who wants to customise them? For example, I
could quite easily write the classes to add HZ encoding of Chinese
(which is common in email and Usenet but not supported by Java),
but, if I did so, then I would be relying on undocumented stuff and
reverse engineering, and all the risks that this entails. Honestly,
I can't even find a definitive list of the converters available,
but then, when I decompiled some of them (albeit with a not-so-good
decompiler that usually outputs opcodes or gives up), I'm hardly
surprised that they don't want to advertise them too much. Let me
give you five examples:</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>Any encoding with a state (e.g. a byte sequence that switches
into and out of the encoding, like JIS or one of the EBCDICs) will
throw an <tt class="exceptionname">InternalError</tt> in some
circumstances, most notably when called with one single character.
The reason is that an internal array is dimensioned to (maximum
bytes per character * number of characters), and, although the
&quot;<tt class="methodname">getMaxBytesPerChar</tt>&quot; methods do return
numbers that are big enough to include the &quot;<span class=
"emphasis"><em>switch into the encoding</em></span>&quot; sequences,
they are not big enough to also include the &quot;<span class=
"emphasis"><em>switch out of the encoding</em></span>&quot; ones, and
you can guess the rest. Didn't they test this stuff?</p>
</li>
<li>
<p>The &quot;JIS auto detect&quot; converters. These try to detect one of
three common types of Japanese encoding (JIS, Shift-JIS and EUC).
Unfortunately, they only look at the very start of a stream and
then stick by their judgement. In my application, the start of the
stream happened to be an HTTP header, and, because this was all
7-bit characters, the class always chose JIS by process of
elimination. Not to worry - in Japanese you can get away with
throwing all your data at each converter in turn and trying again
if you get an exception. It is not that difficult to write a
converter that copes with changing encoding systems on the fly (I
did it in a few lines of C++), but this can go wrong. I would at
least expect the state to not be finally set until Japanese
characters have actually been processed, though. But again, if I
modified the library then my code could be version or even
implementation dependent. Most of the reason why I bothered to use
Java anyway (rather than write my own in C++ and have done with it)
is so that improvements in Java automatically make my program
support more encodings. I suspect that the existing converter was
written by somebody who had nothing to do with Japan and was just
hacking out code to complete a library (if that sounds familiar to
anyone).</p>
</li>
<li>
<p>The UTF-8 and related converters (UTF-8 is sometimes used for
Chinese). When I saw these, I found to my horror that they can
sometimes generate more than one Java '<tt class="type">char</tt>'
for what is actually one character, and programmers are assured
throughout the specification that one '<tt class="type">char</tt>'
contains one character. This compromises the whole Java
internationalisation philosophy, potentially bringing back all the
old problems of &quot;<span class="emphasis"><em>this code assumes that
one <tt class="type">char</tt> equals one character and it won't
port to Chinese</em></span>&quot; and so on. Further, they use the
user-defined area of Unicode, which really messes things up for
people who are thinking of using that area themselves. Granted, if
you want to represent millions of characters in a language that
only uses two-byte Unicode, you have to do something awkward, but
they could at least have documented it.</p>
</li>
<li>
<p>The Korean KSC5601 converter contains a private member variable
called <tt class="varname">outputSize</tt>, which should be set to
0 in several places (most notably in the <tt class=
"methodname">reset()</tt> method) but is not. As a result,
undefined characters can sometimes be written to the output,
leading to buffer overruns and internal errors. (To see the effect,
try looping through all the Unicode characters while catching the
conversion exceptions and it should go wrong at about U+1100,
which I think is Korean.)</p>
</li>
<li>
<p>At one point the ISO-2022 superclass constructs a <tt class=
"classname">String</tt> with the platform's default encoding,
assuming that this encoding will not change the byte sequence. It
also handles unrecognised escape sequences poorly, and neither it
nor its subclasses properly distinguish between the various
planes of the CNS11643 Chinese encoding; this would show up as
wrong characters.</p>
</li>
</ol>
</div>
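<p>The third example can be reproduced in current Java, where the behaviour
is documented: a character above U+FFFF is stored as two
<tt class="type">char</tt> values taken from the reserved surrogate range
(U+D800 to U+DFFF). The sketch below is not the 1.1 converter discussed
above; it assumes a modern runtime with
<tt class="classname">java.nio.charset.StandardCharsets</tt>, and the class
and method names are hypothetical.</p>

```java
import java.nio.charset.StandardCharsets;

final class SurrogatePairs {
    /** UTF-8 encoding of U+10000, the first code point beyond the 16-bit range. */
    static final byte[] UTF8_U10000 = {(byte) 0xF0, (byte) 0x90, (byte) 0x80, (byte) 0x80};

    /** Decodes UTF-8 bytes into a Java String. */
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Counts whole characters (code points) rather than char values. */
    static int characterCount(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        String decoded = decodeUtf8(UTF8_U10000);
        // One character on the wire becomes two char values in memory.
        System.out.println(decoded.length());         // 2
        System.out.println(characterCount(decoded));  // 1
        System.out.println(Character.isHighSurrogate(decoded.charAt(0))); // true
    }
}
```

<p>Code that indexes a string by <tt class="type">char</tt> and assumes
each index is one character will split such a pair, which is the porting
problem the letter warns about.</p>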
<p>The other thing about the Java character converters is the
omissions. There are some very common encodings that are not
supported, and some very obscure ones that are. It seems to me that
this is because some encodings could be supported without writing
any more code, and it's easy to just add a different
character-mapping table. The whole thing seems as though somebody
was trying to impress the management by supporting as many
encodings as they could, without regard to which ones, like an
email program that supports hundreds of binary formats but not
uuencode. Some common encodings are supported, but others that I
would expect to find in such a large library are missing.</p>
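<p>Later Java releases (1.4 onwards) expose the set of supported encodings
through the documented <tt class="classname">java.nio.charset.Charset</tt>
class, so the list no longer has to be reverse engineered. The sketch below
assumes such a runtime; the class name
<tt class="classname">ConverterList</tt> and the candidate names checked in
<tt class="methodname">main</tt> are illustrative only, and the results
depend on the Java installation.</p>

```java
import java.nio.charset.Charset;

final class ConverterList {
    /** Canonical names of every charset this runtime can convert. */
    static String[] availableNames() {
        return Charset.availableCharsets().keySet().toArray(new String[0]);
    }

    /** True if the runtime has a converter registered under this name. */
    static boolean supports(String name) {
        return Charset.isSupported(name);
    }

    public static void main(String[] args) {
        for (String name : availableNames()) {
            System.out.println(name);
        }
        // Illustrative candidates; whether each is present varies by runtime.
        String[] candidates = {"UTF-8", "ISO-2022-JP", "HZ-GB-2312"};
        for (String candidate : candidates) {
            System.out.println(candidate + " supported: " + supports(candidate));
        }
    }
}
```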
<p>So, I'm glad that the ISO meeting is in Tokyo; as George says it
may discourage American participation, but I hope that a decent
Asian programmer seriously sorts them out. They need it. By the
time you read this, it should all have already happened, and it
will be interesting to see if any changes are made.</p>
<p>Another thing I don't like about Java is its half-finished Web
implementation. If you're going to make getting Web pages part of
the API, you might as well add support for pages that require
authentication - at present this requires user interaction, so you
can't write a proxy or CGI gateway with it. If you're going to put
picture display (e.g. GIF display) in the AWT, you might as well
put GIF writing in there as well, or at least if you're going to
have an AWT class for an off-screen buffer then you might as well
give it methods to read the points. Also you might as well make it
so that you can have an off-screen buffer without having to
instantiate anything else, so you can do graphical operations
without having to display anything at all (this would make it very
easy to write a CGI program that returns a GIF of a given Unicode
character in a given font, for example).</p>
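<p>For comparison, later Java releases (1.2 onwards) added
<tt class="classname">java.awt.image.BufferedImage</tt>, an off-screen
buffer that can be drawn into and read back point by point without opening
a window. This is a sketch under that assumption, not something available
in the version discussed here; the class name
<tt class="classname">OffscreenBuffer</tt> and the shape drawn are
hypothetical.</p>

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

final class OffscreenBuffer {
    /** Draws a red square in the middle of an in-memory image; no window is created. */
    static BufferedImage drawSquare(int size) {
        BufferedImage image = new BufferedImage(size, size, BufferedImage.TYPE_INT_RGB);
        Graphics2D graphics = image.createGraphics();
        try {
            graphics.setColor(Color.RED);
            graphics.fillRect(size / 4, size / 4, size / 2, size / 2);
        } finally {
            graphics.dispose();
        }
        return image;
    }

    /** Reads one point back as a 24-bit RGB value. */
    static int pixel(BufferedImage image, int x, int y) {
        return image.getRGB(x, y) & 0xFFFFFF;
    }

    public static void main(String[] args) {
        // Ask AWT not to look for a display, as on a server.
        System.setProperty("java.awt.headless", "true");
        BufferedImage image = drawSquare(8);
        System.out.println(Integer.toHexString(pixel(image, 4, 4))); // ff0000
        System.out.println(Integer.toHexString(pixel(image, 0, 0))); // 0
    }
}
```

<p>An image built this way can then be serialised with
<tt class="methodname">javax.imageio.ImageIO.write</tt>, for example as
PNG, which covers the CGI use case mentioned above.</p>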
<p>I was very excited about the Java JIT, what with its claims that
it can go faster than C++ because it can do more processor-specific
optimisations and so on, but the implementation on Sun's website
was a bit of a joke - it added several seconds to program execution
while it compiled, which may be all right in some applications but
not every time you get a CGI query! Can't the JIT save the data
structures it generates for use next time the program is run? The
alternative would be to have a non-portable LRWP (long-running web
process), which could take up quite a chunk of resources if done in
Java. In my case, I did most of the work in C++ and spawned Java only for the
encoding conversion, and eventually I thought &quot;this is silly&quot; and
wrote my own library in C++. And before anyone asks, yes I did
pinch a few mapping tables, but I could have got them from
elsewhere had I been online at the time, and anyway it's for
personal use (if running a Web server counts as personal use).</p>
<p>Oh, one final thing: How to make Java throw an access violation
or a core dump (at least, the version I've got): Create a process
with <tt class="methodname">Runtime.exec()</tt>, make sure that
that process finishes, and try to access its input and output
streams. And there's nothing in the documentation about this, not
even a boolean <tt class="methodname">hasItFinishedYet()</tt>
method.</p>
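<p>One way to approximate the missing check is
<tt class="methodname">Process.exitValue()</tt>, which is documented to
throw <tt class="exceptionname">IllegalThreadStateException</tt> while the
child is still running. The sketch below wraps it in a hypothetical
<tt class="methodname">hasFinished()</tt> helper and drains the child's
output before waiting for it. It assumes a runtime with
<tt class="classname">ProcessBuilder</tt> (Java 5 onwards) and a
<tt class="filename">java</tt> launcher under the
<tt class="varname">java.home</tt> directory; it does not reproduce the
crash described above.</p>

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;

final class ProcessStatus {
    /** Hypothetical helper: true once the child process has exited. */
    static boolean hasFinished(Process process) {
        try {
            process.exitValue(); // throws while the child is still running
            return true;
        } catch (IllegalThreadStateException stillRunning) {
            return false;
        }
    }

    /** Runs a command, drains its combined output, then waits for it to exit. */
    static Process runToCompletion(String... command)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).redirectErrorStream(true).start();
        try (InputStream output = process.getInputStream()) {
            byte[] buffer = new byte[4096];
            while (output.read(buffer) != -1) {
                // Discard the output; a real caller would collect it here.
            }
        }
        process.waitFor();
        return process;
    }

    /** Path of the java launcher belonging to the running installation. */
    static String javaLauncher() {
        return System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
    }

    public static void main(String[] args) throws Exception {
        Process child = runToCompletion(javaLauncher(), "-version");
        System.out.println(hasFinished(child));
    }
}
```

<p>Reading the streams before the child exits, rather than afterwards,
sidesteps the situation the letter describes. Java 8 and later also provide
<tt class="methodname">Process.isAlive()</tt> for the same check.</p>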
<p>On a related subject, Suradet Jitprapaikulsarn's letter made me
think. Does she really want a book in English? I still have a vivid
image of how a certain immigrant must have felt when I dumped a
huge library copy of Deitel &amp; Deitel on her, and Suradet's
pupils might find it even more difficult. Are translations
available for the books reviewed in C Vu? Should C Vu mention such
facts along with authors and ISBNs? Are there books in other
languages but not in English, that some ACCU members can review? If
so, should the reviews be written in English? (I'd say yes, they
should, if English is used as a common language throughout ACCU,
because that way someone can recommend a book on the review alone
even if they don't speak its language.) English is not as universal
as you might think, and if we want to be really international then
we have a long way to go.</p>
<p>Regards</p>
<p>Silas S. Brown</p>
</div>
</p>
</div>
</channel>
</rss>
