    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: Incremental Design: A Case Study of a Compiler</title>
        <link>https://members.accu.org/index.php/articles/291</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">Programming Topics + Overload Journal #69 - Oct 2005</span></div>

<table border="0" cellpadding="1" cellspacing="0">
    <tbody>
    <tr>
        <td valign="top">
            Browse in :
       </td>
       <td valign="top">

                                            <a href="https://members.accu.org/index.php/articles/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c13/">Topics</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c65/">Programming</a>
<br />

                                            <a href="https://members.accu.org/index.php/articles/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c76/">Journals</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c78/">Overload</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c143/">69</a>
<br />

                                            <a href="https://members.accu.org/index.php/articles/c65-143/">Any of these categories</a>

                    -                        <a href="https://members.accu.org/index.php/articles/c65+143/">All of these categories</a>
<br />
</td>
   </tr>
   </tbody>
</table>




<div class="xar-error">
   <p>
 <strong>Note:</strong> when you create a new publication type,
the articles module will automatically use the templates
<em>user-display-[publicationtype].xt</em>
and <em>user-summary-[publicationtype].xt</em>.
If those templates do not exist when you try to preview or display a new article,
you'll get this warning :-)  Please place your own templates in themes/<em>yourtheme</em>/modules/articles . The templates will get the extension .xt there. </p>
</div>
<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;Incremental Design: A Case Study of a Compiler</h1>
<p><strong>Author:</strong>&nbsp;</p>
<p>
<strong>Date:</strong> 02 October 2005 05:00:00 +01:00 or Sun, 02 October 2005 05:00:00 +01:00</p>
<p><strong>Summary:</strong>&nbsp;</p>
<p><strong>Body:</strong>&nbsp;<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e18" id=
"d0e18"></a>Introduction</h2>
</div>
<p>Agile development is great for keeping control of details, but
how well does it deal with development involving heavy design? This
article is about using a very agile approach to write a dynamic
compiler. Compilers are probably the best area to do design in
because of the large body of compiler literature and theory.</p>
<p>I'll discuss my experience of writing a compiler using an agile
approach. The key challenge is working a lot of design, often from
papers not personal experience, into an agile development process.
Agile incremental development works very well for writing a
compiler because it's a very effective way to learn from other
people's written experience. It's a very effective way to learn
while developing and get the learning into the code quickly.</p>
<p>The compiler, Exupery, is a hobby project. It compiles Squeak
Smalltalk bytecodes into x86 machine code which is then executed
inside the same process that is running the compiler. One of the
original goals was to learn something general about software
engineering by personally writing a technically difficult piece of
software. The other original goal was to produce a useful open
source project. Hopefully, you'll gain some insight from this
article.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e27" id="d0e27"></a>Compilers</h2>
</div>
<p>A minimally useful compiler is a large project and a fully
optimising compiler doing everything imaginable is impossibly large
(a trip to the library or CiteSeer<sup>[<a name="d0e32" href=
"#ftn.d0e32" id="d0e32">1</a>]</sup> makes it even bigger).</p>
<p>There are several diferent compiler designs that could make
sense for an object oriented dynamically typed language. The
possible compiler designs run from compiling as quickly as an
interpreter interprets [<a href="#May">May</a>] [<a href=
"#Deutsch-">Deutsch-</a>] to very slow compilers that produce very
high quality code. [<a href="#Appel">Appel</a>]</p>
<p>Most JIT (Just-in-Time) compilers compile fast but not as fast
as interpreters, they stick to linear time worst case algorithms
because compilation happens at run time where noticable pauses are
not acceptable. Most mature batch compilers tend to be very slow,
aiming at fast execution times (at least at higher optimisation
levels) though a lot are simpler to make them buildable.</p>
<p>JITs are a great strategy for very fast compiling compilers
because they can recoup on the second execution. But they are much
less appealing when execution time is important because the compile
time pauses will get longer (potentially a lot longer, as compilers
often use O(N<sup>2</sup>) - or greater - algorithms).</p>
<p>Compilers offer some strong benefits for up front design because
of the rich body of literature, but they also have some strong
disadvantages. It's likely that there are no successful projects
working in the same design space: language implementation;
optimisation style (fast compilation or fast execution); and basic
framework (SSA, dataflow, simple tree walkers, one pass). It is
also very likely that no-one on the team has done anything
similar.</p>
<p>Judging design literature without having personally implemented
similar projects is hard, if not impossible, but many key decisions
have to be made early. These are really project-level design, not
technical, but they'll influence the choice of algorithm and
intermediate forms.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e59" id="d0e59"></a>Register
Allocation</h2>
</div>
<p>Register allocation is the process of assigning all the
variables (including ones holding intermediate values) to a fixed
number of physical registers. There are several different
approaches to register allocation. The main two are colouring
register allocators and linear scan allocators. Colouring register
allocators are slower but produce better code. Linear scan
allocators have seen a lot of work recently because they are more
suitable for JIT compilers.</p>
<div class="sidebar">
<p class="title c3">Colouring Register Allocators</p>
<pre class="programlisting">
while hasSpills
  build interference graph
  simplify and move removal
  hasSpills := select registers
  if hasSpills
    insert spill code
  else
    assign variables to registers
</pre></div>
<p>A colouring register allocator works by first building an
inteference graph. Nodes represent the variables, and edges mean
the variables can be live at the same time. If two variables have
an edge connecting them then they cannot be assigned to the same
register.</p>
<p>Simplification is the process of removing nodes where their
degree (number of edges) is less than the number of registers
available. While simplifying we also remove moves to and from
registers where possible (coalescing). As variables are simplified
they are pushed onto a stack. If no variable can be simplified then
one is chosen and pushed onto the stack as a spill candidate.</p>
<p>Selecting registers involves popping variables off the stack
then assigning them to a register that isn't used by any of their
neighbours. It will be possible to find a register for it because
when it was selected it had less neighbours than the number of
registers . If a variable was pushed as a spill candidate then we
try and find a register for it if some of its neighbours were
allocated to the same register but if we can't it will be spilled
(stored in memory rather than a register).</p>
<p>Colouring register allocation was first practically formulated
in 1981 by Chaitin [<a href="#Chaitin">Chaitin</a>]. Briggs
introduced a few key improvements in 1992 [<a href=
"#Briggs">Briggs</a>] including making final spill decisions during
selection, which is more efficient because some of a variable's
neighbours might have been allocated the same register, so spilling
would have been overly pessimistic. George and Appel in 1996
[<a href="#George-">George-</a>] extended Briggs' work by
coalescing moves while simplifying, rather than as a separate
phase, which allows a lot more moves to be removed. There's a lot
more work on colouring register allocators - Google scholar finds
over 500 documents. It would be stupid not to do a serious
literature search before implementing in a domain which has been
studied so much.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e86" id="d0e86"></a>Squeak and Why
It's Challenging to Implement Efficiently</h2>
</div>
<p>Squeak is the main open source implementation of Smalltalk.
Smalltalk is a pure dynamically-typed object oriented programming
language. Everything is an object including Contexts (stack
frames), Blocks (closures), and integers. <tt class=
"type">SmallIntegers</tt> fit into a machine word with a tag and
will overflow into large integers.</p>
<p>The core language is very pure. Even conditionals <tt class=
"literal">ifTrue</tt>: and <tt class="literal">ifFalse</tt>: (the
equivalent of <tt class="literal">if</tt> statements in C) are
implemented in the library, but Squeak, and most other Smalltalks,
cheat here for speed. <tt class="type">SmallIntegers</tt> are only
slightly different from other objects - it's possible to add
methods to <tt class="type">SmallIntegers</tt> but not to subclass
them. Squeak, and most other implementations, provides special
bytecodes that perform SmallInteger operations. If the arguments
are not <tt class="type">SmallIntegers</tt> or the operation fails
then a normal message <tt class="literal">send</tt> will be
performed.</p>
<p>Every expression needs type checking, which is the cost of being
a pure dynamically typed language. This costs about 2 cycles to
type check per operand. Theoretically type checking can be removed
in most cases.</p>
<p>Tagging and detagging integers in Squeak is slower than
necessary because the integer tag is 1 rather than 0. Small integer
operations need to remove the 1 tag before operating then add it
afterwards.</p>
<p>Booleans are <tt class="type">real</tt> objects. The only
restriction is there can only be one true object and one false
object in the system. This means that the expression <tt class=
"literal">a &lt; b ifTrue: [do something]</tt> naively needs to
perform the comparison, convert the control flow into a boolean
object (a little <tt class="literal">if then else</tt> sequence),
then convert the boolean back into control flow (another <tt class=
"literal">if object is true then else</tt> sequence). That's a lot
of instructions and also an extra conditional branch for the
processor to (potentially) mispredict.</p>
<p><tt class="literal">Send</tt>s (function calls) are very common.
Theoretically all expressions are <tt class="literal">send</tt>s
and most are in practice. <tt class="literal">While</tt> loops,
<tt class="literal">if</tt> expressions, and small integer
operations are theoretically <tt class="literal">send</tt>s but are
expanded when generating bytecodes.</p>
<p>Smalltalk runs in an image which hosts both the IDE and the
program being developed. The entire system is written in itself and
modifiable live. Smalltalk systems host their own development
tools. It's possible, and normal, to change how the development
system works live during normal development. The debugger,
profilers, and editors (called browsers) are normal code that can
be inspected and changed.</p>
<p>Compilation is done incrementally. When developing, individual
methods are compiled as they are edited and will be used for every
call after they have compiled. It's also possible to modify the
system programmatically. This is how the programming environment is
implemented. Any method may be modified at any time including by
code. Any object may be swapped with any other object at any time
(<tt class="literal">become:</tt>), all references to the objects
get changed by this operation. A lot can be changed during normal
program execution.</p>
<p>Squeak is currently implemented by an interpreter. The
interpreter executes a bytecode in about 10 clocks which is the
branch misspredict penalty. That's a fairly tough target to beat
with a naive compiler but a well designed simple compiler could do
it.</p>
<div class="sidebar">
<p class="title c3">Smalltalk Syntax</p>
<p>Smalltalk syntax is very simple. There are three types of
messages, blocks, and literal expressions.</p>
<div class="informaltable">
<table border="1" cellspacing="0">
&lt;colgroup&gt;
&lt;col width=&quot;50%&quot;&gt;
&lt;col width=&quot;50%&quot;&gt;&lt;/colgroup&gt;
&lt;tbody&gt;
<tr>
<td><tt class="literal">Factorial printString</tt></td>
<td>Unary messages are a single word.</td>
</tr>
<tr>
<td><tt class="literal">* + ** &lt;=</tt></td>
<td>Binary messages look like operators in other languages. They
have an argument before and after them.</td>
</tr>
<tr>
<td>
<pre class="literallayout">
Receiver printOn: 'hello'.
Receiver 
    at: 'a key' 
    ifAbsent: 
        [self doSomething]
</pre></td>
<td>Key word messages are a sequence of words ending in :. Each
argument always has a keyword before it ending in :</td>
</tr>
<tr>
<td><tt class="literal">[1+ 20] [:each| each + 10]</tt></td>
<td>Blocks are bits of code that can be executed later. They look
like <tt class="literal">[anExpression]</tt> or <tt class=
"literal">[:each| self aMessage: each]</tt>. They can take
arguments, each argument begins with <tt class="literal">a :</tt>,
after the last argument there is <tt class="literal">a |</tt>.</td>
</tr>
<tr>
<td><tt class="literal">|a b c|</tt></td>
<td>Defines three local variables called <tt class=
"literal">a</tt>, <tt class="literal">b</tt>, and <tt class=
"literal">c</tt>. Definitions can only occur at the start of
methods or the start of blocks.</td>
</tr>
&lt;/tbody&gt;
</table>
</div>
<p>Precedence is unary messages then binary messages then keyword
messages. So <tt class="literal">aStream nextPutAll: 10 * 200
factorial</tt> fully bracketed is: <tt class="literal">aStream
nextPutAll: (10 * (200 factorial))</tt>. The result is to print a
very large number on <tt class="literal">aStream</tt>.</p>
<p>Control flow is logically implemented using message <tt class=
"literal">send</tt>s and blocks. <tt class="literal">a &lt; b
ifTrue: [self doSomething]</tt> executes the block <tt class=
"literal">[self doSomething] if a &lt; b. [a &lt; b]. whileTrue: [a
:= a + 1]</tt> is a basic <tt class="literal">while</tt> loop. It
first executes <tt class="literal">[a := a + 1]</tt> while
executing <tt class="literal">[a &lt; b]</tt> returns <tt class=
"literal">true. #at:ifAbsent:</tt> used as an example of a keyword
message will execute its block argument if it couldn't find the key
(the first argument). Reading <tt class="literal">[]</tt> brackets
as if they were brackets from C is tolerable but Smalltalk's
<tt class="literal">[]</tt> are semantically richer because they
return a first class object.</p>
<p>Smalltalk is a natural environment for JITs (which it's had
since '83). Writing a traditional JIT inside the language poses
problems. A JIT normally stops execution until it has compiled a
method then jumps into it. What happens if it's written and running
in the environment it's compiling and needs to compile part of
itself to finish compiling a method?</p>
</div>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e261" id="d0e261"></a>Initial
Planning</h2>
</div>
<p>When starting a large project, the first problem is deciding if
the project is worthwhile, which involves enough design to estimate
the costs of a minimally useful system. A minimally and generally
useful compiler is probably around 10-20kloc which is a large
system to write as a hobby. It's possible to write a compiler in
less, but to be worth the cost of maintenance it must be noticably
faster than the interpreter. Simple compilers can be slower than
tuned interpreters. Exupery is a big project for a single person's
hobby, so ideally it should also have big motivating goals. The end
goal is to be as fast as C or Fortran for most programs.</p>
<p>Exupery is designed to become faster than any previous Smalltalk
implementation by combining a solid classical optimiser with
inlining driven by dynamic type feedback. That's too much to
achieve in one step but it still helps to know that the end goals
are possible.</p>
<p>To completely eliminate the cost of Smalltalk's dynamic power we
need to:</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>Remove integer tagging and detagging.</p>
</li>
<li>
<p>Remove boxing and deboxing of Booleans and Floats</p>
</li>
<li>
<p>Remove all send overhead including calling blocks (higher order
functions)</p>
</li>
<li>
<p>Optimise the low level intermediate to an equivalent level to
C</p>
</li>
</ol>
</div>
<p>This could be done by:</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>Having solid back end producing good code. In practice this just
means a good register allocator and some care. This was the biggest
source of remaining waste noticed in Self [<a href=
"#Holzle">Holzle</a>]</p>
</li>
<li>
<p>Using dynamic type feedback to discover what types are used for
each send site then inlining commonly called messages.</p>
</li>
<li>
<p>Using type analysis to remove unnecessary type checks and
boxing/ unboxing of Booleans and Floats</p>
</li>
<li>
<p>Induction variable analysis will allow the removal of most type
checks of loop counters and array range checks</p>
</li>
</ol>
</div>
<p>Dynamic type feedback provides information on what types were
actually used. The solid optimisation framework can then remove
redundant tagging and detagging in the common case
code<sup>[<a name="d0e303" href="#ftn.d0e303" id=
"d0e303">2</a>]</sup> and moving unnecessary code out of loops
(e.g. the code that figures out how big an object is to range check
it. The array is almost certainly the same object across the entire
loop).</p>
<p>So long as compilation happens in a background thread and never
causes execution to halt, it's possible to combine very fast
generated code with no pauses. Background compilation also makes it
possible to run the compiler in the same environment that's being
compiled. The advantages are: there are no compilation pauses, slow
optimisations can be used, and the compiler can be written in
Smalltalk as a normal program. The key disadvantage is there must
be another way of executing Smalltalk but we already have an
interpreter.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e309" id="d0e309"></a>Task
Breakdown</h2>
</div>
<p>The vision of a compiler above is too big to commit to building.
So let's break it down into three different projects. First the
full compiler meeting the initial ambitious goals. This is the goal
that's really motivating. Second the simplest possible compiler
that would be useful. This is the first version that should get
widespread use. Third the simplest possible compiler that can be
built and tested. This compiler is the first stepping stone to the
second project.</p>
<p>The project's current high level breakdown (release plan)
is:</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>Get it compiling a basic program. Just to compile an iterative
factorial</p>
</li>
<li>
<p>Make that faster than the interpreter.</p>
</li>
<li>
<p>Compile the bytecode benchmark in the same process as the
interpreted compiler</p>
</li>
<li>
<p>Make that faster than the interpreter</p>
</li>
<li>
<p>Add support for most of the language. (minus blocks)</p>
</li>
<li>
<p>Optimise sends with PICs</p>
</li>
<li>
<p>Optimise bytecodes. Make it faster than VisualWorks, the fastest
commercial Smalltalk</p>
</li>
<li>
<p>Make it practical. This is the current task</p>
</li>
<li>
<p>Make sends fast. Full message and block inlining</p>
</li>
<li>
<p>Build a basic optimiser. Just to remove integer tagging and
untagging</p>
</li>
<li>
<p>Extend the optimiser to handle classical optimisations such as
common sub expression elimination and code hoisting. Moving #at:
overhead and write barrier checks out of loops is a specialised
case of code hoisting which may double the bytecode performance</p>
</li>
<li>
<p>Extend the optimiser down to replace the low level optimiser.
This will allow the classical optimisations to remove redundancy
that was hidden inside the tagging and boxing code.</p>
</li>
<li>
<p>Add floating point support. To be worthwhile, native compiled
floating point support needs to remove most of the boxing and
deboxing of floats. This may require the entire optimiser in this
plan or it might be worthwhile just with tree traversal based
optimisations and dynamic type feedback. Measurements on real code
could answer this question.</p>
</li>
<li>
<p>Induction variables. Extend the optimiser to optimise induction
variables (a fancy name for loop counters and friends). This should
allow all range checks using looping variables to be fully
optimised and to remove the overflow checks when incrementing loop
counters. It should remove the last remaining overhead in common
array loops.</p>
</li>
</ol>
</div>
<p>The original release plan was for five major releases 1) a basic
compiler, 2) decent bytecode performance (at least twice the
interpreter's) and compiling inside the image, 3) add most of the
bytecodes, 4) decent send performance (probably inlining), 5) make
it practical. Each release is a few months' full time work.</p>
<p>Implementing the release plan up to &quot;Make it practical&quot; or &quot;Make
sends fast&quot; would produce a roughly minimal compiler that was
practically useful. This is where the compiler is up to now, it's
four times faster for the bytecode benchmark and twice as fast for
the send benchmark than the interpreter. The only algorithm that's
more complex than a tree traversal is the colouring register
allocator. That such a simple compiler could be so much faster than
the interpreter was easy to predict initially by looking at the
performance of traditional JITs, which cannot use any complex
optimisations because they cannot afford long compile time
pauses.</p>
<p>From the minimal practical compiler, I broke out a minimal
testable compiler which was the first thing built. The minimal
testable compiler just compiles an iterative factorial method into
assembly which was then assembled and linked to a C driver routine.
The next iteration added a register allocator and some performance
tuning to instruction selection to use some addressing modes. Then
I started compiling directly to machine code and running the
generated code inside the same process that was running the
compiler.</p>
<p>This little compiler could then be extended by adding language
features and optimisations. It is the core of the minimal useful
compiler. The overall architecture is there, all the phases exist
in a nontrivial form. Building up towards the basic minimal
compiler is now just work and keeping control of the details. It's
a recursive process that's re-entered in the section &quot;Later
Design&quot;.</p>
<p>Breaking down large tasks is critical. There are too many
details to keep straight. Stuff that isn't important at the
strategic level because it definitely can be made to work can still
cause crashes or wrong answers. Making a task breakdown is often
enough design to safely begin implementation.</p>
<p>At the beginning a brief project plan is useful to plan just
enough to see that the project is worth starting, however,
justifying a large project by its end goals is a mistake. Ideally
each iteration should pay for itself. [<a href=
"#Beck-">Beck-</a>]</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e374" id="d0e374"></a>Testing and
Small Steps</h2>
</div>
<p>The main benefit of tests is the ability to work in very small
chunks. It's possible to write (or modify) tests one evening then
modify the code later. It's very easy to pick up coding on a broken
test.<sup>[<a name="d0e379" href="#ftn.d0e379" id=
"d0e379">3</a>]</sup></p>
<p>Exupery is written using two separate test suites; each has
almost complete test coverage. The programmer tests are unit tests
with some code integration tests thrown in, they will never crash
the development environment. The customer tests compile and run
example fragments<sup>[<a name="d0e384" href="#ftn.d0e384" id=
"d0e384">4</a>]</sup> which can crash the development environment
if they generate faulty programs (programs that corrupt memory
etc).</p>
<p>When writing code to extend the language coverage, first I write
some customer tests while figuring out what all the special cases
are for the new statement. Then I'll run them, they break. Then
it's time to write programmer tests to drive the implementation.
The programmer tests drive the implementation of the language
feature for each component. For a new integer operation this will
start with byte code reading, then intermediate generation, then
adding the new instruction to the back end. One the tests have been
written, it's possible to focus on the details without needing to
keep the entire picture in mind all the time, if a critical detail
is ignored the tests break. Tests specify behaviour precisely
enough to allow it to be safely ignored, this reduces the amount
that I need to keep in my head, which allows me to work with
smaller chunks of time.</p>
<p>Just having a customer test suite that compiles and runs
programs is not enough. There are too many ways to implement, the
test that drives development should provide the direction to
develop. The test will often be written one evening, and the code a
few days later, therefore the test must carry the subtlties to be
implemented because short term memory cannot.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e392" id="d0e392"></a>Debugging and
Reliability</h2>
</div>
<p>Debugging can be a major problem because bugs usually show up
when you run a compiled program and it crashes (or produces the
wrong answer). The first stage is figuring out why the generated
program is crashing. Then figuring out where the bug is in the
compiler, which is non-trivial because one stage may be failing
because a previous stage did something unanticipated.</p>
<p>Reliability is a big problem when writing a compiler, there is a
lot of functionality, and every user visible action involves most
of the compiler, including several complex algorithms . Thus
tracking down a bug can be a lot of work, because first it must be
found in the compiled code, then it must be traced back though the
compiler. Even when compilation fails it's often not clear which
stage is at fault, is the register allocator failing because it's
complex and still buggy, or is the input invalid because of an
upstream bug?</p>
<p>Reliability is a key issue for compilers, not just because it's
better not to have real compiler bugs, but also because it's easy
to lose control of quality. They are too complex to debug into
shape if the quality ever drops.</p>
<p>Non-deterministic bugs are surprisingly easy to create. First
because analysis algorithms may only exhibit a bug on one ordering
(iterating over hash tables can make order nondeterministic) - the
order isn't important for correctness, but it can expose or hide
bugs. Having a large automated test suite and running it frequently
makes intermittent bugs obvious as the suite will only sometimes
fail or crash.</p>
<p>Large test suites help to ratchet up the quality. By keeping all
the tests passing it's easy(ish) to avoid introducing new bugs.
Adding more optimisations however adds more ways for subtle bugs to
creep in as there are more ways for features to interact.
Optimisations require more customer/integration tests to get the
same level of coverage.</p>
<p>Originally I thought that I'd need to run my acceptance tests in
a separate forked image to make it safe. The customer tests run
generated machine code which can crash the development environment.
Having your development enviroment which includes the editors and
debuggers crash regularly during development is painful. Forking
separate processes hasn't been necessary, manually identifying the
crashing tests from the stack trace when it crashes and throwing an
exception at the beginning of the crashing tests has proved enough.
It's very unusual for a lot of tests to be broken for a long time.
The programmer tests keep the quality high enough that unexpected
customer test crashes are sufficiently rare.</p>
<p>It's important to be able to quickly fix broken tests. The code
would get very brittle because it would take too long to debug if
any test started failing. Knowing that all the tests passed
recently reduces the amount of code that could have introduced a
bug and thus the time required to fix it.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e409" id="d0e409"></a>A Little
Duplication</h2>
</div>
<p>Often there isn't a clear right way to factor (erm, design) some
code but it is painfully clear that the current structure is
inadequate. Sometimes there are obvious small improvements that we
can make but there are still cases where there isn't an obvious way
to improve the structure. There it's worthwhile to introduce some
duplication. A common case is test code, what should vary between
the tests? What should be factored into helper methods and hidden?
There's a cost, more context is needed to read the method, if it's
reused this is a big saving. Copying and pasting the first test 10
times, changing the parts needed, then refactoring away the
duplication has worked well. Intermediate creation and assembling
are good examples in the production code, both have an involved
detail driven design which evolved through refactoring. The initial
duplication provided the environment to find the right structure,
then refactoring consolidated the winning design features.</p>
<p>One big problem with intermediate creation was finding a design
that was both clean and easy to test without duplication in the
tests. The obvious way to test intermediate creation is to compile
short methods into intermediate then check that the intermediate
generated is correct. The problem is the basic intermediate
building blocks such as integer tagging are repeated, making them
very hard to change without needing to change a lot of tests.</p>
<p>My first approach was to break the intermediate generator into
two objects, the first handling the higher level logic and the
second handling the repeated low level operations. This helped, but
there was often too much code in each test. The next insight was to
make the intermediate emitter mock itself by having an instance
variable that normally just impersonated self (this in C++). This
provides a flexible recursive way to test intermediate emission. A
message can be sent either to self (included in the test) or to the
mockable instance variable (which is normally the same as self).
This provides enough flexibility to test any part of intermediate
generation in isolation. Testing components in isolation requires
very decoupled design.</p>
<p>A little duplication is often a good thing, especially when
there isn't an obviously right answer and a better design is
obviously needed. The duplication allows the code to evolve in
different ways without being overly constrained with the
requirements from elsewhere. Duplication and copy and paste
programming makes experiments with design cheap. Refactoring will
remove the duplication later, once the kernel of a design has been
found. Tests over the duplicated code makes it safe to refactor
once a decent structure is found. Refactoring away the duplication
is likely to refine the design as it needs to generalise to deal
with more solutions.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e420" id="d0e420"></a>Refactoring
and Architectural Correctness</h2>
</div>
<p>Large refactoring often involves moving temporarily away from
architectural correctness.</p>
<p>For instance, there are two different, and both correct, ways of
generating expressions in a simple intermediate language. Either
expressions are written in trees representing the abstract syntax
tree, or individual sub-trees are written out sequentially as a
list. Eventually the code will end up as a sequential list, but the
tree form provides a lot of cheap and simple manipulative power
(the tree optimisers). Either has obviously correct semantics but
mixing them arbitrarily will produce bugs when side effecting
expressions are executed in a different order.</p>
<p>Who is responsible for unique execution of expressions is a
similar problem. The expression <tt class="literal">array at:
anExpression</tt> uses <tt class="literal">anExpression</tt>
multiple times. First to type check it (is it an integer), then to
lower range check it (is it greater or equal to 1), then to upper
range check it, then finally to index into the array. Is <tt class=
"literal">anExpression</tt> responsible for returning a register in
case it's used multiple times or is <tt class="literal">#at:</tt>
responsible for placing the result in a register before using it
multiple times?</p>
<p>In both cases I've switched from one architecturally correct
style to the other. The simplest way to work is to generate code
adding it directly to a basic block and make each expression
responsible for returning a register which is given to any
expression which receives its result. This is very easy to test -
check that all results from expressions are either a literal or a
register. However tree optimisations require expressions in trees
so they can optimise operation sequences across bytecodes (high
level tree optimisations). Also it's slightly more optimal to make
the user responsible for uniqueness because then larger trees are
formed to feed into instruction selector.</p>
<p>The problem is, when refactoring from one style to another,
there will be a noticeable time when the entire system is broken.
Either format in both cases is correct but half and half is always
wrong although it may work sometimes. But to refactor, we should be
moving in small steps keeping a working system. Breaking the system
provides a lot of tactical guidance (fix what's broke until
everything works again).</p>
<p>Here, there's a key difference between having strong guarantees
and having all tests pass. The strong guarantee is the reason to
believe the current tests are sufficient. If all tests are passing
and they are good enough to provide a strong guarantee then the
program should be bug free. Finding strong guarantees is definitely
a heavy thought activity, closer to Dijkstra than TDD is normally
portrayed. An architectural refactoring will often need the the
test suite to be changed, not to make it pass again, but to make it
cover the new design well enough.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e447" id="d0e447"></a>Later
Design</h2>
</div>
<p>Design after the project has started is building the
understanding needed to make decisions when required. It's best to
make major decisions by deferring them to the last possible moment,
but that's the wrong time to build the understanding required to
make them.</p>
<p>Later design involves exploring how different infrastructure
investments produce different performance improvements. The
tactical decisions will be driven by specific benchmarks, but
choosing which benchmarks (and tests) will drive is important. Also
looking for powerful combinations that enable a lot of later
features to be easily implemented is important.</p>
<p>When key decisions need to be made early it pays to make them by
default, say by ignoring the cost of moves and registers knowing a
colouring register allocator can clean this up. It's important to
do the thought experiments to know that there are good ways to
solve the problems when they come up.</p>
<p>Many design decisions are easy to make at the right time but
some are not. The best that I can do is guess which decisions need
thought then start thinking and reading about them preferably well
before they really need to be made.</p>
<p>Often the key long term decisions are attempts to find optimal
and minimal sets of optimisations - to find the different hills.
Key questions worth thinking about are:</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>What is the minimal infrastructure needed to compete with C in
speed for most programs?</p>
</li>
<li>
<p>What is the minimal infrastructure required for high speed
floating point and what is required for a useful floating point
speed improvement?</p>
</li>
<li>
<p>What is the minimal system that is practically useful and worth
the risk of using for real production use for any major market
segment?</p>
</li>
</ul>
</div>
<p>There are some features that are worth investing in because they
will take us to a better hill. Dynamic inlining is the ideal
solution to common message sends as it makes common sends nearly
free. Building a decent optimisation framework will make a lot of
serious classical optimisations very cheap to build. Often key
pieces of infrastructure can not be justified by their first use
(register allocation could be) but are still worth building.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e472" id="d0e472"></a>Register
Allocation, the First Big Piece of Infrastructure</h2>
</div>
<p>Why choose to use a colouring register allocator?</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>Removes complexity from the front end</p>
</li>
<li>
<p>Hides two address operations from the front end which both
simplifies the front end and makes it more portable</p>
</li>
<li>
<p>Provides an efficient general solution</p>
</li>
<li>
<p>Not optimal for the x86 without using very sophisticated
solutions because of register pressure due to only having 6-7
usable registers, however, move coalsecing is very useful to avoid
the front end needing to know about 2 operand addressing.</p>
</li>
</ol>
</div>
<p>I chose to implement a naive register allocator first then
evaluate whether it was worthwhile to use a complex colouring
allocator. I also made design decisions heavily favouring a
colouring register allocator by ignoring the cost of moves and
registers in the front end. The rest of the code was designed to
work well with a colouring register allocator but debugged with a
very simple allocator. This meant that when writing the allocator
it was much easier to debug because the rest of the initial
compiler was written and tested. Delaying the final decision to
implement meant that when I did implement I was sure it was
necessary rather than just relying on a (correct) guess.</p>
<p>The colouring register allocator was the first, and currently
only, complex algorithm in Exupery. It was in the initial design
but not implemented until the second iteration. The first iteration
compiler was very slow, it assumed that there was a good register
allocator that would remove unnecessary moves and efficiently
allocate the remaining registers. The colouring register allocator
was needed to make Exupery faster than the interpreter.</p>
<p>The register allocator serves to hide most of the oddness of the
x86 from the compiler's front end. It cleans up the moves
introduced to fake 3 operand instructions and removes most of the
short lived temporaries used to by general code sequences in the
front end.</p>
<p>At what stage is it justified to introduce a complex register
allocator? The choice is either a single complex register allocator
or adding a lot of complexity to the front end of the compiler to
minimise the use of registers and move instructions generated.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e498" id="d0e498"></a>Design Lock
In, Assumptions Can Spread</h2>
</div>
<p>Good design is keeping each decision in its own module so that
each design decision can be changed independently. Unfortunately, a
very usefully made decision can leak as the rest of the system
starts to rely on it.</p>
<p>A separate version of the <tt class="literal">#at:</tt>
primitive is now compiled for each receiver. This enables some code
to be calcalated at compile time rather than run time. This means
that the versions of the <tt class="literal">#at:</tt> primitive
that are used for getting characters out of strings are compiled
into different methods to the versions used to get objects out of
Arrays. Squeak only uses one primitive for all common <tt class=
"literal">#at:</tt> calls.</p>
<p>Compiling each form of <tt class="literal">#at:</tt> efficiently
is fairly easy but creating a single implementation that deals with
all cases is hard. A case statement would be needed for the type of
the receiver which adds to the overhead. As the compiler inlines
calls to <tt class="literal">#at:</tt> it would mean that every
call would have code for every kind of optimised <tt class=
"literal">#at:</tt> implementation. Compiling a different method
for each receiver also allows the compiler to calculate the offset
where indexed variables start. Objects in Squeak start with some
named instance variables then may have indexed (array) instance
variables afterwards. This way a simple array can be implemented as
a single object, the named instance variables contain any
bookkeeping the class needs, and the indexed instance variables
contain the array data. It's a space optimisation from the '70s,
changing it would be a lot of work and would also make it much
harder for people to use Exupery.</p>
<p>The array <tt class="literal">#at:</tt> primitive in Squeak
looks in the class to find out how many named instance variables
the class has, looks at the object to find out how big it is, range
checks, then does the lookup. By specialising for each receiver
class we can avoid all overhead from having a variable number of
named instance variables. The primitive code can also ignore all
the other implementations of <tt class="literal">#at:</tt> in the
system because they are sent to other classes, a bytecode
implementation needs to deal with this type checking. However,
having a separate compiled representation for each receiver type
opens up the possibility of inlining self sends with no run time
overhead, because the receiver type is known at compile time. As
more code starts to rely on the receiver type being known at
compile time then this decision will slowly get locked in.</p>
<p>The register allocator has been locked in by a similar process.
The front end assumes that registers and move instructions are
free. Switching to a simplier register allocator would involve a
lot of rework for the rest of the compiler to avoid generating
wildly inefficient code.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e535" id="d0e535"></a>How Should
Compiled Code Interact with Objects?</h2>
</div>
<p>Compiled code often needs to refer to objects in garbage
collected object memory (the image). This is tricky because objects
may be moved on a full garbage collect as the garbage collector
will compact memory.</p>
<p>The current solution is naive but it works. There is an array of
objects which the VM has a pointer to. Compiled code accesses
objects indirectly through this array. This way there is only one
pointer that needs to be coordinated with the compacting garbage
collector. It does add an extra level of indirection to every
object access. These accesses happen during the method entry code
and also during an inlined primitive.<sup>[<a name="d0e542" href=
"#ftn.d0e542" id="d0e542">5</a>]</sup></p>
<p>Ideally, it would be nice to be able to hard code a pointer to
an object in the compiled code. However if there is a garbage
collect then the object may have been moved leaving a dangling
pointer, unless something is done. There are two options, either
update all the pointers to objects in compiled code, or discard all
the code in the code cache (flushing it). Code can be regenerated
dynamically, so throwing it away is a simple and viable solution to
managing dependencies.</p>
<p>If there weren't pointers from objects into the code cache then
discarding all the code would be a very simple operation.
Unfortunately <tt class="literal">CompiledContexts</tt> (stack
frames) contain pointers into the code cache which make it very
quick to perform returns. The pointer is the machine code return
address. There is also a dictionary (hash table) used to look up
compiled methods. Another option to speed up sends and returns
would be a send cache [<a href="#Deutsch-">Deutsch-</a>] which is
more complex but the contents could easily be discarded. Luckily,
Squeak objects have a few common classes encoded directly into the
header in a 5 bit field. This means that type checks will be fast
for common operations such as arrays. Having three different
encodings for classes (SmallIntegers, compact classes using the bit
field, and classes requiring a pointer to the class) is unfortunate
when implementing fast sends or general type lookups.</p>
<p>My first cut simple approach of using indirection through an
array turned out to be good enough especially with a simple
optimisation relying on the compact class encoding. This will make
it a little harder to eliminate compact classes (it could be done
in a few lines of Smalltalk). A better solution may be needed
sometime in the future but there is no urgency.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e557" id="d0e557"></a>Making
Exupery Practical, the Current Design Problem</h2>
</div>
<p>Making Exupery practical is the current major strategic problem.
The basic requirements have been met, Exupery is four times faster
for the bytecode benchmark and two times faster for the send
benchmark. Most of the implementation is solid and the last few
pieces of scaffolding (early implementations that were too naive to
be left in but allowed other parts to be built properly) have been
removed.</p>
<p>The goal is to provide a good useful speed improvement for
normal code. Fast benchmark performance is nice, but it doesn't
guarantee a useful speed improvement for general code especially
when only a few benchmarks are used. That a lot of the inner loop
methods have been rewritten in Slang<sup>[<a name="d0e564" href=
"#ftn.d0e564" id="d0e564">6</a>]</sup> or C doesn't help, because
it removes a lot of the easy methods which Exupery could have
optimised.</p>
<p>The issues stopping usefulness are bugs, driving the compiler,
and missing features. When compiling code outside of the test suite
it's still too common to run into bugs, to be generally useful it
must be trusted, preferably to compile anything at any time.
Driving dynamic inlining manually is not pleasant, you need to
specify what sends will be inlined and with what receivers. There
are also probably a few missing features, for example, blocks
(closures) might be needed, or support for more of the super
extended bytecodes (bytecodes that have an extended argument
field).</p>
<p>There are a host of small issues, including bugs when compiling
things that have never been compiled before, and small features
that will really matter if the compiler is running in a background
thread automatically compiling methods. For instance, compiled code
does not listen for interrupts, so it can not be aborted and
compiled code can not currently be single stepped.</p>
<p>Most of these issues are small, the key decision is figuring out
what needs to be done for the compiler to become practical. That's
best done incrementally by fixing the current most obviously
limiting problem until it isn't a significant problem. These kinds
of problems are best solved with plenty of feedback by fixing one
then re-evaluating.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e574" id="d0e574"></a>Inlining and
Polymorphic Inling Caching</h2>
</div>
<p><tt class="literal">Send</tt> performance is critical to high
performance Smalltalk. One current goal is improving the general
<tt class="literal">send</tt> performance. This is building towards
full dynamic inlining. Dynamic inlining is strategic, because it
changes both how the compiler is used and also the overall cost
structure. Dynamic inlining makes common sends free, this means a
lot of optimisations to work around slow sends can be removed.</p>
<p>The phases are:</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>Polymorphic Inline Caches</p>
</li>
<li>
<p>Specialised Primitives</p>
</li>
<li>
<p>Inlined Specialised Primitives</p>
</li>
<li>
<p>Inlined General Methods</p>
</li>
</ol>
</div>
<p>The breakdown was to get a useful speed improvement as early as
possible and also to make the individual sections as short as
possible.</p>
<p>Polymorphic inline caches provide two things, first they provide
a fast form of <tt class="literal">send</tt>s, second they provide
a way to get dynamic type information. Dynamic type information may
be required to drive inlining<sup>[<a name="d0e606" href=
"#ftn.d0e606" id="d0e606">7</a>]</sup>.</p>
<p>Specialised primitives are compiled versions of a method
containing a primitive. They are compiled for each different
receiver type, this allows the compiler to generate custom code for
each implementor. They need PICs to make the sends fast enough to
compete with local implementations.</p>
<p>While having fast primitives with fast sends to them is nice,
it's still not as quick as a specialised version of the message
(say <tt class="literal">#at:</tt> and <tt class=
"literal">#at:put:</tt>). One way to speed them up further would be
to inline the primitives. Specialised inlined primitives are very
nice because they provide a generalised framework to speed up many
primitives to the same levels that the basic integer operations
enjoy. The type feedback means that your method dynamically gets
the right primitives inlined rather than a specific one chosen by
the VM implementor. It's also a building block towards full inlined
primitives.</p>
<p>Specialised inlined primitives was the last thing added to the
breakdown. They change the shape of the breakdown because they
remove the need for PICs to get inlined primitives working. There's
still a practical issue that non-inlined primitives will be slower,
compiling shouldn't make code slower, even temporarily (at least
until compilation is automatic). The idea of specialised inlined
primitives didn't occur to me until after I'd implemented PICs and
specialised primitives. Insight is nice but it doesn't necessarily
come at the right time.</p>
<p>It could be argued that it would have been better to go straight
for profile driven inlining rather than using PICs. With hindsight,
and original foresight, just using inlining might have been better
than beginning with PICs, because inlined primitives were needed to
regain enough <tt class="literal">#at:</tt> performance. This would
have involved finding a way to inline <tt class="literal">#at:</tt>
methods without using PICs to get type information, instrumenting a
compiled non-PICed specialised primitive might have been possible.
The problem is it would have been much slower after the first
compile but before inlining. Whether this would have been tolerable
is hard to say (without having dynamically driven inlining to
perform real empirical experiments)</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e630" id="d0e630"></a>Design as
Reading, Writing and Discussion</h2>
</div>
<p>A lot of initial design work involved reading around the field,
there's a lot to know - from modern CPU architecture to the two
main compiler approaches that it's based on (Self and classical/SSA
optimising compilers).</p>
<p>Moving into new areas also will bring on a lot of design by
reading. Adding a solid optimiser will involve hitting the books.
My current feeling is using SSA from the beginning makes sense,
it's not that much more complex (and probably simpler including a
basic optimiser) than building a traditional dataflow analyser.</p>
<p>Documentation is something that you shouldn't do too much of on
an agile project because demanding documentation increases the cost
of change. Agility is producing a low cost of change instead of
guaranteeing a perfect result first time. But some is helpful.</p>
<p>One group of documents that's always worth having is those that
pay for themselves in writing. Writing is a good way to regain the
big picture. Writing can be a good way to have a conversation
without anyone to talk to, it's a form of documentation that's more
valuable for single person projects than larger projects where
there are other people to talk to.</p>
<p>Open source projects also have interesting documentation
requirements. The teams are dispersed with different goals and
sources of funds (often people's hobby and education time). Being
developer driven, documentation often suffers (though commercial
development is often undocumented for very similar reasons). But
being made up of geographically and organisationally diverse
groups, having decent documentation is a serious asset. A key
reason is to minimise the amount of time needed to come up to speed
on a project. There are a lot more people who have a few weekends
available than a few months. But there have to be sub-projects
available that could be done by a new person in a few weekends.</p>
<p>Critical reading is essential, the only way to work effectively
is to build from experience in literature. There is the trap of
implementing something that only works on paper and not in code.
Writing is very useful for a single person project because it
replaces conversations with peers who are knowledgable about the
projects specifics.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e645" id=
"d0e645"></a>Conclusion</h2>
</div>
<p>Both theory and incremental design are invaluable tools when
writing a compiler. Incremental design provides a great way to
understand how to apply theory, and also the means to escape from
theory's traps where things work brilliantly, but only on
paper.</p>
<p>Thinking and doing at the same time, or very closely interwoven,
is a critical technique when working on software in an area with a
lot of design literature, especially when lacking first hand
experience.</p>
<p>Using a simple solution to a sub-problem at a minimum enables
everything else to be debugged separately to the complex solution.
Often the simple solution is enough. Breaking large problems down
into smaller ones that can be solved individually is vital to keep
the risks low [<a href="#Henney">Henney</a>] and also because,
often, the experience of building the component is required to
understand how to design it.</p>
<p>The major goals of the project were valuable for providing
consistency and focus, but have yet to become useful when
justifying the time spent so far because they are too far away. The
short term rewards have been enough to make it worthwhile but they
do change, at the beginning working on a hard problem really
deepens understanding, as the project progresses community
membership becomes more important.</p>
</div>
<div class="bibliography">
<div class="titlepage">
<h2><a name="d0e659" id="d0e659"></a>References</h2>
</div>
<div class="bibliomixed"><a name="May" id="May"></a>
<p class="bibliomixed">[May] May, C. <span class=
"citetitle"><i class="citetitle">Mimic: a fast System/370
simulator.</i></span>, 1987 Symposium on Interpreters and
Interpretive techniques.</p>
</div>
<div class="bibliomixed"><a name="Deutsch-" id="Deutsch-"></a>
<p class="bibliomixed">[Deutsch-] Deutsch, L. Peter, Schiffman,
Allan M. <span class="citetitle"><i class="citetitle">Efficient
Implementation of the Smalltalk-80 System</i></span>, Principles of
Programming Languages ? 1983</p>
</div>
<div class="bibliomixed"><a name="Appel" id="Appel"></a>
<p class="bibliomixed">[Appel] Appel, Andrew, <span class=
"citetitle"><i class="citetitle">Modern Compiler Implementation in
C</i></span>, 1998, Cambridge University Press</p>
</div>
<div class="bibliomixed"><a name="Holzle" id="Holzle"></a>
<p class="bibliomixed">[Holzle] Holzle, Urs, <span class=
"citetitle"><i class="citetitle">Adaptive Optimization for Self:
Reconciling high performance with Exploratory
Programming</i></span>, 1994, PhD Thesis</p>
</div>
<div class="bibliomixed"><a name="Chaitin" id="Chaitin"></a>
<p class="bibliomixed">[Chaitin] Chaitin, M.A., Register allocation
via coloring, 1981, <span class="citetitle"><i class=
"citetitle">Computer Languages</i></span> 6, pp. 47-57</p>
</div>
<div class="bibliomixed"><a name="Briggs" id="Briggs"></a>
<p class="bibliomixed">[Briggs] Briggs, P., <span class=
"citetitle"><i class="citetitle">Register Allocation via Graph
Coloring</i></span>, 1992, PhD from Rice University</p>
</div>
<div class="bibliomixed"><a name="George-" id="George-"></a>
<p class="bibliomixed">[George-] George, L. and Appel, A. W.,
<span class="citetitle"><i class="citetitle">Iterated register
coalescing</i></span>, 1996, ACM Trans. on Programming Languages
and Systems 18(3), pp. 300-324</p>
</div>
<div class="bibliomixed"><a name="Henney" id="Henney"></a>
<p class="bibliomixed">[Henney] Henney, K., <span class=
"citetitle"><i class="citetitle">Stable Intermediate Forms: A
foundation Pattern for Derisking the Process of Change</i></span>,
Overload issue 65 February 2005</p>
</div>
<div class="bibliomixed"><a name="Beck-" id="Beck-"></a>
<p class="bibliomixed">[Beck-] Beck, K. and Fowler, M.,
<span class="citetitle"><i class="citetitle">Planning Extreme
Programming</i></span>, 2001, Addison-Wesley</p>
</div>
</div>
<div class="footnotes"><br>
<hr class="c4" width="100">
<div class="footnote">
<p><sup>[<a name="ftn.d0e32" href="#d0e32" id=
"ftn.d0e32">1</a>]</sup> <a href="http://citeseer.ist.psu.edu/"
target="_top">http://citeseer.ist.psu.edu/</a></p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.d0e303" href="#d0e303" id=
"ftn.d0e303">2</a>]</sup> Common case might just mean any code path
that's ever been called. There is a lot of code generated for cases
that are not expected to happen but theoretically might. For
eample, the integer addition code includes a full message send just
in case the arguments are not integers or the addition overflows.
Moving type checks and tagging out of the common case code and into
the return from an uncommon send is a simple and effective
optimisation, it just needs the level of abstraction.</p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.d0e379" href="#d0e379" id=
"ftn.d0e379">3</a>]</sup> This is vital for hobby projecs where 2
full days in a row is rare (there's always something else to do in
a weekend) and very nice for professional work where distractions
from coding are often unavoidable (live customer problems,
meetings, whatever).</p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.d0e384" href="#d0e384" id=
"ftn.d0e384">4</a>]</sup> There are also performance tests, but
their use is separate, they do not check correctness.</p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.d0e542" href="#d0e542" id=
"ftn.d0e542">5</a>]</sup> A send to some primitive operation that
isn't implemented in Smalltalk, either because it is a basic
operation or for performance.</p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.d0e564" href="#d0e564" id=
"ftn.d0e564">6</a>]</sup> Slang is the language that the Squeak
interpreter is written in. It is a cross between Smalltalk's syntax
and C's semantics. One advantage of Slang is it's possible to debug
the interpreter in Squeak before compiling down to C. It also
provides a clean way of writing primitives to speed up slow loops
from code that was originally Smalltalk.</p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.d0e606" href="#d0e606" id=
"ftn.d0e606">7</a>]</sup> It can also be obtained by profiling if
the profiler records a method and its receiver as well as its
sender. this may not be enough for some primitives which is
partially why I originally decided to implement PICs.</p>
</div>
</div></p>
<p><strong>Notes:</strong>&nbsp;</p>
<p><em>More fields may be available via dynamicdata ..</em></p>
</div>
</channel>
</rss>
