Journal Articles

CVu Journal Vol 13, #5 - Oct 2001 + ACCU Topics

Browse in :

All > Journals > CVu > 135 (7)
All > Topics (2)
Any of these categories - All of these categories

Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.

Title: Space Optimisation under Palm OS

Author: Administrator

Date: 03 October 2001 13:15:47 +01:00 or Wed, 03 October 2001 13:15:47 +01:00

Summary:

Body:

Introduction

Over the last year I have started working with the Palm Operating System. For anyone not familiar with them, Palm devices are small handhelds without a keyboard, but with a little touch sensitive screen.

Developing on this platform is an interesting challenge as contemporary Palm devices ship with between 2MB and 8MB of memory in total - and no hard disk! Because of this, users expect most Palm applications to be less than 100K in size and they are usually not disappointed. Along the path of my Palm apprenticeship, trying to pack lots of functionality into as small a memory footprint as possible, I have picked up a few tips and tricks. I would like to share these with you.

I use GCC 2.91 for Palm OS [GCC] and offer no guarantee that my experience of this compiler suite on this platform will generalise elsewhere. However, an article that was entirely Palm-specific would only be of interest to a small proportion of ACCU members, so I have endeavoured to at least provide food for thought for anyone thinking about space optimisation.

The rules of optimisation

The rules of optimisation are usually stated as [Jackson]:

Don't do it.
(For experts only) don't do it yet.

However, there are two caveats to the rule:

You should always avoid pessimisations [Sutter], coding techniques that make your code slower but have little other benefit. A lot of what I discuss here come under this category - just common sense coding techniques that should "flow from your fingertips" whenever you are coding for size.
Part of the reasoning for the "(For experts)" clause is that measuring the effects of optimisation can be quite involved. This is not so for size optimisation - one build and your executable size tells you all you need to know.

You don't have to be an expert to do some space optimisation.

First try: use C

I was aware of the special space considerations almost from the start of my attempts to program the Palm, because the official documentation [PalmOS] is rather insistent upon this point. I started off using C, my reasoning being that, as a simpler language than C++, it was bound to create smaller code, right? (As I will show later on, this was both right and wrong.)

I had some success with C, but I found some of its limitations annoying because essentially I am an object-oriented programmer at heart.

I found myself writing code like this:

/* This struct is private! Do not access the
 * fields directly - use the DocX functions 
 * instead. */
struct Doc
{
  This* thisP;
  That* thatP;
};

Clearly, I am addicted to data hiding, but I also missed other C++ features like interface inheritance and inline functions. So, I started tentatively experimenting with using GCC's C++ compiler, G++.

First steps with G++

Once I had fixed all the problems with my C code that prevented it being valid C++, chiefly putting in casts to replace all implicit conversions from void*, I had a bit of a shock: compiling exactly the same code, G++ produced an executable that was just short of 20K bigger than GCC. At that time, that meant the program I was working on increased in size by 50%, which was unacceptable.

At this point I contemplated sticking with C, but my structure comments were starting to really grate, so I persevered. I started to think about why the executable size had grown.

Learning to live without

The general philosophy of C++ is "what you don't use, you don't pay for" [Stroustrup]. However, there are at least two features of the language that, even if they don't break that rule in theory, certainly make it hard for compiler and linker writers to stick to it. These are:

Run-time type identification (RTTI).
Exception handling.

The reason that both of these features are problematic in this respect is the way they flow across translation units^[1]. The compiler must generate RTTI information for any defined type with external linkage, (well actually, no, only for polymorphic types - ones with virtual member functions, however you might, with a lazy compiler, get RTTI infra structure - Francis) for even if the translation unit that contains the type never uses this information, another translation unit may require it.

The reasoning is similar for exception handling: even a translation unit that never catches or throws an exception may, provided it makes calls outside the translation unit, be required to propagate an exception thrown lower down the call stack. So, the compiler must generate code for this eventuality.

So, we end up with a situation where even C-compatible code contains RTTI and exception handling logic (well only the latter - Francis). Moreover, any part of the run-time library required to support these features will also be handed to the linker.

Ideally, we'd have a system where the linker would realise that, even though RTTI and exception handling code was present, it was never used so could be safely removed. Unfortunately, this is asking quite a lot of your average linker. It is certainly not what the GCC linker does.

Once I'd worked all this out, I discovered the G++ options -fno-rtti and -fno-exceptions, which turn off both of these features. Using these two switches meant that G++ compiled the code to exactly the same size executable as the C compiler.

Of course you might be wondering, why bother removing the code to support these features if you're going to use them anyway? Well, to take the easy case first, I hardly ever use RTTI, so I can easily do without it. Living without exception handling is more problematic - say goodbye to the most of the STL, conforming new behaviour, and more, but if you can do it in C, you can do it in C++. In my case, the 20K saving was a sufficient incentive to do without exception handling, but your mileage may vary. The deficit would often be less than 20K in practice because as you start to actually use exception handling, most of your C-style error handling code can be removed.

As I started to move my code base away from pure C, and added some non-trivial classes, I needed a non-exception-throwing way for a constructor to signal failure. If there is an elegant way of doing this, I have not found it. Two options I considered are:

Two phase construction. A constructor that is always guaranteed to succeed and another member function, called something like Init(), that does any more error-prone construction work, indicating in its return value whether it succeeded or not.
Pass an output parameter to the constructor, which it can use to signal a failure.

I plumped for option 2 in the end, so I had something like this:

Doc::Doc(bool& success) :
  m_resourceP(AcquireResource())
{
  if(!m_resourceP)success = false;
}

Doc::~Doc()
{
  if(m_resourceP) FreeResource(m_resourceP);
}

which was called by code a bit like this:

bool success = true;
Doc doc(dbP, success);
if(success)
{
// Doc constructed OK - we can use it.
}

Note that the caller sets success to true initially, rather than the constructor. The benefit of this is that the top-level constructor can call other constructors (e.g. for data members or base classes) and any failure is reported back to the caller. If I'd taken the other approach and had the constructor set success to true initially, cascading construction would have been more complex: a constructor could end up setting success to true after another constructor had already failed and set it to false.

Avoiding library calls

Part of the reason that RTTI and exception handling increased the executable size was that they caused parts of the run-time library to be linked in. On a device like the Palm, the C++ run-time library is an expensive luxury and I avoid using it entirely. This is not such a hardship as you might first imagine, as, for the most basic calls for such things as string and memory handling, there are OS supplied equivalents. Calling an OS function costs very little in terms of code size, because the function is implemented in ROM. Even on non-ROM based OSs such as most Windows, OS function calls don't usually cost much, since the shared libraries for the most basic calls - GDI32.DLL, USER32.DLL and KERNEL32.DLL, in the case of Windows - will already be loaded anyway^[2].

Where there is an exact mapping between OS and standard library function, it is possible to create a function with the same name as the standard library function and use it to call the OS function. This "interposition" gives you the best of both worlds. However, be careful to ensure you always include the "translation" header before you call any of the functions within - at best failure to do so will cause a compile error, at worst it will just silently bring in part of the standard library and bloat your code - because certain standard library function calls don't require their standard library header to be included in order for the compiler to accept them. If you make several changes at once, you may have trouble tracking the culprit down, although the -nostdlib option available in later GCC versions can help.

One example where interposition is commonly used, and even explicitly sanctioned by the C++ language standard, involves the free store functions - operator new, operator delete and their array-handling chums. Their run-time library implementation brings in substantial parts of the run-time library to the link. This, plus their key role in most C++ systems, makes them ideal candidates for replacement.

Initially I tried implementing the free store operators as static member functions of only the classes I knew might be allocated on the free store. However, this ended up leading to a lot of duplicate code, so in the end I used "interposition" as described above and implemented the global versions of the operators.

First, create a header file called something like FreeStore.h, containing prototypes for the free store functions you will be replacing. Make sure you include this in every source file that makes use of these functions. Don't forget one!

Then create a source file that implements the functions using only OS-supplied memory management functions^[3]. If, like me, you turned off exception handling, your new functions should return NULL in the case of failure. Luckily, this is how most OS memory allocation functions, including Palm's MemPtrNew, behave.

One thing to watch out for is that operator delete is guaranteed to take no action when passed a NULL pointer, whereas the closest OS equivalent may not make this promise. The documentation for MemPtrFree, the Palm OS deallocation function, doesn't, so it is safest to write operator delete to include the NULL test, like so:

void operator delete(void* ptr){
  if(ptr) MemPtrFree(ptr);
}

Inline functions

A simplistic view of inline functions is that they are a straight trade-off to achieve faster code at the expense of an increase in code size. For very simple functions, this is not so - making them inline both increases speed and reduces code size, because it removes the function call overhead. Unless you wish to measure the impact of every single inlining decision (certainly ideal, but also rather tedious), I recommend the following rule of thumb when optimising for space:

"If a function is getting or setting a single variable OR calling a single function, inline it. Otherwise, don't."

Watch for "hidden" function calls such as copy constructors and assignment operators. And be aware that, like all rules of thumb, there will be exceptions.

Polymorphic hierarchies the small way

C++'s support for polymorphism is often used to provide multiple concrete implementation classes of a single interface class (also known as an abstract base class or ABC). Interface classes contain no code, they are used purely to provide a set of member function signatures, called pure virtual functions.

Contrary to common expectation, calls to pure virtual functions can sometimes occur - in the twilight zone when an object is partially constructed. The compiler must emit code that somehow copes with this possibility. With the version of GCC I was using this led to a 6K increase in code size, just for a single pure virtual function. Why is this?

The C++ standard just says that calling a pure virtual function leads to undefined behaviour, leaving it completely up to compiler vendors as to what should happen. As is suggested by the "= 0" syntax, pure virtual functions could perhaps be implemented as a NULL function pointer. Indeed, at least one compiler uses just this strategy [Vollmann]. The downside here is that if the call is not spotted by the compiler, it will lead to a general exception or crash. This is perfectly valid "undefined behaviour", but it isn't terribly helpful for someone trying to work out why some code isn't working.

Another common strategy - adopted by both GCC and Microsoft's Visual C++ compiler, is to implement pure virtual functions as calls to a run-time library function. GCC's is called __pure_virtual. Testing __pure_virtual() showed it getting stuck in an infinite recursion, which is probably a bug. Visual C++'s similar system displays an error message and terminates the application, which I guess is what GCC's authors also intended. The take home point is that this means that having any pure virtual functions in your code causes parts of the run-time library to be brought in - __pure_virtual() plus all the gubbins it requires to report the error. This is the cause of the 6K overhead that was observed.

To minimise code size, this overhead must be avoided. Therefore it is necessary to eschew true pure virtual functions.

I adopted a minimalist technique to achieve this aim. First, I provided an empty macro, PURE, solely to make the code to some degree self-documenting:

#define PURE

So, when I wanted a pure function what I wrote it like so:

virtual int foo() PURE;

And created a stub implementation of the function. This was the smallest possible stub that the compiler would accept plus a call to my own PureCall() function e.g.

/*virtual*/ int SomeClass::foo()
{ PureCall(); return 0; }

I wrote PureCall() so it just uses OS functionality to generate a fatal run-time error diagnostic^[4].

Of course, you can do without PureCall(), but then you run the risk of silent run-time errors in your code. For the best of both worlds, it would be possible to keep PureCall() as it is for debug versions of the code, and to make it an empty inline function in release versions.

Finally

That concludes our amble through Palm OS space optimisation. I hope you found it useful, even if you do not program a Palm device. I would welcome your feedback on the article, and can be contacted at <glancaster@codemill.net>. Happy optimising!

Acknowledgements

Many thanks to Detlef Vollman, for thoroughly reviewing this article and providing much additional insight, and also to Lois Goldthwaite for her comments on an earlier draft.

References

[GCC] GCC, the GNU Compiler Collection. Free Software Foundation. http://www.gnu.org/software/gcc/gcc.html

[Jackson] Jackson, Michael A., Principles of Program Design, Academic Press, London and New York, 1975.

[Sutter] Herb Sutter. Talk at ACCU Spring Conference 2001.

[PalmOS] Palm OS® Programmer's API Reference and Palm OS® Programmer's Companion, Palm Inc. www.palmos.com/dev/tech/docs/.

[Stroustrup] Bjarne Stroustrup, The Design and Evolution of C++, p. 121, Addison Wesley, 1994.

[Vollmann] Detlef Vollmann, e-mail communication. The compiler is/was Borland C++ for OS/2.

^[1] A translation unit is a source file, plus all the headers #included in that source file.

^[2] Occasionally, using a shared C++ run-time library has no extra overhead - if you can guarantee that when your program is loaded, another program is already using it.

^[3] Bear in mind that this straightforward implementation doesn't call the current, or indeed, any, new_handler.

^[4] Another more transparent approach to avoiding the pure virtual function overhead and other run-time library overhead issues is to write a run-time library replacement, containing low overhead implementations of only those functions you need, including __pure_virtual().

Notes:

More fields may be available via dynamicdata ..