Journal Articles

Overload Journal #34 - Oct 1999 + Design of applications and programs

Title: When's an object not an object?

Author: Administrator

Date: 26 October 1999 17:50:55 +01:00


When is an object not an object? When it's a value! Consider how std::string and std::cout are treated: a string is treated as a value, whereas cout is treated as an object. What is the difference?

Consider how ints are typically used in C++. You declare them, you assign them, combine them with operations and convert them to and from other types of value (string, double, etc). You don't think of int i = 10 and int j = 10 as being the same thing; they merely have the same value or contents. You don't worry about temporaries being created and destroyed when you write 2 + 3 and neither do you tend to pass ints by const reference. You may write conversion operators to and from other classes using non-explicit single argument constructors and operator int(). For your own value based classes you will implement the copy constructor and the assignment operator so that you can use your class with the STL. You will also tend to implement equality testing and the comparison operators. You will be unlikely to inherit from the class so none of the methods would be virtual. Also you don't worry about memory leaks when using such a class (assuming it's been written cleanly) as you normally declare variables on the stack or use them in containers. Neither do you allocate them dynamically or use smart pointers to manage their lifetimes.
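The usage pattern described above can be sketched as a small value class. This is only an illustration: the Point class and the count_distinct_points function are hypothetical names that do not come from the article, and the copy operations are written out by hand purely to make the value contract visible.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical value type: a 2D point. Names are illustrative only.
class Point {
public:
    Point(int x, int y) : x_(x), y_(y) {}

    // Copy constructor and assignment written out to show the value
    // contract; the compiler-generated versions would behave the same.
    Point(const Point& other) : x_(other.x_), y_(other.y_) {}
    Point& operator=(const Point& other) {
        x_ = other.x_;
        y_ = other.y_;
        return *this;
    }

    int x() const { return x_; }
    int y() const { return y_; }

private:
    // No virtual functions: the class is not meant to be derived from.
    int x_;
    int y_;
};

// Equality compares contents, not identity.
inline bool operator==(const Point& a, const Point& b) {
    return a.x() == b.x() && a.y() == b.y();
}
inline bool operator!=(const Point& a, const Point& b) { return !(a == b); }

// Ordering, so that Point can be stored in std::set or used as a map key.
inline bool operator<(const Point& a, const Point& b) {
    return a.x() < b.x() || (a.x() == b.x() && a.y() < b.y());
}

// A simple operator returns a fresh temporary, just as 2 + 3 does.
inline Point operator+(const Point& a, const Point& b) {
    return Point(a.x() + b.x(), a.y() + b.y());
}

// Usage: values live on the stack or in containers, never behind new.
inline std::size_t count_distinct_points() {
    Point p(1, 2);
    Point q(1, 2);
    assert(p == q);           // same value, although different variables
    std::vector<Point> points;
    points.push_back(p);
    points.push_back(q);
    points.push_back(p + q);  // the temporary is copied into the container
    std::set<Point> distinct(points.begin(), points.end());
    return distinct.size();   // p and q collapse into one entry
}
```

Calling count_distinct_points() from a main function exercises copying, comparison and temporaries without a single new or delete.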

In comparison, consider how a window class is typically used. You don't go around copying windows or comparing their contents, so you'll declare the copy constructor, the assignment operator and the equality operator private to prevent their use. You use operations to change a window but you still consider it to be the same window. You don't pass windows by value; you pass them by non-const reference instead. You expect to derive other classes from the window, so you make all of the public functions virtual, including the destructor, so that you can use polymorphism. Often you will allocate a window dynamically using new and possibly use a smart pointer to prevent memory leaks, or rely upon the good services of a garbage collector to tidy up for you afterwards. You don't create expressions using windows or write operators that return temporary windows. Any operators you define will be compound ones that modify the left-hand operand rather than returning a temporary (operator+= instead of operator+). You won't define any conversion operators, and single argument constructors will be labelled as explicit.
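A minimal sketch of such a reference type follows. The Window and Dialog classes, the widen function and demo_window are hypothetical names chosen for illustration. Two assumptions are worth flagging: std::unique_ptr (C++11) stands in for whatever smart pointer is available, and operator+= is left non-virtual but forwards to the virtual resize, which is one possible reading of "make all of the public functions virtual".

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical reference type; the names are illustrative only.
class Window {
public:
    explicit Window(const std::string& title) : title_(title), width_(0) {}
    virtual ~Window() {}

    virtual std::string describe() const { return "window:" + title_; }
    virtual void resize(int width) { width_ = width; }
    virtual int width() const { return width_; }

    // Compound operator: changes this window in place and returns it,
    // instead of producing a temporary Window.
    Window& operator+=(int extra_width) {
        resize(width() + extra_width);
        return *this;
    }

private:
    // Declared private and never defined, so copying cannot be used.
    Window(const Window&);
    Window& operator=(const Window&);

    std::string title_;
    int width_;
};

class Dialog : public Window {
public:
    explicit Dialog(const std::string& title) : Window(title) {}
    virtual std::string describe() const {
        return "dialog:" + Window::describe();
    }
};

// Passed by non-const reference: the caller's own window is changed.
inline void widen(Window& w) { w += 10; }

// Usage: allocate dynamically, hold through a smart pointer, use the base
// interface polymorphically.
inline std::string demo_window() {
    std::unique_ptr<Window> w(new Dialog("Save"));
    widen(*w);
    widen(*w);
    assert(w->width() == 20);  // still the same window, now wider
    return w->describe();
}
```

Uncommenting a line such as `Window copy = *w;` should fail to compile, which is exactly the protection the private declarations are there to provide.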

What is the fundamental difference between these two styles of usage? The first (string, int) has value semantics and the second (cout, window) has reference semantics. Let's take a look at how a number of languages deal with this issue. C++ is value based by default (because of the presence of a default copy constructor and assignment operator) but supports reference semantics easily. Java is reference based for everything except its built-in primitive types (all variables of class type are references and the only way of creating an object is new) and has to work hard to handle values. Eiffel is reference based by default but has expanded types for values. Smalltalk is purely reference based and makes no attempt at values, and functional languages such as ML are essentially value based as they are built around expressions rather than assignment.

How does this affect the programming environment? Reference based languages need garbage collectors. Most garbage in reference based languages is generated by trying to simulate values. This is why generational garbage collectors are so effective. Values tend to be short lived and so are cleared up soon, whereas reference based objects are long lived and so survive many generations. In value based languages the values are usually stack based or compiler generated temporaries and so no garbage collector is necessary. Look at the effort required in reference based languages to support value based programming: clone and deep_clone operations and equals and deep_equals methods are often provided by a single rooted object hierarchy, garbage collection is needed, and certain arithmetic value types are built-in for efficiency. In contrast there is very little machinery needed to support reference based programming in value based languages: all that is needed is pointers and the ability to redefine copy construction and assignment.

How does this affect our perception of programming? Values are simple things and tend to be small (ints, strings, points). Reference objects are often large (cf. cout, windows, databases) and contain state. When you change the state of a window you tend to do it incrementally and you don't think of it as being a different window, whereas when you change a string it is completely changed, with no memory of its former contents.

The clean separation of value types and reference types can be blurred by having different semantics for different interfaces implemented by the same object. For instance, consider a persistent string implemented as a subclass of both string (containing no virtual methods) and an abstract base class called PersistentObject (containing only virtual methods). Used through the string interface it behaves as a value; used through the PersistentObject interface it behaves as a reference type.
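One way the persistent string might look is sketched below. Everything beyond the two class names is an assumption: the save and load members, the length-prefixed stream format and the round_trip helper are invented for illustration. Note also that std::string has no virtual destructor, so an object like this should never be deleted through a std::string pointer.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical abstract interface with reference semantics.
class PersistentObject {
public:
    virtual ~PersistentObject() {}
    virtual void save(std::ostream& out) const = 0;
    virtual void load(std::istream& in) = 0;
};

// Value semantics through std::string, reference semantics through
// PersistentObject.
class PersistentString : public std::string, public PersistentObject {
public:
    PersistentString() {}
    // Non-explicit on purpose: a value-style conversion from std::string.
    PersistentString(const std::string& s) : std::string(s) {}

    // Assumed format: "<length> <characters>".
    virtual void save(std::ostream& out) const {
        out << size() << ' ';
        out.write(data(), static_cast<std::streamsize>(size()));
    }
    virtual void load(std::istream& in) {
        std::string::size_type n = 0;
        in >> n;
        in.get();  // skip the single separator space
        std::string buffer(n, '\0');
        if (n > 0) {
            in.read(&buffer[0], static_cast<std::streamsize>(n));
        }
        assign(buffer);
    }
};

// Code written against the abstract interface never copies the object.
inline void store(const PersistentObject& obj, std::ostream& out) {
    obj.save(out);
}

inline std::string round_trip(const std::string& text) {
    PersistentString original(text);
    PersistentString copy = original;  // value interface: a real copy
    copy += "!";                       // leaves original untouched
    assert(original.compare(text) == 0);

    std::stringstream stream;
    store(copy, stream);               // reference interface
    PersistentString restored;
    restored.load(stream);
    return restored;                   // converted back to a plain value
}
```

The two interfaces stay separate: copying and concatenation go through std::string, while store only ever sees a PersistentObject reference.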

Implementation consequences and guidelines

How does this distinction help us when programming? Deciding whether we are producing a value type or a reference type tells us what we should and should not include in our class. Here are some heuristics. They are not hard and fast rules. They are however designed to stimulate discussion.

  • Don't mix reference semantics and value semantics in the same interface.

  • Objects with multiple interfaces may use different semantics for each one.

  • If object copying is disallowed then declare the copy constructor and assignment operators as private, and make all methods virtual.

  • If object copying is allowed then define the copy constructor and assignment operators, and make no methods virtual.

  • Don't create new value types using inheritance, unless adding a new interface to an existing type.

  • Use value types by naming their class directly.

  • Use reference types only via an abstract interface (use by type).

  • Operators on reference types should be compound operators, whereas operators on values can be either compound or simple value returning ones.

  • Don't allocate values dynamically; create anonymous temporaries instead.

  • Use dynamically allocated objects only in conjunction with smart pointers.

  • Use values with smart pointers only if the pointers implement copy on write to preserve value semantics.

  • Reduce the number of outgoing dependencies of value types, preferably to zero, to aid in reuse.

  • Don't write equality or comparison operators for reference types.

  • Don't take the address of a value.

  • Don't write conversion operators for reference types.

  • Pass reference types by non-const reference or pointer and value types by value or const reference.

  • Return reference types by reference and values by value from functions.

  • Don't share values; copy them instead to preserve semantics.

  • Make single argument constructors explicit only for reference types.

  • Conversion operators for value types should return by value or const reference.

  • On UML diagrams use aggregation (open diamond) for reference types and composition (solid diamond) for value types.
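The heuristic about using values with smart pointers only when the pointers implement copy on write can be illustrated with a small handle class. This is a sketch under stated assumptions, not a production design: CowPtr, read, write and shares_with are hypothetical names, std::shared_ptr (C++11) is used for the reference counting, and no attempt is made at thread safety.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical copy-on-write handle; single-threaded sketch only.
template <typename T>
class CowPtr {
public:
    explicit CowPtr(const T& value) : shared_(std::make_shared<T>(value)) {}

    // Reading never copies the underlying value.
    const T& read() const { return *shared_; }

    // Writing detaches first if another handle shares the representation,
    // so a change through one handle is never visible through another.
    T& write() {
        if (shared_.use_count() > 1) {
            shared_ = std::make_shared<T>(*shared_);
        }
        return *shared_;
    }

    // True while both handles still point at the same representation.
    bool shares_with(const CowPtr& other) const {
        return shared_ == other.shared_;
    }

private:
    std::shared_ptr<T> shared_;
};

inline std::string demo_copy_on_write() {
    CowPtr<std::string> a(std::string("value"));
    CowPtr<std::string> b = a;   // cheap: only the pointer is copied
    assert(a.shares_with(b));

    b.write() += " changed";     // b detaches before modifying
    assert(!a.shares_with(b));

    return a.read() + "|" + b.read();
}
```

From the outside the handle behaves like a value: after the write, a still holds its old contents, even though the copy itself was as cheap as copying a pointer.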
