How to Write C++ Classes

Sunday,  01/26/03  10:14 AM

If you're programming in 2003, you're most likely coding in either C++ or Java.  These languages are quite similar syntactically, and provide a solid, well-defined way to package functionality in objects.  They also share a common defect: they make it difficult to keep a class's implementation details separate from its interface details.  I have the solution - read on!

The Problem

In a typical class, you have a header file <class>.h and a code file <class>.cpp.  The header contains the class definition.  The class's public properties and methods define the interface details of the class.  This is the information users of the class need.  The class's private properties and methods define the implementation details of the class.  This information is only used in the code file; it is not needed by any user of the class.  But because both sets of properties and methods are in the same header file, all users of the class "see" the implementation details.  This is the defect.

Before prescribing a solution, let me explain why this is a defect.  First, it is ugly, and by W=UH this means it is wrong.  Second, whenever the implementation data are changed, the header changes.  This causes all users of the class to require recompilation.  But changing the implementation should never affect any users of the class!  Third, changing the private properties and methods may change the size of the class's data.  This implementation-only change can affect users.  Consider the following example:

_____ myclass.h _____
class   myclass {
private:
        int     theInt;
public:
        myclass(void);
        int     getInt(void);
        void    putInt(int theInt);
        };

_____ mycaller.cpp _____
...
void    myfunc(void) {
        myclass localInt;
        int     stackInt;
...

This shows a simple class which encapsulates an integer, and a simple caller which instantiates it.  Let's say we change the class slightly, as follows:

_____ myclass.h _____
class   myclass {
private:
        int     theInt;
        int     counter;
public:
        myclass(void);
        int     getInt(void);
        void    putInt(int theInt);
        };

The new implementation datum is counter.  Even though this is only a change to the implementation, it will break the caller!  Because the size of the myclass object has changed, the location of stackInt will change.  So every user is not merely recompiled as a side effect of the header changing; every user has to be recompiled, because the size of the object has changed.  All for an implementation-only change.  A defect indeed.
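
To make the size change concrete, here is a minimal sketch that puts the two layouts side by side.  The names myclass_v1, myclass_v2 and sizeGrowth are hypothetical, invented only so both versions can live in one file; the actual sizes depend on your compiler and platform.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the two versions of myclass shown above.
// Only the data members matter for the size of the object.
class   myclass_v1 {
private:
        int     theInt;
public:
        myclass_v1() : theInt(0) {}
        int     getInt() const { return theInt; }
        };

class   myclass_v2 {
private:
        int     theInt;
        int     counter;        // the implementation-only addition
public:
        myclass_v2() : theInt(0), counter(0) {}
        int     getInt() { counter++; return theInt; }
        };

// A caller compiled against v1 reserves sizeof(myclass_v1) bytes for each
// object, while code compiled against v2 expects sizeof(myclass_v2) bytes.
std::size_t sizeGrowth() {
        return sizeof(myclass_v2) - sizeof(myclass_v1);
}
```

Any caller still built against the old header would reserve too little space for the object, which is why a full recompile cannot be skipped.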

The Solution

Okay, now on to the solution.  The crux of the problem is that the implementation details (the private properties and methods) are defined in the class header.  So let's pull them out of the header, shall we?  Instead of defining private properties and methods in the class header, let's create a new internal class to "package" the implementation details.  This class will be defined in the code module for the class.  Only a pointer to the internal class will appear in the header.  Here's the class header with this organization:

_____ myclass.h _____
class   iclass;            // forward declaration: tells compiler iclass is a class

class   myclass {
private:
        iclass  *icp;      // internal data object pointer
public:
        myclass(void);
        ~myclass();        // destructor, defined in the code module
        int     getInt(void);
        void    putInt(int theInt);
        };

As you can see, there is only one private datum defined in the class header, a pointer to the internal class.  Here's the code module for the class:

_____ myclass.cpp _____
...
#include "myclass.h"       // external class definition

class   iclass  {          // internal implementation class
public:                      // everything in iclass is accessible
        int     myInt;
        };
...
myclass::myclass (void) {  // class constructor
        icp = new iclass;  // instantiate internal object
        ...
}
...
myclass::~myclass () {     // class destructor
        ...
        delete icp;        // destroy internal object
}
...
int myclass::getInt (void) {   // class method
        return (icp->myInt);   // access internal data through pointer
}
...

Note that the internal class is defined right in the code module.  There is no need to define it in a separate header, because this is the only place it will be used.  If we want to add a new datum to the implementation, we just go ahead and do it in the code module:

_____ myclass.cpp _____
...
#include "myclass.h"       // external class definition
...
class   iclass  {          // internal implementation class
public:                      // everything in iclass is accessible
        int     myInt;
        int     counter;
        };

myclass::myclass (void) {  // class constructor
        icp = new iclass;  // instantiate internal object
        icp->counter = 0;  // can perform initialization here...
        ...
}
...
myclass::~myclass () {     // class destructor
        ...
        delete icp;        // destroy internal object
}
...
int myclass::getInt (void) {   // class method
        icp->counter++;        // internal-only data
        return (icp->myInt);   // access internal data through pointer
}
...

The changes are the new counter datum in iclass, its initialization in the constructor, and the increment in getInt.  Only the code module is affected; the class header did not change, and none of the users of the class have to be recompiled.  Isn't this pretty?  It must be right :)
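
For readers who want something they can paste into one file and compile, here is a minimal sketch that puts the header and the code module together.  The putInt body, the roundTrip caller and the disabled copy operations are my assumptions rather than part of the listings above; copying is switched off because a default copy would leave two objects deleting the same internal object.

```cpp
// ----- myclass.h (sketch) -----
class   iclass;                 // forward declaration only

class   myclass {
private:
        iclass  *icp;           // the lone private datum
        // Assumption: copying is disabled (declared, never defined), since
        // the default copy would share icp between two objects.
        myclass(const myclass&);
        myclass& operator=(const myclass&);
public:
        myclass();
        ~myclass();
        int     getInt();
        void    putInt(int theInt);
        };

// ----- myclass.cpp (sketch) -----
class   iclass {                // internal implementation class
public:
        iclass() : myInt(0), counter(0) {}
        int     myInt;
        int     counter;
        };

myclass::myclass() : icp(new iclass) {}
myclass::~myclass() { delete icp; }

int myclass::getInt() {
        icp->counter++;         // internal-only data
        return icp->myInt;
}

void myclass::putInt(int theInt) {
        icp->myInt = theInt;
}

// ----- hypothetical caller: sees only what myclass.h declares -----
int roundTrip(int value) {
        myclass m;
        m.putInt(value);
        return m.getInt();
}
```

In a real project the two marked sections would live in separate files, and only the first would be visible to callers.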

This technique is sometimes called a "Cheshire Cat" class, a name attributed to John Carolan, an early C++ pioneer.  The lone private pointer to the internal data object is called "the smile", of course...

More Advanced Example

The example above was pretty simple, but more advanced stuff can be done with an internal class.  In particular, this class can have internal-only methods.  Here's an example of this in action.  The external class remains the same as in the examples above, so all this only affects the code module.

_____ myclass.cpp _____
...
#include "myclass.h"       // external class definition

class   iclass  {          // internal implementation class
public:                      // everything in iclass is accessible
        iclass(void);      // internal class can have constructor
        int     getInt(void);  // as well as other methods...

        int     myInt;
        int     counter;
        };

iclass::iclass (void) {    // internal class constructor
        counter = 0;       // perform initialization here...
}

int iclass::getInt (void) {  // internal class method
        counter++;
        return (myInt);
}

...
myclass::myclass (void) {  // class constructor
        icp = new iclass;  // instantiate internal object
        ...
}
...
myclass::~myclass () {     // class destructor
        ...
        delete icp;        // destroy internal object
}
...
int myclass::getInt (void) {   // class method
        return icp->getInt();  // access internal data through methods
}
...

In this example, a constructor for the internal class has been added, as well as a method.  Because the internal class has direct access to all class data (by definition, since the class is used to encapsulate those data), there is no reason not to use internal class methods for all "local" subroutines, instead of private methods of the external class.  As with private data, this has the dual advantages of "hiding" the implementation details, since they are not in the class header, and isolating users of the class from any changes to those details.

This organization has other benefits as well.  Suppose the class uses another class internally for implementation.  For example, say the myclass defined above instantiated a cString object.  In the standard organization, this would mean placing a #include "cString.h" in the myclass header, even though the cString object is only used internally.  This exposes any users of myclass to changes in cString!  By separating the internal data into an internal class only defined in the code module, the #include "cString.h" is only needed in the code module.  All users of myclass remain blissfully unaware that a cString is being used.  Now if cString changes, only the code module for myclass has to be recompiled, and that makes sense, because myclass uses a cString.
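
Here is a minimal sketch of that arrangement.  Since cString.h isn't shown here, std::string stands in for cString, and the putName and getName methods are hypothetical, added only to give the internal string something to do.

```cpp
// ----- myclass.h (sketch): no string header is included here -----
class   iclass;

class   myclass {
private:
        iclass  *icp;
        myclass(const myclass&);            // copying disabled (assumption)
        myclass& operator=(const myclass&);
public:
        myclass();
        ~myclass();
        void        putName(const char *name);  // hypothetical method
        const char *getName();                  // hypothetical method
        };

// ----- myclass.cpp (sketch): the only place the string class is visible -----
#include <string>               // stands in for #include "cString.h"

class   iclass {
public:
        std::string name;       // used internally; never appears in the header
        };

myclass::myclass() : icp(new iclass) {}
myclass::~myclass() { delete icp; }

void myclass::putName(const char *name) { icp->name = name; }
const char *myclass::getName() { return icp->name.c_str(); }
```

Notice that the public methods traffic only in const char *, so nothing in the header portion mentions the string class at all.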

There you have it - a clean solution to a troublesome problem.

Caveat: Some readers have pointed out that there is overhead inherent in this technique.  For each object instantiation, there is another internal object instantiated, and the data required by each object is increased by the size of the private pointer plus the allocation overhead for another object (often around 16 bytes, depending on the allocator).  There are also extra calls to new and delete for each object allocation and destruction.  This is all true.  For tiny objects which are allocated and destroyed rapidly, the overhead may be unacceptable.  In most cases this overhead disappears into the noise, and the added elegance and maintainability are well worth it.
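
If you want to see the extra allocations for yourself, here is a rough sketch that counts them.  The allocations counter, the class-specific operator new and the allocationsFor function are all assumptions of mine for illustration, and everything is squeezed into one file for brevity.  It counts calls only; it does not measure the allocator's per-block overhead, which varies by platform.

```cpp
#include <cstddef>
#include <new>

static int allocations = 0;     // hypothetical counter, for illustration only

class   iclass {
public:
        int     myInt;
        // Class-specific operator new/delete so each allocation can be counted.
        static void *operator new(std::size_t size) {
                allocations++;
                return ::operator new(size);
        }
        static void operator delete(void *p) {
                ::operator delete(p);
        }
        };

class   myclass {
private:
        iclass  *icp;
        myclass(const myclass&);            // copying disabled (assumption)
        myclass& operator=(const myclass&);
public:
        myclass() : icp(new iclass) {}
        ~myclass() { delete icp; }
        };

// Builds n objects on the stack, one at a time, and reports how many heap
// allocations that caused.
int allocationsFor(int n) {
        allocations = 0;
        for (int i = 0; i < n; i++) {
                myclass m;      // one extra new here, one delete at scope exit
        }
        return allocations;
}
```

Each stack object costs exactly one trip to the heap, while the object itself shrinks to the size of a single pointer.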

[ Later - for more please see How to Write C++ Classes II... ]

[ Later still - I am indebted to several readers for comments and suggestions which improved and simplified this technique, including Eric Holm, Klitos Kyriacou, and Horacio Peña.  Thanks! ]