How to Write C++ Classes II

Thursday, March 13, 2003 10:51 PM >>>

How to Write C++ Classes II

A little while back I posted How to Write C++ Classes, which discussed ways to separate the interface from the implementation when building C++ and Java classes. There is more which could be said, of course, and here is more (not to say everything that could be said :). A brief table of contents:

These sections each illustrate that good languages like C++ and Java enable good programming style, but don't necessarily enforce it.

Local Types

Often a class may define "local types", types which pertain to the parameters and returns of the class methods. How and where should these be defined?

Consider a class cA which provides an API for encapsulating access to something. This class in turn uses other classes internally for implementation, which are hidden from uses of the cA class. For example cB provides encapsulation of some internal object. Good so far - anyone can use cA and be blissfully unaware that cB is used "under the covers".

Now say that cA provides a method to return a value, and that value is actually provided internally by cB. Let's say cB defines an enum type for the possible values. How and where should this type be defined?

One possibility is to define an enum type tB in the cB header, and define a corresponding enum type tA in the cA header. The cA method returns a tA, and invokes a cB method which returns a tB. The tB is cast into the tA or converted via a switch. The upside is that a caller only sees the cA header, and the tA type. The downside is that the type is duplicated in two different headers, with the consequent possibility of a maintenance issue. If a new type is supported by cB, it won't be supported automatically by cA.
Possibility two is to define an enum type tB in the cB header, and #include the cB header in the cA header. The cA method calls the cB method, and both return a tB. This eliminates the duplication of types, but exposes the cB interface to all users of cA. If the cB header is changed, all cA users will require recompilation.
Possibility three is to define an enum type tC in a separate header file hC. This header is #included in both the cA and cB headers. The cA method calls the cB method, and both return a tC. This solves the maintainability problem of (1) and the implementation exposure of (2), at the cost of having another header file floating around.

The last possibility is strongly preferred. It preserves the separation between interface and implementation, and has no maintenance downside. The takeaway:

If two or more classes share a common "local type", define the local types in a separate header.

Constant Correctness

When designing class interfaces, it is helpful to impose "constant correctness". This doesn't mean the code is constantly correct and has no errors :) What it means is that the const attribute of types is used whenever appropriate in parameters and return values. This attribute tells the compiler "this thing may not be modified". The most common occurrence is with C strings and other buffers, normally typed as char*. If you have a string or buffer which should not be modified, you should tell the compiler by defining it as const char*.

Here's an example:

____ OLD BAD WAY ___
char *getUserid(void); // get/set userid value
void setUserid(char *userid);

____ NEW COOL WAY ____
const char *getUserid(void); // get/set userid value
void setUserid(const char *userid);

This not only prevents poor coding (like modifying a caller's parameter) but it also prevents genuine errors. Consider the following:

____ OLD BAD WAY ____
CString m_userid;

char *cA::getUserid(void) // get userid value
{
return (char*)(const char*) m_userid;
}

____ NEW COOL WAY ____
CString m_userid;

const char *cA::getUserid(void) // get userid value
{
return (const char*) m_userid;
}

This is a simple "glue" method which passes back a string value. Not only is the double cast in the "old bad way" ugly, but it has a bug! The inner cast (const char*) causes the CString object to return a pointer to its internal representation of the string. The outer cast (char*) causes the compiler to copy the constant string from inside of the CString object to a new temporary string, so it can be modifiable. This temporary string has local scope! So by passing back a pointer to it, you are passing back garbage to the caller. It might work, but it might not. And it might work for a while, until the heap is modified, and then stop working. Quite subtle and ugly.

You could fix this in the old bay way with an indirect cast, like this:

____ OLD NOT AS BAD BUT STILL BAD WAY ____
CString m_userid;

char *cA::getUserid(void) // get userid value
{
const char pUserid = (const char*) m_userid;
return *(char**) &puserid;
}

This is so ugly you just know it has to be wrong (W=UH). This doesn't have the bug of creating a temporary string, but it causes the internal objects' data to be accessible and modifiable by the caller - not good.

The const char* return type wins easily. Not only is it bug free and clean, but by defining the method with a const return type, you are telling the caller they cannot modify the value. And the compiler will enforce this, which is what you want. The takeaway:

Use the const attribute of parameters and return values whenever appropriate.

Direct access to data

One final thought about C++ classes. This is a bit heretical, but please bear with me. It is standard C++ dogma that one should never ever expose class data in the public interface. Instead, one should always provide get and put methods to access the data. Well, yeah, but...

I claim there are times when it is better to make data members public. There are three cases where this is justifiable and efficient:

The class is tiny and many objects are instantiated to form a larger structure. Examples include nodes in a tree, links in a chain, or entries in a table. In these cases the additional overhead of having "get" and "put" methods for each datum is not justifiable. It is far easier simply to access the members. Consider the following example, incrementing a counter in the left leaf of a tree node:

____ THE PURE WAY ____
pNode->getLeft()->putCount(pNode->getLeft()->getCount() + 1);

____ THE COOL WAY ____
pNode->pLeft->count++;

Just looking at this example, it is obvious which way is better. W=UH and all that.

The class is used to encapsulate something inside another object. Typically such "internal" classes are not publicly used. The primary reason for the class is encapsulation and modularization. In such cases there are often a lot of data inside the inner object which are used by the outer object. Why impose the complexity and overhead of "get" and "put" methods? Separation of interface from storage isn't important, because both objects are likely to be updated and compiled together. The code will read better and execute faster if the outer object has direct access to the inner object's data. You will also spent less time maintaining "glue" between the classes.

The class contains internal objects which are exposed in its interface. Consider a class cA which contains one or more instances of another class cB. Say that users of cA want to invoke methods on the cB objects inside it. There are two possibilities:

____ THE PURE WAY ____
pA->createB(...);               // instantiate a cB
pA->getB()->doThing(...);       // invoke method of the CB
pA->destroyB();                 // destruct the cB

____ THE COOL WAY ____
pA->pB = new cB(...);           // instantiate a cB
pA->pB->doThing(...);           // invoke method of the cB
delete pA->pB;                  // destruct the cB

Again, just looking at the code gives you a strong clue which way is preferred. In the former case you end up with a bunch of "glue" in cA which passes through calls to cB. This glue adds noise but no value. And - if cB is changed in some way, in the pure way new glue would have to be added to cA to make the changes visible, whereas in the cool way the interface to cB is directly available. The takeaway:

There are cases where publicly exposing class data is better than using get and put methods.

There is even more which could be said, of course (and you know me - I'll probably end up saying it); if you have thoughts, comments, suggestions, etc. please let me know!