Archive: March 13, 2003

<<< March 12, 2003

Home

March 15, 2003 >>>


How to Write C++ Classes II

Thursday,  03/13/03  11:48 AM

A little while back I posted How to Write C++ Classes, which discussed ways to separate the interface from the implementation when building C++ and Java classes.  There is more which could be said, of course, and here is more (not to say everything that could be said :).  A brief table of contents:

These sections each illustrate that good languages like C++ and Java enable good programming style, but don't necessarily enforce it.

Local Types

Often a class may define "local types", types which pertain to the parameters and returns of the class methods.  How and where should these be defined?

Consider a class cA which provides an API for encapsulating access to something.  This class in turn uses other classes internally for implementation, which are hidden from uses of the cA class.  For example cB provides encapsulation of some internal object.  Good so far - anyone can use cA and be blissfully unaware that cB is used "under the covers".

Now say that cA provides a method to return a value, and that value is actually provided internally by cB.  Let's say cB defines an enum type for the possible values.  How and where should this type be defined?

  1. One possibility is to define an enum type tB in the cB header, and define a corresponding enum type tA in the cA header.  The cA method returns a tA, and invokes a cB method which returns a tB.  The tB is cast into the tA or converted via a switch.  The upside is that a caller only sees the cA header, and the tA type.  The downside is that the type is duplicated in two different headers, with the consequent possibility of a maintenance issue.  If a new type is supported by cB, it won't be supported automatically by cA.
  2. Possibility two is to define an enum type tB in the cB header, and #include the cB header in the cA header.  The cA method calls the cB method, and both return a tB.  This eliminates the duplication of types, but exposes the cB interface to all  users of cA.  If the cB header is changed, all cA users will require recompilation.
  3. Possibility three is to define an enum type tC in a separate header file hC.  This header is #included in both the cA and cB headers.  The cA method calls the cB method, and both return a tC.  This solves the maintainability problem of (1) and the implementation exposure of (2), at the cost of having another header file floating around.

The last possibility is strongly preferred.  It preserves the separation between interface and implementation, and has no maintenance downside.  The takeaway:

  • If two or more classes share a common "local type", define the local types in a separate header.

Constant Correctness

When designing class interfaces, it is helpful to impose "constant correctness".  This doesn't mean the code is constantly correct and has no errors :)  What it means is that the const attribute of types is used whenever appropriate in parameters and return values.  This attribute tells the compiler "this thing may not be modified".  The most common occurrence is with C strings and other buffers, normally typed as char*.  If you have a string or buffer which should not be modified, you should tell the compiler by defining it as const char*.

Here's an example:

____ OLD BAD WAY ___
char *getUserid(void);            // get/set userid value
void setUserid(char *userid);

____ NEW COOL WAY ____
const char *getUserid(void);      // get/set userid value
void setUserid(const char *userid);

This not only prevents poor coding (like modifying a caller's parameter) but it also prevents genuine errors.  Consider the following:

____ OLD BAD WAY ____
CString m_userid;

char *cA::getUserid(void)        // get userid value
{
return (char*)(const char*) m_userid;
}

____ NEW COOL WAY ____
CString m_userid;

const char *cA::getUserid(void)  // get userid value
{
return (const char*) m_userid;
}

This is a simple "glue" method which passes back a string value.  Not only is the double cast in the "old bad way" ugly, but it has a bug!  The inner cast (const char*) causes the CString object to return a pointer to its internal representation of the string.  The outer cast (char*) causes the compiler to copy the constant string from inside of the CString object to a new temporary string, so it can be modifiable.  This temporary string has local scope!  So by passing back a pointer to it, you are passing back garbage to the caller.  It might work, but it might not.  And it might work for a while, until the heap is modified, and then stop working.  Quite subtle and ugly.

You could fix this in the old bay way with an indirect cast, like this:

____ OLD NOT AS BAD BUT STILL BAD WAY ____
CString m_userid;

char *cA::getUserid(void)        // get userid value
{
const char pUserid = (const char*) m_userid;
return *(char**) &puserid;
}

This is so ugly you just know it has to be wrong (W=UH).  This doesn't have the bug of creating a temporary string, but it causes the internal objects' data to be accessible and modifiable by the caller - not good.

The const char* return type wins easily.  Not only is it bug free and clean, but by defining the method with a const return type, you are telling the caller they cannot modify the value.  And the compiler will enforce this, which is what you want.  The takeaway:

  • Use the const attribute of parameters and return values whenever appropriate.

Direct access to data

One final thought about C++ classes.  This is a bit heretical, but please bear with me.  It is standard C++ dogma that one should never ever expose class data in the public interface.  Instead, one should always provide get and put methods to access the data.  Well, yeah, but...

I claim there are times when it is better to make data members public.  There are three cases where this is justifiable and efficient:

The class is tiny and many objects are instantiated to form a larger structure.  Examples include nodes in a tree, links in a chain, or entries in a table.  In these cases the additional overhead of having "get" and "put" methods for each datum is not justifiable.  It is far easier simply to access the members.  Consider the following example, incrementing a counter in the left leaf of a tree node:

____ THE PURE WAY ____
pNode->getLeft()->putCount(pNode->getLeft()->getCount() + 1);

____ THE COOL WAY ____
pNode->pLeft->count++;

Just looking at this example, it is obvious which way is better.  W=UH and all that.

The class is used to encapsulate something inside another object.  Typically such "internal" classes are not publicly used.  The primary reason for the class is encapsulation and modularization.  In such cases there are often a lot of data inside the inner object which are used by the outer object.  Why impose the complexity and overhead of "get" and "put" methods?  Separation of interface from storage isn't important, because both objects are likely to be updated and compiled together.  The code will read better and execute faster if the outer object has direct access to the inner object's data.  You will also spent less time maintaining "glue" between the classes.

The class contains internal objects which are exposed in its interface.  Consider a class cA which contains one or more instances of another class cB.  Say that users of cA want to invoke methods on the cB objects inside it.  There are two possibilities:

____ THE PURE WAY ____
pA->createB(...);               // instantiate a cB
pA->getB()->doThing(...);       // invoke method of the CB
pA->destroyB();                 // destruct the cB

____ THE COOL WAY ____
pA->pB = new cB(...);           // instantiate a cB
pA->pB->doThing(...);           // invoke method of the cB
delete pA->pB;                  // destruct the cB

Again, just looking at the code gives you a strong clue which way is preferred.  In the former case you end up with a bunch of "glue" in cA which passes through calls to cB.  This glue adds noise but no value.  And - if cB is changed in some way, in the pure way new glue would have to be added to cA to make the changes visible, whereas in the cool way the interface to cB is directly available.  The takeaway:

  • There are cases where publicly exposing class data is better than using get and put methods.

There is even more which could be said, of course (and you know me - I'll probably end up saying it); if you have thoughts, comments, suggestions, etc.  please let me know!

 

 

Thursday,  03/13/03  10:51 PM

I had a bunch of posts queued up - on the road - finally let them escape...

Interesting - Chile has proposed a new U.N. resolution aimed at bridging the gap between the Alliance (US/UK) and the Axis (France/Russia/China).  The proposal may not be acceptable to either side, but the fact that Chile has taken it upon themselves as a conflicted and unaligned member of the U.N. Security Council to make this unsolicited proposal is interesting.  Meanwhile the Alliance are meeting in the Azores off Portugal to plot strategy; I've always wanted to visit there, but somehow I doubt they'll do much sightseeing.  If you're a student of the military preparations, please visit IraqWar.info.  Much maneuvering, and as usual Steven Den Beste is thoughtful and interesting on the countdown...

This period in world diplomacy is going to be studied by historians, politicians, and thoughtful people everywhere for many years to come.  Interesting times, indeed...

"Pentagon Threatens to Kill Independent Reporters in Iraq" - This gets my vote for the most misleading and biased news headline of the year, amid stiff competition.  The Pentagon has stated satellite uplinks in a war zone could be targeted.  Does anyone find this amazing?  Of course they'll be trying to disrupt communications during a war.  They haven't threatened to kill anyone, they've warned people so they won't get killed.  Sheesh.

There are about one billion articles about Elizabeth Smart, and I can't add anything to them, but as a parent I can only imagine how blessed her parents must feel.  Just shows that there is always hope...

Wow - I wonder if AOL is really planning an answer to Tivo?  Doesn't seem likely that Time Warner, the world's leading content company, could bring themselves to endorse this.  They'll have the same conflicts as Sony, with even more bias towards content.  But who knows...

Interesting post by Dave Hyatt about news.com's coverage of the new Mozilla 1.3 release, which revisits the 1998 browser wars...  Dave makes some good points, but he also indulges in logical fallacy; he says even though IE 4 was technically superior to Netscape 4, it didn't matter because Netscape couldn't have won anyway.  Well, maybe, but it did matter - technical superiority is why Microsoft won.  More on this later...

Hey - check out Feedster! (the RSS search engine formerly known as Roogle...)

This is interesting - mlb.com is starting to carry live baseball games, with the same kinds of blackout restrictions in effect for TV.  They use your credit card billing address to know where you live.  If you're reading this five years from 2003, you probably think this was always routine, right?  As in "oh, yeah, remember TV?"

Howard Stern is suing ABC over plans for a show called "Are You Hot?"  Yeah, right Howard, you invented the idea of objectifying women for their sexiness.

Check out the fly guy...  cool.

Finally - Happy Birthday NCSA Mosaic.  The first version of the first web browser was released ten years ago today.  Seems like only yesterday, doesn't it?  Uh, no, actually.  Seems like forever!  Who can even remember what it was like before the web?

 
 

Return to the archive.