Minimization

Consider the following class hierarchy, modeled after one found in actual code. Names have been changed to protect the innocent, but no other significant alterations to the structure have been made.

class planet
{
public:
  planet() : mass(0), albedo(0), name("unset") {}

  virtual void set_params() = 0;

  std::string name;
  float albedo;//between zero and one
  double mass;//must be positive
};

class mercury : public planet
{
  virtual void set_params()
  {
    albedo = .068f;
    mass = .055f;
    name = "Mercury";
  }
};

class earth : public planet
{
  virtual void set_params()
  {
    albedo = .306f;
    mass = 1.f;
    name = "Earth";
  }
};

####Question 1: This hierarchy has severe problems. What issues affecting correctness of the code are present?

####Answer 1:

The data members of planet are all public. The requirements on albedo and mass cannot be enforced by the class due to insufficient encapsulation [^1].
The parameter ordering in the initializer list differs from the actual order of definition in the class. Specifically, the initializer list should read name("unset"), albedo(0), mass(0).
If client code forgets to call set_params we have, e.g. an earth which doesn't behave like one. There's no reason here why that information couldn't have been passed into the constructor.
planet is intended to be used as a base class, but its destructor is both public and non-virtual. This invokes undefined behavior. If you're unfamiliar with undefined behavior, I like to think of it as something that never causes you any problems until you're in a big demo, or your software is gone without hope of retrieval. Then it causes problems. You typically want to declare the destructor of a base class as virtual, as that means that whether a pointer (or reference) to the base or derived class is used, the derived class destructor is called first, followed by the base class. [^2][^3]

Here's the usage of class planet:

enum planets { MERCURY, EARTH, /*presumably others*/ };

planet* get_planet(planets planet_id)
{
  planet* pPlanet;
  if(planet_id == MERCURY)
    planet = new mercury;
  else if(planet_id == EARTH)
    planet = new earth;
  // and so forth...
  planet->set_params();

  return planet;
}

####Exercise 1: We'll now rewrite the hierarchy (and get_planet) such that it doesn't suffer from the problems in Question 1. First, you might rationalize changing as little as possible. It's true that treading lightly in code which you're unfamiliar with is a good idea. The following attempts to do that, can you spot the error?

class planet
{
public:
  planet() : mass(0), albedo(0), name("unset")
  {
    //Call derived set_params to set values.
    set_params();
  }
  virtual ~planet() {}

  virtual void set_params() = 0;

  const std::string& get_name() const { return name; }
  float get_albedo() const { return albedo; }
  double get_mass() const { return mass; }
private:
  std::string name;
  float albedo;//between zero and one
  double mass;//must be positive
};

To its credit, the compiler will generally warn about this problem, and the linker will stop cold. The errors, while informative, could be misread. On both VC10 and g++ 4.8, the linker will complain that planet::set_params is an unresolved external symbol (i.e., not found). g++ 4.8 gives you a better message, "warning: pure virtual virtual void planet::set_params() called from constructor." You might realize that this means the compiler is trying to explicitly call planet::set_params, and not the children's overloads. On the other hand, you might just supply an implementation and expect it will never actually be called:

class planet
{
public:
  planet() : mass(0), albedo(0), name("unset")
  {
    //Call derived set_params to set values.
    set_params();
  }
  virtual void set_params()
  { mass = 0; albedo = 0; name = "unset"; }

  //snip remainder of code
};

The expectation would be wrong, however:

earth p;
std::cout << p.get_name() << std::endl;//prints "unset"
p.set_params();
std::cout << p.get_name() << std::endl;//prints "Earth"

Why didn't it work? Calling virtual functions from constructors or destructors is generally a bad idea, because the actual function called depends on which stage of construction it's in. In the previous, the call from planet's constructor goes to planet::set_params since the earth's constructor hasn't been called yet. There are ways to deal with problems like these[^4], but we can use a much more straightforward approach. There's a lot more we could do with it, this is a first step. Without further ado, a corrected implementation:

class planet
{
public:
  planet(const std::string& name, float albedo, double mass)
   : name_(name), albedo_(albedo), mass_(mass) {}
  virtual ~planet() {}

  const std::string& get_name() const { return name_; }
  float get_albedo() const { return albedo_; }
  double get_mass() const { return mass; }
private:
  std::string name_;
  float albedo_;//between zero and one
  double mass_;//must be positive
};

class mercury : public planet
{
public:
  mercury() : planet("Mercury", .068f, .055f) {}
};
class earth : public planet
{
public:
  earth() : planet("Earth", .306f, 1.f) {}
};

enum planets { MERCURY, EARTH, /*presumably others*/ };

//Here's how that hierarchy was used in our code:

planet* get_planet(planets planet_id)
{
  planet* planet;
  if(planet_id == MERCURY)
    planet = new mercury;
  else if(planet_id == EARTH)
    planet = new earth;
  // and so forth...

  return planet;
}

We'll leave it there for now, while we answer

####Question 2: Do you feel an inheritance hierarchy was appropriate here? What are cases when you should use public inheritance? ####Answer 2: An inheritance hierarchy seems like overkill. Perhaps the original developer intended to add more polymorphic behavior, but for the real code in question, that never happened. While earth IS A planet, it's not clear from the usage if that actually buys us anything. Designs which dynamic_cast frequently should be treated as suspect [^5], and given the dearth of other virtual methods [^6], that's the only way to use this hierarchy "polymorphically."

Here's a design that should work equally well, but without so much code:

class planet
{
public:
  planet(const std::string& name, float albedo, double mass)
   : name_(name), albedo_(albedo), mass_(mass) {}

  const std::string& get_name() const { return name_; }
  float get_albedo() const { return albedo_; }
  double get_mass() const { return mass; }
private:
  std::string name_;
  float albedo_;//between zero and one
  double mass_;//must be positive
};

enum planets { MERCURY, EARTH, /*presumably others*/ };

//Here's how that hierarchy was used in our code:

planet* get_planet(planets planet_id)
{
  planet* planet;
  if(planet_id == MERCURY)
    planet = new planet("Mercury", .068f, .055f);
  else if(planet_id == EARTH)
    planet = new planet("Earth", .306f, 1.f);
  // and so forth...

  return planet;
}

For completeness's sake, let's follow Scott Meyer's advice to make interfaces hard to use incorrectly [^7]:

std::unique_ptr<planet> get_planet(planets planet_id)
{
  std::unique_ptr<planet> planet;
  if(planet_id == MERCURY)
    planet.reset(new planet("Mercury", .068f, .055f));
  else if(planet_id == EARTH)
    planet.reset(new planet("Earth", .306f, 1.f));
  // and so forth...

  return planet;
}

Recall that this doesn't prevent the planet pointer from being made into a shared_ptr (as unique_ptr to shared_ptr is permitted), it just means that users will now have to try to leak the resource.

####Question 3: Consider the following:

#ifndef IMAGELOADER_HPP
#define IMAGELOADER_HPP

#define QUALITY 19

#include "deserializer.hpp"
#include <memory>
#include <string>
#include <vector>

class worker {};

class image_loader
{
public:
  image_loader(const std::string& name, int quality = QUALITY)
  {
    m_deserializer.reset(new deserializer(quality));
    m_num_workers = 5;
    m_workers.resize(m_num_workers);
    m_name = name;
  }

  // Converts from an image_loader object to the loaded image.
  // Usage:
  //
  // image_loader loader;
  // boost::shared_array<char> img = loader;
  operator std::unique_ptr<char[]>();

  int m_num_workers;
  std::string m_name;
  std::shared_ptr<deserializer> m_deserializer;
  std::vector<worker> m_workers;

};//end image_loader

#endif

It contains a member, m_num_workers which we don't want to change during the lifetime of an image_loader instance. What are a couple of ways to prevent it from changing? What are the tradeoffs for each method?

####Answer 3: The big problem with m_num_workers is that it's a mutable public data member. That means we really don't know what client code (even well-intentioned code!) might do with it. The solutions are to:

Declare m_num_workers private and provide a read-only accessor:

  int get_num_workers() const { return m_num_workers; }
private:
  int m_num_workers;

Declare m_num_workers as const:

  const int m_num_workers;

Both approaches have merits. The latter approach guarantees (modulo const_casting abuse) that the value will not change for the life of the class. However, it also disables the ability to copy or move-assign image_loaders. This may or may not be important, but should be remembered. The latter also allows client code to be unchanged (assuming it wasn't erroneously changing the value, at least). The former approach permits copying, but client code would need to be modified, and it's possible that code with private access to image_loader could still erroneously change the value.

####Exercise 2: image_loader is a class that's frequently included throughout our codebase. Because of this, we want:

this class to include very few dependencies.
to be able to work on this class without causing excessive recompilation.
the existence and use of this class to avoid surprising and unwelcome interactions with other classes.

Let's look at the problems. First, the header file includes many dependencies:

#include "deserializer.hpp"
#include <memory>
#include <string>
#include <vector>

You might wonder why it's important to reduce header dependencies. There are two main reasons: First, the cost of including a file can be significant -- in general, the entire file (and its inclusions) has to be parsed, which could add hundreds of thousands of lines of code to the file to compile. Second, and more seriously, the more files you include, the more files will need to be recompiled due to a seemingly innocuous change in one. In a large project, excessive includes can be the difference between five minute builds and hour-long builds. One good way to reduce such dependencies is to forward-declare classes. That technique switches a #include with one or more forward declarations. Here's an example:

//Header a.hpp (omitting #include guards for brevity)
class a
{ ... };

//Header b.hpp
namespace b_stuff
{
  class b
  { ... };
}//end b_stuff

If usage was

//source.cpp
#include "a.hpp"
#include "b.hpp"

class foo
{
  std::shared_ptr<a> p;
};

void bar(b_stuff::b var);

We can change it to:

//source.cpp
class a;
namespace b_stuff { class b; }

class foo
{
  std::shared_ptr<a> p;
};

void bar(b_stuff::b var);

A rule of thumb on when you can forward declare a class is when you don't need to know the class's size and you don't call any of its member functions [^8].

Of course, that doesn't really help us here. We can't forward declare items in the standard library[^9], and we construct a deserializer in the inline constructor. There are often good reasons to make methods inline[^10]. However, the most common reason that I've seen is "I had to type less this way." The laziness reason is particularly odious in this case, since it forces us to #include "deserializer.hpp. If we move the constructor to the cpp file, we can forward-declare it instead:

//header
#define QUALITY 19
struct worker {};

class deserializer;

class image_loader
{
public:
  image_loader(const std::string& name, int quality = QUALITY);

  operator std::unique_ptr<char[]>();

  int num_workers_;
  std::string name_;
  std::shared_ptr<deserializer> deserializer_;
  std::vector<worker> workers_;
};

//cpp
image_loader::image_loader(const std::string& name, int quality)
  : num_workers_(5),
    name_(name),
    deserizlizer_(new deserializer(quality)),
    workers_(num_workers_)
{}

There are a couple ways that image_loader can be changed such that it interacts more reasonably with other classes. For example, using #defines, especially in a header file (and not for an include guard) is verboten in C++. Prefer using static const int for QUALITY, lest someone's code be broken:

#include "image_loader.hpp"
static const double QUALITY = 5.;

This would give fairly inscrutable error messages, as what the compiler actually sees would be static const double 19 = 5.; [^11].

Another issue with the interface to image_loader is the use of operator std::unique_ptr<char[]>(). While not as egregious as a conversion to a built-in [^12], it's still not entirely clear what the function is doing at the point of usage. In this case, the class usage would probably be easier to understand if the function were named load, or even std::unique_ptr<char[]> operator()(), as such functors are common in C++. Since automatic conversions should be looked at with suspicion in C++ [^13], we'll prefer operator() here. There is one other problem with an automatic conversion here -- the constructor. Typical advice is that single-argument constructors should be generally be marked explicit. That is true, of course, but the advice extends to constructors where all (or all but one) of the arguments have default values. So the constructor image_loader(const std::string& name, int quality = QUALITY); could allow std::string to be implicitly converted to an image_loader, without the user's knowledge.[^14]

Finally, the public data members should be made private (and given accessor methods, if needed). So with our changes, here's the updated class:

#include <memory>
#include <string>
#include <vector>

static const int QUALITY = 19;

struct worker {};

class deserializer;

class image_loader
{
public:
  explicit image_loader(const std::string& name, int quality = QUALITY);

  std::unique_ptr<char[]> operator()();

private:
  //It's not clear if we really need accessors for these.
  // We'll make them implementation details for now.
  int num_workers_;
  std::string name_;
  std::shared_ptr<deserializer> deserializer_;
  std::vector<worker> workers_;
};

Well, this makes the class header file include less items, prevents some accidental misuse, and makes it so that fiddling with the constructor doesn't cause everyone that includes it to need to recompile. You might think we're done, but there's one other way using this class could cause unwanted conflicts with other code. Consider the cpp file:

#include "image_loader.hpp"
#include "deserializer.hpp"
#include <fstream>

char* load_image(deserializer& deserializer, std::istream& in)
{ return deserializer.read(in); }

image_loader::image_loader(const std::string& name, int quality)
  : num_workers_(5),
    name_(name),
    deserizlizer_(new deserializer(quality)),
    workers_(num_workers_)
{}

std::unique_ptr<char[]> image_loader::operator()()
{
  std::ifstream in(name_.c_str(), std::ios::binary);
  std::unique_ptr<char[]> img(loadImage(*deserializer_, in));
  //Um, assume we process the image somehow using our workers here. . .

  return img;
}

The problem there is load_image. While unlikely, it's possible that another cpp file somewhere could have a function with the same signature. If that were the case, attempting to link something with both cpp files would cause a multiply defined symbol error, even though it's perfectly clear to us which function should be called. That's because functions, by default, have external linkage[^15]. Here's a quick example of the problem, and its solutions.

//my.cpp
int clamp(int low, int val, int high)
{ return std::min(high, std::max(low, val)); }

//other.cpp
extern int clamp(int low, int val, int high); //extern unnecessary, but illustrates the point.
...
std::cout << "On a scale from 1 to 10, how good is this example?" << std::endl;
int val;
std::cin >> val;
val = clamp(1, val, 10);

If we really want to prevent other people from seeing our instance of clamp, we could either give it internal linkage by declaring it static:

//my.cpp
static int clamp(int low, int val, int high)
{ return std::min(high, std::max(low, val)); }

Another option is to make the name otherwise inaccessible using an unnamed namespace [^16]:

namespace
{
int clamp(int low, int val, int high)
{ return std::min(high, std::max(low, val)); }
}

Both of these will cause other.cpp to fail to link due to an undefined reference to clamp. We can use either technique to ensure our load_image function doesn't conflict with any others.

We've addressed all the issues raised in the exercise, but we can actually do more to improve 1 & 2. Without going into detail, here's a restructuring of image_loader using the pointer-to-implementation (PIMPL) idiom[^17]:

//image_loader.hpp
#include <memory>
#include <string>

static const int QUALITY = 19;
class image_loader_impl;

class image_loader
{
public:
  explicit image_loader(const std::string& name, int quality = QUALITY);

  std::unique_ptr<char[]> operator()();
  ~image_loader();

private:
  std::unique_ptr<image_loader_impl> impl_;
};

//image_loader.cpp
#include "image_loader.hpp"
#include <vector>

class image_loader_impl
{
  int num_workers_;
  std::string name_;
  std::shared_ptr<deserializer> deserializer_;
  std::vector<worker> workers_;
};

image_loader::image_loader(const std::string& name, int quality)
  : impl_(new image_loader_impl{5, name, new deserializer(quality), 5})
{}

static char* load_image(deserializer& deserializer, std::istream& in)
{ return deserializer.read(in); }

std::unique_ptr<char[]> image_loader::operator()()
{
  std::ifstream in(impl->name_.c_str(), std::ios::binary);
  std::unique_ptr<char[]> img(loadImage(*impl->deserializer_, in));
  //Um, assume we process the image somehow using our workers here. . .

  return img;
}

image_loader::~image_loader() {}

This is not completely idiomatic, but it accomplishes the task of reducing compilation dependencies. Now only changes to the public interface of image_loader will require recompilation of other files. There is certainly a cost to using this technique -- the code is now separated in a less-intuitive manner, each image_loader object is slightly larger, and there's extra indirection for each access of operator(). Whether this is critical or not depends on how you use it. There are many cases where the separation is much more convenient than the minor overhead.

####Summary: When designing classes, be sure to make them as user-friendly as possible. Give the classes one job, use inheritance appropriately, and take care to ensure your classes won't interact badly with other code. A little care during class creation can save you headaches down the road.

####Notes: [^1] See Effective C++ item 22 for a discussion of why data should generally be private.

[^2] See Effective C++ item 7 for why base class destructors should usually be virtual.

[^3] However, there are some cases where have a protected, non-virtual destructor is okay. See Modern C++ Design for rationale. This isn't commonly needed, but I mention it for completeness.

[^4] See Effective C++ item 9 for a more thorough description of the problem, and alternatives.

[^5] See Effective C++ item 27.

[^6] And the lack of open multi-methods in C++.

[^7] Effective C++ item 18.

[^8] Stack Overflow has a more complete set of rules.

[^9] Exceptional C++ item 26

[^10] Effective C++ item 30

[^11] That reminds me of an amusing anecdote I hope is true: http://everything2.com/title/Changing+the+value+of+5+in+FORTRAN. Also, there's a large number of people who opine that macros are evil. Avoid them.

[^12] For example, writing operator char*(); would be objectively worse -- it would allow incorrect usage like if (my_loader), which would probably be both wrong and expensive.

[^13] Exceptional C++ item 39. Let's be democratic here. We'd need to be suspicious of automatic conversions in any language with the same conversion rules as C++. ;)

[^14] More Effective C++ item 5 has a great example of mistaken usage.

[^15] A full discussion on linkage is a bit out-of-scope here, but can be illuminating. See stack overflow or even your favorite compiler's documentation of the extern keyword.

[^16] Depending on how modern your compiler is, clamp may have internal linkage in an unnamed namespace, just as it would if declared static. Unnamed namespaces and static have a tenuous relationship. In C++98, using static to declare internal linkage was declared deprecated. In C++11, it's been undeprecated since there are still occasions where static conveys meaning to the compiler which cannot be done with an unnamed namespace. Feel free to use either, but bear in mind that an unnamed namespace can be used for more than just functions and variables -- classes can also be used there.

[^17] See both Effective C++ 31 and Exceptional C++ 26-30. Note that they give conflicting advice on whether PIMPL should be the default for classes (Meyers) or not (Sutter). Herb Sutter also updated his advice on PIMPL for C++11: http://herbsutter.com/2011/11/04/gotw-100-compilation-firewalls/

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Minimization

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally