C++ OOP Performance Tips–Part 1
April 1, 2012 10 Comments
We often encounter inheritance and composition implementations that are too flexible and too generic for the problem domain. They may perform computations that are rarely or never required. In practice, it is not surprising to discover performance overhead associated with inheritance and composition. This is a tradeoff between code reuse and performance. Oftentimes, reusable code will
compute things you don’t really need in a specific scenario. Any time you call functions that do more than you really need, you will take a performance hit. This article will highlight some of these scenarios.
This is not to say that inheritance is fundamentally a performance obstacle. We must make a distinction between the overall computational cost, required cost, and computational penalty. The overall computational cost is the set of all instructions executed in a computation. The required cost is that subset of instructions whose results are necessary. This part of the computation is mandatory; computational penalty = overall computational cost – required cost. This is the part of the computation that could have been eliminated by an alternative design or implementation. (e.g inheritance hierarchy, composition layout, etc…)
So we cannot make a blanket statement that complex inheritance designs are necessarily bad, nor do they always carry a performance penalty. All we can say is that overall cost grows with the size of the derivation tree. If all those computations are valuable then it is all required cost. In practice, inheritance hierarchies are not likely to be perfect. In that case, they are likely to impose a computational penalty.
Constructors and Destructors
Following is a snippet from Visual Studio 2008 Disassembly Window for a simple Foo() function creating Point object locally on the stack and destroying it on returning.
class Point
{
public:
Point() : X(0), Y(0) { cout << "Hello!\n"; }
~Point() { cout << "Bye bye\n"; }
int X;
int Y;
};
void Foo()
{
010820B0 push ebp
010820B1 mov ebp,esp
010820B3 sub esp,8
Point obj;
010820B6 lea ecx,[obj] // Pass obj address to the constructor
010820B9 call Point::Point (10813ACh) // Call Constructor
}
010820BE lea ecx,[obj] // Pass obj address to the destructor
010820C1 call Point::~Point (10813B1h) // Call Destructor
010820C6 xor eax,eax
010820C8 mov esp,ebp
010820CA pop ebp
010820CB ret
The translated assembly instructions of each C++ statement (in black) appears directly below it. As you can see, constructors and destructors (in Red) are normal functions like any C++ function. The compiler knows exactly where and when to call them. A constructor or destructor call costs 2 assembly instructions.
The following is the code generated for an empty Point constructor, total of 8 assembly instructions.
class Point
{
public:
Point()
00E12960 push ebp
00E12961 mov ebp,esp
00E12963 push ecx
00E12964 mov dword ptr [ebp-4],ecx
{
}
00E12967 mov eax,dword ptr [this]
00E1296A mov esp,ebp
00E1296C pop ebp
00E1296D ret
The following is the code generated for an empty Point destructor, total of 7 assembly instructions.
~Point()
{
00E12580 push ebp
00E12581 mov ebp,esp
00E12583 push ecx
00E12584 mov dword ptr [ebp-4],ecx
}
00E12587 mov esp,ebp
00E12589 pop ebp
00E1258A ret
Let’s calculate a base-line for the number of assembly instructions required to construct/destruct a Point object:
- 2 instructions to call Point constructor Point::Point().
- 8 instructions to implement Point empty constructor body.
- 2 instructions to call Point destructor Point:~Point()
- 7 instructions to implement Point empty destructor body.
- Total of 2 + 8 + 2 + 7 = 19 instruction to construct/destruct a Point object!
- Enough theory, let’s get our hands dirty and write some code and figure out the effect of these extra 19 instruction.
- Consider the code snippet below (Version 0):
// Version 0
Point dummy; for (int i = 0; i < 100000000; ++i) { Point p; p.X = p.Y = i; dummy.X = p.X; dummy.Y = p.Y; }
The above code doesn’t make sense in a real-life program, and it is meaningless, but it will help illustrate our theory. We will focus only on the Point object construction/destruction. As you can see, each iteration costs 19 instruction (mentioned up) per Point construction/destruction, and the overall cost for construction/destruction in the previous for-loop is 20 multiplied by the number f iterations which is 100 million ~= 1900000000 instruction only for Point construction/destruction. Note that we didn’t calculate the instructions required for p and dummy objects members initialization because they are irrelevant to our case-study. We care only for Point construction/destruction.
A make-sense optimization for the previous for loop is to bring the Point p; line outside the for loop so that the Point object is constructed/destructed only once. In theory, the overall construction/destruction should drop from 1900000000 to 20 ! We can call this overhead: “Silent Overhead”.
The optimized for loop will look as follows (Version 1):
// Version 1
Point p; Point dummy; for (int i = 0; i < 100000000; ++i) { p.X = p.Y = i; dummy.X = p.X; dummy.Y = p.Y; }
Results
We chose number of iterations to be a rather very high 100000000 iteration to highlight the performance drop. Our experimental computer is equipped with a very fast Intel Core i7 processor with 8 logical processors. And 1 million iteration is too small for it, the i7 was able to execute 1 million iteration in 2 milliseconds in average !! That’s why we chose a very high number of iterations to scale this difference.
For Version 0 and Version 1, We ran the for-loop for 100 million iteration and take the average running time for 100 sample. The optimized Version 1 is approximately 3 times faster than Version 0 .
From the previous experiment, It is obvious that object construction/destruction can lead to a big drop in performance if they are called unnecessarily as in code snippet Version 0.
Key Points
- Watch out for the combinatorial explosion of created objects in complex hierarchies. The construction (destruction) of an object triggers recursive construction (destruction) of parent and member objects.
- Defer object construction (i.e local variable declaration, dynamic object allocation) to the scope in which it is manipulated. The object life cycle is not free of cost. At the very least, construction and destruction of an object may consume CPU cycles. Don’t create an object unless you are going to use it.
- Use the object constructor initialization list to complete the member object creation. Because compilers must initialize contained member objects prior to entering the constructor body. This will save the overhead of calling the assignment operator later in the constructor body. In some cases, it will also avoid the generation of temporary objects.
Do/ Don’t Do
Do this:
void Foo(int n) { if (n == 0) return; else { HeavyClass heavyObj; heavyObj.DoWork(); } }Don’t do this:
void Foo(int n) { HeavyClass heavyObj; if (n == 0) return; else { heavyObj.DoWork(); } }
Do this:
class Person { public: Person(const char* name) : Name(name) {} string Name; };Don’t do this:
class Person { public: Person(const char* name) /* Name constructor called */ { Name = name; } string Name; };
Reference
- Efficient C++ Performance Programming Techniques book By Dov Bulka, David Mayhew