Optimizing a C++ program

Posted 2005-03-26 12:00. Tagged , , , .

A small case study of how to optimize a C++ program.

A while ago, Michael Strömberg presented a “Quick Java & C# Performance Comparison” on his blog. As a basis for comparison, he included the original C++ reference implementation of the algorithm, a pulse-coupled neural network. The reference program was slower than Michaels java code even on the slowest of the Java environments.

Since I believe that a reasonably sane implementation in C++ has a good chance of outperforming even a well-tuned Java implementation, I decided to take a look at the reference implementation and see if it could be made to run faster. It certainly could.

Starting out

The first step was, naturally, to get the old code to compile and run at all, and then add -Wall to the compiler flags and fix all warnings. This mainly meant changing e.g. #include <iostream.h> to #include <iostream> and adding using std::cout; in a few files.

Project-specific code or the standard library

A common problem in old C++ code is the use of own, often rather simplistic, code for data containers, such as vectors or strings. There is an excuse for this, as the C++ language and library was standardized as late as in 1997, and compilers lagged quite long after that.

This project had its own string class (used only for a file name construction). I removed the 150 lines of string class and changed the 6 lines that used it to use standard string (actually, a std::ostringstream) instead.

Matrix storage

The main offender, however, was a matrix implemented as an array of arrays of double. The matrix class then had an index operator that simply returned the inner array, used like this:

int x,y;
for(x = 0; x<width; ++x) for(y = 0; y<width; ++y)
  matrix[x][y] *= factor;

This meant that the matrix class gave free access to its inner workings to its users. The first thing to do if I wanted to change anything in the matrix was to create access operators that take two arguments, to replace returning the inner array:

for(int x = 0; x<width; ++x) for(int y = 0; y<width; ++y)
  matrix(x, y) *= factor;

At first I implemented these with a double index just like before, but I soon changed the data to just a std::vector<double>, indexed with (x + width * y). Then, for the methods that just operated on each element (which used a lot of the running time), there was no need to use separate x and y indexes at all, just iterate over all the elements:

for(iterator i = begin(); i != end(); ++i)
  *i *= factor;

By value or by reference

Another common problem is passing parameters by value instead of by reference. Looking at the function declarations, the differece is minimal:

void foo(Arg a);         // By value
void foo(const Arg &a);  // By reference

The ampersand in the later declaration means that that function takes a reference to an Arg, rather than an actual Arg, as argument. If an Arg is a large object, copying it might take quite a while.

Of course, when the method gets a reference to the original object rather than a copy, any change the method makes will be to the original object. To avoid unintended changes, we can declare the argument const. To put lots of “const” in the program also facilitates the mechanical optimization the compiler does for you.

Some classes, notably the std::iostream and family, even makes the copy constructor (and the assignment operator) private, to prohibit copying. Allthough this is to ensure proper semantics rather than as a mere optimization, those two often goes hand in hand.

Where to tune

Given that the program was several hundred lines of code – even after simplifying it with STL – it was not obvious where to optimize. Since I took pride in only spending a few hours on the optimization, I wanted to put my effort where it would matter the most (in real-world projects, often several magnitudes larger, having to pay the developers per hour gives a similar motivation).

Luckily, there is a tool for this: just compile the program with the -pg flag, run the program, and then run gprof, and you get a listing of which methods the program spent most time in, and how many times they were called. Then you just have to figure out either how to make those methods run faster, or how to call them less often.


Write a comment

Basic markdown is accepted.

Your name (or pseudonym).

Not published, except as gravatar.

Your presentation / homepage (if any).