Optimizing a C++ program
Posted 2005-03-26 12:00. Tagged c++, hack, performance, neural network.
A small case study of how to optimize a C++ program.
A while ago, Michael Strömberg presented a “Quick Java & C# Performance Comparison” on his blog. As a basis for comparison, he included the original C++ reference implementation of the algorithm, a pulse-coupled neural network. The reference program was slower than Michaels java code even on the slowest of the Java environments.
Since I believe that a reasonably sane implementation in C++ has a good chance of outperforming even a well-tuned Java implementation, I decided to take a look at the reference implementation and see if it could be made to run faster. It certainly could.
The first step was, naturally, to get the old code to compile and run
at all, and then add
-Wall to the compiler flags and fix all
warnings. This mainly meant changing e.g.
#include <iostream.h> to
#include <iostream> and adding
using std::cout; in a few files.
Project-specific code or the standard library
A common problem in old C++ code is the use of own, often rather simplistic, code for data containers, such as vectors or strings. There is an excuse for this, as the C++ language and library was standardized as late as in 1997, and compilers lagged quite long after that.
This project had its own string class (used only for a file name
construction). I removed the 150 lines of string class and changed
the 6 lines that used it to use standard string (actually, a
The main offender, however, was a matrix implemented as an array of
double. The matrix class then had an index operator that
simply returned the inner array, used like this:
int x,y; for for matrix *= factor;
This meant that the matrix class gave free access to its inner workings to its users. The first thing to do if I wanted to change anything in the matrix was to create access operators that take two arguments, to replace returning the inner array:
for for *= factor;
At first I implemented these with a double index just like before, but
I soon changed the data to just a
std::vector<double>, indexed with
(x + width * y). Then, for the methods that just operated on each
element (which used a lot of the running time), there was no need to
y indexes at all, just iterate over all the
for *i *= factor;
By value or by reference
Another common problem is passing parameters by value instead of by reference. Looking at the function declarations, the differece is minimal:
void ; // By value
void ; // By reference
The ampersand in the later declaration means that that function takes
a reference to an
Arg, rather than an actual
Arg, as argument. If
Arg is a large object, copying it might take quite a while.
Of course, when the method gets a reference to the original object
rather than a copy, any change the method makes will be to the
original object. To avoid unintended changes, we can declare the
const. To put lots of “
const” in the program also
facilitates the mechanical optimization the compiler does for you.
Some classes, notably the
std::iostream and family, even makes the
copy constructor (and the assignment operator)
private, to prohibit
copying. Allthough this is to ensure proper semantics rather than as
a mere optimization, those two often goes hand in hand.
Where to tune
Given that the program was several hundred lines of code – even after simplifying it with STL – it was not obvious where to optimize. Since I took pride in only spending a few hours on the optimization, I wanted to put my effort where it would matter the most (in real-world projects, often several magnitudes larger, having to pay the developers per hour gives a similar motivation).
Luckily, there is a tool for this: just compile the program with the
-pg flag, run the program, and then run
gprof, and you get a listing
of which methods the program spent most time in, and how many times
they were called. Then you just have to figure out either how to make
those methods run faster, or how to call them less often.
Write a comment