C++ Performance Tweaking Tips
General Idea
It doesn't make sense to tweak every piece of code. Some code is executed in time-uncritical sections of the program, or only a few times. In these cases it's nice to do some performance tweaking, but leaving it out won't slow the program down much. The more often a certain part of the code is executed, the more you will want to tweak it!
Inline Functions
A function declared as inline can be expanded directly into the calling code at compile time. This speeds up execution because the overhead of the call (jumping to the function, passing the arguments and returning) disappears. The speedup is greatest for very small functions, i.e. when executing the function body takes about as long as the call itself. Here is a little example:
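// in test.h
class Test
{
  public:
    int getSize() { return this->size; } // defined in the class body, so implicitly inline
  private:
    int size;
};
/* remember that inline functions must be defined in the *.h (header) file,
   so that every .cc file that calls them can see the definition! */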
This function will be executed approximately 10 times faster on a Pentium II based processor. BUT don't make every function inline; use it only for:
- small interface functions like int getAttribute() { ... } or void setVelocity(float velocity) { this->velocity = velocity; }
- time-critical functions, i.e. functions that are executed very often during the game time of Orxonox, like void WorldEntity::tick(float time) {}, which is called every time a frame is rendered
Don't use it for functions that are normally only called before or after the game time, or for functions that are rarely called at all. Inlining has its limits, too: inline is only a request, and the compiler is free not to inline a function. Typical reasons (depending on the compiler) are loops, recursion or further function calls inside the function body. Note also that any member function defined inside the class body, whether public or private, is implicitly inline.
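A minimal sketch of the two ways to get an inline function in a header. The header name vec_math.h, the class Particle and the function lengthSquared are hypothetical, not Orxonox code. A member function defined inside the class body is implicitly inline; a free function defined in a header needs the inline keyword, otherwise every .cc file that includes the header would define it again and the linker would complain. In both cases the compiler still decides whether a call is actually expanded.

```cpp
// vec_math.h (hypothetical header, shown as one self-contained block)
#ifndef VEC_MATH_H
#define VEC_MATH_H

class Particle
{
  public:
    Particle(float x, float y) : x_(x), y_(y) {}

    // Defined inside the class body: implicitly inline.
    float getX() const { return this->x_; }
    float getY() const { return this->y_; }

  private:
    float x_;
    float y_;
};

// Defined outside a class in a header: needs the inline keyword,
// so that several .cc files can include this header without link errors.
inline float lengthSquared(const Particle& p)
{
  return p.getX() * p.getX() + p.getY() * p.getY();
}

#endif
```

For example, lengthSquared(Particle(3.0f, 4.0f)) yields 25.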
const and const&
const reference
Given the following example:
class MyClass
{
  public:
    void setPosition(Vector3 position) { this->myPosition_ = position; }
    Vector3 getPosition() { return this->myPosition_; }
  private:
    Vector3 myPosition_;
};
And you execute the following code:
MyClass obj1, obj2;
Vector3 pos(10, 20, 30);
obj1.setPosition(pos);
obj2.setPosition(obj1.getPosition());
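In this example there is a lot of overhead, because a Vector3 is not created just once, but up to 4 times. Every parameter passed by value and every value returned by value creates a new instance of Vector3:
void setPosition(Vector3 position) { this->myPosition_ = position; } // the parameter is a copy
Vector3 getPosition() { return this->myPosition_; } // the return value is a copy
By calling setPosition twice and getPosition once, we create up to 3 additional instances of Vector3 (a modern compiler can construct the temporary returned by getPosition() directly in the parameter, but it cannot avoid the other copies). This overhead can be reduced by using constant references: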
void setPosition(const Vector3& position) { this->myPosition_ = position; }
const Vector3& getPosition() { return this->myPosition_; }
The rest of the code remains the same.
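To make the copies visible, here is a small sketch that counts copy constructions. The counting Vector3 is a hypothetical stand-in for the engine's vector class, and ByValue, ByConstRef and countCopies are illustrative names. countCopies runs the example code from above against both variants of MyClass.

```cpp
// Hypothetical stand-in for the engine's Vector3 that counts its copy constructions.
struct Vector3
{
  float x, y, z;
  static int copies;  // number of copy constructions so far

  Vector3() : x(0.0f), y(0.0f), z(0.0f) {}
  Vector3(float px, float py, float pz) : x(px), y(py), z(pz) {}
  Vector3(const Vector3& other) : x(other.x), y(other.y), z(other.z) { ++copies; }
  Vector3& operator=(const Vector3& other) = default;  // assignment is not counted
};
int Vector3::copies = 0;

// Original version: parameter and return value are passed by value.
class ByValue
{
  public:
    void setPosition(Vector3 position) { this->myPosition_ = position; }
    Vector3 getPosition() { return this->myPosition_; }
  private:
    Vector3 myPosition_;
};

// Improved version: const references avoid the copies.
class ByConstRef
{
  public:
    void setPosition(const Vector3& position) { this->myPosition_ = position; }
    const Vector3& getPosition() { return this->myPosition_; }
  private:
    Vector3 myPosition_;
};

// Runs the example code from above and returns how many copies were made.
template <typename T>
int countCopies()
{
  Vector3::copies = 0;
  T obj1, obj2;
  Vector3 pos(10, 20, 30);
  obj1.setPosition(pos);
  obj2.setPosition(obj1.getPosition());
  return Vector3::copies;
}
```

With a C++17 compiler, countCopies<ByValue>() reports 2 copies: the parameter of the first call and the return value of getPosition(). The temporary returned by getPosition() is constructed directly in the second parameter. countCopies<ByConstRef>() reports none.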
const functions
Now imagine a situation where OtherClass has a MyClass as a member variable:
class OtherClass
{
  public:
    void setObject(const MyClass& object) { this->object_ = object; }
    const MyClass& getObject() { return this->object_; }
  private:
    MyClass object_;
};
And you want to execute the following code:
OtherClass instance;
Vector3 position = instance.getObject().getPosition();
But this doesn't compile. Why? Because getObject() returns a const reference to MyClass. This means you can't change anything in MyClass through it, so the compiler only lets you call const member functions on it. But getPosition() doesn't change anything! You're absolutely right, but the compiler doesn't know that. You have to tell it by adding the const keyword to the end of the function head:
const Vector3& getPosition() const { return this->myPosition_; }
And we'd better do the same for OtherClass::getObject:
const MyClass& getObject() const { return this->object_; }
Now the code from above compiles without problems.
So please remember the following rules:
- Always use const& if you pass an existing instance of an object that the function doesn't need to modify
- Only return a new instance (by value) if you return a temporary object; to give access to a member, return a const& instead
- Add const to all member functions that don't change the object.
And remember: const ObjectName& might look scary, but it's your friend.
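Here is a compilable sketch of the whole chain, assuming a minimal hypothetical Vector3 struct in place of the engine's vector class. The free function positionOf is also only illustrative. It compiles only because both getters are const member functions.

```cpp
// Hypothetical minimal Vector3; in the engine this would be the real vector class.
struct Vector3
{
  float x, y, z;
};

class MyClass
{
  public:
    void setPosition(const Vector3& position) { this->myPosition_ = position; }
    // const: promises not to modify the object, so it may be called on a const MyClass&
    const Vector3& getPosition() const { return this->myPosition_; }

  private:
    Vector3 myPosition_ = {0.0f, 0.0f, 0.0f};
};

class OtherClass
{
  public:
    void setObject(const MyClass& object) { this->object_ = object; }
    const MyClass& getObject() const { return this->object_; }

  private:
    MyClass object_;
};

// Works only because getObject() and getPosition() are both const.
Vector3 positionOf(const OtherClass& instance)
{
  return instance.getObject().getPosition();
}
```

If you remove the trailing const from either getter, the call chain in positionOf no longer compiles.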
Memory Allocation and Deletion: new, delete
Creating new objects takes a lot of time compared to mathematical operations. It can take 20 to 200 times as long as a normal function call, depending on how deeply the class has been derived (measured on a Pentium II computer). Create as few new objects as you can and recycle them if possible (although this can lead to strange code).
If you do have to create a new object, try to do it like this:
void ExampleClass::goodStyle()
{
  Object* obj = new Object();
  obj->doSomeStuff();
  delete obj; // free the memory again
}
Freeing the memory is very important if the function is called multiple times; otherwise every call leaks memory! But be aware that deleting takes a lot of time, too, so reuse old objects if possible. In time-critical parts of the code (like in-game), consider creating the objects at initialisation time and deleting them after the time-critical part.
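The advice to create objects at initialisation time and recycle them can be sketched as a small fixed-size object pool. Bullet and BulletPool are hypothetical names. Everything is allocated once in the constructor, and acquire()/release() only move pointers between "in use" and "free", so no new or delete happens in the time-critical part.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical game object that is needed many times per frame.
struct Bullet
{
  float x = 0.0f;
  float velocity = 0.0f;
  bool active = false;
};

// Minimal fixed-size pool: all Bullets are allocated once at initialisation
// time and recycled afterwards instead of being created and deleted.
class BulletPool
{
  public:
    explicit BulletPool(std::size_t size) : bullets_(size)
    {
      freeList_.reserve(size);
      for (std::size_t i = 0; i < size; ++i)
        freeList_.push_back(&bullets_[i]);
    }

    // Returns a recycled Bullet, or nullptr if the pool is exhausted.
    Bullet* acquire()
    {
      if (freeList_.empty())
        return nullptr;
      Bullet* b = freeList_.back();
      freeList_.pop_back();
      *b = Bullet();  // reset to the default state instead of constructing a new object
      b->active = true;
      return b;
    }

    // Gives the Bullet back to the pool instead of deleting it.
    void release(Bullet* b)
    {
      b->active = false;
      freeList_.push_back(b);
    }

    std::size_t available() const { return freeList_.size(); }

  private:
    std::vector<Bullet> bullets_;    // storage, never resized after construction
    std::vector<Bullet*> freeList_;  // Bullets currently not in use
};
```

The pointers into bullets_ stay valid because that vector is never resized after construction. The price is a fixed upper limit, so acquire() has to handle the "pool exhausted" case.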
If you create the object as a local variable like this, you don't have to delete it:
void ExampleClass::goodStyle2()
{
  Object obj;
  obj.doSomeMoreStuff();
}
The difference is that obj is a local (automatic) variable: it lives on the stack and is destroyed automatically when the function returns, and creating it is much cheaper than a new. This leads to another problem, though: never hand a pointer or reference to such a local variable to another object that stores it for later use.
void SomeClass::wantsObjectReference(Object* reference)
{
  myReference_ = reference; // Store the reference for later usage
}
void ExampleClass::badBadBad(SomeClass* sc)
{
  Object obj; /* this is only a local variable, automatically destroyed when the function returns */
  sc->wantsObjectReference(&obj); /* BAD BAD BAD BAD!!!!! sc now keeps a dangling pointer */
}
Don't rely on the compiler to catch this. Depending on the compiler and its warning settings, you may or may not get a warning, but either way sc is left with a dangling pointer as soon as the function returns. A better way would be:
void SomeClass::wantsObjectReference(Object* reference)
{
  myReference_ = reference; // Store the reference for later usage
}
void ExampleClass::goodStyle3(SomeClass* sc)
{
  Object* obj = new Object(); /* now the object is created with 'new' and won't be deleted after function return */
  sc->wantsObjectReference(obj);
}
Of course, SomeClass now owns the object and must delete it when it isn't needed anymore.
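If you can use C++14, one way to avoid forgetting that delete is to hand the object over in a std::unique_ptr. This is a swapped-in technique, not what the example above uses. The ownership transfer then becomes explicit, and the object is freed automatically. Object, SomeClass::wantsObject and ExampleClass::goodStyle3 are hypothetical variants of the classes above.

```cpp
#include <memory>
#include <utility>

// Hypothetical object type standing in for Object from the example above.
struct Object
{
  int value = 0;
};

class SomeClass
{
  public:
    // Takes ownership: the stored Object is deleted automatically when
    // SomeClass is destroyed or when another Object is stored.
    void wantsObject(std::unique_ptr<Object> object) { myObject_ = std::move(object); }
    const Object* getObject() const { return myObject_.get(); }

  private:
    std::unique_ptr<Object> myObject_;
};

class ExampleClass
{
  public:
    void goodStyle3(SomeClass* sc)
    {
      auto obj = std::make_unique<Object>();  // heap object, survives the function
      obj->value = 42;
      sc->wantsObject(std::move(obj));        // ownership moves into sc
    }
};
```

The allocation still costs the same as a plain new. Only the cleanup becomes automatic.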
Redundant code
Since anything that can be done fast should be done fast, we can optimise many functions that take and return objects by value. Often the code for an operator or something similar looks like this:
CVector3f operator+( CVector3f v )
{
  CVector3f returnVector;
  returnVector.m_x = m_x + v.m_x;
  returnVector.m_y = m_y + v.m_y;
  returnVector.m_z = m_z + v.m_z;
  return returnVector;
}
What's wrong here is the local variable. When it is created in the first line, the default constructor is called and the object is initialised. But we don't need that, because we assign new values in the next lines anyway. Always keep in mind the time wasted when creating objects.
At the end of the function, the copy constructor may be called again to copy the local returnVector into the return value. Many compilers elide this copy (named return value optimisation), but it isn't guaranteed. This shows why this style is bad.
A less obvious problem is the parameter: since v is passed by value, it is a copy of the original argument, which means another useless object construction.
A better way would be:
CVector3f operator+( const CVector3f &v ) const
{
  return CVector3f( m_x + v.m_x, m_y + v.m_y, m_z + v.m_z );
}
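This implementation saves time, as up to 2 fewer copy constructors are called (the one for the parameter and the one for the local variable), and the result is constructed directly with its final values instead of being default-constructed first. In this particular case, we saved time simply by knowing what happens behind the scenes.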
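For reference, here is a compilable version inside a minimal hypothetical CVector3f class. The members are public only for brevity. The sketch also adds operator+=, which is not in the text above. When you don't need the old value any more, += works in place and creates no new object at all.

```cpp
// Hypothetical minimal CVector3f (members public for brevity).
class CVector3f
{
  public:
    CVector3f(float x, float y, float z) : m_x(x), m_y(y), m_z(z) {}

    // v is taken by const reference (no copy), and the result is constructed
    // directly with its final values instead of being default-constructed first.
    CVector3f operator+(const CVector3f& v) const
    {
      return CVector3f(m_x + v.m_x, m_y + v.m_y, m_z + v.m_z);
    }

    // When the old value isn't needed, += works in place and creates no object.
    CVector3f& operator+=(const CVector3f& v)
    {
      m_x += v.m_x;
      m_y += v.m_y;
      m_z += v.m_z;
      return *this;
    }

    float m_x, m_y, m_z;
};
```

For example, writing a += b instead of a = a + b skips the temporary entirely.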
Pointer arithmetic
Dereferencing a pointer takes some time. If you do something like this:
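for( int i = 0; i < numPixels; i++ )
{
  rendering_context->back_buffer->surface->bits[i] = some_value;
}
you waste a lot of time dereferencing the whole pointer chain in every iteration of the loop. The compiler usually can't optimise this away for you. A store through an unsigned char pointer may modify any object, including the pointers in the chain, so they have to be reloaded every time.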
Instead you could do this:
unsigned char *back_surface_bits = rendering_context->back_buffer->surface->bits;
for( int i = 0; i < numPixels; i++ )
{
  back_surface_bits[i] = some_value;
}
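or, using pointer arithmetic:
unsigned char *back_surface_bits = rendering_context->back_buffer->surface->bits;
for( int i = 0; i < numPixels; i++, back_surface_bits++ )
{
  *back_surface_bits = some_value;
}
By dropping the brackets and using pointer arithmetic, the address doesn't have to be computed from base plus index in every iteration, saving you some time to drink a cup of coffee. Be aware that modern optimising compilers often generate the same code for both loops, so moving the pointer chain out of the loop is the more important step.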
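A self-contained sketch of both loops follows. Surface, BackBuffer and RenderContext are hypothetical structures that mirror the pointer chain in the example above. fillFast follows the chain once and then walks the buffer with a plain pointer up to an end pointer.

```cpp
#include <cstddef>

// Hypothetical structures mirroring the pointer chain in the example above.
struct Surface       { unsigned char* bits; };
struct BackBuffer    { Surface* surface; };
struct RenderContext { BackBuffer* back_buffer; };

// Slow: the pointer chain is followed in every iteration. Stores through an
// unsigned char* may alias any object, so the compiler generally must reload it.
void fillSlow(RenderContext* rendering_context, std::size_t numPixels, unsigned char some_value)
{
  for (std::size_t i = 0; i < numPixels; ++i)
    rendering_context->back_buffer->surface->bits[i] = some_value;
}

// Faster: follow the chain once, then walk the buffer with a plain pointer.
void fillFast(RenderContext* rendering_context, std::size_t numPixels, unsigned char some_value)
{
  unsigned char* p = rendering_context->back_buffer->surface->bits;
  unsigned char* const end = p + numPixels;
  for (; p != end; ++p)
    *p = some_value;
}
```

For filling with a single byte value like this, std::memset or std::fill does the same job and is usually well optimised by the library.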
Math optimisations
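Bit shifting is a very fast way to perform integer math. Its use is limited, as you can only multiply and divide by powers of 2, but it is very fast. Be careful with negative signed integers: there, i >> 1 is not the same as i / 2 (for example, -3 >> 1 is -2 on common platforms, while -3 / 2 is -1). Consider: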
i *= 256;    // i = i * 256
i = i << 8;  // i = i * 256
Logically, they are the same. In this simple example, an optimising compiler will normally turn the first into the second by itself. With more complex constants it may or may not do so, but you can combine shifts and additions yourself. Example:
i *= 272;                 // i = i * 272
i = (i << 8) + (i << 4);  // i = i * 256 + i * 16 = i * 272 (the parentheses matter: + binds tighter than <<)
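A small sketch of shift-based arithmetic on unsigned integers. The function names times272 and div256 are illustrative. An unsigned type is used so that shifting can't run into signed-overflow or negative-value pitfalls.

```cpp
#include <cstdint>

// Multiply by 272 = 256 + 16 using two shifts and one addition.
// Parentheses are required: + binds tighter than <<.
std::uint32_t times272(std::uint32_t i)
{
  return (i << 8) + (i << 4);
}

// Divide by 256 with a shift; this equals i / 256 for unsigned values.
std::uint32_t div256(std::uint32_t i)
{
  return i >> 8;
}
```

For example, times272(3) gives 816 and div256(1000) gives 3.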
Also keep in mind that multiplication is slower than addition. This is why the following rewrite makes sense:
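a*b + a*c;  // two multiplications
a*(b+c);    // one multiplication, with no change to the meaning of the expression
For floating-point values, the two forms can differ in the last bit, so the compiler usually won't make this rewrite for you unless told to (e.g. with GCC's -ffast-math).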
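A variation on the previous item, but this time with divisions. On every platform that I am aware of, divisions are slower than multiplications, so the following rewrites provide a speed improvement:
b/a + c/a;    // two divisions
(b+c)/a;      // one division
(1/a)*(b+c);  // a division and a multiplication; pays off when 1/a is computed once and reused for many values
This only holds for floating-point values. With integers, 1/a truncates to 0 whenever |a| > 1, and b/a + c/a can differ from (b+c)/a because of truncation.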
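A sketch of the three algebraic rewrites for float values. The function names are illustrative. divideAll shows the case where precomputing the reciprocal really pays off: one division plus n multiplications instead of n divisions.

```cpp
#include <cstddef>

// a*b + a*c needs two multiplications; a*(b+c) needs one.
float sumOfProducts(float a, float b, float c) { return a * (b + c); }

// b/a + c/a needs two divisions; (b+c)/a needs one.
float sumOfQuotients(float a, float b, float c) { return (b + c) / a; }

// When many values are divided by the same a, compute the reciprocal once
// and multiply. Results can differ from true division in the last bit.
void divideAll(float* values, std::size_t n, float a)
{
  const float inv = 1.0f / a;
  for (std::size_t i = 0; i < n; ++i)
    values[i] *= inv;
}
```

For example, sumOfProducts(2, 3, 4) yields 14 and sumOfQuotients(4, 2, 6) yields 2.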
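The last example is perhaps not so obvious, but the C++ standard requires short-circuit (lazy) evaluation of && and ||:
(a || b) && c;
c && (a || b);
In the first case, (a || b) always has to be evaluated, whereas in the second case it can be skipped whenever c is false, as the entire expression can never evaluate to true anymore. So put the cheapest condition, or the one most likely to be false, first. Careful: if a or b has side effects, swapping the order changes what the program does.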
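To make short-circuiting visible, here is a small sketch that counts how often each operand is evaluated. EvalCounter and the two condition functions are hypothetical. Here a is false, b is true and c is false.

```cpp
// Counts how often each operand is evaluated, to make short-circuiting visible.
struct EvalCounter
{
  int aCalls = 0, bCalls = 0, cCalls = 0;

  bool a() { ++aCalls; return false; }
  bool b() { ++bCalls; return true; }
  bool c() { ++cCalls; return false; }
};

// c is tested last: a() and b() both run (a is false, so b must be checked too).
bool conditionLast(EvalCounter& e)  { return (e.a() || e.b()) && e.c(); }

// c is tested first: because c() is false, a() and b() are never called.
bool conditionFirst(EvalCounter& e) { return e.c() && (e.a() || e.b()); }
```

Both functions return false, but conditionLast makes three calls, while conditionFirst stops after one.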