Version 7 (modified by bknecht, 17 years ago)
C++ Performance Tweaking Tips
General Idea
It doesn't make sense to tweak every piece of code. Some code runs in sections of the program that are not time-critical, or is executed only a few times. In these cases performance tweaking is nice to have, but skipping it doesn't slow the program down much. The more often a certain part of the code is executed, the more you will want to tweak it.
Inline Functions
A function declared as inline can be inserted into the calling code at compile time. This speeds up execution because the function call overhead (the jump to the function, parameter passing and the return) doesn't have to be executed. The speedup is largest for very small functions, where executing the function body takes approximately the same time as the call itself. Here is a little example:
// in test.h
class Test
{
  public:
    int getSize() { return this->size; }
  private:
    int size;
};
/* remember that inline functions must be defined in the *.h (header) file! */
This function will be executed approximately 10 times faster on a Pentium II based processor. BUT don't write inline functions everywhere. Use them only for:
- small interface functions like int getAttribute() {....} or void setVelocity(float velocity) { this->velocity = velocity; }
- time-critical code, that is, functions that are executed very often during the game time of orxonox, like void WorldEntity::tick(float time) {}, which is called every time a frame is rendered
Don't use it for functions that are normally called before or after the game time, or for functions that are rarely called at all. Inlining brings some problems, too. For one, the compiler is not obliged to actually inline a function that is declared inline. Some reasons why it may refuse are loops, recursion and further function calls in the inlined code. Member functions defined inside the class definition are implicitly inline.
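The guidelines above can be sketched as follows. The class Ship and all of its members are hypothetical names invented for this illustration; they are not taken from orxonox. The small accessors are defined in the class body, and tick() shows the other common pattern of an inline definition that sits outside the class but still inside the header.

```cpp
// ship.h: hypothetical header, for illustration only
class Ship
{
  public:
    Ship() : velocity(0.0f), position(0.0f) {}

    // small interface functions, defined in the class body
    float getVelocity() const { return this->velocity; }
    void setVelocity(float velocity) { this->velocity = velocity; }
    float getPosition() const { return this->position; }

    // called once per frame, so it is worth keeping cheap and inlinable
    void tick(float time);

  private:
    float velocity;
    float position;
};

// an inline function defined outside the class body must still live in
// the header, so that every calling file can see its definition
inline void Ship::tick(float time)
{
  this->position += this->velocity * time;
}
```

Whether the compiler really inlines any of these calls remains its own decision; the keyword is only a request.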
Memory Allocation and Deletion: new, delete
Creating new objects takes a lot of time compared to mathematical operations. It can take 20 to 200 times as long as a normal function call, depending on how deeply the class has been derived (measured on a Pentium II computer). Try to create as few new objects as you can and recycle them if possible (although this can lead to strange code).
If you do have to create a new object, try to do it like this:
void ExampleClass::goodStyle()
{
  Object* obj = new Object();
  obj->doSomeStuf();
  delete obj;
}
Freeing the memory is very important if the function is called multiple times! But keep in mind that deleting takes a lot of time, too. So if there is a possibility of reusing the old object, do so. In time-critical parts of the code (like in-game) you can think about creating the objects at initialisation time and deleting them after the time-critical part.
If you write it the following way, you don't have to delete it:
void ExampleClass::goodStyle2()
{
  Object obj;
  obj.doSomeMoreStuf();
}
The difference is that this obj is stored as a local variable and destroyed automatically when the function returns! This leads to some other problems: if another object is going to keep a reference to said object after receiving it via a function argument, never use such local variables.
void SomeClass::wantsObjectReference(Object* reference)
{
  /* do something with the reference */
}

void ExampleClass::badBadBad()
{
  Object obj;                        /* this is only a local object, automatically deleted after function return */
  SomeClass* sc = new SomeClass();   /* creation of a new object needs much time, avoid it if possible - here we need it */
  sc->wantsObjectReference(&obj);    /* BAD BAD BAD BAD!!!!! */
  delete sc;
}
The compiler may complain about such things with a message like this: "WARNING: taking address of a temporary". And Mr. Compiler is absolutely right! A better way would be:
void SomeClass::wantsObjectReference(Object* reference)
{
  /* do something with the reference */
}

void ExampleClass::betterStyle()
{
  Object* obj = new Object();        /* lives on the heap, so it survives the function return; it must be deleted later by whoever keeps it */
  SomeClass* sc = new SomeClass();   /* creation of a new object needs much time, avoid it if possible - here we need it */
  sc->wantsObjectReference(obj);
  delete sc;                         /* remember that creating and deleting objects need VERY much time */
}
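The advice to create objects at initialisation time and recycle them during the time-critical part could look roughly like the sketch below. Bullet and BulletPool are hypothetical names made up for this illustration. The pool pays for all constructions once, up front, and afterwards only hands out pointers to objects that already exist.

```cpp
#include <cstddef>
#include <vector>

// hypothetical game object, for illustration only
struct Bullet
{
  float x;
  bool  active;
  Bullet() : x(0.0f), active(false) {}
};

// hypothetical pool: all Bullets are constructed once, in the constructor
class BulletPool
{
  public:
    explicit BulletPool(std::size_t size) : bullets(size) {}

    // hand out an unused Bullet without calling new; returns 0 if none is free
    Bullet* acquire()
    {
      for (std::size_t i = 0; i < bullets.size(); ++i)
      {
        if (!bullets[i].active)
        {
          bullets[i].active = true;
          return &bullets[i];
        }
      }
      return 0;
    }

    // give a Bullet back for recycling instead of deleting it
    void release(Bullet* bullet) { bullet->active = false; }

  private:
    std::vector<Bullet> bullets;   // never resized, so handed-out pointers stay valid
};
```

The linear search in acquire() keeps the sketch short. A real pool would probably keep a list of free slots instead.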
Redundant code
Since anything that can be done fast should be done fast, we can optimise many functions that take and return values. Code for an operator or something similar often looks like this:
CVector3f operator+( CVector3f v )
{
  CVector3f returnVector;
  returnVector.m_x = m_x + v.m_x;
  returnVector.m_y = m_y + v.m_y;
  returnVector.m_z = m_z + v.m_z;
  return returnVector;
}
What's wrong here is the local variable. When the local object is created on the first line, its constructor is called and the object is initialized. But we didn't want that to happen! We assign new values anyway in the next lines. Always keep in mind the time wasted when creating objects.
The copy constructor is called again at the end of the function, because returnVector is a local variable that has to be copied into the return value. This just shows why this style is very bad.
A hidden problem is the parameter. As it is passed by value, it is a copy of the original argument, so we get another useless object construction.
A better way would be:
CVector3f operator+( const CVector3f &v ) const
{
  return CVector3f( m_x + v.m_x, m_y + v.m_y, m_z + v.m_z );
}
This implementation saves us time, as two fewer copy constructors are called. In this particular case we could save time by knowing what happens behind the scenes.
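To see the difference between the two styles, one can count copy constructor calls. The sketch below uses a minimal stand-in for CVector3f. The static counter copies and the free function addByValue are additions for this illustration only and are assumed not to exist in the real class.

```cpp
// minimal stand-in for CVector3f; the copy counter is for this sketch only
class CVector3f
{
  public:
    static int copies;   // number of copy constructor calls so far

    CVector3f( float x, float y, float z ) : m_x(x), m_y(y), m_z(z) {}
    CVector3f( const CVector3f &v ) : m_x(v.m_x), m_y(v.m_y), m_z(v.m_z) { ++copies; }

    // reference parameter and direct construction of the return value
    CVector3f operator+( const CVector3f &v ) const
    {
      return CVector3f( m_x + v.m_x, m_y + v.m_y, m_z + v.m_z );
    }

    float m_x, m_y, m_z;
};

int CVector3f::copies = 0;

// by-value parameters and a local result variable, for comparison
// (addByValue is a hypothetical name)
CVector3f addByValue( CVector3f a, CVector3f v )
{
  CVector3f returnVector( 0.0f, 0.0f, 0.0f );
  returnVector.m_x = a.m_x + v.m_x;
  returnVector.m_y = a.m_y + v.m_y;
  returnVector.m_z = a.m_z + v.m_z;
  return returnVector;
}
```

Resetting CVector3f::copies to 0 before each call and reading it afterwards shows how many copies each variant costs. The exact numbers depend on the compiler, because compilers are allowed to elide some copies of return values; the copies made for by-value parameters always remain.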
Pointer arithmetic
Dereferencing a pointer takes some time. If you do something like this:
for( int i = 0; i < numPixels; i++ )
{
  rendering_context->back_buffer->surface->bits[i] = some_value;
}
you're wasting a lot of time in the loop on dereferencing.
Instead you could do this:
unsigned char *back_surface_bits = rendering_context->back_buffer->surface->bits;
for( int i = 0; i < numPixels; i++ )
{
  back_surface_bits[i] = some_value;
}
or even faster:
unsigned char *back_surface_bits = rendering_context->back_buffer->surface->bits;
for( int i = 0; i < numPixels; i++, back_surface_bits++ )
{
  *back_surface_bits = some_value;
}
By dropping the brackets and using pointer arithmetic, the address of each element doesn't have to be recalculated from the index, saving you some time to drink a cup of coffee.
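Put together as a complete function, the idea might look like this. The three structs are hypothetical stand-ins for whatever rendering_context really points to, and fillSurface is a name chosen for this sketch.

```cpp
// hypothetical stand-ins for the structures used in the loops above
struct Surface          { unsigned char *bits; };
struct BackBuffer       { Surface *surface; };
struct RenderingContext { BackBuffer *back_buffer; };

// fill the surface: the pointer chain is dereferenced once, before the loop
void fillSurface( RenderingContext *rendering_context, int numPixels, unsigned char some_value )
{
  unsigned char *back_surface_bits = rendering_context->back_buffer->surface->bits;
  unsigned char *end = back_surface_bits + numPixels;
  for( ; back_surface_bits != end; ++back_surface_bits )
  {
    *back_surface_bits = some_value;
  }
}
```

Comparing against an end pointer also removes the separate counter i. An optimising compiler may perform parts of this rewrite on its own, but it cannot always prove that the writes leave the pointer chain untouched, so doing it by hand is the safe choice.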
Math optimisations
Bit shifting is a very fast way to perform integer math. Its use is limited, as you can only multiply and divide by powers of 2, but it is very fast. Consider:
i *= 256;    // i = i * 256
i = i << 8;  // i = i * 256
Logically, they are the same. For this simple example, the compiler might even turn the first into the second, but as the expressions get more complex the compiler might not be able to make the conversion. Example:
i *= 272;                 // i = i * 272
i = (i << 8) + (i << 4);  // i = i * 256 + i * 16 = i * 272
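Wrapped into a function, the multiplication by 272 can be checked against the ordinary operator. The name times272 is hypothetical. The sketch uses unsigned int because shifting negative values is not portable, and it relies on the decomposition 272 = 256 + 16.

```cpp
// multiply by 272 = 256 + 16 using two shifts and one addition
// (times272 is a hypothetical helper name)
unsigned int times272( unsigned int i )
{
  // the parentheses are required: + binds more tightly than <<
  return (i << 8) + (i << 4);
}
```

Without the parentheses the expression would be parsed as (i << (8 + i)) << 4, which is something completely different.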
One should also keep in mind that multiplying is much slower than adding. This is why the following rewrite makes sense:
a*b + a*c;
a*(b+c);    // This gets rid of one multiplication, with no change to the meaning of the expression
A variation on the previous item, this time replacing two divisions by one division and one multiplication. On every platform that I am aware of, divisions are slower than multiplications, so this rewrite will provide a speed improvement. Note that it only applies to floating-point values: with integers, 1/a is truncated to 0 for any a greater than 1.
b/a + c/a = (1/a)*(b+c);
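As a small sketch of the division rewrite, here are both forms as functions on double values. The function names are made up for this illustration.

```cpp
// two divisions
double sumOfQuotients( double a, double b, double c )
{
  return b/a + c/a;
}

// one division and one multiplication
double sumOfQuotientsFast( double a, double b, double c )
{
  // 1.0 rather than 1, so that the division is done in floating point
  return (1.0/a) * (b + c);
}
```

Because floating-point results are rounded, the two versions can differ in the last digits for some inputs, so the rewrite is best used where that is acceptable.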
The last example is perhaps not so obvious, but the C++ standard requires short-circuit (lazy) evaluation of && and ||.
( a || b ) && c;
c && ( a || b );
In the first case, ( a || b ) always has to be evaluated, whereas in the second case it can be skipped whenever c is false, as the entire expression can never evaluate to true anymore.
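The effect of the operand order can be made visible by counting how often an expensive sub-expression is really evaluated. Everything in this sketch (expensiveCalls, expensiveTest, slowOrder, fastOrder) is a hypothetical name introduced for the illustration.

```cpp
// counts how often the expensive test is actually executed
int expensiveCalls = 0;

// hypothetical stand-in for a costly check
bool expensiveTest( bool result )
{
  ++expensiveCalls;
  return result;
}

// expensive part first: it is evaluated no matter what c is
bool slowOrder( bool a, bool b, bool c )
{
  return ( expensiveTest(a) || expensiveTest(b) ) && c;
}

// cheap part first: if c is false, the expensive part is skipped entirely
bool fastOrder( bool a, bool b, bool c )
{
  return c && ( expensiveTest(a) || expensiveTest(b) );
}
```

With a, b and c all false, slowOrder calls expensiveTest twice and fastOrder does not call it at all, although both return the same result.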