C++ Operator Overloading, and why it blows chunks

Joris Timmermans · 2000-06-06T09:39:26

Heyyyy, you''ll think, Mad Keith posting a flame about C++? That''s right. Until recently the most adamant supporter of C++, I''ve reached other conclusions. The problem? Something we''ve probably all worked on at some point or other: Vector and Matrix functions. It started like this: I downloaded a class-template library for vector/matrix math (among other things). It seemed so damn useful at the time, and it was so easy to program using it. Vector1 = Vector2 + Size*Vector3; Looks wonderful doesn''t it? Well, it may LOOK wonderful, but it''s slower than a dog. The problem is the copy constructor you''ll inevitably need for these operators to work. Inlining helps a bit. Reorganising your code helps a bit. Looking from afar it actually seems it''s efficient enough, within a few percent of using non-operator style functions. Well it isn''t. A full weekend of optimising my class code, trying out every compiler optimisation, every class trick, even buying a BOOK on efficient C++, my "typedefs and Macros" version takes about ( not kidding ) 50% of the time of the C++ version. *sigh* Just thought I''d warn you all before you waste a perfectly good weekend ( birthday weekend in my case ) trying to figure out why that C++ code just doesn''t seem as fast as it should be. #pragma DWIM // Do What I Mean! ~ Mad Keith ~

MadKeithV

Author

992

June 05, 2000 09:16 AM

The problem I'm tackling at the moment is like this:
I have a Vertex class - containing among others a CVector3f for the position of the vertex.
However, because there is MORE data than just that vector, it's not consecutive in memory.

This is the situation as it looks in my code now:

vertex_array{i}.m_position.Displace( left, right, scale );

A quick test showed that if I could somehow do THIS, it would be twice as fast:
CVector* vector_array = vertex_array::vector_array;
..
vector_array{i}.Displace( left, right, scale );

Programming this is not so hard, but it doesn't look clean enough yet, to me. So I'm trying to find a way to work with parallel buffers so that it looks clean, and works fast.

[note: the curly braces should be square ones, but with the i in the middle it thinks it's an italic tag

]

#pragma DWIM // Do What I Mean!
~ Mad Keith ~

Edited by - MadKeithV on June 5, 2000 10:21:57 AM

It's only funny 'till someone gets hurt.And then it's just hilarious.Unless it's you.

bjoern

122

June 05, 2000 09:54 AM

I see, hm.

If performance matters (is this a question :-) you should organize your data in a way which lent to a fast path.

The question is: do you often need access to all vertex data (like position, normal, color, etc...) which would be used the following way in OpenGL:

class CVertex
{
CVector3f m_position;
CVector3f m_normal;
CColor3ui8 m_color; // ui8 == unsigned int 8 bit
public:
...
// Returns a
const CVector3f& position() const { return m_position;};
const CVector3f& normal() const {return m_normal;};
const CColor3ui8& color() const {return m_color;};
};

glBegin(GL_TRIANGLES);
for(unsigned int i=0;i{
glColor3fv( &(vertex_array.color().m_red ) );
glNormal3fv( & (vertex_arrray.normal().m_x) ); glVertex3fv( & (vertex_array.position().m_x ) ); } glEnd(); … or do you decide to use vertex buffers (D3D term I think…) or vertex arrays (OGL term) in which case you wouldn''t use a vertex_array with vertices but something like a CTriMesh which would look like: class CTriMesh { CVector3f* m_vertex_list; CVector3f* m_normal_list; CColor3ui8* m_color_list; unsigned short m_primitive_count; // how many primitves inside the lists above (all with the same size) unsigned int m_index_count; // size of index_list unsigned short* m_index_list; public: … }; And the your drawing (or processing code) would look like: glEnableClientState(GL_VERTEX_ARRAY); glEnableClientState(GL_NORMAL_ARRAY); glEnableClientState(GL_COLOR_ARRAY); glVertexPointer(tri_mesh.numberOfPrimitives(), GL_FLOAT, 0, tri_mesh.positions()); glColorPointer( /* like above */ ); glNormalPointer( /* like above */); glDrawElements(GL_UNSIGNED_SHORT, tri_mesh.numberOfPrimitives(), GL_TRIANGLES,tri_mesh.indices()); … which corresponds to the way you would like to use it. Another way would be the use of a "stride" which tells you how much you have to jump to access the next vertex in a vertex_array… Did this help (or am I blowing dust?)? It''s all a question of your concept (what isn''t)… Bjoern

milo

122

June 05, 2000 09:56 AM

Straight out of MSDN:

Debug version-Full symbolic debugging information in Microsoft format No optimization (optimization generally makes debugging more difficult)

Release version-No symbolic debugging information
Optimized for maximum speed

Was that the sound of MadKeithV slapping his forehead?

Mike Roberts
aka milo
mlbobs@telocity.com

MadKeithV

Author

992

June 05, 2000 10:02 AM

Ohh I know about the speed increase of Release mode, and why Debug mode is slow, but it affects C++ a lot more than it affects plain C code. That''s what threw me, I was optimising things in ways that SEEMED to work, but they really only had effect in Debug mode, because a Release compile would get the most out of the code anyway.

#pragma DWIM // Do What I Mean!
~ Mad Keith ~

It's only funny 'till someone gets hurt.And then it's just hilarious.Unless it's you.

MadKeithV

Author

992

June 05, 2000 12:08 PM

I''m going to try out the "Vertex Array" - approach. This seems the most valid, as it will allow for cache optimisations and a possible SIMD assembler implementation later, when I really want to show off

something along these lines, with any additional members that may be necessary to get it to work:

class CVertex{public:protected:int  m_vertexIndex;private:static CVector3f[ maxSize ]m_positions; static CVector3f[ maxSize ]m_normals;static CTexCoord2f[ maxSize ]m_texCoords; static CRgba[ maxSize ]m_colours; };

I''m starting work on it now, I''ll keep this thread updated on my progress.

#pragma DWIM // Do What I Mean!
~ Mad Keith ~

It's only funny 'till someone gets hurt.And then it's just hilarious.Unless it's you.

bjoern

122

June 05, 2000 12:23 PM

This seems to be to closed fit... The scribble of your approach will limit all your CVertices to one structure of lists... so if you want to use many big graphic models you will have to grow the arrays - however there are array size limits on current (and I suppose future)3d accelerators. Another negative side is that enormeous growing arrays will introduce cache misses when you need to read one index out of many arrays.

Even if it isn't the fastest path I would bite the bullit and create abstraction levels which would allow you to use arrays per object - not per class (through static).

Well in the end everything which counts is your problem and perhaps I am totaly wrong and a simplier approach better suits your needs. I tend to architect things to an abstract death :-)

Bjoern
P.S.: So or so, I wouldn't name the class CVertex - CVertexHandle describes its use better (however there should be a way better name for it).

Edited by - bjoern on June 5, 2000 1:27:16 PM

MadKeithV

Author

992

June 05, 2000 12:31 PM

If you saw the template I''m working on now, I think you''d cringe in horror

. I''ll probably throw the idea away soon enough, but I have to try it.

I understand your point about the "tight fit", but it''s a question of how low-down in the graphics engine you go. The arrays will be emptied per polygon at first, so I wouldn''t need THAT much space ( though "per polygon" in my case probably means around 1k vertices ). When I get to more complex visibility algorithms, it may be necessary to go "per-object".
I''m dreading that day, because I see no easy way around it yet.

Perhaps I''ll go have a look at the Mesa3D source code again, and see how they do it there, and how many vertices they throw into their vertex buffer before flushing it...

#pragma DWIM // Do What I Mean!
~ Mad Keith ~

It's only funny 'till someone gets hurt.And then it's just hilarious.Unless it's you.

Justinfinity

122

June 05, 2000 05:41 PM

i''m 53 posts late, but madkeith, a "typedefs and macros" version will always be faster than operators or functions, because using macros doesn''t have any overhead of calling functions. you also made a distinction between operators and "non-operator" functions. umm, when compiled, operators end up as normal function calls, they just use operator syntax for readability

-Justin White

jowhite@bigfoot.com
AIM: Just6979
www.bigfoot.com/~jowhite

"To infinity and beyond!" -Buzz Lightyear
"I can only show you the door. You must choose to go through it." -Morpheus
"Your mind is like a parachute. It works best when open." -Anonymous
"I know Kung-fu." -Neo
"Ignorance is bliss" -Cypher
"My name...is Neo!" -Neo

-Justin Whitejowhite@bigfoot.comAIM: Just6979www.bigfoot.com/~jowhite"To infinity and beyond!" -Buzz Lightyear"I can only show you the door. You must choose to go through it." -Morpheus"Your mind is like a parachute. It works best when open." -Anonymous"I know Kung-fu." -Neo"Ignorance is bliss" -Cypher"My name...is Neo!" -Neo

MadKeithV

Author

992

June 06, 2000 02:24 AM

Justfinity - the macro''s and typedefs one, I knew that would be fastest. But I wanted to get the difference down to NOTHING more than the function call overhead, and even that COULD go away with inlining.

In the end, I got very very close to this. The trick is finding ways to avoid as much construction of temporary objects as possible. Visual C++ 6.0 does a hell of a job optimising code in the Release version, I must say, and for my efforts I get readable code, with type safety.
That''s worth the few ticks I lose calling a function occasionally.

It''s been a great learning experience doing this too - learning about the internals of C++, the weird constructs you can use to make things fast, and how they usually don''t work, and how to help your compiler to make the best possible optimisations to your code.

#pragma DWIM // Do What I Mean!
~ Mad Keith ~

It's only funny 'till someone gets hurt.And then it's just hilarious.Unless it's you.

null_pointer

289

June 06, 2000 07:44 AM

Justinfinity: Actually, macros can hamper performance when they are used to create pseudo-inline functions. Why? Well, if the compiler does a lot of work to see whether a function declared inline should really be compiled inline, then sometimes the performance would be better if the function were not inline. The compiler compares code size to execution speed to see which is going to make the program faster in the long run. Macros lock the function inline every time and that's why it's possible that they can hamper performance.

Want an example? OK:


#define BIG_FUNCTION( n ) { 	int i[1000]; 	 	for( int j=n; j&ltn j++ ) 	{ 		i[j] = n; 	} }

void main()
{
	BIG_FUNCTION( 0 )
	BIG_FUNCTION( 1 )
	BIG_FUNCTION( 2 )
}

Here's the C++ inline version:


inline void
BIG_FUNCTION( int n )
{
	int i[100];

	for( int j=n; j&ltn j++)
	{
		i[j] = n;
	}
}

void main()
{
	BIG_FUNCTION(0);
	BIG_FUNCTION(1);
	BIG_FUNCTION(2);
}

Note that in the macro version, if you take out the blocks then you'll get symbol redefinition errors in main. That's why I have included them. The C++ inline functions work as if the scoping is there anyway. My point is that inline functions can do everything that macro functions can do, plus they are more easily optimized (correctly).

(Note also that the for() loop in the inline function will probably force it outline, since ratio of the processor time of the for() loop over the function call is very large. The code was just concept code to show you that the compiler really does optimize inline functions quite intelligently.)

- null_pointer
Sabre Multimedia

Edited by - null_pointer on June 6, 2000 8:48:48 AM

C++ Operator Overloading, and why it blows chunks

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

C++ Operator Overloading, and why it blows chunks

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines