
Optimization issue when using operator=

Started by April 30, 2016 04:14 PM
7 comments, last by mychii 8 years, 7 months ago

Hi guys,

I am still getting into C++ and messing around with comparing a simple hand-written Vec2 with glm's vec2, and I kinda found a quite funny result.

Code first:


#include <iostream>
#include <string>
#include <chrono>

#include <glm/vec2.hpp>

class Vec2
{
public:
	float a;
	float b;

	Vec2()
	{

	}

	Vec2(float a, float b) : a(a), b(b)
	{
		
	}

	~Vec2()
	{

	}

	void operator=(Vec2 const& v)
	{
		// it doesn't even do anything yet.
	}

	Vec2& operator*(Vec2 const& v)
	{
		this->a *= v.a;
		this->b *= v.b;

		return *this;
	}
};

int main()
{
	auto t_start = std::chrono::high_resolution_clock::now();

	Vec2 v{ 2.0f, 2.0f };
	Vec2 vv{ 2.0f, 2.0f };
	Vec2 vr;

	for (int i = 0; i < 10000000; ++i)
	{
		vr = vv;
	}

	auto t_end = std::chrono::high_resolution_clock::now();

	std::cout << std::fixed << std::chrono::duration<double, std::milli>(t_end - t_start).count() << std::endl;

	t_start = std::chrono::high_resolution_clock::now();

	glm::vec2 v2{ 2.0f, 2.0f };
	glm::vec2 v22{ 2.0f, 2.0f };
	glm::vec2 vres;

	for (int i = 0; i < 10000000; ++i)
	{
		vres = v22;
	}

	t_end = std::chrono::high_resolution_clock::now();

	std::cout << std::fixed << std::chrono::duration<double, std::milli>(t_end - t_start).count() << std::endl;

	return 0;
}

I make a simple assignment using my Vec2 operator= for 10 million iterations. It takes roughly 390ms, compared to 28ms for glm's vec2 on my machine. Even after I removed the code inside the operator, the result is the same. It is completely the opposite when I do a multiplication (operator*), which takes ~410ms for mine vs ~668ms for glm's.

So I assume there's something wrong or missing in my operator=, but I really don't know what it is.

I've tried to get as close as glm's code I saw in type_vec2.inl.

Cheers~

A good compiler should optimize away your entire test. Therefore, it is dangerous to make assumptions based on what you have here.

EDIT:
Redacted-- Frob is right, I forgot which board I was posting in.
Also, upon a second look, I realized that operator* is probably a typo of operator*=

My post is useless.
I make a simple assignment using my Vec2 operator= for 10m iterations. It takes roughly around 390ms compare to glm's vec2 that is 28ms on my machine. I even removed the code inside it, it is still like that. This is completely the opposite when I'm doing a multiplication (operator*), which is ~410ms vs ~668ms.

What kind of optimizations are you applying in your compiler? I assume you are testing with "debug" in Visual Studio (or an equivalent), since otherwise I strongly suspect the loop would be optimized out anyway, as it doesn't do any work. First, you are not using the written-to vector anywhere outside of the loop, which is one reason for a compiler to optimize it away; second, you are assigning the same value over and over, which could also be eliminated.

Why is this important? Any benchmark in debug mode is pretty pointless. Enable optimizations/switch to "release" first, which will most likely result in the loop taking 0 ms at first due to optimizations (which can be fixed by using the result after the loop, e.g. printing vres.x and vres.y; unless the compiler chooses to optimize the loop based on the second case, for which I do not have a solution).

A loop with an empty assignment-operator, if executed with proper optimization should take like not even 1 ms. Then you can see what the real difference between yours and glms class is.

As for why it could be slower: GLM uses template types, which are more likely to be inlined IIRC, which can be a huge deal if you are running in an unoptimized build. See how it behaves in release first, and then if it still doesn't change try "force_inlining" the function.
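A minimal sketch of the "use the result" idea above, assuming a hypothetical Vec2 struct and hypothetical helpers run_assignments and time_assignments_ms (none of these names come from the original post). The source changes every iteration and the destination is read after the loop, which gives the optimizer less room to delete the work. A compiler may still collapse the loop, so the numbers only mean something after checking an optimized build.

```cpp
#include <chrono>
#include <iostream>

// Hypothetical stand-in for the Vec2 in the original post.
struct Vec2 {
    float a;
    float b;
};

// Assigns a changing source to dst `iterations` times and returns the sum of
// the final components, so the caller can consume the result.
float run_assignments(int iterations)
{
    Vec2 src{ 2.0f, 2.0f };
    Vec2 dst{};
    for (int i = 0; i < iterations; ++i)
    {
        src.a = static_cast<float>(i); // vary the input each iteration
        dst = src;
    }
    return dst.a + dst.b;
}

// Times run_assignments and prints the result together with the elapsed time.
// Printing the result is what makes the loop's output observable.
double time_assignments_ms(int iterations)
{
    auto const start = std::chrono::steady_clock::now();
    float const result = run_assignments(iterations);
    auto const end = std::chrono::steady_clock::now();
    double const ms = std::chrono::duration<double, std::milli>(end - start).count();
    std::cout << "result " << result << ", " << ms << " ms\n";
    return ms;
}
```

Calling time_assignments_ms(10000000) from main would mirror the iteration count in the original test.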

@Juliean

Ah, silly me, I forgot the release mode. :P

I was trying to make a simple code without templates and all to see if it makes a difference. But you've explained it nicely to make me realize this doesn't do a thing.

On top of that, your vector operators aren’t well formed.

Can you tell me what "well-formed" should be? Is it because I didn't return anything from that assignment? That's the reason I posted it in the Beginner section.

Come back when you have real problems with real code built without debugging.


Please remember that this is For Beginners, where special rules apply: Do not flame users because of their lack of knowledge. We all had to start somewhere. This forum is for beginners to ask questions without being harassed because somebody more experienced thinks the answer should be obvious. Make sure your replies are helpful and guiding the beginner in the right direction, not taunting or flaming or insulting them.

Can you tell me what the "well-formed" should be? is it cause I didn't return anything on that assignment? It's the reason why I post it in Beginner section.


It should do some work that cannot reasonably be optimized away.

Writing small pieces of code that test for performance, called a micro-benchmark, is surprisingly difficult to do. Nearly always the compiler will find something about the code that can be optimized away. Generally programmers writing this type of benchmark need to study the generated assembly code to ensure it is measuring what they think it is measuring.

Probably what is happening is the compiler is using one of the benefits of templates in optimization. Templates are not specific functions with specific code. They are like cookie-cutters where the final output is code. Probably the difference is that in your first example the compiler is calling a function that many times because it has a specific function it needs to call. But in the template version the compiler is evaluating what happens when it applies the cookie cutter, notices that the cookie-cutter generates ten million no-ops, and eliminates that to a single call rather than ten million. The only way to know for certain is to look at the final disassembled output.

A better way to get good performance numbers is to write real code that does something. Measure that instead.
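As a rough illustration of "real code that does something", here is a sketch that measures a pass over runtime data instead of an empty loop. Vec2, mul, sum_of_products and time_sum_ms are hypothetical names chosen for this example, not anything from glm or the original post.

```cpp
#include <chrono>
#include <vector>

// Hypothetical stand-in for the Vec2 in the original post.
struct Vec2 {
    float a;
    float b;
};

// Component-wise product that leaves both operands untouched.
Vec2 mul(Vec2 const& lhs, Vec2 const& rhs)
{
    return Vec2{ lhs.a * rhs.a, lhs.b * rhs.b };
}

// Scales every element and accumulates the components into one number.
// The total depends on every element, so each iteration contributes to it.
float sum_of_products(std::vector<Vec2> const& values, Vec2 const& scale)
{
    float total = 0.0f;
    for (Vec2 const& v : values)
    {
        Vec2 const p = mul(v, scale);
        total += p.a + p.b;
    }
    return total;
}

// Times one call and hands the computed total back through `result`, so the
// caller can print or otherwise use it.
double time_sum_ms(std::vector<Vec2> const& values, Vec2 const& scale, float& result)
{
    auto const start = std::chrono::steady_clock::now();
    result = sum_of_products(values, scale);
    auto const end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Filling the vector from input the compiler cannot see at compile time (a file, user input, a random generator) makes it harder still for the work to be folded away.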

@frob

Oh my, okay, that's very deep. I didn't know it could go that far down to assembly. :blink:

About the templates, this is what I always do in JavaScript, because it does matter at that level: everything happens at runtime, including generic code.

So whenever I see any generalization and stuff, I have this habit to check its performance and see if the simplification of the code would make a difference.

In my case I only need float-based vectors, so I wondered why I would need a generic version and tried to check whether it makes a difference in runtime performance.

I will re-read C++ template again. :rolleyes:

Thanks for the time, guys!


Optimizing compilers do an awful lot.

As you mentioned getting a start in JavaScript, you probably understand how different JavaScript systems generate different results. Running the same JavaScript code in different browsers can give radically different performance even though the JavaScript is identical.

The same is true in C++ and other languages. Different compilers and different compiler settings can result in big differences in how the code actually executes.

The only way to make a truly fair comparison is to consider how the code actually executes all the way through in a non-trivial example. It needs to do actual work, then you can look and see how the system actually runs in a real-world environment.

A primary problem here is that you wrote an operator= at all. Detailed explanation of why this is a problem:

In C++, the assignment operators are generated automatically by default. These automatically-generated operators have _special properties_. Namely, they can be what is called a "trivial" operator. If you write your own operator - even an empty one! - it is not _trivial_.

The importance is that the trivial operator allows the compiler or library in some cases to generate wildly different (and more optimal) code.

For an assignment operator, the difference between a trivial version and a non-trivial version is that the trivial assignment will just copy memory around while the non-trivial one is a function. Even if your implementation of the assignment operator is just copying memory, it's still a function that copies memory, and not just a magical automatic copy of memory.
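The trivial/non-trivial distinction can be checked at compile time with the standard type traits in <type_traits>. A small sketch, using two hypothetical types (ImplicitVec2 and UserVec2) that differ only in whether the assignment operator is hand-written:

```cpp
#include <type_traits>

// Hypothetical type that relies on the compiler-generated assignment operator.
struct ImplicitVec2 {
    float a;
    float b;
};

// Hypothetical type with a hand-written assignment operator doing the same copy.
struct UserVec2 {
    float a;
    float b;

    UserVec2& operator=(UserVec2 const& v)
    {
        a = v.a;
        b = v.b;
        return *this;
    }
};

// The implicit operator is trivial; the user-provided one is not, even though
// both copy exactly the same two floats.
static_assert(std::is_trivially_copy_assignable<ImplicitVec2>::value,
              "implicit copy assignment should be trivial");
static_assert(!std::is_trivially_copy_assignable<UserVec2>::value,
              "user-provided copy assignment is never trivial");

// The non-trivial operator also stops the whole type from being trivially copyable.
static_assert(std::is_trivially_copyable<ImplicitVec2>::value,
              "ImplicitVec2 should be trivially copyable");
static_assert(!std::is_trivially_copyable<UserVec2>::value,
              "UserVec2 should not be trivially copyable");
```

The traits only report the language-level category. An optimizing build may well inline the hand-written operator and emit the same instructions, so this says nothing by itself about measured speed.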

Functions aren't free to call; there's some overhead involved in how the CPU has to set up a function to be called, overhead in the function "prolog", more overhead in the function "epilog", and yet more overhead in the caller's code that cleans up after a function is called.

This function call overhead is far worse in Debug builds or when using various default options in many compilers (including Microsoft's). For security purposes, the compiler might silently insert additional "safety check" code into a function prolog or epilog. This results in small do-nothing functions actually having _very large_ overhead relative to the work that they actually do.

Function call overhead is a primary reason that "function inlining" is such an important optimization that is turned on in Release builds (or that you can turn on manually yourself in Debug builds, which is what I do, because I need my Debug builds to be fast enough for my game to be interactive and not a slideshow).

If you want to guarantee that one of the default operators exists with its default semantics (of just copying memory), you can declare a "defaulted function" using the =default syntax, e.g.

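class vec2 {
public: // class members are private by default, so expose them explicitly
  float x, y;

  vec2& operator=(vec2 const&) = default; // compiler-generated (trivial) copy assignment
};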

In the above snippet, the default operator declaration serves no purpose because the compiler would have generated it anyway. There are various things you might do to suppress the default generation of functions, though, so you need to use the =default syntax to turn that generation back on. Importantly, compiler-generated functions (implicit or explicitly defaulted) are the _only_ way to get a "trivial" operation for constructors, destructors, or assignment operators; anything you write yourself is never trivial. The most common use of =default I see is generally to enable the default constructor in cases where you also have a user-provided constructor, which for your vector would be:

class vec2 {
public: // class members are private by default, so expose them explicitly
  float x, y;

  explicit vec2(float x, float y) : x(x), y(y) {} // user-provided constructor, so the default constructor is suppressed!
  vec2() = default; // tell the compiler to generate the default (trivial) constructor anyway
};

Note that the trivial constructor does _no_ initialization, meaning that in the above code the .x and .y members of vec2 will be _uninitialized memory_ if the default constructor is invoked. This may not be what you want and you may want to provide a non-trivial default constructor that zero-initializes those members.
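To make the trade-off concrete, here is a sketch with two hypothetical types: RawVec2 keeps the trivial default constructor, while ZeroVec2 provides its own default constructor that zero-initializes the members. Both names are invented for this example.

```cpp
#include <type_traits>

// Hypothetical type with a defaulted (trivial) default constructor.
// `RawVec2 v;` leaves v.x and v.y uninitialized.
struct RawVec2 {
    float x, y;

    RawVec2() = default;
    RawVec2(float x, float y) : x(x), y(y) {}
};

// Hypothetical type with a user-provided default constructor that zeroes
// both members. The price is that the constructor is no longer trivial.
struct ZeroVec2 {
    float x, y;

    ZeroVec2() : x(0.0f), y(0.0f) {}
    ZeroVec2(float x, float y) : x(x), y(y) {}
};

static_assert(std::is_trivially_default_constructible<RawVec2>::value,
              "defaulted default constructor should be trivial");
static_assert(!std::is_trivially_default_constructible<ZeroVec2>::value,
              "user-provided default constructor is never trivial");

// Reading the members of a default-constructed ZeroVec2 is well defined.
float zero_sum()
{
    ZeroVec2 v;
    return v.x + v.y;
}
```

Which one to prefer is a design choice: the trivial version avoids writing memory that is about to be overwritten anyway, while the zeroing version removes a class of uninitialized-read bugs.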

As an additional note, there is also an =delete modifier you can use on a function to _suppress_ the default generation in cases where you don't want the operator to exist at all, e.g., if you wanted a class to not be copyable at all:

class not_copyable {
public:
  not_copyable(not_copyable const&) = delete; // disable copy construction (but not copy assignment) entirely
  not_copyable& operator=(not_copyable const&) = delete; // disable copy assignment (but not copy construction) entirely
};
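The effect of =delete can be verified at compile time as well. A short sketch, using a hypothetical NotCopyable class that also defaults its default constructor so that objects can still be created:

```cpp
#include <type_traits>

// Hypothetical non-copyable type. Declaring the copy constructor suppresses
// the implicit default constructor, so it is requested back with =default.
class NotCopyable {
public:
    NotCopyable() = default;
    NotCopyable(NotCopyable const&) = delete;
    NotCopyable& operator=(NotCopyable const&) = delete;
};

static_assert(std::is_default_constructible<NotCopyable>::value,
              "objects can still be created");
static_assert(!std::is_copy_constructible<NotCopyable>::value,
              "copy construction is disabled");
static_assert(!std::is_copy_assignable<NotCopyable>::value,
              "copy assignment is disabled");
```

Any attempt to copy such an object, for example `NotCopyable b = a;`, is rejected by the compiler with an error that names the deleted function.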

Sean Middleditch – Game Systems Engineer – Join my team!

@frob

True, Sir! Learned myself a new homework for that now. This is excitingly new to me. :rolleyes:

@SeanMiddleditch

Thanks for telling me how things are going on. Didn't realize it's not that simple. :wacko:

The assignment was supposed to contain the simple assignment code, which is this->a = v.a and this->b = v.b. When I put those in, it "gave" me a slow result, so I assumed that emptying the function might increase the speed, which it apparently did not. That's why I posted the empty one. But as everyone has explained, that's completely beside the point.

Also, upon a second look, I realized that operator* is probably a typo of operator*=

Thanks for the clarification fastcall22. I just copied the glm code (just without template) and use that operator instead of the '*=' one for a (fail) comparison. :)
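For reference, the usual C++ convention for this pair of operators looks roughly like the sketch below. It uses a hypothetical Vec2 with the same members as the original post and is not a copy of glm's implementation: operator*= modifies the left operand and returns it by reference, while operator* returns a new value and leaves both operands alone. Copy assignment is left to the compiler.

```cpp
// Hypothetical Vec2 with conventional operator signatures.
struct Vec2 {
    float a;
    float b;

    // Compound assignment: modifies *this and returns it by reference,
    // which allows chaining such as (x *= y) *= z.
    Vec2& operator*=(Vec2 const& v)
    {
        a *= v.a;
        b *= v.b;
        return *this;
    }
};

// Binary operator: takes the left operand by value, scales the copy, and
// returns it, so neither original operand changes.
inline Vec2 operator*(Vec2 lhs, Vec2 const& rhs)
{
    lhs *= rhs;
    return lhs;
}
```

With this split, `Vec2 z = x * y;` leaves x and y as they were, whereas the operator* in the original post silently overwrote its left operand.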

This topic is closed to new replies.
