Advertisement

Optimization results

Started by July 09, 2001 09:01 PM
14 comments, last by JD 23 years, 7 months ago
>> File writing under windows98 win32 and msvc++ compiler. << On average I found win32 file writing to be 5x faster than C. C is 3x faster than C++ STL(dinkum that comes with vc++6). STL is 20x slower than win32. ------- My results from console app(writing to file over and over): Writing 10000 integers 10000 times C file writing took: 9.675209 seconds STL file writing took: 33.263714 seconds Win32 file writing took: 1.730036 seconds Press any key to continue ------ If you use vector.reserve() and then vector = value then it''s only 2.3x slower than "C" array filling. If vector.push_back() is used (constantly allocates from heap during looping) then it''s 16x slower than "C" array filling. Now for the kicker, if using either iterators or vector.size() to iterate the vector then it''s faster than "C" array iteration. ------(using reserve()) C array: 0.041174 seconds STL array: 0.096588 seconds iterate C array: 0.047839 seconds iterate STL array: 0.000004 seconds Press any key to continue ------ ----(using push_back()) C array: 0.040835 seconds STL array: 0.638112 seconds iterate C array: 0.042605 seconds iterate STL array: 0.060424 seconds Press any key to continue -------
  
#include "stdafx.h"
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

#include "C:\MyProjects\NewProject\WinUtils\WinUtils.h"
using namespace std;

//#define FILE_IO
#define ARRAY_TEST


int main(int argc, char* argv[])
{		
#if defined(FILE_IO)

	const int SIZE = 10000;
	int ints[SIZE];
	
	for(int i = 0; i < SIZE; ++i)
		ints[i] = 1;
	
	wul_CTimer timer;
	int k;
	
	cout << "Writing " <<  SIZE << " integers " <<  SIZE << " times" << endl << endl;

	/*
	// C
	*/
	timer.Start();	
	for(k = 0; k < SIZE; ++k)
	{			
		FILE *stream = fopen("text.txt", "w");
		fwrite(&ints, sizeof(int), SIZE, stream);
		fclose(stream);
	}
	timer.Stop();
	string s1(timer.GetTimeStr());
	
	/*
	// STL
	*/
	timer.Start();
	for(k = 0; k < SIZE; ++k)
	{
		basic_ofstream<int> stream;
		stream.open("text.txt");
		stream.write(ints, SIZE);
		stream.close();	
	}
	timer.Stop();
	string s2(timer.GetTimeStr());
	
	/*
	// WIN32
	*/

	// To be fair to win32

	int count = SIZE * sizeof(int);
	DWORD BytesWritten; 

	timer.Start();
	for(k = 0; k < SIZE; ++k)
	{	
		HANDLE hFile = CreateFile(
			"text.txt",
			GENERIC_WRITE,
			0, 
			NULL,
			CREATE_ALWAYS,
			FILE_ATTRIBUTE_NORMAL,
			NULL);
																		
		 WriteFile(
			hFile,
			(const void*)&ints,
			count,
			&BytesWritten,
			NULL);
	
		CloseHandle(hFile);
	}
	timer.Stop();
	string s3(timer.GetTimeStr());
	
	// Results		

	cout << "C file writing took:\t\t" << s1 << " seconds" << endl;
	cout << "STL file writing took:\t\t" << s2 << " seconds" << endl;
	cout << "Win32 file writing took:\t" << s3 << " seconds" << endl;
	cout << endl;
#endif
	
#if defined(ARRAY_TEST)
	const int COUNT = 2000000;
	int i;
	int ints[COUNT];
	vector<int> intss;
	wul_CTimer timer;
	
	timer.Start();
	for(i = 0; i < COUNT; ++i)
		ints[i] = i;
	timer.Stop();
	cout << "C array:\t" << timer.GetTimeStr() << " seconds" << endl;
	
//	intss.reserve(COUNT);

	timer.Start();	
	for(i = 0; i < COUNT; ++i)
//		intss = i;
</font>
		intss.push_back(i);
	timer.Stop();
	cout &lt;&lt; <font color="darkred">"STL array:\t"</font> &lt;&lt; timer.GetTimeStr() &lt;&lt; <font color="darkred">" seconds"</font> &lt;&lt; endl &lt;&lt; endl;

	<font color="blue">int</font> val;
	
	timer.Start();
	<font color="blue">for</font>(i = 0; i &lt; COUNT; ++i)
		val = ints[<font color="purple">i</font>];
	timer.Stop();
	cout &lt;&lt; <font color="darkred">"iterate C array:\t"</font> &lt;&lt; timer.GetTimeStr() &lt;&lt; <font color="darkred">" seconds"</font> &lt;&lt; endl;
	
	vector&lt;int&gt;::iterator itInt;

	timer.Start();
	<font color="blue">for</font>(it<font color="blue">Int</font> = intss.begin(); it<font color="blue">Int</font> != intss.end(); ++itInt)
<font color="gray">//	for(i = 0; i &lt; intss.size(); ++i)
</font>
		val = *itInt;
<font color="gray">//		val = intss;
</font>
	timer.Stop();
	cout &lt;&lt; <font color="darkred">"iterate STL array:\t"</font> &lt;&lt; timer.GetTimeStr() &lt;&lt; <font color="darkred">" seconds"</font> &lt;&lt; endl &lt;&lt; endl;
#end<font color="blue">if</font>
		
	<font color="blue">return</font> 0;
}
  </pre></font></td></tr></table></center><!–ENDSCRIPT–>

    <!–STARTSCRIPT–><center><table border=1 cellspacing=0 cellpadding=8 bgcolor="#FFFFFF" width="90%"><tr><td><font color=black face="Courier New"><pre>  
<font color="blue">class</font> wul_CTimer
{
private:
	<font color="gray">// QueryPerformance…() - more precise timer, resolution below 1 millisecond
</font>
	LARGE_INTEGER	m_liCount, m_liLastCount;
	LARGE_INTEGER	m_liDeltaCount;
	LARGE_INTEGER	m_liCounterFrequency;
	
	<font color="gray">// timeGetTime() - 1 millisecond resolution timer i.e. slower that above		
</font>
	<font color="blue">long</font>			m_lDeltaCount;
	<font color="blue">long</font>			m_lLastCount;
	
	<font color="gray">// Shared
</font>
	<font color="blue">char</font>			m_strTimeInSec[<font color="purple">64</font>];
	<font color="blue">float</font>			m_fTimeInSec;

public:
	wul_CTimer();
	
	<font color="gray">// Interface
</font>
	<font color="blue">void</font> Start();
	<font color="blue">void</font> Stop();

	<font color="gray">// Return time as a string or as a float 
</font>
	const <font color="blue">char</font> *GetTimeStr();
	<font color="blue">float</font> GetTim<font color="blue">eNum</font>();
};


<font color="gray">/*!
Timer to time code
*/</font>
wul_CTimer::wul_CTimer()
{
	QueryPerformanceFrequency(&m_liCounterFrequency);
	m_lLastCount = 0;
	m_fTimeInSec = 0.0f;
}
	
<font color="blue">void</font> wul_CTimer::Start()
{
	<font color="blue">if</font>(<font color="blue">FALSE</font> == QueryPerformanceCounter(&m_liLastCount))
		m_lLastCount = timeGetTime();									
}

<font color="blue">void</font> wul_CTimer::Stop()
{
	<font color="blue">if</font>(<font color="blue">TRUE</font> == QueryPerformanceCounter(&m_liCount))
	{
		<font color="gray">// High resolution timer
</font>
		m_liDeltaCount.QuadPart = m_liCount.QuadPart - m_liLastCount.QuadPart;	
		m_fTimeInSec = <font color="blue">float</font>(m_liDeltaCount.QuadPart) / <font color="blue">float</font>(m_liCounterFrequency.QuadPart);						
	}
	<font color="blue">else</font>
	{
		<font color="gray">// Low resolution timer
</font>
		m_lDeltaCount = timeGetTime() - m_lLastCount;
							
		<font color="gray">// Convert milliseconds to seconds
</font>
		m_fTimeInSec = m_lDeltaCount * 0.001f;	
	}
}

const char* wul_CTimer::GetTimeStr()
{
	sprintf(m_strTimeInSec, <font color="darkred">"%f"</font>, m_fTimeInSec);	
	<font color="blue">return</font> m_strTimeInSec;
}

<font color="blue">float</font> wul_CTimer::GetTim<font color="blue">eNum</font>()
{
	<font color="blue">return</font> m_fTimeInSec;
}
  </pre></font></td></tr></table></center><!–ENDSCRIPT–>

Do you guys agree and do you also time your code?  Should we worry about not optimizing the code?  Would you optimize if you had veeery fast machine and would you feel guilty that there are more features you can put in if you''ve only optimized?  I like to know your thoughts on this<img src="smile.gif" width=15 height=15 align=middle>  Thanks.    
Maybe no one is interested

Wrote 40,000 chars to stream 10,000 times in Java 2 i.e. v1.3 and it took 50 seconds. Same test as above.

    import java.util.*;import java.io.*;public class test extends java.lang.Object{    public test() { }    public static void main (String args[]) throws IOException 	{				final int CHARS_TO_WRITE 		= 40000;				final int NUMBER_OF_ITERATIONS	= 10000;		char[] chars = new char[CHARS_TO_WRITE];		for(int i = 0; i < CHARS_TO_WRITE; ++i)			chars[i] = 'c';				long base, time;				// Start timing the code		base = System.currentTimeMillis();				for(int j = 0; j < NUMBER_OF_ITERATIONS; ++j)		{			File hFile = new File("text.txt");			FileWriter stream = new FileWriter(hFile);						stream.write(chars);					stream.close();		}				// Stop timing code		time = System.currentTimeMillis();		float seconds = (time - base) * 0.001f;		System.out.println("Seconds elapsed: " + seconds);		    }}    


So it's slower than STL file io.

Edited by - JD on July 10, 2001 1:25:05 AM
Advertisement
> Now for the kicker, if using either iterators or vector.size() to iterate the vector then it's faster than "C" array iteration.

And why is that surprising? You are NOT doing the same test, they can't be compared!

Replace your Interate C array by:

  timer.Start();int *pints=ints,*end=ints+COUNT;for(;pints<end;pints++)    val=*pints;timer.Stop();cout << "iterate C array:\t" << timer.GetTimeStr() << " seconds" << endl;  


And you'll probably see no difference with STL.

Y.


Edited by - Ysaneya on July 11, 2001 4:48:31 AM
I think there''s a bug in your code...when ever it says "press any key to continue" only the down arrow or the page down key works, maybe its just my HTML interepter (Netscape 4.5 on RH Linux 6.2 with KDE) who knows. The frame rate''s not too bad though, keep up the good work.

Hope this helps in the development of you article
Yeah that "Press blah..." is direct copy from console, too lazy to edit.

I''ll be dropping STL in exchange of my own code. I can beat vector insertions when the reserved size is the same for both my and stl code. Push_back is simply too slow even with reserved space. It''s even slower when you don''t reserve the space in the vector. There''s also some funky stuff going on in stl. If I reserve space in vector then use iterator to assign values and then want to print vector elements I think I have to use .capacity() or .size() doesn''t do anything. Basically nothing works until I use .push_back().

Here''s my own dynamically allocated and resized array code that beats pants of STL if you reserve all space for items in the array. If you reserve on 1/10 of what you reserve for STL then the the my code and stl code is even in speed. Try it yourself

  namespace STL_MY_ARRAY_TEST{class Vector{public:	Vector(unsigned int val = 100);	~Vector() {if(!m_pArray)delete [] m_pArray;}	void			PushBack(unsigned int uiValue);	void			Insert(unsigned int uiIndex, int iValue);		void			ShowElements();	void			Reserve(unsigned int uiAmountToReserve);	unsigned int	Size();	unsigned int	ReservedSize();	int	*m_pArray;protected:	// Number of elements (not bytes) in the array	unsigned int m_uiActualArraySize;	unsigned int m_uiNumOfElements;	const unsigned int NUMBER_OF_ELEMENTS_TO_ALLOCATE;};Vector::Vector(unsigned int val /*= 100*/):NUMBER_OF_ELEMENTS_TO_ALLOCATE(val){	m_pArray			= NULL;	m_uiActualArraySize = 0;	m_uiNumOfElements	= 0;}void Vector::Reserve(unsigned int uiAmountToReserve){	m_uiActualArraySize += uiAmountToReserve;	int *pInts = new int[m_uiActualArraySize];	unsigned int i;	for(i = 0; i < m_uiNumOfElements; ++i)		pInts = m_pArray;<br><br>	delete [] m_pArray;<br>	m_pArray = pInts;<br>}<br><br>void Vector::PushBack(unsigned int uiValue)<br>{<br>	if(m_uiActualArraySize == 0)<br>	{		<br>		m_pArray = new int[NUMBER_OF_ELEMENTS_TO_ALLOCATE];<br>		m_pArray[0] = uiValue;<br>		++m_uiNumOfElements;<br>		m_uiActualArraySize = NUMBER_OF_ELEMENTS_TO_ALLOCATE;<br>	}<br>	else<br>	{<br>		if(m_uiActualArraySize - m_uiNumOfElements &gt; 0)<br>		{<br>			// Array has enough space for new elements<br>			m_pArray[m_uiNumOfElements] = uiValue;<br>			++m_uiNumOfElements;<br>		}<br>		else<br>		{<br>			if(m_uiActualArraySize - m_uiNumOfElements == 0)<br>			{<br>				// Array is full so make a new one with more space<br>				m_uiActualArraySize += NUMBER_OF_ELEMENTS_TO_ALLOCATE;<br><br>				int *pNewArray = new int[m_uiActualArraySize];<br><br>				// Copy old elements into new array and append new value <br>				unsigned int i;<br>				for(i = 0; i &lt; m_uiNumOfElements; ++i)<br>					pNewArray = m_pArray;<br><br>				pNewArray = uiValue;<br><br>				delete [] m_pArray;<br>				m_pArray = pNewArray;<br>				++m_uiNumOfElements;				<br>			}<br>		}<br>	}<br>}<br><br>unsigned int Vector::Size()<br>{<br>	return m_uiNumOfElements;<br>}<br><br>unsigned int Vector::ReservedSize()<br>{<br>	return m_uiActualArraySize;<br>}<br><br>void Vector::ShowElements()<br>{<br>	unsigned int i;<br><br>	for(i = 0; i &lt; m_uiNumOfElements; ++i)<br>		cout &lt;&lt; m_pArray &lt;&lt; endl;<br>}<br><br>void Vector::Insert(unsigned int uiIndex, int iValue)<br>{<br>	<br>}<br>	<br>}<br><br>/********************<br>//		MAIN<br>********************/</font><br><font color="blue">int</font> main(<font color="blue">int</font> argc, char* argv[])<br>{<br>	<font color="blue">using</font> STL_MY_ARRAY_TEST::Vector;<br><br>	Vector v;<br>	v.Reserve(300);<br><br>	<font color="blue">for</font>(<font color="blue">int</font> i = 0; i &lt; 400; ++i)<br>		v.PushBack(i);<br>	<br>	cout &lt;&lt; <font color="darkred">"size: "</font> &lt;&lt; v.Size() &lt;&lt; endl;<br>	cout &lt;&lt; <font color="darkred">"reserved size: "</font> &lt;&lt; v.ReservedSize() &lt;&lt; endl;<br><br><font color="blue">return</font> 0;<br>}<br>  </pre></font></td></tr></table></center><!–ENDSCRIPT–>     
Ysaneya,

I'm filling the array with ints. Maybe I didn't understand your code can you get back to me?

The following should be the same(then again I don't know if the assembler differs them

  int *pInts = new int[10];for(int i = 0; i < 10; ++i){pInts[i] = i;}for(int i = 0; i < 10; ++i){*(pInts + i) = i;}  


I just don't have the time to figure out the pecularities of STL and test each and every single function and find out how to drive the STL optimally. If I don't care about speed I can probably see a use for STL but not in games where every cycle could be used to put more trees or stuff into the game. I'm using small set of STL so rewrite of this code is not that bad.

My observation of STL tells me that using .reserve() and not using .push_back() but instead using *iterator = value or vector = i is faster. Still you can out code STL if you tried. I only hope I screwed up somewhere in my thinking and coding so that STL in reality is faster than my code, so if you know a way could you provide a hint please

Basically for me it goes like this.

From fastest to slowest code:

1)Use operating system api functions (file writing is really fast, possibly other stuff too)
2)Use C library
3)Use C++ STL possibly third party libs
4)Use assembly in parts(maybe a first choice but is time consuming, using this only when I absolutely must)

Hope I don't have to put on my asbestos gear. Any thoughts? Are you timing your code? What's your observation of STL? Any websites out there provide timing results of vc++ stl vs. hand rolled code? Thanks.

Edited by - JD on July 10, 2001 12:29:29 AM
Advertisement
I did one more test to make sure I wasn''t halucinating when I say STL is slower than my own code. I reserved both arrays to max and filled them with ints. My code is 4 times faster than STL code.

  	using STL_MY_ARRAY_TEST::Vector;		wul_CTimer timer;	const int NUM_ITERATIONS = 10000;	int i;	float t1, t2;		//////////	timer.Start();	Vector a;	a.Reserve(NUM_ITERATIONS);	for(i = 0; i < NUM_ITERATIONS; ++i)			a.PushBack(i);	timer.Stop();	t1 = timer.GetTimeNum();	//////////	//////////	timer.Start();	vector<int> b;	b.reserve(NUM_ITERATIONS);	for(i = 0; i < NUM_ITERATIONS; ++i)			b.push_back(i);				timer.Stop();	t2 = timer.GetTimeNum();	/////////	cout << "My array:\t" << t1 << " seconds" << endl;	cout << "STL vector:\t" << t2 << " seconds" << endl;	cout << endl;//	cout << "STL is " << t1 / t2 << " times faster than my array" << endl;	cout << "My array is " << t2 / t1 << " times faster than STL vector" << endl;	cout << endl;	//	for(int h = 0; h < b.size(); ++h)	//		cout << b[h] << " ";	cout << "STL vector size:\t" << b.size() << endl;	cout << "STL vector capacity:\t" << b.capacity() << endl;	return 0;#endif  


I had to use push_back() on stl vector any other way to fill the vector would yield zero vector''s size and I couldn''t write the ints to the console. Using b[h] = value resulted in b.size() == zero but the elements did wrote themselves to the console when I used h < max instead of h < b.size(). Capacity in both cases was max elements. Looks like dinkum''s stl is slow. I may try the SGI stl library but I hear stl is not very cross platform compliant so I''ll probably stick with writing my own containers.
JD: sorry, i''ve been confusing the tags with another site, check out the code in my previous reply now.

Basically, in your iterate C array, you are doing 2 additions in your loop (one for the array, one to increment the index). In your interate STL array, you are doing only one (incrementing a pointer). So no wonder why it''s faster in the second test!

Y.
I understand now. Ysaneya, thank you for pointing that out.
quote:
Original post by JD
I just don't have the time to figure out the pecularities of STL and test each and every single function and find out how to drive the STL optimally. If I don't care about speed I can probably see a use for STL but not in games where every cycle could be used to put more trees or stuff into the game. I'm using small set of STL so rewrite of this code is not that bad.

You see, I find this to be an, er, interesting view to say the least. It almost always takes longer to recode something yourself than to work out how to use an existing tool properly, providing you have good references on hand. A lot of people here on GameDev can point out the optimal way to use something, and a little study of the SGI STL documentation helps too. Also, perhaps buying a book? (I hear the Josuttis one is good.) You're right that saving cycles is important, but evaluating code in isolation is rarely too useful. What's more important is to profile the end result, because if it turns out that STL vector routines constitute less than 1% of your runtime CPU usage, then you've wasted your time rewriting it when you could have been optimising other code.

As for STL not being useful in games... the idea of pretty much any library is to get the job done quickly and reliably. Then, you optimise once you know what needs optimising. 99% of the time, taking twice as long for an array iteration is not going to be that big of a deal. Besides, most people would use the operator[] on vectors rather than iterators, which will probably optimise better, since many STL vectors are implemented as C arrays anyway! In which case, you've lost nothing in terms of iterator speed, and gained in terms of flexibility (functions to check the number of elements, automatic reallocation, etc).

Going back to what I said about reliability: you seem to require people to iterate through your array by using that public member pointer, yes? That doesn't seem like a too robust solution if other programmers had to use it. That pointer could get changed in subtle ways with just a simple typing error in the code. And on a second point, look at your destructor:
if(!m_pArray)delete [] m_pArray;}  

Ooh, look: only delete the array if m_pArray is NULL. I guess that means it'll never get deleted. Nice memory leak, and I bet you wouldn't have found it for quite a while, since everything 'works'. This is another reason why people use the STL: because it's been extensively tested already....

Last points: your tests are not scientifically valid. Apart from potential issues such as too many additions or whatever Ysaneya was mentioning, there are other problems. Your tests suffer from what are called 'order effects'. Such things affect all scientific studies, from psychology to computing. Basically, performing a set of tests in a given order can have an effect on the results. For instance, in your filing test, by the time the Win32 API test comes around, the Win32 API functions have already been called numerous times (by stdio and iostreams) and are likely to be in the CPU cache, and hence the results are likely to be better than if you did them in a different order. Or did them in separate programs. Additionally: did you use any form of optimisation for these tests? The STL contains a lot of code that gets optimised away in the final version of the executable, which you might not have been taking into account. And which version of the compiler did you use? And did you patch the iostream bugs with the fixes on the Dinkumware site? Without revealing all these factors, your tests are almost meaningless. Sorry
quote:
I had to use push_back() on stl vector any other way to fill the vector would yield zero vector's size

This is because you slightly misunderstand what 'reserve' does. It doesn't say "Ok, this vector is now X in size" . It just says "Ok, now we have enough memory allocated so that the vector can grow to X in size before we need to allocate any more memory." See the difference? It means that you can't just write to the vector after you've reserved it. Reserve() is just a request for memory, not a resizing of the vector. Maybe you should try with the 'vector(size_type n, const T& t)' constructor, eg: vector<int> intss(COUNT, 0); which will give you a vector with 'COUNT' as the size right from the start. (Note that the 2nd parameter to the constructor is the value to initialise all the items to, and is optional. If omitted, they are left uninitialised.)

Anyway, I hope I've cleared up a few things there. Good luck with whichever method you choose, but don't write STL off just yet.

Edited by - Kylotan on July 11, 2001 8:51:24 AM

This topic is closed to new replies.

Advertisement