boost::asio causing seemingly random crashes

Started by May 21, 2017 09:13 PM
15 comments, last by JackOfCandles 7 years, 5 months ago

I'm using boost::asio for UDP communication and I'm experiencing what appear to be random crashes, which I don't even know how to begin to solve. I've been trying to figure this out for 3 weeks now and I'm about at my wit's end.

Some information I've discovered while working on this:

  • It happens about 1 in 3 times I run it.
  • When I add console output using std::cout that is written once every frame, it never happens. I have no idea why this would affect it.
  • It only happens when I run the exe directly. When I run it in Visual Studio, in either debug or release mode, it's fine.
  • It only happens when I'm running the client and the server within the same process. If I run one machine as a dedicated server and another as a client it does not happen.
  • It only seems to be a problem when it's in my actual game engine. I created a bare bones process that does nothing but simulates the UDP communication that the game would do, and that works fine.
  • I've tried having the client and server IO done in two separate threads with two separate io_service objects, as well as in the same thread with a single io_service. There was no difference either way.

The crash happens during the initiation/synchronization step. The protocol is as follows:

  1. The client sends a connection initiation packet to the server.
  2. Upon receiving this, the server sends to the client a few packets containing a list of UUID and integer ID pairs.
  3. The client receives each of these packets, and builds its own map of the server generated IDs to the client generated IDs.
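
Step 3 can be sketched roughly as follows. This is a minimal sketch, not the actual engine code: it assumes each entry is a 16-byte UUID followed by a 4-byte integer in host byte order, and the names `IdEntry` and `parseIdList` are hypothetical. It only reads complete entries, so a truncated packet body is never read past its end.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed wire layout (hypothetical): 16 raw UUID bytes, then a 32-bit ID.
constexpr std::size_t kUuidSize = 16;
constexpr std::size_t kEntrySize = kUuidSize + sizeof(std::int32_t);

struct IdEntry {
    std::array<std::uint8_t, kUuidSize> uuid;
    std::int32_t serverId;
};

// Parses as many complete entries as fit in the body.
// Trailing bytes that do not form a whole entry are ignored.
std::vector<IdEntry> parseIdList(const char* body, std::size_t bodyLength) {
    std::vector<IdEntry> entries;
    for (std::size_t offset = 0; offset + kEntrySize <= bodyLength; offset += kEntrySize) {
        IdEntry entry{};
        std::memcpy(entry.uuid.data(), body + offset, kUuidSize);
        // memcpy avoids unaligned reads; host byte order is assumed here.
        std::memcpy(&entry.serverId, body + offset + kUuidSize, sizeof(entry.serverId));
        entries.push_back(entry);
    }
    return entries;
}
```

The loop condition `offset + kEntrySize <= bodyLength` is the important part: a body whose length is not a multiple of the entry size yields fewer entries rather than an out-of-bounds read.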

Here's a screenshot of the console output, which shows that it is crashing in the middle of writing a line in the client's handleReceive function. I should note that it doesn't always crash at this very moment; sometimes it writes out all console lines fully.

Lrb69mQ.png

Here is the code of the handleReceive function in the client IO thread:


void UdpClient::handleReceive(const boost::system::error_code& error, std::size_t bytesReceived)
{
    if ((!error || error == error::message_size) && bytesReceived > 0 && udpPacket_.decodeHeader())
    {
        UdpPacketType packetType = udpPacket_.getPacketType();
        int bodyLength = udpPacket_.getBodyLength();
        char* body = udpPacket_.getBody();

        switch (packetType)
        {
            case UDP_PACKET_SERVER_ID_LIST: // Read a batch of IDs from the server, and map them to client IDs.
            {
                DebugHelper::streamLock->lock();
                std::cout << "[" << boost::this_thread::get_id() << "] Synching IDs on client. " << std::endl;
                DebugHelper::streamLock->unlock();

                int uuidSize = boost::uuids::uuid::static_size();

                // Combined size of a UUID and an integer ID.
                int combinedSize = sizeof(int) + uuidSize;

                for (int i = 0; i < bodyLength; i += combinedSize)
                {
                    // Read the UUID from the packet body.
                    boost::uuids::uuid uuid = networkUtility_.buildUuid(body, i);

                    // Read the server generated integer ID from the packet body.
                    int serverId = networkUtility_.buildInteger(body, i + uuidSize);

                    // Get the locally generated integer ID.
                    int localId = BaseIds::getIntegerFromUuid(uuid);

                    clientLayer_->mapLocalIdToServerId(localId, serverId);
                }

                DebugHelper::streamLock->lock();
                std::cout << "[" << boost::this_thread::get_id() << "] ID batch processed on client. " << std::endl;
                DebugHelper::streamLock->unlock();
                break;
            }

I don't know if maybe there are some tricks to working with threads, or UDP/IP that might be helpful? I'm just completely stuck at this point and I don't know where to go from here. Any ideas are appreciated!

From the symptoms, it sounds like the difference comes down to timing. Adding printing changes timing, as do most of the other things you mention.
You haven't shown us the stack trace of the crash, nor the declaration/use of the variables involved in the crash.

You can also log to a file rather than the console, to make sure that you get all the output before the crash.
Or log to OutputDebugString().
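
A minimal sketch of the file-logging idea, assuming a hypothetical `CrashLog` helper (not something from the poster's engine). Each line is flushed as soon as it is written, so it has been handed to the operating system before the next statement runs, even if that statement crashes.

```cpp
#include <fstream>
#include <mutex>
#include <string>

// Hypothetical helper: appends one line to a log file and flushes immediately.
class CrashLog {
public:
    explicit CrashLog(const std::string& path) : out_(path, std::ios::app) {}

    void write(const std::string& line) {
        std::lock_guard<std::mutex> guard(mutex_);  // serialize writers from different threads
        out_ << line << '\n';
        out_.flush();  // hand the buffered data to the OS before returning
    }

private:
    std::mutex mutex_;
    std::ofstream out_;
};
```

Keep in mind that flushing on every line is slow and will itself shift timing, so the crash may become more or less frequent while this is enabled.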

Separately, and probably not related, but I thought I should mention it: the "streamLock" you're using is locked manually, not through an RAII "locker" object that automatically unlocks when leaving scope. Manual locking is a very bad pattern, and you should switch to a lock holder held in a local variable.
enum Bool { True, False, FileNotFound };

Is there a way to get the stack trace when running the exe directly? The crash never happens when running inside Visual Studio.

And thanks for the tip about locking, I will change that.

Once it crashes, you get a dialog box.
You can use that dialog box to attach with Visual Studio.

You can try a static code analyser; it might find edge cases that explain why your code crashes.

Sorry for the long delay in getting back to you. I had a technical issue that prevented me from continuing, followed by a family emergency that I had to leave town for. Just a string of bad luck! But now everything is back to normal and I can continue working on this. I was able to attach the VS debugger to the faulting process as you suggested, and based on the call stack it looks like it is crashing when rendering.

vIIOKXt.png

I think atioglxx.dll is related to my video card driver. As a test, I tried running it on a machine with a different video card, and sure enough it worked fine. I did try updating the driver, but there didn't seem to be any change. That DLL is still the same version I originally had.

I really have no idea how the multi-threaded UDP I/O would in any way be causing a fault in the graphics driver. Maybe this isn't as big of a deal as I thought, as my card is pretty old (Radeon HD 5970 from 2009), but it feels really unsatisfying to just leave it hanging, knowing that this could be happening for who knows what other video cards.

What if you record the data you receive on UDP, and write it to a file, then run your program in a mode that plays back the file without using a socket?
That way, you should be able to reproduce the crash without using the networking code.
If you can do that, then it's pretty clear that it's an ATI OpenGL driver issue. Those are not particularly uncommon.
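
A minimal sketch of the record/playback idea. The capture format here is an assumption (a 32-bit length in host byte order followed by the payload), and `recordPacket` and `loadPackets` are hypothetical names, not part of boost::asio.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Appends one received datagram to the capture file as [length][payload].
void recordPacket(std::ofstream& out, const std::vector<char>& packet) {
    const std::uint32_t length = static_cast<std::uint32_t>(packet.size());
    out.write(reinterpret_cast<const char*>(&length), sizeof(length));
    out.write(packet.data(), static_cast<std::streamsize>(packet.size()));
    out.flush();  // keep the capture usable even if the process crashes later
}

// Reads every complete record back, in the order it was captured.
std::vector<std::vector<char>> loadPackets(const std::string& path) {
    std::vector<std::vector<char>> packets;
    std::ifstream in(path, std::ios::binary);
    std::uint32_t length = 0;
    while (in.read(reinterpret_cast<char*>(&length), sizeof(length))) {
        std::vector<char> packet(length);
        if (!in.read(packet.data(), static_cast<std::streamsize>(length))) {
            break;  // truncated final record, e.g. the process died mid-write
        }
        packets.push_back(std::move(packet));
    }
    return packets;
}
```

In playback mode, each loaded packet would be fed to the same code path that handleReceive normally drives, with no socket or IO thread involved.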

Well, I'm not sure this would make a difference, because at one point not only did I try sending no data (other than the packet header), I also tried removing the body of the handleReceive function as well. And now it just got more puzzling. I removed the call to the render function just to see what happens, and while this did decrease the frequency of crashes, it does still crash fairly often. Each time, I attached the debugger to check where it was crashing and commented out the offending code, only for it to crash again somewhere else. It's turned into a sort of whack-a-mole game.

Here are a couple screenshots of the other crashes and their call stacks. At this point, I'm not convinced it is an OpenGL driver issue. One of the crashes happened in the fmod dll, and one in my own code with the line in question being a simple boolean conditional. It seems unlikely that there would be a bug in all of these independently. It almost feels like something is failing in the network IO thread, but it's breaking on whatever instruction it is executing in the main game loop thread.

MatQ2fe.png

c9ZHUQ1.png

Sounds like you have a "memory smasher" bug, such as a write-after-delete.

You might want to try enabling different memory debugging tools, depending on which version of Visual Studio you have.
You might want to look at a hex dump of the memory before and after the data that's been corrupted.
You can look at the disassembly, figure out which registers point at bad memory, see where those values come from, and then hex dump that memory to look for patterns.

Another option, if this crash is reproducible, is to use a memory write breakpoint to figure out what's writing the thing that's crashing.
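
If the debugging tools aren't available, one cheap way to narrow down a smasher is to bracket a suspect buffer with known guard values and check them regularly, for example once per frame. The sketch below is only an illustration: `GuardedBuffer` is a hypothetical name, and it only catches writes that run directly off either end of that one buffer, not a write-after-delete elsewhere.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical debugging aid: a fixed-size buffer bracketed by guard values.
// If code writes past either end of `data`, a guard changes and intact() reports it.
// Choosing N as a multiple of 4 avoids padding bytes between `data` and `back`.
template <std::size_t N>
struct GuardedBuffer {
    static constexpr std::uint32_t kGuard = 0xDEADBEEFu;

    std::uint32_t front = kGuard;
    std::array<char, N> data{};
    std::uint32_t back = kGuard;

    // True while both guard values still hold their original pattern.
    bool intact() const { return front == kGuard && back == kGuard; }
};
```

Once a check fails, you know the corruption happened since the previous check, which narrows the window for a memory write breakpoint on the guard itself.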

Ooh that sounds bad. Currently I'm using the free Visual Studio Express 2015, so I'm not sure if those tools are available, but maybe I'll have to bite the bullet and buy the real thing. Hopefully I can still buy 2015, because I tried 2017 originally but had to downgrade because boost wasn't compatible with it when trying to build the boost::python DLLs. Ughh... I'll see what I can find though!

This topic is closed to new replies.
