Linux X64 Crash

Started by July 21, 2016 03:34 PM
16 comments, last by Miss 8 years, 3 months ago
So in our project, when calling a registered method GetVector2 from script on a Linux x64 build (gcc-5, AS 2.31.1), I'm getting a segfault.

I'm no expert in x64, so my assessment here may be wrong, but here's my analysis. Take a look at this screenshot:

[Screenshot: 20160721GVS51AFU.png]

(This is debugged remotely on an Ubuntu 16.04 machine via WinGDB in VS 2015.) The parameters of this function are incorrect (e.g. ctx is actually pointing to 0x2000b520 instead of the object that the method was called with). I have marked the parameters in different colors, matching them with the memory view. The memory view shows the "tempBuffer" memory passed to X64_CallFunction as the "args" param.

You can see that everything is ordered correctly, but there's an additional pointer at the start, which is a pointer to a glm::vec2 structure. (The contents happen to be the same as the "def" param, but I'm guessing that's a coincidence due to uninitialized memory?)

Looking through the code, I can see that "retPointer" is being set as the first pointer, so 0x2000b520 must be the address where the return value is supposed to be written:
case ICC_CDECL_OBJFIRST_RETURNINMEM: 
{
	paramBuffer[0] = (asPWORD)retPointer;
	paramBuffer[1] = (asPWORD)obj;
So as a result, the parameters in the GetVector2 function above are invalid.
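To make the suspected mismatch concrete, here is a toy model of it. This is not AngelScript's actual code: `build_param_buffer`, `first_param_seen_by_callee`, and the object address used in the usage note are hypothetical, and only 0x2000b520 comes from the debugger session above. It assumes the caller prepends a hidden return pointer while the callee was compiled to return its value in registers, so the callee reads every slot shifted by one.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Word = std::uintptr_t;  // one pointer-sized argument slot

// Toy model (hypothetical) of a caller filling pointer-sized argument slots.
// When returnInMemory is true, the hidden return pointer goes first,
// followed by the object pointer and then the declared arguments.
std::vector<Word> build_param_buffer(bool returnInMemory, Word retPointer,
                                     Word obj, const std::vector<Word>& args) {
    std::vector<Word> buf;
    if (returnInMemory) {
        buf.push_back(retPointer);
    }
    buf.push_back(obj);
    buf.insert(buf.end(), args.begin(), args.end());
    return buf;
}

// A callee that does NOT expect a hidden return pointer simply treats
// slot 0 as its first declared parameter (here: the object pointer).
Word first_param_seen_by_callee(const std::vector<Word>& buf) {
    return buf.at(0);
}
```

With `returnInMemory` set, a call such as `build_param_buffer(true, 0x2000b520, 0x1000, {1, 2})` puts the return pointer in slot 0, which is what "ctx" showed in the debugger. Without it, slot 0 holds the object pointer as the callee expects.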

For the record, the method in the screenshot is registered like this:
r = m_engine->RegisterObjectMethod("WidgetLoadingContext", "vec2 GetVector2_extra(string name, bool required, vec2 def, int stuff)", asFUNCTION(GetVector2_extra), asCALL_CDECL_OBJFIRST);
I marked this as a bug, and I'm guessing it is a bug, but I'm not entirely sure, as I built AngelScript with a custom premake4 (actually GENie) script. (It works fine on Windows x86, though.)

Any tips would be appreciated! :)

The X64 GCC calling instruction seems to be the only one that sets the return address to the first argument. The objlast version also does this. Could you try using that calling convention and then checking if it still happens? It probably will, but if not that eliminates the instruction handling code as the cause of the problem.

Also, can you get the assembly code for that function? It should show if it handles the return address or not.

Setting the calling convention to asCALL_CDECL_OBJLAST did not fix the problem.

Here's the first couple instructions for GetVector2_extra:


Dump of assembler code for function GetVector2_extra(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, glm::tvec2<float, (glm::precision)0>, int, WidgetLoadingContext*):
   0x00000000006a2f20 <+0>:	push   rbx
   0x00000000006a2f21 <+1>:	sub    rsp,0x90
   0x00000000006a2f28 <+8>:	mov    QWORD PTR [rsp+0x28],rdi
   0x00000000006a2f2d <+13>:	mov    eax,esi
   0x00000000006a2f2f <+15>:	movq   QWORD PTR [rsp+0x10],xmm0
   0x00000000006a2f35 <+21>:	mov    DWORD PTR [rsp+0x20],edx
   0x00000000006a2f39 <+25>:	mov    QWORD PTR [rsp+0x18],rcx
   0x00000000006a2f3e <+30>:	mov    BYTE PTR [rsp+0x24],al
   0x00000000006a2f42 <+34>:	mov    rax,QWORD PTR fs:0x28
   0x00000000006a2f4b <+43>:	mov    QWORD PTR [rsp+0x88],rax
   0x00000000006a2f53 <+51>:	xor    eax,eax
=> 0x00000000006a2f55 <+53>:	mov    rdx,QWORD PTR [rsp+0x28]
   0x00000000006a2f5a <+58>:	lea    rax,[rsp+0x60]
   0x00000000006a2f5f <+63>:	mov    rsi,rdx
   0x00000000006a2f62 <+66>:	mov    rdi,rax
   0x00000000006a2f65 <+69>:	call   0x409f70 <_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC1ERKS4_@plt>
   0x00000000006a2f6a <+74>:	lea    rdx,[rsp+0x60]
   0x00000000006a2f6f <+79>:	mov    rax,QWORD PTR [rsp+0x18]
   0x00000000006a2f74 <+84>:	mov    rsi,rdx
   0x00000000006a2f77 <+87>:	mov    rdi,rax
   0x00000000006a2f7a <+90>:	call   0x6a2989 <GetAttribute(WidgetLoadingContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>

I'm guessing the bug is in the X64_CallFunction assembly then?

I should also note that whenever X64_CallFunction calls a function (any function, including those that work fine), gdb reports this when I try to get the backtrace: "Backtrace stopped: previous frame inner to this frame (corrupt stack?)". I've ignored this so far, because it didn't seem noteworthy as long as the other functions work fine.

Seems like it, yes. The assembly code you provided only covers up until the call to GetAttribute. It would be nice to have the instructions that cover the return statement so we can see how it's returning the value. Can you get any more of it?

That message indicates that the stack frame was partially overwritten. This means it's probably overwriting part of the stack every time and only crashes here because the pointers are offset. I'm guessing GCC's calling conventions have changed and aren't accounted for. Which compiler version are you using?

I'm using gcc 5.4.0

Here's the last bunch of instructions of GetVector2_extra:


   0x00000000006a300b <+235>:	call   0x409f70 <_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC1ERKS4_@plt>
   0x00000000006a3010 <+240>:	mov    rcx,QWORD PTR [rsp+0x10]
   0x00000000006a3015 <+245>:	lea    rdx,[rsp+0x60]
   0x00000000006a301a <+250>:	mov    rax,QWORD PTR [rsp+0x18]
   0x00000000006a301f <+255>:	mov    QWORD PTR [rsp+0x8],rcx
   0x00000000006a3024 <+260>:	movq   xmm0,QWORD PTR [rsp+0x8]
   0x00000000006a302a <+266>:	mov    ecx,ebx
   0x00000000006a302c <+268>:	mov    esi,0x8becd7
   0x00000000006a3031 <+273>:	mov    rdi,rax
   0x00000000006a3034 <+276>:	call   0x6a503a <ReturnDefault<glm::tvec2<float, (glm::precision)0> >(WidgetLoadingContext*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, glm::tvec2<float, (glm::precision)0>)>
   0x00000000006a3039 <+281>:	movq   rbx,xmm0
   0x00000000006a303e <+286>:	lea    rax,[rsp+0x60]
   0x00000000006a3043 <+291>:	mov    rdi,rax
   0x00000000006a3046 <+294>:	call   0x40acb0 <_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEED1Ev@plt>
   0x00000000006a304b <+299>:	mov    rax,rbx
   0x00000000006a304e <+302>:	mov    QWORD PTR [rsp+0x8],rax
   0x00000000006a3053 <+307>:	movq   xmm0,QWORD PTR [rsp+0x8]
   0x00000000006a3059 <+313>:	mov    rax,QWORD PTR [rsp+0x88]
   0x00000000006a3061 <+321>:	xor    rax,QWORD PTR fs:0x28
   0x00000000006a306a <+330>:	je     0x6a308e <GetVector2_extra(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, glm::tvec2<float, (glm::precision)0>, int, WidgetLoadingContext*)+366>
   0x00000000006a306c <+332>:	jmp    0x6a3089 <GetVector2_extra(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, glm::tvec2<float, (glm::precision)0>, int, WidgetLoadingContext*)+361>
   0x00000000006a306e <+334>:	mov    rbx,rax
   0x00000000006a3071 <+337>:	lea    rax,[rsp+0x60]
   0x00000000006a3076 <+342>:	mov    rdi,rax
   0x00000000006a3079 <+345>:	call   0x40acb0 <_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEED1Ev@plt>
   0x00000000006a307e <+350>:	mov    rax,rbx
   0x00000000006a3081 <+353>:	mov    rdi,rax
   0x00000000006a3084 <+356>:	call   0x40aa20 <_Unwind_Resume@plt>
   0x00000000006a3089 <+361>:	call   0x40a270 <__stack_chk_fail@plt>
   0x00000000006a308e <+366>:	add    rsp,0x90
   0x00000000006a3095 <+373>:	pop    rbx
   0x00000000006a3096 <+374>:	ret

You can't see it in the screenshot, but the last line of the function is "return ReturnDefault<glm::vec2>(ctx, "vec2", name, req, def);"

OK, based on what I've found, on Linux systems RAX contains the return value (https://en.wikipedia.org/wiki/X86_calling_conventions#System_V_AMD64_ABI).

Return value optimization would normally pass the return address as a hidden argument, but I think that in this particular case it doesn't apply.

You're returning a glm::vec2, which is composed of two floats. On your system this is probably an 8-byte structure which can fit in RAX, so it returns by value.

I suppose the AngelScript x64 GCC calling instruction code will need to account for this.

I'm no compiler expert, so take this with a grain of salt, but that's what I think it does. You can try returning a larger data structure to check if this is the case. Even a glm::vec3 should use RVO here since it won't fit in RAX, but you may have to increase it even more. Perhaps try using double-precision floats to force it to use RVO, since that will double the structure sizes.
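As a rough way to reason about this, the sketch below approximates the System V AMD64 rule as I understand it: a trivially copyable aggregate of at most 16 bytes is returned in registers (all-float members go in XMM0/XMM1 rather than RAX, which matches the `movq xmm0` before the `ret` in the disassembly above), while larger or non-trivially-copyable types are returned through a hidden pointer. `Vec2`, `Vec3`, `DVec3`, and `likely_returned_in_registers` are hypothetical stand-ins; the real glm types may be classified differently depending on how their constructors are declared.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical plain-aggregate stand-ins for glm::vec2, glm::vec3, glm::dvec3.
struct Vec2  { float x, y; };      // 8 bytes
struct Vec3  { float x, y, z; };   // 12 bytes
struct DVec3 { double x, y, z; };  // 24 bytes

// Approximation only: the real ABI classifies each 8-byte chunk of the type
// and has additional rules. This captures just the two conditions discussed
// in the thread: trivial copyability and a size of at most 16 bytes.
template <typename T>
constexpr bool likely_returned_in_registers() {
    return std::is_trivially_copyable<T>::value && sizeof(T) <= 16;
}

static_assert(likely_returned_in_registers<Vec2>(),
              "two floats fit in a single 8-byte chunk");
static_assert(!likely_returned_in_registers<DVec3>(),
              "three doubles exceed 16 bytes");
```

Under this approximation a three-float vector (12 bytes) would still come back in registers, so switching from vec2 to vec3 would not force a hidden return pointer, whereas three doubles or a `std::string` would.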

Good idea, however calling the same function with vec2 replaced with vec3 as return value and param still causes the invalid arguments, so that can't be it. Also, in x86, structures like vec2 appear to be passed by value, but in x64 if you pass a structure like this to a function it appears to be passed with a pointer instead of by value. So I don't think RAX will ever contain the actual structure, but rather a pointer to the structure. (Just observations, I'm no expert :))

Maybe something else is going on here. Interestingly, when targeting x86, the function arguments are fine up until the "def" param (here still using vec3 as an example instead of vec2):


(gdb) i args
ctx = 0xffffc824
name = "anchor"
req = true
def = {{x = 9.41764226e-32, r = 9.41764226e-32, s = 9.41764226e-32}, {y = 4.17232506e-08, g = 4.17232506e-08, t = 4.17232506e-08}, {z = -nan(0x7fc1ec), 
    b = -nan(0x7fc1ec), p = -nan(0x7fc1ec)}}
stuff = 196785384
(gdb) p ctx->m_widgetHost->m_type->GetName()
$9 = 0xbc3e618 "IWidgetHost"
(gdb) p/x stuff
$10 = 0xbbab4e8
(gdb) p def.x
$11 = 9.41764226e-32

"stuff" is supposed to be 0x33333333, and def is supposed to be vec3(2.0, 8.0, 16.0). My first thought was that it might have something to do with the fact that glm::vec has unions, but that doesn't really make any sense because unions aren't actual data, they're just there for the programmer to use as aliases.
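The point about unions can be checked with a small sketch. `Vec3U` below is a hypothetical imitation of the x/r/s, y/g/t, z/b/p layout that gdb prints, not glm's actual definition. Assuming that layout, the struct occupies exactly three floats, so the unions on their own should not change how much data is passed.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical imitation of a vec3 whose components have aliases via
// anonymous unions (x/r/s, y/g/t, z/b/p), as shown in the gdb output.
struct Vec3U {
    union { float x, r, s; };
    union { float y, g, t; };
    union { float z, b, p; };
};

// Plain three-float struct for comparison.
struct Vec3P { float x, y, z; };

// The aliases add no storage: both types are three floats wide.
static_assert(sizeof(Vec3U) == sizeof(Vec3P), "unions add no extra storage");
static_assert(sizeof(Vec3U) == 3 * sizeof(float), "exactly three floats");

// Copies the raw bytes into a float array and returns component i (0..2),
// showing that the components sit back to back in memory.
float component_via_memcpy(const Vec3U& v, int i) {
    float raw[3];
    std::memcpy(raw, &v, sizeof raw);
    return raw[i];
}
```

For a value with x = 2, y = 8, z = 16, `component_via_memcpy(v, 1)` yields 8, the same as reading `v.y` directly.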

When observing other function calls, those arguments are also not always correct.

As a point of reference, here's the part of the premake4 (GENie) script I'm using to build AngelScript. (It works fine on 32-bit Windows; I haven't tested 64-bit Windows yet.)


		project "angelscript"
			kind "StaticLib"
			language "C++"
			files {
				ANGELSCRIPT_DIR .. "source/as_atomic.cpp",
				ANGELSCRIPT_DIR .. "source/as_builder.cpp",
				ANGELSCRIPT_DIR .. "source/as_bytecode.cpp",
				ANGELSCRIPT_DIR .. "source/as_callfunc.cpp",
				ANGELSCRIPT_DIR .. "source/as_callfunc_x86.cpp",
				ANGELSCRIPT_DIR .. "source/as_callfunc_x64_gcc.cpp",
				ANGELSCRIPT_DIR .. "source/as_callfunc_x64_msvc.cpp",
				ANGELSCRIPT_DIR .. "source/as_callfunc_x64_mingw.cpp",
				ANGELSCRIPT_DIR .. "source/as_compiler.cpp",
				ANGELSCRIPT_DIR .. "source/as_context.cpp",
				ANGELSCRIPT_DIR .. "source/as_configgroup.cpp",
				ANGELSCRIPT_DIR .. "source/as_datatype.cpp",
				ANGELSCRIPT_DIR .. "source/as_generic.cpp",
				ANGELSCRIPT_DIR .. "source/as_gc.cpp",
				ANGELSCRIPT_DIR .. "source/as_globalproperty.cpp",
				ANGELSCRIPT_DIR .. "source/as_memory.cpp",
				ANGELSCRIPT_DIR .. "source/as_module.cpp",
				ANGELSCRIPT_DIR .. "source/as_objecttype.cpp",
				ANGELSCRIPT_DIR .. "source/as_outputbuffer.cpp",
				ANGELSCRIPT_DIR .. "source/as_parser.cpp",
				ANGELSCRIPT_DIR .. "source/as_restore.cpp",
				ANGELSCRIPT_DIR .. "source/as_scriptcode.cpp",
				ANGELSCRIPT_DIR .. "source/as_scriptengine.cpp",
				ANGELSCRIPT_DIR .. "source/as_scriptfunction.cpp",
				ANGELSCRIPT_DIR .. "source/as_scriptnode.cpp",
				ANGELSCRIPT_DIR .. "source/as_scriptobject.cpp",
				ANGELSCRIPT_DIR .. "source/as_string.cpp",
				ANGELSCRIPT_DIR .. "source/as_string_util.cpp",
				ANGELSCRIPT_DIR .. "source/as_thread.cpp",
				ANGELSCRIPT_DIR .. "source/as_tokenizer.cpp",
				ANGELSCRIPT_DIR .. "source/as_typeinfo.cpp",
				ANGELSCRIPT_DIR .. "source/as_variablescope.cpp"
			}
			configuration "x64"
				if os.get() == "linux" then
					buildoptions_cpp { "-fPIC" }
				end

Aha, I might be on to something here:


(gdb) p **(glm::vec3**)&def
$10 = {{x = 2, r = 2, s = 2}, {y = 8, g = 8, t = 8}, {z = 16, b = 16, p = 16}}
(gdb) p/x *(int*)&def.y
$31 = 0x33333333

Seems like there's a difference in pointer indirection there. Perhaps try passing the string and vec2 by const ref?
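A minimal sketch of what the by-reference variant might look like on the C++ side. `Vec2`, `WidgetLoadingContextStub`, and `GetVector2_byref` are hypothetical stand-ins for the project's real types and function, and the script declaration in the comment is an assumption about how the registration string would change.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for glm::vec2 and WidgetLoadingContext.
struct Vec2 { float x, y; };
struct WidgetLoadingContextStub { bool hasAttribute; };

// The string and the vector are taken by const reference, so the caller
// passes plain pointers for both and no by-value copy is involved.
// Assumed matching script declaration (not taken from the thread):
//   "vec2 GetVector2_extra(const string &in name, bool required,
//                          const vec2 &in def, int stuff)"
Vec2 GetVector2_byref(WidgetLoadingContextStub* ctx, const std::string& name,
                      bool required, const Vec2& def, int stuff) {
    (void)name;
    (void)required;
    (void)stuff;
    if (ctx != nullptr && ctx->hasAttribute) {
        return Vec2{1.0f, 1.0f};  // placeholder for a parsed attribute value
    }
    return def;  // fall back to the supplied default
}
```

Passing references removes any question of how the caller should lay out by-value objects in the argument list, which is one plausible reason for the difference in indirection seen in gdb.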

Yeah that's exactly what I did and that works now on 32 bit. That's still an inconsistency between the Windows and Linux builds though. Bug?

Either way, the 64-bit issue is still there. So do you think the calling convention changed in newer versions of gcc 5 in a way that broke the existing x64 code?
