Advertisement

PeekMessage() stutters/inconsistent execution time (SDL/GLFW)

Started by August 12, 2023 01:25 PM
6 comments, last by frob 1 year, 5 months ago

So, i am/was investigating inconsistent frametimes/stuttering in my gameengine/renderloop and (after a bit of profiling) found out that the culprit lies within the polling of messages with SDL_PollEvent() (and SDL_PumpEvents())

OS is Windows 10 64bit

std::array< SDL_Event, 64> events;
Engine::Util::DurationTimer duration; //timestamp class
SDL_PumpEvents();
int total = 0;

{						
	int eventCount;
	do
	{
		eventCount = SDL_PeepEvents(&events[0], events.size(), SDL_GETEVENT, SDL_FIRSTEVENT, SDL_LASTEVENT);
		for (int i = 0; i < eventCount; ++i)
		{
		}
		total += eventCount;
	} while (eventCount == events.size());
}
Engine::Debug::println("TIME [ms]: "+ std::to_string(duration.getMilliseconds()) + " / Messages:" + std::to_string(total));

The console/timestamp logging prints this:

TIME [ms]: 0 / Messages:1
TIME [ms]: 0 / Messages:16
TIME [ms]: 0 / Messages:1
TIME [ms]: 0 / Messages:15
TIME [ms]: 0 / Messages:0
TIME [ms]: 0 / Messages:16
TIME [ms]: 0 / Messages:1
TIME [ms]: 0 / Messages:16
TIME [ms]: 0 / Messages:1
TIME [ms]: 6 / Messages:8
TIME [ms]: 14 / Messages:16  ///<<<?????
TIME [ms]: 3 / Messages:11
TIME [ms]: 0 / Messages:9
TIME [ms]: 0 / Messages:1
TIME [ms]: 2 / Messages:9
TIME [ms]: 0 / Messages:10
TIME [ms]: 1 / Messages:17
TIME [ms]: 0 / Messages:1
TIME [ms]: 1 / Messages:16
TIME [ms]: 0 / Messages:16
TIME [ms]: 0 / Messages:2
TIME [ms]: 0 / Messages:15
TIME [ms]: 0 / Messages:1
TIME [ms]: 0 / Messages:15
TIME [ms]: 0 / Messages:16
TIME [ms]: 0 / Messages:1
TIME [ms]: 0 / Messages:15
TIME [ms]: 0 / Messages:1
TIME [ms]: 0 / Messages:16
TIME [ms]: 0 / Messages:16
TIME [ms]: 0 / Messages:1
TIME [ms]: 0 / Messages:16
TIME [ms]: 0 / Messages:1
TIME [ms]: 0 / Messages:16
TIME [ms]: 0 / Messages:16
TIME [ms]: 0 / Messages:1
TIME [ms]: 0 / Messages:16

you can see that sometimes you get lagspikes in the double digit range (even sometimes in triple digit territory) which you can imagine is not optimal if you try to make sure that your gameloop (both logic and rendering) fits within the 16ms window (per frame) for a minimum 60 fps experience.

At first i thought maybe that SDL was the culprit so i tried recreating the same thing in GLFW (took the example almost 1:1 of their website)

#include <GLFW/glfw3.h>
#include "DurationTimer.h"
#include <iostream>

int main(void)
{
    GLFWwindow* window;

    /* Initialize the library */
    if (!glfwInit())
        return -1;

    /* Create a windowed mode window and its OpenGL context */
    window = glfwCreateWindow(640, 480, "Hello World", NULL, NULL);
    if (!window)
    {
        glfwTerminate();
        return -1;
    }

    /* Make the window's context current */
    glfwMakeContextCurrent(window);

    /* Loop until the user closes the window */
    while (!glfwWindowShouldClose(window))
    {

        /* Swap front and back buffers */
        glfwSwapBuffers(window);

        /* Poll for and process events */
        // Retrieve times:
        Engine::Util::DurationTimer t;
        glfwPollEvents();
        auto m = t.getMilliseconds();
        if (m > 0) {
            std::cout << m << std::endl;
        }
    }

    glfwTerminate();
    return 0;
}

The console output of the GLFW sample is this: (after simply moving my mouse around a bit and typing on a keyboard)

Times in milliseconds:

1
1
3
1
4
3
1
6
1
1
552 //<<<?????
2
1
1
1
1
1
1
1
1
1
3
1
1
1
1
13
1
1
1
1
2
1
1
1
1
1
1
1
1
1
2
12
4
1
1
1
1
1
1
2
8
1
2
1
1
1
6
1
1
1
1
2
1
1
1
1
2
2
1
1
1
1
22
1
1
1
1
13
2
4
2
2
3
1
1
1
2
27
2
1
13
6
1
1
2
23
3
2
1
1
1
1
8
7
1
1
1
9
1
1
1
1
1
1
1
1
1
7
10
2
2
2
1
1
1
8
1
1
1
1
12
1
1
1
1
1
1
1
1
9
1
1
1
1
1
1
1
1
9
6
1
1
1
1
16
1
28
3
1
1
9
1
1
1
1
1
1
14
2
1

So then i pulled the GLFW source code from github and did some digging.

Turns out that the spikes are being caused by “PeekMessageW” loop in the pollevents call: (win32_window.c file)

void _glfwPollEventsWin32(void)
{
    
    auto time = timePoint();
    MSG msg;
    HWND handle;
    _GLFWwindow* window;

	// >>> THIS LOOP IS THE CULPRIT
    while (PeekMessageW(&msg, NULL, 0, 0, PM_REMOVE))
    {
        if (msg.message == WM_QUIT)
        {
            // NOTE: While GLFW does not itself post WM_QUIT, other processes
            //       may post it to this one, for example Task Manager
            // HACK: Treat WM_QUIT as a close on all windows

            window = _glfw.windowListHead;
            while (window)
            {
                _glfwInputWindowCloseRequest(window);
                window = window->next;
            }
        }
        else
        {
            TranslateMessage(&msg);
            DispatchMessageW(&msg);
        }
    }
    ...

Did some research on google and this issue seems to crop up sometimes but i was not able to find any kind of solution to this problem.

Has anyone any idea as to what is going on and or has anyone any suggestion as to how this could be worked around?

Looking at my engine's code, I use GetMessage() instead of PeekMessage(), then call TranslateMessage() and then DispatchMessage(). I've never noticed any lag with that setup. From what I see the main difference between GetMessage() and PeekMessage() is that GetMessage() blocks the calling thread until a message is received. In my case, I run most of the engine in a secondary thread, and only use the win32 main event loop thread for event dispatching and creating windows, so any waiting on next message is acceptable because it doesn't affect frame rate of the engine thread.

I saw this thread here, which suggests writing the peek message loop like this to avoid the queue getting full, which might be a possible reason why you get occasional lag.

while (true){
    while (PeekMessage(..., PM_REMOVE) == TRUE) {
        if (Msg.message == WM_QUIT) break;
        TranslateMessage();     DispatchMessage();
    }
    // Do your stuff here
}
Advertisement

Aressera said:

while (true){
    while (PeekMessage(..., PM_REMOVE) == TRUE) {
        if (Msg.message == WM_QUIT) break;
        TranslateMessage();     DispatchMessage();
    }
    // Do your stuff here
}

That's essentially what the GLFW sample is doing (the peekmessage while loop is exatly what's causing the spikes) and the issue is still occuring.
The weird thing is that the stutter/high execution time happens even if no messages are recieved at all (as if the PeekMessage() call blocks for a few milliseconds and then resumes without returning messages to process.)

Performing the Message processing on a seperate thread might be a workaround (although if you get spikes in the 3 digit range such as in the example above where i had a 500ms spike) this will lead to input (either mouse and/or keyboard) being delayed/not recognized fast enough for the next rendered/processed frame. (Not to mention that you then have to handle accessing the swapchain/window state with OpenGL/Vulkan/DirectX from a seperate thread which makes the architecture more complicated.)

I'm not sure if the whole issue is Hardware and/or Driver related given that both SDL and GLFW show the same behaviour. (Makes my current dev-rig pratically unusable for measuring performance like that and it worries me that players might run into the same issue without a workaround in sight.)

I would test a release build on a different computer and see if it shows the same behavior. It does seem like it might be something specific to your machine. 500ms of lag can't be easily explained otherwise. Do you have something like antivirus running in the background?

Don't have an antivirus running besides the defaulf windows defender. My current rig where the spikes are happening has an I7 6700k CPU (Windows 10).

The measurements were also taken with a Release build.

Tested this also on a laptop with a Ryzen 7 5800H (Windows 11, basically a clean install.)
There i get framespikes of around 1-3 milliseconds. (Wasn't able to get anything higher than that.) Although granted, they happen less frequent. (And you have to type/move your mouse faster to get those spikes compared to the other build.)
But even cutting up to 3 milliseconds from your frame budget simply for polling the messagequeue feels a bit high. (My feeling is that the same issue might be happening there but the better CPU performance compared to the I7 might reduce the execution time?)

One thing to keep in mind, is that PeekMessage will not only give you the message, but also internal do other things. From MSDN:

During this call, the system dispatches (DispatchMessage) pending, nonqueued messages, that is, messages sent to windows owned by the calling thread using the SendMessage, SendMessageCallback, SendMessageTimeout, or SendNotifyMessage function. Then the first queued message that matches the specified filter is retrieved. The system may also process internal events. If no filter is specified, messages are processed in the following order:

So I think its reasonable to assume that it's not the function itself that is hanging (as it happens on multiple computers), but some receiver of messages in the entire application, notably if you specify a message-handler proc. You have tried to change to GLFW, but have you tried disabling other modules in the application to see if the issue disppears? There is also some more things that can attach to message-processing AFAIK, forgot what exactly it's called but there was some way to intercept input-messages specifically. So in general a lot of external/global things that might affect the process of PeekMessage, even in case no message is actually returned to you.

Advertisement

While what you've got is helpful, it's also going to be misleading. It isn't detailed enough.

The type of measuring you're doing is subject to Windows scheduling issues, to Windows time slices, to competition with everything else going on in the system. It's also not painting a very complete picture, just where you put your heartbeat measurements.

Use a proper profiler. Pix is a free download from Microsoft. It's a sampling profiler so it won't tell you ALL the details of your program, it will capture data around the actual slow places and that's where it's critical.

This topic is closed to new replies.

Advertisement