I am looking for a way to accurately predict the user's next position based on their position in the previous frames.
Context:
I have an AR application that renders frames on a server and sends them to the client over a wireless network. I receive the user's position, project the scene I have on the server, and send back the projected frame. There is always high latency on the network (roughly 30 to 100 ms depending on traffic), which forces me to predict the user's position if I don't want the application to feel very laggy.
What I am doing right now is taking the positions from the 2 previous frames to get a direction. I also calculate the acceleration from the previous 4 frames and average it. Having those, I predict the user's next position using basic Newtonian equations of motion. Despite its simplicity, it works okay if the user is moving in a linear fashion (not a spiral or circle). There is, of course, an issue when the user flips direction, but that problem will exist in any prediction algorithm. Still, I want to make it better.
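For reference, my current scheme boils down to something like this (a minimal 1-D sketch assuming a fixed frame interval `dt`; the function name and parameters are illustrative, and for 3-D you would apply it per axis):

```python
def predict_position(positions, dt, latency):
    """Dead-reckoning prediction: velocity from the last 2 frames,
    acceleration averaged over the last 4 frames, then extrapolate
    `latency` seconds ahead with the basic equation of motion.
    `positions` holds the most recent samples, newest last."""
    p0, p1, p2, p3 = positions[-4:]
    v = (p3 - p2) / dt                                   # velocity from the last 2 frames
    vels = [(p1 - p0) / dt, (p2 - p1) / dt, (p3 - p2) / dt]
    accels = [(vels[i + 1] - vels[i]) / dt for i in range(2)]
    a = sum(accels) / len(accels)                        # averaged acceleration
    # Newton: p + v*t + 0.5*a*t^2
    return p3 + v * latency + 0.5 * a * latency ** 2
```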
After looking around online, this seems to be a very common problem in computer graphics and gamedev (prediction for collisions, mouse dragging, etc.), so I am looking for any suggestions on how to improve it or, if possible, a better method.
When googling this topic, control theory comes up a lot, which I have no experience with. I tried digging into it last week through an introductory course, but that only showed me that the field is vast and I won't need the majority of it (nor do I have the time, sadly). So I am asking if anyone knows resources that specifically address this issue of how to accurately predict the next position.
I also looked into statistics, and there is the possibility of implementing something like a linear regression model that tries to stabilize the prediction based on the previous errors. However, my knowledge of numerical analysis is a bit limited, and I feel that would be the wrong tool for the problem, whereas what control theory is about seems ideal for my case.
Resources specifically on control theory in gamedev and prediction are as welcome as suggestions.
Best,
Ali
Accurate position prediction based on previous frames' positions (probably control theory)
AliAbdulKareem said:
After looking up online, this seems to be a problem very common in computer graphics and gamedev (prediction for collisions, mouse dragging, etc.)
We're probably not really experienced with this here, because our systems are usually low latency.
I do remember Google Stadia tried to predict player input using machine learning. But I don't know if this was just research or actually shipped, or whether it required training per game. Maybe worth looking up.
One thing we actually do is reproject rendered frames. That may be better than predicting input.
Basically you would project the outdated frame you get on the network on the AR device to the current camera position.
This would cause artifacts in the image due to missing, occluded information, though. So instead of feeling like your head is wobbling, you would feel like your eyes have issues. And you still see the past, but at least from the proper viewport.
But if your display device has some compute power, it could be worth a try. We use this mostly for temporal anti-aliasing. Related search terms: TAA, reprojection, motion vectors.
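As an illustration of the reprojection idea (not production code): assuming a simple pinhole camera with intrinsics `K`, a per-pixel depth value, and camera-to-world poses for the old and current views, a single pixel reprojects like this:

```python
import numpy as np

def reproject_pixel(uv, depth, K, old_pose, new_pose):
    """Reproject one pixel of the outdated frame into the current view.
    `K` is a 3x3 pinhole intrinsic matrix; `old_pose`/`new_pose` are
    4x4 camera-to-world transforms. Requires per-pixel depth."""
    u, v = uv
    # unproject: pixel + depth -> camera-space point in the old view
    p_cam = np.linalg.inv(K) @ np.array([u * depth, v * depth, depth])
    # camera space -> world space via the old pose
    p_world = old_pose @ np.append(p_cam, 1.0)
    # world space -> camera space of the current pose
    p_new = np.linalg.inv(new_pose) @ p_world
    # project back to pixel coordinates
    uvw = K @ p_new[:3]
    return uvw[:2] / uvw[2]
```

A real implementation does this per pixel on the GPU and has to fill the disocclusion holes this leaves behind.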
AliAbdulKareem said:
What I am doing right now, is taking the position from the 2 previous frames, to get a direction, I also calculate the acceleration from the previous 4 frames, and average it, having those, I predict the user's next position using basic newton's law of motion
This sounds a bit more complicated than needed, probably caused by a desperate attempt to accomplish the impossible?
What I mean is, I wonder why you need 4 states to get acceleration.
You should get velocity from the most recent two states, so to get acceleration you would only need 3 states, not 4.
But I don't see a need for acceleration at all. If you already have velocity, all you can do is assume it remains constant and predict using just that.
Not perfect, of course, so maybe you tried to smooth it out, generating nice trajectory curves or something like that. But I would assume that while this helps in some cases, it makes the real failure cases even worse.
To apply some smoothing, which I guess is needed, I would try to prefer using predicted states rather than existing states that are too far back in the past. But I'm not sure; I've never done anything similar.
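A minimal sketch of that idea: correct the previous *prediction* toward the fresh measurement, then extrapolate with constant velocity. All names and the `alpha` weight are illustrative, not a tested recipe:

```python
def blend_and_predict(predicted_prev, measured_pos, measured_vel, latency, alpha=0.5):
    """Smooth by blending the previous prediction with the new measurement
    (alpha in 0..1 weights the fresh measurement), then extrapolate
    with constant velocity instead of using an acceleration term."""
    corrected = alpha * measured_pos + (1.0 - alpha) * predicted_prev
    return corrected + measured_vel * latency
```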
JoeJ said:
This sounds a bit more complicated than needed, probably caused by a desperate attempt to accomplish the impossible? What I mean is, I wonder why you need 4 states to get acceleration.
You should get velocity from the most recent two states, so to get acceleration you would only need 3 states, not 4.
Right, I did that because in my AR application the head is constantly moving, as are the hands, so I wanted to average something so it works better for very small movements (as they will cancel each other out). There might be something simpler, like clamping the value if it is under a certain threshold.
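That clamping idea could be a simple dead-band on the per-frame delta, so sensor jitter from small tremors never feeds the predictor (purely illustrative):

```python
def deadband(delta, threshold):
    """Zero out frame-to-frame deltas smaller than `threshold`,
    passing larger movements through unchanged."""
    return 0.0 if abs(delta) < threshold else delta
```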
JoeJ said:
I do remember Google Stadia tried to predict player input using machine learning. But idk if this was just research or actually used, if it required training per game, etc. Maybe worth to look up.
I looked it up a bit before, but it seems complex: not only do we not have machine learning engineers, we also don't know where to get this specific data to train an ML model on (i.e. little leg movement, a lot of head movement, a lot of hand movement, and especially a lot of finger/gesture movement).
I think it will be very hard to convince the company to take this approach.
Thanks for your suggestion regarding reprojection and the other terms; I will see how that works.
As you mentioned, I actually did use the predicted state, and indeed the failure cases become more like a glitch. I feel something in between would work better, but I am not sure what to use (neither linear nor bilinear blending worked).
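One "in between" option worth trying is mixing the measured and predicted states with an ease curve such as smoothstep rather than a straight linear weight (just a sketch, not something validated for this use case):

```python
def smoothstep(t):
    """Cubic ease curve: 0 at t=0, 1 at t=1, with zero slope at both ends."""
    t = max(0.0, min(1.0, t))
    return t * t * (3.0 - 2.0 * t)

def blend(measured, predicted, t):
    """Blend from measurement toward prediction with an eased weight,
    so the transition ramps in and out instead of switching linearly."""
    w = smoothstep(t)
    return (1.0 - w) * measured + w * predicted
```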
At least for now, reprojection sounds very promising from my first Google searches.
Thanks
AliAbdulKareem said:
At least for now the re-projection sounds very promising from the first google searches.
Somehow I feel guilty for luring you onto the next path of failure, eventually ; )
But here is a very good tutorial on implementing TAA, which surely helps to estimate the problems to expect: https://www.elopezr.com/temporal-aa-and-the-quest-for-the-holy-trail/
Personally, I see only one way to do 'proper' client-server rendering. But this has not been explored by the games industry yet.
I remember a paper from Qualcomm, but could not find it again. Will post later if I do…
The idea is to do expensive things like lighting on the server, then sending the lit textures plus the geometry for the current view to the client.
The client then renders the image at the correct camera transform, at low cost.
Of course, this only helps if your geometry is not crazy detailed and lighting is indeed the expensive part. So it depends on what you do.
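As a toy sketch of that split (a trivial Lambert term stands in for "expensive lighting"; every name here is illustrative):

```python
import numpy as np

def server_prepare(world_verts, light_dir, normals):
    """Server side: do the expensive shading once and ship the lit
    per-vertex colors together with the geometry for the current view."""
    lit = np.clip(normals @ light_dir, 0.0, 1.0)   # toy per-vertex lighting
    return world_verts, lit

def client_render(world_verts, lit, view_proj):
    """Client side: the cheap part only. Transform the shipped geometry
    with the *current* camera and reuse the precomputed lighting."""
    hom = np.c_[world_verts, np.ones(len(world_verts))]
    clip = hom @ view_proj.T
    ndc = clip[:, :2] / clip[:, 3:4]               # perspective divide
    return ndc, lit                                # rasterization happens elsewhere
```

The point is that a camera change on the client only re-runs `client_render`, so the latency of the server round trip no longer shows up as a laggy viewpoint.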