Caveat: it is many years since I did any multiplayer stuff, so my memory is hazy and I defer to the others on this. I might be wildly wrong.
One key thing to make crystal clear is whether you are using the simpler scheme of displaying every game object other than the player as a delayed version. I am assuming you are. There are more complex schemes where you try to display client-predicted versions of other players (I believe Hodgman uses this in his racing game), and that is a whole other ballgame.
In general, limiting the number of objects that have to be snapped back and resimulated sounds like something you need to design the game around.
In the simplest version, you could have a local Bullet simulation that includes only the player, the static geometry (ground, walls etc.) and the non-simulated elements of the world (say, deterministic moving platforms). That way your client prediction would tend to match the server except where the player interacts with other dynamic objects. Putting more objects into the client simulation is just a way of making it less likely that the server and client simulations will diverge (or rather, diverge by a large amount).
In a typical multiplayer game you might also decide to simulate 8-16 other players while doing a client-side correction (in some way, perhaps a more limited one).
Even with 8-16 players (and especially with more dynamic objects), you might try to simulate only the nearest or most relevant ones. Maybe only 3 could influence the player on a particular tick? For instance, if you have a potentially visible set (PVS), you might only be interested in simulating objects that are within the player's PVS. The general idea is that stuff that is further from the player, or deemed less relevant, is less likely to cause a change to the player physics.
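To make the "nearest / most relevant" idea concrete, here is a rough sketch of picking which objects go into the client-side resimulation. Everything in it is made up for illustration: `Body`, `select_relevant`, the 2D positions and the `max_count` / `max_distance` limits are assumptions, and plain distance is only standing in for a real relevance test such as a PVS lookup.

```python
import math
from dataclasses import dataclass


@dataclass
class Body:
    """Hypothetical minimal game object: a name and a 2D position."""
    name: str
    x: float
    y: float


def select_relevant(player, others, max_count=3, max_distance=20.0):
    """Return up to max_count bodies nearest to the player.

    Bodies further away than max_distance are ignored entirely. Distance is
    a stand-in for whatever relevance test the game really uses (e.g. a PVS).
    """
    def dist(body):
        return math.hypot(body.x - player.x, body.y - player.y)

    in_range = [b for b in others if dist(b) <= max_distance]
    return sorted(in_range, key=dist)[:max_count]
```

Only the bodies returned here would be added to the local physics world before a resimulation; everything else would just be displayed from the server data.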
Presuming you are using the simpler scheme of displaying on the client a smoothed / delayed version of the server simulation (for everything but the player), understand that you might end up with two distinct versions of each non-player object:
- a client-predicted physics version, used only for determining your player's position
- a displayed version, which is delayed and matches the server simulation, and is not the client-predicted version
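As a sketch of what keeping those two versions could look like, the object below stores a predicted position for the local simulation next to a buffer of server snapshots for display. The names (`RemoteObject`, `sample_delayed`, `display_pos`), the 1D positions and the 0.1 second delay are all assumptions for illustration, and the snapshot times are assumed to be strictly increasing.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


def sample_delayed(snapshots, render_time):
    """Linearly interpolate a position from (time, position) snapshots.

    snapshots must be sorted by strictly increasing time. Times outside the
    buffered range are clamped to the first or last snapshot.
    """
    times = [t for t, _ in snapshots]
    i = bisect_right(times, render_time)
    if i == 0:
        return snapshots[0][1]
    if i == len(snapshots):
        return snapshots[-1][1]
    (t0, p0), (t1, p1) = snapshots[i - 1], snapshots[i]
    alpha = (render_time - t0) / (t1 - t0)
    return p0 + (p1 - p0) * alpha


@dataclass
class RemoteObject:
    # Version 1: fed into the local physics when predicting the player.
    predicted_pos: float = 0.0
    # Version 2: raw server snapshots, only ever used for drawing.
    snapshots: list = field(default_factory=list)

    def display_pos(self, now, delay=0.1):
        """Position to draw: the server state as it was `delay` seconds ago."""
        return sample_delayed(self.snapshots, now - delay)
```

The point is that `predicted_pos` never reaches the renderer, and `display_pos` never feeds back into the player's physics.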
As to how you go from the current (wrong) predicted player position to the recalculated position without a visible snap: you just use interpolation of some sort to smooth it out. Although I don't think this is what you were asking.
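One possible form of that smoothing (certainly not the only one) is to keep a visual error offset that absorbs the correction and then decays over a few frames. `SmoothedPosition` and its `decay` factor are hypothetical names, and the position is 1D to keep the sketch short.

```python
class SmoothedPosition:
    """Separates the simulated position from the one that gets drawn."""

    def __init__(self, pos):
        self.sim_pos = pos   # authoritative-ish: result of prediction / resimulation
        self.error = 0.0     # visual offset still to be blended away

    def correct(self, new_sim_pos):
        """Adopt a recalculated position without moving the rendered one."""
        self.error += self.sim_pos - new_sim_pos
        self.sim_pos = new_sim_pos

    def update(self, decay=0.5):
        """Call once per frame: shrink the remaining visual error."""
        self.error *= decay

    @property
    def render_pos(self):
        return self.sim_pos + self.error
```

Right after `correct()` the rendered position is unchanged, and each `update()` moves it closer to the simulated one. A per-frame decay factor like this depends on frame rate, so a real game would probably scale it by the frame time.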
15 hours ago, zqf said:
But applying inputs means stepping the player's avatar in the physics engine. From what I've read so far it is common technique to give players instant feedback since games like Quakeworld. But I can't see how this is done in physics engines which only have single monolithic steps?
Well, unless the physics engine has specific support for this, a step is just a step. You would remove the objects that aren't relevant from the physics world, add the ones that are, set their positions, velocities etc., then step, and read back the results. As to how you do this in Bullet I have no experience, but I'm sure it would be possible to step manually, or to abuse the API to do this.
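A rough sketch of that set-state, step, read-back loop, with a toy 1D `step()` function standing in for the physics engine. This is not Bullet code: `PlayerState`, `step`, `PredictingClient`, the fixed 60 Hz tick and the movement speed of 5 units per second are all assumptions for illustration.

```python
from dataclasses import dataclass

DT = 1.0 / 60.0  # assumed fixed tick length


@dataclass
class PlayerState:
    pos: float = 0.0
    vel: float = 0.0


def step(state, move, dt=DT):
    """Toy stand-in for one physics step of the player only."""
    vel = move * 5.0  # assumed movement speed
    return PlayerState(state.pos + vel * dt, vel)


class PredictingClient:
    def __init__(self):
        self.state = PlayerState()
        self.tick = 0
        self.pending = []  # (tick, move) inputs the server has not confirmed

    def apply_input(self, move):
        """Instant feedback: step the local player as soon as input arrives."""
        self.tick += 1
        self.pending.append((self.tick, move))
        self.state = step(self.state, move)

    def on_server_state(self, server_tick, server_state):
        """Snap back to the server state, then replay the unconfirmed inputs."""
        self.pending = [(t, m) for t, m in self.pending if t > server_tick]
        state = PlayerState(server_state.pos, server_state.vel)
        for _, move in self.pending:
            state = step(state, move)
        self.state = state
```

With a real engine, `step()` would be where you set up the reduced world (player plus relevant objects), write in their positions and velocities, advance one tick and read the player back out.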