If you are behind (sending commands too late) you need to catch up immediately, which means stepping the simulation several times without rendering. This will glitch a little bit, but hopefully it happens at most once. If you are ahead (sending commands too early, causing unnecessary latency) you can smoothly drop the step delta. Say you're ahead by X steps: subtract (X / 10) steps from the local offset, which means you step the simulation slower / less frequently for a little bit. (Also, you have to keep the time -> server ticks offset variable as a double or fraction.)
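Something like this, roughly. All the names here (localTickOffset, StepSimulation, behindBy) are placeholders for whatever your own loop uses; it's a sketch of the idea, not a drop-in implementation:

```cpp
// Kept as a double so fractional adjustments accumulate correctly.
double localTickOffset = 0.0;

// Placeholder for the real fixed-timestep simulation step.
void StepSimulation() { /* ... advance game state by one tick ... */ }

// 'behindBy' is how many ticks the client is behind (+) or ahead (-)
// of where the server wants it to be.
void AdjustClientClock(int behindBy)
{
    if (behindBy > 0)
    {
        // Behind: catch up immediately by stepping the simulation several
        // times without rendering. Glitches once, then stays in sync.
        for (int i = 0; i < behindBy; ++i)
            StepSimulation();
        localTickOffset += behindBy;
    }
    else if (behindBy < 0)
    {
        // Ahead: drop the extra latency smoothly. Removing (ahead / 10)
        // ticks from the offset makes the client step slightly less
        // frequently for a while instead of stalling outright.
        double ahead = -behindBy;
        localTickOffset -= ahead / 10.0;
    }
}
```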
Finally, you want some desired amount of jitter buffering. It may be OK to be between 0-4 steps ahead of the server, for example, because transmission times will jump around a little bit. You may even want to count how many times you have to adjust the clock (in either direction) and slowly increase the acceptable jitter window if you have to adjust it often, to compensate.
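A rough sketch of that adaptive window idea (the constants and the 600-frame / 5% thresholds are arbitrary starting points, not tuned values):

```cpp
// The client tries to stay between 0 and 'jitterWindow' ticks ahead of the
// server, and widens that window if it finds itself correcting too often.
struct JitterBuffer
{
    int jitterWindow = 4;  // acceptable "ahead of server" range: 0..jitterWindow ticks
    int adjustments  = 0;  // clock corrections made in the current period
    int samples      = 0;  // frames observed in the current period

    // Returns true if 'ticksAhead' is outside the acceptable range and the
    // caller should correct the clock (e.g. with AdjustClientClock above).
    bool NeedsAdjustment(int ticksAhead)
    {
        ++samples;
        bool outside = (ticksAhead < 0) || (ticksAhead > jitterWindow);
        if (outside)
            ++adjustments;

        // Every 600 frames (~10 s at 60 Hz), check how often we had to adjust.
        if (samples >= 600)
        {
            // Adjusting more than ~5% of the time suggests the window is too
            // tight for this connection's jitter, so widen it a little.
            if (adjustments > samples / 20)
                ++jitterWindow;
            adjustments = 0;
            samples = 0;
        }
        return outside;
    }
};
```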
Hello. I'd like to thank you for your help. I have implemented what has been discussed in this thread, and the results are much better than before!
The player is now almost exactly where he should be by the time the update packet is received! (It's not 100% correct, but the error is barely noticeable.)
One thing remains to do. Currently the client completely ignores the tick diff reported by the server and just adds +2 to each input's tick (+2 because that's what I found worked best with the server running on localhost).
By adding an artificial delay to my network, the results change, because the tick diff changes. With a 50 millisecond delay, the client gets this: http://pastebin.com/kNu9RL1U
As you can see, the diff only changes by a little when it does change.
Here is my idea: use an average of the last X tick differences (updated each frame) and round to the nearest integer. But what would be a good value for X? It should probably be a small number - I was thinking maybe 5.
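In case it helps to be concrete, this is roughly what I have in mind (X = 5 is just my guess; the class and method names are made up):

```cpp
#include <cmath>
#include <cstddef>
#include <deque>
#include <numeric>

// Keeps the last X tick diffs reported by the server and exposes their
// rounded average as the offset to add to each input's tick.
class TickDiffSmoother
{
public:
    explicit TickDiffSmoother(std::size_t windowSize = 5)
        : windowSize_(windowSize) {}

    // Call once per received server update with the reported tick diff.
    void AddSample(int tickDiff)
    {
        samples_.push_back(tickDiff);
        if (samples_.size() > windowSize_)
            samples_.pop_front();
    }

    // Average of the stored diffs, rounded to the nearest whole tick.
    int SmoothedOffset() const
    {
        if (samples_.empty())
            return 0;
        double sum = std::accumulate(samples_.begin(), samples_.end(), 0.0);
        return static_cast<int>(std::lround(sum / samples_.size()));
    }

private:
    std::size_t windowSize_;
    std::deque<int> samples_;
};
```

The idea would be to feed each reported diff into AddSample() and replace the hard-coded +2 with SmoothedOffset(). Does that sound reasonable, or should X be larger?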