If I don't use a queue to store the states, I will only need two states for interpolation. One is the previous state, and the other is the future state or current state. I just need to swap the state buffer when a new state comes in. If I use a queue, I need to find the state in the queue that is just before the render time and the closest state that is after the render time. That way the render time, which is clientLocalTime + offset - 100, will lie between the two states I find in the queue. The key value that matters most is the render time. Am I correct?
The last part is correct: the render time is the value that matters most. But the question is not "whether you use a queue or not". That follows from which strategy you pick, and whether that strategy needs a queue for what you want to do.
There are 2 strategies to choose from here:
1) Collect sufficient past states so that you can render at some arbitrary time in the past (e.g. 200ms), usually measured relative to a timer somewhat-synchronised with the server. A queue is a good way of storing these states, as you don't know exactly how many you're going to have when covering the necessary time period. It's possible to change the time buffer from 200ms to whatever you like, provided you do it smoothly, and that it always stays large enough that you have a state on either side of the render time to interpolate between.
2) Keep 2 states, so that you can render between the last received position, and some position previous to that. Ideally that previous position is whatever you were rendering when the latest position came in, because that guarantees smooth rendering on the client. The render time here is not attempting to match any particular server time; it's just attempting to provide smooth rendering that closely follows what the server is sending.
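As a rough sketch of strategy 1, assuming one-dimensional positions and timestamps in milliseconds: the names SnapshotBuffer, render_time, max_age_ms and delay_ms are made up for illustration, and the render time follows the clientLocalTime + offset - delay formula from the question.

```python
from collections import deque


def render_time(client_local_ms, offset_ms, delay_ms=200.0):
    """Time to render at: estimated server time minus the buffer delay."""
    return client_local_ms + offset_ms - delay_ms


class SnapshotBuffer:
    """Time-ordered queue of (server_time_ms, position) snapshots."""

    def __init__(self, max_age_ms=1000.0):
        self.max_age_ms = max_age_ms  # assumption: keep one second of history
        self.snapshots = deque()      # oldest first

    def push(self, server_time_ms, position):
        # Assumption: late or duplicate snapshots are simply dropped.
        if self.snapshots and server_time_ms <= self.snapshots[-1][0]:
            return
        self.snapshots.append((server_time_ms, position))
        # Discard snapshots that are too old to be interpolated against.
        while self.snapshots[0][0] < server_time_ms - self.max_age_ms:
            self.snapshots.popleft()

    def sample(self, at_ms):
        """Interpolated position at at_ms, or None if no two states span it."""
        snaps = list(self.snapshots)
        for (t0, p0), (t1, p1) in zip(snaps, snaps[1:]):
            if t0 <= at_ms <= t1:
                alpha = (at_ms - t0) / (t1 - t0)
                return p0 + (p1 - p0) * alpha
        return None
```

A None result means the buffer delay is too small for the current update rate or jitter, which is the "always stays large enough" condition in 1).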
What you can't do - but were doing, initially - is use the 2nd strategy's data structure for the 1st strategy's algorithm. That can never work, because with only 2 stored positions you can't guarantee that they span the time you want to render at.
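For contrast, here is a minimal sketch of strategy 2 used with its own algorithm, again with one-dimensional positions. The class name TwoStateInterpolator and the interval_ms parameter (the expected gap between server updates) are assumptions for illustration. Note that there is no server time anywhere in it; each new update just becomes the target, starting from whatever is on screen at that moment.

```python
class TwoStateInterpolator:
    """Blend from the currently rendered position towards the latest update."""

    def __init__(self, position, interval_ms=100.0):
        self.start = position           # where the current blend began
        self.target = position          # last position received
        self.elapsed_ms = 0.0           # time since the last update arrived
        self.interval_ms = interval_ms  # assumed gap between updates

    def current(self):
        # Clamp at the target rather than extrapolating past it.
        alpha = min(self.elapsed_ms / self.interval_ms, 1.0)
        return self.start + (self.target - self.start) * alpha

    def on_update(self, position):
        # Start from what is being rendered right now, so nothing jumps.
        self.start = self.current()
        self.target = position
        self.elapsed_ms = 0.0

    def advance(self, dt_ms):
        """Step local time forward by one frame and return the position."""
        self.elapsed_ms += dt_ms
        return self.current()
```

If an update arrives halfway through a blend, the next blend starts from the halfway position rather than from the previous target, which is what keeps the motion smooth.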
Regarding client prediction, you will quickly realise that 'running a client simulation locally' and 'other techniques such as client prediction/extrapolation' are actually the same thing, with the same symptoms of showing different things on different screens. The best solution is to reduce latency so that the differences are minimised.