Thanks! I recently read this article: https://link.springer.com/article/10.1007/s00530-012-0271-3 and I think it helped me clarify the point I was trying to make.
Summarizing here to see if it makes sense. In 4.3.1 they describe a model for "Time Offsetting Techniques", which seems to match the intent of a lag compensation model coupled with client-side prediction. This solution essentially assumes two frame times: the frame associated with your own character, which is where you've predicted yourself to be when you choose to shoot, and the frame associated with all the remote objects you're seeing, which is the last frame the server sent you.
The reason for having two frames is that the server has all the latest information (relative to you) about your remote objects, and it packaged and sent that to you. However, you have some inputs that the server does not have yet. Therefore, when you render "Frame 5" on your screen, that frame shows all the remote objects as they were at frame 5, but your own character potentially at frame 10. When the server receives the packet saying that you shot, it needs to reconcile two things: where the remote objects were at frame 5, and where you were at frame 10. That lets the server accurately reconstruct what you were seeing when you shot and resolve the hit appropriately.
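To check that I'm picturing this correctly, here's a rough sketch of what I imagine the client would send when it fires. All the names and numbers are mine, not from the article; the point is just that the packet carries both frame numbers.

```typescript
interface Vec3 { x: number; y: number; z: number; }

// What I imagine the client sends when it shoots: not just "I shot", but the
// two frame numbers needed to reconstruct what it was looking at.
interface ShotPacket {
  remoteSnapshotFrame: number; // last server snapshot being rendered (frame 5 here)
  localInputFrame: number;     // frame the local player has predicted itself to (frame 10 here)
  aimDirection: Vec3;
}

function buildShotPacket(lastServerFrame: number, predictedLocalFrame: number, aim: Vec3): ShotPacket {
  return {
    remoteSnapshotFrame: lastServerFrame,  // "where everyone else was on my screen"
    localInputFrame: predictedLocalFrame,  // "where I was when I pulled the trigger"
    aimDirection: aim,
  };
}

// Example: remotes rendered at frame 5 while I've predicted myself to frame 10.
const packet = buildShotPacket(5, 10, { x: 0, y: 0, z: 1 });
```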
Is this what you meant when you said "your input at frame (5 + transmission latency)"? As in, 5 represents the frame all your remote objects were at, and the transmission latency plus any buffer tells you which frame you had predicted yourself to when you chose to shoot? With both pieces of information, the server knows how to reconcile things appropriately. This could be likened to "combining Option 1 and 3" and storing two frame numbers.
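And my mental model of the server-side reconciliation is roughly the sketch below. The history buffer and hit test are invented placeholders, not real APIs; I only want to show which frame number gets used for what.

```typescript
interface Vec3 { x: number; y: number; z: number; }
interface EntityState { id: number; position: Vec3; }

// Per-frame world history the server keeps around so it can "rewind".
const frameHistory = new Map<number, EntityState[]>();

// Crude hit test: does a (normalized) ray from `origin` pass within `radius` of `target`?
function rayHits(origin: Vec3, dir: Vec3, target: Vec3, radius = 0.5): boolean {
  const t =
    (target.x - origin.x) * dir.x +
    (target.y - origin.y) * dir.y +
    (target.z - origin.z) * dir.z;
  if (t < 0) return false;
  const dx = target.x - (origin.x + dir.x * t);
  const dy = target.y - (origin.y + dir.y * t);
  const dz = target.z - (origin.z + dir.z * t);
  return dx * dx + dy * dy + dz * dz <= radius * radius;
}

// Resolve a shot using BOTH frame numbers the client reported.
function resolveShot(
  shooterId: number,
  remoteSnapshotFrame: number, // frame the client was rendering everyone else at (e.g. 5)
  localInputFrame: number,     // frame the client had predicted itself to (e.g. 10)
  aim: Vec3
): number | null {
  // Targets: rewind to the snapshot the client was actually looking at.
  const targets = frameHistory.get(remoteSnapshotFrame) ?? [];
  // Shooter: use its state at the predicted frame (the server can simulate it
  // once the inputs up to that frame arrive alongside the shot).
  const shooter = (frameHistory.get(localInputFrame) ?? []).find(e => e.id === shooterId);
  if (!shooter) return null;

  for (const target of targets) {
    if (target.id !== shooterId && rayHits(shooter.position, aim, target.position)) {
      return target.id; // hit resolved against "what the client saw"
    }
  }
  return null;
}
```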
What I'm seeing here is that maybe my breakdown of the "options" wasn't the best way to put it. At the end of the day, the key is to think about the timeline of each object in any particular player's or the server's game state. At any given time, I render a single frame based on my knowledge of where the remote objects and my own player are. For remote objects I technically hold stale data, but I hold perfect data for my own player's inputs. The server holds perfect data about the world, but stale data for every player's inputs. So as a dev, I'm choosing how to reconcile all of that each time I render one frame.
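Concretely, the asymmetry I mean looks something like this (field names and frame numbers are made up for illustration):

```typescript
// The "timelines" framing as I picture it: every simulation tracks, per object,
// the newest frame it actually has real data for.
interface TimelineEntry {
  objectId: number;
  newestKnownFrame: number;
}

// Client, currently rendering its physical frame 10:
const clientKnowledge: TimelineEntry[] = [
  { objectId: 1, newestKnownFrame: 10 }, // local player: I have my own inputs up to "now"
  { objectId: 2, newestKnownFrame: 5 },  // remote player: only as fresh as the last snapshot
];

// Server, also at frame 10, but inputs arrive late from every client:
const serverKnowledge: TimelineEntry[] = [
  { objectId: 1, newestKnownFrame: 6 }, // my inputs have only arrived up to frame 6
  { objectId: 2, newestKnownFrame: 7 }, // the other client's inputs up to frame 7
];
```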
One option for clients, as described above, is to render the stale data for opponents but perfect input for myself. This gives the server an easier reconciliation job and enables what's known as "lag compensation". Alternatively, I can predict everything, including remote objects, which is some form of full prediction. Or I could delay my own inputs so that everything I render, including my own actions, is equally "stale" and therefore consistent. On the server, I could reconcile the client's stale data in a way that favors the client, or wield absolute power and use my perfect data to make the decisions. None of these choices is strictly correct in the grand scheme of things, because networking is annoying, but depending on the game, any of them can make it feel fine 99% of the time. It all boils down to reconciling the different logical frames (different timelines) of remote and player objects in order to render the actual physical frame on the screen.
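If it helps, here's how I'd summarize those client-side choices in code. The policy names are my own labels, not standard terminology or anything from the article.

```typescript
// Three ways a client can pick which logical frame to draw each kind of object at.
type RenderPolicy =
  | "predict-self-only"   // stale remotes + predicted local player -> server does lag compensation
  | "predict-everything"  // extrapolate remote objects too (full prediction)
  | "delay-own-input";    // buffer my inputs so everything I render is equally "stale"

interface WorldView {
  lastSnapshotFrame: number;   // freshest frame the server has sent me
  localPredictedFrame: number; // freshest frame my unacknowledged inputs reach
}

// For one physical frame on screen, which logical frame does each timeline get drawn at?
function framesToRender(view: WorldView, policy: RenderPolicy) {
  switch (policy) {
    case "predict-self-only":
      return { remotes: view.lastSnapshotFrame, self: view.localPredictedFrame };
    case "predict-everything":
      return { remotes: view.localPredictedFrame, self: view.localPredictedFrame };
    case "delay-own-input":
      return { remotes: view.lastSnapshotFrame, self: view.lastSnapshotFrame };
  }
}
```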
A long wall of text later, does this match expectations?