The server will occasionally need to stop and recover in the event that it runs out of inputs.
For a lockstep-simulated game (like an RTS), this makes sense. The server advances to step N exactly when it has inputs from every client for step N.
For an FPS game, the server really just needs to keep going: if one or more players haven't provided input for step N, the server simulates them as if they had given no input (which will likely generate corrections if it happens while the player is trying to steer).
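A minimal sketch of the two policies, with hypothetical names, might look like this. In lockstep the server only advances when every client's input for step N has arrived; in the FPS style it advances regardless, substituting an empty input for anyone who is late, which is what later shows up as a correction on that player's client.

```cpp
#include <unordered_map>

struct Input { float move_x = 0, move_y = 0; };

struct StepInputs {
    std::unordered_map<int, Input> byPlayer;   // playerId -> input for this step
};

// Lockstep (RTS): hold the whole simulation until the step is complete.
bool CanAdvanceLockstep(const StepInputs& step, int playerCount) {
    return (int)step.byPlayer.size() == playerCount;
}

// FPS: always advance; a missing input is treated as "did nothing."
Input InputForFpsStep(const StepInputs& step, int playerId) {
    auto it = step.byPlayer.find(playerId);
    return it != step.byPlayer.end() ? it->second : Input{};
}
```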
If you have the automatic "send so it arrives in time" clock adjustment, and drive everything else off that adjustment, you don't need to measure RTT separately; it "falls out" of the math. Your main remaining decision is how far behind the server updates you display remote entities -- at known positions but late, or at extrapolated positions that are more up to date but only guesses.
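A sketch of that display choice, assuming the client keeps a short buffer of (server time, position, velocity) snapshots per remote entity (names here are hypothetical): pick a render time behind your estimated server time and interpolate between known snapshots, or render at the estimated server time and extrapolate past the newest snapshot.

```cpp
#include <deque>
#include <utility>

struct Snapshot { double serverTime; float x, y; float vx, vy; };

struct RemoteEntity {
    std::deque<Snapshot> history;   // ordered by serverTime; assumed non-empty

    // renderTime = estimatedServerTime - displayDelay  -> interpolate (correct but late)
    // renderTime = estimatedServerTime                 -> extrapolate (current but guessed)
    std::pair<float, float> PositionAt(double renderTime) const {
        // Find two snapshots bracketing renderTime and interpolate between them...
        for (size_t i = 1; i < history.size(); ++i) {
            if (history[i].serverTime >= renderTime) {
                const Snapshot& a = history[i - 1];
                const Snapshot& b = history[i];
                double t = (renderTime - a.serverTime) / (b.serverTime - a.serverTime);
                return { float(a.x + (b.x - a.x) * t), float(a.y + (b.y - a.y) * t) };
            }
        }
        // ...otherwise extrapolate forward from the newest snapshot.
        const Snapshot& s = history.back();
        double dt = renderTime - s.serverTime;
        return { float(s.x + s.vx * dt), float(s.y + s.vy * dt) };
    }
};
```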
You can estimate the end-to-end RTT simply by comparing game times when you process a received server packet. If you already have a game-time-offset variable, the RTT is "my currently estimated server game time" minus "the game time stamped on the latest received server entity update." Note that your currently estimated server time already includes the upstream transmit delay, because it's designed to make you send your input data early enough that it arrives at the server in time.
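As a sketch (hypothetical names, assuming you already track the forward-adjusted server time you stamp outgoing inputs with):

```cpp
// Your estimate runs ahead of the server by roughly the upstream delay (plus margin),
// and the received update is behind by the downstream delay, so the difference spans
// the whole input-to-observed-result loop.
double EstimateEndToEndRtt(double estimatedServerTimeNow,    // time you stamp outgoing inputs with
                           double latestServerUpdateTime) {  // game time stamped on newest received update
    return estimatedServerTimeNow - latestServerUpdateTime;
}
```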
This value will always be a lot bigger than the raw physical network ping time, so gamers will hate it if you show it to them.
Thus, having the server always respond with a packet as soon as a packet is received (in some real-time thread), and measuring the round-trip time of that exchange (using a real-time receiving thread on the client, too), will show what gamers want to think of as their "ping."
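A sketch of that measurement, with SendPacket/ReceivePacket standing in as hypothetical transport calls (not a real library API). The point is that both the server echo and the client timing run on dedicated threads that do nothing else, so the measured value stays close to the raw network round trip rather than the end-to-end game-loop delay.

```cpp
#include <chrono>

// Hypothetical transport functions, assumed to wrap your UDP socket:
void SendPacket(const void* data, int size);
int  ReceivePacket(void* data, int maxSize);   // blocks until a packet arrives

// Server side: echo immediately, on a thread that does nothing else.
void ServerEchoThread() {
    char buf[64];
    for (;;) {
        int n = ReceivePacket(buf, sizeof(buf));
        if (n > 0) SendPacket(buf, n);         // respond as soon as it's received
    }
}

// Client side: time the echo on a dedicated receiving thread.
double MeasurePingMilliseconds() {
    using Clock = std::chrono::steady_clock;
    char payload = 0;
    auto start = Clock::now();
    SendPacket(&payload, sizeof(payload));
    ReceivePacket(&payload, sizeof(payload));  // blocks until the echo comes back
    return std::chrono::duration<double, std::milli>(Clock::now() - start).count();
}
```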