For movement in my game, I allow for some rewind by the server, but actions only happen when they arrive at the server and when the server says it's okay. This prevents the scenario you described in your OP. It gets a little bit tricky with actions that affect movement, but that's manageable.
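To make the split concrete, here is a rough sketch of how a server could treat the two message types differently. It is not my actual code: the `Server` class, the `on_move`/`on_action` handlers, the `allowed` flag and the 100 ms `MAX_REWIND_MS` cap are all made up for illustration.

```python
from dataclasses import dataclass, field

MAX_REWIND_MS = 100  # assumed cap on how far back movement may be rewound


@dataclass
class Server:
    now_ms: int = 0
    log: list = field(default_factory=list)

    def on_move(self, player, client_time_ms, pos):
        # Movement: honour the client's timestamp, but never rewind
        # further than MAX_REWIND_MS and never apply it in the future.
        earliest = self.now_ms - MAX_REWIND_MS
        applied_at = min(self.now_ms, max(client_time_ms, earliest))
        self.log.append(("move", player, applied_at, pos))
        return applied_at

    def on_action(self, player, client_time_ms, action, allowed=True):
        # Actions: the client's timestamp is ignored. The action happens
        # at arrival time, and only if the server approves it.
        if not allowed:
            return None
        self.log.append(("action", player, self.now_ms, action))
        return self.now_ms
```

With the server clock at 1000 ms, a move stamped 950 is applied at 950, a move stamped 800 is clamped to 900, and a shot stamped 950 still happens at 1000.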
Wouldn't this potentially create collision detection issues? The player shoots at the place where he sees the monster (up to 100ms back in time), but by the time the action is executed, the monster is probably at a different location.
Yes, but you have to give in somewhere; everything is a trade-off. You need to design your game around these trade-offs. Is shooting in your ARPG really that precise? How often will that scenario present itself? Do monsters move about erratically?
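One way to put numbers on it is to estimate how far a monster can drift during the delay and compare that to how forgiving your hit test is. This is a back-of-the-envelope sketch: the function names, the speeds and the 0.5 unit hit radius are invented values, not anything from your game.

```python
def target_drift(speed_units_per_s: float, delay_ms: float) -> float:
    """Worst-case distance a target moves while the shot is in flight
    to the server, assuming it moves in a straight line at full speed."""
    return speed_units_per_s * delay_ms / 1000.0


def still_hits(speed_units_per_s: float, delay_ms: float, hit_radius: float) -> bool:
    """True if the worst-case drift stays inside the hit radius."""
    return target_drift(speed_units_per_s, delay_ms) <= hit_radius


# A slow monster (3 units/s) drifts about 0.3 units in 100 ms,
# a fast one (10 units/s) drifts 1.0 unit.
slow_ok = still_hits(3.0, 100, hit_radius=0.5)
fast_ok = still_hits(10.0, 100, hit_radius=0.5)
```

If most of your monsters fall into the first case, the mismatch will rarely be noticeable. If many fall into the second, a larger hit radius or slower monsters might be the cheaper fix.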