A few of the assumptions here aren't necessarily true.
First, on the number of bits:
An FPS can often get away with 24-bit fixed point for each of X and Y in position, and maybe as low as 20 bits for Z. At millimeter precision this gives you roughly +/- 8000 meters for X/Y, and +/- 500 meters for elevation. For velocity, you can often get away with 16 bits for each component; this gives you +/- 32 meters per second with millimeter-per-second precision. (That's over 100 km/h, which is fast enough for most running FPS characters.) At those widths, position plus velocity comes to 116 bits, or about 15 bytes; you can cram this into 8 bytes if you really try, depending on size of level and precision needed.
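To make the fixed-point idea concrete, here is a minimal sketch in Python. The function names and the 1000-steps-per-meter scale are my own choices for illustration, not something from a particular engine; out-of-range values are simply clamped.

```python
def quantize(value, bits, steps_per_unit=1000):
    """Convert a float to a signed fixed-point integer that fits in `bits` bits.

    steps_per_unit=1000 means millimeter (or mm/s) resolution. Values outside
    the representable range are clamped (an assumption of this sketch).
    """
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, round(value * steps_per_unit)))


def dequantize(q, steps_per_unit=1000):
    """Convert a fixed-point integer back to a float."""
    return q / steps_per_unit


def pack_fields(fields):
    """Pack a list of (signed_int, bits) pairs into one integer, first field highest."""
    packed, total = 0, 0
    for q, bits in fields:
        packed = (packed << bits) | (q & ((1 << bits) - 1))  # two's complement
        total += bits
    return packed, total


def unpack_fields(packed, widths):
    """Inverse of pack_fields: recover the signed integers given the bit widths."""
    out = []
    for bits in reversed(widths):
        raw = packed & ((1 << bits) - 1)
        packed >>= bits
        if raw >= 1 << (bits - 1):  # sign-extend
            raw -= 1 << bits
        out.append(raw)
    return out[::-1]


# Position: 24 + 24 + 20 bits, all at millimeter resolution.
POSITION_BITS = [24, 24, 20]
position = (1234.567, -8000.0, 42.0)
quantized = [quantize(v, b) for v, b in zip(position, POSITION_BITS)]
packed, total_bits = pack_fields(list(zip(quantized, POSITION_BITS)))
payload = packed.to_bytes((total_bits + 7) // 8, "big")
```

Velocity works the same way with three 16-bit fields. Getting down toward 8 bytes means shrinking the widths or coarsening the step, which is a per-game decision.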
For orientation, you often only need heading and pitch; most FPS games don't actually require twist around the forward-looking vector. You're always standing straight up, unless a special camera-affecting animation is playing, and you don't sync the physical object to that orientation. For a non-character object, you need all three axes, but you don't need to send a full quaternion for this. You can drop the component with the largest magnitude, after making sure it is positive (negate the whole quaternion if it isn't, since q and -q are the same rotation), then send two bits saying which component was dropped, followed by the other three components. The receiver recomputes the dropped one from the unit-length constraint. I know of games that send a quaternion using 32 bits total: 2 bits for the eliminated component, and 10 bits each for the other three. Note that when you drop the largest value, the next largest can be at most sqrt(0.5) in magnitude, so you don't even need to map the 10-bit range to the full -1 .. 1 range.
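A rough sketch of that 32-bit quaternion encoding, in Python. The bit layout (index in the top two bits, then three 10-bit fields) and the (x, y, z, w) component order are assumptions of this sketch; the games I mentioned may lay the bits out differently.

```python
import math

SCALE = math.sqrt(0.5)    # the three kept components lie in [-SCALE, +SCALE]
MAX10 = (1 << 10) - 1     # largest 10-bit value, 1023


def pack_quat(q):
    """Pack a unit quaternion (4 floats) into a 32-bit integer."""
    largest = max(range(4), key=lambda i: abs(q[i]))
    # q and -q are the same rotation, so flip signs to make the dropped value positive.
    sign = -1.0 if q[largest] < 0 else 1.0
    packed = largest                      # ends up in bits 30..31
    for i in range(4):
        if i == largest:
            continue
        v = q[i] * sign
        n = round((v / SCALE * 0.5 + 0.5) * MAX10)   # map [-SCALE, SCALE] to 0..1023
        packed = (packed << 10) | max(0, min(MAX10, n))
    return packed


def unpack_quat(packed):
    """Recover an approximate unit quaternion from pack_quat's output."""
    largest = packed >> 30
    vals = []
    for shift in (20, 10, 0):
        n = (packed >> shift) & MAX10
        vals.append((n / MAX10 * 2.0 - 1.0) * SCALE)
    # The dropped component follows from x^2 + y^2 + z^2 + w^2 = 1 and is positive.
    dropped = math.sqrt(max(0.0, 1.0 - sum(v * v for v in vals)))
    vals.insert(largest, dropped)
    return tuple(vals)
```

With 10 bits over a range of about 1.41, each kept component is off by at most roughly 0.0007, which is usually invisible for rendering but worth checking against your physics tolerances.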
For spin, you will likely need all three axes. In most physics systems, spin is actually stored as a vector along the axis of rotation, with the length of the vector being the angular speed (this ends up working out well when you multiply it by the time delta and turn it into a quaternion to apply to the orientation). So that's another three quantities, at whatever precision you prefer.
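To show why the axis-times-speed form is convenient, here is a small Python sketch that advances an orientation by one time step. It assumes (x, y, z, w) quaternions, spin in radians per second, and spin expressed in world space (so the delta rotation multiplies on the left); your physics engine may use different conventions.

```python
import math


def quat_mul(a, b):
    """Hamilton product of two quaternions in (x, y, z, w) layout."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
        aw * bw - ax * bx - ay * by - az * bz,
    )


def integrate_spin(q, spin, dt):
    """Rotate orientation q by the spin vector applied for dt seconds."""
    speed = math.sqrt(sum(c * c for c in spin))   # angular speed, rad/s
    angle = speed * dt
    if angle < 1e-12:
        return q                                   # no measurable rotation
    s = math.sin(angle / 2.0) / speed              # scales spin to sin(angle/2) * axis
    dq = (spin[0] * s, spin[1] * s, spin[2] * s, math.cos(angle / 2.0))
    return quat_mul(dq, q)


# Half a second at pi rad/s around Z is a quarter turn.
turned = integrate_spin((0.0, 0.0, 0.0, 1.0), (0.0, 0.0, math.pi), 0.5)
```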
Depending on how precise and big you need this to be, you are looking at between 16 and 32 bytes for a full snapshot.
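As a sanity check on that range, here is the arithmetic as a Python snippet. The position, velocity, and quaternion widths are the ones mentioned above; the 16 bits per spin component and 16 bits each for heading and pitch are just values I picked for the example.

```python
import math

# Bits per field for a free-moving object (spin width is an assumed value).
FULL_OBJECT = {
    "position": 24 + 24 + 20,
    "velocity": 3 * 16,
    "orientation": 32,        # quaternion with the largest component dropped
    "spin": 3 * 16,           # assumption: 16 bits per component
}

# Bits per field for a player character (heading/pitch widths are assumed).
CHARACTER = {
    "position": 24 + 24 + 20,
    "velocity": 3 * 16,
    "heading_pitch": 2 * 16,  # assumption: 16 bits each
}


def snapshot_bytes(field_bits):
    """Total snapshot size in whole bytes for a dict of field widths in bits."""
    return math.ceil(sum(field_bits.values()) / 8)


full_size = snapshot_bytes(FULL_OBJECT)   # 196 bits
char_size = snapshot_bytes(CHARACTER)     # 148 bits
```

That comes to 25 bytes for the full object and 19 bytes for the character, both inside the 16 to 32 byte range.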
Second, on the size of a packet:
If you want a guarantee of no fragmentation, a single IP datagram is limited to 1280 bytes minus packet overhead; that is the minimum MTU every IPv6 link must support, and it also fits on an Ethernet link. 1500 bytes is the size without fragmenting on Ethernet without jumbo frames, but half of all players don't even use plain Ethernet anymore; they use WiFi, or gigabit Ethernet with jumbo frames. And IP fragmentation isn't actually a big deal. You can jam a UDP packet full with 64 kB of data, and send it to the other end of the earth, and with very high likelihood, the full packet will make it to the other end. RakNet does this, and goes one step further: it may send multiple big UDP datagrams, and pack error-correcting codes into the different packets, so that if one is lost, the full datagram can be reconstructed anyway.
Sure, sending a megabyte per second for an 8-player FPS will not be viewed kindly by your players, but 30 bytes per player, 50 times a second, for 20 players won't break the bank for most players on the current internet. If you're a very small studio, it's probably not the most important quantity to optimize further. (30*20 is just 600 bytes, btw, sending that 50 times per second is 30 kB/second.)
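The same arithmetic as a Python snippet, if you want to plug in your own numbers. It counts payload only; the 40-byte IPv6 header (without extension headers) and 8-byte UDP header are used here just to check whether one tick's worth of snapshots fits in a 1280-byte datagram.

```python
IPV6_MIN_MTU = 1280   # bytes
IPV6_HEADER = 40      # fixed IPv6 header, no extension headers
UDP_HEADER = 8


def snapshot_bandwidth(bytes_per_player, players, rate_hz):
    """Return (payload bytes per tick, payload bytes per second)."""
    per_tick = bytes_per_player * players
    return per_tick, per_tick * rate_hz


per_tick, per_second = snapshot_bandwidth(30, 20, 50)
payload_budget = IPV6_MIN_MTU - IPV6_HEADER - UDP_HEADER
fits_in_one_datagram = per_tick <= payload_budget
```

For 30 bytes, 20 players, and 50 Hz this gives 600 bytes per tick and 30,000 bytes per second, and the 600 bytes fit comfortably in the 1232-byte payload budget.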