(Let BLUE be the memory address of the blue byte, RED be the memory address of the red byte, and TEMP be the memory address of the temp byte. Registers are denoted R followed by a number.)
Swapping
XOR
Note how few registers (extremely fast memory) we use, while we're hitting memory (comparatively much slower) quite a bit. In the swapping routine we hit memory 6 times, while in the XOR routine it's 9 times. So, assuming memory access dominates the cost, the swapping routine takes about a third less time than the XOR routine.
In actuality, it's even worse. The red and blue bytes are elements of an array. So not only do we have to hit the red and blue bytes themselves, but we ALSO have to retrieve the memory address of the array holding the data AND the offset of the element within the array (the number in the brackets). That adds 8 more memory hits to the swapping method (2 for each of its 4 array accesses), but EIGHTEEN more to the XOR method (2 for each of its 9). We're now up to 14 memory hits versus 27, so the swapping routine needs only about half as many. That's why you're getting the results you're getting.
However, there's a way to get around that and make the XOR more efficient. In fact, the only reason it becomes more efficient this way is that we don't have to allocate space for the temporary variable. That's done by using a wonderful keyword called "register". By putting "register" in front of a variable declaration, you ask the compiler to keep that variable in one of the processor's registers throughout its entire scope, which speeds it up since we're not pinging memory all the time. (It's only a request: the compiler is free to ignore it.)

With the temp variable held in a register, the swap needs just 4 memory hits instead of 6. The XOR version is similar.
It also needs 4 memory hits. Although it has three more assembly instructions (the XORs themselves), the XOR version is then the quicker one, because XOR operations are extremely fast, while allocating space for the "temp" variable takes more time.
Hope that explains everything!

~ Dragonus