Running the ARM instruction emulation test on a big-endian system
would fail, since the code doesn't respect endianness properly.
In EmulateInstructionARM::TestEmulation, code assumes that an
instruction opcode read in from the test file is in target byte
order, but it was in fact read in in host byte order.
More difficult to fix, the EmulationStateARM structure models
the overlapping sregs and dregs by a union in _sd_regs. This
only works correctly if the host is a little-endian system.
I've removed the union in favor of a simple array containing
the 32 sregs, and changed any code accessing dregs to explicitly
use the correct two sregs overlaying that dreg in the proper
target order.
Also, the EmulationStateARM::ReadPseudoMemory and WritePseudoMemory
track memory as a map of uint32_t values in host byte order, and
implement 64-bit memory accessing by splitting them up into two
uint32_t ones. However, callers expect memory contents to be
provided in the form of a byte array (in target byte order).
This means the uint32_t contents need to be byte-swapped on
BE systems, and when splitting up a 64-bit access into two 32-bit
ones, byte order has to be respected.