ARMv6 and earlier include optional support for big-endian instruction
encoding (in three all encoding variants: ARM, Thumb, and Thumb-2).
LLVM already supports assembling ARM code as big-endian by specifying
'armbe' as the target architecture; however, the disassembler
currently ignores the endian choice and always disassembles as
little-endian. This is a simple patch to fix that, plus a test.
Details
Diff Detail
- Repository
- rL LLVM
- Build Status
Buildable 19954 Build 19954: arc lint + arc unit
Event Timeline
Is this applicable only for ARMv6 case? From the patch it does not seem so and so may be good to add cases for ARMv7 as well.
test/MC/Disassembler/ARM/endian.txt | ||
---|---|---|
2 | Please consider adding a line description of the test. Kind of obvious from reading the test but may still be useful. |
Fix disassembling ARM instructions as big-endian
ARMv6 and earlier (and ARMv7-R) include optional support for
big-endian instruction encoding, in all three encoding variants:
ARM, Thumb, and Thumb-2. LLVM already supports assembling these
as big-endian by adding 'eb' to the architecture name in the triple
(e.g. 'armeb', 'armv6eb', 'thumbv7eb'); however, the disassembler
currently ignores the endian choice and always disassembles as
little-endian. This is a simple patch to fix that, plus a test.
[Sidenote: ARM supports configuring data and instruction endianness
separately, and ARMv7 does not deprecate big-endian *data* encoding.
Thus, the ARM proprietary toolchain and GNU toolchain support targets
where instructions are encoded is little-endian but data accesses
are big-endian. This is known as BE8 mode (--be8) and is implemented
in a hacky way: the compiler emits big-endian instructions and the
linker swaps them! LLVM does not currently support BE8 at all. In
theory, when parsing an ELF file with EF_ARM_BE8 in e_flags, the
disassembler should disassemble as little-endian, i.e. the old
behavior. However, I think that's out of scope for this patch,
which is just a bugfix.]
I just ran into what I think may be the same issue here in that llvm-objdump seems to ignore machine data (endian, possible mode) when disassembling ARM instructions on ARMv7. Is there any interest in completing this review and committing this particular fix (and making sure it works for ARMv7 as well)?
Just checking in again -- is there interest in completing this review? I'd be happy to assist, or I could post an updated version of it.
I would use support::endian::endianness instead of a bool. Then you can remove the /*isLittleEndian=*/ comments and use the read calls that take an endianness. E.g. uint16_t read16(const void *P, endianness E);.
I assume armv7eb would work out of the box, given that llvm-objdump already knows the file is big endian. (I thought there might be some ELF attribute we needed to look for)
Please consider adding a line description of the test. Kind of obvious from reading the test but may still be useful.