This is an archive of the discontinued LLVM Phabricator instance.

Fix disassembling ARM instructions as big-endian
Needs ReviewPublic

Authored by comex on Jul 1 2018, 3:26 PM.
Tokens
"Like" token, awarded by ivg.

Details

Summary

ARMv6 and earlier include optional support for big-endian instruction
encoding (in three all encoding variants: ARM, Thumb, and Thumb-2).
LLVM already supports assembling ARM code as big-endian by specifying
'armbe' as the target architecture; however, the disassembler
currently ignores the endian choice and always disassembles as
little-endian. This is a simple patch to fix that, plus a test.

Diff Detail

Event Timeline

comex created this revision.Jul 1 2018, 3:26 PM

Is this applicable only for ARMv6 case? From the patch it does not seem so and so may be good to add cases for ARMv7 as well.

test/MC/Disassembler/ARM/endian.txt
2

Please consider adding a line description of the test. Kind of obvious from reading the test but may still be useful.

comex updated this revision to Diff 153766.Jul 2 2018, 12:24 PM

Fix disassembling ARM instructions as big-endian

ARMv6 and earlier (and ARMv7-R) include optional support for
big-endian instruction encoding, in all three encoding variants:
ARM, Thumb, and Thumb-2. LLVM already supports assembling these
as big-endian by adding 'eb' to the architecture name in the triple
(e.g. 'armeb', 'armv6eb', 'thumbv7eb'); however, the disassembler
currently ignores the endian choice and always disassembles as
little-endian. This is a simple patch to fix that, plus a test.

[Sidenote: ARM supports configuring data and instruction endianness
separately, and ARMv7 does not deprecate big-endian *data* encoding.
Thus, the ARM proprietary toolchain and GNU toolchain support targets
where instructions are encoded is little-endian but data accesses
are big-endian. This is known as BE8 mode (--be8) and is implemented
in a hacky way: the compiler emits big-endian instructions and the
linker swaps them! LLVM does not currently support BE8 at all. In
theory, when parsing an ELF file with EF_ARM_BE8 in e_flags, the
disassembler should disassemble as little-endian, i.e. the old
behavior. However, I think that's out of scope for this patch,
which is just a bugfix.]

I just ran into what I think may be the same issue here in that llvm-objdump seems to ignore machine data (endian, possible mode) when disassembling ARM instructions on ARMv7. Is there any interest in completing this review and committing this particular fix (and making sure it works for ARMv7 as well)?

Herald added a project: Restricted Project. · View Herald TranscriptAug 24 2020, 4:00 PM
Herald added a subscriber: danielkiss. · View Herald Transcript

Just checking in again -- is there interest in completing this review? I'd be happy to assist, or I could post an updated version of it.

I would use support::endian::endianness instead of a bool. Then you can remove the /*isLittleEndian=*/ comments and use the read calls that take an endianness. E.g. uint16_t read16(const void *P, endianness E);.

I assume armv7eb would work out of the box, given that llvm-objdump already knows the file is big endian. (I thought there might be some ELF attribute we needed to look for)

ivg awarded a token.Apr 30 2021, 2:03 PM
ivg added a subscriber: ivg.
DavidSpickett resigned from this revision.Oct 7 2022, 2:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 7 2022, 2:30 AM