This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objdump,ARM] Fix big-endian AArch32 disassembly.
ClosedPublic

Authored by simon_tatham on Aug 1 2022, 7:31 AM.

Details

Summary

The ABI for big-endian AArch32, as specified by AAELF32, is above-
averagely complicated. Relocatable object files are expected to store
instruction encodings in byte order matching the ELF file's endianness
(so, big-endian for a BE ELF file). But executable images can
either do that or store instructions little-endian regardless
of data and ELF endianness (to support BE32 and BE8 platforms
respectively). They signal the latter by setting the EF_ARM_BE8 flag
in the ELF header.

(In the case of the Thumb instruction set, this all means that each
16-bit halfword of a Thumb instruction is stored in one or other
endianness. The two halfwords of a 32-bit Thumb instruction must
appear in the same order no matter what, because the first halfword is
the one that must avoid overlapping the encoding of any 16-bit Thumb
instruction.)

llvm-objdump was unconditionally expecting Arm instructions to be
stored little-endian. So it would correctly disassemble a BE8 image,
but if you gave it a BE32 image or a BE object file, it would retrieve
every instruction in byte-swapped form and disassemble it to
nonsense. (Even an object file output by LLVM itself, because
ARMMCCodeEmitter outputs instructions big-endian in big-endian mode,
which is correct for writing an object file.)

This patch allows llvm-objdump to correctly disassemble all three of
those classes of Arm ELF file. It does it by introducing a new
SubtargetFeature for big-endian instructions, setting it from the ELF
image type and flags during llvm-objdump setup, and teaching both
ARMDisassembler and llvm-objdump itself to pay attention to it when
retrieving instruction data from a section being disassembled.

Diff Detail

Event Timeline

simon_tatham created this revision.Aug 1 2022, 7:31 AM
simon_tatham requested review of this revision.Aug 1 2022, 7:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 1 2022, 7:31 AM

Updated with arc lint style fixes.

(I didn't apply them ahead of time because I was confused by output from arc that appeared to want to reformat most of ARMDisassembler.cpp! But it turns out that was some kind of intermediate output, and not part of the final patch it wanted to apply.)

DavidSpickett added inline comments.Aug 1 2022, 7:54 AM
llvm/lib/Target/ARM/ARM.td
748

When you say "in memory" here, does it refer to "memory" on the device or "memory" in the object file? I think from the last sentence, you mean inside the object file, which is what objdump cares about.

Because if I had an object file that was going to be byte reversed by the linker then I'd need to disassemble the object file in one endian, but if I was disassembling the content of memory on the device. I'd have to use the opposite endian.

simon_tatham added inline comments.Aug 1 2022, 7:57 AM
llvm/lib/Target/ARM/ARM.td
748

Hmmm. It's true, of course, that when you read a relocatable object file, the endianness in memory after linking is not yet specified, because the linker may or may not choose to byte-reverse the instructions and set the BE8 flag. But I had imagined it would be clear that "in memory" here means in the memory that the disassembler is reading from, not the memory on the target!

Do you have a suggested alternative wording that doesn't end up being a whole paragraph? Perhaps just deleting "in memory" would be simplest?

MaskRay added inline comments.Aug 2 2022, 12:19 AM
llvm/tools/llvm-objdump/llvm-objdump.cpp
683

printInst is called for every instruction. Calling checkFeatures here may be too expensive.

MaskRay added inline comments.Aug 2 2022, 12:21 AM
llvm/test/tools/llvm-objdump/ELF/ARM/be32-image-disasm.test
4 ↗(On Diff #449016)

Add -NEXT whenever appropriate

45 ↗(On Diff #449016)

delete ...

llvm/test/tools/llvm-objdump/ELF/ARM/be8-image-disasm.test
1 ↗(On Diff #449016)

Do we need two files? Grep yaml2obj .*-D and use it to combine tests into one file.

3 ↗(On Diff #449016)

Add -NEXT whenever appropriate

45 ↗(On Diff #449016)

delete ...

DavidSpickett added inline comments.Aug 2 2022, 2:52 AM
llvm/lib/Target/ARM/ARM.td
748

Perhaps just deleting "in memory" would be simplest?

Sounds good to me.

Addressed all review comments, I think.

I couldn't find any convenient way to make
STI.getFeatureBits()[ARM::ModeBigEndianInstructions] work in
llvm-objdump.cpp, because that tool doesn't have the right include
path to include the generated file where those enums are defined. So
I've made the prettyprinter have its own endianness flag that's set
directly at the same time as setting that feature in the
SubtargetInfo, and that will save having to check anything in the STI
at all.

LGTM if @MaskRay is cool with it.

I remembered that this patch https://reviews.llvm.org/D48811 was posted ages ago and never went anywhere. Do you have an opinion on that? It makes objdump disassemble as big endian if there is eb in the target name. From what I read here I don't think it's needed because the important thing is the format in the object file. Is that correct?

llvm/lib/Target/ARM/ARM.td
733

// Endianness of instruction encodings

I didn't know about that previous patch at all, but off the top of my head, yes, I don't think it should be necessary at least for ELF files, because the combination of ELF header endianness and EF_ARM_BE8 flag is enough to autodetect the right code endianness in all cases.

For other Arm object file formats, on the other hand, I can't say the same. I know nothing at all about big-endian support in Arm COFF, for example, including whether it has any BE support at all!

I wouldn't rule out the possibility that at some point an instance of MCDisassembler might need to have its code endianness manually set to the right state – but the facility provided in this commit can be reused for that, if and when the use case shows up, because all you have to do is to set the new big-endian-instructions subtarget feature.

MaskRay added inline comments.Aug 5 2022, 10:22 AM
llvm/test/tools/llvm-objdump/ELF/ARM/be-image-disasm.test
1 ↗(On Diff #449257)

You can place be-image-disasm.test and be-object-disasm.test in one file.

Grep docnum on existing tests.

4 ↗(On Diff #449257)

binary utility test directories use ## for non-RUN non-CHECK comments. They make comments stand out and they may be highlit with different styles in an editor.

This revision was not accepted when it landed; it landed in state Needs Review.Aug 8 2022, 2:50 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.

Oh, oops, caught out again :-( I really must stop assuming that a Phab comment saying "LGTM" actually marks the patch as accepted. Sorry about that. Will revert if you need me to.