This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objdump,ARM] Add PrettyPrinters for Arm and AArch64.
Closed, Public

Authored by simon_tatham on Jul 22 2022, 7:17 AM.

Details

Summary

Most Arm disassemblers, including GNU objdump and Arm's own fromelf,
emit an instruction's raw encoding as a 32-bit word or (for Thumb)
one or two 16-bit halfwords, in logical order rather than according to
their storage endianness. This is generally easier to read: it matches
the encoding diagrams in the architecture spec, it matches the value
you'd write in a .inst directive, and it means that fields within
the instruction encoding that span more than one byte (such as branch
offsets or SVC immediates) can be read directly in the encoding
without having to mentally reverse the bytes.

llvm-objdump already has a system of PrettyPrinter subclasses which
makes it easy for a target to drop in its own preferred formatting.
This patch adds pretty-printers for all the Arm targets, so that
llvm-objdump will display Arm instruction encodings in their preferred
layout instead of as individual bytes in little-endian storage order.
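The two layouts described above, and the idea of a target overriding the default, can be illustrated with a small self-contained sketch. `ToyPrinter`, `ToyArmPrinter` and `selectPrinter` are hypothetical names. They are not llvm-objdump's actual `PrettyPrinter` interface, and the sketch assumes little-endian instruction storage, as the reviewer's snippet in the timeline does. The example uses `svc #42`, which encodes as `0xef00002a` in Arm state and `0xdf2a` in Thumb state.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-target printer hook. This is NOT the real
// llvm-objdump PrettyPrinter interface; it only mirrors the override idea.
class ToyPrinter {
public:
  virtual ~ToyPrinter() = default;
  // Default layout: one hex byte per stored byte, in file order.
  virtual std::string encoding(const std::vector<uint8_t> &b) const {
    std::string out;
    char buf[4];
    for (size_t i = 0; i < b.size(); ++i) {
      std::snprintf(buf, sizeof buf, "%s%02x", i ? " " : "", unsigned(b[i]));
      out += buf;
    }
    return out;
  }
};

// Arm-style override (assumes little-endian instruction storage): group the
// bytes into 16-bit halfwords (Thumb) or 32-bit words (Arm/AArch64) and print
// each unit as one hex integer in logical order.
class ToyArmPrinter : public ToyPrinter {
  size_t unit; // bytes per displayed unit: 2 or 4
public:
  explicit ToyArmPrinter(size_t unitBytes) : unit(unitBytes) {
    assert(unit == 2 || unit == 4);
  }
  std::string encoding(const std::vector<uint8_t> &b) const override {
    std::string out;
    char buf[16];
    for (size_t pos = 0; pos + unit <= b.size(); pos += unit) {
      uint32_t v = 0;
      for (size_t i = 0; i < unit; ++i)
        v |= uint32_t(b[pos + i]) << (8 * i); // byte i is the i-th least significant
      std::snprintf(buf, sizeof buf, "%s%0*x", pos ? " " : "", int(unit * 2),
                    unsigned(v));
      out += buf;
    }
    return out;
  }
};

// Hypothetical selector: the target name picks the printer, so the
// disassembly loop itself needs no target-specific formatting code.
std::unique_ptr<ToyPrinter> selectPrinter(const std::string &arch) {
  if (arch == "thumb")
    return std::make_unique<ToyArmPrinter>(2);
  if (arch == "arm" || arch == "aarch64")
    return std::make_unique<ToyArmPrinter>(4);
  return std::make_unique<ToyPrinter>();
}
```

With the Arm printer, the stored bytes `2a 00 00 ef` display as `ef00002a`. The SVC immediate `00002a` is readable directly in the low digits, and the value matches what you would write in `.inst 0xef00002a`. A 32-bit Thumb instruction such as a `bl` displays as two halfwords, for example `f000 f800`, matching the "one or two 16-bit halfwords" layout.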


Event Timeline

simon_tatham created this revision. Jul 22 2022, 7:17 AM
simon_tatham requested review of this revision. Jul 22 2022, 7:17 AM
Herald added a project: Restricted Project. Jul 22 2022, 7:17 AM

Are there existing tests that check that this works for a big-endian object file?

llvm/tools/llvm-objdump/llvm-objdump.cpp
688

You could write this as:

size_t insn_bytes = STI.checkFeatures("+thumb-mode") ? 2 : 4;
for (; Pos + insn_bytes <= End; Pos += insn_bytes)
  OS << ' '
     << format_hex_no_prefix(
            llvm::support::endian::read<uint16_t>(
                Bytes.data() + Pos, llvm::support::little),
            insn_bytes * 2);
}
693

Does the endianness here have to match the endianness of the object file?

Updated to include fixes to all the affected lld tests, which also
make heavy use of llvm-objdump.

simon_tatham added inline comments. Jul 25 2022, 6:59 AM
llvm/tools/llvm-objdump/llvm-objdump.cpp
688

I don't think so, because you need to read<uint16_t> or read<uint32_t> depending on insn_bytes. If there had been an llvm::support::endian::read function that took the number of bytes of the integer as a runtime parameter, then yes, I could fold those branches together.
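The obstacle described in the reply is that `llvm::support::endian::read` takes the integer type as a template argument, so `read<uint16_t>` and `read<uint32_t>` are separate calls. To illustrate the alternative the reply mentions, here is a minimal sketch built on a run-time-width reader. `readLittleEndian` and `formatEncoding` are hypothetical names, not LLVM functions. Like the snippet above, the sketch assumes little-endian instruction storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper, NOT an existing LLVM API: read an unsigned
// little-endian integer whose width n (1..8 bytes) is a run-time value.
uint64_t readLittleEndian(const uint8_t *p, size_t n) {
  assert(n >= 1 && n <= 8);
  uint64_t v = 0;
  for (size_t i = 0; i < n; ++i)
    v |= uint64_t(p[i]) << (8 * i);
  return v;
}

// With a run-time width, the Thumb (2-byte) and Arm (4-byte) cases share one
// loop. Each unit is printed with a leading space and zero-padded to two hex
// digits per byte, as in the reviewer's snippet.
std::string formatEncoding(const std::vector<uint8_t> &bytes, bool thumbMode) {
  const size_t insnBytes = thumbMode ? 2 : 4;
  std::string out;
  char buf[24];
  for (size_t pos = 0; pos + insnBytes <= bytes.size(); pos += insnBytes) {
    unsigned long long v = readLittleEndian(bytes.data() + pos, insnBytes);
    std::snprintf(buf, sizeof buf, " %0*llx", int(insnBytes * 2), v);
    out += buf;
  }
  return out;
}
```

The trade-off is a byte loop at run time in place of a fixed-width templated read. For a disassembly listing this is unlikely to matter, but it is the kind of helper whose absence the reply points to.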

693

As per the comment in D130357: no. In all versions of the Arm architecture supported by LLVM MC, instructions are stored little-endian regardless of data endianness.

This revision is now accepted and ready to land. Jul 25 2022, 7:27 AM
This revision was landed with ongoing or failed builds. Jul 26 2022, 1:35 AM
This revision was automatically updated to reflect the committed changes.