This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objdump] Switch between ARM/Thumb based on mapping symbols.

Authored by efriedma on Apr 19 2019, 4:38 PM.



The ARMDisassembler changes allow changing between ARM and Thumb mode based on the MCSubtargetInfo, rather than the Target, which simplifies the other changes a bit.

I'm not really happy with adding more target-specific logic to tools/llvm-objdump/, but there any easy way around it: the logic in question specifically applies to disassembling an object file, and that code simply isn't located in lib/Target, at least at the moment.

Diff Detail


Event Timeline

efriedma created this revision.Apr 19 2019, 4:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 19 2019, 4:38 PM

I'm not really happy about putting the switching logic into llvm-objdump.cpp; it seems like there should be a better way to refactor the interface so the target-specific logic is somewhere inside the target.

I do feel the monolithic disassembleObject is hard to reason about. Do you have idea how to simplify it? On the other hand, I think the new addition of ThumbMappingSymsAddr ARMMappingSymsAddr should be ok, because the ARM specific logic is already there.


Since you are adding isArm32Elf, can you add isArm64Elf and delete isArmElf if that orthogonal property is desired?

I added Peter and Oliver since they might have some suggestions here too.


Add logic to check if you are defaulting to Thumb mode here (i.e. that was the triple that was passed).


Wouldn't this be easier to fix by retaining a createThumbDisassembler that calls a shared createARMThumbDisassembler with a flag to denote the default case for disassembly?

efriedma marked 2 inline comments as done.Apr 22 2019, 12:52 PM

In terms of how to generally improve disassembleObject, I guess there are a few different ways to think about it...

Currently, MCDisassembler::getInstruction is a "stateless" API which takes some bytes, and outputs a single instruction or instruction bundle. (We currently cheat a little on Thumb: the Thumb disassembler has a "mutable" member that keeps track of "it" blocks on Thumb. But that's mostly orthogonal.) That has some nice properties: essentially, it's easy to understand what it's doing. But it seems like it isn't rich enough for disassembling object files, so we have a bunch of code in llvm-objdump which has to deal with target-specific aspects of object files. We could instead add a disassembler API which is intended specifically for disassembling object files, and formally has state: essentially, the llvm-objdump would just pass the symbols and raw bits, and the API would handle the various target-specific aspects of disassembly.

Not sure it's worth the effort, though; I'm not sure there would be any use for it outside of llvm-objdump itself.


Not sure how that would work; getFeatureBits() only returns "on" or "off". There's no reasonable way to check the default. We should inherit the right default anyway when the target is used to construct the subtarget.


In some places the property we want is "can this object file have mapping symbols", which is currently equivalent to isArmElf. I guess I can add a separate helper for that, and make it check "isArmElf32 || isArmElf64"?

MaskRay marked an inline comment as done.Apr 22 2019, 6:49 PM
MaskRay added inline comments.


I've not got any great ideas about how to move the target switching logic out of llvm-objdump. I think that we would have to either construct a map (Arm/Thumb/Data) in llvm-objdump that we can pass to the disassembler and it can tell from the index which SubTarget to use assuming the disassembler also has the index, or we'd have to make the disassembler aware of symbols so it could look them up.

I think Stephen's suggestion of passing in a parameter of the initial Primary/Secondary based on the triple could be useful. In general for completely stripped binaries with no symbols getting any sensible disassembly is difficult. The only heuristics I know there is to see if the Entry Point has bit 0 set (Thumb) so we say if entry Thumb state then assume a Thumb triple.


Could we use isAArch64Elf() and isArmElf(), I think that this would be more consistent with the ABI documentation and the current implementation?

Rather than name all the subtargets it might be neater to check the e_machine field is EM_ARM or EM_AARCH64, although handling the elf class and endian may prove to be as complicated.


Rather than 4 separate vectors, would it be better to have a single one that stores a pair of Index, Letter. In a valid object there is only one mapping symbol at a particular address so each index would match a unique entry. The search code would need to pass in a query function to obtain a symbol of the right letter, but this might be a bit easier to read than 4 separate ones?


Worth updating the comment to "Arm and AArch64 ELF binaries"?


A way to extend this for Arm/Thumb in the case of an Object with no mapping symbols (Strip?) but other symbols are present then if there is a symbol at (Index & 0xfffffffe) then:

  • Symbol Type == STT_FUNC and bit 0 of symbol value is 1 then switch to Secondary STI
  • Symbol Type == STT_FUNC and bit 0 of symbol value is 0 then switch to Primary STI

In most cases this won't add much over just mapping symbols though and could easily be added as a follow up patch.


Would it be worth moving this up to line 1452 as the part that adds +thumb-mode is quite far from the part that does -thumb-mode. It took me longer than I'd like to work out why --triple=thumbv7a-linux-gnueabi didn't make the primary and secondary subtargets Thumb. Alternatively maybe worth a comment on 1432 saying something like Primary STI is always ARM secondary is always Thumb regardless of Triple.

efriedma marked an inline comment as done.Apr 23 2019, 12:03 PM
efriedma added inline comments.

I don't think there's any convenient way to access the e_machine field from here... ELFObjectFileBase has a getEMachine() method, but it's private, and the only other reasonable way to access the field is to downcast all the way to ELF32LEObjectFile/ELF32BEObjectFile. I guess we could change that?

efriedma updated this revision to Diff 196298.Apr 23 2019, 12:13 PM

Partially addresses review comments. Still need to think about the mapping symbol vectors a bit more.

peter.smith added inline comments.Apr 24 2019, 3:46 AM

Yes, it looks like there is some prior art in llvm-objdump to get the e_type field. In printDynamicRelocations there is:

const auto *Elf = dyn_cast<ELFObjectFileBase>(Obj);
  if (!Elf || Elf->getEType() != ELF::ET_DYN) {

where ElfObjectFileBase::getEType() is very similar to ElfObjectFileBase::getEType() so it could be possible to make it public if it made sense to do so. We test against e_machine quite a bit in LLD as we often don't need to worry about the specific subtarget.

efriedma updated this revision to Diff 205192.Jun 17 2019, 3:17 PM
efriedma marked 9 inline comments as done.
efriedma retitled this revision from [WIP][llvm-objdump] Switch between ARM/Thumb based on mapping symbols. to [llvm-objdump] Switch between ARM/Thumb based on mapping symbols..
efriedma edited the summary of this revision. (Show Details)
efriedma added a reviewer: peter.smith.

Switched to a single vector for mapping symbols. Switched isArmElf() to use getEMachine(). Updated title/commit message.

I think this addresses all the review comments.

peter.smith accepted this revision.Jun 18 2019, 4:12 AM

Thanks for the update. Looks good to me. I've made one comment about the v7r test but not knowing what the original intention behind the test was I'm not sure if it is worth changing.

3 ↗(On Diff #205192)

I think it depends on whether the dissassembly is supposed to be architecturally aware or not. It seems like GNU arm-none-eabi-objdump is not and will disassemble the instruction, whereas llvm-objdump is.

Strictly speaking the build-attributes are lying to objdump here as there should be:

.eabi_attribute	Tag_DIV_use, 2 // This code was permitted to use SDIV and UDIV in the ARM and Thumb ISAs

If --arm-add-attributes is used then llvm-mc will put this in. With this attribute then llvm-objdump will disassemble it. I'm not entirely sure what the intention of the test is though.

This revision is now accepted and ready to land.Jun 18 2019, 4:12 AM
This revision was automatically updated to reflect the committed changes.