This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objdump] Default to the "apple-latest" mcpu when disassembling arm64 Mach-O binaries
AbandonedPublic

Authored by mtrent on Apr 13 2020, 9:47 AM.

Details

Summary

This changes the Mach-O dumper to use the "apple-latest" Mcpu value
if one isn't specified by the command-line arguments or by the arch
triple.

Diff Detail

Event Timeline

mtrent created this revision.Apr 13 2020, 9:47 AM
MaskRay added a comment.EditedApr 13 2020, 10:39 AM

Can you clarify whether the intention is:

  • For MachOObjectFile::getArchTriple, we want to use a conservative arch flag, because it may be used by both producers and consumers. We don't want to impose an arch feature restriction for binaries created by the producers.
  • For llvm-objdump and may be llvm-nm/llvm-size, the read-only consumers, we want to use the latest arch flag so that all features can be decoded correctly.

In GNU objdump, -m is another option which takes an argument.

-m machine
 --architecture=machine
     Specify the architecture to use when disassembling object files.  This can be useful when disassembling object files which do not
     describe architecture information, such as S-records.  You can list the available architectures with the -i option.

     If the target is an ARM architecture then this switch has an additional effect.  It restricts the disassembly to only those
     instructions supported by the architecture specified by machine.  If it is necessary to use this switch because the input file does not
     contain any architecture information, but it is also desired to disassemble all the instructions use -marm.

Would it be possible to remove the -m alias for --macho?

llvm/test/tools/llvm-objdump/MachO/apple-latest-default.s
2

Add a comment about the purpose of the test.

## I would prefer comments starting with ##
# RUN: # is for RUN and CHECK lines
# CHECK: ...

Delete -show-encoding which is a no-op with the object output (-filetype=obj)

Prefer --macho to -m. It looks strange to me that without --macho, the instruction cannot be decoded (<unknown>). I did not expect the --macho flag should matter here.

Delete 2> %t because stderr is not needed by the test.

MaskRay retitled this revision from Default to the "apple-latest" mcpu when disassembling arm64 Mach-O binaries. to [llvm-objdump] Default to the "apple-latest" mcpu when disassembling arm64 Mach-O binaries.Apr 13 2020, 10:50 AM

In GNU objdump, -m is another option which takes an argument.

-m machine
 --architecture=machine
     Specify the architecture to use when disassembling object files.  This can be useful when disassembling object files which do not
     describe architecture information, such as S-records.  You can list the available architectures with the -i option.

     If the target is an ARM architecture then this switch has an additional effect.  It restricts the disassembly to only those
     instructions supported by the architecture specified by machine.  If it is necessary to use this switch because the input file does not
     contain any architecture information, but it is also desired to disassemble all the instructions use -marm.

Would it be possible to remove the -m alias for --macho?

I don't know. Maybe the time has come to remove otool from llvm and just let llvm be a copy of the gnu binutils. Feels like a topic for another thread.

Can you clarify whether the intention is:

  • For MachOObjectFile::getArchTriple, we want to use a conservative arch flag, because it may be used by both producers and consumers. We don't want to impose an arch feature restriction for binaries created by the producers.
  • For llvm-objdump and may be llvm-nm/llvm-size, the read-only consumers, we want to use the latest arch flag so that all features can be decoded correctly.

I don't understand the question?

Can you clarify whether the intention is:

  • For MachOObjectFile::getArchTriple, we want to use a conservative arch flag, because it may be used by both producers and consumers. We don't want to impose an arch feature restriction for binaries created by the producers.
  • For llvm-objdump and may be llvm-nm/llvm-size, the read-only consumers, we want to use the latest arch flag so that all features can be decoded correctly.

I don't understand the question?

Is it possible to make the (defaulting mcpu to "apple-latest") decision in a library, instead of in llvm-objdump and other individual tools?

Can you clarify whether the intention is:

  • For MachOObjectFile::getArchTriple, we want to use a conservative arch flag, because it may be used by both producers and consumers. We don't want to impose an arch feature restriction for binaries created by the producers.
  • For llvm-objdump and may be llvm-nm/llvm-size, the read-only consumers, we want to use the latest arch flag so that all features can be decoded correctly.

I don't understand the question?

Is it possible to make the (defaulting mcpu to "apple-latest") decision in a library, instead of in llvm-objdump and other individual tools?

Anything is possible, it's only software.

The request from Ahmed and others who work on the library code team was to do this in the tools. I think if they thought making this change that the library level was the right thing do to they would have done it themselves when they added "apple-latest" to llvm.org.

I can also make the change upstream from here, doing it in Apple's cctools otool interface. But people who use llvm-objdump to disassemble MachO binaries directly will find this useful, given that that the triple doesn't always infer the Mcpu option.

Ahmed, do you have anything to add here?

mtrent updated this revision to Diff 257560.Apr 14 2020, 4:53 PM

Add a comment about the purpose of the test.

Reformat the test to use octothorpes instead of slashes.

Replace -m and -d with --macho and --disassemble

Removed 2> %t

clang-formatted MachODump.cpp

ab added a comment.Apr 15 2020, 1:51 PM

Is it possible to make the (defaulting mcpu to "apple-latest") decision in a library, instead of in llvm-objdump and other individual tools?

Yes, that sounds reasonable. I'm curious, what tools do you have in mind?
The obvious (only?) candidate is MachOObjectFile::getArchTriple, as you mentioned, we can't really distinguish assemblers from disassemblers in the lower layers (other than MCDisassember obviously, but that's already too late). I think the mcpu inferred by getArchTriple is only used by MachODump. And to answer the other questions: llvm-nm and llvm-size shouldn't care about CPUs. lldb is another similar consumer that does care, and I don't think that uses the MachOObjectFile API either.

So picking apple-latest in getArchTriple should be NFC relative to this patch I think, but it sounds reasonable to me. I'm a bit uneasy about having that in MachOObjectFile, which isn't specific to disassembly, but I don't have an argument against it either.

lhames added a subscriber: lhames.Apr 20 2020, 2:57 PM

I would prefer to keep this change in the tool for now (as this patch does).

For disassembly purposes it makes sense to assume the newest CPU in case the object uses instructions from it. As a MachOObject file client on the other hand I would have expected getArchTriple to be conservative (i.e. "this object uses / requires the CPU specified"). I'm open to changing / reinterpreting getArchTriple's behavior, but I think that's a separate discussion from this patch and shouldn't block this going in.

lhames accepted this revision.Apr 21 2020, 11:57 AM

This looks good to me. Unless anyone else has any other feedback I think this should go in, minus the whole-file reformatting.

This revision is now accepted and ready to land.Apr 21 2020, 11:57 AM
mtrent abandoned this revision.May 10 2020, 10:22 PM