This is an archive of the discontinued LLVM Phabricator instance.

Allow ObjectFilePECOFF to initialize with ARM binaries.
ClosedPublic

Authored by sas on Apr 27 2016, 10:37 AM.

Event Timeline

sas updated this revision to Diff 55259.Apr 27 2016, 10:37 AM
sas retitled this revision to Allow ObjectFilePECOFF to initialize with ARM binaries.
sas updated this object.
sas added a reviewer: zturner.
sas added a subscriber: lldb-commits.

Nice! LLDB on ARM Windows! :)

Adding Omair and Saleem to approve, as hard-coding the triple may bring unwanted consequences.

compnerd accepted this revision.Apr 30 2016, 11:59 AM
compnerd edited edge metadata.
compnerd added inline comments.
source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp
149

This may be a bit tricky. armv7-windows is unsupported in LLVM/clang (and we silently rewrite that in the clang frontend), and you need thumbv7-windows (ARM NT). Though, it is possible that LLDB is unable to handle that distinction right now.

That said, the pc vendor is silly, and unknown sounds better to me, but this shouldn't matter too much.

Finally, the Windows environment defaults to msvc here, which has a slight issue: it can sometimes fail to generate an assembly listing (the code generation is correct, it's just a serialization issue caused by not having invested sufficiently in generating MASM-style assembly listings).

The safest triple would be thumbv7-unknown-windows-itanium. But, if lldb is going to ensure that the code is handled as thumb, using armv7 should be fine.

This revision is now accepted and ready to land.Apr 30 2016, 11:59 AM
omjavaid requested changes to this revision.May 2 2016, 7:43 AM
omjavaid edited edge metadata.

Please see inline comments.

source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp
149

Can we recognize armv7 and armv8 independent of each other?

For 32-bit ARM we normally use arm-linux-gnueabi independent of instruction modes (thumb or arm) and architecture versions (v5, v6 or v7).

And right now we do not specifically distinguish between thumb and arm triples.

Better to select arm-unknown-windows.

Please also handle the ARMv8 case, where the header is MachineArm64 and the triple should be something like aarch64-unknown-windows.
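For illustration, the mapping requested above could be sketched roughly as follows. This is a standalone sketch, not the code under review: the helper name `TripleForMachine` and the `kMachine*` constant names are hypothetical, and only the numeric machine values are taken from the PE/COFF specification.

```cpp
#include <cstdint>
#include <string>

// Machine field values from the PE/COFF specification.
// The constant names are hypothetical, chosen for this sketch only.
enum : uint16_t {
  kMachineArm = 0x01c0,   // ARM little endian
  kMachineThumb = 0x01c2, // Thumb
  kMachineArmNT = 0x01c4, // ARM Thumb-2 little endian (ARM NT)
  kMachineArm64 = 0xaa64, // ARM64 little endian
};

// Hypothetical helper: maps a COFF machine value to the triple strings
// suggested in this comment. Returns an empty string for machine types
// the sketch does not handle.
inline std::string TripleForMachine(uint16_t machine) {
  switch (machine) {
  case kMachineArm:
  case kMachineThumb:
  case kMachineArmNT:
    // No arm/thumb or sub-architecture distinction in the triple.
    return "arm-unknown-windows";
  case kMachineArm64:
    return "aarch64-unknown-windows";
  default:
    return "";
  }
}
```

All three 32-bit ARM machine values collapse to one triple here, which matches the suggestion not to distinguish thumb and arm at the triple level.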

This revision now requires changes to proceed.May 2 2016, 7:43 AM
sas added a comment.May 2 2016, 4:03 PM

@compnerd:

  • We don't use thumb-* triples in lldb as far as I can see. Thumb is handled just fine regardless of the triple.
  • pc vs unknown doesn't seem to matter either, and other code in this file uses pc (see a few lines above).
  • msvc vs itanium is also handled elsewhere, and I don't think there's a need to put it here. Furthermore, most windows binaries will probably follow the msvc ABI, not itanium, and forcing it to itanium without knowing what we're dealing with sounds wrong.

@omjavaid:

  • I could just use arm instead of armv7 but as far as I know, Windows Phone is a pure thumb environment, so the CPUs used will be armv7 and up.
  • I could add support for aarch64 in this file, but I've got no way of testing it at the moment, and it seems like a bad idea to advertise support for something we can't even test.

Given all of these, it seems like sticking with armv7-pc-windows or using arm-pc-windows might be the better solutions. Let me know what you guys think.

The few lines above are for the x86, x86_64 targets, which traditionally use the pc vendor as a legacy label.

@sas

Ideally it should be thumb-* if an environment is thumb only, but lldb right now has some areas where we still need to handle this case.
Either you put arm-* and leave it for someone else to correct the problems, or you put thumb-* and fix the issues that come up after that.
I think it is better to take the first option, with a note to come back and fix this once other areas in the code are able to handle it.

In D19604#419284, @sas wrote:
  • We don't use thumb-* triples in lldb as far as I can see. Thumb is handled just fine regardless of the triple.

This is a good strategy. Thumb is an instruction set, the "arm-" in the triple means the Architecture.

  • pc vs unknown doesn't seem to matter either, and other code in this file uses pc (see a few lines above).

AFAIK, "pc" is accepted but ignored, just like "unknown" and "gobbledygook".

  • I could just use arm instead of armv7 but as far as I know, Windows Phone is a pure thumb environment, so the CPUs used will be armv7 and up.

Er, this doesn't make sense. ARM cores support Thumb ever since ARMv4T (ARM7, circa '97). Thumb2, which is the version supported by ARMv7 cores, exists since ARMv6 (ARM11, circa '03).

It's best if you keep the triple free of sub-architecture choices and use -march to pick the right one. But some platforms have chosen to specify it on the triple, and we may have to follow suit to be compatible.

  • I could add support for aarch64 in this file, but I've got no way of testing it at the moment, and it seems like a bad idea to advertise support for something we can't even test.

Agree.

Given all of these, it seems like sticking with armv7-pc-windows or using arm-pc-windows might be the better solutions. Let me know what you guys think.

"arm-pc-windows" seems good to me. I'm also ok with "armv7-pc-windows" if that's the "accepted" triple on Windows world.

cheers,
--renato

fwiw, there are ARM cores that only support thumb - the Cortex M series. I doubt a windows phone is running one of those low-power chips though. The importance of the triple used will come into play when you try to evaluate an expression on the device -- lldb will call into llvm to parse/jit the expression into machine code for the device. For instance, if "arm-windows" translates to a minimum arm target, llvm may not think it has any fp/NEON instructions. As for lldb disassembling as arm/thumb (or instruction emulation for unwinding), today it doesn't use the CPSR/XPSR T bit to tell if the current instruction is thumb or not, it depends on annotations in the symbol file to do this correctly. I don't think it uses the 0th bit of saved pc values up the stack to tell this either -- it is a pretty fixed model of depending on the symbols to indicate whether they're arm or thumb.

If you look in the disassembler, lldb creates both an armv7 MCDisassembler and a thumbv7 MCDisassembler and picks one or the other depending on the symbol's "alternate ISA" flag or something along those lines.

fwiw, there are ARM cores that only support thumb - the Cortex M series.

And they're still "armv7". :)

Remember, "armv7" is *not* the same as ARMv7A+NEON. If the only thing you have is "armv7" or even "armv7-a", you *cannot* assume NEON is present.

The importance of the triple used will come into play when you try to evaluate an expression on the device -- lldb will call into llvm to parse/jit the expression into machine code for the device. For instance, if "arm-windows" translates to a minimum arm target, llvm may not think it has any fp/NEON instructions.

This is worrying. LLDB should *not* rely only on the triple, but all architectural options (mcpu/mtune/march/mfpu/mfloat-abi).

Without it, it can infer very little, and obscure bugs will crop up if the meaning of the triple changes slightly in the future.

As for lldb disassembling as arm/thumb (or instruction emulation for unwinding), today it doesn't use the CPSR/XPSR T bit to tell if the current instruction is thumb or not, it depends on annotations in the symbol file to do this correctly. I don't think it uses the 0th bit of saved pc values up the stack to tell this either -- it is a pretty fixed model of depending on the symbols to indicate whether they're arm or thumb.

These are the easy cases. You still need both disassemblers for areas where you have no idea what it is (system binaries, etc.), where you have to try both and see which one has the least number of errors. :)

cheers,
--renato

In the case of ELF, the .ARM.attributes section contains tags providing information on the underlying CPU specification used. That's only for the inferior being debugged; actually knowing which target we are running on, for example whether we are running on armv7 or armv8 hardware, is not decoded by LLDB right now.
For user space applications we don't need to know that either.

So the triple is just architecture-vendor-platform-*, but for detailed information we have to populate the arch specification structure after decoding the binary, which is something we need to do in the future.

Right now we can distinguish between hard and soft float based on ABI information in ELF. But we can't really tell if hard float is legacy VFP or NEON.
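A minimal sketch of the hard/soft float check, assuming the ABI information meant here is the `e_flags` field of the ELF header (that is an assumption, as are the names `FloatAbi`, `FloatAbiFromElfFlags` and the `kEfArm*` constants). The two bit values are the EABI version 5 float-ABI flags from the ARM ELF specification.

```cpp
#include <cstdint>

// Float-ABI bits in the ELF header e_flags field (ARM EABI version 5).
constexpr uint32_t kEfArmAbiFloatSoft = 0x00000200;
constexpr uint32_t kEfArmAbiFloatHard = 0x00000400;

enum class FloatAbi { Unknown, Soft, Hard };

// Hypothetical helper: classifies the float ABI from e_flags alone.
// It cannot say whether hard float means legacy VFP or NEON; that
// information is not carried by these flags.
inline FloatAbi FloatAbiFromElfFlags(uint32_t e_flags) {
  if (e_flags & kEfArmAbiFloatHard)
    return FloatAbi::Hard;
  if (e_flags & kEfArmAbiFloatSoft)
    return FloatAbi::Soft;
  return FloatAbi::Unknown;
}
```

The `Unknown` case covers objects that set neither bit, so callers would still need a fallback.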

Right now we can distinguish between hard and soft float based on ABI information in ELF. But we can't really tell if hard float is legacy VFP or NEON.

If the object has build attributes, it could help. But you also can't rely on it being there.

sas updated this revision to Diff 63542.Jul 11 2016, 11:07 AM
sas edited edge metadata.

Use arm-pc-windows instead of armv7-pc-windows.

sas updated this revision to Diff 120146.Oct 24 2017, 4:28 PM

Rebase.

This revision was automatically updated to reflect the committed changes.