Page MenuHomePhabricator

[XRay] ARM 32-bit no-Thumb support in LLVM
ClosedPublic

Authored by rSerge on Aug 26 2016, 9:58 AM.

Details

Summary

This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay instrumentation support is moving up to AsmPrinter.
This is one of 3 commits to different repositories of XRay ARM port. The other 2 are:

  1. https://reviews.llvm.org/D23932 (Clang test)
  2. https://reviews.llvm.org/D23933 (compiler-rt)

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
rengolin added inline comments.Aug 30 2016, 8:54 AM
lib/CodeGen/XRayInstrumentation.cpp
46–53 ↗(On Diff #69390)

Agreed. Probably move the separate comments to their implementations?

92 ↗(On Diff #69390)

Good point. Probably not correct.

126 ↗(On Diff #69390)

nit: this comment is better applied to the function "prependRetWithPatchableExit" after the case. People will know what to do in the future. You don't need a comment on the default case, too.

lib/Target/ARM/ARMAsmPrinter.cpp
1983 ↗(On Diff #69390)

No need for braces if you're not declaring variables.

lib/Target/ARM/ARMMCInstLower.cpp
158 ↗(On Diff #69390)

There isn't, as nop is currently only an alias, not an instruction. But take a look at:

ARMInstrInfo::getNoopForMachoTarget()

and do the same for ELF.

181 ↗(On Diff #69390)

Why just save r0? AAPCS can use all four r0-r3 for return results.

187 ↗(On Diff #69390)

BLX is unconditional, POP will never be executed. Is that intended?

198 ↗(On Diff #69390)

Please, don't emit binary directly. Use the builders.

This revision now requires changes to proceed.Aug 30 2016, 8:54 AM
dberris added inline comments.Aug 30 2016, 7:18 PM
lib/CodeGen/XRayInstrumentation.cpp
92 ↗(On Diff #69677)

Yes, this is definitely not correct. This is a remnant of some refactoring I've done and it stuck around. :(

Let me add a test and fix, should be trivial.

dberris requested changes to this revision.Aug 30 2016, 10:30 PM
dberris edited edge metadata.
dberris added inline comments.
lib/CodeGen/XRayInstrumentation.cpp
92 ↗(On Diff #69677)

This is now fixed in rL280192 -- please rebase to get the change (and tests).

Hi, I see a number of problems with this patch. The most common one is the direct emission of binary patterns, which is not clear nor maintainable. Please, use the builders to emit instructions.

Also, I'm worried that the space you're reserving for the binary patch won't be enough for all cases. There are a number of PCS issues (hard vs soft, larger-than-32bit returns, arch and sub-arch support of return styles) which you're not accounting for any of them.

Furthermore, you need to make sure thumb-interworking works. You're outputting ARM code, but the user code can very well be Thumb, so you need to make sure it works. Not all architectures support BLX either (ex. v4T), and POP { lr } has been deprecated.

Finally, you need tests. A lot of them. To make sure you are covering the architectures you intend, in all the configurations you intend, and to actively fail if you don't intend, by adding checks in the code that error out when the arch / sub-arch is in a combination you don't expect.

Hi,
Ok, I'll look if the same can be done with builders.
I'm not targeting all ARM architectures at once, at least not in the first commit. I think we should choose 1 ARM architecture for which XRay works, and assume the others not supported or experimental. Currently I am building and experimenting with armhf (32-bit).
Sled sizes do not have to fit all architectures either (this would result in waste of space for some, thus worse performance due to cache misses). Currently sleds are 11 bytes on x86_64 and 28 bytes on armhf.
What is PCS ?
Thumb is not supported yet.
Architectures which do not support BLX are not supported.
Any evidence that POP {lr} is deprecated? I could only find on the internet that "These instructions that include both PC and LR in the reglist are deprecated in ARMv6T2 and above.": http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0588b/Babefbce.html . I'm not using both pc and lr in PUSH or POP.
Any specific examples which more tests should be added for the single supported architecture armhf?

lib/CodeGen/XRayInstrumentation.cpp
126 ↗(On Diff #69677)

Moving the comments towards the function calls.

lib/Target/ARM/ARMAsmPrinter.cpp
1983 ↗(On Diff #69677)

Removing.

lib/Target/ARM/ARMMCInstLower.cpp
181 ↗(On Diff #69677)

We save the other registers in the trampoline (__xray_FunctionEntry and __xray_FunctionExit assembly functions).

187 ↗(On Diff #69677)

POP is intended to execute after return from the subroutine, which BLX calls.

rSerge updated this revision to Diff 69851.Aug 31 2016, 7:49 AM
rSerge edited edge metadata.

Updated with the changes requested in the comments.

rSerge marked 7 inline comments as done.Aug 31 2016, 7:51 AM

I'm not targeting all ARM architectures at once, at least not in the first commit. I think we should choose 1 ARM architecture for which XRay works, and assume the others not supported or experimental. Currently I am building and experimenting with armhf (32-bit).

Right, "armhf" is not one ARM architecture, but dozens. It can be anything from v6T2 to V8.2A, including all sub-architectures, features and variations. Though, from what I've seen so far, the code you use would work on any architecture of that range.

It would be safer, though, to document the *intended* target specifically, like "ARMv7A with VFPv3 support". So that people with "ARMv6T2 with VFPv2" support are not surprised when you assumed something "wrong" for them. Adding a "hasV6T2Ops()" check on the entry-point would help.

Sled sizes do not have to fit all architectures either (this would result in waste of space for some, thus worse performance due to cache misses). Currently sleds are 11 bytes on x86_64 and 28 bytes on armhf.

Check.

What is PCS ?

Procedure Call Standard. This is the part of the ABI that defines how functions are called to be compatible with the ABI. Mostly about how to serialise arguments and return values in registers, stack, etc.

Both C and C++, as well as any other language that wants to be compatible with ARM's EABI standard *have* to abide to those terms.

Thumb is not supported yet.

Do you mean not supported in the Sled code, or inserting ARM Sled code into Thumb functions?

If the former, then you have to check if the architecture/OS/ABI you're supporting allows ARM code. For instance, Windows doesn't.

If the latter, than you need to check if the architecture/OS/ABI you're supporting allows Thumb code. For instance, there could be libraries around, or even inline assembly with ".thumb" in it (yes, this does happen). I can't remember how, but there's a way to know what's the ISA for a specific function, this could help you. OTOH, this could be an assembler things, can't remember.

Any way, you need to check if the architecture/OS/ISA/ABI you have is compatible with your assumptions before you emit code.

Architectures which do not support BLX are not supported.

Fair enough. But as I said earlier, this has to be clearly encoded (via error messages) on the entry-point of your code.

Any evidence that POP {lr} is deprecated?

Sorry, my bad. I was thinking about a different case. Ignore me.

Any specific examples which more tests should be added for the single supported architecture armhf?

As I said earlier, you need to make sure you only emit your stubs on architectures that you know works. Checking the target for architecture level, ISA support and ABI should be enough, at least on the entry-point.

Adding tests is, then, easily done by having two files: one where everything should fail, RUNning with a "not" before "llc", CHECKing for error messages, and one where everything should pass, CHECKing for the correct sequence of Nops, etc.

It should be fine to add all error messages to one file and all cases that should pass to another.

cheers,
--renato

lib/Target/ARM/ARMMCInstLower.cpp
181 ↗(On Diff #69851)

Right, and these are guaranteed to only use one 32-bit argument. Check.

187 ↗(On Diff #69851)

D'oh, Branch&Link, sorry, you're correct.

test/CodeGen/ARM/xray-attribute-instrumentation.ll
5 ↗(On Diff #69851)

I was expecting Nops...

rSerge marked 6 inline comments as done.Sep 1 2016, 8:31 AM
rSerge added inline comments.
lib/Target/X86/X86AsmPrinter.h
74–94 ↗(On Diff #69851)

Do you mean the description for the diff? Or a comment in the source code?

test/CodeGen/ARM/xray-attribute-instrumentation.ll
5 ↗(On Diff #69851)

The first instruction is a jump over the NOPs. The other 6 instructions are NOPs.

rSerge updated this object.Sep 1 2016, 8:32 AM
rSerge edited edge metadata.
rengolin added inline comments.Sep 1 2016, 8:36 AM
test/CodeGen/ARM/xray-attribute-instrumentation.ll
5 ↗(On Diff #69851)

Right, I was referring to the .ascii... When you use builders, this won't happen any more. It will also work in big-endian. :)

I'm not targeting all ARM architectures at once, at least not in the first commit. I think we should choose 1 ARM architecture for which XRay works, and assume the others not supported or experimental. Currently I am building and experimenting with armhf (32-bit).

Right, "armhf" is not one ARM architecture, but dozens. It can be anything from v6T2 to V8.2A, including all sub-architectures, features and variations. Though, from what I've seen so far, the code you use would work on any architecture of that range.

Thanks for explaining. I am still starting with ARM and LLVM.

It would be safer, though, to document the *intended* target specifically, like "ARMv7A with VFPv3 support". So that people with "ARMv6T2 with VFPv2" support are not surprised when you assumed something "wrong" for them. Adding a "hasV6T2Ops()" check on the entry-point would help.

Ok, I'll try to select something more specific than armhf.

Sled sizes do not have to fit all architectures either (this would result in waste of space for some, thus worse performance due to cache misses). Currently sleds are 11 bytes on x86_64 and 28 bytes on armhf.

Check.

What is PCS ?

Procedure Call Standard. This is the part of the ABI that defines how functions are called to be compatible with the ABI. Mostly about how to serialise arguments and return values in registers, stack, etc.

Both C and C++, as well as any other language that wants to be compatible with ARM's EABI standard *have* to abide to those terms.

Thumb is not supported yet.

Do you mean not supported in the Sled code, or inserting ARM Sled code into Thumb functions?

Neither is supported. I estimated that Thumb support requires substantial additional effort.

If the former, then you have to check if the architecture/OS/ABI you're supporting allows ARM code. For instance, Windows doesn't.

If the latter, than you need to check if the architecture/OS/ABI you're supporting allows Thumb code. For instance, there could be libraries around, or even inline assembly with ".thumb" in it (yes, this does happen). I can't remember how, but there's a way to know what's the ISA for a specific function, this could help you. OTOH, this could be an assembler things, can't remember.

Yes, this looks like a lot of effort.

Any way, you need to check if the architecture/OS/ISA/ABI you have is compatible with your assumptions before you emit code.

Architectures which do not support BLX are not supported.

Fair enough. But as I said earlier, this has to be clearly encoded (via error messages) on the entry-point of your code.

Any evidence that POP {lr} is deprecated?

Sorry, my bad. I was thinking about a different case. Ignore me.

Any specific examples which more tests should be added for the single supported architecture armhf?

As I said earlier, you need to make sure you only emit your stubs on architectures that you know works. Checking the target for architecture level, ISA support and ABI should be enough, at least on the entry-point.

Adding tests is, then, easily done by having two files: one where everything should fail, RUNning with a "not" before "llc", CHECKing for error messages, and one where everything should pass, CHECKing for the correct sequence of Nops, etc.

It should be fine to add all error messages to one file and all cases that should pass to another.

cheers,
--renato

The amount of change requested in the code review seems too much for the first iteration. Can we limit the scope and plan incremental improvements?

Cheers,
Serge

rSerge updated this revision to Diff 70043.Sep 1 2016, 11:59 AM

Updated with the latest changes from mainline.

Do you mean not supported in the Sled code, or inserting ARM Sled code into Thumb functions?

Neither is supported. I estimated that Thumb support requires substantial additional effort.

My gut feeling is that this should mostly work already, since you're using BLX instructions.

But I agree, let's not get ahead of ourselves.

Limit support for ARMv7A, non-Windows (which forces Thumb2). Something like:

if (!SubTarget->hasV7Ops() || SubTarget->isWindows())
  return Forgerabarit.

cheers,
--renato

dberris added inline comments.Sep 1 2016, 11:14 PM
lib/Target/X86/X86AsmPrinter.h
74–94 ↗(On Diff #70043)

Definitely a description in the diff.

rSerge marked an inline comment as done.Sep 2 2016, 5:39 AM

Do you mean not supported in the Sled code, or inserting ARM Sled code into Thumb functions?

Neither is supported. I estimated that Thumb support requires substantial additional effort.

My gut feeling is that this should mostly work already, since you're using BLX instructions.

BLX r12 instruction has different machine code for ARM and Thumb. It is 4 byte long on ARM and 2 byte long on Thumb. Furthermore, the rest of machine code in a sled contains 32-bit ARM instructions. Thumb may need different machine code, or even sequence of instructions because not everything is available in Thumb. To avoid changing trampoline assembly code, the trampoline can be called with BLX indicating that the destination is in ARM assembly.

But I agree, let's not get ahead of ourselves.

Limit support for ARMv7A, non-Windows (which forces Thumb2). Something like:

if (!SubTarget->hasV7Ops() || SubTarget->isWindows())
  return Forgerabarit.

Ok.

rSerge updated this revision to Diff 70255.Sep 2 2016, 4:26 PM

Implemented the changes requested in the code review.

rSerge marked 4 inline comments as done.Sep 2 2016, 4:30 PM

Limit support for ARMv7A, non-Windows (which forces Thumb2). Something like:

if (!SubTarget->hasV7Ops() || SubTarget->isWindows())
  return Forgerabarit.

It seems that ARMv6 is sufficient. Implemented mostly as suggested.

lib/Target/ARM/ARMMCInstLower.cpp
159 ↗(On Diff #70255)

Changed.

199 ↗(On Diff #70255)

Done.

test/CodeGen/ARM/xray-attribute-instrumentation.ll
6 ↗(On Diff #70255)

Done.

Hi Serge,

The Nop emission is really simple, and the isXRaySupported() is really simple and accurate. Thanks for addressing all the comments, the code is looking really nice.

Now, two points:

  1. There are ways to report warnings/errors back to the front-end, but it depends how this is interpreted.

Since the instrumentation is inserted by the front end, than this should be a back-end *error*, and front-ends should fail with a decent error message saying "XRay is not supported for target X".

If you want just a warning, you can avoid inserting the sleds and the run-time code won't do anything, as you're doing it now. But you then have to warn the users that they won't get what they requested. I strongly suggest to make it an error instead.

For error messages, it's best to use "getContext().reportError(Loc, ...)", as this would nicely roll back to the front-end without crashing. But if that doesn't work (it should, really), you can use "report_fatal_error", "llvm_unreachable" or even an "assert()", though these are just last-resort only.

About front-end duplicating the checks, it's up to you and @dberris. The error message in Clang and llc should be the same, though, and reportError() does that well.

  1. Tests.

The current test is good, it checks the right number of NOPs and the overall structure. Excellent.

Now we need "negative tests", ie. those that *have* to fail. For that, you add a RUN line that starts with "not llc ..." and CHECK for the error messages. There are plenty of examples in there already.

Since you're restricting x86_64, you should have one for i386. Since you're restricting ARMv6/Unix, you should have one for ARMv5, and one for ARM Windows.

cheers,
--renato

Hi Serge,

The Nop emission is really simple, and the isXRaySupported() is really simple and accurate. Thanks for addressing all the comments, the code is looking really nice.

+1 -- thanks @rSerge!

Now, two points:

  1. There are ways to report warnings/errors back to the front-end, but it depends how this is interpreted.

    Since the instrumentation is inserted by the front end, than this should be a back-end *error*, and front-ends should fail with a decent error message saying "XRay is not supported for target X".

    If you want just a warning, you can avoid inserting the sleds and the run-time code won't do anything, as you're doing it now. But you then have to warn the users that they won't get what they requested. I strongly suggest to make it an error instead.

    For error messages, it's best to use "getContext().reportError(Loc, ...)", as this would nicely roll back to the front-end without crashing. But if that doesn't work (it should, really), you can use "report_fatal_error", "llvm_unreachable" or even an "assert()", though these are just last-resort only.

    About front-end duplicating the checks, it's up to you and @dberris. The error message in Clang and llc should be the same, though, and reportError() does that well.

I'm happy with an error using the usual error reporting mechanisms here.

  1. Tests.

    The current test is good, it checks the right number of NOPs and the overall structure. Excellent.

    Now we need "negative tests", ie. those that *have* to fail. For that, you add a RUN line that starts with "not llc ..." and CHECK for the error messages. There are plenty of examples in there already.

    Since you're restricting x86_64, you should have one for i386. Since you're restricting ARMv6/Unix, you should have one for ARMv5, and one for ARM Windows.

I agree with this. FWIW, I'm happy with getting this in and getting it tested, then locking it down with more negative tests once it's upstream.

Thanks Renato!

lib/Target/ARM/ARMAsmPrinter.h
102–106 ↗(On Diff #70255)

Do you already want to support tail call optimisation sleds now? Or did you plan to do something about that later?

dberris accepted this revision.Sep 5 2016, 6:15 PM
dberris edited edge metadata.

LGTM (I think we should be fine with adding more tests later)

Thanks again @rSerge!

rengolin accepted this revision.Sep 6 2016, 1:14 AM
rengolin edited edge metadata.

Thanks!

This revision is now accepted and ready to land.Sep 6 2016, 1:14 AM
rSerge marked 3 inline comments as done.Sep 6 2016, 12:17 PM
This comment was removed by rSerge.
This comment was removed by rSerge.
rSerge added a comment.EditedSep 6 2016, 1:04 PM

So something started to just remove the first instruction of the sled, whether the sled is emitted as binary or using instructions/builders. Clang -S generates the assembly file with correct sleds (all the instructions present), but then disassembly of the object or executable file shows only 6 last instructions, without the first instruction of the sled.
UPDATE: I just confused the compile options, so assembly files were new and object files were old. No problem with this in the code, tested.

rSerge updated this revision to Diff 70573.Sep 7 2016, 11:26 AM
rSerge edited edge metadata.

Rebased to the latest revision. I don't have commit access rights. Could someone commit?

I'll do it for all three, thanks again @rSerge!

For some reason the standard arc patch DNNNNN workflow doesn't apply to this patch (I'm not sure if it's generated in a manner not using arcanist). I've had to massage this manually by doing:

curl https://reviews.llvm.org/file/data/spjqzhddatjrbozzbl4u/PHID-FILE-7s4h3zdshadln2e7cgbi/D23931.diff  | git apply - -p0 --ignore-whitespace --whitespace=fix

I may have to do something similar to the other patches, so all landing errors will be mine.

This revision was automatically updated to reflect the committed changes.
rSerge added a comment.Sep 8 2016, 6:28 AM

Thanks all, especially @dberris .

dberris reopened this revision.Sep 8 2016, 9:24 PM

So, unfortunately this got reverted in rL280967 because it fails on thumb (as the checks hadn't been put in to not generate XRay sleds for non-thumb).

@rSerge -- are you able to put in the appropriate checks to warn when using XRay on thumb? @rengolin has offered to help with the testing on the build-bots to make this possible.

This revision is now accepted and ready to land.Sep 8 2016, 9:24 PM
dberris requested changes to this revision.Sep 8 2016, 9:25 PM
dberris edited edge metadata.

I think @rengolin has more details as to how this caused failures and how else to debug on thumb.

This revision now requires changes to proceed.Sep 8 2016, 9:25 PM
rSerge added a comment.Sep 9 2016, 4:53 AM

I don't yet understand how these commits could break build-bots. Did someone add -fxray-instrument clang option to bots which generate Thumb code?

I don't yet understand how these commits could break build-bots. Did someone add -fxray-instrument clang option to bots which generate Thumb code?

Nope. The error was when compiling xray_trampoline_arm.S.

Compiler-RT's patch enables XRay on ARM, which means it'll run all the existing XRay tests on ARM buildbots, which also mean Thumb ones, which also build XRay's sources.

This was the error message:

FAILED: /usr/lib/ccache/cc  -DXRAY_HAS_EXCEPTIONS=1 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Iprojects/compiler-rt/lib/xray -I/home/linaro/devel/buildbot/clang-cmake-thumbv7-a15-full-sh/llvm/projects/compiler-rt/lib/xray -Iinclude -I/home/linaro/devel/buildbot/clang-cmake-thumbv7-a15-full-sh/llvm/include -I/home/linaro/devel/buildbot/clang-cmake-thumbv7-a15-full-sh/llvm/projects/compiler-rt/lib/xray/.. -I/home/linaro/devel/buildbot/clang-cmake-thumbv7-a15-full-sh/llvm/projects/compiler-rt/lib/xray/../../include -fPIC -O3 -DNDEBUG   -UNDEBUG  -march=armv7-a -mfloat-abi=hard -fPIC -fno-builtin -fno-exceptions -fomit-frame-pointer -funwind-tables -fno-stack-protector -fvisibility=hidden -fvisibility-inlines-hidden -fno-function-sections -fno-lto -O3 -g -Wno-variadic-macros -Wno-non-virtual-dtor -MMD -MT projects/compiler-rt/lib/xray/CMakeFiles/clang_rt.xray-armhf.dir/xray_trampoline_arm.S.o -MF projects/compiler-rt/lib/xray/CMakeFiles/clang_rt.xray-armhf.dir/xray_trampoline_arm.S.o.d -o projects/compiler-rt/lib/xray/CMakeFiles/clang_rt.xray-armhf.dir/xray_trampoline_arm.S.o -c /home/linaro/devel/buildbot/clang-cmake-thumbv7-a15-full-sh/llvm/projects/compiler-rt/lib/xray/xray_trampoline_arm.S

llvm/projects/compiler-rt/lib/xray/xray_trampoline_arm.S: Assembler messages:
llvm/projects/compiler-rt/lib/xray/xray_trampoline_arm.S:17: Error: attempt to use an ARM instruction on a Thumb-only processor -- `push {r1-r3,lr}'

My patch didn't work because this is using the system compiler, and not Clang, and GCC was picky about assembling ARM instructions (from xray_trampoline_arm.S) into an object that will be linked with other Thumb-only objects.

This will require some experimentation with a cross GCC/binutils, to make sure that -mthumb won't generate the NOPs as well as not try to link the code. An #ifndef __thumb__ in xray_trampoline_arm.S to omit everything could work, so if the Clang implementation is wrong, we get a compiler error instead of a run-time error.

cheers,
--renato

rSerge added a comment.Sep 9 2016, 9:36 AM

Now I understand, thanks, @rengolin . Thumb is on my list, though I thought it can be done later. Now I need to weigh whether all the work with conditional compilation and error reporting for Thumb is not too much w.r.t. the time to just implement the support for Thumb. I'm looking into this...

rSerge updated this revision to Diff 71627.Sep 16 2016, 6:23 AM
rSerge edited edge metadata.
rSerge removed rL LLVM as the repository for this revision.

Fixed "Error: attempt to use an ARM instruction on a Thumb-only processor -- `push {r1-r3,lr}' ". The reason was ".arch armv7" directive. This directive for GCC represents the intersection of arm7v-a and armv7-m instruction sets, implying Thumb-only instructions, and this conflicts with ".code 32" directive. Then GCC, instead of articulating the conflict, complains about every instruction in the assembly file.
Tested on cross-compilation with GCC from x86_64-Ubuntu to ARM-Linux.
Tested on cross-compilation with Clang from x86_64-Windows to ARM-Linux.

rSerge updated this revision to Diff 71634.Sep 16 2016, 6:47 AM
rSerge edited edge metadata.

Fixed patch file format.

Fixed "Error: attempt to use an ARM instruction on a Thumb-only processor -- `push {r1-r3,lr}' ". The reason was ".arch armv7" directive. This directive for GCC represents the intersection of arm7v-a and armv7-m instruction sets, implying Thumb-only instructions, and this conflicts with ".code 32" directive. Then GCC, instead of articulating the conflict, complains about every instruction in the assembly file.

Ah, yes! This makes sense.

dberris accepted this revision.Sep 18 2016, 5:03 PM
dberris edited edge metadata.

Thanks @rSerge -- I'll land this and dependent patches again.

Cheers

This revision is now accepted and ready to land.Sep 18 2016, 5:03 PM
This revision was automatically updated to reflect the committed changes.