This is an archive of the discontinued LLVM Phabricator instance.


Differential D23931

[XRay] ARM 32-bit no-Thumb support in LLVM
Closed, Public

Authored by rSerge on Aug 26 2016, 9:58 AM.


Details

Reviewers

dberris
rengolin
t.p.northover
zatrazz
asl

Commits

rG4640154446cb: [XRay] ARM 32-bit no-Thumb support in LLVM
rG17d94e279e43: [XRay] ARM 32-bit no-Thumb support in LLVM
rL281878: [XRay] ARM 32-bit no-Thumb support in LLVM
rL280888: [XRay] ARM 32-bit no-Thumb support in LLVM

Summary

This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay instrumentation support is being moved up into AsmPrinter.
This is one of 3 commits to different repositories for the XRay ARM port. The other 2 are:

https://reviews.llvm.org/D23932 (Clang test)
https://reviews.llvm.org/D23933 (compiler-rt)

Diff Detail

Event Timeline

rSerge updated this revision to Diff 69390 (Aug 26 2016, 9:58 AM).

rSerge set the title of this revision to [XRay] ARM 32-bit no-Thumb support.

rSerge updated this object.

rSerge added reviewers: dberris, rengolin, asl, t.p.northover.

rSerge added a subscriber: llvm-commits.

Herald added subscribers: dberris, samparker, rengolin, aemerson (Aug 26 2016, 9:58 AM).

rSerge added a parent revision: D19904: XRay: Add entry and exit sleds (Aug 26 2016, 9:59 AM).

rSerge retitled this revision from [XRay] ARM 32-bit no-Thumb support to [XRay] ARM 32-bit no-Thumb support in LLVM.

rSerge added a child revision: D23932: [XRay] ARM 32-bit no-Thumb support in Clang (Aug 26 2016, 10:12 AM).

rSerge added a child revision: D23933: [XRay] ARM 32-bit no-Thumb support in compiler-rt (Aug 26 2016, 10:21 AM).

rSerge updated this object.

rengolin added a reviewer: zatrazz (Aug 26 2016, 11:17 AM).

iid_iunknown added a subscriber: iid_iunknown (Aug 26 2016, 11:31 AM).

dberris requested changes to this revision (Aug 28 2016, 5:51 PM).

dberris edited edge metadata.

dberris added inline comments.

include/llvm/CodeGen/AsmPrinter.h
209	Do you need to spell out 'class' here? Wouldn't `const Function*` suffice?
include/llvm/Target/Target.td
969	AFAICT, yes, this is correct. The expectation is that this instruction should only ever show up in the assembler as a pseudo instruction (unless this is doing something else).
970	This one is a little harder. At least on x86, we weren't able to get this to work this way, because stack adjustments may happen after the marker instruction is inserted. Unless you can control exactly when this instruction is inserted, and guarantee that the stack adjustment code never moves it (or adds things after it), you might want to do the same thing that we're doing in X86.
include/llvm/Target/TargetOpcodes.def
161	Is there any reason to do this instead of following the same convention used in x86 of having the nops be after the return instruction?
lib/CodeGen/XRayInstrumentation.cpp
46–53	This is a great explanation. Can you say something similar in the description just so it's clear why there's a difference in the approach?
lib/Target/X86/X86AsmPrinter.h
74–94	I think it's worth noting in the description that we're moving the XRay instrumentation support up to AsmPrinter too.

This revision now requires changes to proceed (Aug 28 2016, 5:51 PM).

rSerge added inline comments (Aug 30 2016, 7:19 AM).

include/llvm/CodeGen/AsmPrinter.h
209	I just moved (copy-pasted) this from X86AsmPrinter.h. Without `class` it does not compile, because XRayFunctionEntry already has a member with the same name: `const MCSymbol *Function`.
include/llvm/Target/Target.td
970	The same approach as on x86_64 is not possible for ARM, because ARM has multiple return instructions. Furthermore, the CPU allows parameterized and even conditional return instructions. The current ARM implementation relies on the fact that LLVM currently doesn't seem to generate conditional return instructions. On ARM, the same instruction can be used to pop multiple registers from the stack and return (it simply pops the `pc` register too), and LLVM sometimes generates it. So we can't insert the sled between this stack adjustment and the return without splitting the original instruction in two. So on ARM, rather than jumping into the exit trampoline, we call it; it does the tracing, preserves the stack and returns.
include/llvm/Target/TargetOpcodes.def
161	Yes, as I've explained above, the problem is that ARM has multiple return instructions, so we have to preserve the original return instruction and call the exit tracing trampoline instead of jumping into it. I'm adding a comment in the code too.
lib/CodeGen/XRayInstrumentation.cpp
46–53	I'm adding it to llvm\include\llvm\Target\TargetOpcodes.def .
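The reasoning above (ARM has several return forms, including a `pop` that also loads `pc`, so the original return must be kept intact) can be illustrated with a toy model: an exit marker is inserted in front of every instruction that writes `pc`, instead of building the sled around the return itself as x86_64 does. The `Insn` type, the `XRAY_EXIT_MARKER` label and `insertExitMarkers` are hypothetical; this is not LLVM's MachineInstr API or the pass added in this patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy instruction: its assembly text and whether it writes pc. On ARM any
// pc-writing instruction ("bx lr", "pop {..., pc}") ends the function here.
struct Insn {
  std::string Text;
  bool WritesPC;
};

// Place a hypothetical exit marker *before* each return and keep the return
// itself untouched, so a "pop {r4, pc}" never has to be split in two.
std::vector<Insn> insertExitMarkers(const std::vector<Insn> &Body) {
  assert(!Body.empty() && Body.back().WritesPC &&
         "toy function body must end in a return");
  std::vector<Insn> Out;
  Out.reserve(Body.size() * 2);
  for (const Insn &I : Body) {
    if (I.WritesPC)
      Out.push_back({"XRAY_EXIT_MARKER", false});
    Out.push_back(I);
  }
  return Out;
}
```

Labels are modelled as separate entries, so for a body such as `cmp`, `bne .Lelse`, `pop {r4, pc}`, `.Lelse:`, `bx lr`, a marker lands right before `pop {r4, pc}` and right before `bx lr` (after the `.Lelse:` branch target), and every other instruction passes through unchanged.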

Implemented the requested changes (more comments).

Hi, I see a number of problems with this patch. The most common one is the direct emission of binary patterns, which is neither clear nor maintainable. Please use the builders to emit instructions.

Also, I'm worried that the space you're reserving for the binary patch won't be enough for all cases. There are a number of PCS issues (hard vs soft float, larger-than-32-bit returns, arch and sub-arch support for return styles), none of which you're accounting for.

Furthermore, you need to make sure Thumb interworking works: you're outputting ARM code, but the user code may well be Thumb. Not all architectures support BLX either (e.g. v4T), and POP { lr } has been deprecated.

Finally, you need tests. A lot of them. To make sure you are covering the architectures you intend, in all the configurations you intend, and to actively fail on those you don't, by adding checks in the code that error out when the arch / sub-arch combination is one you don't expect.

rengolin added inline comments (Aug 30 2016, 8:54 AM).

lib/CodeGen/XRayInstrumentation.cpp
46–53	Agreed. Probably move the separate comments to their implementations?
92	Good point. Probably not correct.
126	nit: this comment is better applied to the function "prependRetWithPatchableExit" after the case. People will know what to do in the future. You don't need a comment on the default case either.
lib/Target/ARM/ARMAsmPrinter.cpp
1983	No need for braces if you're not declaring variables.
lib/Target/ARM/ARMMCInstLower.cpp
158	There isn't, as nop is currently only an alias, not an instruction. But take a look at: ARMInstrInfo::getNoopForMachoTarget() and do the same for ELF.
181	Why just save r0? AAPCS can use all four r0-r3 for return results.
187	BLX is unconditional, POP will never be executed. Is that intended?
198	Please don't emit binary directly. Use the builders.

This revision now requires changes to proceed (Aug 30 2016, 8:54 AM).

dberris added inline comments (Aug 30 2016, 7:18 PM).

lib/CodeGen/XRayInstrumentation.cpp
92	Yes, this is definitely not correct. This is a remnant of some refactoring I've done and it stuck around. :( Let me add a test and fix, should be trivial.

dberris requested changes to this revision (Aug 30 2016, 10:30 PM).

dberris edited edge metadata.

dberris added inline comments.

lib/CodeGen/XRayInstrumentation.cpp
92	This is now fixed in rL280192 -- please rebase to get the change (and tests).

In D23931#529004, @rengolin wrote:

Hi, I see a number of problems with this patch. The most common one is the direct emission of binary patterns, which is neither clear nor maintainable. Please use the builders to emit instructions.

Also, I'm worried that the space you're reserving for the binary patch won't be enough for all cases. There are a number of PCS issues (hard vs soft float, larger-than-32-bit returns, arch and sub-arch support for return styles), none of which you're accounting for.

Furthermore, you need to make sure Thumb interworking works: you're outputting ARM code, but the user code may well be Thumb. Not all architectures support BLX either (e.g. v4T), and POP { lr } has been deprecated.

Finally, you need tests. A lot of them. To make sure you are covering the architectures you intend, in all the configurations you intend, and to actively fail on those you don't, by adding checks in the code that error out when the arch / sub-arch combination is one you don't expect.

Hi,
OK, I'll see whether the same can be done with builders.
I'm not targeting all ARM architectures at once, at least not in the first commit. I think we should choose 1 ARM architecture for which XRay works, and assume the others not supported or experimental. Currently I am building and experimenting with armhf (32-bit).
Sled sizes do not have to fit all architectures either (this would result in a waste of space for some, thus worse performance due to cache misses). Currently sleds are 11 bytes on x86_64 and 28 bytes on armhf.
What is PCS?
Thumb is not supported yet.
Architectures which do not support BLX are not supported.
Any evidence that POP {lr} is deprecated? The only thing I could find online is that "These instructions that include both PC and LR in the reglist are deprecated in ARMv6T2 and above.": http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0588b/Babefbce.html . I'm not using both pc and lr in PUSH or POP.
Are there any specific examples for which more tests should be added for the single supported architecture, armhf?

lib/CodeGen/XRayInstrumentation.cpp
126	Moving the comments towards the function calls.
lib/Target/ARM/ARMAsmPrinter.cpp
1983	Removing.
lib/Target/ARM/ARMMCInstLower.cpp
181	We save the other registers in the trampoline (`__xray_FunctionEntry` and `__xray_FunctionExit` assembly functions).
187	`POP` is intended to execute after return from the subroutine, which `BLX` calls.
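The `BLX`/`POP` exchange can be illustrated with a toy simulation: `BLX` is a call (branch with link), so once the trampoline returns, execution continues with the `POP`. The split of responsibilities follows the thread (the sled saves `r0`; the trampoline saves `r1`-`r3` and `lr`, matching the trampoline's `push {r1-r3,lr}` that appears in a build error later in this thread). That the sled saves `lr` alongside `r0` is an assumption, as are all names and the register-file model; this is not the actual sled or trampoline code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Toy register file: r0-r3 and lr (stored at index 4).
enum Reg { R0 = 0, R1, R2, R3, LR };

struct Machine {
  std::array<uint32_t, 5> R{};
  std::vector<uint32_t> Stack;

  // PUSH {list}: save the listed registers on the toy stack.
  void push(std::initializer_list<Reg> List) {
    for (Reg I : List)
      Stack.push_back(R[I]);
  }

  // POP {list}: restore in the reverse order of the matching PUSH.
  void pop(std::initializer_list<Reg> List) {
    std::vector<Reg> Order(List);
    for (auto It = Order.rbegin(); It != Order.rend(); ++It) {
      assert(!Stack.empty() && "pop from an empty toy stack");
      R[*It] = Stack.back();
      Stack.pop_back();
    }
  }
};

// Stand-in for the runtime trampoline: it saves r1-r3 and lr itself, records
// the function id passed in r0, clobbers r1-r3, then restores them.
void trampoline(Machine &M, std::vector<uint32_t> &Trace) {
  M.push({R1, R2, R3, LR});
  Trace.push_back(M.R[R0]);
  M.R[R1] = M.R[R2] = M.R[R3] = 0xDEADBEEF;
  M.pop({R1, R2, R3, LR});
} // returning from this function plays the role of "bx lr"

// Toy patched sled (assumed shape): PUSH {r0, lr}; r0 = id; BLX; POP {r0, lr}.
void runSled(Machine &M, uint32_t FuncId, std::vector<uint32_t> &Trace) {
  M.push({R0, LR});
  M.R[R0] = FuncId;
  M.R[LR] = 0x1000; // BLX writes the return address into lr (toy value)
  trampoline(M, Trace);
  // BLX is a call, so control comes back here and the POP does execute.
  M.pop({R0, LR});
}
```

After `runSled`, `r0`-`r3` and `lr` hold their original values and the stack is balanced, which is why saving only `r0` in the sled is enough when the trampoline takes care of the rest.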

Updated with the changes requested in the comments.

In D23931#530324, @rSerge wrote:

I'm not targeting all ARM architectures at once, at least not in the first commit. I think we should choose 1 ARM architecture for which XRay works, and assume the others not supported or experimental. Currently I am building and experimenting with armhf (32-bit).

Right, "armhf" is not one ARM architecture, but dozens. It can be anything from v6T2 to v8.2A, including all sub-architectures, features and variations. Though, from what I've seen so far, the code you use would work on any architecture in that range.

It would be safer, though, to document the *intended* target specifically, like "ARMv7A with VFPv3 support". So that people with "ARMv6T2 with VFPv2" support are not surprised when you assumed something "wrong" for them. Adding a "hasV6T2Ops()" check on the entry-point would help.

Sled sizes do not have to fit all architectures either (this would result in a waste of space for some, thus worse performance due to cache misses). Currently sleds are 11 bytes on x86_64 and 28 bytes on armhf.

Check.

What is PCS?

Procedure Call Standard. This is the part of the ABI that defines how functions are called to be compatible with the ABI. Mostly about how to serialise arguments and return values in registers, stack, etc.

Both C and C++, as well as any other language that wants to be compatible with ARM's EABI standard, *have* to abide by those terms.

Thumb is not supported yet.

Do you mean not supported in the Sled code, or inserting ARM Sled code into Thumb functions?

If the former, then you have to check if the architecture/OS/ABI you're supporting allows ARM code. For instance, Windows doesn't.

If the latter, then you need to check if the architecture/OS/ABI you're supporting allows Thumb code. For instance, there could be libraries around, or even inline assembly with ".thumb" in it (yes, this does happen). I can't remember how, but there's a way to know what the ISA is for a specific function; this could help you. OTOH, this could be an assembler thing, I can't remember.

Anyway, you need to check whether the architecture/OS/ISA/ABI you have is compatible with your assumptions before you emit code.

Architectures which do not support BLX are not supported.

Fair enough. But as I said earlier, this has to be clearly encoded (via error messages) on the entry-point of your code.

Any evidence that POP {lr} is deprecated?

Sorry, my bad. I was thinking about a different case. Ignore me.

Are there any specific examples for which more tests should be added for the single supported architecture, armhf?

As I said earlier, you need to make sure you only emit your stubs on architectures that you know work. Checking the target for architecture level, ISA support and ABI should be enough, at least at the entry-point.

Adding tests is, then, easily done by having two files: one where everything should fail, RUNning with a "not" before "llc", CHECKing for error messages, and one where everything should pass, CHECKing for the correct sequence of Nops, etc.

It should be fine to add all error messages to one file and all cases that should pass to another.

cheers,
--renato

lib/Target/ARM/ARMMCInstLower.cpp
181	Right, and these are guaranteed to only use one 32-bit argument. Check.
187	D'oh, Branch&Link, sorry, you're correct.
test/CodeGen/ARM/xray-attribute-instrumentation.ll
6	I was expecting Nops...

rSerge marked 6 inline comments as done (Sep 1 2016, 8:31 AM).

rSerge added inline comments.

lib/Target/X86/X86AsmPrinter.h
74–94	Do you mean the description for the diff? Or a comment in the source code?
test/CodeGen/ARM/xray-attribute-instrumentation.ll
6	The first instruction is a jump over the NOPs. The other 6 instructions are NOPs.
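The 28-byte armhf sled size quoted earlier follows directly from this layout, because every ARM (A32) instruction is 4 bytes wide. The sketch below spells out the arithmetic; the constant and function names are hypothetical. The branch offset reflects that in ARM state `pc` reads as the address of the current instruction plus 8, so a branch at the start of the sled that lands just past its end encodes a byte offset of 20 rather than 28.

```cpp
#include <cstddef>

// Every ARM (A32) instruction is 4 bytes wide.
constexpr std::size_t kArmInsnBytes = 4;

// Unpatched sled as described in the review: one branch over the NOPs,
// followed by six NOPs.
constexpr std::size_t kArmSledNops = 6;
constexpr std::size_t kArmSledInsns = 1 + kArmSledNops;
constexpr std::size_t kArmSledBytes = kArmSledInsns * kArmInsnBytes;

// In ARM state pc reads as "address of the current instruction + 8", so a
// branch at the start of the sled that lands just past its end encodes an
// offset of (sled size - 8) bytes.
constexpr std::size_t kArmPcReadAhead = 8;
constexpr std::size_t armSledBranchOffset() {
  return kArmSledBytes - kArmPcReadAhead;
}

static_assert(kArmSledBytes == 28, "28-byte armhf sled quoted in the review");
```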

rSerge updated this object (Sep 1 2016, 8:32 AM).

rSerge edited edge metadata.

rengolin added inline comments (Sep 1 2016, 8:36 AM).

test/CodeGen/ARM/xray-attribute-instrumentation.ll
6	Right, I was referring to the .ascii... When you use builders, this won't happen any more. It will also work in big-endian. :)

In D23931#530369, @rengolin wrote:

In D23931#530324, @rSerge wrote:

I'm not targeting all ARM architectures at once, at least not in the first commit. I think we should choose 1 ARM architecture for which XRay works, and assume the others not supported or experimental. Currently I am building and experimenting with armhf (32-bit).

Right, "armhf" is not one ARM architecture, but dozens. It can be anything from v6T2 to v8.2A, including all sub-architectures, features and variations. Though, from what I've seen so far, the code you use would work on any architecture in that range.

Thanks for explaining. I am still new to ARM and LLVM.

It would be safer, though, to document the *intended* target specifically, like "ARMv7A with VFPv3 support". So that people with "ARMv6T2 with VFPv2" support are not surprised when you assumed something "wrong" for them. Adding a "hasV6T2Ops()" check on the entry-point would help.

Ok, I'll try to select something more specific than armhf.

Sled sizes do not have to fit all architectures either (this would result in a waste of space for some, thus worse performance due to cache misses). Currently sleds are 11 bytes on x86_64 and 28 bytes on armhf.

Check.

What is PCS?

Procedure Call Standard. This is the part of the ABI that defines how functions are called to be compatible with the ABI. Mostly about how to serialise arguments and return values in registers, stack, etc.

Both C and C++, as well as any other language that wants to be compatible with ARM's EABI standard, *have* to abide by those terms.

Thumb is not supported yet.

Do you mean not supported in the Sled code, or inserting ARM Sled code into Thumb functions?

Neither is supported. I estimated that Thumb support requires substantial additional effort.

If the former, then you have to check if the architecture/OS/ABI you're supporting allows ARM code. For instance, Windows doesn't.

If the latter, then you need to check if the architecture/OS/ABI you're supporting allows Thumb code. For instance, there could be libraries around, or even inline assembly with ".thumb" in it (yes, this does happen). I can't remember how, but there's a way to know what the ISA is for a specific function; this could help you. OTOH, this could be an assembler thing, I can't remember.

Yes, this looks like a lot of effort.

Anyway, you need to check whether the architecture/OS/ISA/ABI you have is compatible with your assumptions before you emit code.

Architectures which do not support BLX are not supported.

Fair enough. But as I said earlier, this has to be clearly encoded (via error messages) on the entry-point of your code.

Any evidence that POP {lr} is deprecated?

Sorry, my bad. I was thinking about a different case. Ignore me.

Are there any specific examples for which more tests should be added for the single supported architecture, armhf?

As I said earlier, you need to make sure you only emit your stubs on architectures that you know work. Checking the target for architecture level, ISA support and ABI should be enough, at least at the entry-point.

Adding tests is, then, easily done by having two files: one where everything should fail, RUNning with a "not" before "llc", CHECKing for error messages, and one where everything should pass, CHECKing for the correct sequence of Nops, etc.

It should be fine to add all error messages to one file and all cases that should pass to another.

cheers,
--renato

The amount of change requested in the code review seems too much for the first iteration. Can we limit the scope and plan incremental improvements?

Cheers,
Serge

Updated with the latest changes from mainline.

In D23931#531728, @rSerge wrote:

Do you mean not supported in the Sled code, or inserting ARM Sled code into Thumb functions?

Neither is supported. I estimated that Thumb support requires substantial additional effort.

My gut feeling is that this should mostly work already, since you're using BLX instructions.

But I agree, let's not get ahead of ourselves.

Limit support for ARMv7A, non-Windows (which forces Thumb2). Something like:

if (!SubTarget->hasV7Ops() || SubTarget->isTargetWindows())
  return false; // XRay is not supported on this target

cheers,
--renato

dberris added inline comments (Sep 1 2016, 11:14 PM).

lib/Target/X86/X86AsmPrinter.h
74–94	Definitely a description in the diff.

In D23931#531853, @rengolin wrote:

In D23931#531728, @rSerge wrote:

Do you mean not supported in the Sled code, or inserting ARM Sled code into Thumb functions?

Neither is supported. I estimated that Thumb support requires substantial additional effort.

My gut feeling is that this should mostly work already, since you're using BLX instructions.

The BLX r12 instruction has different machine code in ARM and Thumb: it is 4 bytes long in ARM and 2 bytes long in Thumb. Furthermore, the rest of the machine code in a sled consists of 32-bit ARM instructions. Thumb may need different machine code, or even a different sequence of instructions, because not everything is available in Thumb. To avoid changing the trampoline assembly code, the trampoline can be called with BLX, indicating that the destination is ARM code.
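To make the size difference concrete, this sketch assembles `BLX r12` by hand from the register-form encodings in the ARM Architecture Reference Manual: the A32 form (condition AL) is one 32-bit word, while the 16-bit Thumb form fits in a halfword. The helper names are hypothetical, only the register form is handled, and this is not LLVM's encoder.

```cpp
#include <cstdint>

// A32 BLX (register): cond | 0001 0010 | 1111 1111 1111 | 0011 | Rm.
// With cond = AL (0xE) this is a 32-bit (4-byte) instruction.
constexpr uint32_t encodeArmBlx(unsigned Rm) {
  return 0xE12FFF30u | (Rm & 0xFu);
}

// 16-bit Thumb BLX (register): 0100 0111 1 | Rm (4 bits) | 000.
constexpr uint16_t encodeThumbBlx(unsigned Rm) {
  return static_cast<uint16_t>(0x4780u | ((Rm & 0xFu) << 3));
}

static_assert(encodeArmBlx(12) == 0xE12FFF3Cu, "blx r12 in ARM state");
static_assert(encodeThumbBlx(12) == 0x47E0u, "blx r12 in Thumb state");
```

BLX (register) also selects the destination's instruction set from bit 0 of the target address (0 means ARM, 1 means Thumb), which is what lets Thumb code call an ARM-state trampoline in principle; the sled bytes themselves, however, would still have to be re-encoded for Thumb.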

But I agree, let's not get ahead of ourselves.

Limit support for ARMv7A, non-Windows (which forces Thumb2). Something like:
if (!SubTarget->hasV7Ops() || SubTarget->isTargetWindows())
  return false; // XRay is not supported on this target

Ok.

Implemented the changes requested in the code review.

Limit support for ARMv7A, non-Windows (which forces Thumb2). Something like:
if (!SubTarget->hasV7Ops() || SubTarget->isTargetWindows())
  return false; // XRay is not supported on this target

It seems that ARMv6 is sufficient. Implemented mostly as suggested.
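The gating that came out of this discussion (ARMv6 or later, ARM state available, not Windows) can be sketched as a pure predicate over a few target properties. `MockArmTarget` and its fields are hypothetical stand-ins rather than LLVM's ARM subtarget queries, and the `ThumbOnly` field is an added assumption covering cores without ARM state (such as M-profile ones); the sketch does not claim to match the committed code line for line.

```cpp
// Hypothetical summary of the target properties discussed in the review.
// These are plain fields, not LLVM's ARMSubtarget queries.
struct MockArmTarget {
  int ArchVersion; // e.g. 5 for ARMv5TE, 6 for ARMv6, 7 for ARMv7-A
  bool ThumbOnly;  // assumption: cores with no ARM state, such as M-profile
  bool Windows;    // Windows on ARM requires Thumb-2
};

// Emit XRay sleds only for ARMv6+ targets that can execute ARM code and are
// not Windows; everything else should be rejected with a clear error.
constexpr bool isXRaySupported(MockArmTarget T) {
  return T.ArchVersion >= 6 && !T.ThumbOnly && !T.Windows;
}
```

With these rules an ARMv6 or ARMv7-A Linux target passes, while ARMv5, Windows on ARM and a Thumb-only core are all rejected, which are the kinds of cases the negative tests requested later in the review would exercise.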

lib/Target/ARM/ARMMCInstLower.cpp
158	Changed.
198	Done.
test/CodeGen/ARM/xray-attribute-instrumentation.ll
7	Done.

Hi Serge,

The Nop emission is really simple, and the isXRaySupported() is really simple and accurate. Thanks for addressing all the comments, the code is looking really nice.

Now, two points:

There are ways to report warnings/errors back to the front-end, but it depends on how this is interpreted.

Since the instrumentation is inserted by the front end, this should be a back-end *error*, and front-ends should fail with a decent error message saying "XRay is not supported for target X".

If you want just a warning, you can avoid inserting the sleds and the run-time code won't do anything, as you're doing it now. But you then have to warn the users that they won't get what they requested. I strongly suggest to make it an error instead.

For error messages, it's best to use "getContext().reportError(Loc, ...)", as this would nicely roll back to the front-end without crashing. But if that doesn't work (it should, really), you can use "report_fatal_error", "llvm_unreachable" or even an "assert()", though these are just last-resort only.

About front-end duplicating the checks, it's up to you and @dberris. The error message in Clang and llc should be the same, though, and reportError() does that well.

Tests.

The current test is good, it checks the right number of NOPs and the overall structure. Excellent.

Now we need "negative tests", i.e. those that *have* to fail. For that, you add a RUN line that starts with "not llc ..." and CHECK for the error messages. There are plenty of examples in there already.

Since you're restricting x86_64, you should have one for i386. Since you're restricting ARMv6/Unix, you should have one for ARMv5, and one for ARM Windows.

cheers,
--renato

In D23931#533606, @rengolin wrote:

Hi Serge,

The Nop emission is really simple, and the isXRaySupported() is really simple and accurate. Thanks for addressing all the comments, the code is looking really nice.

+1 -- thanks @rSerge!

Now, two points:

There are ways to report warnings/errors back to the front-end, but it depends on how this is interpreted.

Since the instrumentation is inserted by the front end, this should be a back-end *error*, and front-ends should fail with a decent error message saying "XRay is not supported for target X".

If you want just a warning, you can avoid inserting the sleds and the run-time code won't do anything, as you're doing it now. But you then have to warn the users that they won't get what they requested. I strongly suggest to make it an error instead.

For error messages, it's best to use "getContext().reportError(Loc, ...)", as this would nicely roll back to the front-end without crashing. But if that doesn't work (it should, really), you can use "report_fatal_error", "llvm_unreachable" or even an "assert()", though these are just last-resort only.

About front-end duplicating the checks, it's up to you and @dberris. The error message in Clang and llc should be the same, though, and reportError() does that well.

I'm happy with an error using the usual error reporting mechanisms here.

Tests.

The current test is good, it checks the right number of NOPs and the overall structure. Excellent.

Now we need "negative tests", i.e. those that *have* to fail. For that, you add a RUN line that starts with "not llc ..." and CHECK for the error messages. There are plenty of examples in there already.

Since you're restricting x86_64, you should have one for i386. Since you're restricting ARMv6/Unix, you should have one for ARMv5, and one for ARM Windows.

I agree with this. FWIW, I'm happy with getting this in and getting it tested, then locking it down with more negative tests once it's upstream.

Thanks Renato!

lib/Target/ARM/ARMAsmPrinter.h
102–106	Do you already want to support tail call optimisation sleds now? Or did you plan to do something about that later?

LGTM (I think we should be fine with adding more tests later)

Thanks again @rSerge!

Thanks!

This revision is now accepted and ready to land (Sep 6 2016, 1:14 AM).

rSerge marked 3 inline comments as done (Sep 6 2016, 12:17 PM).


rSerge added a comment (Sep 6 2016, 12:56 PM).


So something started removing just the first instruction of the sled, whether the sled was emitted as binary or via instructions/builders. Clang -S generates the assembly file with correct sleds (all the instructions present), but disassembly of the object or executable file shows only the last 6 instructions, without the first instruction of the sled.
UPDATE: I had mixed up the compile options, so the assembly files were new and the object files were old. There is no problem in the code; tested.

Rebased to the latest revision. I don't have commit access rights. Could someone commit?

I'll do it for all three, thanks again @rSerge!

For some reason the standard arc patch DNNNNN workflow doesn't apply to this patch (I'm not sure whether it was generated without using arcanist). I've had to massage this manually by doing:

curl https://reviews.llvm.org/file/data/spjqzhddatjrbozzbl4u/PHID-FILE-7s4h3zdshadln2e7cgbi/D23931.diff  | git apply - -p0 --ignore-whitespace --whitespace=fix

I may have to do something similar to the other patches, so all landing errors will be mine.

Closed by commit rL280888: [XRay] ARM 32-bit no-Thumb support in LLVM (authored by dberris; Sep 7 2016, 5:27 PM).

This revision was automatically updated to reflect the committed changes.

Thanks all, especially @dberris .

So, unfortunately, this got reverted in rL280967 because it fails on Thumb (the checks restricting XRay sled generation to non-Thumb code hadn't been put in).

@rSerge -- are you able to put in the appropriate checks to warn when using XRay on Thumb? @rengolin has offered to help with the testing on the build-bots to make this possible.

This revision is now accepted and ready to land (Sep 8 2016, 9:24 PM).

I think @rengolin has more details on how this caused failures and how else to debug on Thumb.

This revision now requires changes to proceed (Sep 8 2016, 9:25 PM).

dberris mentioned this in rL280889: [XRay] ARM 32-bit no-Thumb support in Clang (Sep 8 2016, 9:27 PM).

dberris mentioned this in D23932: [XRay] ARM 32-bit no-Thumb support in Clang.

dberris mentioned this in D23933: [XRay] ARM 32-bit no-Thumb support in compiler-rt (Sep 8 2016, 9:31 PM).

I don't yet understand how these commits could break the build-bots. Did someone add the -fxray-instrument Clang option to bots which generate Thumb code?

In D23931#538093, @rSerge wrote:

I don't yet understand how these commits could break the build-bots. Did someone add the -fxray-instrument Clang option to bots which generate Thumb code?

Nope. The error was when compiling xray_trampoline_arm.S.

Compiler-RT's patch enables XRay on ARM, which means all the existing XRay tests run on the ARM buildbots, including the Thumb ones, and those bots also build XRay's sources.

This was the error message:

FAILED: /usr/lib/ccache/cc  -DXRAY_HAS_EXCEPTIONS=1 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Iprojects/compiler-rt/lib/xray -I/home/linaro/devel/buildbot/clang-cmake-thumbv7-a15-full-sh/llvm/projects/compiler-rt/lib/xray -Iinclude -I/home/linaro/devel/buildbot/clang-cmake-thumbv7-a15-full-sh/llvm/include -I/home/linaro/devel/buildbot/clang-cmake-thumbv7-a15-full-sh/llvm/projects/compiler-rt/lib/xray/.. -I/home/linaro/devel/buildbot/clang-cmake-thumbv7-a15-full-sh/llvm/projects/compiler-rt/lib/xray/../../include -fPIC -O3 -DNDEBUG   -UNDEBUG  -march=armv7-a -mfloat-abi=hard -fPIC -fno-builtin -fno-exceptions -fomit-frame-pointer -funwind-tables -fno-stack-protector -fvisibility=hidden -fvisibility-inlines-hidden -fno-function-sections -fno-lto -O3 -g -Wno-variadic-macros -Wno-non-virtual-dtor -MMD -MT projects/compiler-rt/lib/xray/CMakeFiles/clang_rt.xray-armhf.dir/xray_trampoline_arm.S.o -MF projects/compiler-rt/lib/xray/CMakeFiles/clang_rt.xray-armhf.dir/xray_trampoline_arm.S.o.d -o projects/compiler-rt/lib/xray/CMakeFiles/clang_rt.xray-armhf.dir/xray_trampoline_arm.S.o -c /home/linaro/devel/buildbot/clang-cmake-thumbv7-a15-full-sh/llvm/projects/compiler-rt/lib/xray/xray_trampoline_arm.S

llvm/projects/compiler-rt/lib/xray/xray_trampoline_arm.S: Assembler messages:
llvm/projects/compiler-rt/lib/xray/xray_trampoline_arm.S:17: Error: attempt to use an ARM instruction on a Thumb-only processor -- `push {r1-r3,lr}'

My patch didn't work because this is using the system compiler, and not Clang, and GCC was picky about assembling ARM instructions (from xray_trampoline_arm.S) into an object that will be linked with other Thumb-only objects.

This will require some experimentation with a cross GCC/binutils, to make sure that with -mthumb we neither generate the NOPs nor try to link the code. An #ifndef __thumb__ in xray_trampoline_arm.S that omits everything could work, so that if the Clang implementation is wrong, we get a compiler error instead of a run-time error.

cheers,
--renato
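The `#ifndef __thumb__` idea relies on GCC and Clang predefining `__thumb__` whenever they generate Thumb code. A minimal sketch of the same test in C++ follows; the constant names are hypothetical, and in `xray_trampoline_arm.S` the equivalent would be wrapping the ARM-only body in `#ifndef __thumb__` / `#endif`.

```cpp
// GCC and Clang predefine __thumb__ when generating Thumb code for ARM
// targets, so this mirrors the guard proposed for xray_trampoline_arm.S.
#if defined(__thumb__)
constexpr bool kBuiltAsThumb = true;
#else
constexpr bool kBuiltAsThumb = false;
#endif

// Hypothetical flag a runtime could consult before installing ARM-only
// trampolines; on non-ARM hosts __thumb__ is undefined, so this is true.
constexpr bool kArmTrampolinesAvailable = !kBuiltAsThumb;
```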

Now I understand, thanks, @rengolin. Thumb is on my list, though I thought it could be done later. Now I need to weigh whether all the work on conditional compilation and error reporting for Thumb is worth it compared with the time needed to just implement Thumb support. I'm looking into this...

Fixed "Error: attempt to use an ARM instruction on a Thumb-only processor -- `push {r1-r3,lr}' ". The reason was the ".arch armv7" directive. For GCC, this directive represents the intersection of the armv7-a and armv7-m instruction sets, implying Thumb-only instructions, and this conflicts with the ".code 32" directive. GCC then, instead of reporting the conflict, complains about every instruction in the assembly file.
Tested on cross-compilation with GCC from x86_64-Ubuntu to ARM-Linux.
Tested on cross-compilation with Clang from x86_64-Windows to ARM-Linux.
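The root cause can be pictured as a set intersection. In the toy bitmask model below (hypothetical names; real GCC architecture handling tracks far more than two execution states), intersecting an A-profile with both ARM and Thumb state and an M-profile that is Thumb-only leaves only Thumb, so a later `.code 32` asks for a state the selected architecture does not have.

```cpp
#include <cstdint>

// Toy bitmask of the instruction-set states a profile can execute.
enum IsaState : uint8_t {
  kArmState = 0x1,   // 32-bit ARM (A32) code, what ".code 32" selects
  kThumbState = 0x2, // Thumb / Thumb-2 code
};

// Toy models of the two profiles named in the thread.
constexpr uint8_t kArmv7A = kArmState | kThumbState; // A-profile: both states
constexpr uint8_t kArmv7M = kThumbState;             // M-profile: Thumb only

// As described for GCC's ".arch armv7": only what both profiles share.
constexpr uint8_t kArchArmv7 = kArmv7A & kArmv7M;

// ".code 32" needs ARM state; this reports whether an arch permits it.
constexpr bool allowsCode32(uint8_t Arch) { return (Arch & kArmState) != 0; }
```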

Fixed patch file format.

In D23931#544741, @rSerge wrote:

Fixed "Error: attempt to use an ARM instruction on a Thumb-only processor -- `push {r1-r3,lr}' ". The reason was the ".arch armv7" directive. For GCC, this directive represents the intersection of the armv7-a and armv7-m instruction sets, implying Thumb-only instructions, and this conflicts with the ".code 32" directive. GCC then, instead of reporting the conflict, complains about every instruction in the assembly file.

Ah, yes! This makes sense.

Thanks @rSerge -- I'll land this and dependent patches again.

Cheers

This revision is now accepted and ready to land (Sep 18 2016, 5:03 PM).

Closed by commit rL281878: [XRay] ARM 32-bit no-Thumb support in LLVM (authored by dberris; Sep 18 2016, 6:03 PM).

This revision was automatically updated to reflect the committed changes.

rSerge added a child revision: D24799: [XRay] Check in Clang whether XRay supports the target when -fxray-instrument is passed (Sep 21 2016, 7:27 AM).

rSerge added a child revision: D25030: [XRay] Support for for tail calls for ARM no-Thumb (Sep 28 2016, 8:49 AM).

Revision Contents (against base revision 279822)

Path	Size
include/llvm/CodeGen/AsmPrinter.h	28 lines
include/llvm/Target/Target.td	10 lines
include/llvm/Target/TargetOpcodes.def	17 lines
lib/CodeGen/AsmPrinter/AsmPrinter.cpp	10 lines
lib/CodeGen/XRayInstrumentation.cpp	88 lines
lib/Target/ARM/ARMAsmPrinter.h	12 lines
lib/Target/ARM/ARMAsmPrinter.cpp	13 lines
lib/Target/ARM/ARMMCInstLower.cpp	86 lines
lib/Target/X86/X86AsmPrinter.h	23 lines
lib/Target/X86/X86MCInstLower.cpp	10 lines
test/CodeGen/ARM/xray-attribute-instrumentation.ll (new file)	24 lines

Diff 69677

include/llvm/CodeGen/AsmPrinter.h

Show First 20 Lines • Show All 182 Lines • ▼ Show 20 Lines	public:
const MCSection *getCurrentSection() const;		const MCSection *getCurrentSection() const;

void getNameWithPrefix(SmallVectorImpl<char> &Name,		void getNameWithPrefix(SmallVectorImpl<char> &Name,
const GlobalValue *GV) const;		const GlobalValue *GV) const;

MCSymbol *getSymbol(const GlobalValue *GV) const;		MCSymbol *getSymbol(const GlobalValue *GV) const;

//===------------------------------------------------------------------===//		//===------------------------------------------------------------------===//
		// XRay instrumentation implementation.
		//===------------------------------------------------------------------===//
		public:
		// This describes the kind of sled we're storing in the XRay table.
		enum class SledKind : uint8_t {
		FUNCTION_ENTER = 0,
		FUNCTION_EXIT = 1,
		TAIL_CALL = 2,
		};

		// The table will contain these structs that point to the sled, the function
		// containing the sled, and what kind of sled (and whether they should always
		// be instrumented).
		struct XRayFunctionEntry {
		const MCSymbol *Sled;
		const MCSymbol *Function;
		SledKind Kind;
		bool AlwaysInstrument;
		const class Function *Fn;
		dberris: Do you need to spell out 'class' here? Wouldn't `const Function *` suffice?
		rSerge: I just moved (copy-pasted) this from X86AsmPrinter.h. Without `class` it does not compile, because XRayFunctionEntry already has a member with the same name: `const MCSymbol *Function`.
		};

		// All the sleds to be emitted.
		std::vector<XRayFunctionEntry> Sleds;

		// Helper function to record a given XRay sled.
		void recordSled(MCSymbol *Sled, const MachineInstr &MI, SledKind Kind);

		//===------------------------------------------------------------------===//
// MachineFunctionPass Implementation.		// MachineFunctionPass Implementation.
//===------------------------------------------------------------------===//		//===------------------------------------------------------------------===//

/// Record analysis usage.		/// Record analysis usage.
///		///
void getAnalysisUsage(AnalysisUsage &AU) const override;		void getAnalysisUsage(AnalysisUsage &AU) const override;

/// Set up the AsmPrinter when we are working on a new module. If your pass		/// Set up the AsmPrinter when we are working on a new module. If your pass
[360 lines not shown]

include/llvm/Target/Target.td

[950 lines not shown] def PATCHABLE_FUNCTION_ENTER : Instruction {
let InOperandList = (ins);		let InOperandList = (ins);
let AsmString = "# XRay Function Enter.";		let AsmString = "# XRay Function Enter.";
let usesCustomInserter = 1;		let usesCustomInserter = 1;
let hasSideEffects = 0;		let hasSideEffects = 0;
}		}
def PATCHABLE_RET : Instruction {		def PATCHABLE_RET : Instruction {
let OutOperandList = (outs unknown:$dst);		let OutOperandList = (outs unknown:$dst);
let InOperandList = (ins variable_ops);		let InOperandList = (ins variable_ops);
let AsmString = "# XRay Function Exit.";		let AsmString = "# XRay Function Patchable RET.";
let usesCustomInserter = 1;		let usesCustomInserter = 1;
let hasSideEffects = 1;		let hasSideEffects = 1;
let isReturn = 1;		let isReturn = 1;
}		}
		def PATCHABLE_FUNCTION_EXIT : Instruction {
		let OutOperandList = (outs);
		let InOperandList = (ins);
		let AsmString = "# XRay Function Exit.";
		let usesCustomInserter = 1;
		let hasSideEffects = 0; // FIXME: is this correct?
		dberris: AFAICT, yes, this is correct. The expectation is that this instruction should only ever show up in the assembler as a pseudo instruction (unless this is doing something else).
		let isReturn = 0; // Original return instruction will follow
		dberris: This one is a little harder. At least in x86, we weren't able to get this to work this way, because stack adjustments may happen later than the insertion of the marker instruction. Unless you can control exactly when this instruction is inserted, and ensure that the stack adjustment code never moves it (or adds things after it), you might want to do the same thing that we're doing in X86.
		rSerge: The same approach as on x86_64 is not possible for ARM, because ARM has multiple return instructions. Furthermore, the CPU allows parameterized and even conditional return instructions. The current ARM implementation relies on the fact that LLVM currently doesn't seem to generate conditional return instructions. On ARM, a single instruction can pop multiple registers from the stack and return (it simply pops the `pc` register too), and LLVM sometimes generates it. So we can't insert the sled between this stack adjustment and the return without splitting the original instruction into two. Therefore, on ARM, rather than jumping into the exit trampoline, we call it; the trampoline does the tracing, preserves the stack, and returns.
		}

// Generic opcodes used in GlobalISel.		// Generic opcodes used in GlobalISel.
include "llvm/Target/GenericOpcodes.td"		include "llvm/Target/GenericOpcodes.td"

}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// AsmParser - This class can be implemented by targets that wish to implement		// AsmParser - This class can be implemented by targets that wish to implement
[330 lines not shown]

include/llvm/Target/TargetOpcodes.def

	[147 lines not shown]

	/// This is a marker instruction which gets translated into a nop sled, useful			/// This is a marker instruction which gets translated into a nop sled, useful
	/// for inserting instrumentation instructions at runtime.			/// for inserting instrumentation instructions at runtime.
	HANDLE_TARGET_OPCODE(PATCHABLE_FUNCTION_ENTER)			HANDLE_TARGET_OPCODE(PATCHABLE_FUNCTION_ENTER)

	/// Wraps a return instruction and its operands to enable adding nop sleds			/// Wraps a return instruction and its operands to enable adding nop sleds
	/// either before or after the return. The nop sleds are useful for inserting			/// either before or after the return. The nop sleds are useful for inserting
	/// instrumentation instructions at runtime.			/// instrumentation instructions at runtime.
				/// The patch here replaces the return instruction.
	HANDLE_TARGET_OPCODE(PATCHABLE_RET)			HANDLE_TARGET_OPCODE(PATCHABLE_RET)

				/// This is a marker instruction which gets translated into a nop sled, useful
				/// for inserting instrumentation instructions at runtime.
				/// The patch here prepends the return instruction.
				dberris: Is there any reason to do this instead of following the same convention used in x86 of having the nops be after the return instruction?
				rSerge: Yes. As I've explained above, the problem is that ARM has multiple return instructions, so we have to preserve the original return instruction and call the exit tracing trampoline instead of jumping into it. I'm adding a comment in the code too.
				/// The same thing as in x86_64 is not possible for ARM because it has multiple
				/// return instructions. Furthermore, CPU allows parametrized and even
				/// conditional return instructions. In the current ARM implementation we are
				/// making use of the fact that currently LLVM doesn't seem to generate
				/// conditional return instructions.
				/// On ARM, the same instruction can be used for popping multiple registers
				/// from the stack and returning (it just pops pc register too), and LLVM
				/// generates it sometimes. So we can't insert the sled between this stack
				/// adjustment and the return without splitting the original instruction into 2
				/// instructions. So on ARM, rather than jumping into the exit trampoline, we
				/// call it, it does the tracing, preserves the stack and returns.
				HANDLE_TARGET_OPCODE(PATCHABLE_FUNCTION_EXIT)

	/// The following generic opcodes are not supposed to appear after ISel.			/// The following generic opcodes are not supposed to appear after ISel.
	/// This is something we might want to relax, but for now, this is convenient			/// This is something we might want to relax, but for now, this is convenient
	/// to produce diagnostics.			/// to produce diagnostics.

	/// Generic ADD instruction. This is an integer add.			/// Generic ADD instruction. This is an integer add.
	HANDLE_TARGET_OPCODE(G_ADD)			HANDLE_TARGET_OPCODE(G_ADD)
	HANDLE_TARGET_OPCODE_MARKER(PRE_ISEL_GENERIC_OPCODE_START, G_ADD)			HANDLE_TARGET_OPCODE_MARKER(PRE_ISEL_GENERIC_OPCODE_START, G_ADD)

	[178 lines not shown]

lib/CodeGen/AsmPrinter/AsmPrinter.cpp

[2,600 lines not shown] GCMetadataPrinter *AsmPrinter::GetOrCreateGCPrinter(GCStrategy &S) {

report_fatal_error("no GCMetadataPrinter registered for GC: " + Twine(Name));		report_fatal_error("no GCMetadataPrinter registered for GC: " + Twine(Name));
}		}

/// Pin vtable to this file.		/// Pin vtable to this file.
AsmPrinterHandler::~AsmPrinterHandler() {}		AsmPrinterHandler::~AsmPrinterHandler() {}

void AsmPrinterHandler::markFunctionEnd() {}		void AsmPrinterHandler::markFunctionEnd() {}

		void AsmPrinter::recordSled(MCSymbol *Sled, const MachineInstr &MI,
		SledKind Kind) {
		auto Fn = MI.getParent()->getParent()->getFunction();
		auto Attr = Fn->getFnAttribute("function-instrument");
		bool AlwaysInstrument =
		Attr.isStringAttribute() && Attr.getValueAsString() == "xray-always";
		Sleds.emplace_back(
		XRayFunctionEntry{ Sled, CurrentFnSym, Kind, AlwaysInstrument, Fn });
		}
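As a rough, standalone illustration of the decision recordSled makes, the sketch below models it with hypothetical toy types (`ToyFunction`, `ToySledEntry`, `recordToySled` and `isAlwaysInstrument` are illustrative names, not LLVM APIs). It assumes only what the code above shows: a sled is marked always-instrument exactly when the function carries `function-instrument` = `xray-always`, and each recorded entry keeps its sled kind.

```cpp
#include <cassert> // assert() for the usage checks
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Mirrors the SledKind enum from the diff.
enum class SledKind : std::uint8_t { FUNCTION_ENTER = 0, FUNCTION_EXIT = 1, TAIL_CALL = 2 };

// Hypothetical stand-in for llvm::Function: a name plus string attributes.
struct ToyFunction {
  std::string Name;
  std::map<std::string, std::string> StringAttrs;
};

// Hypothetical stand-in for XRayFunctionEntry (labels instead of MCSymbols).
struct ToySledEntry {
  std::string SledLabel;
  std::string FunctionName;
  SledKind Kind;
  bool AlwaysInstrument;
};

// Same test as recordSled: only function-instrument="xray-always" counts.
bool isAlwaysInstrument(const ToyFunction &F) {
  auto It = F.StringAttrs.find("function-instrument");
  return It != F.StringAttrs.end() && It->second == "xray-always";
}

// Appends one entry per emitted sled, like AsmPrinter::Sleds.emplace_back.
void recordToySled(std::vector<ToySledEntry> &Sleds, const std::string &Label,
                   const ToyFunction &F, SledKind Kind) {
  Sleds.push_back({Label, F.Name, Kind, isAlwaysInstrument(F)});
}
```

Any other attribute value, or no attribute at all, leaves `AlwaysInstrument` false in this toy.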

lib/CodeGen/XRayInstrumentation.cpp

[28 lines not shown]
struct XRayInstrumentation : public MachineFunctionPass {		struct XRayInstrumentation : public MachineFunctionPass {
static char ID;		static char ID;

XRayInstrumentation() : MachineFunctionPass(ID) {		XRayInstrumentation() : MachineFunctionPass(ID) {
initializeXRayInstrumentationPass(*PassRegistry::getPassRegistry());		initializeXRayInstrumentationPass(*PassRegistry::getPassRegistry());
}		}

bool runOnMachineFunction(MachineFunction &MF) override;		bool runOnMachineFunction(MachineFunction &MF) override;

		private:
		// Replace the original RET instruction with the exit sled code ("patchable ret"
		// pseudo-instruction), so that at runtime XRay can replace the sled with a
		// code jumping to XRay trampoline, which calls the tracing handler and, in
		// the end, issues the RET instruction.
		// This is the approach to go on CPUs which have a single RET instruction,
		// like x86/x86_64.
		void replaceRetWithPatchableRet(MachineFunction &MF, const TargetInstrInfo *TII);
		// Prepend the original return instruction with the exit sled code ("patchable
		// function exit" pseudo-instruction), preserving the original return instruction
		// just after the exit sled code.
		// This is the approach to go on CPUs which have multiple options for the return
		// instruction, like ARM. For such CPUs we can't just jump into the XRay trampoline
		// and issue a single return instruction there. We rather have to call the
		// trampoline and return from it to the original return instruction of the
		// function being instrumented.
		dberris: This is a great explanation. Can you say something similar in the description just so it's clear why there's a difference in the approach?
		rengolin: Agreed. Probably move the separate comments to their implementations?
		rSerge: I'm adding it to llvm\include\llvm\Target\TargetOpcodes.def.
		void prependRetWithPatchableExit(MachineFunction &MF, const TargetInstrInfo *TII);
};		};
		} // anonymous namespace

		void XRayInstrumentation::replaceRetWithPatchableRet(MachineFunction &MF, const TargetInstrInfo *TII)
		{
		// We look for all terminators and returns, then replace those with
		// PATCHABLE_RET instructions.
		SmallVector<MachineInstr *, 4> Terminators;
		for (auto &MBB : MF) {
		for (auto &T : MBB.terminators()) {
		// FIXME: Handle tail calls here too?
		if (T.isReturn() && T.getOpcode() == TII->getReturnOpcode()) {
		// Replace return instructions with:
		// PATCHABLE_RET <Opcode>, <Operand>...
		auto MIB = BuildMI(MBB, T, T.getDebugLoc(),
		TII->get(TargetOpcode::PATCHABLE_RET))
		.addImm(T.getOpcode());
		for (auto &MO : T.operands())
		MIB.addOperand(MO);
		Terminators.push_back(&T);
		break;
		}
		}
		}

		for (auto &I : Terminators)
		I->eraseFromParent();
		}

		void XRayInstrumentation::prependRetWithPatchableExit(MachineFunction &MF, const TargetInstrInfo *TII)
		{
		for (auto &MBB : MF) {
		for (auto &T : MBB.terminators()) {
		if (T.isReturn()) {
		// Prepend the return instruction with PATCHABLE_FUNCTION_EXIT
		auto MIB = BuildMI(MBB, T, T.getDebugLoc(),
		TII->get(TargetOpcode::PATCHABLE_FUNCTION_EXIT));
		break; //FIXME: is this correct? Can't a MachineBasicBlock have multiple return instructions?
		rengolin: Good point. Probably not correct.
		dberris: Yes, this is definitely not correct. This is a remnant of some refactoring I've done and it stuck around. :( Let me add a test and fix, should be trivial.
		dberris: This is now fixed in rL280192 -- please rebase to get the change (and tests).
		}
		}
		}
}		}
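The two lowering strategies described in the comments above can be contrasted on a toy instruction list. This is a sketch under stated assumptions, not the pass itself: a block is a hypothetical vector of instruction spellings, the return forms (`bx lr`, `pop {r4, pc}`, `retq`) are illustrative strings, and the `toy*` function names are invented for this example. Unlike the `break` questioned in review, the toy prepends the exit marker before every return it finds.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy IR (not the LLVM MachineInstr API): a basic block is just
// a list of instruction spellings.
using ToyBlock = std::vector<std::string>;

// Illustrative ARM return forms: "bx lr" and a pop that also loads pc.
bool isToyReturn(const std::string &Inst) {
  return Inst == "bx lr" || Inst == "pop {r4, pc}";
}

// x86-style strategy: swap the single canonical return for a PATCHABLE_RET
// wrapper that remembers which return it replaced.
void toyReplaceRetWithPatchableRet(ToyBlock &BB, const std::string &CanonicalRet) {
  for (auto &Inst : BB)
    if (Inst == CanonicalRet)
      Inst = "PATCHABLE_RET(" + CanonicalRet + ")";
}

// ARM-style strategy: keep every return untouched and put a
// PATCHABLE_FUNCTION_EXIT marker directly in front of each one.
void toyPrependRetWithPatchableExit(ToyBlock &BB) {
  ToyBlock Out;
  Out.reserve(BB.size() * 2);
  for (const auto &Inst : BB) {
    if (isToyReturn(Inst))
      Out.push_back("PATCHABLE_FUNCTION_EXIT");
    Out.push_back(Inst);
  }
  BB.swap(Out);
}
```

Building a fresh vector rather than inserting while looping keeps the toy's iterators valid; the toy block used to exercise it may hold two returns purely to show that none is skipped.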

bool XRayInstrumentation::runOnMachineFunction(MachineFunction &MF) {		bool XRayInstrumentation::runOnMachineFunction(MachineFunction &MF) {
auto &F = *MF.getFunction();		auto &F = *MF.getFunction();
auto InstrAttr = F.getFnAttribute("function-instrument");		auto InstrAttr = F.getFnAttribute("function-instrument");
bool AlwaysInstrument = !InstrAttr.hasAttribute(Attribute::None) &&		bool AlwaysInstrument = !InstrAttr.hasAttribute(Attribute::None) &&
InstrAttr.isStringAttribute() &&		InstrAttr.isStringAttribute() &&
InstrAttr.getValueAsString() == "xray-always";		InstrAttr.getValueAsString() == "xray-always";
[13 lines not shown] bool XRayInstrumentation::runOnMachineFunction(MachineFunction &MF) {
// First, insert an PATCHABLE_FUNCTION_ENTER as the first instruction of the		// First, insert an PATCHABLE_FUNCTION_ENTER as the first instruction of the
// MachineFunction.		// MachineFunction.
auto &FirstMBB = *MF.begin();		auto &FirstMBB = *MF.begin();
auto &FirstMI = *FirstMBB.begin();		auto &FirstMI = *FirstMBB.begin();
auto *TII = MF.getSubtarget().getInstrInfo();		auto *TII = MF.getSubtarget().getInstrInfo();
BuildMI(FirstMBB, FirstMI, FirstMI.getDebugLoc(),		BuildMI(FirstMBB, FirstMI, FirstMI.getDebugLoc(),
TII->get(TargetOpcode::PATCHABLE_FUNCTION_ENTER));		TII->get(TargetOpcode::PATCHABLE_FUNCTION_ENTER));

// Then we look for all terminators and returns, then replace those with		switch (MF.getTarget().getTargetTriple().getArch()) {
// PATCHABLE_RET instructions.		// List here the architectures which don't have a single return instruction
		rengolin: nit: this comment is better applied to the function "prependRetWithPatchableExit" after the case. People will know what to do in the future. You don't need a comment on the default case either.
		rSerge: Moving the comments towards the function calls.
SmallVector<MachineInstr *, 4> Terminators;		case Triple::ArchType::arm:
for (auto &MBB : MF) {		prependRetWithPatchableExit(MF, TII);
for (auto &T : MBB.terminators()) {
// FIXME: Handle tail calls here too?
if (T.isReturn() && T.getOpcode() == TII->getReturnOpcode()) {
// Replace return instructions with:
// PATCHABLE_RET <Opcode>, <Operand>...
auto MIB = BuildMI(MBB, T, T.getDebugLoc(),
TII->get(TargetOpcode::PATCHABLE_RET))
.addImm(T.getOpcode());
for (auto &MO : T.operands())
MIB.addOperand(MO);
Terminators.push_back(&T);
break;		break;
		// Architectures that have a single return instruction (such as RETQ on x86_64)
		default:
		replaceRetWithPatchableRet(MF, TII);
}		}
}
}

for (auto &I : Terminators)
I->eraseFromParent();

return true;		return true;
}		}

char XRayInstrumentation::ID = 0;		char XRayInstrumentation::ID = 0;
char &llvm::XRayInstrumentationID = XRayInstrumentation::ID;		char &llvm::XRayInstrumentationID = XRayInstrumentation::ID;
INITIALIZE_PASS(XRayInstrumentation, "xray-instrumentation", "Insert XRay ops",		INITIALIZE_PASS(XRayInstrumentation, "xray-instrumentation", "Insert XRay ops",
false, false)		false, false)

lib/Target/ARM/ARMAsmPrinter.h

[88 lines not shown] public:
void EmitFunctionEntryLabel() override;		void EmitFunctionEntryLabel() override;
void EmitStartOfAsmFile(Module &M) override;		void EmitStartOfAsmFile(Module &M) override;
void EmitEndOfAsmFile(Module &M) override;		void EmitEndOfAsmFile(Module &M) override;
void EmitXXStructor(const DataLayout &DL, const Constant *CV) override;		void EmitXXStructor(const DataLayout &DL, const Constant *CV) override;

// lowerOperand - Convert a MachineOperand into the equivalent MCOperand.		// lowerOperand - Convert a MachineOperand into the equivalent MCOperand.
bool lowerOperand(const MachineOperand &MO, MCOperand &MCOp);		bool lowerOperand(const MachineOperand &MO, MCOperand &MCOp);

		//===------------------------------------------------------------------===//
		// XRay implementation
		//===------------------------------------------------------------------===//
		public:
		// XRay-specific lowering for ARM.
		void LowerPATCHABLE_FUNCTION_ENTER(const MachineInstr &MI);
		void LowerPATCHABLE_FUNCTION_EXIT(const MachineInstr &MI);
		// Helper function that emits the XRay sleds we've collected for a particular
		// function.
		void EmitXRayTable();
		dberris: Do you already want to support tail call optimisation sleds now? Or did you plan to do something about that later?

private:		private:
		void EmitSled(const MachineInstr &MI, SledKind Kind);

// Helpers for EmitStartOfAsmFile() and EmitEndOfAsmFile()		// Helpers for EmitStartOfAsmFile() and EmitEndOfAsmFile()
void emitAttributes();		void emitAttributes();

// Generic helper used to emit e.g. ARMv5 mul pseudos		// Generic helper used to emit e.g. ARMv5 mul pseudos
void EmitPatchedInstruction(const MachineInstr *MI, unsigned TargetOpc);		void EmitPatchedInstruction(const MachineInstr *MI, unsigned TargetOpc);

void EmitUnwindingInstruction(const MachineInstr *MI);		void EmitUnwindingInstruction(const MachineInstr *MI);
[32 lines not shown]

lib/Target/ARM/ARMAsmPrinter.cpp

[144 lines not shown] if (Subtarget->isTargetCOFF()) {
OutStreamer->EmitCOFFSymbolStorageClass(Scl);		OutStreamer->EmitCOFFSymbolStorageClass(Scl);
OutStreamer->EmitCOFFSymbolType(Type);		OutStreamer->EmitCOFFSymbolType(Type);
OutStreamer->EndCOFFSymbolDef();		OutStreamer->EndCOFFSymbolDef();
}		}

// Emit the rest of the function body.		// Emit the rest of the function body.
EmitFunctionBody();		EmitFunctionBody();

		// Emit the XRay table for this function.
		EmitXRayTable();

// If we need V4T thumb mode Register Indirect Jump pads, emit them.		// If we need V4T thumb mode Register Indirect Jump pads, emit them.
// These are created per function, rather than per TU, since it's		// These are created per function, rather than per TU, since it's
// relatively easy to exceed the thumb branch range within a TU.		// relatively easy to exceed the thumb branch range within a TU.
if (! ThumbIndirectPads.empty()) {		if (! ThumbIndirectPads.empty()) {
OutStreamer->EmitAssemblerFlag(MCAF_Code16);		OutStreamer->EmitAssemblerFlag(MCAF_Code16);
EmitAlignment(1);		EmitAlignment(1);
for (unsigned i = 0, e = ThumbIndirectPads.size(); i < e; i++) {		for (unsigned i = 0, e = ThumbIndirectPads.size(); i < e; i++) {
OutStreamer->EmitLabel(ThumbIndirectPads[i].second);		OutStreamer->EmitLabel(ThumbIndirectPads[i].second);
[1,810 lines not shown] EmitToStreamer(*OutStreamer, MCInstBuilder(ARM::t2LDRi12)
.addReg(ARM::PC)		.addReg(ARM::PC)
.addReg(SrcReg)		.addReg(SrcReg)
.addImm(4)		.addImm(4)
// Predicate		// Predicate
.addImm(ARMCC::AL)		.addImm(ARMCC::AL)
.addReg(0));		.addReg(0));
return;		return;
}		}
		case ARM::PATCHABLE_FUNCTION_ENTER:
		{
		rengolin: No need for braces if you're not declaring variables.
		rSerge: Removing.
		LowerPATCHABLE_FUNCTION_ENTER(*MI);
		return;
		}
		case ARM::PATCHABLE_FUNCTION_EXIT:
		{
		LowerPATCHABLE_FUNCTION_EXIT(*MI);
		return;
		}
}		}

MCInst TmpInst;		MCInst TmpInst;
LowerARMMachineInstrToMCInst(MI, TmpInst, *this);		LowerARMMachineInstrToMCInst(MI, TmpInst, *this);

EmitToStreamer(*OutStreamer, TmpInst);		EmitToStreamer(*OutStreamer, TmpInst);
}		}

[11 lines not shown]

lib/Target/ARM/ARMMCInstLower.cpp

[15 lines not shown]
#include "ARMAsmPrinter.h"		#include "ARMAsmPrinter.h"
#include "MCTargetDesc/ARMBaseInfo.h"		#include "MCTargetDesc/ARMBaseInfo.h"
#include "MCTargetDesc/ARMMCExpr.h"		#include "MCTargetDesc/ARMMCExpr.h"
#include "llvm/CodeGen/MachineBasicBlock.h"		#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/Mangler.h"		#include "llvm/IR/Mangler.h"
#include "llvm/MC/MCExpr.h"		#include "llvm/MC/MCExpr.h"
#include "llvm/MC/MCInst.h"		#include "llvm/MC/MCInst.h"
		#include "llvm/MC/MCContext.h"
		#include "llvm/MC/MCSymbolELF.h"
		#include "llvm/MC/MCSectionELF.h"
		#include "llvm/MC/MCInstBuilder.h"
using namespace llvm;		using namespace llvm;


MCOperand ARMAsmPrinter::GetSymbolRef(const MachineOperand &MO,		MCOperand ARMAsmPrinter::GetSymbolRef(const MachineOperand &MO,
const MCSymbol *Symbol) {		const MCSymbol *Symbol) {
const MCExpr *Expr =		const MCExpr *Expr =
MCSymbolRefExpr::create(Symbol, MCSymbolRefExpr::VK_None, OutContext);		MCSymbolRefExpr::create(Symbol, MCSymbolRefExpr::VK_None, OutContext);
switch (MO.getTargetFlags() & ARMII::MO_OPTION_MASK) {		switch (MO.getTargetFlags() & ARMII::MO_OPTION_MASK) {
[113 lines not shown] if (AP.lowerOperand(MO, MCOp)) {
int32_t Enc = ARM_AM::getSOImmVal(MCOp.getImm());		int32_t Enc = ARM_AM::getSOImmVal(MCOp.getImm());
if (Enc != -1)		if (Enc != -1)
MCOp.setImm(Enc);		MCOp.setImm(Enc);
}		}
OutMI.addOperand(MCOp);		OutMI.addOperand(MCOp);
}		}
}		}
}		}

		// FIXME: Is there a ready way to emit NOPs on ARM?
		rengolin: There isn't, as nop is currently only an alias, not an instruction. But take a look at ARMInstrInfo::getNoopForMachoTarget() and do the same for ELF.
		rSerge: Changed.
		static void Emit4ByteNops(MCStreamer& OS, int NumInstructions, const MCSubtargetInfo &STI)
		{
		STI; // would be useful for OS.EmitInstruction(MCInstBuilder(ARM::NOP), STI);
		for (int I = 1; I <= NumInstructions; I++)
		{
		OS.EmitBytes(StringRef(/*Little-endian!*/ "\x00\xF0\x20\xE3", 4));
		}
		}
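The byte string in Emit4ByteNops is the A32 `NOP` instruction word 0xE320F000 stored least-significant byte first. The sketch below assumes a little-endian target, as the `Little-endian!` comment does, and derives those four bytes plus the 24-byte padding that six NOPs produce; `kArmNop`, `toLittleEndian` and `nopPadding` are illustrative names, not MC APIs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// A32 "NOP" with condition AL, as a 32-bit instruction word.
constexpr std::uint32_t kArmNop = 0xE320F000u;

// Bytes of a 32-bit word, least significant first (little-endian memory order).
std::array<std::uint8_t, 4> toLittleEndian(std::uint32_t Word) {
  return {{static_cast<std::uint8_t>(Word & 0xFFu),
           static_cast<std::uint8_t>((Word >> 8) & 0xFFu),
           static_cast<std::uint8_t>((Word >> 16) & 0xFFu),
           static_cast<std::uint8_t>((Word >> 24) & 0xFFu)}};
}

// Byte image of NumInstructions consecutive NOPs, i.e. what
// Emit4ByteNops(OS, NumInstructions, STI) writes to the streamer.
std::vector<std::uint8_t> nopPadding(int NumInstructions) {
  std::vector<std::uint8_t> Bytes;
  const auto Nop = toLittleEndian(kArmNop);
  for (int I = 0; I < NumInstructions; ++I)
    Bytes.insert(Bytes.end(), Nop.begin(), Nop.end());
  return Bytes;
}
```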

		void ARMAsmPrinter::EmitSled(const MachineInstr &MI, SledKind Kind)
		{
		// We want to emit the following pattern:
		//
		// .Lxray_sled_N:
		// ALIGN
		// B #20
		// ; 6 NOP instructions (24 bytes)
		// .tmpN
		//
		// We need the 24 bytes (6 instructions) because at runtime, we'd be patching
		// over the full 28 bytes (7 instructions) with the following pattern:
		//
		// PUSH{ r0, lr }
		rengolin: Why just save r0? AAPCS can use all four r0-r3 for return results.
		rSerge: We save the other registers in the trampoline (`__xray_FunctionEntry` and `__xray_FunctionExit` assembly functions).
		rengolin: Right, and these are guaranteed to only use one 32-bit argument. Check.
		// MOVW r0, #<lower 16 bits of function ID>
		// MOVT r0, #<higher 16 bits of function ID>
		// MOVW ip, #<lower 16 bits of address of __xray_FunctionEntry/Exit>
		// MOVT ip, #<higher 16 bits of address of __xray_FunctionEntry/Exit>
		// BLX ip
		// POP{ r0, lr }
		rengolin: BLX is unconditional, POP will never be executed. Is that intended?
		rSerge: `POP` is intended to execute after return from the subroutine, which `BLX` calls.
		rengolin: D'oh, Branch&Link, sorry, you're correct.
		//
		OutStreamer->EmitCodeAlignment(4);
		auto CurSled = OutContext.createTempSymbol("xray_sled_", true);
		OutStreamer->EmitLabel(CurSled);
		auto Target = OutContext.createTempSymbol();

		// Emit "B #20" instruction, which jumps over the next 24 bytes (because
		// register pc is 8 bytes ahead of the jump instruction by the moment CPU
		// is executing it).
		// FIXME: Find another less hacky way do force the relative jump.
		OutStreamer->EmitBytes(StringRef(/*Little-endian!*/ "\x05\x00\x00\xEA", 4));
		rengolin: Please don't emit binary directly; use the builders.
		rSerge: Done.
		Emit4ByteNops(*OutStreamer, 6, getSubtargetInfo());
		OutStreamer->EmitLabel(Target);
		recordSled(CurSled, MI, Kind);
		}
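The hard-coded `\x05\x00\x00\xEA` in EmitSled can be checked by hand. An A32 `B` with condition AL encodes as 0xEA000000 | imm24, and the branch target is the address of the `B` plus 8 plus 4 * imm24. The label after the six NOPs sits 28 bytes past the branch, so the offset relative to PC is 20 (hence `B #20`) and imm24 = 5; stored little-endian, 0xEA000005 is exactly that byte string. The helper below is a hypothetical sketch of the arithmetic for forward branches only, not an MC API.

```cpp
#include <cassert>
#include <cstdint>

// Sled layout from EmitSled: one B instruction followed by six 4-byte NOPs.
constexpr int kNumSledNops = 6;
constexpr int kSledBytes = 4 + kNumSledNops * 4; // 28 bytes = 7 instructions

// Encode an A32 "B" with condition AL (encoding A1: cond | 0b1010 | imm24).
// ByteOffset is measured from the address of the B itself; in ARM state the
// PC reads 8 bytes ahead, so imm24 = (ByteOffset - 8) / 4.
// Assumes ByteOffset >= 8 and a multiple of 4 (forward branches only).
constexpr std::uint32_t encodeArmBranchAL(std::uint32_t ByteOffset) {
  return 0xEA000000u | (((ByteOffset - 8u) >> 2) & 0x00FFFFFFu);
}
```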

		void ARMAsmPrinter::LowerPATCHABLE_FUNCTION_ENTER(const MachineInstr &MI)
		{
		EmitSled(MI, SledKind::FUNCTION_ENTER);
		}

		void ARMAsmPrinter::LowerPATCHABLE_FUNCTION_EXIT(const MachineInstr &MI)
		{
		EmitSled(MI, SledKind::FUNCTION_EXIT);
		}
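Both lowerings emit the same sled; per EmitSled's comment, at runtime it is overwritten with a sequence that loads two 32-bit values, the function ID into r0 and the trampoline address into ip, each with a MOVW/MOVT pair. A minimal sketch of that split follows, with hypothetical names (`MovwMovtImms`, `splitForMovwMovt`, `simulateMovwMovt`). It models MOVW as writing a zero-extended 16-bit immediate and MOVT as replacing only the upper half of the register.

```cpp
#include <cassert>
#include <cstdint>

// Immediates for loading a 32-bit constant with a MOVW/MOVT pair.
struct MovwMovtImms {
  std::uint16_t Low;  // MOVW operand: lower 16 bits
  std::uint16_t High; // MOVT operand: higher 16 bits
};

MovwMovtImms splitForMovwMovt(std::uint32_t Value) {
  return {static_cast<std::uint16_t>(Value & 0xFFFFu),
          static_cast<std::uint16_t>(Value >> 16)};
}

// Register value after "MOVW Rd, #Low" followed by "MOVT Rd, #High".
std::uint32_t simulateMovwMovt(MovwMovtImms Imms) {
  std::uint32_t Rd = Imms.Low; // MOVW zero-extends into the full register
  Rd = (Rd & 0x0000FFFFu) | (static_cast<std::uint32_t>(Imms.High) << 16); // MOVT keeps the low half
  return Rd;
}
```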

		void ARMAsmPrinter::EmitXRayTable()
		{
		if (Sleds.empty())
		return;
		if (Subtarget->isTargetELF()) {
		auto *Section = OutContext.getELFSection(
		"xray_instr_map", ELF::SHT_PROGBITS,
		ELF::SHF_ALLOC | ELF::SHF_GROUP | ELF::SHF_MERGE, 0,
		CurrentFnSym->getName());
		auto PrevSection = OutStreamer->getCurrentSectionOnly();
		OutStreamer->SwitchSection(Section);
		for (const auto &Sled : Sleds) {
		OutStreamer->EmitSymbolValue(Sled.Sled, 4);
		OutStreamer->EmitSymbolValue(CurrentFnSym, 4);
		auto Kind = static_cast<uint8_t>(Sled.Kind);
		OutStreamer->EmitBytes(
		StringRef(reinterpret_cast<const char *>(&Kind), 1));
		OutStreamer->EmitBytes(
		StringRef(reinterpret_cast<const char *>(&Sled.AlwaysInstrument), 1));
		OutStreamer->EmitZeros(6);
		}
		OutStreamer->SwitchSection(PrevSection);
		}
		Sleds.clear();
		}
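Per sled, EmitXRayTable writes 16 bytes into `xray_instr_map`: a 4-byte sled address, a 4-byte function address, one byte of SledKind, one byte of AlwaysInstrument, and six bytes of zero padding. The sketch below packs such an entry from concrete numbers, assuming a little-endian target; in a real object file the two addresses come from symbol values resolved by the assembler and linker rather than computed like this, and `InstrMapEntry`/`packEntry` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// One xray_instr_map entry as EmitXRayTable lays it out on 32-bit ARM:
//   [0..3] sled address, [4..7] function address (little-endian words)
//   [8] SledKind, [9] AlwaysInstrument, [10..15] zero padding.
using InstrMapEntry = std::array<std::uint8_t, 16>;

InstrMapEntry packEntry(std::uint32_t SledAddr, std::uint32_t FnAddr,
                        std::uint8_t Kind, bool AlwaysInstrument) {
  InstrMapEntry E{}; // value-initialized: every byte starts as zero
  for (int I = 0; I < 4; ++I) {
    E[I] = static_cast<std::uint8_t>(SledAddr >> (8 * I));
    E[4 + I] = static_cast<std::uint8_t>(FnAddr >> (8 * I));
  }
  E[8] = Kind;
  E[9] = AlwaysInstrument ? 1 : 0;
  return E; // bytes 10..15 stay zero, matching EmitZeros(6)
}
```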

lib/Target/X86/X86AsmPrinter.h

[65 lines not shown] private:
// CurrentShadowSize counts the number of bytes encoded since the most		// CurrentShadowSize counts the number of bytes encoded since the most
// recently encountered STACKMAP, stopping when that number is greater than		// recently encountered STACKMAP, stopping when that number is greater than
// or equal to RequiredShadowSize.		// or equal to RequiredShadowSize.
unsigned RequiredShadowSize = 0, CurrentShadowSize = 0;		unsigned RequiredShadowSize = 0, CurrentShadowSize = 0;
};		};

StackMapShadowTracker SMShadowTracker;		StackMapShadowTracker SMShadowTracker;

// This describes the kind of sled we're storing in the XRay table.
enum class SledKind : uint8_t {
FUNCTION_ENTER = 0,
FUNCTION_EXIT = 1,
TAIL_CALL = 2,
};

// The table will contain these structs that point to the sled, the function
// containing the sled, and what kind of sled (and whether they should always
// be instrumented).
struct XRayFunctionEntry {
const MCSymbol *Sled;
const MCSymbol *Function;
SledKind Kind;
bool AlwaysInstrument;
const class Function *Fn;
};

// All the sleds to be emitted.
std::vector<XRayFunctionEntry> Sleds;

dberris: I think it's worth noting in the description that we're moving the XRay instrumentation support up to AsmPrinter too.
rSerge: Do you mean the description for the diff? Or a comment in the source code?
dberris: Definitely a description in the diff.
// All instructions emitted by the X86AsmPrinter should use this helper		// All instructions emitted by the X86AsmPrinter should use this helper
// method.		// method.
//		//
// This helper function invokes the SMShadowTracker on each instruction before		// This helper function invokes the SMShadowTracker on each instruction before
// outputting it to the OutStream. This allows the shadow tracker to minimise		// outputting it to the OutStream. This allows the shadow tracker to minimise
// the number of NOPs used for stackmap padding.		// the number of NOPs used for stackmap padding.
void EmitAndCountInstruction(MCInst &Inst);		void EmitAndCountInstruction(MCInst &Inst);
void LowerSTACKMAP(const MachineInstr &MI);		void LowerSTACKMAP(const MachineInstr &MI);
[9 lines not shown] void LowerPATCHABLE_FUNCTION_ENTER(const MachineInstr &MI,
X86MCInstLower &MCIL);		X86MCInstLower &MCIL);
void LowerPATCHABLE_RET(const MachineInstr &MI, X86MCInstLower &MCIL);		void LowerPATCHABLE_RET(const MachineInstr &MI, X86MCInstLower &MCIL);
void LowerPATCHABLE_TAIL_CALL(const MachineInstr &MI, X86MCInstLower &MCIL);		void LowerPATCHABLE_TAIL_CALL(const MachineInstr &MI, X86MCInstLower &MCIL);

// Helper function that emits the XRay sleds we've collected for a particular		// Helper function that emits the XRay sleds we've collected for a particular
// function.		// function.
void EmitXRayTable();		void EmitXRayTable();

// Helper function to record a given XRay sled.
void recordSled(MCSymbol *Sled, const MachineInstr &MI, SledKind Kind);
public:		public:
explicit X86AsmPrinter(TargetMachine &TM,		explicit X86AsmPrinter(TargetMachine &TM,
std::unique_ptr<MCStreamer> Streamer)		std::unique_ptr<MCStreamer> Streamer)
: AsmPrinter(TM, std::move(Streamer)), SM(this), FM(this) {}		: AsmPrinter(TM, std::move(Streamer)), SM(this), FM(this) {}

const char *getPassName() const override {		const char *getPassName() const override {
return "X86 Assembly / Object Emitter";		return "X86 Assembly / Object Emitter";
}		}

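The summary and the review thread above describe moving XRay sled bookkeeping up into AsmPrinter so that the X86 and ARM printers can share it. The following is a minimal, self-contained sketch of that split, not LLVM code: `AsmPrinterBase`, `FunctionInfo`, `XRaySledEntry` and `ArmPrinterSketch` are hypothetical stand-ins, and the `"xray-always"` comparison mirrors the `function-instrument` attribute check in `X86AsmPrinter::recordSled` below.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for LLVM types; this is not the LLVM API.
enum class SledKind { FunctionEnter, FunctionExit, TailCall };

struct FunctionInfo {
  std::string Name;
  std::string InstrumentAttr; // "function-instrument" value, "" if absent
};

struct XRaySledEntry {
  std::string SledLabel; // e.g. ".Lxray_sled_0"
  std::string FunctionName;
  SledKind Kind;
  bool AlwaysInstrument;
};

// Target-independent part: every printer records sleds the same way.
class AsmPrinterBase {
public:
  virtual ~AsmPrinterBase() = default;

  void recordSled(const std::string &Label, const FunctionInfo &Fn,
                  SledKind Kind) {
    // Same idea as the diff: "xray-always" forces instrumentation.
    bool Always = Fn.InstrumentAttr == "xray-always";
    Sleds.push_back({Label, Fn.Name, Kind, Always});
  }

  const std::vector<XRaySledEntry> &sleds() const { return Sleds; }

private:
  std::vector<XRaySledEntry> Sleds;
};

// Target-specific part: only the sled bytes differ per architecture.
class ArmPrinterSketch : public AsmPrinterBase {
public:
  void lowerFunctionEnter(const FunctionInfo &Fn, unsigned SledIndex) {
    std::string Label = ".Lxray_sled_" + std::to_string(SledIndex);
    // A real printer would emit the label, the branch and the NOP padding
    // here; this sketch only records the bookkeeping entry.
    recordSled(Label, Fn, SledKind::FunctionEnter);
  }
};
```

With this split, a new target only has to emit its own sled bytes and call `recordSled`; the list of entries used to build the sled table is collected in one place.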
lib/Target/X86/X86MCInstLower.cpp

void X86AsmPrinter::LowerPATCHPOINT(const MachineInstr &MI,
unsigned NumBytes = opers.getNumPatchBytes();		unsigned NumBytes = opers.getNumPatchBytes();
assert(NumBytes >= EncodedBytes &&		assert(NumBytes >= EncodedBytes &&
"Patchpoint can't request size less than the length of a call.");		"Patchpoint can't request size less than the length of a call.");

EmitNops(*OutStreamer, NumBytes - EncodedBytes, Subtarget->is64Bit(),		EmitNops(*OutStreamer, NumBytes - EncodedBytes, Subtarget->is64Bit(),
getSubtargetInfo());		getSubtargetInfo());
}		}

void X86AsmPrinter::recordSled(MCSymbol *Sled, const MachineInstr &MI,
SledKind Kind) {
auto Fn = MI.getParent()->getParent()->getFunction();
auto Attr = Fn->getFnAttribute("function-instrument");
bool AlwaysInstrument =
Attr.isStringAttribute() && Attr.getValueAsString() == "xray-always";
Sleds.emplace_back(
XRayFunctionEntry{Sled, CurrentFnSym, Kind, AlwaysInstrument, Fn});
}

void X86AsmPrinter::LowerPATCHABLE_FUNCTION_ENTER(const MachineInstr &MI,		void X86AsmPrinter::LowerPATCHABLE_FUNCTION_ENTER(const MachineInstr &MI,
X86MCInstLower &MCIL) {		X86MCInstLower &MCIL) {
// We want to emit the following pattern:		// We want to emit the following pattern:
//		//
// .p2align 1, ...		// .p2align 1, ...
// .Lxray_sled_N:		// .Lxray_sled_N:
// jmp .tmpN		// jmp .tmpN
// # 9 bytes worth of noops		// # 9 bytes worth of noops

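The comment in `LowerPATCHABLE_FUNCTION_ENTER` describes the x86 entry sled as a `jmp` over 9 bytes of NOPs. A rough byte-level sketch of that layout, assuming a 2-byte short `jmp` (opcode `0xEB` with an 8-bit displacement) and, for simplicity, single-byte `0x90` NOPs; the real lowering may use longer multi-byte NOPs, and `buildX86EntrySled` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: byte layout of the x86 entry sled described by the
// comment in LowerPATCHABLE_FUNCTION_ENTER:
//   .Lxray_sled_N: jmp .tmpN ; <NopBytes of NOPs> ; .tmpN:
// Assumes a 2-byte short jmp (0xEB rel8) and single-byte 0x90 NOPs.
std::vector<uint8_t> buildX86EntrySled(uint8_t NopBytes = 9) {
  assert(NopBytes <= 127 && "rel8 must fit in a signed byte");
  std::vector<uint8_t> Bytes;
  Bytes.push_back(0xEB);     // short jmp opcode
  Bytes.push_back(NopBytes); // rel8 counts from the end of the jmp,
                             // so it skips exactly the padding
  Bytes.insert(Bytes.end(), static_cast<std::size_t>(NopBytes),
               static_cast<uint8_t>(0x90)); // one-byte NOPs as filler
  return Bytes;
}
```

Under these assumptions the default sled is 11 bytes: the 2-byte jump plus 9 bytes of padding, with `.tmpN` landing immediately after the last NOP.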
test/CodeGen/ARM/xray-attribute-instrumentation.ll

				; RUN: llc -filetype=asm -o - -mtriple=arm-unknown-linux-gnu < %s | FileCheck %s

				define i32 @foo() nounwind noinline uwtable "function-instrument"="xray-always" {
				; CHECK-LABEL: Lxray_sled_0:
				; CHECK-NEXT: .ascii "\005\000\000\352"
				; CHECK-NEXT: .ascii "\000\360 \343"
				rengolin: I was expecting Nops...
				rSerge: The first instruction is a jump over the NOPs. The other 6 instructions are NOPs.
				rengolin: Right, I was referring to the .ascii... When you use builders, this won't happen any more. It will also work in big-endian. :)
				; CHECK-NEXT: .ascii "\000\360 \343"
				rSerge: Done.
				; CHECK-NEXT: .ascii "\000\360 \343"
				; CHECK-NEXT: .ascii "\000\360 \343"
				; CHECK-NEXT: .ascii "\000\360 \343"
				; CHECK-NEXT: .ascii "\000\360 \343"
				; CHECK-LABEL: Ltmp0:
				ret i32 0
				; CHECK-LABEL: Lxray_sled_1:
				; CHECK-NEXT: .ascii "\005\000\000\352"
				; CHECK-NEXT: .ascii "\000\360 \343"
				; CHECK-NEXT: .ascii "\000\360 \343"
				; CHECK-NEXT: .ascii "\000\360 \343"
				; CHECK-NEXT: .ascii "\000\360 \343"
				; CHECK-NEXT: .ascii "\000\360 \343"
				; CHECK-NEXT: .ascii "\000\360 \343"
				; CHECK-LABEL: Ltmp1:
				; CHECK-NEXT: mov pc, lr
				}
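rSerge's reply explains that each ARM sled is a branch over six NOPs. The sketch below rebuilds those seven words and renders them in the same `.ascii` escaping style as the CHECK lines, which shows where `\005\000\000\352` and `\000\360 \343` come from. It assumes the A32 (non-Thumb) encoding and little-endian byte order, as the CHECK lines reflect; `encodeArmB`, `toAsciiDirective` and `buildArmSled` are hypothetical helpers. The branch lands 28 bytes past the sled start, and A32 branch offsets are relative to PC + 8, so imm24 = (28 - 8) / 4 = 5, giving `0xEA000005`; the padding is the `NOP` hint `0xE320F000`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// A32 "B <Target>" with cond = AL (0xE) and no link bit. The offset is taken
// relative to Pc + 8 and stored as a signed word count in the low 24 bits.
uint32_t encodeArmB(uint32_t Pc, uint32_t Target) {
  uint32_t ByteOffset = Target - (Pc + 8); // wraps for backward branches
  uint32_t Imm24 = (ByteOffset >> 2) & 0x00FFFFFFu;
  return 0xEA000000u | Imm24;
}

constexpr uint32_t kArmNop = 0xE320F000u; // A32 NOP hint, cond = AL

// Render one word as little-endian bytes: printable characters verbatim,
// everything else as a three-digit octal escape, like the CHECK lines.
std::string toAsciiDirective(uint32_t Word) {
  std::string Out;
  for (int I = 0; I < 4; ++I) {
    unsigned B = (Word >> (8 * I)) & 0xFFu;
    if (B >= 0x20 && B < 0x7F && B != '"' && B != '\\') {
      Out += static_cast<char>(B);
    } else {
      char Buf[5];
      std::snprintf(Buf, sizeof(Buf), "\\%03o", B);
      Out += Buf;
    }
  }
  return Out;
}

// One branch over NumNops NOPs; the branch target is the first byte after
// the sled (where the Ltmp label is placed in the test).
std::vector<uint32_t> buildArmSled(uint32_t SledAddr, unsigned NumNops = 6) {
  std::vector<uint32_t> Words;
  uint32_t After = SledAddr + 4 * (1 + NumNops);
  Words.push_back(encodeArmB(SledAddr, After));
  Words.insert(Words.end(), static_cast<std::size_t>(NumNops), kArmNop);
  return Words;
}
```

For example, `toAsciiDirective(buildArmSled(0)[0])` yields the text `\005\000\000\352`, and every following word renders as `\000\360 \343` (the space is byte `0x20`).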

Revision Contents

Diff 69677

include/llvm/CodeGen/AsmPrinter.h

include/llvm/Target/Target.td

include/llvm/Target/TargetOpcodes.def

lib/CodeGen/AsmPrinter/AsmPrinter.cpp

lib/CodeGen/XRayInstrumentation.cpp

lib/Target/ARM/ARMAsmPrinter.h

lib/Target/ARM/ARMAsmPrinter.cpp

lib/Target/ARM/ARMMCInstLower.cpp

lib/Target/X86/X86AsmPrinter.h

lib/Target/X86/X86MCInstLower.cpp

test/CodeGen/ARM/xray-attribute-instrumentation.ll
