Download Raw Diff

Details

Reviewers

kcc
eugenis
pcc
echristo
sanjoy
rnk

Commits

rG52735fc435b1: XRay: Add entry and exit sleds
rL275367: XRay: Add entry and exit sleds

Summary

In this patch we implement the following parts of XRay:

Supporting a function attribute named 'function-instrument' which currently only supports 'xray-always'. We should be able to use this attribute for other instrumentation approaches.
Supporting a function attribute named 'xray-instruction-threshold' used to determine whether a function is instrumented with a minimum number of instructions (IR instruction counts).
X86-specific nop sleds as described in the white paper.
A machine function pass that adds the different instrumentation marker instructions at a very late stage.
A way of identifying which return opcode is considered "normal" for each architecture.

There are some caveats here:

We don't handle PATCHABLE_RET in platforms other than x86_64 yet -- this means if IR used PATCHABLE_RET directly instead of a normal ret, instruction lowering for that platform might do the wrong thing. We think this should be handled at instruction selection time to by default be unpacked for platforms where XRay is not availble yet.

The generated section for X86 is different from what is described from the white paper for the sole reason that LLVM allows us to do this neatly. We're taking the opportunity to deviate from the white paper from this perspective to allow us to get richer information from the runtime library.

Diff Detail

Event Timeline

dberris updated this revision to Diff 56097.May 4 2016, 12:01 AM

dberris retitled this revision from to XRay: Add entry and exit sleds.

dberris updated this object.

dberris added reviewers: echristo, sanjoy.

dberris set the repository for this revision to rL LLVM.

dberris added a subscriber: llvm-commits.

Herald added a subscriber: mehdi_amini. · View Herald TranscriptMay 4 2016, 12:01 AM

BTW, It's worth mentioning that this is a prototype implementation and we're fine with making large changes for this patch.

mcrosier added a subscriber: mcrosier.May 4 2016, 6:09 AM

mcrosier added a subscriber: bmakam.May 4 2016, 6:09 AM

emaste added a subscriber: emaste.May 4 2016, 2:16 PM

Some minor comments inline. I'm not familiar with all of the intricacies of the backend, so I'll CC some other people who are likely to have more perspective on this.

include/llvm/Target/Target.td
957 ↗	(On Diff #56097)	Looks like this isn't used yet? If so, I'd suggest dropping it completely for now.
lib/CodeGen/XRayInstrumentation.cpp
80	Why do you care only about register operands here?
lib/Target/X86/X86MCInstLower.cpp
1068 ↗	(On Diff #56097)	Why not AlwaysInstrument = !Attr.hasAttribute(Attribute::None) && Attr.isStringAttribute() && Attr.getValueAsString() == "xray-always"; ?
1069 ↗	(On Diff #56097)	I don't quite grok the Attributes API, but I think the `!Attr.hasAttribute(Attribute::None)` bit is redundant -- if `isStringAttribute` return `true`, then `Attr.hasAttribute(Attribute::None)` will be false.
1072 ↗	(On Diff #56097)	emplace_back ?
1100 ↗	(On Diff #56097)	Indent is off.

sanjoy edited edge metadata.May 5 2016, 3:40 PM

sanjoy added subscribers: rnk, atrick, majnemer.

majnemer added inline comments.May 5 2016, 4:27 PM

lib/CodeGen/XRayInstrumentation.cpp
72–77	I don't think you want to replace whacky constructs like `EH_RETURN` and `CLEANUPRET` with `PATCHABLE_RET`, do you? Just "normal" returns, right?

Address comments from sanjoy and majnemer

lib/CodeGen/XRayInstrumentation.cpp
72–77	Yes, that's right -- does this version exclude those appropriately?
80	Oops, I had assumed that the RET instructions only ever had register operands. That's a faulty assumption, good catch.

majnemer added inline comments.May 6 2016, 10:37 AM

lib/CodeGen/XRayInstrumentation.cpp
72–76	I would recommend white-listing the opcodes you know are safe to replace instead of blacklisting the ones you know which are unsafe. This would make the pass conservatively correct in the face of future changes to LLVM.

dberris added inline comments.May 8 2016, 9:19 PM

lib/CodeGen/XRayInstrumentation.cpp
72–76	That makes sense, thanks majnemer -- however it's not clear to me how I'd refer to the target-specific opcodes from here. Pardon the newbie question, but how do I say "I just want return instructions here"? The only options I can see are: Look at the original LLVM IR to see if it's a ReturnInst. Figure out a way for including the X86 opcodes here. Maybe I'm missing something simpler here?

Update operand unpacking properly.

clang-format

majnemer added inline comments.May 8 2016, 10:25 PM

lib/CodeGen/XRayInstrumentation.cpp
72–76	The typical way of doing this sort of thing, AFAICT, is to teach `TargetInstrInfo` about it. For example, it implements `getCatchReturnOpcode` and `getCallFrameSetupOpcode`.

Add a check in TargetInstrInfo for whether a return is a "normal" return.

We also implement this somewhat correctly in X86, but use the default
implementation for other platforms.

dberris marked an inline comment as done.May 8 2016, 11:06 PM

dberris added inline comments.

lib/CodeGen/XRayInstrumentation.cpp
72–76	Thanks! I've updated the patch to do this. PTAL?

Fix the uploaded diff.

majnemer added inline comments.May 9 2016, 7:45 AM

include/llvm/Target/TargetInstrInfo.h
155–166 ↗	(On Diff #56531)	I was thinking more of a whilelist-oriented solution, something like: `unsigned getNormalReturnOpcode() const { return NormalRetOpcode; }`

bmakam added inline comments.May 9 2016, 9:56 AM

lib/Target/X86/X86MCInstLower.cpp
1082 ↗	(On Diff #56531)	Could you please expand on why you need 9 bytes of noops here? I am not quite familiar with x86_64 but was under the impression that on x86_64 the jmp instruction is 1 byte for opcode and 4 bytes for signed relative displacement, so shouldn't 5 bytes worth of nops be sufficient?

dberris added inline comments.May 9 2016, 7:10 PM

include/llvm/Target/TargetInstrInfo.h
155–166 ↗	(On Diff #56531)	But how does this work on platforms that can spell 'return' many different normal ways (like in X86)? There's RETL, RETQ, and all other versions of RET which are considered "normal"?
lib/Target/X86/X86MCInstLower.cpp
1082 ↗	(On Diff #56531)	Good question, thanks. I have to check whether we're using the right version of JMP, but I'm specifically looking for the version that's one byte for the JMP instruction, and 8 bits (1 byte) for the relative offset. So far I haven't been able to spell `jmp +0x09` correctly and have it work, without having an additional symbol as a target for the jump instruction. If we get that right, we can then add the 9 byte nops we need to get exactly 11 bytes for the function entry. Is there a fool-proof way of spelling "JMP +0x09" with the builder interface? Or should I add another JMP instruction in X86 that supports the 8-bit displacement immediate operand? The reason why I can't use a JMP that isn't 2 bytes, is because it's really hard to write just 5 bytes atomically. I can probably do something with an 8-byte atomic write, but enforcing that 8-byte write doesn't span cache lines is also very tricky to make safe.

dberris added inline comments.May 9 2016, 9:49 PM

lib/Target/X86/X86MCInstLower.cpp

1082 ↗

(On Diff #56531)

Actually now that I've had a look at the generated object file, I can confirm that we're using the two-byte version of JMP with this construct. Quick test:

test.cc:

#include <cstdio>

[[clang::xray_always_instrument]] void foo() { std::printf("Hello, XRay!\n"); }

int main(int argc, char* argv[]) { foo(); }

Compiled with (modified clang to emit IR that has annotated functions for XRay):

./bin/clang -fxray-instrument -fxray-instruction-threshold=1 -std=c++11 -x c++ -S test.cc -emit-llvm

Creates this IR listing:

; ModuleID = 'test.cc'
source_filename = "test.cc"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@.str = private unnamed_addr constant [14 x i8] c"Hello, XRay!\0A\00", align 1

; Function Attrs: uwtable
define void @_Z3foov() #0 {
entry:
  %call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([14 x i8], [14 x i8]* @.str, i32 0, i32 0))
  ret void
}

declare i32 @printf(i8*, ...) #1

; Function Attrs: norecurse uwtable
define i32 @main(i32 %argc, i8** %argv) #2 {
entry:
  %argc.addr = alloca i32, align 4
  %argv.addr = alloca i8**, align 8
  store i32 %argc, i32* %argc.addr, align 4
  store i8** %argv, i8*** %argv.addr, align 8
  call void @_Z3foov()
  ret i32 0
}

attributes #0 = { uwtable "disable-tail-calls"="false" "function-instrument"="xray-always" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { norecurse uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" "xray-instruction-threshold"="1" }

!llvm.ident = !{!0}

!0 = !{!"clang version 3.9.0 (http://llvm.org/git/clang.git 2b9ed9227330789f24e56fe9e800e7be2111073b) (http://llvm.org/git/llvm.git 1f3e353113ff9e2f835955bf005a8a5e25f16ad1)"}

Then compiled+disassembled this way:

./bin/llc -filetype=obj -o - < test.ll | ./bin/llvm-objdump -disassemble -

Produces the following output:

<stdin>:        file format ELF64-x86-64

Disassembly of section .text:
_Z3foov:
       0:       eb 09   jmp     9 <_Z3foov+0xB>
       2:       66 0f 1f 84 00 00 02 00 00      nopw    512(%rax,%rax)
       b:       55      pushq   %rbp
       c:       48 89 e5        movq    %rsp, %rbp
       f:       bf 00 00 00 00  movl    $0, %edi
      14:       31 c0   xorl    %eax, %eax
      16:       e8 00 00 00 00  callq   0 <_Z3foov+0x1B>
      1b:       5d      popq    %rbp
      1c:       c3      retq
      1d:       2e 66 0f 1f 84 00 00 02 00 00   nopw    %cs:512(%rax,%rax)
      27:       66 0f 1f 84 00 00 00 00 00      nopw    (%rax,%rax)

main:
      30:       eb 09   jmp     9 <main+0xB>
      32:       66 0f 1f 84 00 00 02 00 00      nopw    512(%rax,%rax)
      3b:       55      pushq   %rbp
      3c:       48 89 e5        movq    %rsp, %rbp
      3f:       48 83 ec 10     subq    $16, %rsp
      43:       89 7d fc        movl    %edi, -4(%rbp)
      46:       48 89 75 f0     movq    %rsi, -16(%rbp)
      4a:       e8 b1 ff ff ff  callq   -79 <_Z3foov>
      4f:       31 c0   xorl    %eax, %eax
      51:       48 83 c4 10     addq    $16, %rsp
      55:       5d      popq    %rbp
      56:       c3      retq
      57:       2e 66 0f 1f 84 00 00 02 00 00   nopw    %cs:512(%rax,%rax)

Add documentation on why we need 9 bytes of nops

dberris marked an inline comment as done.May 10 2016, 9:02 PM

dberris added inline comments.

lib/Target/X86/X86MCInstLower.cpp
1082 ↗	(On Diff #56850)	Updated the comments now to make it clear and specific. Thanks!

majnemer added inline comments.May 10 2016, 9:03 PM

include/llvm/Target/TargetInstrInfo.h
155–166 ↗	(On Diff #56850)	The same way it is handled for `X86::ADJCALLSTACKDOWN64` vs `X86::ADJCALLSTACKDOWN32`: https://github.com/llvm-mirror/llvm/blob/master/lib/Target/X86/X86InstrInfo.cpp#L105

dberris added inline comments.May 10 2016, 9:08 PM

include/llvm/Target/TargetInstrInfo.h
155–166 ↗	(On Diff #56850)	That makes a lot of sense, thanks! I'll rework this a bit to make it more of a targeted specific OpCode. I suppose it's fine to have multiples of these functions, each for relevant flavours of fairly common instruction classes, yes? So I suppose, for `tail call` exits/returns, we would have something similar for that?

Make ReturnOpcode an argument to TargetInstrInfo constructor and whitelist one ReturnOpcode instead

dberris marked an inline comment as done.May 10 2016, 10:15 PM

dberris added inline comments.

include/llvm/Target/TargetInstrInfo.h
157–168 ↗	(On Diff #56851)	Updated now to white-list a specific return OpCode.

dberris updated this object.May 12 2016, 2:27 AM

dberris added reviewers: rnk, eugenis, kcc, pcc.

Ping -- is there something else missing for this change?

Marked more inline comments as "Done".

Drive by comment.

include/llvm/Target/TargetOpcodes.def
145 ↗	(On Diff #56851)	Remove the XRay specific stuff here and rewrite in a more generic fashion (and below).

Update documentation to remove XRay references

dberris marked an inline comment as done.May 17 2016, 7:50 PM

dberris added inline comments.

include/llvm/Target/TargetOpcodes.def
145 ↗	(On Diff #57556)	Done

dberris added a child revision: D20352: Add XRay flags to Clang.May 17 2016, 10:34 PM

Ping?

Sync to master again.

Fixes after merge
Emit a common symbol for the instrumentation map
Improve table creation and allow access to the table via xray_instr_map and xray_instr_map_end
Fix the sled type encoding

niravd added a subscriber: niravd.Jun 22 2016, 6:17 AM

dberris added a child revision: D21612: [compiler-rt] [XRay] Basic initialization and flag definition for XRay runtime.Jun 22 2016, 10:29 AM

Use an explicit byte sequence for the jmp

dberris added a parent revision: D21743: Convenience function for MCContext to get named sections when lowering..Jun 26 2016, 8:58 PM

Make XRay write out tables per function

Clean up local history, squashing to a single revision (rebased too)

Rebase
Add ELF::SHF_MERGE to flags for XRay section
Update tests to reflect adjustments in implementation

Un-break XRay section creation.

dberris mentioned this in D20352: Add XRay flags to Clang.Jul 5 2016, 10:36 PM

lgtm

lib/Target/X86/X86MCInstLower.cpp
1051 ↗	(On Diff #62716)	Does XRay support multiple DSOs in the process? If so, the XRay runtime may be more than 2GB away from the code being patched, and this offset will overflow.
1060–1061 ↗	(On Diff #62716)	I guess we can deal with this separately. IMO we should have a 'jmpb' instruction or something that forces a short jump or assembler error if the displacement is too large.

This revision is now accepted and ready to land.Jul 12 2016, 9:18 AM

Thanks Reid -- do you or echristo@ mind landing this for me? I don't think I have commit powers yet.

lib/Target/X86/X86MCInstLower.cpp
1051 ↗	(On Diff #62716)	That's true -- the runtime patching code will fail to patch certain sleds that are farther than 32-bits away. In case we need more, in the future we can tweak this so that we can try to force the XRay runtime trampoline in the first 2GB of the process' memory and use an "absolute" address (which may need more bytes in the sled), or have a mode to support larger binaries (a flag that changes the emitted sleds, and have the runtime support both kinds of sleds). I suspect if it comes up in practice that we can deal with this more effectively.
1060–1061 ↗	(On Diff #62716)	I'm happy to do this as a refactoring or a follow-up change. A new instruction sounds like the right action to take anyway.

Rebase

Closed by commit rL275367: XRay: Add entry and exit sleds (authored by dberris). · Explain WhyJul 13 2016, 9:13 PM

This revision was automatically updated to reflect the committed changes.

dberris mentioned this in D21612: [compiler-rt] [XRay] Basic initialization and flag definition for XRay runtime.Aug 2 2016, 4:56 PM

rSerge added a child revision: D23931: [XRay] ARM 32-bit no-Thumb support in LLVM.Aug 26 2016, 9:59 AM

This is an archive of the discontinued LLVM Phabricator instance.

XRay: Add entry and exit sleds
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 56525

lib/CodeGen/XRayInstrumentation.cpp

This is an archive of the discontinued LLVM Phabricator instance.

XRay: Add entry and exit sledsClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 56525

lib/CodeGen/XRayInstrumentation.cpp

XRay: Add entry and exit sleds
ClosedPublic