This is an archive of the discontinued LLVM Phabricator instance.

[compiler-rt] [XRay] Basic initialization and flag definition for XRay runtime
AbandonedPublic

Authored by dberris on Jun 22 2016, 10:29 AM.

Details

Summary

This patch implements the initialisation and patching routines for the XRay runtime, along with the necessary trampolines for function entry/exit handling. For now we only define the basic hooks for allowing an implementation to define a handler that gets run on function entry/exit. We expose a minimal API for controlling the behaviour of the runtime (patching, cleanup, and setting the handler to invoke when instrumenting).

Depends on D19904

Diff Detail

Repository
rL LLVM

Event Timeline

dberris updated this revision to Diff 61570.Jun 22 2016, 10:29 AM
dberris retitled this revision from to Work-in-Progress compiler-rt prototype for XRay runtime..
dberris added reviewers: echristo, kcc.
dberris updated this object.
dberris added a subscriber: llvm-commits.
dberris updated this revision to Diff 61610.Jun 22 2016, 2:32 PM
  • Add support for function exits
dberris updated this revision to Diff 61930.Jun 26 2016, 9:00 PM
  • Use COMDAT merged section names instead of adhoc globals for instrumentation map.
dberris updated this revision to Diff 62061.Jun 27 2016, 11:37 PM
  • Use rdtscp more correctly to get timestamp and cpuid
dberris updated this revision to Diff 62159.Jun 28 2016, 6:02 PM
  • Add automatic patching in init for demo
rnk added a subscriber: rnk.Jul 7 2016, 11:31 AM
rnk added inline comments.
include/CMakeLists.txt
1 ↗(On Diff #62717)

Maybe rename this to COMPILER_RT_HEADERS, since it's not just sanitizers?

include/xray/xray_interface.h
17–19 ↗(On Diff #62717)

Are you sure you want to use a C++ scoped enum in this interface? The whole thing is extern "C", it might be nicer to make it a C style enum, so you can use it in the handler prototype instead of unsigned short.

45–47 ↗(On Diff #62717)

Maybe this should be another enum? XRayPatchingStatus?

lib/xray/xray_interface.cc
33–52 ↗(On Diff #62717)

This inline asm is too fragile. There's nothing to say the compiler won't use R10 to load Fn. I think this should write a standalone .s file like xray_hooks_x86_64.s, and then we can add _arm.s, _aarch64.s, etc as needed. There's existing cmake for adding .s files in lib/builtins.

37 ↗(On Diff #62717)

RBP is already a CSR, why do we need to save it?

lib/xray/xray_interface_internal.h
29–32 ↗(On Diff #62717)

We aren't allowed to include xray/xray_interface.h for these?

45 ↗(On Diff #62717)

ditto, it'd be nice to get this from the public interface

dberris marked 4 inline comments as done.Jul 7 2016, 11:46 PM

Thanks for the review Reid, I'm going to clean this up a bit more later with your suggestions from the mailing list on configuration and other things. In the meantime, please have another look at what's now here.

Cheers

include/CMakeLists.txt
1 ↗(On Diff #62717)

Good point. Unfortunately the install path is still .../include/sanitizer -- I've added a new variable instead, COMPILER_RT_HEADERS which aggregates SANITIZER_HEADERS and XRAY_HEADERS, and another install rule for xray headers to go into .../include/xray.

Does this make sense?

include/xray/xray_interface.h
17–19 ↗(On Diff #62717)

Good point, switched to using a C-style enum instead.

lib/xray/xray_interface.cc
37 ↗(On Diff #62717)

That's because this function isn't typically directly called -- this is jumped into unconditionally. Without stashing the RBP, it gets lost when the function call later on returns. Experimentation and hours of debugging led to this state of having to stash the RBP properly.

lib/xray/xray_interface_internal.h
29–32 ↗(On Diff #62717)

I've had to add an include directory directive to include the public header, but it works now. It's a little different from how we're doing it in the other sanitisers.

dberris updated this revision to Diff 63178.Jul 7 2016, 11:47 PM
dberris marked 2 inline comments as done.
  • Address review comments
dberris added inline comments.Jul 7 2016, 11:49 PM
lib/xray/xray_interface.cc
34–53 ↗(On Diff #63178)

Sorry, I forgot to address this one.

I'll need to have a look at how this is done in other places, but did you mean using the assembler to also generate the call to the function pointer? Or having this whole routine be implemented in assembly?

rnk added inline comments.Jul 12 2016, 8:49 AM
lib/xray/xray_interface.cc
34–53 ↗(On Diff #63178)

Yeah, I would recommend calling the function pointer in with assembly. I guess it's a little crappy that the assembly has to know the numeric value of XRayEntryType::EXIT, but I can't think of a good way to solve that.

dberris updated this revision to Diff 64099.Jul 15 2016, 12:30 AM
  • Rebase
  • Remove demo parts, define some flags
dberris retitled this revision from Work-in-Progress compiler-rt prototype for XRay runtime. to [compiler-rt] [XRay] Basic initialization and flag definition for XRay runtime.Jul 15 2016, 12:36 AM
dberris updated this object.

I've added one flag, and the beginnings of having more flags to be supported going forward.

This is now ready for another look.

lib/xray/xray_interface.cc
35–54 ↗(On Diff #64099)

I thought about this a little more, and I think it's going to be a bit harder to implement a few of the things being done here in pure assembler. I'm sure I can do it, but I'd rather do it in a separate change, after this one lands, if that's alright with you?

dberris updated this revision to Diff 64102.Jul 15 2016, 1:07 AM
  • Actually initialize the flags
rnk added inline comments.Jul 15 2016, 11:28 AM
lib/xray/xray_interface.cc
36–55 ↗(On Diff #64102)

I actually think this is important enough to get right the first time. Inline asm has scarred me enough times already.

38 ↗(On Diff #64102)

OK, but the prologue will push RBP if it is clobbered anywhere in this function, and if it isn't, the prologue of 'Fn' will preserve it.

62 ↗(On Diff #64102)

What about saving and restoring XMM registers? Those are used for parameters and return values, and are usually considered to be scratch. Is it assumed that XRay authors will carefully audit their code to avoid SSE instructions? Even memcpy uses SSE these days, though.

69–70 ↗(On Diff #64102)

I guess you only save and restore RAX/RDX since those are use for return values. I'm sure other calling conventions can break this assumption, but we can leave it this way for now if you want.

echristo added inline comments.Jul 15 2016, 2:20 PM
lib/xray/xray_interface.cc
37 ↗(On Diff #64102)

Meta comment: Typically inline asm can be a single block rather than N statements of 1 per instruction.

dberris updated this revision to Diff 64276.Jul 17 2016, 11:21 PM
dberris marked 3 inline comments as done.
  • Implement the trampolines in assembler
  • Tweak dependencies and implementation to rely on hand-written assembler
lib/xray/xray_interface.cc
36–55 ↗(On Diff #64102)

Good call. It didn't turn out to be as bad as I thought, though I did need to reverse-engineer what the compiler does for a bit to get it "right". :)

37 ↗(On Diff #64102)

Thanks -- I tried that but I didn't know how to properly write out the operands when I tried, so I split them out. :D

Now this is all in its own file, so it shouldn't be that big of a problem anymore.

62 ↗(On Diff #64102)

Yeah, that's a tricky one. I've been thinking about this too and I think it's worth saving/restoring those just to be safe. Unfortunately that might mean there's a bit more overhead as far as runtime and space requirements is concerned.

69–70 ↗(On Diff #64102)

Added a FIXME in the assembly file to address at a later point.

dberris updated this revision to Diff 64280.Jul 18 2016, 12:11 AM
  • Typo on filename
dberris updated this revision to Diff 64298.Jul 18 2016, 4:36 AM
  • Fix flag parsing to support verbosity
dberris updated this revision to Diff 64634.Jul 19 2016, 9:17 PM
  • Put the function id on the stack for the argument

This is now ready for another look -- thanks in advance!

rnk accepted this revision.Jul 20 2016, 6:58 AM
rnk edited edge metadata.

Thanks for messing around with the asm parts, it's a pain. lgtm with nits

lib/xray/xray_trampoline_x86.S
27 ↗(On Diff #64635)

This is the unwind info you should have after the sub:

.cfi_def_cfa_offset 88
.cfi_offset %rbp, -16
43–44 ↗(On Diff #64635)

Can this be movl %r10d, %edi? The spill seems unnecessary.

This revision is now accepted and ready to land.Jul 20 2016, 6:58 AM
dberris updated this revision to Diff 64684.Jul 20 2016, 7:17 AM
dberris marked an inline comment as done.
dberris edited edge metadata.
  • movl %r10d directly into %edi instead of spilling

Thanks Reid! Yeah, it's fun though hacking on low-level things like this (for me) so I take it as a learning opportunity. :)

This revision was automatically updated to reflect the committed changes.
rnk added a comment.Jul 20 2016, 7:57 AM

This doesn't build on Windows:
http://lab.llvm.org:8011/builders/clang-x64-ninja-win7/builds/14021/steps/build%20stage%201/logs/stdio

I also don't think it's tested on Mac, so we should find a way to avoid building it. There should be some XRAY_SUPPORTED_OS cmake variable or something else we can use to avoid building it.

rnk added a comment.Jul 20 2016, 8:06 AM

I tried to fix this in r276124, but you might want to make sure that does the right thing everywhere, I only tested that it locally doesn't build xray on Windows anymore.

Thanks Reid -- I was a little hesitant to lump XRay along with the other sanitizers, but I suspect it's easier to do that right now. I'll check to see whether it's possible to make this a little smarter based on the OS as well.

rSerge reopened this revision.Jul 25 2016, 4:57 AM
rSerge edited edge metadata.

It looks like there are 2 places where the second mprotect (returning the executable code block permissions to no-modify) can leak in compiler-rt/trunk/lib/xray/xray_interface.cc . Consider using RAII idiom. See my comments inline.

This revision is now accepted and ready to land.Jul 25 2016, 4:57 AM
rSerge requested changes to this revision.Jul 25 2016, 5:02 AM
rSerge edited edge metadata.
rSerge added inline comments.
compiler-rt/trunk/lib/xray/xray_interface.cc
126

mprotect() leaks due to this (so that the executable code block stays modifiable).

162

mprotect() leaks due to this (so that the executable code block stays modifiable).

This revision now requires changes to proceed.Jul 25 2016, 5:02 AM
dberris marked 2 inline comments as done.Jul 25 2016, 7:56 AM

Thanks rSerge, I've created D22757 which makes the changes.

This has been submitted already, BTW -- the changes in D22757 apply to trunk. Please review that instead.

Is xray_trampoline_x86.S for x86_64 only or is it supposed to work for both x86 and x86_64? If the latter, then there seems to be a bug with always using RETQ, rather than RETL for x86.

rSerge added inline comments.Jul 26 2016, 12:58 PM
compiler-rt/trunk/lib/xray/xray_trampoline_x86.S
90

Shouldn't it be RETL for x86 and RETQ for x86_64 ? The original instruction in the compile-time sled could be RETL (then we should do RETL too) or RETQ (then we should do RETQ too), I think.

rnk added inline comments.Jul 26 2016, 1:00 PM
compiler-rt/trunk/lib/xray/xray_trampoline_x86.S
90

This is definitely x86_64 only code. Perhaps this file should be renamed xray_trampoline_x86_64.S

rSerge added inline comments.Jul 26 2016, 1:06 PM
compiler-rt/trunk/lib/xray/xray_trampoline_x86.S
90

Please, see llvm/trunk/lib/Target/X86/X86InstrInfo.cpp at https://reviews.llvm.org/D19904 : the instrumentation seems to be performed for both x86 and x86_64 (at least, both RETL and RETQ are marked for replacement), so I'm afraid that on x86 the code from the current file (which is x86_64 only) may try to execute.

rnk added inline comments.Jul 26 2016, 1:16 PM
compiler-rt/trunk/lib/xray/xray_trampoline_x86.S
90

Sure, but this whole assembly file won't assemble for 32-bit. It uses RSP, RDX, R10, pushq, etc, and those are 64-bit only. We'd need a separate .s file to really support 32-bit.

rSerge accepted this revision.Jul 26 2016, 1:18 PM
rSerge edited edge metadata.
rSerge added inline comments.
compiler-rt/trunk/lib/xray/xray_trampoline_x86.S
90

Ok, understood, thanks.

This revision is now accepted and ready to land.Jul 26 2016, 1:18 PM
dberris added inline comments.Jul 26 2016, 10:04 PM
compiler-rt/trunk/lib/xray/xray_trampoline_x86.S
90

Sure, but this whole assembly file won't assemble for 32-bit. It uses RSP, RDX, R10, pushq, etc, and those are 64-bit only. We'd need a separate .s file to really support 32-bit.

Yes, along with different patching routines (we assume certain instruction widths), and probably different sleds too.

I'll rename the file in D21982 as well to make it unambiguous that this is x86_64.

eugenis added inline comments.
compiler-rt/trunk/lib/xray/CMakeLists.txt
45

Remove this chunk, it can be added later with the tests commit.

compiler-rt/trunk/lib/xray/xray_init.cc
26

I think this extern is unnecessary

36

Is it unimplemented in this revision? /proc/xx/maps may be inaccessible under sandbox, would dl_iterate_phdr work for this?

compiler-rt/trunk/lib/xray/xray_interface.cc
65

Does it need to be atomic at all? It looks like you are already synchronized on XRayPatching.

67

Do you need to reset XRayPatching to false? Also in multiple places below.

74

out of curiosity, why std:: instead of just "size_t" ?

dberris abandoned this revision.Jul 28 2016, 2:13 AM

Thanks for the review Evgeniy -- I'll abandon this revision (so that no further reviews happen here, as this has already been submitted upstream) but I'll prepare another change that addresses some of the suggestions here.

rSerge added a comment.EditedJul 30 2016, 5:07 AM

According to https://www.lri.fr/~filliatr/ens/compil/x86-64.pdf :

Floating arguments (up to 8) are passed in SSE registers %xmm0, %xmm1, ..., %xmm7. Additional arguments, if needed, are passed in stack slots. When calling a function that takes a variable number of arguments (notably printf) or lacks a prototype, byte register %al must be set before the call to indicate how many of the %xmm registers are used. A floating point return value is returned in %xmm0. All the %xmm registers are caller-save.

compiler-rt/trunk/lib/xray/xray_trampoline_x86.S
45

I think that if the tracing callback (stored in xray::XRayPatchedFunction ) performs floating-point operations, then the floating-point parameters passed to the instrumented function in SSE registers (%xmm0 to %xmm7) can get corrupted so that the instrumented function gets garbage on input. Shouldn't xray_FunctionEntry push %xmm0...%xmm7 to the stack upon entering and pop them upon exit?

83

The floating-point return value of the instrumented function may be in %xmm0 SSE register. So if xray::XRayPatchedFunction (or its callees) performs floating-point operation, it may clobber the register thus the instrumented function would return garbage. Shouldn't xray_FunctionExit push %xmm0 to the stack on entry and pop it on exit?

dberris marked an inline comment as done.Jul 31 2016, 11:14 PM

Yes Serge, they will get clobbered. We can change this to save the XMM registers too quite easily, but in a different patch. Let me put that in the list of things to do for me. :)

rSerge added a comment.Aug 2 2016, 1:44 PM

Because "jmp +9" and "ret" instructions (as I understood) are not 2-byte aligned, I doublt atomicity of replacing them with 2-byte instructions of the runtime patches. Could you take a look?

compiler-rt/trunk/lib/xray/xray_interface.cc
131–133

Can it happen that the 2 machine code bytes of "jmp +9" instruction belong to different 8-byte blocks of memory? I think that in this case the operation will not be atomic: the CPU will have to write different 8-byte blocks of memory in different cycles. Please, consider aligning "jmp +9" instruction at least at a multiple of 2 bytes.

167–169

Can it happen that the 2 machine code bytes of "mov r10d, <function id>" instruction belong to different 8-byte blocks of memory? I think that in this case the operation will not be atomic: the CPU will have to write different 8-byte blocks of memory in different cycles. Please, consider aligning "RET" instruction at least at a multiple of 2 bytes. Otherwise the CPU executing the function on a different core may get the first byte of "mov r10d, <function id>" and the second byte (and consequtive bytes) of whatever was earlier behing the "RET" instruction.

Thanks Serge -- would you consider perhaps not sending comments onto this patch anymore? We can continue more of this on either one of the dependent changes or on the llvm-dev mailing list. I'd very much appreciate that.

PS. These sleds were implemented in D19904, you can see the changes in lowering at http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86MCInstLower.cpp?r1=275272&r2=275367&pathrev=275367&diff_format=h.

compiler-rt/trunk/lib/xray/xray_interface.cc
131–133

We already align the function start sled to 2-byte alignment.

167–169

This we don't do yet. It should be easy to do. I'll send a patch to address exactly this concern.