This is an archive of the discontinued LLVM Phabricator instance.

[XRay][CodeGen] Use PIC-friendly code in XRay sleds; remove synthetic references in .text
ClosedPublic

Authored by dberris on Aug 11 2017, 7:34 AM.

Details

Summary

This change achieves two things:

  • Redefine the Custom Event handling instrumentation points emitted by the compiler to not require dynamic relocation of references to the __xray_CustomEvent trampoline.
  • Remove the synthetic reference we emit at the end of a function that we used to keep auxiliary sections alive in favour of SHF_LINK_ORDER associated with the section where the function is defined.

To achieve the custom event handling change, we've had to introduce the
concept of sled versioning -- this will need to be supported by the
runtime to allow us to understand how to turn on/off the new version of
the custom event handling sleds. That change has to land first before we
change the way we write the sleds.

To remove the synthetic reference, we rely on a relatively new linker
feature that preserves the sections that are associated with each other.
This allows us to limit the effects on the .text section of ELF
binaries.

Because we're still using absolute references that are resolved at
runtime for the instrumentation map (and function index) maps, we mark
these sections write-able. In the future we can re-define the entries in
the map to use relative relocations instead that can be statically
determined by the linker. That change will be a bit more invasive so we
defer this for later.

Depends on D36816.

Event Timeline

dberris created this revision.Aug 11 2017, 7:34 AM
dblaikie edited edge metadata.Aug 11 2017, 8:52 AM

Don't know enough about ELF attributes to review the point of this patch, sorry.

lib/CodeGen/AsmPrinter/AsmPrinter.cpp
2793–2796

Unrelated reformatting? (or I can't spot what the change is on these lines)

2823–2828

Repetitious - reformat to a common function, perhaps?

dberris updated this revision to Diff 111470.Aug 16 2017, 11:29 PM
dberris retitled this revision from [XRay][CodeGen] Imbue attributes to the labels for XRay instrumentation to [XRay][CodeGen] Use PIC-friendly code in XRay sleds; remove synthetic references.
dberris edited the summary of this revision. (Show Details)
dberris added a reviewer: pcc.

Missing test changes; re-titling and updating for review. Next update should contain updated (and new) tests.

pcc added inline comments.Aug 17 2017, 1:36 PM
lib/CodeGen/AsmPrinter/AsmPrinter.cpp
2779

This function (and the other places where you are doing the same thing) seems very questionable. In all cases the symbol is a temporary symbol, which means that it should not even appear in the symbol table, and I would expect these attribute assignments to have no effect. In fact if I patch in your change and build an object file like this:

$ cat foo.cpp
int main() {
  __xray_customevent(0, 0);
}
$ clang++ -fPIE -pie foo.cpp -fxray-instrument -c

I get exactly the same object file if I remove all of these things.

2800

I don't think this is right. I think you need to create a *separate* xray_instr_map and xray_fn_idx section for each function you emit. Otherwise the assembler will put every xray table in the same section and link the section with the first function section that you emit. That will mean that if the first function is dropped, the entire xray section is dropped. I can reproduce this with lld like this:

$ cat foo.cpp 
void f() {
  __xray_customevent(0, 0);
}

int main() {
  __xray_customevent(0, 0);
}
$ clang++ -fPIE -pie foo.cpp -fuse-ld=lld -fxray-instrument   -ffunction-sections -Wl,--gc-sections
$ readelf -S a.out | grep xray
[no output]

If I swap the order of main and f, the xray sections are kept (and f is kept alive because of references from the xray sections).

You can create multiple sections with the same name by incrementing the UniqueID that you pass to getELFSection.

dberris updated this revision to Diff 111613.Aug 17 2017, 9:56 PM
dberris marked 2 inline comments as done.
dberris retitled this revision from [XRay][CodeGen] Use PIC-friendly code in XRay sleds; remove synthetic references to [XRay][CodeGen] Use PIC-friendly code in XRay sleds; remove synthetic references in .text.
dberris edited the summary of this revision. (Show Details)

Reword and update implementation, this time with tests.

dberris marked 2 inline comments as done.Aug 17 2017, 10:08 PM

Thanks, @pcc -- PTAL?

lib/CodeGen/AsmPrinter/AsmPrinter.cpp
2823–2828

Looks like we didn't need to do that anyway, removed.

dberris updated this revision to Diff 111621.Aug 17 2017, 11:07 PM
dberris marked an inline comment as done.

Rebase

pcc added inline comments.Aug 18 2017, 5:09 PM
lib/CodeGen/AsmPrinter/AsmPrinter.cpp
2807

Remove this comment.

2844

Remove this part of the comment.

lib/Target/X86/X86MCInstLower.cpp
1094

(This is somewhat orthogonal to this patch, but...)

Is this code reachable? The reg-imm form of the mov instruction uses a variable number of bytes depending on the width of the immediate (I think MOV64ri will always select the 64-bit variant which uses 10 bytes), so if the reg-imm form is selected, the jmp instruction above will jump to the wrong instruction. Since this can't possibly work (nor could it have worked before), maybe assert that the operand is a register?

1102

Here you are pushing rdi and rsi onto the stack whereas you weren't doing that before.

  1. I don't understand why the pushes are necessary now when they weren't before.
  2. Since you are pushing one more value than before, is the stack still aligned correctly?
test/CodeGen/Mips/xray-section-group.ll
17 ↗(On Diff #111621)

This test should make sure that the .section directive specifies a unique ID and associated section.
https://llvm.org/docs/Extensions.html#id1

27 ↗(On Diff #111621)

Same here for the unique ID.

dberris updated this revision to Diff 111890.Aug 20 2017, 8:29 AM
dberris marked 5 inline comments as done.

Address comments

dberris added inline comments.Aug 20 2017, 8:29 AM
lib/Target/X86/X86MCInstLower.cpp
1102

Because this is an intrinsic being lowered, defined in include/llvm/IR/Instrinsics.td, we aren't quite able to provide a mask of the registers that will be clobbered/used by this routine. While we haven't quite run into bugs without stashing the %rsi and %rdi registers here (we do actually do the right thing in the trampoline), We also haven't been defining the calling convention for calls to this intrinsic (which we couldn't define for the intrinsic too). I think I'm just being a bit cautious here.

The alignment question is an interesting one. Because it's an intrinsic being lowered, and that we're already past the stack adjustment phase (and side-stepping the call-lowering code for normal call instructions), we haven't quite been tracking the stack adjustments here. What we do assume is that if the stack was aligned already, pushing two 64-bit values onto the stack will keep it aligned -- then the trampoline starts running, assuming that the stack is set up properly.

Do you have suggestions on how else to do/test this?

pcc added inline comments.Aug 21 2017, 5:31 PM
lib/Target/X86/X86MCInstLower.cpp
1099

You can replace this if statement with the "then" body.

1102

Note that the callee is allowed to clobber more registers than just %rsi and %rdi, see the full list here: https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf

If the idea is that the __xray_CustomEvent function has a custom "callee-save almost all registers" calling convention, I'd expect you to need to save all registers in __xray_CustomEvent not marked as "preserved across function calls". It does at least seem to be necessary to save %rdi and %rsi in the function itself though.

For stack alignment, I don't think you can always assume %rsp to be properly aligned at the point where you emit the __xray_CustomEvent call. Here's a quick counterexample that shows that we can either align the stack correctly (in the case of foo) or incorrectly (in the case of bar) when calling the handler.

#include <xmmintrin.h>

__attribute__((weak)) __m128 f(__m128 *i) {
  return *i;
}

void foo() {
  __xray_customevent(0, 0);
  __m128 v = {};
  f(&v);
}

void bar() {
  __xray_customevent(0, 0);
}

If you compile this you should see that foo subtracts 24 from %rsp before calling __xray_CustomEvent, while bar does not subtract anything.

I'm no backend expert, so I don't know how to fix this properly, but I imagine that you could always realign the stack in the custom event handler function.

In terms of testing, the function foo above is an example of a function that requires the stack to be aligned, so you could try writing an integration test that uses something like that as a custom event handler function.

dberris updated this revision to Diff 112115.Aug 21 2017, 10:51 PM
dberris marked 2 inline comments as done.
  • fixup: remove unnecessary if condition
dberris added inline comments.Aug 21 2017, 10:52 PM
lib/Target/X86/X86MCInstLower.cpp
1102

Cool stuff!

I made the changes in D36816 to stash more registers in __xray_CustomEvent. We already have a special calling convention as you suggest (which is documented in the trampoline implementation). I'm not sure though whether the alignment in the case of the empty function does cause issues -- we use call instructions and already have a special calling convention for the trampolines.

pcc accepted this revision.Aug 22 2017, 11:10 AM

LGTM

lib/Target/X86/X86MCInstLower.cpp
1102

Thanks, I've left some comments on that review.

This revision is now accepted and ready to land.Aug 22 2017, 11:10 AM
This revision was automatically updated to reflect the committed changes.