This is an archive of the discontinued LLVM Phabricator instance.

[XRay][compiler-rt+llvm] Update XRay register stashing semantics
ClosedPublic

Authored by dberris on Dec 6 2017, 5:30 AM.

Details

Summary

This change expands the amount of registers stashed by the entry and
__xray_CustomEvent trampolines.

We've found that since the __xray_CustomEvent trampoline calls can show up in
situations where the scratch registers are being used, and since we don't
typically want to affect the code-gen around the disabled
__xray_customevent(...) intrinsic calls, that we need to save and restore the
state of even the scratch registers in the handling of these custom events.

Diff Detail

Repository
rL LLVM

Event Timeline

dberris created this revision.Dec 6 2017, 5:30 AM
dberris updated this revision to Diff 125872.Dec 6 2017, 6:50 PM
dberris retitled this revision from [XRay][compiler-rt] Update XRay trampoline CFI and saving/restoring semantics to [XRay][compiler-rt+llvm] Update XRay trampoline CFI and register stashing semantics.
dberris edited the summary of this revision. (Show Details)

Re-title and update the definition of the customevent intrinsics to signal side-effects.

dberris updated this revision to Diff 125873.Dec 6 2017, 6:54 PM

Fix minor stack adjustment

dberris added a subscriber: dblaikie.

Adding @dblaikie to the review for the CFI directives (please feel free to nominate someone else who might know better).

The problem I'm having in particular here is the computation in the ALIGNED_CALL_RAX macro, where we're doing stack re-alignment just before the call to the custom event handler function (line 77). Working offline with a teammate here in Sydney, we're prototyping some potential ways of doing this correctly but using .cfi_escape ... and providing raw DWARF opcodes. Ideally I wouldn't have to go through such lengths, but without the correct CFI directives and/or DWARF information, debugging XRay instrumentation handlers and the trampolines becomes really hard (since the CFI directives would probably be wrong).

Before I go uploading a work-in-progress implementation here, is there an example somewhere of how to potentially resolve this in assembler?

One sticking point is the fact that there seems to be no way to get the current CFI offset from the assembler using either a directive or a special macro -- at least not one I can find from the assembler manuals.

Pointers would be most appreciated.

davide added a subscriber: davide.Dec 7 2017, 9:57 AM

I think in this context CFI stands for Call Frame information rather than Control Flow Integrity.

davide added a comment.Dec 7 2017, 9:59 AM

Probably Eric or Adrian can review this.

dberris updated this revision to Diff 126305.Dec 10 2017, 8:11 PM

Rebase and back-out the CFI details. We can follow-up on those as the semantics changes are more important at this time.

dberris updated this revision to Diff 126306.Dec 10 2017, 8:37 PM

Actually back out the CFI-specific changes.

dberris updated this revision to Diff 126309.Dec 10 2017, 11:33 PM
  • fixup: only stash r10 and r11 when calling __xray_CustomEvent
  • fixup: mark that no registers are preserved by the call to the intrinsic

This is now ready for a look @dblaikie -- or if you know someone else who might know more about the lowering magic involved in SelectionDAG for X86 in particular, a redirect would be most appreciated too.

Yeah, can't say I know much of anything about this. @echristo might, or might be able to point to someone who would be suitable.

dberris added a subscriber: chandlerc.

Thanks @dblaikie -- @echristo, when you have time, would really appreciate some thoughts on the following:

  • The register stashing strategy for the custom event intrinsic -- whether it should stash all, some, and/or maybe do something else.
  • In the SelectionDAG implementation, attempt to make the call to the intrinsic as non-disruptive to optimisations and register allocation as possible, or whether to go the current state and just signal that the intrinsic may very well use all the registers anyway.

I have a slight preference to incur the cost of stashing all/most of the registers in the trampoline when XRay is enabled and we're doing custom event handling, rather than affect the codegen around the call to the intrisic. But I understand that there might be a middleground that I haven't quite explored yet (or still can't begin to express properly).

Thoughts?

Adding in @chandlerc as well in case, for more feedback.

A bit late, but I have some concerns here.

I think the problem is that you were clobbering a bunch of registers that weren't "available" for you to clobber, but if you set this up as a normal calling convention they'll be saved - which I don't think you want to do. How about just saving all of the registers explicitly in your trampoline? If not, then you might just want to make it a normal call and save/restore over it? Given the goals of the work I'd probably lean toward the former. I think if you clobber all of the registers in call lowering you're just making the surrounding code worse?

Thoughts?

-eric

Also the register saves strike me as a somewhat odd optimization - you could also have just thrown this into SAVE_REGISTERS I think?

-eric

dberris planned changes to this revision.Jan 29 2018, 10:18 AM

A bit late, but I have some concerns here.

I think the problem is that you were clobbering a bunch of registers that weren't "available" for you to clobber, but if you set this up as a normal calling convention they'll be saved - which I don't think you want to do. How about just saving all of the registers explicitly in your trampoline? If not, then you might just want to make it a normal call and save/restore over it? Given the goals of the work I'd probably lean toward the former. I think if you clobber all of the registers in call lowering you're just making the surrounding code worse?

Thoughts?

I agree that we should try not to disturb the surrounding code when we lower the custom event instrumentation points. I'll update it to do all register stashing/restoration in the trampoline.

Also the register saves strike me as a somewhat odd optimization - you could also have just thrown this into SAVE_REGISTERS I think?

Good point. Yes, it's much cleaner that way, thanks!

dberris updated this revision to Diff 131917.Jan 29 2018, 7:56 PM
  • fixup: Stash all registers in trampolines
dberris updated this revision to Diff 132020.Jan 30 2018, 11:57 AM
  • fixup: Remove unnecessary comments
echristo added inline comments.Jan 30 2018, 4:39 PM
llvm/include/llvm/CodeGen/TargetLowering.h
2541 ↗(On Diff #132020)

AFAICT you don't use TRI in the implementation?

dberris updated this revision to Diff 132082.Jan 30 2018, 4:54 PM
dberris marked an inline comment as done.
  • fixup: Remove unused variable in API
llvm/include/llvm/CodeGen/TargetLowering.h
2541 ↗(On Diff #132020)

Good catch, yes -- removed.

dberris updated this revision to Diff 132083.Jan 30 2018, 5:03 PM
dberris retitled this revision from [XRay][compiler-rt+llvm] Update XRay trampoline CFI and register stashing semantics to [XRay][compiler-rt+llvm] Update XRay register stashing semantics.
dberris edited the summary of this revision. (Show Details)

Re-title to reflect the changes better.

Ready for another look now @echristo -- thanks in advance!

echristo accepted this revision.Jan 31 2018, 5:50 PM

One inline comment then LGTM.

-eric

compiler-rt/lib/xray/xray_trampoline_x86_64.S
240 ↗(On Diff #132083)

Remove.

This revision is now accepted and ready to land.Jan 31 2018, 5:50 PM
This revision was automatically updated to reflect the committed changes.
dberris marked an inline comment as done.