This is an archive of the discontinued LLVM Phabricator instance.

[compiler-rt][XRay] Support tail call sleds
ClosedPublic

Authored by dberris on Aug 29 2016, 3:41 AM.

Details

Summary

This change depends on D23986 which adds tail call-specific sleds. For
now we treat them first as normal exits, and in the future leave room
for implementing this as a different kind of log entry.

The reason for deferring the change is so that we can keep the naive
logging implementation more accurate without additional complexity for
reading the log. The accuracy is gained in effectively interpreting call
stacks like:

A()
  B()
    C()

Which when tail-call merged will end up not having any exit entries for
A() nor B(), but effectively in turn can be reasoned about as:

A()
B()
C()

Although we lose the fact that A() had called B() then had called C()
with the naive approach, a later iteration that adds the explicit tail
call entries would be a change in the log format and thus necessitate a
version change for the header. We can do this later to have a chance at
releasing some tools (in D21987) that are able to handle the naive log
format, then support higher version numbers of the log format too.

Diff Detail

Event Timeline

dberris updated this revision to Diff 69552.Aug 29 2016, 3:41 AM
dberris retitled this revision from to [compiler-rt][XRay] Support tail call sleds.
dberris updated this object.
dberris added reviewers: echristo, kcc, rSerge, majnemer.
dberris added a subscriber: llvm-commits.
rSerge edited edge metadata.Sep 22 2016, 10:42 AM

Ping?

I'm reviewing it, though not ready to say something because I haven't yet understood the implementation for tail calls.

rSerge requested changes to this revision.Sep 22 2016, 2:13 PM
rSerge edited edge metadata.

I have another question in https://reviews.llvm.org/D23986 .

lib/xray/xray_interface.cc
266

The comment says about a jump to the exit sled, but the code below uses CallOpCode. Which one is wrong?

This revision now requires changes to proceed.Sep 22 2016, 2:13 PM

So in this implementation the handler doesn't receive XRayEntryType::TAIL type of event, but instead receives XRayEntryType::EXIT, right? Is it also intended for now?

rSerge added a comment.EditedSep 27 2016, 12:53 PM

I've tried seemingly the same approach on ARM with the following program:

#include <cstdio>
#include <cassert>
#include <xray/xray_interface.h>

[[clang::xray_always_instrument]] void __attribute__ ((noinline)) fC() { 
  std::printf("In fC()\n");
}

[[clang::xray_always_instrument]] void __attribute__ ((noinline)) fB() { 
  std::printf("In fB()\n");
  fC();
}

[[clang::xray_always_instrument]] void __attribute__ ((noinline)) fA() { 
  std::printf("In fA()\n");
  fB();
}

// Avoid infinite recursion in case the logging function is instrumented (so calls logging
//   function again).
[[clang::xray_never_instrument]] void simplyPrint(int32_t functionId, XRayEntryType xret)
{
  printf("XRay: functionId=%d type=%d.\n", int(functionId), int(xret));
}

int main(int argc, char* argv[]) {
  __xray_set_handler(simplyPrint);

  printf("Patching...\n");
  __xray_patch();
  fA();

  printf("Unpatching...\n");
  __xray_unpatch();       
  fA();

  return 0;
}

For me it gives the following results

Patching...
XRay: functionId=3 type=0.
In fA()
XRay: functionId=3 type=1.
XRay: functionId=2 type=0.
In fB()
XRay: functionId=2 type=1.
XRay: functionId=1 type=0.
XRay: functionId=1 type=1.
In fC()
Unpatching...
In fA()
In fB()
In fC()

So for function fC() the exit sled seems to be called too much before function exit: before printing In fC(). I compiled with the following generic flags: -O3 -g -fxray-instrument -Wall -std=c++14 -ffunction-sections -fdata-sections (this list doesn't include my specific flags like --target=armv7-linux-gnueabihf etc.).

Does the code snippet work correctly for you on x86_64? Or is the above output expected?

So in this implementation the handler doesn't receive XRayEntryType::TAIL type of event, but instead receives XRayEntryType::EXIT, right? Is it also intended for now?

Yes, this is intended for now.

I've tried seemingly the same approach on ARM with the following program:

[snipped]

So for function fC() the exit sled seems to be called too much before function exit: before printing In fC(). I compiled with the following generic flags: -O3 -g -fxray-instrument -Wall -std=c++14 -ffunction-sections -fdata-sections (this list doesn't include my specific flags like --target=armv7-linux-gnueabihf etc.).

Does the code snippet work correctly for you on x86_64? Or is the above output expected?

It's certainly not expected, but I think I can explain this.

Since printf isn't instrumented, this means the implementation of fC() could be tail-calling into printf (which is a perfectly good optimisation to make). Of course this mis-accounts the time but we really can't do anything about that yet, unless we know that we're tail-calling into something else, which will come later when we have different records for tail calls. We should probably do that when we have a more substantial change in the format once some of the stuff I'm working on (to support richer records into the log) goes upstream and increment the file version number.

Does that make sense?

Debugging for me has shown that the above happens because printf from fC is also called as a tail call. So first the exit sled of fC is executed, and only then printf is jumped into. So it seems we can't do anything about this with the current approach.

Debugging for me has shown that the above happens because printf from fC is also called as a tail call. So first the exit sled of fC is executed, and only then printf is jumped into. So it seems we can't do anything about this with the current approach.

This isn't such a bad thing though, given the following construction:

f1()
  f2()
    f3()

If we get f2() tail-calling into f3() and f3() tail calling into something else that isn't instrumented (not shown here), then we will still see:

f1()
  f2()
  f3()
  ... missing ...

However, since f1() is instrumented, we will see that time spent outside of f3() will be attributed to f1().

This is predicated though on having an enclosing function that is instrumented. To see this in your example, you can wrap the call to fA() with another function that will not tail-call into fA().

dberris added inline comments.Sep 28 2016, 8:19 PM
lib/xray/xray_interface.cc
266

The comment is inaccurate -- it should say "call to the exit sled instead". We need to call the exit sled so that it returns to the calling function just before the jump to the tail-called function. Let me update that.

dberris updated this revision to Diff 73434.Oct 4 2016, 2:15 AM
dberris edited edge metadata.
dberris marked 2 inline comments as done.
  • Rebased and redone
dberris updated this revision to Diff 73732.Oct 6 2016, 12:22 AM
dberris edited edge metadata.
  • Rebase
dberris updated this revision to Diff 73745.Oct 6 2016, 3:13 AM
  • Fix bad merge
rSerge accepted this revision.Oct 13 2016, 12:10 PM
rSerge edited edge metadata.
This revision is now accepted and ready to land.Oct 13 2016, 12:10 PM

Sorry, it's apparently too easy to miss activity on Phabricator. The emails get filtered together with LLVM mailing list piles.

This revision was automatically updated to reflect the committed changes.