This is an archive of the discontinued LLVM Phabricator instance.

[XRay] Support for for tail calls for ARM no-Thumb
ClosedPublic

Authored by rSerge on Sep 28 2016, 8:45 AM.

Details

Summary

This patch adds simplified support for tail calls on ARM with XRay instrumentation.

Known issue: compiled with generic flags: -O3 -g -fxray-instrument -Wall -std=c++14 -ffunction-sections -fdata-sections (this list doesn't include my specific flags like --target=armv7-linux-gnueabihf etc.), the following program

#include <cstdio>
#include <cassert>
#include <xray/xray_interface.h>

[[clang::xray_always_instrument]] void __attribute__ ((noinline)) fC() { 
  std::printf("In fC()\n");
}

[[clang::xray_always_instrument]] void __attribute__ ((noinline)) fB() { 
  std::printf("In fB()\n");
  fC();
}

[[clang::xray_always_instrument]] void __attribute__ ((noinline)) fA() { 
  std::printf("In fA()\n");
  fB();
}

// Avoid infinite recursion in case the logging function is instrumented (so calls logging
//   function again).
[[clang::xray_never_instrument]] void simplyPrint(int32_t functionId, XRayEntryType xret)
{
  printf("XRay: functionId=%d type=%d.\n", int(functionId), int(xret));
}

int main(int argc, char* argv[]) {
  __xray_set_handler(simplyPrint);

  printf("Patching...\n");
  __xray_patch();
  fA();

  printf("Unpatching...\n");
  __xray_unpatch();       
  fA();

  return 0;
}

gives the following output:

Patching...
XRay: functionId=3 type=0.
In fA()
XRay: functionId=3 type=1.
XRay: functionId=2 type=0.
In fB()
XRay: functionId=2 type=1.
XRay: functionId=1 type=0.
XRay: functionId=1 type=1.
In fC()
Unpatching...
In fA()
In fB()
In fC()

So for function fC() the exit sled seems to be called too much before function exit: before printing In fC().
Debugging shows that the above happens because printf from fC is also called as a tail call. So first the exit sled of fC is executed, and only then printf is jumped into. So it seems we can't do anything about this with the current approach (i.e. within the simplification described in https://reviews.llvm.org/D23988 ).

Diff Detail

Repository
rL LLVM

Event Timeline

rSerge updated this revision to Diff 72832.Sep 28 2016, 8:45 AM
rSerge retitled this revision from to [XRay] Support for for tail calls for ARM no-Thumb.
rSerge updated this object.
rSerge added reviewers: dberris, rnk, sanjoy, echristo.
rSerge added subscribers: llvm-commits, iid_iunknown.
sanjoy resigned from this revision.Sep 30 2016, 12:10 PM
sanjoy removed a reviewer: sanjoy.

I don't think I'm qualified to review this patch, getting it out of my queue.

dberris requested changes to this revision.Oct 4 2016, 10:07 PM
dberris edited edge metadata.

Why is this doing something different from the x86_64 implementation? We'd like to be able to, in the future, define different semantics from a tail exit, which means we need to distinguish the kind of exit is happening (tail call is just one, another is for exceptions, etc.) and this makes the arm implementation semantically different.

This revision now requires changes to proceed.Oct 4 2016, 10:07 PM

As I understood, x86_64 implementation for the tail calls reuses the trampoline of function exit tracing, but normal function exit tracing jumps into the trampoline, while tail call tracing calls the exit trampoline as a function. Because on ARM normal function exit tracing already calls the exit trampoline as a function, there was no need for additional switching between jump and call instruction in the patch. Thus for ARM, to reproduce the same functionality as for x86_64, it was sufficient to check whether the instruction is a tail call.

As I understood, x86_64 implementation for the tail calls reuses the trampoline of function exit tracing, but normal function exit tracing jumps into the trampoline, while tail call tracing calls the exit trampoline as a function. Because on ARM normal function exit tracing already calls the exit trampoline as a function, there was no need for additional switching between jump and call instruction in the patch. Thus for ARM, to reproduce the same functionality as for x86_64, it was sufficient to check whether the instruction is a tail call.

This doesn't identify whether an exit is a tail call exit though (from the instrumentation map perspective), which will cause the runtime to treat it like a normal exit. This is the current state as a transitional point to when we change the entry type being passed onto the logging function, to differentiate between normal exits and tail exits. The point of the change in X86 is to make this transition staged, instead of abrupt -- lay down the sleds, differentiate them in the instrumentation map, and then later on handle the tail calls differently.

rSerge updated this revision to Diff 74768.Oct 15 2016, 3:50 AM
rSerge edited edge metadata.

Implemented more staging for the real tail call implementation.

dberris accepted this revision.Oct 16 2016, 10:20 PM
dberris edited edge metadata.

Thanks @rSerge -- let me know if you want me to submit on your behalf again (or if you've already gotten repository write access). :)

This revision is now accepted and ready to land.Oct 16 2016, 10:20 PM

Thanks @rSerge -- let me know if you want me to submit on your behalf again (or if you've already gotten repository write access). :)

@dberris , yes, please deliver it to mainline. I haven't yet got the write access.

This revision was automatically updated to reflect the committed changes.

Done @rSerge -- in the future, it would make it much easier/simpler to commit this if you use arcanist to create the patches (on git or svn).

If you're manually making the diffs, please use -p1 format (have a "before" directory and "after" directory), and run against clang-format before uploading. This way we don't introduce whitespace issues that later need to be fixed up (trailing whitespace being the biggest offender).

If you're using git it should be easy to create a patch using git diff.

Cheers

Done @rSerge -- in the future, it would make it much easier/simpler to commit this if you use arcanist to create the patches (on git or svn).

If you're manually making the diffs, please use -p1 format (have a "before" directory and "after" directory), and run against clang-format before uploading. This way we don't introduce whitespace issues that later need to be fixed up (trailing whitespace being the biggest offender).

If you're using git it should be easy to create a patch using git diff.

Cheers

Ok, I'll try to setup one of those. I am making the patches as suggested by the developer policy: svn diff --diff-cmd=diff -x -U999999 > llvm.diff
Thanks for delivering!