This is an archive of the discontinued LLVM Phabricator instance.

[TailCall] Disable tail call if the callee function contain __builtin_frame_address or __builtin_return_address
AbandonedPublic

Authored by shiva0217 on May 7 2019, 8:03 PM.

Details

Reviewers
None
Summary

Enabling tail call may remove the frame pointer and return address restoration in caller which will make the above two builtin functions get incorrect value if the depth parameter > 0.

E.g.

void __attribute__((noinline)) *callee (char *p) {
    return __builtin_frame_address (1);
}
void *caller (void) {
    char * save = (char*) alloca (4);
    return callee (save);
}

Diff Detail

Repository
rL LLVM

Event Timeline

shiva0217 created this revision.May 7 2019, 8:03 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 7 2019, 8:03 PM
asb added a comment.May 8 2019, 2:09 AM

The description of the llvm.frameaddress and llvm.returnaddress intrinsics seems to indicate that these are "best effort" and LLVM doesn't really guarantee a correct result for a depth > 1 https://llvm.org/docs/LangRef.html#llvm-returnaddress-intrinsic https://llvm.org/docs/LangRef.html#llvm-returnaddress-intrinsic

Is there a particular use case that is improved by improving the quality of frameaddress/returnaddress results?

In D61665#1494592, @asb wrote:

The description of the llvm.frameaddress and llvm.returnaddress intrinsics seems to indicate that these are "best effort" and LLVM doesn't really guarantee a correct result for a depth > 1 https://llvm.org/docs/LangRef.html#llvm-returnaddress-intrinsic https://llvm.org/docs/LangRef.html#llvm-returnaddress-intrinsic

Is there a particular use case that is improved by improving the quality of frameaddress/returnaddress results?

Hi Alex,
There is a test case gcc/testsuite/gcc.c-torture/execute/20010122-1.c in gcc testsuite. GCC will not generate tail call and can get the correct builtin function results for depth > 0 in this case.

I have a few concerns here:

  1. Looping over every instruction in a function is expensive, and makes any pass which checks this for every call in a function take quadratic time overall.
  2. You can't inspect the body of a function pointer, or a function in a different translation unit, so we can't make this work consistently.
  3. Even in the same translation unit, how do we "preserve" the behavior for values greater than 1?

I'd prefer to just leave the current behavior if it isn't causing any practical problems. The user can always use -fno-optimize-sibling-calls if their codebase needs it for some reason.

hfinkel added a subscriber: hfinkel.May 8 2019, 2:11 PM

I have a few concerns here:

  1. Looping over every instruction in a function is expensive, and makes any pass which checks this for every call in a function take quadratic time overall.
  2. You can't inspect the body of a function pointer, or a function in a different translation unit, so we can't make this work consistently.
  3. Even in the same translation unit, how do we "preserve" the behavior for values greater than 1?

I'd prefer to just leave the current behavior if it isn't causing any practical problems. The user can always use -fno-optimize-sibling-calls if their codebase needs it for some reason.

If we do wish to make our "best effort" contain more effort, I think that we'd want to do this during function-attribute inference - there we can iterate over the call graph and add some inhibiting attributes. That having been said, if the only use case we have for this is matching some portion of GCC's heuristic for the purpose of making their test case pass, I'm not sure that this is worthwhile.

shiva0217 abandoned this revision.May 8 2019, 6:51 PM
shiva0217 added a subscriber: eli.friedman.

I have a few concerns here:

  1. Looping over every instruction in a function is expensive, and makes any pass which checks this for every call in a function take quadratic time overall.

It may too expansive, using function attribute as Hal's comment would be a better approach.

  1. You can't inspect the body of a function pointer, or a function in a different translation unit, so we can't make this work consistently.

Yes, we can't detect the function in a different translation unit or pointed by a function pointer.

  1. Even in the same translation unit, how do we "preserve" the behavior for values greater than 1?

Yes, disabling tail call can only preserve the stack in depth 1, it may have other optimizations change the behavior of the depth greater than one.


I'd prefer to just leave the current behavior if it isn't causing any practical problems. The user can always use -fno-optimize-sibling-calls if their codebase needs it for some reason.

If we do wish to make our "best effort" contain more effort, I think that we'd want to do this during function-attribute inference - there we can iterate over the call graph and add some inhibiting attributes. That having been said, if the only use case we have for this is matching some portion of GCC's heuristic for the purpose of making their test case pass, I'm not sure that this is worthwhile.

I think the consensus is to leave the current behavior. Thanks @eli.friedman pointed out the cases the patch can't cover and the better approach suggested by @hfinkel if we would like to do so.

asb added a comment.May 9 2019, 12:00 AM

Yes, I think leaving the current behaviour probably makes sense. I encountered that test failure too and mask that test on the basis that that those builtins aren't documented as being guaranteed to work https://gcc.gnu.org/onlinedocs/gcc/Return-Address.html (that page even notes that crashing is allowable behaviour).

In D61665#1496085, @asb wrote:

Yes, I think leaving the current behaviour probably makes sense. I encountered that test failure too and mask that test on the basis that that those builtins aren't documented as being guaranteed to work https://gcc.gnu.org/onlinedocs/gcc/Return-Address.html (that page even notes that crashing is allowable behaviour).

Hi, Alex
Thanks for the information.