This is an archive of the discontinued LLVM Phabricator instance.

[Backend][X86] Improved tail call optimization for functions marked as musttail
Needs ReviewPublic

Authored by huangjd on Aug 2 2022, 5:12 PM.

Details

Summary

Now check if a function call marked as musttail also qualifies as a sibcall,
which means on-stack arguments are already in place before handling the
tail call function.

This also suppress a bug when a musttail call has struct arguments
passed on the stack, the output code results in return address
overwriting. See https://github.com/llvm/llvm-project/issues/56891

Diff Detail

Event Timeline

huangjd created this revision.Aug 2 2022, 5:12 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 2 2022, 5:12 PM
huangjd requested review of this revision.Aug 2 2022, 5:12 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 2 2022, 5:12 PM
davidxl added inline comments.Aug 3 2022, 9:23 AM
llvm/test/CodeGen/X86/musttail-struct.ll
42

is this the correct comment, or it is intended for a negative test (missing)?

huangjd added inline comments.Aug 3 2022, 11:26 AM
llvm/test/CodeGen/X86/musttail-struct.ll
42

This is just describing the purpose of the test. in x64 the first 6 are passed by register and the rest 2 are passed by stack

davidxl added inline comments.Aug 3 2022, 1:54 PM
llvm/test/CodeGen/X86/musttail-struct.ll
42

Ok. Can you expand the comment a little more.

Can you also add a negative test?

huangjd updated this revision to Diff 449841.Aug 3 2022, 5:33 PM

Updated comments

huangjd added inline comments.Aug 3 2022, 5:41 PM
llvm/test/CodeGen/X86/musttail-struct.ll
42

I am not sure what should a negative test be in this case, since the front end rejects the code if "musttail" is applied on function call that can't be tail called. Also if tail call is not applied, the output assembly can be quite unpredictable (if someone else change anything else in codegen) since instructions can be shuffled. I have yet seen a negative test with tail call optimization

davidxl added inline comments.Aug 4 2022, 10:38 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
4268

Can this be handled by line 4274?

4311–4312

what is this change for?

llvm/test/CodeGen/X86/musttail-struct.ll
42

An IR test can be written manually. This is to test line 4262: IsEligibleForTailCallOptimization() logic for mustTail case.

huangjd updated this revision to Diff 450107.Aug 4 2022, 12:55 PM

Restructured logic to determine isSibcall

huangjd updated this revision to Diff 450109.Aug 4 2022, 12:57 PM

Added comments

llvm/lib/Target/X86/X86ISelLowering.cpp
4311–4312

Sibcall should not adjust stack because all arguments are already in place. This is previously causing the bug I posted on github

efriedma added a comment.EditedAug 4 2022, 2:09 PM

Example of copying arguments in memory for musttail follows (which is still broken with this patch). Was leaving this broken intentional?

struct A {
    int a[5];
};
extern int G(A a, A b);
int F(A a, A b) {
    [[clang::musttail]] return G(b, a);
}
huangjd added a comment.EditedAug 9 2022, 12:38 PM

@efriedma This is a limitation of the current patch that it could only handle the case with arguments in matching order. After some investigation I found that the current function call lowering pass cannot handle in-place stack argument shuffling at all, as it doesn't take read/write dependency into account. It would require an extensive refactoring to completely fix it.

In addition, shuffling large stack arguments to force a tail call can get quite complicated, for instance, if we have F(x0, x1, x2, ... xn) calling G(x1, x2, ... xn, x0) where every argument is the same type and is passed by stack, in order to do an in-place argument swapping without further stack spill, the optimal code should be something like

for i in 0..k:
  tmpreg = x0.word_i
  x0.word_i = x1.word_i
  ...
  xn.word_i = tmpreg

where x0.word_i, ..., xn.word_i are cache-line sized partitions of the arguments x0, ..., xn.

The current patch is able to handle two common real world scenario of tail calls (a) a function that simply delegates to another function without doing anything else, and (b) recursive tail call where only the induction variable passed by register changes.