Inlined branch funnels fail expectation of LowerTypeTestsModule::lower()
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Probably a better fix would be to teach the inliner not to inline functions containing musttail calls if the musttail call involves an intrinsic.
Yes, AFAIK the only intrinsic that supports musttail is llvm.icall.branch.funnel, and inlining is invalid for it.
Is inlining incorrect in general, for any future musttail on intrinsic? If not it sounds as incorrect overgeneralization by single instance.
We will need to allow inlining on a case by case basis for new intrinsics, as we won't necessarily know whether the intrinsic will be able to tell which arguments are for the intrinsic and which are for the called function.
llvm/test/Transforms/WholeProgramDevirt/branch-funnel.ll | ||
---|---|---|
6 ↗ | (On Diff #140519) | I've added inliner test, but I'd keeps this one as well to make sure that other optimizations do not affect BFs |
You also need to update llvm::isInlineViable.
llvm/lib/Analysis/InlineCost.cpp | ||
---|---|---|
1229 ↗ | (On Diff #140695) | Leftover code from testing? |
Branch funnels can look look like this:
define hidden void @__typeid_typeid1_0_branch_funnel(i8* nest, ...) { musttail call void (...) @llvm.icall.branch.funnel(i8* %0, i8* bitcast ([1 x i8*]* @vt1_1 to i8*), i32 (i8*, i32)* @vf1_1, i8* bitcast ([1 x i8*]* @vt1_2 to i8*), i32 (i8*, i32)* @vf1_2, ...) ret void }
If we inline it then LowerTypeTests.cpp (near line 1806) will not be able to separate call targets from target arguments (here ...).
Thanks, I looked, and now I understand better what's going on. These are thunks, and we do in fact want the argument forwarding behavior of a variadic musttail call.
I guess one thing that is weird about @llvm.icall.branch.funnel is that it doesn't quite obey the musttail verifier rules, which are supposed to require that the prototype of the caller matches the prototype of the callee. This ensures that we never have to adjust the location of the return address on the stack when emitting the tail call. Maybe I forgot to check intrinsics, or the arguments that are part of the variadic pack in the verifier.
Have you considered passing these extra parameters as an operand bundle, or do those not support variable numbers of operands?
I think I actually prefer the original noinline marker patch to changing the inliner. @pcc, what do you think?
Maybe I forgot to check intrinsics, or the arguments that are part of the variadic pack in the verifier.
I had to disable the verifier check for intrinsics because the check is not appropriate for all intrinsics, specifically this one.
Have you considered passing these extra parameters as an operand bundle, or do those not support variable numbers of operands?
Something like that might work, but even if we were to represent the target list like that, that wouldn't be enough, because the backend would also need to know how to emit the other arguments as if they were being passed as part of a regular call.
That is, not being able to inline arguments into an intrinsic is effectively a backend limitation that the inliner ought to respect. If that changes, we can change the inliner as well.
I think I see what you're saying. Because this intrinsic doesn't go through the normal ISel call lowering code paths, it doesn't have any logic for getting the rest of the arguments materialized into registers or stack slots. The musttail marking is used primarily to make sure that IR passes don't screw things up and move the intrinsic call out of tail position.
The current lowering seems a bit dangerous, because if some pass inserted some computation (__fentry calls maybe?) to a branch funnel function, you haven't created virtual registers connecting the incoming argument registers to the outgoing argument registers used by the tail call, and they won't be spilled and preserved. I think fixing that is just a matter of doing this stuff when you make your TAILJMP64d instructions:
if (isVarArg && IsMustTail) { const auto &Forwards = X86Info->getForwardedMustTailRegParms(); for (const auto &F : Forwards) { SDValue Val = DAG.getCopyFromReg(Chain, dl, F.VReg, F.VT); RegsToPass.push_back(std::make_pair(unsigned(F.PReg), Val)); } }
... except you'd add MI copies and operands instead of DAG nodes.
I think we shouldn't overgeneralize here. In the future, we can always add new intrinsics which don't work when we inline them. See llvm.local.escape. This seems like one more thing that we just don't know how to inline.
I think I would generalize HasFrameEscape to HasUninlineableIntrinsic and set that if you see a musttail call of this intrinsic. For example, I can imagine someone wanting to do guaranteed tail calls with some kind of statepoint intrinsic. We shouldn't block inlining that.
Are you sure that would work? The pass runs after RA.
I think we shouldn't overgeneralize here. In the future, we can always add new intrinsics which don't work when we inline them. See llvm.local.escape. This seems like one more thing that we just don't know how to inline.
I think I would generalize HasFrameEscape to HasUninlineableIntrinsic and set that if you see a musttail call of this intrinsic. For example, I can imagine someone wanting to do guaranteed tail calls with some kind of statepoint intrinsic. We shouldn't block inlining that.
That sounds fine to me.
Good point. You might want X86ISelLowering to copy these registers into and out of the ICALL_BRANCH_FUNNEL node, though.