This is an archive of the discontinued LLVM Phabricator instance.

[Inline][X86] Avoid inlining if it would create ABI-incompatible calls (PR52660)
ClosedPublic

Authored by nikic on Dec 20 2021, 5:45 AM.

Details

Summary

X86 allows inlining functions if the callee target features are a subset of the caller target features. This ensures that we don't inline something into a caller that does not support it.

However, this does not account for possible call ABI mismatches as a result of inlining. If a call passing a vector argument was originally in -avx function, calling another -avx function, the vector is passed in xmm. If we now inline it into an +avx function, then it will be passed in ymm, even though the callee expects it in xmm.

Fix this by scanning over all calls in the function and checking whether ABI incompatibility is possible. Calls that only pass scalar types are excluded, as I believe those always use the same ABI independent of target features (right?)

Fixes https://github.com/llvm/llvm-project/issues/52660.

Diff Detail

Event Timeline

nikic created this revision.Dec 20 2021, 5:45 AM
nikic requested review of this revision.Dec 20 2021, 5:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 20 2021, 5:45 AM

The fix seems good. Just two concerns here:

  • I wonder what's the algorithm when inlining. Given we have a calling chain A->B->C. If we can inline both B and C into A, is B inlined firstly to A, then C to AB, or C inlined firstly to B, then BC to A? I'd worry if we are pessimistic for the former case.
  • Another concern is the compiling time. Scanning seems inevitable, but can we avoid to repeat scanning if the inlined function is called in many place?
nikic added a comment.Dec 22 2021, 1:42 AM

The fix seems good. Just two concerns here:

  • I wonder what's the algorithm when inlining. Given we have a calling chain A->B->C. If we can inline both B and C into A, is B inlined firstly to A, then C to AB, or C inlined firstly to B, then BC to A? I'd worry if we are pessimistic for the former case.

With few exceptions, inlining happens bottom-up, so B->C will be inlined first and then A->BC. So calls in the callee will already be eliminated, unless they can't be inlined for some reason (noinline, cost etc).

  • Another concern is the compiling time. Scanning seems inevitable, but can we avoid to repeat scanning if the inlined function is called in many place?

I think this will not have much practical impact mainly because this kind of target feature mismatch is rare. Usually target features on all functions are the same. Inlining already does multiple scans of the callee, so at least this doesn't change anything asymptotically.

pengfei accepted this revision.Dec 22 2021, 6:03 AM

Thanks for the explanation! LGTM, but please leave some time for others comments.

This revision is now accepted and ready to land.Dec 22 2021, 6:03 AM
This revision was landed with ongoing or failed builds.Dec 27 2021, 12:36 AM
This revision was automatically updated to reflect the committed changes.