This is an archive of the discontinued LLVM Phabricator instance.

[CGSCC] Don't count calls to intrinsic functions in finding potential deviritualizations.
Needs RevisionPublic

Authored by xur on Jul 1 2022, 9:48 AM.

Details

Summary

This patch exclude the direct calls to intrinsic functions that cannot
be an indirect-call target in the heuristics. The debug intrinsics make
the algorithm unstable for builds with and without "-g" option.

We found this issue in the CSPGO compilation where we only use -g
in the CSPGO optimized build (but not in CSPGO instrumentation build).
This issue leads to some profile hash-mismatch.

Here is an example to show the problem.

extern int (*goo)();
static int bar(int n, int n2, int n3) { return n*n + n2 + n3; }
int foo(int sum) {
  int n = bar(2, 0, 0);
  if (n != 4) sum += goo();
  sum += bar(sum, sum, sum);
  return sum;
}

Compile with: clang -O2.
Without -g:
before CGSCC indirect_call=1 direct_call=2
after CGSCC round1: indirect_call=0 direct_call=0 ==> stopped
With -g:
before CGSCC indirect_call=1 direct_call=7
after CGSCC round1: indirect_call=0 direct_call=9 ==> run round2
after CGSCC round2: indirect_call=0 direct_call=9 ==> stopped

The above simple example generates the same IR with extra round of CGSCC
pass. But it's not hard to notice that they can easily leads to
different CodeGen in the real world program, not to mentioning the extra
compile time.

With this patch, -g will have the same behavior as without -g.

Diff Detail

Event Timeline

xur created this revision.Jul 1 2022, 9:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 1 2022, 9:48 AM
xur requested review of this revision.Jul 1 2022, 9:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 1 2022, 9:48 AM
xur edited the summary of this revision. (Show Details)Jul 1 2022, 11:02 AM
nikic requested changes to this revision.Jul 1 2022, 2:08 PM
nikic added reviewers: nikic, aeubanks.
nikic added a subscriber: nikic.

I think the general idea behind this change makes sense. However, we should be treating all intrinsics the same way here (by skipping them), and not special-case assume-like and dbg intrinsics. Intrinsics generally cannot be called indirectly (this is enforced by the IR verifier), so they are irrelevant for the purposes of devirtualization.

This revision now requires changes to proceed.Jul 1 2022, 2:08 PM

Some intrinsics can be indirect-call targets, like memcpy. My first patch
actually ignored all the intrinsics, but it broke some tests.

nikic added a comment.Jul 2 2022, 10:49 AM

memcpy the libcall can be an indirect call target. llvm.memcpy the intrinsic cannot be. If an indirect memcpy becomes direct and is then replaced with the intrinsic, that's a divirtualization, but I don't think it's one relevant for the purposes of DevirtSCC, because such a call cannot be inlined. (Phrased like that, I think the relevant property here is not so much that something is an intrinsic, but that it is a declaration, as declarations are not eligible for inlining.)

xur added a comment.Jul 6 2022, 10:33 AM

Yes. Thanks for clarifying these steps. These actually were what I meant: a
memcpy libcall was devirtualized and replaced with llvm.memcpy intrinsic.
The code I changed in ScanSCC which will perform after the devirtulization
so whether to count matters here.
If we filter all the intrinsics, in some cases, we will not do another
round of CGSCC. I agree with you that we probably don't need it as inline
will not happen. But I need to point out that there are other IPA
optimizations that also happen here -- like propagation of attributes and
some cleanup.

This patch is conversative and won't change any current
optimization behavior.

If you think we should take a more aggressive approach that filters all the
instricisc, I'm all for it too as long as people accept the behavior
change.

xur updated this revision to Diff 442636.Jul 6 2022, 11:00 AM

This is the more aggressive version that filters out all intrinsics.

aeubanks added inline comments.Aug 4 2022, 3:10 PM
llvm/test/Other/cgscc-devirt-iteration.ll
104–105

this comment needs to be updated. I think it makes sense that we don't support this behavior since at this point it's more a phase ordering issue, rather than wanting to rerun the entire pipeline

I also want to change the pipeline to run function-attrs after the function simplification pipeline rather than before it, so this test is a little less relevant

nikic requested changes to this revision.Mar 7 2023, 1:34 AM

This needs a rebase now that D145210 has landed. I think this will remove the regression in cgscc-devirt-iteration.ll and we should be able to move forward with this change.

This revision now requires changes to proceed.Mar 7 2023, 1:34 AM