This is an archive of the discontinued LLVM Phabricator instance.

Differential D128999

[CGSCC] Don't count calls to intrinsic functions in finding potential devirtualizations.
Needs Revision · Public

Authored by xur on Jul 1 2022, 9:48 AM.

Details

Reviewers

tejohnson
nikic
aeubanks

Summary

This patch excludes direct calls to intrinsic functions that cannot
be indirect-call targets from the heuristic. The debug intrinsics make
the algorithm unstable between builds with and without the "-g" option.

We found this issue in a CSPGO compilation where -g is used only
in the CSPGO optimized build (and not in the CSPGO instrumentation build).
This discrepancy leads to profile hash mismatches.

Here is an example to show the problem.

extern int (*goo)();
static int bar(int n, int n2, int n3) { return n*n + n2 + n3; }
int foo(int sum) {
  int n = bar(2, 0, 0);
  if (n != 4) sum += goo();
  sum += bar(sum, sum, sum);
  return sum;
}

Compile with: clang -O2.
Without -g:
before CGSCC indirect_call=1 direct_call=2
after CGSCC round1: indirect_call=0 direct_call=0 ==> stopped
With -g:
before CGSCC indirect_call=1 direct_call=7
after CGSCC round1: indirect_call=0 direct_call=9 ==> run round2
after CGSCC round2: indirect_call=0 direct_call=9 ==> stopped

The simple example above generates the same IR, only with an extra round
of the CGSCC pass. But it is not hard to see that the extra round can
easily lead to different CodeGen in real-world programs, not to mention
the extra compile time.

With this patch, -g will have the same behavior as without -g.
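
The numbers in the example can be replayed with a minimal, self-contained sketch of the count-based check. It assumes the rule stated in the test comments of this revision: the SCC is revisited when the indirect-call count decreases and the direct-call count increases. `CallCount` mirrors the field names used in the patch; `looksDevirtualized` and `roundsRun` are hypothetical helpers written for illustration, not LLVM code, and the real pass additionally tracks call sites through value handles.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for the per-function tallies kept by ScanSCC.
struct CallCount {
  int Direct;
  int Indirect;
};

// Assumed count-based check: fewer indirect calls AND more direct calls.
bool looksDevirtualized(const CallCount &Before, const CallCount &After) {
  return After.Indirect < Before.Indirect && After.Direct > Before.Direct;
}

// Snapshots[0] is the state before the first round; Snapshots[I] is the
// state after round I. Returns how many rounds run before the check fails.
std::size_t roundsRun(const std::vector<CallCount> &Snapshots) {
  std::size_t Rounds = 0;
  for (std::size_t I = 1; I < Snapshots.size(); ++I) {
    ++Rounds;
    if (!looksDevirtualized(Snapshots[I - 1], Snapshots[I]))
      break; // no apparent devirtualization, so iteration stops
  }
  return Rounds;
}
```

With the counts reported above, `roundsRun({CallCount{2, 1}, CallCount{0, 0}})` gives one round (no -g), while `roundsRun({CallCount{7, 1}, CallCount{9, 0}, CallCount{9, 0}})` gives two rounds (with -g), because the extra direct calls counted in the -g build satisfy the "more direct calls" half of the check.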

Diff Detail

Unit Tests: Failed

	Time	Test
	60,120 ms	x64 debian > AddressSanitizer-x86_64-linux.TestCases::scariness_score_test.cpp
	320 ms	x64 debian > BOLT.runtime/X86::user-func-reorder.c

Event Timeline

xur created this revision. Jul 1 2022, 9:48 AM

Herald added a project: Restricted Project. Jul 1 2022, 9:48 AM

Herald added subscribers: wenlei, hiraditya.

xur requested review of this revision. Jul 1 2022, 9:48 AM

Harbormaster completed remote builds in B173260: Diff 441712. Jul 1 2022, 10:42 AM

xur edited the summary of this revision. Jul 1 2022, 11:02 AM

I think the general idea behind this change makes sense. However, we should be treating all intrinsics the same way here (by skipping them), and not special-case assume-like and dbg intrinsics. Intrinsics generally cannot be called indirectly (this is enforced by the IR verifier), so they are irrelevant for the purposes of devirtualization.

This revision now requires changes to proceed. Jul 1 2022, 2:08 PM

Some intrinsics can be indirect-call targets, like memcpy. My first patch
actually ignored all the intrinsics, but it broke some tests.

memcpy the libcall can be an indirect call target. llvm.memcpy the intrinsic cannot be. If an indirect memcpy becomes direct and is then replaced with the intrinsic, that's a devirtualization, but I don't think it's one relevant for the purposes of DevirtSCC, because such a call cannot be inlined. (Phrased like that, I think the relevant property here is not so much that something is an intrinsic, but that it is a declaration, as declarations are not eligible for inlining.)

Yes. Thanks for clarifying these steps. That is actually what I meant: a
memcpy libcall was devirtualized and then replaced with the llvm.memcpy
intrinsic. The code I changed in ScanSCC runs after the devirtualization,
so whether the intrinsic call is counted matters here.
If we filter all the intrinsics, in some cases we will not do another
round of CGSCC. I agree with you that we probably don't need it, as inlining
will not happen. But I need to point out that other IPA optimizations
also happen here, like propagation of attributes and some cleanup.

This patch is conservative and won't change any current
optimization behavior.

If you think we should take a more aggressive approach that filters all the
intrinsics, I'm all for it too, as long as people accept the behavior
change.

This is the more aggressive version that filters out all intrinsics.
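
The difference between the conservative and the aggressive variant discussed above can be modeled with a small toy sketch. It is an illustration under stated assumptions, not LLVM code: `CallSite`, `Filter`, `countCalls` and `revisitsAfterMemcpyFold` are hypothetical names, the "conservative" filter is assumed to skip only debug and assume-like intrinsics, and the revisit check is the count-based rule from the test comments (indirect calls decrease and direct calls increase).

```cpp
#include <string>
#include <vector>

struct CallCount {
  int Direct;
  int Indirect;
};

// Toy call site; an empty Callee models an indirect call.
struct CallSite {
  std::string Callee;
  bool IsIntrinsic;
  bool IsDebugOrAssumeLike;
};

// Hypothetical filtering policies for the scan.
enum class Filter { None, Conservative, Aggressive };

CallCount countCalls(const std::vector<CallSite> &Calls, Filter F) {
  CallCount C{0, 0};
  for (const CallSite &CS : Calls) {
    if (F == Filter::Aggressive && CS.IsIntrinsic)
      continue; // skip every intrinsic call
    if (F == Filter::Conservative && CS.IsDebugOrAssumeLike)
      continue; // skip only debug and assume-like intrinsics
    if (CS.Callee.empty())
      ++C.Indirect;
    else
      ++C.Direct;
  }
  return C;
}

bool looksDevirtualized(const CallCount &Before, const CallCount &After) {
  return After.Indirect < Before.Indirect && After.Direct > Before.Direct;
}

// The memcpy scenario: one indirect call is resolved and then folded into
// a direct call to the llvm.memcpy intrinsic.
bool revisitsAfterMemcpyFold(Filter F) {
  std::vector<CallSite> Before = {CallSite{"", false, false}};
  std::vector<CallSite> After = {CallSite{"llvm.memcpy", true, false}};
  return looksDevirtualized(countCalls(Before, F), countCalls(After, F));
}
```

In this model `revisitsAfterMemcpyFold(Filter::Conservative)` is true, matching the claim that the first version keeps current behavior, while `revisitsAfterMemcpyFold(Filter::Aggressive)` is false, which corresponds to the changed expectation for @test3 in cgscc-devirt-iteration.ll.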

Harbormaster completed remote builds in B173945: Diff 442636. Jul 6 2022, 11:42 AM

aeubanks added inline comments. Aug 4 2022, 3:10 PM

llvm/test/Other/cgscc-devirt-iteration.ll
104–105	This comment needs to be updated. I think it makes sense that we don't support this behavior, since at this point it's more of a phase ordering issue than a matter of wanting to rerun the entire pipeline. I also want to change the pipeline to run function-attrs after the function simplification pipeline rather than before it, so this test is a little less relevant.

This needs a rebase now that D145210 has landed. I think this will remove the regression in cgscc-devirt-iteration.ll and we should be able to move forward with this change.

This revision now requires changes to proceed. Mar 7 2023, 1:34 AM

Herald added a subscriber: StephenFan. Mar 7 2023, 1:34 AM

Revision Contents

	Path	Size
	llvm/lib/Analysis/CGSCCPassManager.cpp	8 lines
	llvm/test/Other/cgscc-devirt-iteration.ll	9 lines

Diff 442636

llvm/lib/Analysis/CGSCCPassManager.cpp

[13 lines not shown]
 #include "llvm/ADT/SetVector.h"
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/iterator_range.h"
 #include "llvm/Analysis/LazyCallGraph.h"
 #include "llvm/IR/Constant.h"
 #include "llvm/IR/InstIterator.h"
 #include "llvm/IR/Instruction.h"
+#include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/PassManager.h"
 #include "llvm/IR/PassManagerImpl.h"
 #include "llvm/IR/ValueHandle.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/TimeProfiler.h"
[357 lines not shown; enclosing context: auto ScanSCC = [](LazyCallGraph::SCC &C,]
     SmallDenseMap<Function *, CallCount> CallCounts;
     CallCount CountLocal = {0, 0};
     for (LazyCallGraph::Node &N : C) {
       CallCount &Count =
           CallCounts.insert(std::make_pair(&N.getFunction(), CountLocal))
               .first->second;
       for (Instruction &I : instructions(N.getFunction()))
         if (auto *CB = dyn_cast<CallBase>(&I)) {
+          // Filter out the intrinsic functions as they cannot be the
+          // indirect-call target.
+          // This avoids different compilation behavior for the same source
+          // with and without -g option because of the DbgInfoIntrinsic.
+          if (isa<IntrinsicInst>(&I)) {
+            continue;
+          }
           if (CB->getCalledFunction()) {
             ++Count.Direct;
           } else {
             ++Count.Indirect;
             CallHandles.insert({CB, WeakTrackingVH(CB)});
           }
         }
     }
[834 lines not shown]

llvm/test/Other/cgscc-devirt-iteration.ll

 ; The CGSCC pass manager includes an SCC iteration utility that tracks indirect
 ; calls that are turned into direct calls (devirtualization) and re-visits the
 ; SCC to expose those calls to the SCC-based IPO passes. We trigger
 ; devirtualization here with GVN which forwards a store through a load and to
 ; an indirect call.
 ;
 ; RUN: opt -aa-pipeline=basic-aa -passes='module(inferattrs),cgscc(function-attrs,function(gvn,instcombine))' -S < %s | FileCheck %s --check-prefix=CHECK --check-prefix=BEFORE
-; RUN: opt -aa-pipeline=basic-aa -passes='module(inferattrs),cgscc(devirt<1>(function-attrs,function(gvn,instcombine)))' -S < %s | FileCheck %s --check-prefix=CHECK --check-prefix=AFTER --check-prefix=AFTER1
-; RUN: opt -aa-pipeline=basic-aa -passes='module(inferattrs),cgscc(devirt<2>(function-attrs,function(gvn,instcombine)))' -S < %s | FileCheck %s --check-prefix=CHECK --check-prefix=AFTER --check-prefix=AFTER2
+; RUN: opt -aa-pipeline=basic-aa -passes='module(inferattrs),cgscc(devirt<1>(function-attrs,function(gvn,instcombine)))' -S < %s | FileCheck %s --check-prefix=CHECK --check-prefix=AFTER --check-prefix=AFTER1 --check-prefix=AFTER12
+; RUN: opt -aa-pipeline=basic-aa -passes='module(inferattrs),cgscc(devirt<2>(function-attrs,function(gvn,instcombine)))' -S < %s | FileCheck %s --check-prefix=CHECK --check-prefix=AFTER --check-prefix=AFTER2 --check-prefix=AFTER12
 ;
 ; RUN: not --crash opt -abort-on-max-devirt-iterations-reached -aa-pipeline=basic-aa -passes='module(inferattrs),cgscc(devirt<1>(function-attrs,function(gvn,instcombine)))' -S < %s
 ; RUN: opt -abort-on-max-devirt-iterations-reached -aa-pipeline=basic-aa -passes='module(inferattrs),cgscc(devirt<2>(function-attrs,function(gvn,instcombine)))' -S < %s
 ;
 ; We also verify that the real O2 pipeline catches these cases.
-; RUN: opt -aa-pipeline=basic-aa -passes='default<O2>' -S < %s | FileCheck %s --check-prefix=CHECK --check-prefix=AFTER --check-prefix=AFTER2
+; RUN: opt -aa-pipeline=basic-aa -passes='default<O2>' -S < %s | FileCheck %s --check-prefix=CHECK --check-prefix=AFTER --check-prefix=AFTER2 --check-prefix=AFTERO2

 declare void @readnone() readnone
 ; CHECK: Function Attrs: nofree nosync readnone
 ; CHECK-NEXT: declare void @readnone()

 declare void @unknown()
 ; CHECK-NOT: Function Attrs
 ; CHECK-LABEL: declare void @unknown(){{ *$}}
[72 lines not shown]
 ; AFTER: call void @readnone()

   ret void
 }

 declare i8* @memcpy(i8*, i8*, i64)
 ; CHECK-LABEL: i8* @memcpy(

 ; The @test3 function checks that when we refine an indirect call to an
 ; intrinsic we still revisit the SCC pass. This also covers cases where the
 ; value handle itself doesn't persist due to the nature of how instcombine
 ; creates the memcpy intrinsic call, and we rely on the count of indirect calls
 ; decreasing and the count of direct calls increasing.
 ; Adding 'noinline' attribute to force attributes for improved matching.
 define void @test3(i8* %src, i8* %dest, i64 %size) noinline {
 ; CHECK: Function Attrs
 ; CHECK-NOT: read
 ; CHECK-SAME: noinline
 ; BEFORE-LABEL: define void @test3(i8* %src, i8* %dest, i64 %size)
-; AFTER-LABEL: define void @test3(i8* nocapture readonly %src, i8* nocapture writeonly %dest, i64 %size)
+; AFTER12-LABEL: define void @test3(i8* %src, i8* %dest, i64 %size)
+; AFTERO2-LABEL: define void @test3(i8* nocapture readonly %src, i8* nocapture writeonly %dest, i64 %size)
   %fptr = alloca i8* (i8*, i8*, i64)*
   store i8* (i8*, i8*, i64)* @memcpy, i8* (i8*, i8*, i64)** %fptr
   %f = load i8* (i8*, i8*, i64)*, i8* (i8*, i8*, i64)** %fptr
   call i8* %f(i8* %dest, i8* %src, i64 %size)
 ; CHECK: call void @llvm.memcpy
   ret void
 }

[11 lines not shown]