This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG] Fix alias checking with potential later stack reuse
Abandoned · Public

Authored by yonghong-song on Oct 9 2020, 10:24 AM.


Details

Reviewers

niravd
courbet
ecnelises

Summary

This is to address the bug:

https://bugs.llvm.org/show_bug.cgi?id=47591

Currently, when SelectionDAG checks whether two frame index
accesses alias each other, it assumes they cannot alias if the two
frame indices are different and at least one is not fixed. This does not
take into account that stack slots can later be reused if their lifetimes
are disjoint.
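
To make the concern concrete, here is a minimal standalone sketch. It is not LLVM code: the `Slot` struct, the helper names, and the offset `-24` are hypothetical and chosen only for illustration. It models the rule described above (different frame indices, at least one not fixed, therefore no alias) and shows how that assumption can disagree with the final layout once two slots with disjoint lifetimes are given the same location.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a stack slot; not an LLVM type.
struct Slot {
  int FrameIndex;      // identity seen while alias checking
  bool IsFixed;        // fixed objects have a known location
  int64_t FinalOffset; // offset assigned later, after any stack reuse
  int64_t Size;        // size in bytes
};

// The rule as described in the summary: two different frame indices are
// assumed not to alias when at least one of them is not fixed.
constexpr bool assumedNoAlias(const Slot &A, const Slot &B) {
  return A.FrameIndex != B.FrameIndex && (!A.IsFixed || !B.IsFixed);
}

// Ground truth once final offsets are known: do the byte ranges overlap?
constexpr bool overlapsInMemory(const Slot &A, const Slot &B) {
  return A.FinalOffset < B.FinalOffset + B.Size &&
         B.FinalOffset < A.FinalOffset + A.Size;
}

void demoStackReuse() {
  // Two allocas with disjoint lifetimes that end up sharing one location.
  Slot First{0, false, -24, 8};
  Slot Second{1, false, -24, 8};
  assert(assumedNoAlias(First, Second));   // the early assumption
  assert(overlapsInMemory(First, Second)); // yet the bytes are shared
}
```

The point of the sketch is only that the "no alias" answer is derived from slot identity, while correctness after reuse depends on the accesses staying ordered with respect to the lifetime boundaries.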

This issue is exposed on BPF because it uses the ILP scheduler; x86 does
not have the issue because it uses Source. Even when x86 uses ILP, it does
not expose the issue, as its target code is different from BPF.
I have a BPF backend patch to also use the Source scheduler

https://reviews.llvm.org/D88525

and it fixes the issue. But I feel this is not the right fix, and
it appears to me that the optimized DAG is not correct.

This patch fixes the issue by assuming two frame indices may alias due to
possible later stack reuse. The fix causes some test failures
for the X86 target due to different code sequences. I will fix the tests
once we reach consensus about the right fix for this problem.

TODO: fix tests
tested with llvm/tests and found the following failures on x86:

Failed Tests (25):
  LLVM :: CodeGen/X86/2008-05-12-tailmerge-5.ll
  LLVM :: CodeGen/X86/alias-static-alloca.ll
  LLVM :: CodeGen/X86/arg-copy-elide-win64.ll
  LLVM :: CodeGen/X86/atomic-fp.ll
  LLVM :: CodeGen/X86/atomic-mi.ll
  ...

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	50 ms	linux > LLVM.CodeGen/AArch64::arm64-abi-varargs.ll
	80 ms	linux > LLVM.CodeGen/AArch64::arm64-vext.ll
	90 ms	linux > LLVM.CodeGen/AArch64::swifterror.ll
	60 ms	linux > LLVM.CodeGen/ARM::va_arg.ll
	40 ms	linux > LLVM.CodeGen/AVR::alloca.ll
		View Full Test Results (85 Failed)

Event Timeline

yonghong-song created this revision. Oct 9 2020, 10:24 AM

Herald added a reviewer: ecnelises. Oct 9 2020, 10:24 AM

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, pengfei, jfb, hiraditya.

yonghong-song requested review of this revision. Oct 9 2020, 10:24 AM

Harbormaster completed remote builds in B74612: Diff 297275. Oct 9 2020, 10:58 AM

yonghong-song added a subscriber: eli.friedman. Oct 9 2020, 1:06 PM

ping @courbet @niravd, any comments on this patch?

niravd added inline comments. Oct 14 2020, 8:39 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp
116	Removing this would be a bit of a shame as it has some nice improvements in other backends. (Also, is it only showing up in BPF or can you replicate this on X86/ARM?) I don't have the best knowledge of the FrameInfo structures, so we should probably loop in someone who understands them better, but we should see if we can extract disjointness information out of the frame info. At the very least, we should be able to retain that they can't alias if exactly one is fixed. However, the vast majority of the improvements from this came from catching disjointness of two separate allocations, so we should see if we can keep this.
130	Can you change this so we check that the two Index values match no matter what if we show non-aliasing? We really should check both the base and the Index if we're going to show non-aliasing.
	if (((BasePtr0.getIndex() == BasePtr1.getIndex()) && ((IsFI0 != IsFI1) || (IsGV0 != IsGV1) || (IsCV0 != IsCV1)) &&

yonghong-song added inline comments. Oct 15 2020, 12:59 PM

llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp
116	> Removing this would be a bit of a shame as it has some nice improvements in other backends. (Also, is it only showing up in BPF or can you replicate this on X86/ARM?)

I did not check ARM, but the issue did not show up on X86. X86 implements enableMachineScheduler() returning true, which ensures the Source scheduler is used for the selection DAG. From createDefaultScheduler():

    ...
    if (OptLevel == CodeGenOpt::None ||
        (ST.enableMachineScheduler() && ST.enableMachineSchedDefaultSched()) ||
        TLI->getSchedulingPreference() == Sched::Source)
      return createSourceListDAGScheduler(IS, OptLevel);
    ...

When there are multiple choices, the Source DAG scheduler prefers to follow lexicographic order, so this won't be an issue for X86. ARM and AArch64 have a similar override, so they also use the Source DAG scheduler and won't have the issue either. I have a similar patch for the BPF backend to also use the Source scheduler: https://reviews.llvm.org/D88525

But I feel that does not fix the root cause, hence this patch. The Source scheduler should still fix the issue for disjoint allocations: even if these allocations are reused, the Source scheduler ensures earlier loads/stores happen before later loads/stores...

> I don't have the best knowledge of the FrameInfo structures, so we should probably loop in someone who understands them better, but we should see if we can extract disjointness information out of the frame info. At the very least, we should be able to retain that they can't alias if exactly one is fixed. However, the vast majority of the improvements from this came from catching disjointness of two separate allocations, so we should see if we can keep this.

Yes, the majority of cases should be from two separate allocations. But theoretically, separate allocations can be reused as long as lifetime start/end are honored for the allocations, right? Some x86 tests fail with my patch; I could spend some time on these failed tests to find patterns so we can try to maintain the good cases...
130	Yes, I can do this.
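
The scheduler choice discussed in the inline reply can be modeled as a plain predicate. The following is a hedged, standalone sketch rather than LLVM code: `TargetConfig`, `SchedPref`, and `picksSourceScheduler` are hypothetical names, and the two example configurations are illustrative settings based only on the claims made in this thread, not on values read from any backend.

```cpp
#include <cassert>

// Hypothetical stand-ins for the inputs read by the quoted condition.
enum class SchedPref { Source, ILP };

struct TargetConfig {
  bool OptNone;                        // OptLevel == CodeGenOpt::None
  bool EnableMachineScheduler;         // ST.enableMachineScheduler()
  bool EnableMachineSchedDefaultSched; // ST.enableMachineSchedDefaultSched()
  SchedPref Preference;                // TLI->getSchedulingPreference()
};

// Mirrors the shape of the condition quoted from createDefaultScheduler():
// any one of the three clauses selects the Source list scheduler.
constexpr bool picksSourceScheduler(const TargetConfig &C) {
  return C.OptNone ||
         (C.EnableMachineScheduler && C.EnableMachineSchedDefaultSched) ||
         C.Preference == SchedPref::Source;
}

void demoSchedulerChoice() {
  // Illustrative settings only, loosely following the claims in the thread.
  TargetConfig MachineSchedTarget{false, true, true, SchedPref::ILP};
  TargetConfig IlpTarget{false, false, false, SchedPref::ILP};
  assert(picksSourceScheduler(MachineSchedTarget)); // takes the Source path
  assert(!picksSourceScheduler(IlpTarget));         // falls through to others
}
```

Under this reading, a target that enables the machine scheduler takes the Source path regardless of its scheduling preference, which matches the explanation given for why the problem was not observed on X86.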

As an alternative, we could update ImproveChain (and visitLIFETIME_END) to limit the aliasing around lifetime_start / end to disallow improving the chain dependence of a mem op node from a different lifetime node that may alias. That should prohibit access from two aliasable frame indices from being concurrent while still allowing us to leverage that disjoint allocs should be disjoint.

In D89149#2333147, @niravd wrote:

As an alternative, we could update ImproveChain (and visitLIFETIME_END) to limit the aliasing around lifetime_start / end to disallow improving the chain dependence of a mem op node from a different lifetime node that may alias. That should prohibit access from two aliasable frame indices from being concurrent while still allowing us to leverage that disjoint allocs should be disjoint.

I am not familiar with the SelectionDAG code base. Could you help draft a patch for this? I can help with testing. Feel free to use my test case at https://reviews.llvm.org/D88525.

ecnelises resigned from this revision. Oct 16 2020, 12:01 AM

dxu added a subscriber: dxu. Nov 30 2020, 11:23 AM

Herald added a subscriber: ecnelises. Nov 30 2020, 11:23 AM

As an alternative, we could update ImproveChain (and visitLIFETIME_END) to limit the aliasing
around lifetime_start / end to disallow improving the chain dependence of a mem op node
from a different lifetime node that may alias. That should prohibit access from two aliasable
frame indices from being concurrent while still allowing us to leverage that disjoint allocs should be disjoint.

@niravd Any update on this?

@niravd Looks like https://reviews.llvm.org/D91833 fixed the issue. I have created a BPF patch https://reviews.llvm.org/D92451 with a test case so that we can detect any regression. Could you check whether https://reviews.llvm.org/D91833 truly fixes the issue?

If D91833 indeed fixed the issue, do you think we can backport it to llvm11?

yonghong-song abandoned this revision. Dec 30 2020, 4:20 PM

Revision Contents

Path	Size
llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp	28 lines
llvm/test/CodeGen/BPF/selectiondag-bug.ll	82 lines

Diff 297275

llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp

(103 lines not shown)
       IsAlias = !(
           // ========PtrDiff========>
           (*NumBytes0 <= PtrDiff) ||
           // [----BasePtr0----]
           // [---BasePtr1--]
           // =====(-PtrDiff)====>
           (PtrDiff + NumBytes1 <= 0)); // i.e. NumBytes1 < -PtrDiff.
       return true;
     }
-  // If both BasePtr0 and BasePtr1 are FrameIndexes, we will not be
-  // able to calculate their relative offset if at least one arises
-  // from an alloca. However, these allocas cannot overlap and we
-  // can infer there is no alias.
-  if (auto *A = dyn_cast<FrameIndexSDNode>(BasePtr0.getBase()))
-    if (auto *B = dyn_cast<FrameIndexSDNode>(BasePtr1.getBase())) {
-      MachineFrameInfo &MFI = DAG.getMachineFunction().getFrameInfo();
-      // If the base are the same frame index but the we couldn't find a
-      // constant offset, (indices are different) be conservative.
-      if (A != B && (!MFI.isFixedObjectIndex(A->getIndex()) ||
-                     !MFI.isFixedObjectIndex(B->getIndex()))) {
-        IsAlias = false;
-        return true;
-      }
-    }

   bool IsFI0 = isa<FrameIndexSDNode>(BasePtr0.getBase());
   bool IsFI1 = isa<FrameIndexSDNode>(BasePtr1.getBase());
   bool IsGV0 = isa<GlobalAddressSDNode>(BasePtr0.getBase());
   bool IsGV1 = isa<GlobalAddressSDNode>(BasePtr1.getBase());
   bool IsCV0 = isa<ConstantPoolSDNode>(BasePtr0.getBase());
   bool IsCV1 = isa<ConstantPoolSDNode>(BasePtr1.getBase());

   // If of mismatched base types or checkable indices we can check
-  // they do not alias.
-  if ((BasePtr0.getIndex() == BasePtr1.getIndex() || (IsFI0 != IsFI1) ||
-       (IsGV0 != IsGV1) || (IsCV0 != IsCV1)) &&
+  // they do not alias except from frame indices where they may
+  // alias due to later stack reuse.
+  bool IdxCheck;
+  if (dyn_cast<FrameIndexSDNode>(BasePtr0.getBase()) &&
+      dyn_cast<FrameIndexSDNode>(BasePtr1.getBase()))
+    IdxCheck = false;
+  else
+    IdxCheck = BasePtr0.getIndex() == BasePtr1.getIndex();
+
+  if ((IdxCheck || (IsFI0 != IsFI1) || (IsGV0 != IsGV1) || (IsCV0 != IsCV1)) &&
       (IsFI0 || IsGV0 || IsCV0) && (IsFI1 || IsGV1 || IsCV1)) {
     IsAlias = false;
     return true;
   }
   return false; // Cannot determine whether the pointers alias.
 }

 bool BaseIndexOffset::contains(const SelectionDAG &DAG, int64_t BitSize,
(152 lines not shown)
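
The unchanged lines at the top of this hunk decide aliasing for two accesses that share a base and differ by a known `PtrDiff`. As a hedged illustration, the same interval test can be written as a standalone function. `rangesAlias` is a hypothetical name, and plain integers stand in for the optional byte counts used in the real code.

```cpp
#include <cassert>
#include <cstdint>

// Access 0 covers bytes [0, NumBytes0) and access 1 covers
// [PtrDiff, PtrDiff + NumBytes1), both relative to the same base.
constexpr bool rangesAlias(int64_t NumBytes0, int64_t NumBytes1,
                           int64_t PtrDiff) {
  return !((NumBytes0 <= PtrDiff) ||    // access 1 starts at or after the end of access 0
           (PtrDiff + NumBytes1 <= 0)); // access 1 ends at or before the start of access 0
}

void demoRangesAlias() {
  assert(!rangesAlias(8, 8, 8));  // [0,8) and [8,16) are adjacent
  assert(rangesAlias(8, 8, 4));   // [0,8) and [4,12) overlap
  assert(!rangesAlias(8, 8, -8)); // [0,8) and [-8,0) are adjacent
  assert(rangesAlias(8, 4, -2));  // [0,8) and [-2,2) overlap
}
```

This test is only usable when a constant offset between the two addresses is known. The part of the function that the patch changes handles the remaining case, where two frame indices have no computable relative offset.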

llvm/test/CodeGen/BPF/selectiondag-bug.ll

This file was added.

				; RUN: llc -march=bpf < %s | FileCheck %s
				;
				; The IR is generated from a bpftrace script (https://github.com/iovisor/bpftrace/issues/1305)
				; and then slightly adapted for easy unit testing.
				; The llvm bugzilla link: https://bugs.llvm.org/show_bug.cgi?id=47591

				%printf_t = type { i64, i64 }

				define i64 @"kprobe:blk_update_request"(i8* %0) local_unnamed_addr section "s_kprobe:blk_update_request_1" {
				entry:
				%"struct kernfs_node.parent" = alloca i64, align 8
				%printf_args = alloca %printf_t, align 8
				%"struct cgroup.kn" = alloca i64, align 8
				%"struct cgroup_subsys_state.cgroup" = alloca i64, align 8
				%"struct blkcg_gq.blkcg" = alloca i64, align 8
				%"struct bio.bi_blkg" = alloca i64, align 8
				%"struct request.bio" = alloca i64, align 8
				%1 = getelementptr i8, i8* %0, i64 112
				%2 = bitcast i8* %1 to i64*
				%arg0 = load volatile i64, i64* %2, align 8
				%3 = add i64 %arg0, 56
				%4 = bitcast i64* %"struct request.bio" to i8*
				call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %4)
				%probe_read = call i64 inttoptr (i64 4 to i64 (i64*, i32, i64)*)(i64* nonnull %"struct request.bio", i32 8, i64 %3)
				%5 = load i64, i64* %"struct request.bio", align 8
				call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %4)
				%6 = add i64 %5, 72
				%7 = bitcast i64* %"struct bio.bi_blkg" to i8*
				call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %7)
				%probe_read1 = call i64 inttoptr (i64 5 to i64 (i64*, i32, i64)*)(i64* nonnull %"struct bio.bi_blkg", i32 8, i64 %6)
				%8 = load i64, i64* %"struct bio.bi_blkg", align 8
				call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %7)
				%9 = add i64 %8, 40
				%10 = bitcast i64* %"struct blkcg_gq.blkcg" to i8*
				call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %10)
				%probe_read2 = call i64 inttoptr (i64 6 to i64 (i64*, i32, i64)*)(i64* nonnull %"struct blkcg_gq.blkcg", i32 8, i64 %9)
				%11 = load i64, i64* %"struct blkcg_gq.blkcg", align 8
				call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %10)
				%12 = bitcast i64* %"struct cgroup_subsys_state.cgroup" to i8*
				call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %12)
				%probe_read3 = call i64 inttoptr (i64 7 to i64 (i64*, i32, i64)*)(i64* nonnull %"struct cgroup_subsys_state.cgroup", i32 8, i64 %11)
				%13 = load i64, i64* %"struct cgroup_subsys_state.cgroup", align 8
				call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %12)
				%14 = add i64 %13, 288
				%15 = bitcast i64* %"struct cgroup.kn" to i8*
				call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %15)
				%probe_read4 = call i64 inttoptr (i64 8 to i64 (i64*, i32, i64)*)(i64* nonnull %"struct cgroup.kn", i32 8, i64 %14)
				%16 = load i64, i64* %"struct cgroup.kn", align 8
				call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %15)
				%17 = bitcast %printf_t* %printf_args to i8*
				call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %17)
				%18 = add i64 %16, 8
				%19 = bitcast i64* %"struct kernfs_node.parent" to i8*
				%20 = getelementptr inbounds %printf_t, %printf_t* %printf_args, i64 0, i32 0
				store i64 0, i64* %20, align 8
				call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %19)

				; CHECK: call 8
				; CHECK-NOT: r{{[0-9]+}} = 0
				; CHECK: [[REG3:r[0-9]+]] = *(u64 *)(r10 - 24)
				; CHECK: [[REG1:r[0-9]+]] = 0
				; CHECK: *(u64 *)(r10 - 24) = [[REG1]]

				%probe_read5 = call i64 inttoptr (i64 9 to i64 (i64*, i32, i64)*)(i64* nonnull %"struct kernfs_node.parent", i32 8, i64 %18)
				%21 = load i64, i64* %"struct kernfs_node.parent", align 8
				call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %19)
				%22 = getelementptr inbounds %printf_t, %printf_t* %printf_args, i64 0, i32 1
				store i64 %21, i64* %22, align 8
				%get_cpu_id = call i64 inttoptr (i64 18 to i64 ()*)()
				%perf_event_output = call i64 inttoptr (i64 10 to i64 (i8*, i64, i64, %printf_t*, i64)*)(i8* %0, i64 2, i64 %get_cpu_id, %printf_t* nonnull %printf_args, i64 16)
				call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %17)
				ret i64 0
				}

				; Function Attrs: argmemonly nounwind willreturn
				declare void @llvm.lifetime.start.p0i8(i64 immarg %0, i8* nocapture %1) #1

				; Function Attrs: argmemonly nounwind willreturn
				declare void @llvm.lifetime.end.p0i8(i64 immarg %0, i8* nocapture %1) #1

				attributes #0 = { nounwind }
				attributes #1 = { argmemonly nounwind willreturn }