This is an archive of the discontinued LLVM Phabricator instance.

[Analysis] fold load of untouched alloca to undef
AbandonedPublic

Authored by spatel on Feb 18 2019, 2:50 PM.

Details

Reviewers

efriedma
fhahn
reames

Summary

The LangRef does not state this explicitly, but shouldn't a load directly from an untouched alloca always fold to undef?
We already do this fold more generally in GVN::AnalyzeLoadAvailability() (but apparently not in NewGVN), so I'm assuming it's just an oversight that it was not included in FindAvailableLoadedValue().

The test diffs here result from InstCombine's visitLoadInst() calling FindAvailableLoadedValue(). I tried to salvage the existing regression tests so that they still cover their original bugs by removing the alloca+load patterns that now fold to undef.
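
To make the proposed fold concrete, here is a minimal, self-contained sketch of the backward scan. It is a toy model, not LLVM code: `Inst`, `Op`, and `findAvailableValue` are hypothetical stand-ins, values are plain strings, and the scan limit of 6 is an arbitrary choice for the sketch. The point it illustrates is that if the scan walks back from the load and reaches the alloca itself without seeing a write, nothing in the block can have initialized that memory, so the load can be folded to undef of the access type.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for LLVM instructions (assumption: not the real classes).
enum class Op { Alloca, Load, Store, Other };

struct Inst {
  Op op;
  int id;             // value defined by this instruction (for Alloca: the address)
  int ptr;            // pointer operand for Load/Store, -1 otherwise
  std::string value;  // value written by a Store, empty otherwise
};

// Scan backward from index `from` (exclusive) within a single block, looking for
// the value that a load of `ptr` with type `accessTy` would observe.
// Returns an empty string when nothing is known.
std::string findAvailableValue(const std::vector<Inst> &block, std::size_t from,
                               int ptr, const std::string &accessTy,
                               unsigned maxInstsToScan = 6) {
  assert(from <= block.size() && "scan start must lie inside the block");
  while (from != 0) {
    // Don't scan huge blocks.
    if (maxInstsToScan-- == 0)
      return "";

    const Inst &I = block[--from];

    // Reached the alloca with no intervening write: the memory is still
    // uninitialized, so the loaded value is undef of the *access* type.
    if (I.op == Op::Alloca && I.id == ptr)
      return "undef " + accessTy;

    if (I.op == Op::Store) {
      if (I.ptr == ptr)
        return I.value;  // store-to-load forwarding
      return "";         // toy model: any other store might alias, so give up
    }
    // Toy model: Load and Other are assumed not to write memory.
  }
  return "";  // top of the block reached; a block-local scan stops here
}
```

For a block consisting of an alloca, an unrelated instruction, and a load of that alloca, `findAvailableValue(block, 2, 0, "i32")` yields `"undef i32"` in this model; with a store of `42` to the alloca in between, it yields `"42"` instead.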

Event Timeline

spatel created this revision. Feb 18 2019, 2:50 PM

Herald added a project: Restricted Project. Feb 18 2019, 2:50 PM

Herald added subscribers: hiraditya, eraman, Prazek, mcrosier.

Yes, we fold loads from alloca to undef. (In GVN like you mention, but also in mem2reg.) LangRef should state memory allocated with alloca is uninitialized, and that loading from uninitialized memory produces undef; if either of those is missing, patch welcome to fix it.

That said, this seems like the wrong direction, in terms of where we want to perform this sort of optimization. Most functions have more than one basic block, and you won't catch any of those cases. EarlyCSE/GVN/NewGVN should be able to handle this case, and similar cases where there's more than one basic block. Also, FindAvailablePtrLoadStore is basically only used from two places: JumpThreading, and InstCombine. InstCombine really shouldn't be doing this sort of scan, and JumpThreading obviously only triggers in functions with more than one BB.
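
A small sketch of the limitation described above, again as a toy model rather than LLVM code (`Kind`, `Inst`, `Block`, and `seesUntouchedAlloca` are hypothetical names): a scan that only walks backward inside the load's own basic block cannot see an alloca that lives in a predecessor block, which is the usual situation because allocas normally sit in the entry block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction model (assumption: not LLVM's classes).
enum class Kind { Alloca, Store, Load, Branch };

struct Inst {
  Kind kind;
  int def;  // value defined by the instruction, -1 if none
  int ptr;  // pointer operand for Load/Store, -1 otherwise
};

using Block = std::vector<Inst>;

// Block-local query: walking backward from the load at `loadIdx`, do we reach
// the alloca that defines `ptr` before any store?
bool seesUntouchedAlloca(const Block &bb, std::size_t loadIdx, int ptr) {
  assert(loadIdx < bb.size() && bb[loadIdx].kind == Kind::Load);
  for (std::size_t i = loadIdx; i-- > 0;) {
    if (bb[i].kind == Kind::Alloca && bb[i].def == ptr)
      return true;
    if (bb[i].kind == Kind::Store)
      return false;  // conservatively treat any store as a possible write
  }
  return false;  // top of block reached without finding the alloca
}
```

With the alloca and the load in the same block the query succeeds; once the alloca is in an entry block that branches to the block holding the load, the same query returns false even though the memory is just as uninitialized. Passes that reason across blocks are the ones in a position to catch that second case.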

In D58359#1402730, @efriedma wrote:

Yes, we fold loads from alloca to undef. (In GVN like you mention, but also in mem2reg.) LangRef should state memory allocated with alloca is uninitialized, and that loading from uninitialized memory produces undef; if either of those is missing, patch welcome to fix it.

That said, this seems like the wrong direction, in terms of where we want to perform this sort of optimization. Most functions have more than one basic block, and you won't catch any of those cases. EarlyCSE/GVN/NewGVN should be able to handle this case, and similar cases where there's more than one basic block. Also, FindAvailablePtrLoadStore is basically only used from two places: JumpThreading, and InstCombine. InstCombine really shouldn't be doing this sort of scan, and JumpThreading obviously only triggers in functions with more than one BB.

Thanks. I agree that the load optimizations from instcombine are iffy. I assumed that was there as a cheap first cut of optimization to save time for other passes, but it might be doing the opposite for overall compile-time at this point. I only noticed this because an unrelated instcombine change showed up as a possible regression on 1 of the fuzzer-reduced tests, but it sounds like I should ignore that.

I'll add some text to the LangRef.

spatel mentioned this in rL354394: [LangRef] add to description of alloca instruction. Feb 19 2019, 2:36 PM

spatel mentioned this in rGb6bc11d4067d: [LangRef] add to description of alloca instruction.

Abandoning. Added a sentence to the LangRef here:
rL354394

JFYI, I disagree w/the sentiment expressed that we shouldn't do obvious memory optimizations in InstCombine. InstCombine does local peephole optimizations *including memory optimizations*. Even if I accept the stated goal that it *shouldn't*, today it *does* and there's no reason to reject this patch.

Now, we should *also* handle this transform in GVN/DSE/etc..., but that's a different point.

spatel mentioned this in rL354748: [InstCombine] canonicalize add/sub with bool. Feb 24 2019, 9:01 AM

spatel mentioned this in rG9907d3c8b4ac: [InstCombine] canonicalize add/sub with bool.

Revision Contents

Path	Size
llvm/lib/Analysis/Loads.cpp	6 lines
llvm/test/Transforms/Inline/byval-tail-call.ll	11 lines
llvm/test/Transforms/InstCombine/and-or-icmps.ll	67 lines
llvm/test/Transforms/InstCombine/apint-shift.ll	2 lines
llvm/test/Transforms/InstCombine/getelementptr.ll	26 lines
llvm/test/Transforms/InstCombine/multiple-uses-load-bitcast-select.ll	39 lines

Diff 187274

llvm/lib/Analysis/Loads.cpp

	while (ScanFrom != ScanBB->begin()) {
if (NumScanedInst)		if (NumScanedInst)
++(*NumScanedInst);		++(*NumScanedInst);

// Don't scan huge blocks.		// Don't scan huge blocks.
if (MaxInstsToScan-- == 0)		if (MaxInstsToScan-- == 0)
return nullptr;		return nullptr;

--ScanFrom;		--ScanFrom;

		// Loading from uninitialized stack memory? The loaded value is undefined.
		// TODO: We could do a similar check for a load from a malloc-like function.
		if (isa<AllocaInst>(Inst) && Inst == StrippedPtr)
		return UndefValue::get(AccessTy);

// If this is a load of Ptr, the loaded value is available.		// If this is a load of Ptr, the loaded value is available.
// (This is true even if the load is volatile or atomic, although		// (This is true even if the load is volatile or atomic, although
// those cases are unlikely.)		// those cases are unlikely.)
if (LoadInst *LI = dyn_cast<LoadInst>(Inst))		if (LoadInst *LI = dyn_cast<LoadInst>(Inst))
if (AreEquivalentAddressValues(		if (AreEquivalentAddressValues(
LI->getPointerOperand()->stripPointerCasts(), StrippedPtr) &&		LI->getPointerOperand()->stripPointerCasts(), StrippedPtr) &&
CastInst::isBitOrNoopPointerCastable(LI->getType(), AccessTy, DL)) {		CastInst::isBitOrNoopPointerCastable(LI->getType(), AccessTy, DL)) {

llvm/test/Transforms/Inline/byval-tail-call.ll

	; CHECK: store i32 %[[VAL]], i32* %[[POS]]			; CHECK: store i32 %[[VAL]], i32* %[[POS]]
	; CHECK: tail call void @ext2(i32* byval nonnull %[[POS]]			; CHECK: tail call void @ext2(i32* byval nonnull %[[POS]]
	; CHECK: ret void			; CHECK: ret void
	tail call void @bar2(i32* byval %x)			tail call void @bar2(i32* byval %x)
	ret void			ret void
	}			}

	define void @barfoo() {			define void @barfoo() {
	; CHECK-LABEL: define void @barfoo(			; CHECK-LABEL: @barfoo(
	; CHECK: %[[POS:.*]] = alloca i32			; CHECK-NEXT: [[X1:%.*]] = alloca i32, align 4
	; CHECK: %[[VAL:.*]] = load i32, i32* %x			; CHECK: tail call void @ext2(i32* byval nonnull [[X1]])
	; CHECK: store i32 %[[VAL]], i32* %[[POS]]
	; CHECK: tail call void @ext2(i32* byval nonnull %[[POS]]
	; CHECK: ret void			; CHECK: ret void
				;
	%x = alloca i32			%x = alloca i32
	tail call void @bar2(i32* byval %x)			tail call void @bar2(i32* byval %x)
	ret void			ret void
	}			}

llvm/test/Transforms/InstCombine/and-or-icmps.ll

	; CHECK-NEXT: ret <2 x i1> [[TMP2]]			; CHECK-NEXT: ret <2 x i1> [[TMP2]]
	;			;
	%cmp1 = icmp ne <2 x i32> %x, <i32 40, i32 40>			%cmp1 = icmp ne <2 x i32> %x, <i32 40, i32 40>
	%cmp2 = icmp ne <2 x i32> %x, <i32 39, i32 39>			%cmp2 = icmp ne <2 x i32> %x, <i32 39, i32 39>
	%and = and <2 x i1> %cmp1, %cmp2			%and = and <2 x i1> %cmp1, %cmp2
	ret <2 x i1> %and			ret <2 x i1> %and
	}			}

	; This is a fuzzer-generated test that would assert because			; This was a fuzzer-generated test that would assert because
	; we'd get into foldAndOfICmps() without running InstSimplify			; we'd get into foldAndOfICmps() without running InstSimplify
	; on an 'and' that should have been killed. It's not obvious			; on an 'and' that should have been killed. It's not obvious
	; why, but removing anything hides the bug, hence the long test.			; why, but removing anything hides the bug, hence the long test.

	define void @simplify_before_foldAndOfICmps() {			define void @simplify_before_foldAndOfICmps() {
	; CHECK-LABEL: @simplify_before_foldAndOfICmps(			; CHECK-LABEL: @simplify_before_foldAndOfICmps(
	; CHECK-NEXT: [[A8:%.*]] = alloca i16, align 2			; CHECK-NEXT: store i1 true, i1* undef, align 1
	; CHECK-NEXT: [[L7:%.*]] = load i16, i16* [[A8]], align 2			; CHECK-NEXT: store i1* null, i1** undef, align 8
	; CHECK-NEXT: [[C10:%.*]] = icmp ult i16 [[L7]], 2			; CHECK-NEXT: ret void
	; CHECK-NEXT: [[C7:%.*]] = icmp slt i16 [[L7]], 0			;
	; CHECK-NEXT: [[C18:%.*]] = or i1 [[C7]], [[C10]]			%A8 = alloca i16
	; CHECK-NEXT: [[L7_LOBIT:%.*]] = ashr i16 [[L7]], 15			%L7 = load i16, i16* %A8
	; CHECK-NEXT: [[TMP1:%.*]] = sext i16 [[L7_LOBIT]] to i64			%G21 = getelementptr i16, i16* %A8, i8 -1
	; CHECK-NEXT: [[G26:%.*]] = getelementptr i1, i1* null, i64 [[TMP1]]			%B11 = udiv i16 %L7, -1
				%G4 = getelementptr i16, i16* %A8, i16 %B11
				%L2 = load i16, i16* %G4
				%L = load i16, i16* %G4
				%B23 = mul i16 %B11, %B11
				%L4 = load i16, i16* %A8
				%B21 = sdiv i16 %L7, %L4
				%B7 = sub i16 0, %B21
				%B18 = mul i16 %B23, %B7
				%C10 = icmp ugt i16 %L, %B11
				%B20 = and i16 %L7, %L2
				%B1 = mul i1 %C10, true
				%C5 = icmp sle i16 %B21, %L
				%C11 = icmp ule i16 %B21, %L
				%C7 = icmp slt i16 %B20, 0
				%B29 = srem i16 %L4, %B18
				%B15 = add i1 %C7, %C10
				%B19 = add i1 %C11, %B15
				%C6 = icmp sge i1 %C11, %B19
				%B33 = or i16 %B29, %L4
				%C13 = icmp uge i1 %C5, %B1
				%C3 = icmp ult i1 %C13, %C6
				store i16 undef, i16* %G21
				%C18 = icmp ule i1 %C10, %C7
				%G26 = getelementptr i1, i1* null, i1 %C3
				store i16 %B33, i16* undef
				store i1 %C18, i1* undef
				store i1* %G26, i1** undef
				ret void
				}

				define void @simplify_before_foldAndOfICmps_alt(i16* %A8) {
				; CHECK-LABEL: @simplify_before_foldAndOfICmps_alt(
				; CHECK-NEXT: [[L7:%.*]] = load i16, i16* [[A8:%.*]], align 2
				; CHECK-NEXT: [[G4:%.*]] = getelementptr i16, i16* [[A8]], i64 1
				; CHECK-NEXT: [[L2:%.*]] = load i16, i16* [[G4]], align 2
				; CHECK-NEXT: [[C10:%.*]] = icmp ugt i16 [[L2]], 1
				; CHECK-NEXT: [[B20:%.*]] = and i16 [[L7]], [[L2]]
				; CHECK-NEXT: [[C5:%.*]] = icmp slt i16 [[L2]], 1
				; CHECK-NEXT: [[C11:%.*]] = icmp ne i16 [[L2]], 0
				; CHECK-NEXT: [[C7:%.*]] = icmp slt i16 [[B20]], 0
				; CHECK-NEXT: [[B15:%.*]] = xor i1 [[C7]], [[C10]]
				; CHECK-NEXT: [[B19:%.*]] = xor i1 [[C11]], [[B15]]
				; CHECK-NEXT: [[TMP1:%.*]] = and i1 [[C10]], [[C5]]
				; CHECK-NEXT: [[C3:%.*]] = and i1 [[B19]], [[TMP1]]
				; CHECK-NEXT: [[TMP2:%.*]] = xor i1 [[C10]], true
				; CHECK-NEXT: [[C18:%.*]] = or i1 [[C7]], [[TMP2]]
				; CHECK-NEXT: [[TMP3:%.*]] = sext i1 [[C3]] to i64
				; CHECK-NEXT: [[G26:%.*]] = getelementptr i1, i1* null, i64 [[TMP3]]
	; CHECK-NEXT: store i16 [[L7]], i16* undef, align 2			; CHECK-NEXT: store i16 [[L7]], i16* undef, align 2
	; CHECK-NEXT: store i1 [[C18]], i1* undef, align 1			; CHECK-NEXT: store i1 [[C18]], i1* undef, align 1
	; CHECK-NEXT: store i1* [[G26]], i1** undef, align 8			; CHECK-NEXT: store i1* [[G26]], i1** undef, align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%A8 = alloca i16
	%L7 = load i16, i16* %A8			%L7 = load i16, i16* %A8
	%G21 = getelementptr i16, i16* %A8, i8 -1			%G21 = getelementptr i16, i16* %A8, i8 -1
	%B11 = udiv i16 %L7, -1			%B11 = udiv i16 %L7, -1
	%G4 = getelementptr i16, i16* %A8, i16 %B11			%G4 = getelementptr i16, i16* %A8, i16 %B11
	%L2 = load i16, i16* %G4			%L2 = load i16, i16* %G4
	%L = load i16, i16* %G4			%L = load i16, i16* %G4
	%B23 = mul i16 %B11, %B11			%B23 = mul i16 %B11, %B11
	%L4 = load i16, i16* %A8			%L4 = load i16, i16* %A8

llvm/test/Transforms/InstCombine/apint-shift.ll

	;
%D = shl i40 %C, 1		%D = shl i40 %C, 1
ret i40 %D		ret i40 %D
}		}

; OSS-Fuzz #9880		; OSS-Fuzz #9880
; https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=9880		; https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=9880
define i177 @ossfuzz_9880(i177 %X) {		define i177 @ossfuzz_9880(i177 %X) {
; CHECK-LABEL: @ossfuzz_9880(		; CHECK-LABEL: @ossfuzz_9880(
; CHECK-NEXT: ret i177 1		; CHECK-NEXT: ret i177 undef
;		;
%A = alloca i177		%A = alloca i177
%L1 = load i177, i177* %A		%L1 = load i177, i177* %A
%B = or i177 0, -1		%B = or i177 0, -1
%B5 = udiv i177 %L1, %B		%B5 = udiv i177 %L1, %B
%B4 = add i177 %B5, %B		%B4 = add i177 %B5, %B
%B2 = add i177 %B, %B4		%B2 = add i177 %B, %B4
%B6 = mul i177 %B5, %B2		%B6 = mul i177 %B5, %B2
%B20 = shl i177 %L1, %B6		%B20 = shl i177 %L1, %B6
%B14 = sub i177 %B20, %B5		%B14 = sub i177 %B20, %B5
%B1 = udiv i177 %B14, %B6		%B1 = udiv i177 %B14, %B6
ret i177 %B1		ret i177 %B1
}		}

llvm/test/Transforms/InstCombine/getelementptr.ll

	define i32 @test20_as1(i32 addrspace(1)* %P, i32 %A, i32 %B) {
%tmp.4 = getelementptr inbounds i32, i32 addrspace(1)* %P, i32 %A		%tmp.4 = getelementptr inbounds i32, i32 addrspace(1)* %P, i32 %A
%tmp.6 = icmp eq i32 addrspace(1)* %tmp.4, %P		%tmp.6 = icmp eq i32 addrspace(1)* %tmp.4, %P
%tmp.7 = zext i1 %tmp.6 to i32		%tmp.7 = zext i1 %tmp.6 to i32
ret i32 %tmp.7		ret i32 %tmp.7
; CHECK-LABEL: @test20_as1(		; CHECK-LABEL: @test20_as1(
; CHECK: icmp eq i16 %1, 0		; CHECK: icmp eq i16 %1, 0
}		}

		; Verify that we return the access type -- not the alloca type -- when simplifying to undef.

define i32 @test21() {		define i32 @test21() {
		; CHECK-LABEL: @test21(
		; CHECK-NEXT: ret i32 undef
		;
%pbob1 = alloca %intstruct		%pbob1 = alloca %intstruct
%pbob2 = getelementptr %intstruct, %intstruct* %pbob1		%pbob2 = getelementptr %intstruct, %intstruct* %pbob1
%pbobel = getelementptr %intstruct, %intstruct* %pbob2, i64 0, i32 0		%pbobel = getelementptr %intstruct, %intstruct* %pbob2, i64 0, i32 0
%rval = load i32, i32* %pbobel		%rval = load i32, i32* %pbobel
ret i32 %rval		ret i32 %rval
; CHECK-LABEL: @test21(		}
; CHECK: getelementptr inbounds %intstruct, %intstruct* %pbob1, i64 0, i32 0
		define i32 @test21_alt(%intstruct* %pbob1) {
		; CHECK-LABEL: @test21_alt(
		; CHECK-NEXT: [[PBOBEL:%.*]] = getelementptr [[INTSTRUCT:%.*]], %intstruct* [[PBOB1:%.*]], i64 0, i32 0
		; CHECK-NEXT: [[RVAL:%.*]] = load i32, i32* [[PBOBEL]], align 4
		; CHECK-NEXT: ret i32 [[RVAL]]
		;
		%pbob2 = getelementptr %intstruct, %intstruct* %pbob1
		%pbobel = getelementptr %intstruct, %intstruct* %pbob2, i64 0, i32 0
		%rval = load i32, i32* %pbobel
		ret i32 %rval
}		}


@A = global i32 1 ; <i32*> [#uses=1]		@A = global i32 1 ; <i32*> [#uses=1]
@B = global i32 2 ; <i32*> [#uses=1]		@B = global i32 2 ; <i32*> [#uses=1]

define i1 @test22() {		define i1 @test22() {
%C = icmp ult i32* getelementptr (i32, i32* @A, i64 1),		%C = icmp ult i32* getelementptr (i32, i32* @A, i64 1),

llvm/test/Transforms/InstCombine/multiple-uses-load-bitcast-select.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S -data-layout="E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-v128:64-a:8:16-n32:64" | FileCheck %s			; RUN: opt < %s -instcombine -S -data-layout="E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-v128:64-a:8:16-n32:64" | FileCheck %s

				; This used to test for an infinite loop, but now we simplify
				; the loads to undef before we get to the problematic fold.

	define void @PR35618(i64* %st1, double* %st2) {			define void @PR35618(i64* %st1, double* %st2) {
	; CHECK-LABEL: @PR35618(			; CHECK-LABEL: @PR35618(
				; CHECK-NEXT: ret void
				;
				%y1 = alloca double
				%z1 = alloca double
				%ld1 = load double, double* %y1
				%ld2 = load double, double* %z1
				%tmp10 = fcmp olt double %ld1, %ld2
				%sel = select i1 %tmp10, double* %y1, double* %z1
				%tmp11 = bitcast double* %sel to i64*
				%tmp12 = load i64, i64* %tmp11
				store i64 %tmp12, i64* %st1
				%bc = bitcast double* %st2 to i64*
				store i64 %tmp12, i64* %bc
				ret void
				}

				define void @PR35618_better(double %y, double %z, i64* %st1, double* %st2) {
				; CHECK-LABEL: @PR35618_better(
	; CHECK-NEXT: [[Y1:%.*]] = alloca double, align 8			; CHECK-NEXT: [[Y1:%.*]] = alloca double, align 8
	; CHECK-NEXT: [[Z1:%.*]] = alloca double, align 8			; CHECK-NEXT: [[Z1:%.*]] = alloca double, align 8
	; CHECK-NEXT: [[LD1:%.*]] = load double, double* [[Y1]], align 8			; CHECK-NEXT: store double [[Y:%.*]], double* [[Y1]], align 8
	; CHECK-NEXT: [[LD2:%.*]] = load double, double* [[Z1]], align 8			; CHECK-NEXT: store double [[Z:%.*]], double* [[Z1]], align 8
	; CHECK-NEXT: [[TMP10:%.*]] = fcmp olt double [[LD1]], [[LD2]]			; CHECK-NEXT: [[TMP10:%.*]] = fcmp olt double [[Y]], [[Z]]
	; CHECK-NEXT: [[TMP121:%.*]] = select i1 [[TMP10]], double [[LD1]], double [[LD2]]			; CHECK-NEXT: [[SEL:%.*]] = select i1 [[TMP10]], double* [[Y1]], double* [[Z1]]
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i64* [[ST1:%.*]] to double*			; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[SEL]] to i64*
	; CHECK-NEXT: store double [[TMP121]], double* [[TMP1]], align 8			; CHECK-NEXT: [[TMP12:%.*]] = load i64, i64* [[TMP11]], align 8
	; CHECK-NEXT: store double [[TMP121]], double* [[ST2:%.*]], align 8			; CHECK-NEXT: store i64 [[TMP12]], i64* [[ST1:%.*]], align 8
				; CHECK-NEXT: [[BC:%.*]] = bitcast double* [[ST2:%.*]] to i64*
				; CHECK-NEXT: store i64 [[TMP12]], i64* [[BC]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%y1 = alloca double			%y1 = alloca double
	%z1 = alloca double			%z1 = alloca double
				store double %y, double* %y1
				store double %z, double* %z1
	%ld1 = load double, double* %y1			%ld1 = load double, double* %y1
	%ld2 = load double, double* %z1			%ld2 = load double, double* %z1
	%tmp10 = fcmp olt double %ld1, %ld2			%tmp10 = fcmp olt double %ld1, %ld2
	%sel = select i1 %tmp10, double* %y1, double* %z1			%sel = select i1 %tmp10, double* %y1, double* %z1
	%tmp11 = bitcast double* %sel to i64*			%tmp11 = bitcast double* %sel to i64*
	%tmp12 = load i64, i64* %tmp11			%tmp12 = load i64, i64* %tmp11
	store i64 %tmp12, i64* %st1			store i64 %tmp12, i64* %st1
	%bc = bitcast double* %st2 to i64*			%bc = bitcast double* %st2 to i64*
	store i64 %tmp12, i64* %bc			store i64 %tmp12, i64* %bc
	ret void			ret void
	}			}