This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Don't fold icmp sgt/slt (add nsw X, C2), C --> icmp sgt/slt X, (C - C2) when X is a PHI node.
AbandonedPublic

Authored by wdng on May 19 2017, 11:45 PM.

Download Raw Diff

Details

Reviewers

rampitec
arsenm
kzhuravl
t-tye
b-sumner
spatel
nlopes
efriedma

Summary

When X is PHI node like the following:

%myValue.0 = phi i32 [ 0, %entry ], [ %inc, %do.body ]
%inc = add nuw nsw i32 %myValue.0, 1
%cmp15 = icmp slt i32 %inc, 255

Misinformed IR will be produced if we fold imp sgt/slt (https://reviews.llvm.org/D29774), this will bring in program regression.

<badref> = icmp slt i32 %myValue.0, 254

Diff Detail

Repository: rL LLVM

Event Timeline

wdng created this revision.May 19 2017, 11:45 PM

wdng edited the summary of this revision. (Show Details)

wdng added reviewers: spatel, nlopes, efriedma.May 22 2017, 8:56 AM

I don't understand why this change is needed. Can you explain how the test in this patch can miscompile?

In D33375#761110, @spatel wrote:

I don't understand why this change is needed. Can you explain how the test in this patch can miscompile?

entry:
  %call = tail call i64 @_Z13get_global_idj(i32 0) #3
  %conv = trunc i64 %call to i32
  %conv1 = and i64 %call, 4294967295
  %sub = add i32 %threadCount, -1
  %sub2 = sub i32 %sub, %conv
  %conv3 = zext i32 %sub2 to i64
  %mul = shl nuw nsw i64 %conv1, 8
  %arrayidx5 = getelementptr inbounds i32, i32 addrspace(1)* %destMemory, i64 %conv1
  %0 = addrspacecast i32 addrspace(1)* %arrayidx5 to i32 addrspace(4)*
  %conv13 = zext i32 %threadCount to i64
  br label %do.body

do.body:                                          ; preds = %do.body, %entry
  %myValue.0 = phi i32 [ 0, %entry ], [ %inc, %do.body ]
  %hisId.0 = phi i64 [ %conv3, %entry ], [ %rem, %do.body ]
  %inc = add nuw nsw i32 %myValue.0, 1
  %conv4 = sext i32 %inc to i64
  %add = add nsw i64 %mul, %conv4
  %arrayidx = getelementptr inbounds i32, i32 addrspace(1)* %oldValues, i64 %add
  store i32 %inc, i32 addrspace(1)* %arrayidx, align 4, !tbaa !8
  store atomic volatile i32 %inc, i32 addrspace(4)* %0 release, align 4
  %arrayidx6 = getelementptr inbounds i32, i32 addrspace(1)* %destMemory, i64 %hisId.0
  %1 = addrspacecast i32 addrspace(1)* %arrayidx6 to i32 addrspace(4)*
  %2 = load atomic volatile i32, i32 addrspace(4)* %1 monotonic, align 4
  tail call void @_Z22atomic_work_item_fencej12memory_order12memory_scope(i32 2, i32 1, i32 3) #5
  %mul8 = shl nsw i64 %hisId.0, 8
  %conv9 = sext i32 %2 to i64
  %add10 = add nsw i64 %mul8, %conv9
  %arrayidx11 = getelementptr inbounds i32, i32 addrspace(1)* %oldValues, i64 %add10
  %3 = load i32, i32 addrspace(1)* %arrayidx11, align 4, !tbaa !8
  %add12 = add nsw i64 %hisId.0, 1
  %rem = urem i64 %add12, %conv13
  %cmp = icmp eq i32 %2, %3
  <badref> = icmp slt i32 %myValue.0, 254    // we need correct one like "%cmp15 = icmp slt i32 %inc, 255"
  %4 = and i1 %cmp15, %cmp
  br i1 %4, label %do.body, label %do.end

do.end:                                           ; preds = %do.body
  br i1 %cmp, label %if.end56, label %if.then

This is the IR for one of our test, I found it failed if optimization "fold icmp sgt/slt (add nsw X, C2), C --> icmp sgt/slt X, (C - C2)" is applied, compiler generates "<badref> = icmp slt i32 %myValue.0, 254", so there is a need to check whether X is a PHI node.

Sorry, but I still don't understand.

%inc = add nuw nsw i32 %myValue.0, 1
%cmp15 = icmp slt i32 %inc, 255

%myValue.0 will be 0 when we enter the loop. Eventually, it will increment up to 255, and %cmp15 will be false. How is that logically different than:
%cmp15 = icmp slt i32 %myValue.0, 254 ?

%inc is not always one more than %myValue.0?

In D33375#761170, @spatel wrote:
Sorry, but I still don't understand.
%inc = add nuw nsw i32 %myValue.0, 1
%cmp15 = icmp slt i32 %inc, 255
%myValue.0 will be 0 when we enter the loop. Eventually, it will increment up to 255, and %cmp15 will be false. How is that logically different than:
%cmp15 = icmp slt i32 %myValue.0, 254 ?

%inc is not always one more than %myValue.0?

Logically, yes it's correct. The problem here is the compiler optimized the code and generated malformed LLVM IR.

%inc = add nuw nsw i32 %myValue.0, 1
%cmp15 = icmp slt i32 %inc, 255

Optimization folding icmp sgt/slt (add nsw X, C2), C --> icmp sgt/slt X, (C - C2) will try to merge the above two instructions and produce the following code while it's malformed:

<badref> = icmp slt i32 %myValue.0, 254

I also do not understand what difference does it make if X is a PHI or whatever else opcode. This transformation is either correct or not.

In D33375#761185, @wdng wrote:

<badref> = icmp slt i32 %myValue.0, 254

Why is this IR malformed? Does "<badref>" have something to do with the problem? Ie, are we not replacing all uses correctly?

wdng abandoned this revision.Jun 29 2017, 12:50 PM

Revision Contents

Path

Size

lib/

Transforms/

InstCombine/

InstCombineCompares.cpp

2 lines

test/

Transforms/

InstCombine/

icmp-add.ll

19 lines

Diff 99664

lib/Transforms/InstCombine/InstCombineCompares.cpp

Show First 20 Lines • Show All 2,386 Lines • ▼ Show 20 Lines	Instruction *InstCombiner::foldICmpAddConstant(ICmpInst &Cmp,
// canonicalized to SGT/SLT.		// canonicalized to SGT/SLT.
if (Add->hasNoSignedWrap() &&		if (Add->hasNoSignedWrap() &&
(Pred == ICmpInst::ICMP_SGT \|\| Pred == ICmpInst::ICMP_SLT)) {		(Pred == ICmpInst::ICMP_SGT \|\| Pred == ICmpInst::ICMP_SLT)) {
bool Overflow;		bool Overflow;
APInt NewC = C->ssub_ov(*C2, Overflow);		APInt NewC = C->ssub_ov(*C2, Overflow);
// If there is overflow, the result must be true or false.		// If there is overflow, the result must be true or false.
// TODO: Can we assert there is no overflow because InstSimplify always		// TODO: Can we assert there is no overflow because InstSimplify always
// handles those cases?		// handles those cases?
if (!Overflow)		if (!Overflow && !isa<PHINode>(X))
// icmp Pred (add nsw X, C2), C --> icmp Pred X, (C - C2)		// icmp Pred (add nsw X, C2), C --> icmp Pred X, (C - C2)
return new ICmpInst(Pred, X, ConstantInt::get(Ty, NewC));		return new ICmpInst(Pred, X, ConstantInt::get(Ty, NewC));
}		}

auto CR = ConstantRange::makeExactICmpRegion(Pred, C).subtract(C2);		auto CR = ConstantRange::makeExactICmpRegion(Pred, C).subtract(C2);
const APInt &Upper = CR.getUpper();		const APInt &Upper = CR.getUpper();
const APInt &Lower = CR.getLower();		const APInt &Lower = CR.getLower();
if (Cmp.isSigned()) {		if (Cmp.isSigned()) {
▲ Show 20 Lines • Show All 2,586 Lines • Show Last 20 Lines

test/Transforms/InstCombine/icmp-add.ll

	Show First 20 Lines • Show All 169 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 %a, -1			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 %a, -1
	; CHECK-NEXT: ret i1 [[CMP]]			; CHECK-NEXT: ret i1 [[CMP]]
	;			;
	%add = add nsw i32 %a, 1			%add = add nsw i32 %a, 1
	%cmp = icmp slt i32 %add, 0			%cmp = icmp slt i32 %add, 0
	ret i1 %cmp			ret i1 %cmp
	}			}

				; icmp Pred (add nsw X, C2), C --> should not be optimized to icmp Pred X, (C - C2) when X is a PHI node.
				define void @slt_phi_node_add_nsw(i32 %a) {
				; CHECK-LABEL: @slt_phi_node_add_nsw(
				; CHECK: [[CMP:%.*]] = icmp slt i32 %inc, 255
				;
				entry:
				br label %do.body

				do.body: ; preds = %do.body, %entry
				%myValue = phi i32 [ 0, %entry ], [ %inc, %do.body ]
				%inc = add nuw nsw i32 %myValue, 1
				%add = add nsw i32 %a, 1
				%cmp = icmp slt i32 %inc, 255
				br i1 %cmp, label %do.body, label %do.end

				do.end:
				ret void;
				}

	; The same fold should work with vectors.			; The same fold should work with vectors.

	define <2 x i1> @slt_zero_add_nsw_splat_vec(<2 x i8> %a) {			define <2 x i1> @slt_zero_add_nsw_splat_vec(<2 x i8> %a) {
	; CHECK-LABEL: @slt_zero_add_nsw_splat_vec(			; CHECK-LABEL: @slt_zero_add_nsw_splat_vec(
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt <2 x i8> %a, <i8 -1, i8 -1>			; CHECK-NEXT: [[CMP:%.*]] = icmp slt <2 x i8> %a, <i8 -1, i8 -1>
	; CHECK-NEXT: ret <2 x i1> [[CMP]]			; CHECK-NEXT: ret <2 x i1> [[CMP]]
	;			;
	%add = add nsw <2 x i8> %a, <i8 1, i8 1>			%add = add nsw <2 x i8> %a, <i8 1, i8 1>
	▲ Show 20 Lines • Show All 62 Lines • Show Last 20 Lines