This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Don't fold icmp sgt/slt (add nsw X, C2), C --> icmp sgt/slt X, (C - C2) when X is a PHI node.
AbandonedPublic

Authored by wdng on May 19 2017, 11:45 PM.

Details

Summary

When X is PHI node like the following:

%myValue.0 = phi i32 [ 0, %entry ], [ %inc, %do.body ]
%inc = add nuw nsw i32 %myValue.0, 1
%cmp15 = icmp slt i32 %inc, 255

Misinformed IR will be produced if we fold imp sgt/slt (https://reviews.llvm.org/D29774), this will bring in program regression.

<badref> = icmp slt i32 %myValue.0, 254

Diff Detail

Repository
rL LLVM

Event Timeline

wdng created this revision.May 19 2017, 11:45 PM
wdng edited the summary of this revision. (Show Details)
spatel edited edge metadata.May 22 2017, 9:08 AM

I don't understand why this change is needed. Can you explain how the test in this patch can miscompile?

wdng added a comment.EditedMay 22 2017, 9:14 AM

I don't understand why this change is needed. Can you explain how the test in this patch can miscompile?

entry:
  %call = tail call i64 @_Z13get_global_idj(i32 0) #3
  %conv = trunc i64 %call to i32
  %conv1 = and i64 %call, 4294967295
  %sub = add i32 %threadCount, -1
  %sub2 = sub i32 %sub, %conv
  %conv3 = zext i32 %sub2 to i64
  %mul = shl nuw nsw i64 %conv1, 8
  %arrayidx5 = getelementptr inbounds i32, i32 addrspace(1)* %destMemory, i64 %conv1
  %0 = addrspacecast i32 addrspace(1)* %arrayidx5 to i32 addrspace(4)*
  %conv13 = zext i32 %threadCount to i64
  br label %do.body

do.body:                                          ; preds = %do.body, %entry
  %myValue.0 = phi i32 [ 0, %entry ], [ %inc, %do.body ]
  %hisId.0 = phi i64 [ %conv3, %entry ], [ %rem, %do.body ]
  %inc = add nuw nsw i32 %myValue.0, 1
  %conv4 = sext i32 %inc to i64
  %add = add nsw i64 %mul, %conv4
  %arrayidx = getelementptr inbounds i32, i32 addrspace(1)* %oldValues, i64 %add
  store i32 %inc, i32 addrspace(1)* %arrayidx, align 4, !tbaa !8
  store atomic volatile i32 %inc, i32 addrspace(4)* %0 release, align 4
  %arrayidx6 = getelementptr inbounds i32, i32 addrspace(1)* %destMemory, i64 %hisId.0
  %1 = addrspacecast i32 addrspace(1)* %arrayidx6 to i32 addrspace(4)*
  %2 = load atomic volatile i32, i32 addrspace(4)* %1 monotonic, align 4
  tail call void @_Z22atomic_work_item_fencej12memory_order12memory_scope(i32 2, i32 1, i32 3) #5
  %mul8 = shl nsw i64 %hisId.0, 8
  %conv9 = sext i32 %2 to i64
  %add10 = add nsw i64 %mul8, %conv9
  %arrayidx11 = getelementptr inbounds i32, i32 addrspace(1)* %oldValues, i64 %add10
  %3 = load i32, i32 addrspace(1)* %arrayidx11, align 4, !tbaa !8
  %add12 = add nsw i64 %hisId.0, 1
  %rem = urem i64 %add12, %conv13
  %cmp = icmp eq i32 %2, %3
  <badref> = icmp slt i32 %myValue.0, 254    // we need correct one like "%cmp15 = icmp slt i32 %inc, 255"
  %4 = and i1 %cmp15, %cmp
  br i1 %4, label %do.body, label %do.end

do.end:                                           ; preds = %do.body
  br i1 %cmp, label %if.end56, label %if.then

This is the IR for one of our test, I found it failed if optimization "fold icmp sgt/slt (add nsw X, C2), C --> icmp sgt/slt X, (C - C2)" is applied, compiler generates "<badref> = icmp slt i32 %myValue.0, 254", so there is a need to check whether X is a PHI node.

Sorry, but I still don't understand.

%inc = add nuw nsw i32 %myValue.0, 1
%cmp15 = icmp slt i32 %inc, 255

%myValue.0 will be 0 when we enter the loop. Eventually, it will increment up to 255, and %cmp15 will be false. How is that logically different than:
%cmp15 = icmp slt i32 %myValue.0, 254 ?

%inc is not always one more than %myValue.0?

wdng added a comment.EditedMay 22 2017, 9:56 AM

Sorry, but I still don't understand.

%inc = add nuw nsw i32 %myValue.0, 1
%cmp15 = icmp slt i32 %inc, 255

%myValue.0 will be 0 when we enter the loop. Eventually, it will increment up to 255, and %cmp15 will be false. How is that logically different than:
%cmp15 = icmp slt i32 %myValue.0, 254 ?

%inc is not always one more than %myValue.0?

Logically, yes it's correct. The problem here is the compiler optimized the code and generated malformed LLVM IR.

%inc = add nuw nsw i32 %myValue.0, 1
%cmp15 = icmp slt i32 %inc, 255

Optimization folding icmp sgt/slt (add nsw X, C2), C --> icmp sgt/slt X, (C - C2) will try to merge the above two instructions and produce the following code while it's malformed:

<badref> = icmp slt i32 %myValue.0, 254
rampitec edited edge metadata.May 22 2017, 10:00 AM

I also do not understand what difference does it make if X is a PHI or whatever else opcode. This transformation is either correct or not.

In D33375#761185, @wdng wrote:
<badref> = icmp slt i32 %myValue.0, 254

Why is this IR malformed? Does "<badref>" have something to do with the problem? Ie, are we not replacing all uses correctly?

wdng abandoned this revision.Jun 29 2017, 12:50 PM