This is an archive of the discontinued LLVM Phabricator instance.

[LICM][WIP] Transform and hoist select instructions if possible
Needs RevisionPublic

Authored by StephenFan on Jan 4 2023, 6:06 AM.

Details

Reviewers
nikic
fhahn
Summary

Try to hoist select instruction that match the following and
%c and %cond are loop invariants:
Loop:

%x = add i32 %y, %c
%z = select i1 %cond, i32 %y, i32 %x

>

OutOfLoop:

%a = select i1 %cond, i32 0, i32 %c
...

Loop:

%z = add i32 %y, %a

TBH, I want to optimize the program:

define dso_local i32 @test(i32 noundef %x) local_unnamed_addr #0 {
entry:
  br label %for.cond

for.cond:                                         ; preds = %for.inc, %entry
  %y.0 = phi i32 [ 0, %entry ], [ %y.1, %for.inc ]
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
  %cmp = icmp ult i32 %i.0, 1000
  br i1 %cmp, label %for.body, label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond
  ret i32 %y.0

for.body:                                         ; preds = %for.cond
  %tobool.not = icmp eq i32 %x, 0
  br i1 %tobool.not, label %for.inc, label %if.then

if.then:                                          ; preds = %for.body
  %add = add nsw i32 %y.0, 10
  br label %for.inc

for.inc:                                          ; preds = %for.body, %if.then
  %y.1 = phi i32 [ %add, %if.then ], [ %y.0, %for.body ]
  %inc = add nuw nsw i32 %i.0, 1
  br label %for.cond
}

Idealy, it can be optimized to

define dso_local i32 @test(i32 noundef %x) local_unnamed_addr #0 {
entry:
  %tobool.not = icmp eq i32 %x, 0
  %0 = select i1 %tobool.not, i32 0, i32 10000
  ret i32 %0
}

But SimplifyCFG transforms for.body and if.then to a select instruction. Which is:

define dso_local i32 @test(i32 noundef %x) local_unnamed_addr {
entry:
  br label %for.cond

for.cond:                                         ; preds = %for.body, %entry
  %y.0 = phi i32 [ 0, %entry ], [ %spec.select, %for.body ]
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
  %cmp = icmp ult i32 %i.0, 1000
  br i1 %cmp, label %for.body, label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond
  ret i32 %y.0

for.body:                                         ; preds = %for.cond
  %tobool.not = icmp eq i32 %x, 0
  %add = add nsw i32 %y.0, 10
  %spec.select = select i1 %tobool.not, i32 %y.0, i32 %add
  %inc = add nuw nsw i32 %i.0, 1
  br label %for.cond
}

This prevents it from being loop unswitched. In order to get the expected result, this patch hoisted the select instruction, so that indvars pass can rewrite the exit value.

Diff Detail

Event Timeline

StephenFan created this revision.Jan 4 2023, 6:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 4 2023, 6:06 AM
StephenFan requested review of this revision.Jan 4 2023, 6:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 4 2023, 6:06 AM
nikic added a comment.Jan 4 2023, 6:23 AM

Shouldn't this be handled by adding this fold to InstCombine? https://alive2.llvm.org/ce/z/EXY5JH

Shouldn't this be handled by adding this fold to InstCombine? https://alive2.llvm.org/ce/z/EXY5JH

Thanks for your advice! I will try to handle this in InstCombine. But it seems it is not a beneficial transformation if we transform

define i8 @src(i1 %c, i8 %x) {
  %add = add i8 %x, 10
  %select = select i1 %c, i8 %x, i8 %add
  ret i8 %select
}

to

define i8 @tgt(i1 %c, i8 %x) {
  %select = select i1 %c, i8 0, i8 10
  %add = add i8 %select, %x
  ret i8 %add
}

https://godbolt.org/z/zxeojTxqb

nikic added a subscriber: spatel.Jan 4 2023, 7:01 AM

Shouldn't this be handled by adding this fold to InstCombine? https://alive2.llvm.org/ce/z/EXY5JH

Thanks for your advice! I will try to handle this in InstCombine. But it seems it is not a beneficial transformation if we transform

define i8 @src(i1 %c, i8 %x) {
  %add = add i8 %x, 10
  %select = select i1 %c, i8 %x, i8 %add
  ret i8 %select
}

to

define i8 @tgt(i1 %c, i8 %x) {
  %select = select i1 %c, i8 0, i8 10
  %add = add i8 %select, %x
  ret i8 %add
}

https://godbolt.org/z/zxeojTxqb

This looks beneficial from a middle-end perspective, so we probably need to undo in the backend? @spatel Any thoughts?

spatel added a comment.Jan 4 2023, 7:33 AM

We already have that fold?
https://alive2.llvm.org/ce/z/XPZL38

There is already an undo of that or something similar for FP ops on some targets, so it may need some generalization.

nikic added a comment.Jan 4 2023, 7:39 AM

We already have that fold?
https://alive2.llvm.org/ce/z/XPZL38

There is already an undo of that or something similar for FP ops on some targets, so it may need some generalization.

Ooops, you're right. I could have sworn I checked that first...

So to return to the original problem here, there may be a phase ordering issue -- can you please add a PhaseOrdering test?

spatel added a comment.Jan 4 2023, 7:52 AM

We already have that fold?
https://alive2.llvm.org/ce/z/XPZL38

There is already an undo of that or something similar for FP ops on some targets, so it may need some generalization.

Ooops, you're right. I could have sworn I checked that first...

So to return to the original problem here, there may be a phase ordering issue -- can you please add a PhaseOrdering test?

Aha - there's a limitation for the example with a constant, so it doesn't fold:
https://alive2.llvm.org/ce/z/QfjhJP

I think it's from this (not sure why we have it):
https://github.com/llvm/llvm-project/blob/85960043d594fc12d340ccb66a30861b023ab496/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp#L492

nikic requested changes to this revision.Jan 6 2023, 1:17 AM

Per above comments, I believe we should 1) remove the InstCombine limitation that @spatel found and 2) undo in the backend if needed. I'd rather not have this kind of special case sitting in LICM.

This revision now requires changes to proceed.Jan 6 2023, 1:17 AM