This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Conditionally fold select i1 into and/or
ClosedPublic

Authored by aqjune on Mar 31 2021, 10:37 AM.

Details

Summary

This patch fixes llvm.org/pr49688 by conditionally folding select i1 into and/or:

select cond, cond2, false
->
and cond, cond2

This is not safe if cond2 is poison whereas cond isn’t.

Unconditionally disabling this transformation affects later pipelines that depend on and/or i1s.
To minimize its impact, this patch conservatively checks whether cond2 is an instruction that
creates a poison or its operand creates a poison.
This approach is similar to what InstSimplify's SimplifyWithOpReplaced is doing.

Diff Detail

Event Timeline

aqjune created this revision.Mar 31 2021, 10:37 AM
aqjune requested review of this revision.Mar 31 2021, 10:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 31 2021, 10:37 AM
aqjune added inline comments.Mar 31 2021, 10:40 AM
llvm/test/Transforms/InstCombine/and-fcmp.ll
79

This optimization can be supported by inserting freeze again.
If this patch is accepted, I'll record this as a regression at bugzilla and make a patch then.

Unconditionally disabling this transformation affects later pipelines that depend on and/or i1s.
To minimize its impact, this patch conservatively checks whether cond2 is an instruction that
creates a poison or its operand creates a poison.

Do you have any further information on the regressions this would result in?

Given all the preliminary work that has gone into this, I'd really like to have the option flipped outright, and not introduce another hack.

llvm/test/Transforms/InstCombine/pr49668.ll
1 ↗(On Diff #334483)

Wrong filename? https://bugs.llvm.org/show_bug.cgi?id=49668 does not look related.

aqjune added a comment.Apr 1 2021, 9:06 AM

Unconditionally disabling this transformation affects later pipelines that depend on and/or i1s.
To minimize its impact, this patch conservatively checks whether cond2 is an instruction that
creates a poison or its operand creates a poison.

Do you have any further information on the regressions this would result in?

Given all the preliminary work that has gone into this, I'd really like to have the option flipped outright, and not introduce another hack.

If -instcombine-unsafe-select-transform is flipped from true to false, we have 35 LLVM unit test failures. A few of them are unsafe ones so okay to be removed, but the remaining ones need updates in analysis or transformations to reenable them.
To minimize performance regression, what do you think about this workflow?
(1) This patch is landed to fix the miscompilation without significant regressions
(2) The clang noundef patch is enabled by default (I left a comment at its test-only patch)
(3) Flip -instcombine-unsafe-select-transform to false

Enabling the clang noundef flag reduces assembly diffs from LLVM test suite.
Without the noundef flag, the # of different assembly files is 737/5247 when -instcombine-unsafe-select-transform is flipped from true to false.
With the noundef flag, it becomes 632/5247.
It looks big, but about 80% of those are simply from reordering of basic blocks (I randomly picked 5 samples and 4 were such cases).
With this patch (D99674) only, the assembly diff # is already 439/5247, so 632 isn't very big.

aqjune updated this revision to Diff 334724.Apr 1 2021, 9:08 AM

rebase, correctify the test file name

aqjune marked an inline comment as done.Apr 1 2021, 9:09 AM
nikic accepted this revision.Apr 3 2021, 12:15 PM

If -instcombine-unsafe-select-transform is flipped from true to false, we have 35 LLVM unit test failures. A few of them are unsafe ones so okay to be removed, but the remaining ones need updates in analysis or transformations to reenable them.
To minimize performance regression, what do you think about this workflow?
(1) This patch is landed to fix the miscompilation without significant regressions
(2) The clang noundef patch is enabled by default (I left a comment at its test-only patch)
(3) Flip -instcombine-unsafe-select-transform to false

Enabling the clang noundef flag reduces assembly diffs from LLVM test suite.
Without the noundef flag, the # of different assembly files is 737/5247 when -instcombine-unsafe-select-transform is flipped from true to false.
With the noundef flag, it becomes 632/5247.
It looks big, but about 80% of those are simply from reordering of basic blocks (I randomly picked 5 samples and 4 were such cases).
With this patch (D99674) only, the assembly diff # is already 439/5247, so 632 isn't very big.

I'm OK with landing this first to fix an active miscompile, but I don't think we should block flipping the switch on noundef work. I think your numbers support that we shouldn't wait on that, because the noundef impact it small relative to the whole change (it only makes a 15% difference in affected files).

llvm/test/Transforms/PhaseOrdering/unsigned-multiply-overflow-check.ll
116 ↗(On Diff #334724)

This test update looks unnecessary? Only changes names.

This revision is now accepted and ready to land.Apr 3 2021, 12:15 PM
aqjune added a comment.Apr 3 2021, 9:35 PM

I'm OK with landing this first to fix an active miscompile, but I don't think we should block flipping the switch on noundef work. I think your numbers support that we shouldn't wait on that, because the noundef impact it small relative to the whole change (it only makes a 15% difference in affected files).

Let's wait for two weeks after this patch is landed to see whether any serious regression from this patch is reported - if there is no such report, I think it is a good sign and we can move forward without waiting for the noundef patch.
Thanks for the review!

aqjune marked an inline comment as done.Apr 3 2021, 9:36 PM
aqjune updated this revision to Diff 335129.Apr 3 2021, 10:08 PM

rebase, clang-format

This revision was landed with ongoing or failed builds.Apr 3 2021, 10:16 PM
This revision was automatically updated to reflect the committed changes.
bjope added a subscriber: bjope.Apr 7 2021, 12:36 AM

I've seen problems with infinite loops in InstCombine after mergin this patch downstream:

IC: Visiting:   %not. = xor <4 x i1> %271, <i1 poison, i1 poison, i1 poison, i1 true>, !dbg !252
IC: Visiting:   %274 = select <4 x i1> %not., <4 x i1> <i1 poison, i1 poison, i1 poison, i1 true>, <4 x i1> %273, !dbg !252
IC: ADD DEFERRED:   %not. = xor <4 x i1> %271, <i1 poison, i1 poison, i1 poison, i1 true>, !dbg !252
IC: Mod =   %274 = select <4 x i1> %not., <4 x i1> <i1 poison, i1 poison, i1 poison, i1 true>, <4 x i1> %273, !dbg !252
    New =   %274 = select <4 x i1> %271, <4 x i1> %273, <4 x i1> <i1 poison, i1 poison, i1 poison, i1 true>, !dbg !252
IC: ADD:   %274 = select <4 x i1> %271, <4 x i1> %273, <4 x i1> <i1 poison, i1 poison, i1 poison, i1 true>, !dbg !252
IC: ERASE   %not. = xor <4 x i1> %271, <i1 poison, i1 poison, i1 poison, i1 true>, !dbg !252
IC: ADD DEFERRED:   %271 = and <4 x i1> %broadcast.splat2008, %270, !dbg !252
IC: ADD:   %271 = and <4 x i1> %broadcast.splat2008, %270, !dbg !252
IC: Visiting:   %271 = and <4 x i1> %broadcast.splat2008, %270, !dbg !252
IC: Visiting:   %274 = select <4 x i1> %271, <4 x i1> %273, <4 x i1> <i1 poison, i1 poison, i1 poison, i1 true>, !dbg !252
IC: ADD DEFERRED:   %not. = xor <4 x i1> %271, <i1 true, i1 true, i1 true, i1 true>
IC: Old =   %274 = select <4 x i1> %271, <4 x i1> %273, <4 x i1> <i1 poison, i1 poison, i1 poison, i1 true>, !dbg !252
    New =   <badref> = select <4 x i1> %not., <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i1> %273
IC: ADD:   %274 = select <4 x i1> %not., <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i1> %273, !dbg !252
IC: ERASE   %275 = select <4 x i1> %271, <4 x i1> %273, <4 x i1> <i1 poison, i1 poison, i1 poison, i1 true>, !dbg !252
IC: ADD DEFERRED:   %271 = and <4 x i1> %broadcast.splat2008, %270, !dbg !252
IC: ADD DEFERRED:   %273 = icmp sge <4 x i32> %272, %broadcast.splat2010, !dbg !252
IC: ADD:   %273 = icmp sge <4 x i32> %272, %broadcast.splat2010, !dbg !252
IC: ADD:   %271 = and <4 x i1> %broadcast.splat2008, %270, !dbg !252
IC: ADD:   %not. = xor <4 x i1> %271, <i1 true, i1 true, i1 true, i1 true>, !dbg !252
IC: Visiting:   %not. = xor <4 x i1> %271, <i1 true, i1 true, i1 true, i1 true>, !dbg !252
IC: Visiting:   %271 = and <4 x i1> %broadcast.splat2008, %270, !dbg !252
IC: Visiting:   %273 = icmp sge <4 x i32> %272, %broadcast.splat2010, !dbg !252
IC: Visiting:   %274 = select <4 x i1> %not., <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i1> %273, !dbg !252
IC: Visiting:   %predphi = select <4 x i1> %274, <4 x i32> %281, <4 x i32> <i32 poison, i32 poison, i32 poison, i32 0>, !dbg !252
IC: Visiting:   %282 = extractelement <4 x i32> %predphi, i32 3
IC: ADD DEFERRED:   %not. = xor <4 x i1> %271, <i1 poison, i1 poison, i1 poison, i1 true>, !dbg !252
IC: ADD DEFERRED:   %274 = select <4 x i1> %not., <4 x i1> <i1 poison, i1 poison, i1 poison, i1 true>, <4 x i1> %273, !dbg !252
IC: ADD DEFERRED:   %predphi = select <4 x i1> %274, <4 x i32> %281, <4 x i32> <i32 poison, i32 poison, i32 poison, i32 0>, !dbg !252
IC: Mod =   %282 = extractelement <4 x i32> %predphi, i32 3
    New =   %282 = extractelement <4 x i32> %predphi, i32 3
IC: ADD:   %282 = extractelement <4 x i32> %predphi, i32 3
IC: ADD:   %predphi = select <4 x i1> %274, <4 x i32> %281, <4 x i32> <i32 poison, i32 poison, i32 poison, i32 0>, !dbg !252
IC: ADD:   %274 = select <4 x i1> %not., <4 x i1> <i1 poison, i1 poison, i1 poison, i1 true>, <4 x i1> %273, !dbg !252
IC: ADD:   %not. = xor <4 x i1> %271, <i1 poison, i1 poison, i1 poison, i1 true>, !dbg !252
IC: Visiting:   %not. = xor <4 x i1> %271, <i1 poison, i1 poison, i1 poison, i1 true>, !dbg !252
IC: Visiting:   %274 = select <4 x i1> %not., <4 x i1> <i1 poison, i1 poison, i1 poison, i1 true>, <4 x i1> %273, !dbg !252
IC: ADD DEFERRED:   %not. = xor <4 x i1> %271, <i1 poison, i1 poison, i1 poison, i1 true>, !dbg !252
IC: Mod =   %274 = select <4 x i1> %not., <4 x i1> <i1 poison, i1 poison, i1 poison, i1 true>, <4 x i1> %273, !dbg !252
    New =   %274 = select <4 x i1> %271, <4 x i1> %273, <4 x i1> <i1 poison, i1 poison, i1 poison, i1 true>, !dbg !252
IC: ADD:   %274 = select <4 x i1> %271, <4 x i1> %273, <4 x i1> <i1 poison, i1 poison, i1 poison, i1 true>, !dbg !252
IC: ERASE   %not. = xor <4 x i1> %271, <i1 poison, i1 poison, i1 poison, i1 true>, !dbg !252
IC: ADD DEFERRED:   %271 = and <4 x i1> %broadcast.splat2008, %270, !dbg !252
...

Working on reducing the test case a bit, and if I can find a reproducer for the main trunk then I think we need to revert this patch.

bjope added a comment.Apr 7 2021, 4:38 AM

Here is a reduced reproducer that goes into infinite loop with this patch:

; RUN: opt < %s -instcombine -S -o /dev/null

; This test case end up in an infinite loop with:
;    commit 5207cde5cb4147155c469e1861427ea9d569bd5a
;    Date:   Sun Apr 4 13:35:33 2021 +0900
;
;        [InstCombine] Conditionally fold select i1 into and/or

define dso_local void @main(i16* %in, i32* %out, i32 %a) local_unnamed_addr {
entry:
  %0 = load i16, i16* %in, align 1
  %conv456 = sext i16 %0 to i32
  %cmp468 = icmp sgt i16 %0, 0
  %broadcast.splatinsert2007 = insertelement <4 x i1> poison, i1 %cmp468, i32 0
  %broadcast.splat2008 = shufflevector <4 x i1> %broadcast.splatinsert2007, <4 x i1> poison, <4 x i32> zeroinitializer
  %broadcast.splatinsert2009 = insertelement <4 x i32> poison, i32 %conv456, i32 0
  %broadcast.splat2010 = shufflevector <4 x i32> %broadcast.splatinsert2009, <4 x i32> poison, <4 x i32> zeroinitializer
  %broadcast.splatinsert2005 = insertelement <4 x i32> poison, i32 %a, i32 0
  %broadcast.splat2006 = shufflevector <4 x i32> %broadcast.splatinsert2005, <4 x i32> poison, <4 x i32> zeroinitializer
  %1 = add <4 x i32> %broadcast.splat2006, <i32 1, i32 1, i32 1, i32 1>
  %2 = select <4 x i1> zeroinitializer, <4 x i32> zeroinitializer, <4 x i32> %1
  %3 = icmp slt <4 x i32> %2, zeroinitializer
  %4 = and <4 x i1> %broadcast.splat2008, %3
  %5 = add nsw <4 x i32> %2, <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647>
  %6 = icmp slt <4 x i32> %5, %broadcast.splat2010
  %7 = select <4 x i1> %4, <4 x i1> %6, <4 x i1> zeroinitializer
  %8 = sub nsw <4 x i32> %broadcast.splat2010, %2
  %9 = select <4 x i1> zeroinitializer, <4 x i32> zeroinitializer, <4 x i32> %8
  %10 = xor <4 x i1> %7, <i1 true, i1 true, i1 true, i1 true>
  %predphi = select <4 x i1> %10, <4 x i32> %9, <4 x i32> zeroinitializer
  %11 = extractelement <4 x i32> %predphi, i32 1
  store i32 %11, i32* %out, align 1
  ret void
}
uabelho added a subscriber: uabelho.Apr 7 2021, 5:11 AM
spatel added a comment.Apr 7 2021, 5:51 AM

Apart from being unsafe, this pair of transforms also goes against the usual rule of not creating more instructions than we started with:

// select a, false, b -> select !a, b, false
if (match(TrueVal, m_Zero())) {
  Value *NotCond = Builder.CreateNot(CondVal, "not." + CondVal->getName());
  return SelectInst::Create(NotCond, FalseVal, Zero);
}
// select a, b, true -> select !a, true, b
if (match(FalseVal, m_One())) {
  Value *NotCond = Builder.CreateNot(CondVal, "not." + CondVal->getName());
  return SelectInst::Create(NotCond, One, TrueVal);
}

I think we should just delete those (either with or without reverting this patch). We can add a safe transform for "select i1 %x, i1 0, i1 1 --> xor i1 %x, -1" to avoid a regression:
https://alive2.llvm.org/ce/z/Bhdy-Y

nikic added a comment.Apr 7 2021, 6:24 AM

@spatel These are important canonicalization transforms, that allow us to treat logical and/or the same way as and/or in many other passes. There is shouldAvoidAbsorbingNotIntoSelect() to guard against infinite loops, and probably it needs to be applied in more places.

spatel added a comment.Apr 7 2021, 8:46 AM

@spatel These are important canonicalization transforms, that allow us to treat logical and/or the same way as and/or in many other passes. There is shouldAvoidAbsorbingNotIntoSelect() to guard against infinite loops, and probably it needs to be applied in more places.

Ah - I stepped into the example here for a closer look. It's a vector partial-undef problem (conflicting with a demanded elements transform). I think we can avoid it by just making sure we're matching a true zero or one constant (no undef/poison elements).

Here's what I have the test down to:

define i32 @main(<2 x i1> %a, <2 x i1> %b, <2 x i32> %x, <2 x i32> %y) {
  %t5 = add nsw <2 x i32> %y, <i32 2147483647, i32 2147483647>
  %t6 = icmp slt <2 x i32> %t5, %x
  %ab = and <2 x i1> %a, %b
  %t7 = select <2 x i1> %ab, <2 x i1> %t6, <2 x i1> <i1 0, i1 poison>
  %t10 = xor <2 x i1> %t7, <i1 true, i1 poison>
  %p = select <2 x i1> %t10, <2 x i32> zeroinitializer, <2 x i32> %y
  %t11 = extractelement <2 x i32> %p, i32 0
  ret i32 %t11
}

There hasn't been any performance regression reported (modulo the instruction cost issue, the patch of which was swiftly landed); maybe it is time to remove the folding.
I'll make the patch in a day or two.