Page MenuHomePhabricator

[InstCombine] Fully disable select to and/or i1 folding
ClosedPublic

Authored by aqjune on Apr 23 2021, 12:00 PM.

Details

Summary

This is a patch that disables the poison-unsafe select -> and/or i1 folding.

It has been blocking D72396 and also has been the source of a few miscompilations
described in llvm.org/pr49688 .
D99674 conditionally blocked this folding and successfully fixed the latter one.
The former one was still blocked, and this patch addresses it.

Note that a few test functions that has _logical suffix are now deoptimized.
These are created by @nikic to check the impact of disabling this optimization
by copying existing original functions and replacing and/or with select.

I can see that most of these are poison-unsafe; they can be revived by introducing
freeze instruction. I left comments at fcmp + select optimizations (or-fcmp.ll, and-fcmp.ll)
because I think they are good targets for freeze fix.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
aqjune added inline comments.Apr 27 2021, 9:39 AM
llvm/test/Transforms/InstCombine/logical-select.ll
385

It is splitted into D101375

llvm/test/Transforms/PhaseOrdering/unsigned-multiply-overflow-check.ll
20

I promised to do the patch at D82317, but these days I'm occupied with other things, so it might not be a recent future (not in a month at least)...

I think it is a good chance to use freeze here: We can add

%cond = icmp ne %x, 0
%v = call @llvm.umul.with.overflow(%x, %y)
%ov = extractvalue %v, 1
%res = select i1 %cond, %ov, false
  =>
%y.fr = freeze %y
%v = call @llvm.umul.with.overflow(%x, %y.fr)
%ov = extractvalue %v, 1
%res = %ov

into CodeGenPrepare.
What do you think? CodeGenPrepare is already using freeze for a few transformations.

aqjune added inline comments.Apr 27 2021, 9:46 AM
llvm/test/Transforms/PhaseOrdering/unsigned-multiply-overflow-check.ll
20
spatel added inline comments.Apr 27 2021, 12:12 PM
llvm/test/Transforms/PhaseOrdering/unsigned-multiply-overflow-check.ll
20

I don't think we want to add code to CGP if we can avoid it. CGP is supposed to be a temporary hack that is not needed with GlobalISel. I do acknowledge that "temporary" in the code may outlive the people working on it today (!).

If we don't care about undef correctness in codegen (in other words, the select->and transform will live on in codegen forever), then we might as well add a DAG combine. Alternatively, are we comfortable creating freeze in instcombine for rare code like umul.with.ov?

nikic added inline comments.Apr 27 2021, 12:45 PM
llvm/test/Transforms/PhaseOrdering/unsigned-multiply-overflow-check.ll
20

I think this is one of the rare cases where inserting Freeze in InstCombine makes sense. There's not much (any?) further optimization opportunities for a umul.with.overflow with non-constant operands.

aqjune added inline comments.Apr 27 2021, 5:22 PM
llvm/test/Transforms/PhaseOrdering/unsigned-multiply-overflow-check.ll
20

InstCombine sounds good as well!

spatel added inline comments.Apr 30 2021, 5:40 AM
llvm/test/Transforms/PhaseOrdering/unsigned-multiply-overflow-check.ll
20

For reference, that part is D101423, and it will be a preliminary patch before this one.

aqjune edited the summary of this revision. (Show Details)May 1 2021, 8:44 PM
aqjune updated this revision to Diff 342203.May 1 2021, 9:06 PM

ctpop is splitted out

aqjune updated this revision to Diff 342207.May 1 2021, 9:20 PM

rebase to reduce diffs

aqjune updated this revision to Diff 342214.May 1 2021, 9:41 PM

remove the noundef comment

aqjune added a comment.May 1 2021, 9:58 PM

I found that there are a few more patterns that can be salvaged. Will create a patch for them.

llvm/test/Transforms/InstCombine/and.ll
898

This can be resurrected.

1033

This can be resurrected as well.

llvm/test/Transforms/InstCombine/and2.ll
51

This one too

llvm/test/Transforms/PGOProfile/chr.ll
1374

This one as well

aqjune added a comment.May 2 2021, 4:29 AM

I made D101720 to cover the remaining cases except Transforms/InstCombine/and2.ll.
Supporting and2.ll doesn't seem super-straightforward, but maybe some minor tweaks can make it.

aqjune added a comment.May 2 2021, 4:30 AM

(BTW, if the patch is reviewed, then I believe we are finally ready to land this patch.)

nikic added a comment.May 3 2021, 2:05 PM

Went through all the tests again, looks like there's three patterns that could still be handled. I think the last one of them should be handled, and for the other two it's fine to just open an issue.

llvm/test/Transforms/InstCombine/and2.ll
51

Looks like we still miss this one (https://alive2.llvm.org/ce/z/E_F768).

llvm/test/Transforms/InstCombine/logical-select-inseltpoison.ll
450

Looks like this could be salvaged: https://alive2.llvm.org/ce/z/YQUQew

Doesn't seem important though.

llvm/test/Transforms/InstCombine/or.ll
1099

Can be salvaged: https://alive2.llvm.org/ce/z/-STJ2d

I thought this was already covered by one of the folds though...

aqjune added inline comments.May 3 2021, 6:18 PM
llvm/test/Transforms/InstCombine/or.ll
1099

No, it didn't exist.
I think there are too many possible combinations here. It should be reduced into more compact code someday...
Made a patch here: D101807

aqjune updated this revision to Diff 342624.May 3 2021, 7:26 PM

Rebase onto D101807

aqjune added inline comments.May 3 2021, 7:31 PM
llvm/test/Transforms/InstCombine/or.ll
1133

This can be salvaged as well:
https://alive2.llvm.org/ce/z/yXF96T

But I think there are more patterns that are missing. I'll leave them as missing optimization opportunities at bugzilla after this patch is reviewed.

I think this patch is ready to go: running the test-suite on an ARM machine didn't complain anything.

Well, but one thing that I'm concerned about is that from tomorrow I'll not be online for about three weeks. :(

I'd like to find someone who reverts this patch if there is any serious problem.

llvm/test/Transforms/InstCombine/or.ll
1133

Addressed via D101801

nikic accepted this revision.May 5 2021, 12:39 AM

LGTM. I can take care of reverting if there are issues.

This revision is now accepted and ready to land.May 5 2021, 12:39 AM
This revision was automatically updated to reflect the committed changes.
aqjune added a comment.May 5 2021, 5:39 PM

LGTM. I can take care of reverting if there are issues.

Thanks!

Hi, we've got a ~6% regression in SPEC INT 2006 462.libquantum on AArch64 (both -flto and -Ofast) that comes back to this change. See here for a reproducer https://godbolt.org/z/dq98Gqqxn (-fno-vectorize is not strictly necessary, but it does make the difference easier to spot). @dmgreen mentioned to me that we could probably fix this up in the AArch64 backend, but a fix in the mid-end might be more generally useful too.

nikic added a comment.May 11 2021, 9:46 AM

Hi, we've got a ~6% regression in SPEC INT 2006 462.libquantum on AArch64 (both -flto and -Ofast) that comes back to this change. See here for a reproducer https://godbolt.org/z/dq98Gqqxn (-fno-vectorize is not strictly necessary, but it does make the difference easier to spot). @dmgreen mentioned to me that we could probably fix this up in the AArch64 backend, but a fix in the mid-end might be more generally useful too.

Thanks for the report! This looks like the "one hot merge" optimization, which is indeed disabled by this patch, because it is incorrect in the current form (onehot_merge.ll). We can bring back this optimization in InstCombine, but need to freeze one of the operands (c2 in your example).

fhahn added a subscriber: fhahn.May 25 2021, 1:24 AM

Looks like this is causing an infinite loop in instcombine: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34661

Reproducer test case: https://oss-fuzz.com/download?testcase_id=6383789942112256 , hangs with opt -instcombine, terminates with the patch reverted.

Looks like this is causing an infinite loop in instcombine: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34661

This seems likely to be another partial undef/poison vector problem ( 1894c6c59e ). I'll take a look.

Looks like this is causing an infinite loop in instcombine: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34661

This seems likely to be another partial undef/poison vector problem ( 1894c6c59e ). I'll take a look.

The fuzzer test can be reduced to a single line. I committed a minimal fix here:
ae1bc9ebf3
...but some follow-up seems necessary.

FYI, I'm seeing what I think is a miscompile that bisects to this patch. Greatly simplified, the problematic snippet is this:

struct Stats {
  int a;
  int b;
  int a_or_b;
};

bool x = ...
bool y = ...
Stats stats;
stats.a = x ? 1 : 0;
stats.b = y ? 1 : 0;
stats.a_or_b = (x || y) ? 1 : 0;

What we see is that when x is false and y is true, a is 0 (expected), b is 1 (expected), but a_or_b is 0 (unexpected).

The issue I'm seeing seems more directly caused by SLP vectorization, as it goes away with -fno-slp-vectorize. This patch merely unblocks that bad optimization AFAICT.

The issue I'm seeing seems more directly caused by SLP vectorization, as it goes away with -fno-slp-vectorize. This patch merely unblocks that bad optimization AFAICT.

Filed as http://llvm.org/PR50500

The issue I'm seeing seems more directly caused by SLP vectorization, as it goes away with -fno-slp-vectorize. This patch merely unblocks that bad optimization AFAICT.

Filed as http://llvm.org/PR50500

We needed SLP to get to the problem state in that example, but the bug was in instcombine/instsimplify.
Should be fixed with:
7bb8bfa0622b

The issue I'm seeing seems more directly caused by SLP vectorization, as it goes away with -fno-slp-vectorize. This patch merely unblocks that bad optimization AFAICT.

Filed as http://llvm.org/PR50500

We needed SLP to get to the problem state in that example, but the bug was in instcombine/instsimplify.
Should be fixed with:
7bb8bfa0622b

Thanks Sanjay! This fixes the unreduced miscompile we're seeing.