This is an archive of the discontinued LLVM Phabricator instance.

[NARY-REASSOCIATE] Support reassociation of min/max
ClosedPublic

Authored by ebrevnov on Sep 25 2020, 1:56 AM.

Details

Summary

Support reassociation for min/max. With that we should be able to transform min(min(a, b), c) -> min(min(a, c), b) if min(a, c) is already available.

Diff Detail

Event Timeline

ebrevnov created this revision.Sep 25 2020, 1:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 25 2020, 1:56 AM
ebrevnov requested review of this revision.Sep 25 2020, 1:56 AM
mkazantsev added inline comments.Oct 29 2020, 9:54 PM
llvm/lib/Transforms/Scalar/NaryReassociate.cpp
334

Can we factor out code of if(match) { ... } into a lambda that takes a matcher and returns instruction and avoid this copy-paste?

586

The result of tryReassociateMinOrMax is only used as instruction, and non-instruction values are discarded by dyn_cast_or_null. Maybe change API to return instruction from here?

615

findClosestMatchingDominator returns an Instruction *, do we really need to downcast it do Value *

629

Does it make more sense to name it ".reassociate" instead?

llvm/test/Transforms/NaryReassociate/nary-smax.ll
1

Please commit these tests with auto-generated checks without your patch and rebase on top of it to see what your patch changes.

mkazantsev added inline comments.Oct 29 2020, 9:56 PM
llvm/lib/Transforms/Scalar/NaryReassociate.cpp
586

Never mind, I think what you have is fine (I didn't notice it's a mutator).

ebrevnov added inline comments.Dec 2 2020, 2:39 AM
llvm/lib/Transforms/Scalar/NaryReassociate.cpp
334

Not sure how many lines of the code it will save.... let me try that... will see if it's any better

615

I don't think it will make any difference in this particular case. No problems to change to Instruction*

629

In addition to NaryReassociate there is Ressociate pass. I want the name to clearly point to NaryReassociate pass. Using "reassociate" seems too long for me...

llvm/test/Transforms/NaryReassociate/nary-smax.ll
1

Ok

mkazantsev added inline comments.Dec 4 2020, 4:10 AM
llvm/lib/Transforms/Scalar/NaryReassociate.cpp
334

I still think it would be useful, these lines copy-paste over and over.

llvm/test/Transforms/NaryReassociate/nary-smax.ll
1

Still actual.

ebrevnov updated this revision to Diff 326312.Feb 25 2021, 12:31 AM

Addressed comments

mkazantsev accepted this revision.Feb 25 2021, 3:04 AM

LGTM

llvm/lib/Transforms/Scalar/NaryReassociate.cpp
315

{ } not needed

This revision is now accepted and ready to land.Feb 25 2021, 3:04 AM
ebrevnov added inline comments.Feb 25 2021, 3:13 AM
llvm/lib/Transforms/Scalar/NaryReassociate.cpp
315

will remove before commit

This revision was landed with ongoing or failed builds.Feb 25 2021, 3:23 AM
This revision was automatically updated to reflect the committed changes.
RKSimon added inline comments.
llvm/lib/Transforms/Scalar/NaryReassociate.cpp
595

@ebrevnov uint is an unknown type on most targets - breaking buiilds - just use int ?

nlopes added a subscriber: nlopes.Feb 25 2021, 8:39 AM

This patch regressed the following tests:

  • LLVM :: Transforms/NaryReassociate/nary-smax.ll
  • LLVM :: Transforms/NaryReassociate/nary-smin.ll
  • LLVM :: Transforms/NaryReassociate/nary-umax.ll
  • LLVM :: Transforms/NaryReassociate/nary-umin.ll

The reason is that it doesn't account for undef values. Example:

----------------------------------------
define i32 @smax_test1(i32 %a, i32 %b, i32 %c) {
%0:
  %c1 = icmp sgt i32 %a, %b
  %smax1 = select i1 %c1, i32 %a, i32 %b
  %c2 = icmp sgt i32 %b, %c
  %smax2 = select i1 %c2, i32 %b, i32 %c
  %c3 = icmp sgt i32 %smax2, %a
  %smax3 = select i1 %c3, i32 %smax2, i32 %a
  %res = add i32 %smax1, %smax3
  ret i32 %res
}
=>
define i32 @smax_test1(i32 %a, i32 %b, i32 %c) {
%0:
  %c1 = icmp sgt i32 %a, %b
  %smax1 = select i1 %c1, i32 %a, i32 %b
  %1 = icmp sgt i32 %smax1, %c
  %smax3.nary = select i1 %1, i32 %smax1, i32 %c
  %res = add i32 %smax1, %smax3.nary
  ret i32 %res
}
Transformation doesn't verify!
ERROR: Target's return value is more undefined

Example:
i32 %a = #x7fffffff (2147483647)
i32 %b = #x00000000 (0)
i32 %c = undef

Source:
i1 %c1 = #x1 (1)
i32 %smax1 = #x7fffffff (2147483647)
i1 %c2 = any
i32 %smax2 = any
i1 %c3 = #x0 (0)
i32 %smax3 = #x7fffffff (2147483647)
i32 %res = #xfffffffe (4294967294, -2)

Target:
i1 %c1 = #x1 (1)
i32 %smax1 = #x7fffffff (2147483647)
i1 %1 = #x0 (0)
i32 %smax3.nary = #x03002006 (50339846)
i32 %res = #x83002005 (2197823493, -2097143803)
Source value: #xfffffffe (4294967294, -2)
Target value: #x83002005 (2197823493, -2097143803)

I think this is an issue of verification itself. In the first case max(0, undef)=>any and max(any, max_int)=>max_int. In the second case max(max_int, undef)=>x03002006. I believe the behavior of the verifier is inconsistent in these two cases and max(max_int, undef) should be evaluated to max_int as well. We can do the following trivial transformations to prove that: max(max_int, undef) is trivially equal to max(max_int, max(undef, undef)) and max(undef, undef) should be evaluated to 'any' since max(0, undef) is evaluated to 'any' in the first case. Thus we get max(max_int, any) which is evaluated to 'max_int' in the first case. So max(max_int, undef) should be evaluated to 'max_int' but not 'x03002006'.

Makes sense?

I think this is an issue of verification itself. In the first case max(0, undef)=>any and max(any, max_int)=>max_int. In the second case max(max_int, undef)=>x03002006. I believe the behavior of the verifier is inconsistent in these two cases and max(max_int, undef) should be evaluated to max_int as well. We can do the following trivial transformations to prove that: max(max_int, undef) is trivially equal to max(max_int, max(undef, undef)) and max(undef, undef) should be evaluated to 'any' since max(0, undef) is evaluated to 'any' in the first case. Thus we get max(max_int, any) which is evaluated to 'max_int' in the first case. So max(max_int, undef) should be evaluated to 'max_int' but not 'x03002006'.

Makes sense?

Note that given

%a = undef
%b = %a

, %a and %b have undefined values, and there are no guarantees that they are equal/not equal.
Since you emitted icmp+select, you 'read' from undefined %c twice, and you are free to get different result each time.

I think this is an issue of verification itself. In the first case max(0, undef)=>any and max(any, max_int)=>max_int. In the second case max(max_int, undef)=>x03002006. I believe the behavior of the verifier is inconsistent in these two cases and max(max_int, undef) should be evaluated to max_int as well. We can do the following trivial transformations to prove that: max(max_int, undef) is trivially equal to max(max_int, max(undef, undef)) and max(undef, undef) should be evaluated to 'any' since max(0, undef) is evaluated to 'any' in the first case. Thus we get max(max_int, any) which is evaluated to 'max_int' in the first case. So max(max_int, undef) should be evaluated to 'max_int' but not 'x03002006'.

Makes sense?

Note that given

%a = undef
%b = %a

, %a and %b have undefined values, and there are no guarantees that they are equal/not equal.
Since you emitted icmp+select, you 'read' from undefined %c twice, and you are free to get different result each time.

I must be missing something but I don't see how that applies to the above case. I think the problem is not connected with evaluation of %c twice (to different values).
I think the problem is that in the second case "%1 = icmp sgt i32 %smax1, %c" was evaluated to 'false' even though 'smax1' is known to be max_int. But if we replace "undef" with "any" like in the first case it is evaluated to max_int....

I think I understand the problem now. 'any'>max_int is known to be false (first case), while max_int >'any' is unknown (second case)....so this is not verification issue....

I think this is an issue of verification itself. In the first case max(0, undef)=>any and max(any, max_int)=>max_int. In the second case max(max_int, undef)=>x03002006. I believe the behavior of the verifier is inconsistent in these two cases and max(max_int, undef) should be evaluated to max_int as well. We can do the following trivial transformations to prove that: max(max_int, undef) is trivially equal to max(max_int, max(undef, undef)) and max(undef, undef) should be evaluated to 'any' since max(0, undef) is evaluated to 'any' in the first case. Thus we get max(max_int, any) which is evaluated to 'max_int' in the first case. So max(max_int, undef) should be evaluated to 'max_int' but not 'x03002006'.

Makes sense?

Just to add to what Roman wrote, thinking of the code as max(,) is misleading. The code is doing icmp sgt INT_MAX, undef which can evaluate to true or false. But we cannot assume that undef from now on is equal to INT_MAX just because the comparison evaluated to true.
The are 2 possible fixes:

  1. only do the optimization if %c is known non-undef (use ValueTracking's utility), e.g., https://alive2.llvm.org/ce/z/sLeOE0
  2. freeze %c, e.g., https://alive2.llvm.org/ce/z/ZX557G

I think this is an issue of verification itself. In the first case max(0, undef)=>any and max(any, max_int)=>max_int. In the second case max(max_int, undef)=>x03002006. I believe the behavior of the verifier is inconsistent in these two cases and max(max_int, undef) should be evaluated to max_int as well. We can do the following trivial transformations to prove that: max(max_int, undef) is trivially equal to max(max_int, max(undef, undef)) and max(undef, undef) should be evaluated to 'any' since max(0, undef) is evaluated to 'any' in the first case. Thus we get max(max_int, any) which is evaluated to 'max_int' in the first case. So max(max_int, undef) should be evaluated to 'max_int' but not 'x03002006'.

Makes sense?

Just to add to what Roman wrote, thinking of the code as max(,) is misleading. The code is doing icmp sgt INT_MAX, undef which can evaluate to true or false. But we cannot assume that undef from now on is equal to INT_MAX just because the comparison evaluated to true.
The are 2 possible fixes:

  1. only do the optimization if %c is known non-undef (use ValueTracking's utility), e.g., https://alive2.llvm.org/ce/z/sLeOE0
  2. freeze %c, e.g., https://alive2.llvm.org/ce/z/ZX557G

0. fix SCEV's https://github.com/llvm/llvm-project/blob/00fe10c6a65173c9c578babd19f8fee44d07a761/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L1788-L1789 to emit proper intrinsics?

Just to add to what Roman wrote, thinking of the code as max(,) is misleading. The code is doing icmp sgt INT_MAX, undef which can evaluate to true or false. But we cannot assume that undef from now on is equal to INT_MAX just because the comparison evaluated to true.

Sure "undef" can be evaluated to any value each time. That's why in the first test case "%c2 = icmp sgt i32 %b, %c" is evaluated to 'any'. I don't see where 'undef' is assumed to be always equal to INT_MAX. Let me some time to think on possible fix. If I can't find solution quickly I will revert. Thank you.

I think the solution is to use >= instead of > when we do min/max reassociation. In other words, originally we had 'any' > MAX_INT which is known to be false. If we want semantically equal but reassociated expression we should invert the comparison logic. In other words we should check MAX_INT >= 'any' which is known to be true and MAX_INT will be selected.

I think the solution is to use >= instead of > when we do min/max reassociation. In other words, originally we had 'any' > MAX_INT which is known to be false. If we want semantically equal but reassociated expression we should invert the comparison logic. In other words we should check MAX_INT >= 'any' which is known to be true and MAX_INT will be selected.

That's clever yes!
Alive2 says it's correct: https://alive2.llvm.org/ce/z/gieHpn

Current: https://alive2.llvm.org/ce/z/x8NBH8

I think the solution is to use >= instead of > when we do min/max reassociation. In other words, originally we had 'any' > MAX_INT which is known to be false. If we want semantically equal but reassociated expression we should invert the comparison logic. In other words we should check MAX_INT >= 'any' which is known to be true and MAX_INT will be selected.

No.

Like i have already said, the fix is https://alive2.llvm.org/ce/z/RkBWxC.
Let me just fix this then.

No.

Like i have already said, the fix is https://alive2.llvm.org/ce/z/RkBWxC.
Let me just fix this then.

This is because you transform 'sgt,select" to smax which are not semantically equal due to the same reason
https://alive2.llvm.org/ce/z/zk_RrZ

I think it's better to revert it for now. Looks like the topic is not that trivial and we should first agree on the fix. There is no need to hurry.

commit 13a5cac2ba919b4d02a296428b58919231e08569 (HEAD -> main, origin/main)
Author: Evgeniy Brevnov <ybrevnov@azul.com>
Date: Fri Feb 26 19:23:32 2021 +0700

Revert "[NARY-REASSOCIATE] Support reassociation of min/max"

This reverts commit 83d134c3c4222e8b8d3d90c099f749a3b3abc8e0
ebrevnov reopened this revision.Mar 2 2021, 12:18 AM
This revision is now accepted and ready to land.Mar 2 2021, 12:18 AM
ebrevnov updated this revision to Diff 327370.Mar 2 2021, 12:18 AM

Fix found verification issue

lebedev.ri requested changes to this revision.Mar 2 2021, 12:20 AM

I don't think we should reinvent SCEV Expander.
I was looking at fixing this properly, but got sidetracked by having to first fix the isHighCostExpansion().

This revision now requires changes to proceed.Mar 2 2021, 12:20 AM

I don't think we should reinvent SCEV Expander.
I was looking at fixing this properly, but got sidetracked by having to first fix the isHighCostExpansion().

I tried to adjust SCEV Expander but it's pretty challenging by itself. The reason is it should support not only integer types but floating point and pointer types as well. For example, for floating point types changing "cmp" and "select" pair to related intrinsic may potentially change the semantics for corner cases like NANs, INFs, exceptions, etc...

I'm not saying that enhancing SCEV Expander is definitely wrong way to go. It's pretty challenging and can potentially take long time. I think these two should not be mixed. Once SCEV Expander is reworked it will be a trivial change in reassociation pass to employ it.

I don't think we should reinvent SCEV Expander.
I was looking at fixing this properly, but got sidetracked by having to first fix the isHighCostExpansion().

I tried to adjust SCEV Expander but it's pretty challenging by itself.

The reason is it should support not only integer types but floating point and pointer types as well. For example, for floating point types changing "cmp" and "select" pair to related intrinsic may potentially change the semantics for corner cases like NANs, INFs, exceptions, etc...

Uhm, no? SCEV only supports integers and pointers.

I'm not saying that enhancing SCEV Expander is definitely wrong way to go. It's pretty challenging and can potentially take long time. I think these two should not be mixed. Once SCEV Expander is reworked it will be a trivial change in reassociation pass to employ it.

Uhm, no? SCEV only supports integers and pointers.

I looked at the failing list one more time and indeed it was a pointer to double case.... In total, there are 54 failing tests which needs to be investigated... Roman, Are you working in this direction?
SCEV Expander should be fixed separately anyway and I don't think there is strong dependence between these pieces.

Uhm, no? SCEV only supports integers and pointers.

I looked at the failing list one more time and indeed it was a pointer to double case.... In total, there are 54 failing tests which needs to be investigated...

I actually think this patch should not have been reverted in the first place.
SCEV is known to be not good with undef, i don't really see why we should start blocking on that now.
So i would personally recommend to directly revert your revert.

Roman, Are you working in this direction?
SCEV Expander should be fixed separately anyway and I don't think there is strong dependence between these pieces.

Right now i'm not looking at that, but i already had a patch to fix this,
but got sidetracked into isHighCostExpansion(), which then resulted in rage-closing IDE...
Right now i'm trying to deal with another SCEV issue.

I actually think this patch should not have been reverted in the first place.
SCEV is known to be not good with undef, i don't really see why we should start blocking on that now.
So i would personally recommend to directly revert your revert.

If I'm getting it right you are OK to commit the patch in its current state. If this is the case please unblock it.

I actually think this patch should not have been reverted in the first place.
SCEV is known to be not good with undef, i don't really see why we should start blocking on that now.
So i would personally recommend to directly revert your revert.

If I'm getting it right you are OK to commit the patch in its current state. If this is the case please unblock it.

Sure can do, but i don't want to unblock/accept *this* current code state, so please upload the result of git revert.

Hi Roman, I'm somewhat confused too. If we just revert the revert, we introduce undef value where was no undef (in fact, we create poison where it wasn't present). Do we have plans to fix SCEV expander's behavior with undefs in near term? If no, reverting the revert will simply introduce the bug. I'd rather stick to the current solution, saying that "ok, SCEVExpander is broken, let's not use it". We can add a FIXME to replace this with expander in the future, but I don't understand why would we want to use it despite it's buggy.

Fixed SCEVExpaned in b46c085d2b6d15873fb53718f0a70b3848e19e4a, please rebase/update this patch accordingly.

But, i think we may have a problem still.
Does this patch intend to reassociate only integers, or pointers too? :)

ebrevnov updated this revision to Diff 329868.Mar 11 2021, 1:06 AM

Stepped back to use SCEVExpander

Fixed SCEVExpaned in b46c085d2b6d15873fb53718f0a70b3848e19e4a, please rebase/update this patch accordingly.

But, i think we may have a problem still.
Does this patch intend to reassociate only integers, or pointers too? :)

Updated to use SCEVExpander and limited to integer types only.

lebedev.ri accepted this revision.Mar 11 2021, 1:20 AM
lebedev.ri added subscribers: nikic, spatel.

Please do add an precommit tests for pointers (unless i just missed them)

LG

Fixed SCEVExpaned in b46c085d2b6d15873fb53718f0a70b3848e19e4a, please rebase/update this patch accordingly.

But, i think we may have a problem still.
Does this patch intend to reassociate only integers, or pointers too? :)

Updated to use SCEVExpander and limited to integer types only.

So i guess we may want to extend those intrinsics to support pointer types after all... @spatel @nikic

llvm/lib/Transforms/Scalar/NaryReassociate.cpp
284–289

Early return may be cleaner?

628
This revision is now accepted and ready to land.Mar 11 2021, 1:20 AM

This commit may cause multiple CI failures on TensorFlow/NVPTX backend. See https://github.com/tensorflow/tensorflow/commit/b1758bd553dfc2ebbfd07eec01d1e3254eda25b8#commitcomment-48080097.

I am trying to find a minimal reproducible test case here.

This commit may cause multiple CI failures on TensorFlow/NVPTX backend. See https://github.com/tensorflow/tensorflow/commit/b1758bd553dfc2ebbfd07eec01d1e3254eda25b8#commitcomment-48080097.

I am trying to find a minimal reproducible test case here.

According to bisect log the guilty commit is "first bad commit: [b46c085d2b6d15873fb53718f0a70b3848e19e4a] [NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions". Am I missing something?

This commit may cause multiple CI failures on TensorFlow/NVPTX backend. See https://github.com/tensorflow/tensorflow/commit/b1758bd553dfc2ebbfd07eec01d1e3254eda25b8#commitcomment-48080097.

I am trying to find a minimal reproducible test case here.

According to bisect log the guilty commit is "first bad commit: [b46c085d2b6d15873fb53718f0a70b3848e19e4a] [NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions". Am I missing something?

Right now this seems like a downstream problem. I've yet to see a reproducer.

This commit may cause multiple CI failures on TensorFlow/NVPTX backend. See https://github.com/tensorflow/tensorflow/commit/b1758bd553dfc2ebbfd07eec01d1e3254eda25b8#commitcomment-48080097.

I am trying to find a minimal reproducible test case here.

According to bisect log the guilty commit is "first bad commit: [b46c085d2b6d15873fb53718f0a70b3848e19e4a] [NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions". Am I missing something?

Right now this seems like a downstream problem. I've yet to see a reproducer.

Sorry, posted to the wrong venue. Downstream seems to get a workaround and I will test again and get back to you.

This revision was landed with ongoing or failed builds.Apr 2 2021, 1:30 AM
This revision was automatically updated to reflect the committed changes.
critson added a subscriber: critson.Apr 6 2021, 7:30 PM

Raising this here as well, as it seems the previous concern I raised with the commit was ignored.

I am seeing this change go into an infinite loop attempting to example an expression, adding more and more .nary postfixes.
I have reduced the failure to the attached bugpoint.

llc -march=amdgcn -mcpu=gfx700 < bugpoint-reduced-simplified.ll

This comment was removed by ebrevnov.

Raising this here as well, as it seems the previous concern I raised with the commit was ignored.

Thanks for letting me know. Hadn't seen this before. I can reproduce the issue and working on the fix.

I am seeing this change go into an infinite loop attempting to example an expression, adding more and more .nary postfixes.
I have reduced the failure to the attached bugpoint.

llc -march=amdgcn -mcpu=gfx700 < bugpoint-reduced-simplified.ll

Raising this here as well, as it seems the previous concern I raised with the commit was ignored.

Thanks for letting me know. Hadn't seen this before. I can reproduce the issue and working on the fix.

I am seeing this change go into an infinite loop attempting to example an expression, adding more and more .nary postfixes.
I have reduced the failure to the attached bugpoint.

llc -march=amdgcn -mcpu=gfx700 < bugpoint-reduced-simplified.ll

Fix has been submitted for review https://reviews.llvm.org/D100170

spatel added a subscriber: Hipony.EditedOct 18 2021, 10:24 AM

cc @Hipony
This patch causes an infinite loop with the following example and "opt -nary-reassociate":

define i32 @nary_infinite_loop_minmax(i32 %d0, i32 %d1, i32 %d2, i32 %d3) {
  %cmp0 = icmp slt i32 %d2, %d1
  %sel0 = select i1 %cmp0, i32 %d1, i32 %d2

  %cmp1 = icmp slt i32 %d3, %d0
  %sel1 = select i1 %cmp1, i32 %d0, i32 %d3

  %cmp2 = icmp slt i32 %sel1, %sel0
  %sel2 = select i1 %cmp2, i32 %sel1, i32 %sel0

  %cmp3 = icmp slt i32 %d3, %d0
  %sel3 = select i1 %cmp3, i32 %d0, i32 %d3

  %cmp4 = icmp slt i32 %sel3, %d2
  %sel4 = select i1 %cmp4, i32 %d2, i32 %sel3

  %cmp5 = icmp slt i32 %sel4, %d1
  %sel5 = select i1 %cmp5, i32 %d1, i32 %sel4
  ret i32 %sel5
}

There was another infinite loop example attached to the commit message for this patch - https://reviews.llvm.org/rG83d134c3c4222e8b8d3d90c099f749a3b3abc8e0
I'm not sure if that is the same root cause, but I don't think there was any reply to that failure.

There was another infinite loop example attached to the commit message for this patch - https://reviews.llvm.org/rG83d134c3c4222e8b8d3d90c099f749a3b3abc8e0
I'm not sure if that is the same root cause, but I don't think there was any reply to that failure.

This must be another problem. Previously reported issue has been fixed here https://reviews.llvm.org/rG36b932d6a385bb97efe17818a7a47d29d2d8acf3.
Working on the fix...

spatel removed a subscriber: Hipony.Oct 19 2021, 9:10 AM