Support reassociation for min/max. With that we should be able to transform min(min(a, b), c) -> min(min(a, c), b) if min(a, c) is already available.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Transforms/Scalar/NaryReassociate.cpp | ||
---|---|---|
319 | Can we factor out code of if(match) { ... } into a lambda that takes a matcher and returns instruction and avoid this copy-paste? | |
606 | The result of tryReassociateMinOrMax is only used as instruction, and non-instruction values are discarded by dyn_cast_or_null. Maybe change API to return instruction from here? | |
635 | findClosestMatchingDominator returns an Instruction *, do we really need to downcast it do Value * | |
649 | Does it make more sense to name it ".reassociate" instead? | |
llvm/test/Transforms/NaryReassociate/nary-smax.ll | ||
1 | Please commit these tests with auto-generated checks without your patch and rebase on top of it to see what your patch changes. |
llvm/lib/Transforms/Scalar/NaryReassociate.cpp | ||
---|---|---|
606 | Never mind, I think what you have is fine (I didn't notice it's a mutator). |
llvm/lib/Transforms/Scalar/NaryReassociate.cpp | ||
---|---|---|
319 | Not sure how many lines of the code it will save.... let me try that... will see if it's any better | |
635 | I don't think it will make any difference in this particular case. No problems to change to Instruction* | |
649 | In addition to NaryReassociate there is Ressociate pass. I want the name to clearly point to NaryReassociate pass. Using "reassociate" seems too long for me... | |
llvm/test/Transforms/NaryReassociate/nary-smax.ll | ||
1 | Ok |
llvm/lib/Transforms/Scalar/NaryReassociate.cpp | ||
---|---|---|
340 | will remove before commit |
This patch regressed the following tests:
- LLVM :: Transforms/NaryReassociate/nary-smax.ll
- LLVM :: Transforms/NaryReassociate/nary-smin.ll
- LLVM :: Transforms/NaryReassociate/nary-umax.ll
- LLVM :: Transforms/NaryReassociate/nary-umin.ll
The reason is that it doesn't account for undef values. Example:
---------------------------------------- define i32 @smax_test1(i32 %a, i32 %b, i32 %c) { %0: %c1 = icmp sgt i32 %a, %b %smax1 = select i1 %c1, i32 %a, i32 %b %c2 = icmp sgt i32 %b, %c %smax2 = select i1 %c2, i32 %b, i32 %c %c3 = icmp sgt i32 %smax2, %a %smax3 = select i1 %c3, i32 %smax2, i32 %a %res = add i32 %smax1, %smax3 ret i32 %res } => define i32 @smax_test1(i32 %a, i32 %b, i32 %c) { %0: %c1 = icmp sgt i32 %a, %b %smax1 = select i1 %c1, i32 %a, i32 %b %1 = icmp sgt i32 %smax1, %c %smax3.nary = select i1 %1, i32 %smax1, i32 %c %res = add i32 %smax1, %smax3.nary ret i32 %res } Transformation doesn't verify! ERROR: Target's return value is more undefined Example: i32 %a = #x7fffffff (2147483647) i32 %b = #x00000000 (0) i32 %c = undef Source: i1 %c1 = #x1 (1) i32 %smax1 = #x7fffffff (2147483647) i1 %c2 = any i32 %smax2 = any i1 %c3 = #x0 (0) i32 %smax3 = #x7fffffff (2147483647) i32 %res = #xfffffffe (4294967294, -2) Target: i1 %c1 = #x1 (1) i32 %smax1 = #x7fffffff (2147483647) i1 %1 = #x0 (0) i32 %smax3.nary = #x03002006 (50339846) i32 %res = #x83002005 (2197823493, -2097143803) Source value: #xfffffffe (4294967294, -2) Target value: #x83002005 (2197823493, -2097143803)
I think this is an issue of verification itself. In the first case max(0, undef)=>any and max(any, max_int)=>max_int. In the second case max(max_int, undef)=>x03002006. I believe the behavior of the verifier is inconsistent in these two cases and max(max_int, undef) should be evaluated to max_int as well. We can do the following trivial transformations to prove that: max(max_int, undef) is trivially equal to max(max_int, max(undef, undef)) and max(undef, undef) should be evaluated to 'any' since max(0, undef) is evaluated to 'any' in the first case. Thus we get max(max_int, any) which is evaluated to 'max_int' in the first case. So max(max_int, undef) should be evaluated to 'max_int' but not 'x03002006'.
Makes sense?
Note that given
%a = undef %b = %a
, %a and %b have undefined values, and there are no guarantees that they are equal/not equal.
Since you emitted icmp+select, you 'read' from undefined %c twice, and you are free to get different result each time.
I must be missing something but I don't see how that applies to the above case. I think the problem is not connected with evaluation of %c twice (to different values).
I think the problem is that in the second case "%1 = icmp sgt i32 %smax1, %c" was evaluated to 'false' even though 'smax1' is known to be max_int. But if we replace "undef" with "any" like in the first case it is evaluated to max_int....
I think I understand the problem now. 'any'>max_int is known to be false (first case), while max_int >'any' is unknown (second case)....so this is not verification issue....
Just to add to what Roman wrote, thinking of the code as max(,) is misleading. The code is doing icmp sgt INT_MAX, undef which can evaluate to true or false. But we cannot assume that undef from now on is equal to INT_MAX just because the comparison evaluated to true.
The are 2 possible fixes:
- only do the optimization if %c is known non-undef (use ValueTracking's utility), e.g., https://alive2.llvm.org/ce/z/sLeOE0
- freeze %c, e.g., https://alive2.llvm.org/ce/z/ZX557G
0. fix SCEV's https://github.com/llvm/llvm-project/blob/00fe10c6a65173c9c578babd19f8fee44d07a761/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L1788-L1789 to emit proper intrinsics?
Just to add to what Roman wrote, thinking of the code as max(,) is misleading. The code is doing icmp sgt INT_MAX, undef which can evaluate to true or false. But we cannot assume that undef from now on is equal to INT_MAX just because the comparison evaluated to true.
Sure "undef" can be evaluated to any value each time. That's why in the first test case "%c2 = icmp sgt i32 %b, %c" is evaluated to 'any'. I don't see where 'undef' is assumed to be always equal to INT_MAX. Let me some time to think on possible fix. If I can't find solution quickly I will revert. Thank you.
I think the solution is to use >= instead of > when we do min/max reassociation. In other words, originally we had 'any' > MAX_INT which is known to be false. If we want semantically equal but reassociated expression we should invert the comparison logic. In other words we should check MAX_INT >= 'any' which is known to be true and MAX_INT will be selected.
Current: https://alive2.llvm.org/ce/z/x8NBH8
Like i have already said, the fix is https://alive2.llvm.org/ce/z/RkBWxC.
Let me just fix this then.
Like i have already said, the fix is https://alive2.llvm.org/ce/z/RkBWxC.
Let me just fix this then.
This is because you transform 'sgt,select" to smax which are not semantically equal due to the same reason
https://alive2.llvm.org/ce/z/zk_RrZ
I think it's better to revert it for now. Looks like the topic is not that trivial and we should first agree on the fix. There is no need to hurry.
commit 13a5cac2ba919b4d02a296428b58919231e08569 (HEAD -> main, origin/main)
Author: Evgeniy Brevnov <ybrevnov@azul.com>
Date: Fri Feb 26 19:23:32 2021 +0700
Revert "[NARY-REASSOCIATE] Support reassociation of min/max" This reverts commit 83d134c3c4222e8b8d3d90c099f749a3b3abc8e0
I don't think we should reinvent SCEV Expander.
I was looking at fixing this properly, but got sidetracked by having to first fix the isHighCostExpansion().
I tried to adjust SCEV Expander but it's pretty challenging by itself. The reason is it should support not only integer types but floating point and pointer types as well. For example, for floating point types changing "cmp" and "select" pair to related intrinsic may potentially change the semantics for corner cases like NANs, INFs, exceptions, etc...
I'm not saying that enhancing SCEV Expander is definitely wrong way to go. It's pretty challenging and can potentially take long time. I think these two should not be mixed. Once SCEV Expander is reworked it will be a trivial change in reassociation pass to employ it.
The reason is it should support not only integer types but floating point and pointer types as well. For example, for floating point types changing "cmp" and "select" pair to related intrinsic may potentially change the semantics for corner cases like NANs, INFs, exceptions, etc...
Uhm, no? SCEV only supports integers and pointers.
I'm not saying that enhancing SCEV Expander is definitely wrong way to go. It's pretty challenging and can potentially take long time. I think these two should not be mixed. Once SCEV Expander is reworked it will be a trivial change in reassociation pass to employ it.
Uhm, no? SCEV only supports integers and pointers.
I looked at the failing list one more time and indeed it was a pointer to double case.... In total, there are 54 failing tests which needs to be investigated... Roman, Are you working in this direction?
SCEV Expander should be fixed separately anyway and I don't think there is strong dependence between these pieces.
I actually think this patch should not have been reverted in the first place.
SCEV is known to be not good with undef, i don't really see why we should start blocking on that now.
So i would personally recommend to directly revert your revert.
Roman, Are you working in this direction?
SCEV Expander should be fixed separately anyway and I don't think there is strong dependence between these pieces.
Right now i'm not looking at that, but i already had a patch to fix this,
but got sidetracked into isHighCostExpansion(), which then resulted in rage-closing IDE...
Right now i'm trying to deal with another SCEV issue.
I actually think this patch should not have been reverted in the first place.
SCEV is known to be not good with undef, i don't really see why we should start blocking on that now.
So i would personally recommend to directly revert your revert.
If I'm getting it right you are OK to commit the patch in its current state. If this is the case please unblock it.
Sure can do, but i don't want to unblock/accept *this* current code state, so please upload the result of git revert.
Hi Roman, I'm somewhat confused too. If we just revert the revert, we introduce undef value where was no undef (in fact, we create poison where it wasn't present). Do we have plans to fix SCEV expander's behavior with undefs in near term? If no, reverting the revert will simply introduce the bug. I'd rather stick to the current solution, saying that "ok, SCEVExpander is broken, let's not use it". We can add a FIXME to replace this with expander in the future, but I don't understand why would we want to use it despite it's buggy.
Fixed SCEVExpaned in b46c085d2b6d15873fb53718f0a70b3848e19e4a, please rebase/update this patch accordingly.
But, i think we may have a problem still.
Does this patch intend to reassociate only integers, or pointers too? :)
This commit may cause multiple CI failures on TensorFlow/NVPTX backend. See https://github.com/tensorflow/tensorflow/commit/b1758bd553dfc2ebbfd07eec01d1e3254eda25b8#commitcomment-48080097.
I am trying to find a minimal reproducible test case here.
According to bisect log the guilty commit is "first bad commit: [b46c085d2b6d15873fb53718f0a70b3848e19e4a] [NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions". Am I missing something?
Sorry, posted to the wrong venue. Downstream seems to get a workaround and I will test again and get back to you.
Raising this here as well, as it seems the previous concern I raised with the commit was ignored.
I am seeing this change go into an infinite loop attempting to example an expression, adding more and more .nary postfixes.
I have reduced the failure to the attached bugpoint.
llc -march=amdgcn -mcpu=gfx700 < bugpoint-reduced-simplified.ll
Thanks for letting me know. Hadn't seen this before. I can reproduce the issue and working on the fix.
I am seeing this change go into an infinite loop attempting to example an expression, adding more and more .nary postfixes.
I have reduced the failure to the attached bugpoint.llc -march=amdgcn -mcpu=gfx700 < bugpoint-reduced-simplified.ll
cc @Hipony
This patch causes an infinite loop with the following example and "opt -nary-reassociate":
define i32 @nary_infinite_loop_minmax(i32 %d0, i32 %d1, i32 %d2, i32 %d3) { %cmp0 = icmp slt i32 %d2, %d1 %sel0 = select i1 %cmp0, i32 %d1, i32 %d2 %cmp1 = icmp slt i32 %d3, %d0 %sel1 = select i1 %cmp1, i32 %d0, i32 %d3 %cmp2 = icmp slt i32 %sel1, %sel0 %sel2 = select i1 %cmp2, i32 %sel1, i32 %sel0 %cmp3 = icmp slt i32 %d3, %d0 %sel3 = select i1 %cmp3, i32 %d0, i32 %d3 %cmp4 = icmp slt i32 %sel3, %d2 %sel4 = select i1 %cmp4, i32 %d2, i32 %sel3 %cmp5 = icmp slt i32 %sel4, %d1 %sel5 = select i1 %cmp5, i32 %d1, i32 %sel4 ret i32 %sel5 }
There was another infinite loop example attached to the commit message for this patch - https://reviews.llvm.org/rG83d134c3c4222e8b8d3d90c099f749a3b3abc8e0
I'm not sure if that is the same root cause, but I don't think there was any reply to that failure.
There was another infinite loop example attached to the commit message for this patch - https://reviews.llvm.org/rG83d134c3c4222e8b8d3d90c099f749a3b3abc8e0
I'm not sure if that is the same root cause, but I don't think there was any reply to that failure.
This must be another problem. Previously reported issue has been fixed here https://reviews.llvm.org/rG36b932d6a385bb97efe17818a7a47d29d2d8acf3.
Working on the fix...
Early return may be cleaner?