InstCombine today rewrites (zext X == zext Y) as (X == Y), but the same mechanism is missing for switch; instead, InstCombine tries to shrink the switch condition to as narrow a type as possible. This difference in behavior reduces the effectiveness of global value numbering.
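For illustration, here is a minimal sketch of the two behaviors (hypothetical value names and widths, not taken from the patch):

  ; Existing fold on compares: the zexts are stripped.
  %xz = zext i8 %x to i32
  %yz = zext i8 %y to i32
  %c  = icmp eq i32 %xz, %yz      ; rewritten to: icmp eq i8 %x, %y

  ; Roughly what this patch does for a switch whose condition is a zext,
  ; provided every case value fits in the source type:
  %zx = zext i8 %x to i32
  switch i32 %zx, label %def [
    i32 1, label %a
  ]
  ; rewritten to:
  switch i8 %x, label %def [
    i8 1, label %a
  ]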
Details
- Reviewers: spatel, lebedev.ri
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
Here is a longer example showing the improvement from this change:
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

define i32 @f1({ i32, i32 }* %x, { i32, i32 }* %y) {
start:
  %0 = getelementptr inbounds { i32, i32 }, { i32, i32 }* %x, i64 0, i32 0
  %1 = load i32, i32* %0, align 4, !range !5
  %_1 = zext i32 %1 to i64
  %2 = getelementptr inbounds { i32, i32 }, { i32, i32 }* %y, i64 0, i32 0
  %3 = load i32, i32* %2, align 4, !range !5
  %_3 = zext i32 %3 to i64
  %cond1 = icmp eq i64 %_1, %_3
  br i1 %cond1, label %bb1, label %bb41

bb1:
  switch i64 %_1, label %bb41 [
    i64 0, label %bb2
    i64 1, label %bb5
    i64 2, label %bb7
    i64 3, label %bb9
  ]

bb2:
  ret i32 100

bb5:
  ret i32 101

bb7:
  ret i32 102

bb9:
  %4 = getelementptr inbounds { i32, i32 }, { i32, i32 }* %y, i64 0, i32 0
  %5 = load i32, i32* %4, align 4, !range !5
  %_5 = zext i32 %5 to i64
  %cond2 = icmp eq i64 %_5, 3
  br i1 %cond2, label %bb10, label %bb11

bb10:
  ret i32 1001

bb11:
  ret i32 1002

bb41:
  unreachable
}

!5 = !{i32 0, i32 4}
Without this patch, opt -S -O3 no.ll produces:
define i32 @f1({ i32, i32 }* nocapture readonly %x, { i32, i32 }* nocapture readonly %y) local_unnamed_addr #0 {
start:
  %0 = getelementptr inbounds { i32, i32 }, { i32, i32 }* %x, i64 0, i32 0
  %1 = load i32, i32* %0, align 4, !range !0
  %_1 = zext i32 %1 to i64
  %2 = getelementptr inbounds { i32, i32 }, { i32, i32 }* %y, i64 0, i32 0
  %3 = load i32, i32* %2, align 4, !range !0
  %cond1 = icmp eq i32 %1, %3
  tail call void @llvm.assume(i1 %cond1)
  switch i64 %_1, label %bb41 [
    i64 0, label %bb2
    i64 1, label %bb5
    i64 2, label %bb7
    i64 3, label %bb9
  ]

bb2:                                              ; preds = %bb9, %bb7, %bb5, %start
  %merge = phi i32 [ 100, %start ], [ 101, %bb5 ], [ 102, %bb7 ], [ %., %bb9 ]
  ret i32 %merge

bb5:                                              ; preds = %start
  br label %bb2

bb7:                                              ; preds = %start
  br label %bb2

bb9:                                              ; preds = %start
  %cond2 = icmp eq i32 %1, 3
  %. = select i1 %cond2, i32 1001, i32 1002
  br label %bb2

bb41:                                             ; preds = %start
  unreachable
}

!0 = !{i32 0, i32 4}
Note that the icmp eq i32 %1, 3 in bb9 is actually unnecessary, because in the input, icmp eq i64 %_1, %_3 from the start block must be true.
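A sketch of why that compare still survives in the unpatched output, assuming GVN's edge-based equality propagation keys on the exact switch operand:

  switch i64 %_1, label %bb41 [ i64 3, label %bb9 ]  ; the bb9 edge only teaches GVN that %_1 == 3 (i64)
bb9:
  %cond2 = icmp eq i32 %1, 3                         ; compares the narrower i32 value, so it is not folded

Once the switch itself is expressed on i32 %1, the bb9 edge implies %1 == 3 directly and the compare folds to true.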
With this fix, the output becomes:
define i32 @f1({ i32, i32 }* nocapture readonly %x, { i32, i32 }* nocapture readonly %y) local_unnamed_addr #0 {
start:
  %0 = getelementptr inbounds { i32, i32 }, { i32, i32 }* %x, i64 0, i32 0
  %1 = load i32, i32* %0, align 4, !range !0
  %2 = getelementptr inbounds { i32, i32 }, { i32, i32 }* %y, i64 0, i32 0
  %3 = load i32, i32* %2, align 4, !range !0
  %cond1 = icmp eq i32 %1, %3
  tail call void @llvm.assume(i1 %cond1)
  switch i32 %1, label %bb41 [
    i32 0, label %bb2
    i32 1, label %bb5
    i32 2, label %bb7
    i32 3, label %bb9
  ]

bb2:                                              ; preds = %bb9, %bb7, %bb5, %start
  %merge = phi i32 [ 100, %start ], [ 101, %bb5 ], [ 102, %bb7 ], [ 1001, %bb9 ]
  ret i32 %merge

bb5:                                              ; preds = %start
  br label %bb2

bb7:                                              ; preds = %start
  br label %bb2

bb9:                                              ; preds = %start
  br label %bb2

bb41:                                             ; preds = %start
  unreachable
}

!0 = !{i32 0, i32 4}
The difference is also reflected in the generated assembly.
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
2952–2953: Why do we need this?
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
2952–2953: This handles cases like this:

  %zx = zext i32 %x to i64
  switch i64 %zx, label %bb2 [
    i64 0x7fff_ffff_ffff_ffff, label %bb1 ; exceeds i32 range
  ]

The value 0x7fff_ffff_ffff_ffff has one leading zero bit, so LeadingKnownZeros will be 1 and NewWidth will be 64 - 1 = 63, which exceeds the width of i32.
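To make the hazard concrete, here is a hypothetical worked contrast showing what would go wrong if this check were skipped and the switch were shrunk to i32 anyway (the constant is chosen only for illustration):

  ; Truncating the case constant changes its value:
  ;   trunc i64 0x7fff_ffff_ffff_ffff to i32  ->  0xffff_ffff
  ; so a switch rewritten on i32 %x would start matching %x == 0xffff_ffff,
  ; even though the original i64 case can never equal a zero-extended i32 value.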
Aha, I see. I think the existing fold should be fixed instead,
because if it doesn't fire (because shouldChangeType() said so?),
how do we know we can ignore that hook here?
@lebedev.ri, which "hook" did you mean?
I think the existing fold is too aggressive. Please take a look at the example I gave in an earlier comment -- doing a less strict fold first gives GVN a chance to optimize the IR.
When the new fold is triggered, the existing more aggressive fold would be triggered (if NewWidth happens to be a standard integer type, e.g. i8, so that shouldChangeType() returns true) the next time InstCombine gets executed.
The shouldChangeType() hook.
> I think the existing fold is too aggressive.
> Please take a look at the example I gave in an earlier comment -- doing a less strict fold first gives GVN a chance to optimize the IR.
Could you please post a godbolt example showing *how* GVN would optimize it? https://godbolt.org/z/cq5M5f
But that sounds like a GVN bug to me.
> When the new fold is triggered, the existing more aggressive fold would be triggered (if NewWidth happens to be a standard integer type, e.g. i8, so that shouldChangeType() returns true) the next time InstCombine gets executed.
Ah yes, sorry, so the example would already be optimized, just to a different width.
https://godbolt.org/z/37jMeM
@lebedev.ri, here is what GVN does with the current output, and here is what it does with the new behavior. Note that the first one still contains
  %cond2 = icmp eq i32 %1, 3
  br i1 %cond2, label %bb10, label %bb11
while in the second one it's just
br i1 true, label %bb10, label %bb11
> that sounds like a GVN bug to me
Probably, but I am not sure whether GVN or InstCombine (or both?) is supposed to know that if zext X == 3 then X == 3.
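For reference, InstCombine does already narrow a standalone compare of a zext against a small constant (minimal sketch, hypothetical names):

  %zx = zext i32 %x to i64
  %c  = icmp eq i64 %zx, 3   ; rewritten to: icmp eq i32 %x, 3

So the open question seems to be on the GVN side: whether the equality learned on a switch edge should also be propagated through the zext.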
> Ah yes, sorry, so the example would already be optimized, just to a different width.
@lebedev.ri, I made another patch by changing only GVN -- please take a look at D93888. If it is approved, I will close this PR ("diff"?).