Download Raw Diff

Details

Reviewers

nlopes
spatel
nikic
mkazantsev

Commits

rZORG89d1bdfe2187: [ValueTracking] Improve isKnowNonZero for Ints
rZORGaf50709cf9d3: [ValueTracking] Improve isKnowNonZero for Ints
rG89d1bdfe2187: [ValueTracking] Improve isKnowNonZero for Ints
rGaf50709cf9d3: [ValueTracking] Improve isKnowNonZero for Ints
rG3b137a495686: [ValueTracking] Improve isKnowNonZero for Ints
rL360222: [ValueTracking] Improve isKnowNonZero for Ints

Summary

Improve isKnownNonZero for integers in order to improve cttz
optimizations.

Diff Detail

Repository: rL LLVM

Event Timeline

dlrobertson created this revision.Apr 17 2019, 6:02 PM

Herald added a project: Restricted Project. · View Herald TranscriptApr 17 2019, 6:02 PM

Herald added subscribers: llvm-commits, hiraditya. · View Herald Transcript

Would an NFS baseline tests patch be useful here?

s/NFS/NFC

Here are all the uses of isKnownNonZero with a (potentially) non-pointer operand I have found, so these might be affected by / may benefit from this change:

4 uses in simplifyICmpWithZero: https://github.com/llvm-mirror/llvm/blob/a2c8107c80101593ce06445cd3dcb3628dc95819/lib/Analysis/InstructionSimplify.cpp#L2414

isKnownPositive: https://github.com/llvm-mirror/llvm/blob/a2c8107c80101593ce06445cd3dcb3628dc95819/lib/Analysis/ValueTracking.cpp#L264 which has only 2 uses in https://github.com/llvm-mirror/llvm/blob/a2c8107c80101593ce06445cd3dcb3628dc95819/lib/Transforms/InstCombine/InstCombineCompares.cpp#L1318.

2 uses in computeKnownBitsMul: https://github.com/llvm-mirror/llvm/blob/a2c8107c80101593ce06445cd3dcb3628dc95819/lib/Analysis/ValueTracking.cpp#L348

2 uses in computeKnownBitsFromShiftOperator: https://github.com/llvm-mirror/llvm/blob/a2c8107c80101593ce06445cd3dcb3628dc95819/lib/Analysis/ValueTracking.cpp#L936

foldCttzCtlz: https://github.com/llvm-mirror/llvm/blob/a2c8107c80101593ce06445cd3dcb3628dc95819/lib/Transforms/InstCombine/InstCombineCalls.cpp#L1333

phi combining: https://github.com/llvm-mirror/llvm/blob/a2c8107c80101593ce06445cd3dcb3628dc95819/lib/Transforms/InstCombine/InstCombinePHI.cpp#L1183

nikic added inline comments.Apr 18 2019, 12:47 AM

llvm/test/Transforms/InstCombine/cttz.ll
12 ↗	(On Diff #195657)	Hm, it looks like the optimization we want doesn't actually happen here? The `i1 false` should have changed to an `i1 true`.

dlrobertson marked 2 inline comments as done.Apr 18 2019, 6:38 AM

dlrobertson added inline comments.

llvm/test/Transforms/InstCombine/cttz.ll
12 ↗	(On Diff #195657)	Good catch. Just a cast of not updating the patch I uploaded.

Update after added tests

dlrobertson updated this revision to Diff 195960.Apr 19 2019, 7:34 PM

nikic added a reviewer: spatel.Apr 20 2019, 12:27 AM

Looks good to me technically, I think it's only reasonable that integers and pointers are treated the same way here. Would like to have a second opinion though, maybe from @spatel?

It should be noted that the instsimplify cases here would also be simplified by either instcombine (presumably the dominating condition code there) or CVP. Is that possibly why this was not done before?

Can you push the baseline tests to trunk/head, so this diff is against an updated public baseline?
There's also a TODO comment/test about this enhancement in test/Transforms/LICM/hoist-mustexec.ll -- that should be updated as part of this patch.

About the relation to instcombine/CVP: there's an independent question (and unresolved in terms of implementation in LLVM) about which pass(es) have responsibility for these kinds of patterns. This patch makes valuetracking better with no obvious increase in complexity, so I think it's the right step. And if I'm seeing correctly, neither of those passes would get the last test with the extra blocks and phi without this patch.

spatel mentioned this in D60932: [NFC] Add baseline tests for int isKnownNonZero.Apr 22 2019, 9:01 AM

dlrobertson updated this revision to Diff 196996.Apr 27 2019, 6:16 PM

LGTM

This revision is now accepted and ready to land.Apr 28 2019, 6:55 AM

test/Transforms/LICM/hoist-mustexec.ll still needs to be updated. (Should probably generate full checks as a preliminary NFC commit first.)

In D60846#1481774, @nikic wrote:

test/Transforms/LICM/hoist-mustexec.ll still needs to be updated. (Should probably generate full checks as a preliminary NFC commit first.)

Oops - I forgot that even though I mentioned it in the earlier comment. Yes, that must be included in this patch (and any other existing tests that are known to be affected).

This revision now requires changes to proceed.Apr 28 2019, 9:15 AM

In D60846#1481782, @spatel wrote:

In D60846#1481774, @nikic wrote:

test/Transforms/LICM/hoist-mustexec.ll still needs to be updated. (Should probably generate full checks as a preliminary NFC commit first.)

Oops - I forgot that even though I mentioned it in the earlier comment. Yes, that must be included in this patch (and any other existing tests that are known to be affected).

Sorry, I forgot all about this as well. I did run the update_test_checks script on this file after applying the patch and there was not improvement to the generated code.

In D60846#1481903, @dlrobertson wrote:

In D60846#1481782, @spatel wrote:

In D60846#1481774, @nikic wrote:

test/Transforms/LICM/hoist-mustexec.ll still needs to be updated. (Should probably generate full checks as a preliminary NFC commit first.)

Oops - I forgot that even though I mentioned it in the earlier comment. Yes, that must be included in this patch (and any other existing tests that are known to be affected).

Sorry, I forgot all about this as well. I did run the update_test_checks script on this file after applying the patch and there was not improvement to the generated code.

Are you sure? I didn't check what exactly changed, but this test was failing for me locally after applying the patch.

In D60846#1482001, @nikic wrote:

In D60846#1481903, @dlrobertson wrote:

In D60846#1481782, @spatel wrote:

In D60846#1481774, @nikic wrote:

test/Transforms/LICM/hoist-mustexec.ll still needs to be updated. (Should probably generate full checks as a preliminary NFC commit first.)

Oops - I forgot that even though I mentioned it in the earlier comment. Yes, that must be included in this patch (and any other existing tests that are known to be affected).

Sorry, I forgot all about this as well. I did run the update_test_checks script on this file after applying the patch and there was not improvement to the generated code.

Are you sure? I didn't check what exactly changed, but this test was failing for me locally after applying the patch.

I'm probably doing something wrong. I'll run some tests and tinker with this later today.

Update LICM tests

Herald added a subscriber: asbirlea. · View Herald TranscriptMay 2 2019, 8:19 PM

I can't tell what's actually happening in the LICM tests, so:
rL359881

Please rebase and update the patch here. Don't forget to update that TODO comment in the test file too; we should be set after that.

Updated the LICM test.

LGTM

This revision was not accepted when it landed; it landed in state Needs Review.May 7 2019, 7:23 PM

Closed by commit rL360222: [ValueTracking] Improve isKnowNonZero for Ints (authored by dlrobertson). · Explain Why

This revision was automatically updated to reflect the committed changes.

Hi folks,

This introduces a regression with RADV (the open source Vulkan driver in mesa) and some CTS tests, here's the list of failures:

dEQP-VK.glsl.functions.control_flow.mixed_return_break_continue_fragment
dEQP-VK.glsl.functions.control_flow.mixed_return_break_continue_vertex

Here's the LLVM IR generated by Mesa (before any optimizations passes):

; ModuleID = 'mesa-shader'
source_filename = "mesa-shader"
target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-ni:7"
target triple = "amdgcn-mesa-mesa3d"

define amdgpu_ps void @main([0 x i8] addrspace(6)* inreg noalias dereferenceable(18446744073709551615), i32 inreg, <2 x i32>, <2 x i32>, <2 x i32>, <3 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, float, float, float, float, float, i32, i32, i32, i32) #0 {
main_body:
  %temp7 = alloca float, addrspace(5)
  %temp6 = alloca float, addrspace(5)
  %temp5 = alloca float, addrspace(5)
  %temp4 = alloca float, addrspace(5)
  %temp3 = alloca float, addrspace(5)
  %temp2 = alloca float, addrspace(5)
  %temp1 = alloca float, addrspace(5)
  %temp = alloca float, addrspace(5)
  %18 = alloca float, addrspace(5)
  %19 = alloca float, addrspace(5)
  %20 = alloca float, addrspace(5)
  %21 = alloca float, addrspace(5)
  %22 = call i8 addrspace(4)* @llvm.amdgcn.implicit.buffer.ptr() #2
  %23 = bitcast i8 addrspace(4)* %22 to [0 x <4 x i32>] addrspace(4)*
  %24 = call float @llvm.amdgcn.interp.mov(i32 2, i32 0, i32 0, i32 %1) #2
  %25 = bitcast float %24 to i32
  %26 = call float @llvm.amdgcn.interp.mov(i32 2, i32 1, i32 0, i32 %1) #2
  %27 = bitcast float %26 to i32
  %28 = call float @llvm.amdgcn.interp.mov(i32 2, i32 2, i32 0, i32 %1) #2
  %29 = bitcast float %28 to i32
  %30 = call float @llvm.amdgcn.interp.mov(i32 2, i32 3, i32 0, i32 %1) #2
  %31 = bitcast float %30 to i32
  br label %loop1

loop1:                                            ; preds = %endif5, %main_body
  %32 = phi i32 [ %25, %main_body ], [ %49, %endif5 ]
  %33 = phi i32 [ 0, %main_body ], [ %50, %endif5 ]
  %34 = icmp sge i32 %33, 6
  %35 = select i1 %34, i32 -1, i32 0
  %36 = icmp ne i32 %35, 0
  br i1 %36, label %if2, label %else3

if2:                                              ; preds = %loop1
  br label %endloop1

else3:                                            ; preds = %loop1
  br label %endif2

endif2:                                           ; preds = %else3
  %37 = icmp ne i32 %33, 0
  %38 = select i1 %37, i32 -1, i32 0
  %39 = icmp ne i32 %38, 0
  br i1 %39, label %if5, label %else12

if5:                                              ; preds = %endif2
  %40 = icmp ne i32 %33, 1
  %41 = select i1 %40, i32 -1, i32 0
  %42 = icmp ne i32 %41, 0
  br i1 %42, label %if6, label %else10

if6:                                              ; preds = %if5
  %43 = icmp eq i32 %33, 3
  %44 = select i1 %43, i32 -1, i32 0
  %45 = icmp ne i32 %44, 0
  br i1 %45, label %if7, label %else8

if7:                                              ; preds = %if6
  br label %endloop1

else8:                                            ; preds = %if6
  br label %endloop1

endif7:                                           ; No predecessors!
  br label %endif6

else10:                                           ; preds = %if5
  br label %endif6

endif6:                                           ; preds = %else10, %endif7
  %46 = bitcast i32 %32 to float
  %47 = fsub float -0.000000e+00, %46
  %48 = bitcast float %47 to i32
  br label %endif5

else12:                                           ; preds = %endif2
  br label %endif5

endif5:                                           ; preds = %else12, %endif6
  %49 = phi i32 [ %32, %else12 ], [ %48, %endif6 ]
  %50 = add i32 %33, 1
  br label %loop1

endloop1:                                         ; preds = %else8, %if7, %if2
  %51 = phi i32 [ undef, %if2 ], [ undef, %if7 ], [ %32, %else8 ]
  %52 = phi i32 [ 0, %if2 ], [ 0, %if7 ], [ -1, %else8 ]
  %53 = icmp ne i32 %52, 0
  %54 = select i1 %53, i32 %51, i32 1065353216
  %55 = getelementptr [0 x i8], [0 x i8] addrspace(6)* %0, i32 0, i32 0
  %56 = bitcast i8 addrspace(6)* %55 to <4 x i32> addrspace(6)*, !amdgpu.uniform !0
  %57 = load <4 x i32>, <4 x i32> addrspace(6)* %56, !invariant.load !0
  %58 = call float @llvm.amdgcn.s.buffer.load.f32(<4 x i32> %57, i32 0, i32 0) #2
  %59 = bitcast float %58 to i32
  %60 = bitcast i32 %59 to float
  %61 = fsub float -0.000000e+00, %60
  %62 = bitcast float %61 to i32
  %63 = bitcast i32 %54 to float
  %64 = bitcast i32 %62 to float
  %65 = fadd float %63, %64
  %66 = bitcast float %65 to i32
  %67 = bitcast i32 %66 to float
  %68 = call float @llvm.fabs.f32(float %67) #2
  %69 = bitcast float %68 to i32
  %70 = bitcast i32 %59 to float
  %71 = call float @llvm.fabs.f32(float %70) #2
  %72 = bitcast float %71 to i32
  %73 = bitcast i32 %72 to float
  %74 = fmul float 0x3FA99999A0000000, %73
  %75 = bitcast float %74 to i32
  %76 = bitcast i32 %75 to float
  %77 = fadd float %76, 0x3FA99999A0000000
  %78 = bitcast float %77 to i32
  %79 = bitcast i32 %78 to float
  %80 = bitcast i32 %69 to float
  %81 = fcmp oge float %79, %80
  %82 = select i1 %81, i32 -1, i32 0
  %83 = and i32 %82, 1065353216
  %84 = bitcast i32 %83 to float
  %85 = bitcast float %84 to i32
  %86 = insertelement <4 x i32> undef, i32 %85, i32 0
  %87 = insertelement <4 x i32> %86, i32 %85, i32 1
  %88 = insertelement <4 x i32> %87, i32 %85, i32 2
  %89 = insertelement <4 x i32> %88, i32 1065353216, i32 3
  %90 = bitcast <4 x i32> %89 to <4 x float>
  %91 = extractelement <4 x float> %90, i32 0
  store float %91, float addrspace(5)* %21
  %92 = extractelement <4 x float> %90, i32 1
  store float %92, float addrspace(5)* %20
  %93 = extractelement <4 x float> %90, i32 2
  store float %93, float addrspace(5)* %19
  %94 = extractelement <4 x float> %90, i32 3
  store float %94, float addrspace(5)* %18
  %95 = load float, float addrspace(5)* %21
  %96 = load float, float addrspace(5)* %20
  %97 = load float, float addrspace(5)* %19
  %98 = load float, float addrspace(5)* %18
  %99 = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %95, float %96) #2
  %100 = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %97, float %98) #2
  %101 = bitcast <2 x half> %99 to <2 x i16>
  %102 = bitcast <2 x half> %100 to <2 x i16>
  call void @llvm.amdgcn.exp.compr.v2i16(i32 0, i32 5, <2 x i16> %101, <2 x i16> %102, i1 true, i1 true) #3
  ret void
}

; Function Attrs: nounwind readnone speculatable
declare i8 addrspace(4)* @llvm.amdgcn.implicit.buffer.ptr() #1

; Function Attrs: nounwind readnone speculatable
declare float @llvm.amdgcn.interp.mov(i32, i32, i32, i32) #1

; Function Attrs: nounwind readnone
declare float @llvm.amdgcn.s.buffer.load.f32(<4 x i32>, i32, i32 immarg) #2

; Function Attrs: nounwind readnone speculatable
declare float @llvm.fabs.f32(float) #1

; Function Attrs: nounwind readnone speculatable
declare <2 x half> @llvm.amdgcn.cvt.pkrtz(float, float) #1

; Function Attrs: nounwind
declare void @llvm.amdgcn.exp.compr.v2i16(i32 immarg, i32 immarg, <2 x i16>, <2 x i16>, i1 immarg, i1 immarg) #3

attributes #0 = { "amdgpu-32bit-address-high-bits"="0xffff8000" }
attributes #1 = { nounwind readnone speculatable }
attributes #2 = { nounwind readnone }
attributes #3 = { nounwind }

!0 = !{}

Can you have a look?
Thanks,
Samuel.

In D60846#1494561, @hakzsam wrote:

Hi folks,

This introduces a regression with RADV (the open source Vulkan driver in mesa) and some CTS tests, here's the list of failures:

Is the regression a compiler-crash, program crash, miscompile, or performance loss?
Probably related to addrspace?

Hi,

I'm experiencing problems with this patch as well. I'm not 100% sure what the
problem is yet but I have a suspicion:

The comment for isKnownNonZero says:

/ Return true if the given value is known to be non-zero when defined.
[...]
/ For pointers, if the context instruction and dominator tree are
/ specified, perform context-sensitive analysis and return true if the
/ pointer couldn't possibly be null at the specified instruction.

The context-sensitive analysis is done by isKnownNonZeroFromDominatingCondition()
but since we now allow that to return true for non-pointers as well, we suddenly
can return true from isKnownNonZero() based on a context-sensitive analysis. Or?

And perhaps the users of isKnownNonZero() then assumes that
"Return true if the given value is known to be non-zero when defined."
still holds for non-pointers?

The problem I'm seeing is a miscompile for my out-of-tree target so it's quite nasty.
I'll try to reduce and come up with an x86 example.

This is as far as I've come:

With

opt -S -o - red.ll -simplifycfg -loop-unroll

I get

; ModuleID = 'red.ll'
source_filename = "red.ll"

declare void @foo(i16)

define i16 @main() {
  br label %bb1

bb1:                                              ; preds = %0
  call void @foo(i16 0)
  call void @foo(i16 1000)
  ret i16 0
}

which I think is correct, and with

opt -S -o - red.ll -instcombine -simplifycfg -loop-unroll

I get

; ModuleID = 'red.ll'
source_filename = "red.ll"

declare void @foo(i16)

define i16 @main() {
  br label %bb1

bb1:                                              ; preds = %0
  call void @foo(i16 0)
  call void @foo(i16 1)
  ret i16 0
}

which I think is incorrect.

And in both cases red.ll is

declare void @foo(i16)

define i16 @main() {
  br label %bb1

bb1:                                              ; preds = %bb9, %0
  %i.3.0 = phi i16 [ 0, %0 ], [ %_tmp19, %bb9 ]
  %_tmp8 = icmp uge i16 %i.3.0, 1
  br i1 %_tmp8, label %bb3, label %bb5

bb3:                                              ; preds = %bb1
  %_tmp11 = add i16 1, 1
  %_tmp12 = icmp ult i16 %i.3.0, %_tmp11
  br i1 %_tmp12, label %bb4, label %bb5

bb4:                                              ; preds = %bb3
  br label %bb6

bb5:                                              ; preds = %bb3, %bb1
  br label %bb6

bb6:                                              ; preds = %bb5, %bb4
  %tmp0.5.0 = phi i16 [ 1, %bb4 ], [ 0, %bb5 ]
  %_tmp14 = icmp ne i16 %tmp0.5.0, 0
  br i1 %_tmp14, label %bb7, label %bb8

bb7:                                              ; preds = %bb6
  br label %bb9

bb8:                                              ; preds = %bb6
  br label %bb9

bb9:                                              ; preds = %bb8, %bb7
  %tmp1.6.0 = phi i16 [ 1000, %bb7 ], [ %i.3.0, %bb8 ]
  call void @foo(i16 %tmp1.6.0)
  %_tmp19 = add i16 %i.3.0, 1
  %_tmp21 = icmp ult i16 %_tmp19, 2
  br i1 %_tmp21, label %bb1, label %bb10

bb10:                                             ; preds = %bb9
  ret i16 0
}

If I revert this commit I get

call void @foo(i16 0)
call void @foo(i16 1000)

in both cases.

In D60846#1494878, @spatel wrote:

In D60846#1494561, @hakzsam wrote:

Hi folks,

This introduces a regression with RADV (the open source Vulkan driver in mesa) and some CTS tests, here's the list of failures:

Is the regression a compiler-crash, program crash, miscompile, or performance loss?
Probably related to addrspace?

It's a test failure, it doesn't crash, the mentioned tests just fail to pass with that change.

I've reverted this change for now (rL360260).

nikic reopened this revision.May 8 2019, 7:51 AM

@uabelho Thanks for the reproducer. I think the problems is that when we thread a comparison over a phi in instsimplify, we retain the original context instruction, while (I think) we should be using the terminator of the incoming value as context.

@uabelho great job coming up with a reproducer and thanks @nikic for reverting! I'll start working on a better solution.

I ran ninja check, but I did not hit any failures. Are there more architectures and/or tools I should enable and check before a commit?

In D60846#1495721, @dlrobertson wrote:

@uabelho great job coming up with a reproducer and thanks @nikic for reverting! I'll start working on a better solution.

I ran ninja check, but I did not hit any failures. Are there more architectures and/or tools I should enable and check before a commit?

I found this when running our own test suite for our out-of-tree target. Actually it was a testcase testing memset, but for some reason
it managed to trigger this problem.

Here's a slightly cleaned up version of the reproducer:

define void @test(i1* %p) {
entry:
  br label %loop

loop:                                             ; preds = %common, %entry
  %i = phi i16 [ 0, %entry ], [ %i.inc, %common ]
  %is_zero = icmp eq i16 %i, 0
  br i1 %is_zero, label %common, label %non_zero

non_zero:                                         ; preds = %loop
  %is_one = icmp eq i16 %i, 1
  store i1 %is_one, i1* %p
  br label %common

common:                                           ; preds = %non_zero, %loop
  %i.inc = add i16 %i, 1
  %loop_cond = icmp ult i16 %i.inc, 2
  br i1 %loop_cond, label %loop, label %exit

exit:                                             ; preds = %common
  ret void
}

The %is_one is incorrectly optimized to false. I've been trying to come up with some variation that makes it fail without this patch, but no luck so far (something assume() based should be possible in theory, but it's very hard to actually construct something.)

In D60846#1497243, @nikic wrote:

Here's a slightly cleaned up version of the reproducer

Thanks!

jdoerfert mentioned this in D69571: [ValueTracking] Allow context-sensitive nullness check for non-pointers.Oct 29 2019, 4:10 PM

I accidentally did the same thing you did in D69571 while looking into https://bugs.llvm.org/show_bug.cgi?id=43833. Anyway, I took your reproducer, fixed the issue our patches exposed (I think), and put it up there :)

jdoerfert mentioned this in rGe15ab8f277c7: [InstCombine] Add the test that triggered the D60846 revert.Oct 31 2019, 11:50 AM

jdoerfert mentioned this in rG57dd4b03e480: [ValueTracking] Allow context-sensitive nullness check for non-pointers.Oct 31 2019, 12:43 PM

hans mentioned this in rG6ea477590085: Revert 57dd4b0 "[ValueTracking] Allow context-sensitive nullness check for non….Nov 13 2019, 3:29 AM

jdoerfert mentioned this in D71181: [ValueTracking] Allow context-sensitive nullness check for non-pointers.Dec 8 2019, 8:14 PM

jdoerfert mentioned this in rGa7d992c0f2d2: [ValueTracking] Allow context-sensitive nullness check for non-pointers.Dec 9 2019, 1:21 PM

nikic resigned from this revision.Jan 23 2020, 12:59 PM

mkazantsev resigned from this revision.Jan 30 2020, 4:19 AM

jdoerfert mentioned this in D88276: [IsKnownNonZero] Handle the case with non-constant phi nodes.Sep 27 2020, 10:42 PM

Diff 198575

llvm/trunk/lib/Analysis/ValueTracking.cpp

Show First 20 Lines • Show All 1,887 Lines • ▼ Show 20 Lines	for (gep_type_iterator GTI = gep_type_begin(GEP), GTE = gep_type_end(GEP);

if (isKnownNonZero(GTI.getOperand(), Depth, Q))		if (isKnownNonZero(GTI.getOperand(), Depth, Q))
return true;		return true;
}		}

return false;		return false;
}		}

static bool isKnownNonNullFromDominatingCondition(const Value *V,		static bool isKnownNonZeroFromDominatingCondition(const Value *V,
const Instruction *CtxI,		const Instruction *CtxI,
const DominatorTree *DT) {		const DominatorTree *DT) {
assert(V->getType()->isPointerTy() && "V must be pointer type");
assert(!isa<ConstantData>(V) && "Did not expect ConstantPointerNull");		assert(!isa<ConstantData>(V) && "Did not expect ConstantPointerNull");

if (!CtxI \|\| !DT)		if (!CtxI \|\| !DT)
return false;		return false;

unsigned NumUsesExplored = 0;		unsigned NumUsesExplored = 0;
for (auto *U : V->users()) {		for (auto *U : V->users()) {
// Avoid massive lists		// Avoid massive lists
if (NumUsesExplored >= DomConditionsMaxUses)		if (NumUsesExplored >= DomConditionsMaxUses)
break;		break;
NumUsesExplored++;		NumUsesExplored++;

// If the value is used as an argument to a call or invoke, then argument		// If the value is a pointer and used as an argument to a call or invoke,
// attributes may provide an answer about null-ness.		// then argument attributes may provide an answer about null-ness.
		if (V->getType()->isPointerTy())
if (auto CS = ImmutableCallSite(U))		if (auto CS = ImmutableCallSite(U))
if (auto *CalledFunc = CS.getCalledFunction())		if (auto *CalledFunc = CS.getCalledFunction())
for (const Argument &Arg : CalledFunc->args())		for (const Argument &Arg : CalledFunc->args())
if (CS.getArgOperand(Arg.getArgNo()) == V &&		if (CS.getArgOperand(Arg.getArgNo()) == V &&
Arg.hasNonNullAttr() && DT->dominates(CS.getInstruction(), CtxI))		Arg.hasNonNullAttr() && DT->dominates(CS.getInstruction(), CtxI))
return true;		return true;

// Consider only compare instructions uniquely controlling a branch		// Consider only compare instructions uniquely controlling a branch
CmpInst::Predicate Pred;		CmpInst::Predicate Pred;
if (!match(const_cast<User *>(U),		if (!match(const_cast<User *>(U),
m_c_ICmp(Pred, m_Specific(V), m_Zero())) \|\|		m_c_ICmp(Pred, m_Specific(V), m_Zero())) \|\|
(Pred != ICmpInst::ICMP_EQ && Pred != ICmpInst::ICMP_NE))		(Pred != ICmpInst::ICMP_EQ && Pred != ICmpInst::ICMP_NE))
continue;		continue;

▲ Show 20 Lines • Show All 131 Lines • ▼ Show 20 Lines	if (const auto *Call = dyn_cast<CallBase>(V)) {
if (Call->isReturnNonNull())		if (Call->isReturnNonNull())
return true;		return true;
if (const auto *RP = getArgumentAliasingToReturnedPointer(Call))		if (const auto *RP = getArgumentAliasingToReturnedPointer(Call))
return isKnownNonZero(RP, Depth, Q);		return isKnownNonZero(RP, Depth, Q);
}		}
}		}


// Check for recursive pointer simplifications.		if (isKnownNonZeroFromDominatingCondition(V, Q.CxtI, Q.DT))
if (V->getType()->isPointerTy()) {
if (isKnownNonNullFromDominatingCondition(V, Q.CxtI, Q.DT))
return true;		return true;

		// Check for recursive pointer simplifications.
		if (V->getType()->isPointerTy()) {
// Look through bitcast operations, GEPs, and int2ptr instructions as they		// Look through bitcast operations, GEPs, and int2ptr instructions as they
// do not alter the value, or at least not the nullness property of the		// do not alter the value, or at least not the nullness property of the
// value, e.g., int2ptr is allowed to zero/sign extend the value.		// value, e.g., int2ptr is allowed to zero/sign extend the value.
//		//
// Note that we have to take special care to avoid looking through		// Note that we have to take special care to avoid looking through
// truncating casts, e.g., int2ptr/ptr2int with appropriate sizes, as well		// truncating casts, e.g., int2ptr/ptr2int with appropriate sizes, as well
// as casts that can alter the value, e.g., AddrSpaceCasts.		// as casts that can alter the value, e.g., AddrSpaceCasts.
if (const GEPOperator *GEP = dyn_cast<GEPOperator>(V))		if (const GEPOperator *GEP = dyn_cast<GEPOperator>(V))
▲ Show 20 Lines • Show All 3,663 Lines • Show Last 20 Lines

llvm/trunk/test/Transforms/InstCombine/known-non-zero.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S \| FileCheck %s			; RUN: opt < %s -instcombine -S \| FileCheck %s

	declare i64 @llvm.cttz.i64(i64, i1)			declare i64 @llvm.cttz.i64(i64, i1)

	declare i64 @llvm.ctlz.i64(i64, i1)			declare i64 @llvm.ctlz.i64(i64, i1)

	declare <8 x i64> @llvm.cttz.v8i64(<8 x i64>, i1)			declare <8 x i64> @llvm.cttz.v8i64(<8 x i64>, i1)

	define i32 @test0(i64 %x) {			define i32 @test0(i64 %x) {
	; CHECK-LABEL: @test0(			; CHECK-LABEL: @test0(
	; CHECK-NEXT: start:			; CHECK-NEXT: start:
	; CHECK-NEXT: [[C:%.]] = icmp eq i64 [[X:%.]], 0			; CHECK-NEXT: [[C:%.]] = icmp eq i64 [[X:%.]], 0
	; CHECK-NEXT: br i1 [[C]], label [[EXIT:%.]], label [[NON_ZERO:%.]]			; CHECK-NEXT: br i1 [[C]], label [[EXIT:%.]], label [[NON_ZERO:%.]]
	; CHECK: non_zero:			; CHECK: non_zero:
	; CHECK-NEXT: [[CTZ:%.*]] = call i64 @llvm.cttz.i64(i64 [[X]], i1 false), !range !0			; CHECK-NEXT: [[CTZ:%.*]] = call i64 @llvm.cttz.i64(i64 [[X]], i1 true), !range !0
	; CHECK-NEXT: [[CTZ32:%.*]] = trunc i64 [[CTZ]] to i32			; CHECK-NEXT: [[CTZ32:%.*]] = trunc i64 [[CTZ]] to i32
	; CHECK-NEXT: br label [[EXIT]]			; CHECK-NEXT: br label [[EXIT]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: [[RES:%.]] = phi i32 [ [[CTZ32]], [[NON_ZERO]] ], [ 0, [[START:%.]] ]			; CHECK-NEXT: [[RES:%.]] = phi i32 [ [[CTZ32]], [[NON_ZERO]] ], [ 0, [[START:%.]] ]
	; CHECK-NEXT: ret i32 [[RES]]			; CHECK-NEXT: ret i32 [[RES]]
	;			;
	start:			start:
	%c = icmp eq i64 %x, 0			%c = icmp eq i64 %x, 0
	Show All 10 Lines
	}			}

	define i32 @test1(i64 %x) {			define i32 @test1(i64 %x) {
	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	; CHECK-NEXT: start:			; CHECK-NEXT: start:
	; CHECK-NEXT: [[C:%.]] = icmp eq i64 [[X:%.]], 0			; CHECK-NEXT: [[C:%.]] = icmp eq i64 [[X:%.]], 0
	; CHECK-NEXT: br i1 [[C]], label [[EXIT:%.]], label [[NON_ZERO:%.]]			; CHECK-NEXT: br i1 [[C]], label [[EXIT:%.]], label [[NON_ZERO:%.]]
	; CHECK: non_zero:			; CHECK: non_zero:
	; CHECK-NEXT: [[CTZ:%.*]] = call i64 @llvm.ctlz.i64(i64 [[X]], i1 false), !range !0			; CHECK-NEXT: [[CTZ:%.*]] = call i64 @llvm.ctlz.i64(i64 [[X]], i1 true), !range !0
	; CHECK-NEXT: [[CTZ32:%.*]] = trunc i64 [[CTZ]] to i32			; CHECK-NEXT: [[CTZ32:%.*]] = trunc i64 [[CTZ]] to i32
	; CHECK-NEXT: br label [[EXIT]]			; CHECK-NEXT: br label [[EXIT]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: [[RES:%.]] = phi i32 [ [[CTZ32]], [[NON_ZERO]] ], [ 0, [[START:%.]] ]			; CHECK-NEXT: [[RES:%.]] = phi i32 [ [[CTZ32]], [[NON_ZERO]] ], [ 0, [[START:%.]] ]
	; CHECK-NEXT: ret i32 [[RES]]			; CHECK-NEXT: ret i32 [[RES]]
	;			;
	start:			start:
	%c = icmp eq i64 %x, 0			%c = icmp eq i64 %x, 0
	▲ Show 20 Lines • Show All 42 Lines • Show Last 20 Lines

llvm/trunk/test/Transforms/InstSimplify/known-non-zero.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instsimplify -S \| FileCheck %s			; RUN: opt < %s -instsimplify -S \| FileCheck %s

	define i64 @test0(i64 %x) {			define i64 @test0(i64 %x) {
	; CHECK-LABEL: @test0(			; CHECK-LABEL: @test0(
	; CHECK-NEXT: start:			; CHECK-NEXT: start:
	; CHECK-NEXT: [[A:%.]] = icmp eq i64 [[X:%.]], 0			; CHECK-NEXT: [[A:%.]] = icmp eq i64 [[X:%.]], 0
	; CHECK-NEXT: br i1 [[A]], label [[EXIT:%.]], label [[NON_ZERO:%.]]			; CHECK-NEXT: br i1 [[A]], label [[EXIT:%.]], label [[NON_ZERO:%.]]
	; CHECK: non_zero:			; CHECK: non_zero:
	; CHECK-NEXT: [[B:%.*]] = icmp eq i64 [[X]], 0			; CHECK-NEXT: br i1 false, label [[UNREACHABLE:%.*]], label [[EXIT]]
	; CHECK-NEXT: br i1 [[B]], label [[UNREACHABLE:%.*]], label [[EXIT]]
	; CHECK: unreachable:			; CHECK: unreachable:
	; CHECK-NEXT: br label [[EXIT]]			; CHECK-NEXT: br label [[EXIT]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: [[C:%.]] = phi i64 [ 0, [[START:%.]] ], [ 1, [[NON_ZERO]] ], [ 2, [[UNREACHABLE]] ]			; CHECK-NEXT: [[C:%.]] = phi i64 [ 0, [[START:%.]] ], [ 1, [[NON_ZERO]] ], [ 2, [[UNREACHABLE]] ]
	; CHECK-NEXT: ret i64 [[C]]			; CHECK-NEXT: ret i64 [[C]]
	;			;
	start:			start:
	%a = icmp eq i64 %x, 0			%a = icmp eq i64 %x, 0
	Show All 12 Lines
	}			}

	define i64 @test1(i64 %x) {			define i64 @test1(i64 %x) {
	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	; CHECK-NEXT: start:			; CHECK-NEXT: start:
	; CHECK-NEXT: [[A:%.]] = icmp eq i64 [[X:%.]], 0			; CHECK-NEXT: [[A:%.]] = icmp eq i64 [[X:%.]], 0
	; CHECK-NEXT: br i1 [[A]], label [[EXIT:%.]], label [[NON_ZERO:%.]]			; CHECK-NEXT: br i1 [[A]], label [[EXIT:%.]], label [[NON_ZERO:%.]]
	; CHECK: non_zero:			; CHECK: non_zero:
	; CHECK-NEXT: [[B:%.*]] = icmp ugt i64 [[X]], 0			; CHECK-NEXT: br i1 true, label [[EXIT]], label [[UNREACHABLE:%.*]]
	; CHECK-NEXT: br i1 [[B]], label [[EXIT]], label [[UNREACHABLE:%.*]]
	; CHECK: unreachable:			; CHECK: unreachable:
	; CHECK-NEXT: br label [[EXIT]]			; CHECK-NEXT: br label [[EXIT]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: [[C:%.]] = phi i64 [ 0, [[START:%.]] ], [ [[X]], [[NON_ZERO]] ], [ 0, [[UNREACHABLE]] ]			; CHECK-NEXT: [[C:%.]] = phi i64 [ 0, [[START:%.]] ], [ [[X]], [[NON_ZERO]] ], [ 0, [[UNREACHABLE]] ]
	; CHECK-NEXT: ret i64 [[C]]			; CHECK-NEXT: ret i64 [[C]]
	;			;
	start:			start:
	%a = icmp eq i64 %x, 0			%a = icmp eq i64 %x, 0
	Show All 18 Lines
	; CHECK-NEXT: br i1 [[A]], label [[EXIT:%.]], label [[NON_ZERO:%.]]			; CHECK-NEXT: br i1 [[A]], label [[EXIT:%.]], label [[NON_ZERO:%.]]
	; CHECK: non_zero:			; CHECK: non_zero:
	; CHECK-NEXT: br i1 [[Y:%.]], label [[ONE:%.]], label [[TWO:%.*]]			; CHECK-NEXT: br i1 [[Y:%.]], label [[ONE:%.]], label [[TWO:%.*]]
	; CHECK: one:			; CHECK: one:
	; CHECK-NEXT: br label [[MAINBLOCK:%.*]]			; CHECK-NEXT: br label [[MAINBLOCK:%.*]]
	; CHECK: two:			; CHECK: two:
	; CHECK-NEXT: br label [[MAINBLOCK]]			; CHECK-NEXT: br label [[MAINBLOCK]]
	; CHECK: mainblock:			; CHECK: mainblock:
	; CHECK-NEXT: [[P:%.*]] = phi i64 [ [[X]], [[ONE]] ], [ 42, [[TWO]] ]
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i64 [[P]], 0
	; CHECK-NEXT: br label [[EXIT]]			; CHECK-NEXT: br label [[EXIT]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: [[RES:%.]] = phi i1 [ [[CMP]], [[MAINBLOCK]] ], [ true, [[START:%.]] ]			; CHECK-NEXT: [[RES:%.]] = phi i1 [ false, [[MAINBLOCK]] ], [ true, [[START:%.]] ]
	; CHECK-NEXT: ret i1 [[RES]]			; CHECK-NEXT: ret i1 [[RES]]
	;			;
	start:			start:
	%a = icmp eq i64 %x, 0			%a = icmp eq i64 %x, 0
	br i1 %a, label %exit, label %non_zero			br i1 %a, label %exit, label %non_zero

	non_zero:			non_zero:
	br i1 %y, label %one, label %two			br i1 %y, label %one, label %two
	Show All 16 Lines

llvm/trunk/test/Transforms/LICM/hoist-mustexec.ll

Show First 20 Lines • Show All 123 Lines • ▼ Show 20 Lines	for.cond.cleanup:
ret i32 %add		ret i32 %add

fail:		fail:
call void @f()		call void @f()
ret i32 -1		ret i32 -1
}		}

; requires fact length is non-zero		; requires fact length is non-zero
; TODO: IsKnownNonNullFromDominatingConditions is currently only be done for
; pointers; should handle integers too
define i32 @test4(i32* noalias nocapture readonly %a) nounwind uwtable {		define i32 @test4(i32* noalias nocapture readonly %a) nounwind uwtable {
; CHECK-LABEL: @test4(		; CHECK-LABEL: @test4(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[LEN:%.]] = load i32, i32 [[A:%.*]], align 4, !range !0		; CHECK-NEXT: [[LEN:%.]] = load i32, i32 [[A:%.*]], align 4, !range !0
; CHECK-NEXT: [[IS_ZERO:%.*]] = icmp eq i32 [[LEN]], 0		; CHECK-NEXT: [[IS_ZERO:%.*]] = icmp eq i32 [[LEN]], 0
; CHECK-NEXT: br i1 [[IS_ZERO]], label [[FAIL:%.]], label [[PREHEADER:%.]]		; CHECK-NEXT: br i1 [[IS_ZERO]], label [[FAIL:%.]], label [[PREHEADER:%.]]
; CHECK: preheader:		; CHECK: preheader:
		; CHECK-NEXT: [[I1:%.]] = load i32, i32 [[A]], align 4
; CHECK-NEXT: br label [[FOR_BODY:%.*]]		; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.body:		; CHECK: for.body:
; CHECK-NEXT: [[IV:%.]] = phi i32 [ 0, [[PREHEADER]] ], [ [[INC:%.]], [[CONTINUE:%.*]] ]		; CHECK-NEXT: [[IV:%.]] = phi i32 [ 0, [[PREHEADER]] ], [ [[INC:%.]], [[CONTINUE:%.*]] ]
; CHECK-NEXT: [[ACC:%.]] = phi i32 [ 0, [[PREHEADER]] ], [ [[ADD:%.]], [[CONTINUE]] ]		; CHECK-NEXT: [[ACC:%.]] = phi i32 [ 0, [[PREHEADER]] ], [ [[ADD:%.]], [[CONTINUE]] ]
; CHECK-NEXT: [[R_CHK:%.*]] = icmp ult i32 [[IV]], [[LEN]]		; CHECK-NEXT: [[R_CHK:%.*]] = icmp ult i32 [[IV]], [[LEN]]
; CHECK-NEXT: br i1 [[R_CHK]], label [[CONTINUE]], label [[FAIL_LOOPEXIT:%.*]]		; CHECK-NEXT: br i1 [[R_CHK]], label [[CONTINUE]], label [[FAIL_LOOPEXIT:%.*]]
; CHECK: continue:		; CHECK: continue:
; CHECK-NEXT: [[I1:%.]] = load i32, i32 [[A]], align 4
; CHECK-NEXT: [[ADD]] = add nsw i32 [[I1]], [[ACC]]		; CHECK-NEXT: [[ADD]] = add nsw i32 [[I1]], [[ACC]]
; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[IV]], 1		; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[IV]], 1
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 1000		; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 1000
; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]		; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]
; CHECK: for.cond.cleanup:		; CHECK: for.cond.cleanup:
; CHECK-NEXT: [[ADD_LCSSA:%.*]] = phi i32 [ [[ADD]], [[CONTINUE]] ]		; CHECK-NEXT: [[ADD_LCSSA:%.*]] = phi i32 [ [[ADD]], [[CONTINUE]] ]
; CHECK-NEXT: ret i32 [[ADD_LCSSA]]		; CHECK-NEXT: ret i32 [[ADD_LCSSA]]
; CHECK: fail.loopexit:		; CHECK: fail.loopexit:
▲ Show 20 Lines • Show All 473 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[ValueTracking] Improve isKnowNonZero for Ints
Needs ReviewPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 198575

llvm/trunk/lib/Analysis/ValueTracking.cpp

llvm/trunk/test/Transforms/InstCombine/known-non-zero.ll

llvm/trunk/test/Transforms/InstSimplify/known-non-zero.ll

llvm/trunk/test/Transforms/LICM/hoist-mustexec.ll

This is an archive of the discontinued LLVM Phabricator instance.

[ValueTracking] Improve isKnowNonZero for IntsNeeds ReviewPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 198575

llvm/trunk/lib/Analysis/ValueTracking.cpp

llvm/trunk/test/Transforms/InstCombine/known-non-zero.ll

llvm/trunk/test/Transforms/InstSimplify/known-non-zero.ll

llvm/trunk/test/Transforms/LICM/hoist-mustexec.ll

[ValueTracking] Improve isKnowNonZero for Ints
Needs ReviewPublic