This is an archive of the discontinued LLVM Phabricator instance.

[TargetLowering][ARM] Don't alter opaque constants in TargetLowering::ShrinkDemandedConstant.
ClosedPublic

Authored by craig.topper on Jun 23 2021, 6:43 PM.

Details

Summary

We don't constant fold based on demanded bits elsewhere in
SimplifyDemandedBits, so I don't think we should shrink them either.

The affected ARM test changes because a constant become non-opaque
and eventually enabled some constant folding. This no longer happens.
I checked and InstCombine is able to simplify this test. I'm not sure exactly
what it was trying to test.

Diff Detail

Event Timeline

craig.topper created this revision.Jun 23 2021, 6:43 PM
craig.topper requested review of this revision.Jun 23 2021, 6:43 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2021, 6:43 PM
craig.topper edited the summary of this revision. (Show Details)Jun 23 2021, 6:47 PM
craig.topper added reviewers: efriedma, rogfer01.
dmgreen accepted this revision.Jun 24 2021, 1:32 AM
dmgreen added a subscriber: dmgreen.

Sounds OK to me, for my understanding of opaque constants

This revision is now accepted and ready to land.Jun 24 2021, 1:32 AM

The affected ARM test changes because a constant become non-opaque
and eventually enabled some constant folding. This no longer happens.
I checked and InstCombine is able to simplify this test. I'm not sure exactly
what it was trying to test.

Looking at the original diff, back then we only cared about small subset of it where we checked that some constants were materialised in efficient ways.

 define i1 @t11() {
 entry:
   %bit = alloca i32
   %load = load i32, i32* %bit
   %clear = and i32 %load, -4096
   %set = or i32 %clear, 33
   store i32 %set, i32* %bit
   %load1 = load i32, i32* %bit
   %clear2 = and i32 %load1, -33550337
   %set3 = or i32 %clear2, 40960
   %clear5 = and i32 %set3, 4095
   %rem = srem i32 %clear5, 10
   %clear9 = and i32 %set3, -4096
   %set10 = or i32 %clear9, %rem
   store i32 %set10, i32* %bit
   %clear12 = and i32 %set10, 4095
   %cmp = icmp eq i32 %clear12, 3
   ret i1 %cmp
 
 ; ARM-LABEL: t11:
-; ARM: mov     r0, #0
-; ARM: cmp     r1, #3
-; ARM: moveq   r0, #1
+; ARM: rsbs r1, r0, #0
+; ARM: adc  r0, r0, r1
 
 ; ARMT2-LABEL: t11:
-; ARMT2: mov     r0, #0
-; ARMT2: cmp     r1, #3
-; ARMT2: movweq  r0, #1
+; ARMT2: clz r0, r0
+; ARMT2: lsr r0, r0, #5
 
 ; THUMB1-LABEL: t11:
 ; THUMB1-NOT: movs r0, #0
 ; THUMB1: movs r0, #5
 
 ; THUMB2-LABEL: t11:
-; THUMB2: movs    r0, #0
-; THUMB2: cmp     r1, #3
-; THUMB2: it      eq
-; THUMB2: moveq   r0, #1
+; THUMB2: clz r0, r0
+; THUMB2: lsrs r0, r0, #5
 }

The thumb checks (ARMT2 and THUMB*) seem still to be there. (The Arm expectation somehow seems to have been gone since 9d2df964070700ae0d244e84572ac2275050e49a in a change that seems positive overall)

When update_llc_test_check.py was applied it also included some part of the test that I understand it wasn't originally relevant.

The culprit seems to be a bitcast introduced by consthoist but I haven't tried to understand what it is doing to be honest. It looks like it is trying hard to push a constant into a (named) value (perhaps we want the constant materialised at all costs) and bitcast is a hackish way to achieve this?

*** IR Dump Before Constant Hoisting (consthoist) ***
define i1 @t11() #0 {
entry:
  %bit = alloca i32, align 4
  %load = load i32, i32* %bit, align 4
  %clear = and i32 %load, -4096
  %set = or i32 %clear, 33
  store i32 %set, i32* %bit, align 4
  %load1 = load i32, i32* %bit, align 4
  %clear2 = and i32 %load1, -33550337
  %set3 = or i32 %clear2, 40960
  %clear5 = and i32 %set3, 4095
  %rem = srem i32 %clear5, 10
  %clear9 = and i32 %set3, -4096
  %set10 = or i32 %clear9, %rem
  store i32 %set10, i32* %bit, align 4
  %clear12 = and i32 %set10, 4095
  %cmp = icmp eq i32 %clear12, 3
  ret i1 %cmp
}
*** IR Dump After Constant Hoisting (consthoist) ***
define i1 @t11() #0 {
entry:
  %const1 = bitcast i32 -4096 to i32  ; ---
  %const = bitcast i32 4095 to i32    ; ---
  %bit = alloca i32, align 4
  %load = load i32, i32* %bit, align 4
  %clear = and i32 %load, %const1
  %set = or i32 %clear, 33
  store i32 %set, i32* %bit, align 4
  %load1 = load i32, i32* %bit, align 4
  %clear2 = and i32 %load1, -33550337
  %set3 = or i32 %clear2, 40960
  %clear5 = and i32 %set3, %const
  %rem = srem i32 %clear5, 10
  %clear9 = and i32 %set3, %const1
  %set10 = or i32 %clear9, %rem
  store i32 %set10, i32* %bit, align 4
  %clear12 = and i32 %set10, %const
  %cmp = icmp eq i32 %clear12, 3
  ret i1 %cmp
}
lebedev.ri accepted this revision.Jun 24 2021, 2:25 AM

Sounds OK to me, for my understanding of opaque constants

Same

The culprit seems to be a bitcast introduced by consthoist but I haven't tried to understand what it is doing to be honest. It looks like it is trying hard to push a constant into a (named) value (perhaps we want the constant materialised at all costs) and bitcast is a hackish way to achieve this?

Yep. Constant hoisting pulls expensive constants out of the code, in order to only materialize them once. It uses a bitcast to do that, which in turn is expected to produce an opaque constant. That never seemed like a great way of doing it to me, but apparently it is fairly important for codesize in some cases. It's controlled through the costs of getIntImmCostInst, and v6m will have fairly high costs for a lot of constants.

I don't think there is much point in pulling constants out into the same block, but that seems like a separate issue.

The culprit seems to be a bitcast introduced by consthoist but I haven't tried to understand what it is doing to be honest. It looks like it is trying hard to push a constant into a (named) value (perhaps we want the constant materialised at all costs) and bitcast is a hackish way to achieve this?

Yep. Constant hoisting pulls expensive constants out of the code, in order to only materialize them once. It uses a bitcast to do that, which in turn is expected to produce an opaque constant. That never seemed like a great way of doing it to me, but apparently it is fairly important for codesize in some cases. It's controlled through the costs of getIntImmCostInst, and v6m will have fairly high costs for a lot of constants.

I don't think there is much point in pulling constants out into the same block, but that seems like a separate issue.

Thanks a lot for the explanation David, makes sense.