This is an archive of the discontinued LLVM Phabricator instance.

[CGP] Also freeze ctlz/cttz operand when despeculating
ClosedPublic

Authored by nikic on May 30 2022, 2:19 AM.

Details

Summary

D125887 changed the ctlz/cttz despeculation transform to insert a freeze for the introduced branch on zero. While this does fix the "branch on poison" issue, we may still get in trouble if we pick a different value for the branch and for the ctz argument (i.e. non-zero for the branch, but zero for the ctz). To avoid this, we should use the same frozen value in both positions.

This does cause a regression in RISCV codegen by introducing an additional sext. The DAG looks like this:

SelectionDAG has 22 nodes:
  t0: ch = EntryToken
      t2: i64,ch = CopyFromReg t0, Register:i64 %3
    t4: i64 = AssertSext t2, ValueType:ch:i32
  t23: i64 = freeze t4
        t9: ch = CopyToReg t0, Register:i64 %0, t23
        t16: ch = CopyToReg t0, Register:i64 %4, Constant:i64<32>
      t18: ch = TokenFactor t9, t16
          t25: i64 = sign_extend_inreg t23, ValueType:ch:i32
        t24: i64 = setcc t25, Constant:i64<0>, seteq:ch
      t28: i64 = and t24, Constant:i64<1>
    t19: ch = brcond t18, t28, BasicBlock:ch<cond.end 0x8311f68>
  t21: ch = br t19, BasicBlock:ch<cond.false 0x8311e80>

I don't see a really obvious way to improve this, as we can't push the freeze past the AssertSext (which may produce poison).

Diff Detail

Event Timeline

nikic created this revision.May 30 2022, 2:19 AM
nikic requested review of this revision.May 30 2022, 2:19 AM
spatel accepted this revision.Jun 8 2022, 7:15 AM

https://alive2.llvm.org/ce/z/iwd4ba
This is needed for correctness, so LGTM.

It might be good to add tests for other targets. I didn't step through this to see why, but the code for AArch and ARM actually looks better when we re-use the frozen value.

rbit	w9, w0
mov	w8, #32
clz	w9, w9
cmp	w0, #0
csel	w0, w8, w9, eq

vs.

rbit	w8, w0
clz	w0, w8
This revision is now accepted and ready to land.Jun 8 2022, 7:15 AM
nikic added a comment.Jun 9 2022, 2:22 AM

https://alive2.llvm.org/ce/z/iwd4ba
This is needed for correctness, so LGTM.

It might be good to add tests for other targets. I didn't step through this to see why, but the code for AArch and ARM actually looks better when we re-use the frozen value.

rbit	w9, w0
mov	w8, #32
clz	w9, w9
cmp	w0, #0
csel	w0, w8, w9, eq

vs.

rbit	w8, w0
clz	w0, w8

For what input is that? As far as I can tell, AArch64 never uses despeculation, and ARM uses a libcall legalization where it makes no difference. I'm trying this one:

define i32 @ctlz_i32(i32 %a) nounwind {
  %r = call i32 @llvm.ctlz.i32(i32 %a, i1 false)
  ret i32 %r
}

declare i32 @llvm.ctlz.i32(i32, i1)
spatel added a comment.Jun 9 2022, 6:08 AM

For what input is that? As far as I can tell, AArch64 never uses despeculation, and ARM uses a libcall legalization where it makes no difference. I'm trying this one:

I was experimenting with tests based on the IR in the regression test in this patch, so just changing the frozen value alone after doing the despeculation:

define i32 @tgt(i32 %A) {
entry:
  %f = freeze i32 %A
  %i = icmp eq i32 %f, 0
  br i1 %i, label %end, label %false
false:
  %z = call i32 @llvm.ctlz.i32(i32 %A, i1 true)
  br label %end
end:
  %ctz = phi i32 [ 32, %entry ], [ %z, %false ]
  ret i32 %ctz
}

define i32 @tgt2(i32 %A) {
entry:
  %f = freeze i32 %A
  %i = icmp eq i32 %f, 0
  br i1 %i, label %end, label %false
false:
  %z = call i32 @llvm.ctlz.i32(i32 %f, i1 true)
  br label %end
end:
  %ctz = phi i32 [ 32, %entry ], [ %z, %false ]
  ret i32 %ctz
}

And then sending that through llc with:
$ llc -o - ctlz.ll -mtriple=aarch64
$ llc -o - ctlz.ll -mtriple=armv7

I used 'v7' with arm; I do see the libcalls with base 'arm' codegen.

nikic added a comment.Jun 9 2022, 8:24 AM

For what input is that? As far as I can tell, AArch64 never uses despeculation, and ARM uses a libcall legalization where it makes no difference. I'm trying this one:

I was experimenting with tests based on the IR in the regression test in this patch, so just changing the frozen value alone after doing the despeculation:
[...]
And then sending that through llc with:
$ llc -o - ctlz.ll -mtriple=aarch64
$ llc -o - ctlz.ll -mtriple=armv7

I used 'v7' with arm; I do see the libcalls with base 'arm' codegen.

Okay, I see. I don't think adding those tests would be super meaningful, because both aarch64/armv7 disable ctz despeculation via TargetLowering hooks, so I don't think we'd particularly care how the pattern would optimize.

This revision was landed with ongoing or failed builds.Jun 10 2022, 12:48 AM
This revision was automatically updated to reflect the committed changes.