This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Enable preferZeroCompareBranch to optimize branch on zero in codegenprepare
ClosedPublic

Authored by liaolucy on Jan 18 2023, 6:23 PM.

Details

Diff Detail

Event Timeline

liaolucy created this revision.Jan 18 2023, 6:23 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 18 2023, 6:23 PM
liaolucy requested review of this revision.Jan 18 2023, 6:23 PM
liaolucy updated this revision to Diff 490352.Jan 18 2023, 6:25 PM

Update, we can see the diff of testcases.

asb added a comment.Jan 19 2023, 5:12 AM

Thanks for this! We typically add CHECK lines for both RV32 and RV64, and I think it would be worth doing here. I did look at this before, but held off as it seemed to cause a regression on one of the SystemZ test cases and I didn't have a chance to look at it.

Trying now, it looks like that's only currently the case on optbranch_64 with RV32 - I don't know if you've looked at that regression at all?

It might be worth grabbing the optbranch_32 test case from SystemZ as well to expand the test coverage.

liaolucy updated this revision to Diff 491212.EditedJan 22 2023, 4:48 PM
  1. grabbing the optbranch_32 test case from SystemZ
  2. add CHECK lines for RV32
  1. optbranch_64 with RV32 has regression,thanks Alex. I don't have a solution for the regression now. Any suggestions?

I analyze ARM and SystemZ:

RISCV: after codegenprepare, generate @llvm.uadd.with.overflow.i64 + extractvalue,which causes the regression

*** IR Dump After CodeGen Prepare (codegenprepare) ***
define i64 @optbranch_64(i64 %Arg) {
bb:
  %0 = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %Arg, i64 1)
  %math = extractvalue { i64, i1 } %0, 0
  %ov = extractvalue { i64, i1 } %0, 1
  br i1 %ov, label %bb2, label %bb3

bb2:                                              ; preds = %bb
  ret i64 -1

bb3:                                              ; preds = %bb
  ret i64 %math
}

ARM has simplifycfg pass before codegenprepare:

*** IR Dump After Simplify the CFG (simplifycfg) ***
define i64 @optbranch_64(i64 %Arg) {
bb:
  %i1 = icmp eq i64 %Arg, -1
  %i4 = add nuw i64 %Arg, 1
  %common.ret.op = select i1 %i1, i64 -1, i64 %i4
  ret i64 %common.ret.op
}

systemZ is the same as RISCV, generate @llvm.uadd.with.overflow.i64 + extractvalue, but SystemZ seems to have the register pairs $r2d.

 *** IR Dump After SystemZ DAG->DAG Pattern Instruction Selection (systemz-isel) ***:
 Machine code for function optbranch_64: IsSSA, TracksLiveness
Function Live Ins: $r2d in %1

bb.0.bb:
  successors: %bb.1(0x40000000), %bb.2(0x40000000); %bb.1(50.00%), %bb.2(50.00%)
  liveins: $r2d
  %1:gr64bit = COPY $r2d
  %0:gr64bit = ALGHSIK %1:gr64bit, 1, implicit-def $cc
  BRC 15, 12, %bb.2, implicit $cc
  J %bb.1

bb.1.bb2:
; predecessors: %bb.0

  %2:gr64bit = LGHI -1
  $r2d = COPY %2:gr64bit
  Return implicit $r2d

bb.2.bb3:
; predecessors: %bb.0

  $r2d = COPY %0:gr64bit
  Return implicit $r2d
reames added a subscriber: reames.Jan 30 2023, 11:03 AM

Can you land the test with current codegen so that we can see the regression being discussed?

liaolucy updated this revision to Diff 494175.Feb 1 2023, 11:42 PM

@reames Sorry for the late.

precommit tests:https://reviews.llvm.org/rG66a30b9fd0bab0d5db24772674932a66d433212c

I plan to implement similar code in riscv-codegenprepare, including only the initial Shift. This should not have the regression for add.

reames added a comment.Feb 2 2023, 7:50 AM

Based on your prior comment, I think your regression here comes down to how we lower uadd_with_overflow. It looks like our lowering of that could use some improvement.

You could investigate improving that and then returning here. Another option to consider is where we should be returning false from TLI->shouldFormOverflowOp. (Either case, the result should be a separate patch and this one should then depend on it.)

Based on your prior comment, I think your regression here comes down to how we lower uadd_with_overflow. It looks like our lowering of that could use some improvement.

You could investigate improving that and then returning here. Another option to consider is where we should be returning false from TLI->shouldFormOverflowOp. (Either case, the result should be a separate patch and this one should then depend on it.)

Thanks.

A deeper look at the ARM's lowering.
RISCV may need setOperationAction(ISD::ADDCARRY, MVT::i32, Custom);

Reference @rogfer01’s D35192

Testcase:

declare {i64, i1} @llvm.uadd.with.overflow.i64(i64, i64) nounwind readnone
define i64 @optbranch_64(i64 %Arg) {
bb:
  %0 = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %Arg, i64 -1)
  %math = extractvalue { i64, i1 } %0, 0
  %ov = extractvalue { i64, i1 } %0, 1
  %.math = select i1 %ov, i64 -1, i64 %math
  ret i64 %.math
}

Step 2: Optimized lowered selection DAG: %bb.0, Arm and RISCV are the same.

SelectionDAG has 18 nodes:
  t0: ch,glue = EntryToken
      t2: i32,ch = CopyFromReg t0, Register:i32 %0
      t4: i32,ch = CopyFromReg t0, Register:i32 %1
    t5: i64 = build_pair t2, t4
  t7: i64,i1 = uaddo t5, Constant:i64<-1>
  t8: i64 = select t7:1, Constant:i64<-1>, t7
    t12: i32 = extract_element t8, Constant:i32<0>
  t14: ch,glue = CopyToReg t0, Register:i32 $r0, t12
    t10: i32 = extract_element t8, Constant:i32<1>
  t16: ch,glue = CopyToReg t14, Register:i32 $r1, t10, t14:1
  t17: ch = ARMISD::RET_FLAG t16, Register:i32 $r0, Register:i32 $r1, t16:1

but step 3,
ARM: Type-legalized selection DAG:

Type-legalized selection DAG: %bb.0 'optbranch_64:bb'
SelectionDAG has 18 nodes:
  t0: ch,glue = EntryToken
    t23: i32 = select t28, Constant:i32<-1>, t21
  t14: ch,glue = CopyToReg t0, Register:i32 $r0, t23
    t24: i32 = select t28, Constant:i32<-1>, t22
  t16: ch,glue = CopyToReg t14, Register:i32 $r1, t24, t14:1
    t2: i32,ch = CopyFromReg t0, Register:i32 %0
  t21: i32,i32 = uaddo t2, Constant:i32<-1>
    t4: i32,ch = CopyFromReg t0, Register:i32 %1
    t26: i32 = and t21:1, Constant:i32<1>
  t22: i32,i32 = addcarry t4, Constant:i32<-1>, t26
  t28: i32 = and t22:1, Constant:i32<1>
  t17: ch = ARMISD::RET_FLAG t16, Register:i32 $r0, Register:i32 $r1, t16:1

RISCV:
Type-legalized selection DAG: %bb.0 'optbranch_64:bb'

SelectionDAG has 24 nodes:
  t0: ch,glue = EntryToken
  t2: i32,ch = CopyFromReg t0, Register:i32 %0
  t4: i32,ch = CopyFromReg t0, Register:i32 %1
    t31: i32 = select t38, Constant:i32<-1>, t26
  t14: ch,glue = CopyToReg t0, Register:i32 $x10, t31
    t32: i32 = select t38, Constant:i32<-1>, t29
  t16: ch,glue = CopyToReg t14, Register:i32 $x11, t32, t14:1
  t26: i32 = add t2, Constant:i32<-1>
  t28: i32 = setcc t26, t2, setult:ch
    t27: i32 = add t4, Constant:i32<-1>
  t29: i32 = add t27, t28
      t35: i32 = setcc t29, t4, seteq:ch
      t33: i32 = setcc t29, t4, setult:ch
    t36: i32 = select t35, t28, t33
  t38: i32 = and t36, Constant:i32<1>
  t17: ch = RISCVISD::RET_FLAG t16, Register:i32 $x10, Register:i32 $x11, t16:1

ARM seems to have introduced new nodes ARMISD::ADDC and ARMISD::ADDE
I need more time to implement this feature because I haven't sorted out the logic yet.

If we want to solve this regression for @optbranch_64, we need to lowering ISD::ADDCARRY and ISD::SETCCCARRY.

But I checked these nodes all used for carry bit/flags. For rv64gc/rv32gc we do not have instructions to handle carry-add or carry-sub.

Maybe this lowering can be used in rvv. They have 11.4. Vector Integer Add-with-Carry / Subtract-with-Borrow Instructions

So I will create a patch for TLI->shouldFormOverflowOp.

craig.topper accepted this revision.Feb 23 2023, 7:34 PM

LGTM, but wait a day or two to see if @asb or @reames have any comments.

This revision is now accepted and ready to land.Feb 23 2023, 7:34 PM
This revision was landed with ongoing or failed builds.Feb 27 2023, 10:43 PM
This revision was automatically updated to reflect the committed changes.