Similar to ARM and SystemZ.
Related patches: D101778 (preferZeroCompareBranch),
https://reviews.llvm.org/rG9a9421a461166482465e786a46f8cced63cd2e9f (== 0 to u< 1)
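As a quick illustration of the equivalences involved (a Python model, not LLVM code; the 64-bit width and constants match the optbranch_64 case discussed in the comments): for unsigned values, `x == 0` is the same test as `x u< 1`, and branching on `x == -1` is the overflow bit of `x + 1`.

```python
def uadd_with_overflow(a, b, bits=64):
    # Models @llvm.uadd.with.overflow: wrapped sum plus an overflow flag.
    total = a + b
    return total % (1 << bits), total >= (1 << bits)

all_ones = (1 << 64) - 1  # i64 -1 viewed as an unsigned value

for x in (0, 1, 42, all_ones - 1, all_ones):
    assert (x == 0) == (x < 1)       # "== 0" rewritten as "u< 1"
    math, ov = uadd_with_overflow(x, 1)
    assert ov == (x == all_ones)     # branch on x == -1 is the overflow of x + 1
```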
Differential D142071
[RISCV] Enable preferZeroCompareBranch to optimize branch on zero in codegenprepare
Authored by liaolucy on Jan 18 2023, 6:23 PM
Thanks for this! We typically add CHECK lines for both RV32 and RV64, and I think it would be worth doing here. I did look at this before, but held off as it seemed to cause a regression on one of the SystemZ test cases and I didn't have a chance to look into it. Trying now, it looks like that's currently only the case for optbranch_64 with RV32 - I don't know if you've looked at that regression at all? It might be worth grabbing the optbranch_32 test case from SystemZ as well to expand the test coverage.
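For context, the optbranch_64 test (from the SystemZ tests) returns -1 when the argument is all-ones and Arg + 1 otherwise; the branchy overflow form and the select form that SimplifyCFG produces (shown in the dumps below) compute the same thing. A rough Python model (the function bodies are paraphrased from the IR, not copied):

```python
MASK64 = (1 << 64) - 1

def optbranch_64_branch(arg):
    # Branch form: test the overflow bit of arg + 1.
    if (arg + 1) > MASK64:        # uadd.with.overflow's %ov
        return MASK64             # i64 -1 as unsigned
    return (arg + 1) & MASK64     # %math

def optbranch_64_select(arg):
    # Select form, as SimplifyCFG produces on ARM.
    i1 = arg == MASK64            # icmp eq i64 %Arg, -1
    i4 = (arg + 1) & MASK64       # add nuw i64 %Arg, 1
    return MASK64 if i1 else i4   # select i1 %i1, i64 -1, i64 %i4

for arg in (0, 7, MASK64 - 1, MASK64):
    assert optbranch_64_branch(arg) == optbranch_64_select(arg)
```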
I analyzed ARM and SystemZ.

RISCV: after CodeGenPrepare we generate @llvm.uadd.with.overflow.i64 + extractvalue, which causes the regression:

*** IR Dump After CodeGen Prepare (codegenprepare) ***
define i64 @optbranch_64(i64 %Arg) {
bb:
  %0 = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %Arg, i64 1)
  %math = extractvalue { i64, i1 } %0, 0
  %ov = extractvalue { i64, i1 } %0, 1
  br i1 %ov, label %bb2, label %bb3

bb2:                                              ; preds = %bb
  ret i64 -1

bb3:                                              ; preds = %bb
  ret i64 %math
}

ARM runs a SimplifyCFG pass before CodeGenPrepare:

*** IR Dump After Simplify the CFG (simplifycfg) ***
define i64 @optbranch_64(i64 %Arg) {
bb:
  %i1 = icmp eq i64 %Arg, -1
  %i4 = add nuw i64 %Arg, 1
  %common.ret.op = select i1 %i1, i64 -1, i64 %i4
  ret i64 %common.ret.op
}

SystemZ is the same as RISCV - it generates @llvm.uadd.with.overflow.i64 + extractvalue - but SystemZ has the register pair $r2d:

*** IR Dump After SystemZ DAG->DAG Pattern Instruction Selection (systemz-isel) ***:
# Machine code for function optbranch_64: IsSSA, TracksLiveness
Function Live Ins: $r2d in %1

bb.0.bb:
  successors: %bb.1(0x40000000), %bb.2(0x40000000); %bb.1(50.00%), %bb.2(50.00%)
  liveins: $r2d
  %1:gr64bit = COPY $r2d
  %0:gr64bit = ALGHSIK %1:gr64bit, 1, implicit-def $cc
  BRC 15, 12, %bb.2, implicit $cc
  J %bb.1

bb.1.bb2:
; predecessors: %bb.0
  %2:gr64bit = LGHI -1
  $r2d = COPY %2:gr64bit
  Return implicit $r2d

bb.2.bb3:
; predecessors: %bb.0
  $r2d = COPY %0:gr64bit
  Return implicit $r2d

Can you land the test with current codegen so that we can see the regression being discussed?

@reames Sorry for the delay. Precommit tests: https://reviews.llvm.org/rG66a30b9fd0bab0d5db24772674932a66d433212c
I plan to implement similar code in riscv-codegenprepare, handling only the initial Shift. This should avoid the regression for add.

Based on your prior comment, I think your regression here comes down to how we lower uadd_with_overflow.
It looks like our lowering of that could use some improvement. You could investigate improving that and then return here. Another option to consider is whether we should be returning false from TLI->shouldFormOverflowOp. (In either case, the result should be a separate patch and this one should then depend on it.)

Thanks. A deeper look at ARM's lowering. Testcase:

declare {i64, i1} @llvm.uadd.with.overflow.i64(i64, i64) nounwind readnone

define i64 @optbranch_64(i64 %Arg) {
bb:
  %0 = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %Arg, i64 -1)
  %math = extractvalue { i64, i1 } %0, 0
  %ov = extractvalue { i64, i1 } %0, 1
  %.math = select i1 %ov, i64 -1, i64 %math
  ret i64 %.math
}

At step 2 (Optimized lowered selection DAG: %bb.0), ARM and RISCV are the same:

SelectionDAG has 18 nodes:
  t0: ch,glue = EntryToken
  t2: i32,ch = CopyFromReg t0, Register:i32 %0
  t4: i32,ch = CopyFromReg t0, Register:i32 %1
  t5: i64 = build_pair t2, t4
  t7: i64,i1 = uaddo t5, Constant:i64<-1>
  t8: i64 = select t7:1, Constant:i64<-1>, t7
  t12: i32 = extract_element t8, Constant:i32<0>
  t14: ch,glue = CopyToReg t0, Register:i32 $r0, t12
  t10: i32 = extract_element t8, Constant:i32<1>
  t16: ch,glue = CopyToReg t14, Register:i32 $r1, t10, t14:1
  t17: ch = ARMISD::RET_FLAG t16, Register:i32 $r0, Register:i32 $r1, t16:1

But at step 3 (Type-legalized selection DAG: %bb.0 'optbranch_64:bb') they diverge.

ARM:
SelectionDAG has 18 nodes:
  t0: ch,glue = EntryToken
  t23: i32 = select t28, Constant:i32<-1>, t21
  t14: ch,glue = CopyToReg t0, Register:i32 $r0, t23
  t24: i32 = select t28, Constant:i32<-1>, t22
  t16: ch,glue = CopyToReg t14, Register:i32 $r1, t24, t14:1
  t2: i32,ch = CopyFromReg t0, Register:i32 %0
  t21: i32,i32 = uaddo t2, Constant:i32<-1>
  t4: i32,ch = CopyFromReg t0, Register:i32 %1
  t26: i32 = and t21:1, Constant:i32<1>
  t22: i32,i32 = addcarry t4, Constant:i32<-1>, t26
  t28: i32 = and t22:1, Constant:i32<1>
  t17: ch = ARMISD::RET_FLAG t16, Register:i32 $r0, Register:i32 $r1, t16:1

RISCV:
SelectionDAG has 24 nodes:
  t0: ch,glue = EntryToken
  t2: i32,ch = CopyFromReg t0, Register:i32 %0
  t4: i32,ch = CopyFromReg t0, Register:i32 %1
  t31: i32 = select t38, Constant:i32<-1>, t26
  t14: ch,glue = CopyToReg t0, Register:i32 $x10, t31
  t32: i32 = select t38, Constant:i32<-1>, t29
  t16: ch,glue = CopyToReg t14, Register:i32 $x11, t32, t14:1
  t26: i32 = add t2, Constant:i32<-1>
  t28: i32 = setcc t26, t2, setult:ch
  t27: i32 = add t4, Constant:i32<-1>
  t29: i32 = add t27, t28
  t35: i32 = setcc t29, t4, seteq:ch
  t33: i32 = setcc t29, t4, setult:ch
  t36: i32 = select t35, t28, t33
  t38: i32 = and t36, Constant:i32<1>
  t17: ch = RISCVISD::RET_FLAG t16, Register:i32 $x10, Register:i32 $x11, t16:1

ARM seems to have introduced the new target nodes ARMISD::ADDC and ARMISD::ADDE.

If we want to fix this regression for @optbranch_64, we need to lower ISD::ADDCARRY and ISD::SETCCCARRY. But I checked, and those nodes are all used for carry bits/flags; for rv64gc/rv32gc we do not have instructions that handle carry-add or carry-sub. Maybe this lowering could be used with RVV, which has "11.4. Vector Integer Add-with-Carry / Subtract-with-Borrow Instructions". So I will create a patch for TLI->shouldFormOverflowOp.
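The type-legalized RISCV DAG above computes the 64-bit carry from two 32-bit halves: a low add whose carry is `sum_lo u< lo` (t26/t28), a high add that folds that carry in (t27/t29), and a `select(sum_hi == hi, carry_lo, sum_hi u< hi)` (t35/t33/t36) to recover the carry-out. A Python sketch (my reading of the node chain, not LLVM code) checking that this is a correct 64-bit unsigned add-with-overflow:

```python
import random

M32 = 1 << 32

def uaddo64_split(lo, hi, b_lo, b_hi):
    """64-bit unsigned add-with-overflow built from 32-bit halves,
    mirroring the node chain in the type-legalized RISCV DAG."""
    sum_lo = (lo + b_lo) % M32
    c_lo = int(sum_lo < lo)                 # setcc ... setult (t28)
    sum_hi = (hi + b_hi + c_lo) % M32       # add, add (t27, t29)
    # select(seteq, carry_in, setult): when sum_hi == hi, the high addend
    # plus carry wrapped to exactly 0 mod 2^32, so carry-out == carry-in.
    c_hi = c_lo if sum_hi == hi else int(sum_hi < hi)
    return sum_lo, sum_hi, c_hi

random.seed(0)
for _ in range(1000):
    a, b = random.getrandbits(64), random.getrandbits(64)
    lo, hi, ov = uaddo64_split(a % M32, a // M32, b % M32, b // M32)
    assert lo + (hi << 32) == (a + b) % (1 << 64)
    assert ov == (a + b >= (1 << 64))
```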