
Details

Reviewers

craig.topper
asb
reames

Commits

rGeb54254b6e09: [RISCV] Return false from shouldFormOverflowOp when type is i8 and i16

Summary

Do not form overflow ops for i8 and i16.
This reduces the number of zero-extension instructions.

To limit the risk of unexpected side effects,
most of the checks in the base virtual function are kept.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

liaolucy created this revision. · Feb 9 2023, 6:45 AM

Herald added a project: Restricted Project. · Feb 9 2023, 6:45 AM

Herald added subscribers: luke, VincentWu, vkmr and 28 others.

liaolucy requested review of this revision. · Feb 9 2023, 6:45 AM

Herald added a project: Restricted Project. · Feb 9 2023, 6:45 AM

Herald added subscribers: llvm-commits, pcwang-thead, eopXD, MaskRay.

liaolucy added a child revision: D142071: [RISCV] Enable preferZeroCompareBranch to optimize branch on zero in codegenprepare. · Feb 9 2023, 6:46 AM

Harbormaster completed remote builds in B212798: Diff 496111. · Feb 9 2023, 7:35 AM

The regression in D142071 was for i64 on rv32. Do we need to disable this completely or only for that case?

liaolucy mentioned this in rGd624b9217d35: [RISCV] Add precommit tests for D143646. · Feb 10 2023, 3:39 AM

Rebase and add more test cases.

liaolucy added inline comments. · Feb 10 2023, 4:13 AM

llvm/test/CodeGen/RISCV/overflow-intrinsics.ll
84	This can be optimized in the RISC-V DAG combine; I'll write another patch. It would change the IR to:
  %add = add i64 %b, %a
  %cmp = icmp ult i64 %b, 0
  %Q = select i1 %cmp, i64 %b, i64 42
  store i64 %add, ptr %res

In D143646#4115724, @craig.topper wrote:

The regression in D142071 was for i64 on rv32. Do we need to disable this completely or only for that case?

I added more examples, and it looks like some rv64 cases are improved as well.

Harbormaster completed remote builds in B213013: Diff 496420. · Feb 10 2023, 5:03 AM

craig.topper added inline comments. · Feb 12 2023, 4:42 PM

llvm/test/CodeGen/RISCV/overflow-intrinsics.ll
84	That doesn't look right. %cmp = icmp ult i64 %b, 0 is always false. No value of %b can be less than 0 when treated as unsigned; 0 is the smallest value.

This update addresses the regression in uaddo1_math_overflow_used.

The test case that regresses:

define i64 @uaddo1_math_overflow_used(i64 %a, i64 %b, ptr %res) nounwind ssp {
  %add = add i64 %b, %a
  %cmp = icmp ult i64 %add, %a    ; compares against %a
  %Q = select i1 %cmp, i64 %b, i64 42
  store i64 %add, ptr %res
  ret i64 %Q
}

The next test case in the same file shows no regression:

define i64 @uaddo2_math_overflow_used(i64 %a, i64 %b, ptr %res) nounwind ssp {
  %add = add i64 %b, %a
  %cmp = icmp ult i64 %add, %b    ; compares against %b
  %Q = select i1 %cmp, i64 %b, i64 42
  store i64 %add, ptr %res
  ret i64 %Q
}

So I think it is reasonable to transform the first case into:

define i64 @uaddo1_math_overflow_used(i64 %a, i64 %b, ptr %res) nounwind ssp {
  %add = add i64 %a, %b    ; %a and %b swapped
  %cmp = icmp ult i64 %add, %a
  %Q = select i1 %cmp, i64 %b, i64 42
  store i64 %add, ptr %res
  ret i64 %Q
}

Harbormaster completed remote builds in B214689: Diff 498742. · Feb 20 2023, 1:52 AM

craig.topper added inline comments. · Feb 20 2023, 4:54 PM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
8931 ↗	(On Diff #498742)	This replaces N0, but the variable N0 still escapes into the following code pointing at the original ADD.

craig.topper added inline comments. · Feb 20 2023, 4:58 PM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
8923 ↗	(On Diff #498742)	I don't understand this criterion. Just looking at the operands of the add is arbitrary. In your example, it seems like the select is important, but you don't check for that.

liaolucy added inline comments. · Feb 20 2023, 10:38 PM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
8923 ↗	(On Diff #498742)	It should have nothing to do with the select instruction.
In fact, SelectionDAG produces just one more instruction than before: %10:gpr = SLTU %6:gpr, %0:gpr
The IR dump after machine code sinking (machine-sink) then shows a redundant branch instruction plus a copy of the SLTU.

When a 64-bit add is expanded, the low sum is compared with the left-hand operand by default.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp#L3017

Optimized lowered selection DAG: %bb.0 
'uaddo1_math_overflow_used:'
SelectionDAG has 29 nodes:
  t0: ch,glue = EntryToken
    t2: i32,ch = CopyFromReg t0, Register:i32 %0
    t4: i32,ch = CopyFromReg t0, Register:i32 %1
  t11: i64 = build_pair t2, t4
    t6: i32,ch = CopyFromReg t0, Register:i32 %2
    t8: i32,ch = CopyFromReg t0, Register:i32 %3
  t12: i64 = build_pair t6, t8
  t13: i64 = add t12, t11
    t15: i1 = setcc t13, t11, setult:ch
  t17: i64 = select t15, t12, Constant:i64<42>
      t10: i32,ch = CopyFromReg t0, Register:i32 %4
    t20: ch = store<(store (s64) into %ir.res)> t0, t13, t10, undef:i32
    t23: i32 = extract_element t17, Constant:i32<0>
  t25: ch,glue = CopyToReg t20, Register:i32 $x10, t23
    t22: i32 = extract_element t17, Constant:i32<1>
  t27: ch,glue = CopyToReg t25, Register:i32 $x11, t22, t25:1
  t28: ch = RISCVISD::RET_FLAG t27, Register:i32 $x10, Register:i32 $x11, t27:1


Type-legalized selection DAG: %bb.0 'uaddo1_math_overflow_used:'
SelectionDAG has 38 nodes:
  t0: ch,glue = EntryToken
  t2: i32,ch = CopyFromReg t0, Register:i32 %0
  t4: i32,ch = CopyFromReg t0, Register:i32 %1
  t6: i32,ch = CopyFromReg t0, Register:i32 %2
  t8: i32,ch = CopyFromReg t0, Register:i32 %3
  t10: i32,ch = CopyFromReg t0, Register:i32 %4
      t44: ch = store<(store (s32) into %ir.res, align 8)> t0, t30, t10, undef:i32
        t46: i32 = add nuw t10, Constant:i32<4>
      t47: ch = store<(store (s32) into %ir.res + 4, basealign 8)> t0, t33, t46, undef:i32
    t48: ch = TokenFactor t44, t47
    t35: i32 = select t38, t6, Constant:i32<42>
  t25: ch,glue = CopyToReg t48, Register:i32 $x10, t35
    t36: i32 = select t38, t8, Constant:i32<0>
  t27: ch,glue = CopyToReg t25, Register:i32 $x11, t36, t25:1
  t30: i32 = add t6, t2
    t31: i32 = add t8, t4
    t32: i32 = setcc t30, t6, setult:ch    <-- if this were changed to: t32: i32 = setcc t30, t2, setult:ch
  t33: i32 = add t31, t32
      t42: i32 = setcc t33, t4, seteq:ch
      t39: i32 = setcc t30, t2, setult:ch    <-- then t32 and t39 would be identical, and one of them could be deleted
      t40: i32 = setcc t33, t4, setult:ch
    t43: i32 = select t42, t39, t40
  t38: i32 = and t43, Constant:i32<1>
  t28: ch = RISCVISD::RET_FLAG t27, Register:i32 $x10, Register:i32 $x11, t27:1
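
As a rough illustration of the type-legalized DAG above, the following C++ sketch splits a 64-bit add into two 32-bit adds and derives the carry by comparing the low sum with the left operand's low half, as t32 does. `Pair32`, `add64_expanded`, `split`, and `join` are names made up for this sketch; they are not LLVM APIs.

```cpp
#include <cstdint>

// Hypothetical helper type: a 64-bit value held as two 32-bit halves,
// mirroring the build_pair nodes in the DAG dump.
struct Pair32 {
  uint32_t lo;
  uint32_t hi;
};

// Expand a 64-bit add into two 32-bit adds plus a carry.
// The carry compares the low sum with the LEFT operand's low half,
// like "t32 = setcc t30, t6, setult".
Pair32 add64_expanded(Pair32 lhs, Pair32 rhs) {
  uint32_t lo = lhs.lo + rhs.lo;           // t30
  uint32_t carry = lo < lhs.lo ? 1u : 0u;  // t32
  uint32_t hi = lhs.hi + rhs.hi + carry;   // t31 and t33
  return {lo, hi};
}

// Convert between the split form and a native 64-bit integer.
uint64_t join(Pair32 p) { return (uint64_t(p.hi) << 32) | p.lo; }
Pair32 split(uint64_t v) { return {uint32_t(v), uint32_t(v >> 32)}; }
```

Comparing the low sum against the right operand's low half instead would yield the same carry, which is why the choice of operand only matters for whether an existing setcc can be reused.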

craig.topper added inline comments. · Feb 20 2023, 11:06 PM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
8923 ↗	(On Diff #498742)	Ok I understand now. I'm concerned there aren't enough checks here to prevent infinite loops. If there are two setccs that both use the add but compare to a different operand, we'll infinite loop commuting the add.
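
A toy model of that concern, assuming a hypothetical combine that commutes the add whenever a setcc user compares the sum against the operand that is not on the left. None of these names exist in LLVM, and the step cap is there only to keep the demonstration finite.

```cpp
#include <utility>

// Toy add node: only the operand order matters for this sketch.
struct ToyAdd {
  char lhs;
  char rhs;
};

// Repeatedly apply the hypothetical rewrite "commute the add so that the
// operand a setcc user compares against comes first".
// Returns the number of rewrites that fired; a result >= max_steps means
// the rewrite never settled within the cap.
int run_commute_combine(ToyAdd add, const char *setcc_operands, int n_users,
                        int max_steps) {
  int steps = 0;
  bool changed = true;
  while (changed && steps < max_steps) {
    changed = false;
    for (int i = 0; i < n_users; ++i) {
      if (add.lhs != setcc_operands[i]) {  // this user wants its operand first
        std::swap(add.lhs, add.rhs);
        changed = true;
        ++steps;
      }
    }
  }
  return steps;
}
```

With a single user the rewrite fires at most once. With two users that compare against different operands, each one undoes the other's commute, so the count keeps growing until the cap is reached.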

liaolucy added inline comments. · Feb 22 2023, 4:26 AM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
8923 ↗	(On Diff #498742)	Thanks, Craig. I couldn't think of a good way to solve this problem, so I will remove this code. I probably need to run benchmarks to show whether this patch is really needed.

A+B u< A gives the same result as A+B u< B.
There is no regression now, and the change should be logically sound.
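
The equivalence of the two compares can be checked exhaustively for a small width. This is a minimal sketch over 8-bit values; `overflow_checks_agree` is a name invented here.

```cpp
#include <cstdint>

// Check, for every pair of 8-bit inputs, that both unsigned-overflow idioms
// match the true carry-out: (a + b) u< a and (a + b) u< b.
bool overflow_checks_agree() {
  for (unsigned a = 0; a < 256; ++a) {
    for (unsigned b = 0; b < 256; ++b) {
      uint8_t sum = static_cast<uint8_t>(a + b);  // wrapped 8-bit sum
      bool vs_a = sum < a;
      bool vs_b = sum < b;
      bool wide = (a + b) > 255;  // overflow computed without wrapping
      if (vs_a != wide || vs_b != wide)
        return false;
    }
  }
  return true;
}
```

The same argument holds at any width: on overflow the wrapped sum is smaller than both operands, and otherwise it is at least as large as each of them.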

I have a completely different proposal I will post shortly.

Harbormaster completed remote builds in B215419: Diff 499713. · Feb 22 2023, 9:45 PM

liaolucy mentioned this in D144614: [LegalizeTypes][RISCV] Add a special case to ExpandIntRes_UADDSUBO for (uaddo X, 1). · Feb 23 2023, 5:01 AM

I found that i8 and i16 generate better code without the overflow op, so I kept this patch and now return false when the type is i8 or i16.

When the overflow op is promoted, PromoteIntRes_UADDSUBO uses ZExtPromotedInteger for the LHS and RHS.
When a plain add is promoted, PromoteIntRes_SimpleIntBinOp uses GetPromotedInteger for the LHS and RHS.
The overflow form therefore needs an extra zero-extension instruction.
If there is a better way, I am willing to learn.
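
The difference can be modelled in plain C++ by treating a promoted i8 as a 32-bit register whose upper bits are unspecified. This is only a sketch of the arithmetic, not of what the legalizer functions literally do; both function names are invented for illustration.

```cpp
#include <cstdint>

// Plain add: garbage above bit 7 cannot change the low 8 bits of the sum,
// so the inputs need no extension before the add.
uint8_t add_i8_promoted(uint32_t a_reg, uint32_t b_reg) {
  return static_cast<uint8_t>(a_reg + b_reg);
}

// Overflow-producing add: the carry out of bit 7 is only meaningful in the
// wide sum if both inputs were zero-extended first, hence the extra masking.
bool uaddo_i8_promoted(uint32_t a_reg, uint32_t b_reg, uint8_t &sum) {
  uint32_t a = a_reg & 0xFFu;  // zero-extend LHS
  uint32_t b = b_reg & 0xFFu;  // zero-extend RHS
  uint32_t wide = a + b;
  sum = static_cast<uint8_t>(wide);
  return wide != (wide & 0xFFu);  // overflow iff the sum does not fit in 8 bits
}
```

In the plain-add form the overflow check is a separate compare, which needs its own extension of the sum and the compared operand but can skip extending the other input.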

Harbormaster completed remote builds in B218069: Diff 503325. · Mar 8 2023, 6:15 AM

I'm not sure the tests here are sufficient to motivate this change. They are "noncanonical", having the 1 on the LHS of the add instead of the RHS, and they only test the +1 case. Does shouldFormOverflowOp only apply to +1?

Add a test case where neither the LHS nor the RHS is an immediate; it saves one 'and' instruction.

Harbormaster completed remote builds in B218260: Diff 503585. · Mar 8 2023, 7:01 PM

craig.topper added inline comments. · Mar 8 2023, 8:38 PM

llvm/lib/Target/RISCV/RISCVISelLowering.h
492	What if we did
  if (VT == MVT::i8 || VT == MVT::i16)
    return false;
  return TargetLowering::shouldFormOverflowOp(Opcode, VT, MathUsed);
That would match how you've described this patch.
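
The shape of that suggestion can be tried outside of LLVM with stand-in types. In this sketch `SimpleVT`, `BaseLowering`, and `RISCVLikeLowering` are hypothetical, and the base-class policy is a placeholder rather than LLVM's actual default.

```cpp
// Hypothetical stand-ins for LLVM types so the sketch compiles on its own.
enum class SimpleVT { i8, i16, i32, i64 };

struct BaseLowering {
  virtual ~BaseLowering() = default;
  // Placeholder policy only; the real default has more conditions.
  virtual bool shouldFormOverflowOp(unsigned /*Opcode*/, SimpleVT /*VT*/,
                                    bool MathUsed) const {
    return MathUsed;
  }
};

struct RISCVLikeLowering : BaseLowering {
  bool shouldFormOverflowOp(unsigned Opcode, SimpleVT VT,
                            bool MathUsed) const override {
    // Narrow types are promoted, and the overflow form costs an extra
    // zero extension, so opt out for them.
    if (VT == SimpleVT::i8 || VT == SimpleVT::i16)
      return false;
    // Every other type defers to the generic checks.
    return BaseLowering::shouldFormOverflowOp(Opcode, VT, MathUsed);
  }
};
```

Deferring to the base class keeps the target override small: only the i8 and i16 decision is target-specific, and any later change to the generic checks is picked up automatically.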

Address craig.topper's comments.
Address jrtc27's comments. https://reviews.llvm.org/rG42a5dda553e8

Harbormaster completed remote builds in B218295: Diff 503624. · Mar 8 2023, 10:51 PM

LGTM

This revision is now accepted and ready to land. · Mar 13 2023, 8:34 PM

Herald added a subscriber: jobnoorman. · Mar 13 2023, 8:34 PM

Closed by commit rGeb54254b6e09: [RISCV] Return false from shouldFormOverflowOp when type is i8 and i16 (authored by liaolucy). · Mar 14 2023, 5:43 AM

This revision was automatically updated to reflect the committed changes.

liaolucy added a commit: rGeb54254b6e09: [RISCV] Return false from shouldFormOverflowOp when type is i8 and i16.

Diff 505054

llvm/lib/Target/RISCV/RISCVISelLowering.h

[... 481 lines not shown ...]
   EmitInstrWithCustomInserter(MachineInstr &MI,
                               MachineBasicBlock *BB) const override;

   void AdjustInstrPostInstrSelection(MachineInstr &MI,
                                      SDNode *Node) const override;

   EVT getSetCCResultType(const DataLayout &DL, LLVMContext &Context,
                          EVT VT) const override;

+  bool shouldFormOverflowOp(unsigned Opcode, EVT VT,
+                            bool MathUsed) const override {
+    if (VT == MVT::i8 || VT == MVT::i16)
+      return false;
+
+    return TargetLowering::shouldFormOverflowOp(Opcode, VT, MathUsed);
+  }
+
   bool convertSetCCLogicToBitwiseLogic(EVT VT) const override {
     return VT.isScalarInteger();
   }
   bool convertSelectOfConstantsToMath(EVT VT) const override { return true; }

   bool preferZeroCompareBranch() const override { return true; }

   bool shouldInsertFencesForAtomic(const Instruction *I) const override {
[... 330 lines not shown ...]

llvm/test/CodeGen/RISCV/overflow-intrinsics.ll

[... 75 lines not shown ...]
 ; RV64-NEXT: .LBB1_2:
 ; RV64-NEXT: sd a0, 0(a2)
 ; RV64-NEXT: mv a0, a1
 ; RV64-NEXT: ret
   %add = add i64 %b, %a
   %cmp = icmp ult i64 %add, %a
   %Q = select i1 %cmp, i64 %b, i64 42
   store i64 %add, ptr %res
   ret i64 %Q
 }

 define i64 @uaddo2_overflow_used(i64 %a, i64 %b) nounwind ssp {
 ; RV32-LABEL: uaddo2_overflow_used:
 ; RV32: # %bb.0:
 ; RV32-NEXT: add a1, a3, a1
 ; RV32-NEXT: add a0, a2, a0
 ; RV32-NEXT: sltu a0, a0, a2
[... 445 lines not shown ...]
 ; RV64-NEXT: ret
   %ov = icmp eq i64 %a, 0
   store i64 %a, ptr %p
   ret i1 %ov
 }

 define i1 @uaddo_i8_increment_noncanonical_1(i8 %x, ptr %p) {
 ; RV32-LABEL: uaddo_i8_increment_noncanonical_1:
 ; RV32: # %bb.0:
-; RV32-NEXT: andi a0, a0, 255
 ; RV32-NEXT: addi a2, a0, 1
 ; RV32-NEXT: andi a0, a2, 255
-; RV32-NEXT: xor a0, a0, a2
-; RV32-NEXT: snez a0, a0
+; RV32-NEXT: seqz a0, a0
 ; RV32-NEXT: sb a2, 0(a1)
 ; RV32-NEXT: ret
 ;
 ; RV64-LABEL: uaddo_i8_increment_noncanonical_1:
 ; RV64: # %bb.0:
-; RV64-NEXT: andi a0, a0, 255
-; RV64-NEXT: addi a2, a0, 1
+; RV64-NEXT: addiw a2, a0, 1
 ; RV64-NEXT: andi a0, a2, 255
-; RV64-NEXT: xor a0, a0, a2
-; RV64-NEXT: snez a0, a0
+; RV64-NEXT: seqz a0, a0
 ; RV64-NEXT: sb a2, 0(a1)
 ; RV64-NEXT: ret
   %a = add i8 1, %x ; commute
   %ov = icmp eq i8 %a, 0
   store i8 %a, ptr %p
   ret i1 %ov
 }

[... 15 lines not shown ...]
 ; RV64-NEXT: ret
   %ov = icmp eq i32 0, %a ; commute
   store i32 %a, ptr %p
   ret i1 %ov
 }

 define i1 @uaddo_i16_increment_noncanonical_3(i16 %x, ptr %p) {
 ; RV32-LABEL: uaddo_i16_increment_noncanonical_3:
 ; RV32: # %bb.0:
-; RV32-NEXT: lui a2, 16
-; RV32-NEXT: addi a2, a2, -1
-; RV32-NEXT: and a0, a0, a2
-; RV32-NEXT: addi a3, a0, 1
-; RV32-NEXT: and a2, a3, a2
-; RV32-NEXT: xor a2, a2, a3
-; RV32-NEXT: snez a0, a2
-; RV32-NEXT: sh a3, 0(a1)
+; RV32-NEXT: addi a2, a0, 1
+; RV32-NEXT: slli a0, a2, 16
+; RV32-NEXT: srli a0, a0, 16
+; RV32-NEXT: seqz a0, a0
+; RV32-NEXT: sh a2, 0(a1)
 ; RV32-NEXT: ret
 ;
 ; RV64-LABEL: uaddo_i16_increment_noncanonical_3:
 ; RV64: # %bb.0:
-; RV64-NEXT: lui a2, 16
-; RV64-NEXT: addiw a2, a2, -1
-; RV64-NEXT: and a0, a0, a2
-; RV64-NEXT: addi a3, a0, 1
-; RV64-NEXT: and a2, a3, a2
-; RV64-NEXT: xor a2, a2, a3
-; RV64-NEXT: snez a0, a2
-; RV64-NEXT: sh a3, 0(a1)
+; RV64-NEXT: addiw a2, a0, 1
+; RV64-NEXT: slli a0, a2, 48
+; RV64-NEXT: srli a0, a0, 48
+; RV64-NEXT: seqz a0, a0
+; RV64-NEXT: sh a2, 0(a1)
 ; RV64-NEXT: ret
   %a = add i16 1, %x ; commute
   %ov = icmp eq i16 0, %a ; commute
   store i16 %a, ptr %p
   ret i1 %ov
 }

 ; The overflow check may be against the input rather than the sum.
[... 656 lines not shown ...]
 true:
   %svalue = add i64 %key, -1
   store i64 %svalue, ptr %p64
   br label %exit

 exit:
   ret void
 }

-define i16 @overflow_not_used(i16 %a, i16 %b, ptr %res) nounwind ssp {
+define i16 @overflow_not_used(i16 %a, i16 %b, ptr %res) {
 ; RV32-LABEL: overflow_not_used:
 ; RV32: # %bb.0:
 ; RV32-NEXT: lui a3, 16
 ; RV32-NEXT: addi a3, a3, -1
-; RV32-NEXT: and a0, a0, a3
 ; RV32-NEXT: and a4, a1, a3
-; RV32-NEXT: add a0, a4, a0
+; RV32-NEXT: add a0, a1, a0
 ; RV32-NEXT: and a3, a0, a3
-; RV32-NEXT: bne a3, a0, .LBB37_2
+; RV32-NEXT: bltu a3, a4, .LBB37_2
 ; RV32-NEXT: # %bb.1:
 ; RV32-NEXT: li a1, 42
 ; RV32-NEXT: .LBB37_2:
 ; RV32-NEXT: sh a0, 0(a2)
 ; RV32-NEXT: mv a0, a1
 ; RV32-NEXT: ret
 ;
 ; RV64-LABEL: overflow_not_used:
 ; RV64: # %bb.0:
 ; RV64-NEXT: lui a3, 16
 ; RV64-NEXT: addiw a3, a3, -1
-; RV64-NEXT: and a0, a0, a3
 ; RV64-NEXT: and a4, a1, a3
-; RV64-NEXT: add a0, a4, a0
+; RV64-NEXT: add a0, a1, a0
 ; RV64-NEXT: and a3, a0, a3
-; RV64-NEXT: bne a3, a0, .LBB37_2
+; RV64-NEXT: bltu a3, a4, .LBB37_2
 ; RV64-NEXT: # %bb.1:
 ; RV64-NEXT: li a1, 42
 ; RV64-NEXT: .LBB37_2:
 ; RV64-NEXT: sh a0, 0(a2)
 ; RV64-NEXT: mv a0, a1
 ; RV64-NEXT: ret
   %add = add i16 %b, %a
   %cmp = icmp ult i16 %add, %b
   %Q = select i1 %cmp, i16 %b, i16 42
   store i16 %add, ptr %res
   ret i16 %Q
 }

This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Return false from shouldFormOverflowOp when type is i8 and i16
Closed · Public
