This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Return false from shouldFormOverflowOp when type is i8 and i16
ClosedPublic

Authored by liaolucy on Feb 9 2023, 6:45 AM.

Details

Summary

i8 and i16 are not using overflow.
Reduce the number of zero extension instructions.

To reduce the uncertainty of the unknown,
most of the checks of the virtual function are kept

Diff Detail

Event Timeline

liaolucy created this revision.Feb 9 2023, 6:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 9 2023, 6:45 AM
liaolucy requested review of this revision.Feb 9 2023, 6:45 AM

The regression in D142071 was for i64 on rv32. Do we need to disable this completely or only for that case?

liaolucy updated this revision to Diff 496420.Feb 10 2023, 4:07 AM
liaolucy edited the summary of this revision. (Show Details)

rebase and add more testcases

liaolucy added inline comments.Feb 10 2023, 4:13 AM
llvm/test/CodeGen/RISCV/overflow-intrinsics.ll
84

This can be optimized in riscv DAG combine. I'll write another patch.
to:

%add = add i64 %b, %a
%cmp = icmp ult i64 %b, 0
%Q = select i1 %cmp, i64 %b, i64 42
store i64 %add, ptr %res

The regression in D142071 was for i64 on rv32. Do we need to disable this completely or only for that case?

I added more examples and it looks like there are optimizations on some rv64

craig.topper added inline comments.Feb 12 2023, 4:42 PM
llvm/test/CodeGen/RISCV/overflow-intrinsics.ll
84

That doesn't look right.

%cmp = icmp ult i64 %b, 0

is always false. There is no value of %b that can less than 0 when treated as unsigned. 0 is the smallest value.

liaolucy updated this revision to Diff 498742.Feb 19 2023, 11:57 PM

Solving the regression of uaddo1_math_overflow_used.

the regression testcase.

define i64 @uaddo1_math_overflow_used(i64 %a, i64 %b, ptr %res) nounwind ssp {
  %add = add i64 %b, %a
  %cmp = icmp ult i64 %add, %a ----------%a
  %Q = select i1 %cmp, i64 %b, i64 42
  store i64 %add, ptr %res
  ret i64 %Q
}

No regression, the testcase from the following of file

define i64 @uaddo2_math_overflow_used(i64 %a, i64 %b, ptr %res) nounwind ssp {
  %add = add i64 %b, %a
  %cmp = icmp ult i64 %add, %b -----------%b
  %Q = select i1 %cmp, i64 %b, i64 42
  store i64 %add, ptr %res
  ret i64 %Q
}

So I think it is good to modify it to:

define i64 @uaddo1_math_overflow_used(i64 %a, i64 %b, ptr %res) nounwind ssp {
  %add = add i64 %a, %b    --------a and b exchange positions
  %cmp = icmp ult i64 %add, %a 
  %Q = select i1 %cmp, i64 %b, i64 42
  store i64 %add, ptr %res
  ret i64 %Q
}
craig.topper added inline comments.Feb 20 2023, 4:54 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
8931 ↗(On Diff #498742)

This replaces N0, but the variable N0 still escapes into the following code pointing at the original ADD.

craig.topper added inline comments.Feb 20 2023, 4:58 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
8923 ↗(On Diff #498742)

I don't understand this criteria. Just looking at the operands of the add is arbitrary. In your example, it seems like the select is important, but you don't check for that.

liaolucy added inline comments.Feb 20 2023, 10:38 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
8923 ↗(On Diff #498742)

It should have nothing to do with the select instruction.
In fact selectDAG is just one more instruction than before: %10:gpr = SLTU %6:gpr, %0:gpr
IR Dump After Machine code sinking (machine-sink) : generating redundant branch instruction + copy SLTU

when add 64 is expanded, it is compared with the left side by default.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp#L3017

Optimized lowered selection DAG: %bb.0 
'uaddo1_math_overflow_used:'
SelectionDAG has 29 nodes:
  t0: ch,glue = EntryToken
    t2: i32,ch = CopyFromReg t0, Register:i32 %0
    t4: i32,ch = CopyFromReg t0, Register:i32 %1
  t11: i64 = build_pair t2, t4
    t6: i32,ch = CopyFromReg t0, Register:i32 %2
    t8: i32,ch = CopyFromReg t0, Register:i32 %3
  t12: i64 = build_pair t6, t8
  t13: i64 = add t12, t11
    t15: i1 = setcc t13, t11, setult:ch
  t17: i64 = select t15, t12, Constant:i64<42>
      t10: i32,ch = CopyFromReg t0, Register:i32 %4
    t20: ch = store<(store (s64) into %ir.res)> t0, t13, t10, undef:i32
    t23: i32 = extract_element t17, Constant:i32<0>
  t25: ch,glue = CopyToReg t20, Register:i32 $x10, t23
    t22: i32 = extract_element t17, Constant:i32<1>
  t27: ch,glue = CopyToReg t25, Register:i32 $x11, t22, t25:1
  t28: ch = RISCVISD::RET_FLAG t27, Register:i32 $x10, Register:i32 $x11, t27:1


Type-legalized selection DAG: %bb.0 'uaddo1_math_overflow_used:'
SelectionDAG has 38 nodes:
  t0: ch,glue = EntryToken
  t2: i32,ch = CopyFromReg t0, Register:i32 %0
  t4: i32,ch = CopyFromReg t0, Register:i32 %1
  t6: i32,ch = CopyFromReg t0, Register:i32 %2
  t8: i32,ch = CopyFromReg t0, Register:i32 %3
  t10: i32,ch = CopyFromReg t0, Register:i32 %4
      t44: ch = store<(store (s32) into %ir.res, align 8)> t0, t30, t10, undef:i32
        t46: i32 = add nuw t10, Constant:i32<4>
      t47: ch = store<(store (s32) into %ir.res + 4, basealign 8)> t0, t33, t46, undef:i32
    t48: ch = TokenFactor t44, t47
    t35: i32 = select t38, t6, Constant:i32<42>
  t25: ch,glue = CopyToReg t48, Register:i32 $x10, t35
    t36: i32 = select t38, t8, Constant:i32<0>
  t27: ch,glue = CopyToReg t25, Register:i32 $x11, t36, t25:1
  t30: i32 = add t6, t2
    t31: i32 = add t8, t4
    t32: i32 = setcc t30, t6, setult:ch -------- if change to :  t32: i32 = setcc t30, t2, setult:ch
  t33: i32 = add t31, t32
      t42: i32 = setcc t33, t4, seteq:ch
      t39: i32 = setcc t30, t2, setult:ch. -------  t32 and t42 are the same, t32/t42 can be delete.
      t40: i32 = setcc t33, t4, setult:ch
    t43: i32 = select t42, t39, t40
  t38: i32 = and t43, Constant:i32<1>
  t28: ch = RISCVISD::RET_FLAG t27, Register:i32 $x10, Register:i32 $x11, t27:1
craig.topper added inline comments.Feb 20 2023, 11:06 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
8923 ↗(On Diff #498742)

Ok I understand now. I'm concerned there aren't enough checks here to prevent infinite loops. If there are two setccs that both use the add but compare to a different operand, we'll infinite loop commuting the add.

liaolucy added inline comments.Feb 22 2023, 4:26 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
8923 ↗(On Diff #498742)

Thanks Craig.
I didn't think of a good way to solve this problem and I will remove this code. Probably need to test benchmark to prove if this patch is really needed.

liaolucy updated this revision to Diff 499713.Feb 22 2023, 7:46 PM

A+B u< A is the same result as A+B u< B.
There is no regression now, and there should be no logical problem.

I have a completely different proposal I will post shortly.

liaolucy updated this revision to Diff 503325.Mar 8 2023, 4:49 AM
liaolucy retitled this revision from [RISCV] Return false from shouldFormOverflowOp to [RISCV] Return false from shouldFormOverflowOp when type is i8 and i16.
liaolucy edited the summary of this revision. (Show Details)

I found that i8 and i16 are better without overflow, so I kept it here and returned false when the data type is i8 and i16

overflow promote, LHS and RHS use ZExtPromotedInteger in PromoteIntRes_UADDSUBO.
add promote, LHS and RHS GetPromotedInteger int PromoteIntRes_SimpleIntBinOp.
Therefore the overflow has an extra zero extension instruction.
If there is another better way, I am willing to learn

I'm not sure the tests are here are sufficient to motivate this change. They "noncanonical" having the 1 on the LHS of an add instead of the RHS. And they only test the +1 case. Does shouldFormOverflowOp only apply to +1?

liaolucy updated this revision to Diff 503585.Mar 8 2023, 6:05 PM

Add a testcase where both the LHS and RHS are not immediate, and it reduces one 'and' instruction.

craig.topper added inline comments.Mar 8 2023, 8:38 PM
llvm/lib/Target/RISCV/RISCVISelLowering.h
492

What if we did

if (VT == MVT::i8 || VT == MVT::i16)
  return false;

return TargetLowering::shouldFormOverflowOp(Opcode, VT, MathUsed);

That would match how you've described this patch.

liaolucy updated this revision to Diff 503624.Mar 8 2023, 10:04 PM

Address craig.topper's comments.
Address jrtc27's comments. https://reviews.llvm.org/rG42a5dda553e8

This revision is now accepted and ready to land.Mar 13 2023, 8:34 PM