`ISD::SADDO` uses the suggested sequence described in the section §2.4 of the RISCV Spec v2.2. `ISD::SSUBO` uses the dual approach but checking for positive.

# Details

# Diff Detail

- Repository
- rL LLVM

### Event Timeline

It would probably make sense to change SelectionDAGLegalize::ExpandNode to use this lowering for SADDO/SSUBO, instead of making this target-specific. The target-independent version uses approximately the same operations anyway, just in a less efficient way.

The lowering for UADDO/USUBO appears to be essentially identical to the lowering in SelectionDAGLegalize::ExpandNode, so there isn't really any point.

lib/Target/RISCV/RISCVISelLowering.cpp | ||
---|---|---|

553 ↗ | (On Diff #150448) | The XOR here is a bit weird; could you just use SETGE instead? |

ChangeLog:

- Remove RISC-V dependent expansions
- Simplify current target-independent expansion of
`S{ADD,SUB}O`. Not considering`U{ADD,SUB}O`anymore. - Update tests that saw codegen changes after this

For now marking this as WIP because I want to test this some more.

Also test `CodeGen/AMDGPU/saddo.ll` needs updating but I'm rather clueless when it comes to that target.

The changes for vector arithmetic look nice.

llvm/test/CodeGen/RISCV/arith-with-overflow.ll | ||
---|---|---|

34 ↗ | (On Diff #213056) | It's probably worth adding a testcase where the i1 result is used as the operand to a branch, to show we correctly fold the xor into the branch. (See also https://bugs.llvm.org/show_bug.cgi?id=42876 .) |

ChangeLog:

- Update AMDGPU test.
- Add RISC-V tests to show that we fold the xor in the branch.

one minor - @efriedma / @lebedev.ri anything to add?

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp | ||
---|---|---|

6494 ↗ | (On Diff #213359) | Not sure if its necessary, but you could merge the comments and just have: SDValue ConditionRHS = DAG.getSetCC(dl, OType, RHS, Zero, IsAdd ? ISD::SETLT : ISD::SETGT); |

Correct in general case:

---------------------------------------- Name: sadd %t = sadd_overflow i8 %LHS, %RHS %v0 = extractvalue {i8, i1} %t, 0 %v1 = extractvalue {i8, i1} %t, 1 => %t = sadd_overflow i8 %LHS, %RHS %v0 = add i8 %LHS, %RHS %ResultLowerThanLHS = icmp slt i8 %v0, %LHS %ConditionRHS = icmp slt i8 %RHS, 0 %v1 = xor i1 %ConditionRHS, %ResultLowerThanLHS Done: 1 Optimization is correct! ---------------------------------------- Name: ssub %t = ssub_overflow i8 %LHS, %RHS %v0 = extractvalue {i8, i1} %t, 0 %v1 = extractvalue {i8, i1} %t, 1 => %t = ssub_overflow i8 %LHS, %RHS %v0 = sub i8 %LHS, %RHS %ResultLowerThanLHS = icmp slt i8 %v0, %LHS %ConditionRHS = icmp sgt i8 %RHS, 0 %v1 = xor i1 %ConditionRHS, %ResultLowerThanLHS Done: 1 Optimization is correct!

Invalid for `undef`:

---------------------------------------- Name: sadd %t = sadd_overflow i8 %LHS, %RHS %v0 = extractvalue {i8, i1} %t, 0 %v1 = extractvalue {i8, i1} %t, 1 => %t = sadd_overflow i8 %LHS, %RHS %v0 = add i8 %LHS, %RHS %ResultLowerThanLHS = icmp slt i8 %v0, %LHS %ConditionRHS = icmp slt i8 %RHS, 0 %v1 = xor i1 %ConditionRHS, %ResultLowerThanLHS ERROR: Value mismatch for i1 %v1 Example: i8 %LHS = #x00 (0) i8 %RHS = undef {i8, i1} %t = { #x00 (0), #x0 (0) } [based on undef value] i8 %v0 = undef i1 %ResultLowerThanLHS = undef i1 %ConditionRHS = undef Source value: #x0 (0) Target value: #x1 (1) ---------------------------------------- Name: ssub %t = ssub_overflow i8 %LHS, %RHS %v0 = extractvalue {i8, i1} %t, 0 %v1 = extractvalue {i8, i1} %t, 1 => %t = ssub_overflow i8 %LHS, %RHS %v0 = sub i8 %LHS, %RHS %ResultLowerThanLHS = icmp slt i8 %v0, %LHS %ConditionRHS = icmp sgt i8 %RHS, 0 %v1 = xor i1 %ConditionRHS, %ResultLowerThanLHS ERROR: Value mismatch for i1 %v1 Example: i8 %LHS = undef i8 %RHS = #x00 (0) {i8, i1} %t = { #x00 (0), #x0 (0) } [based on undef value] i8 %v0 = undef i1 %ResultLowerThanLHS = undef i1 %ConditionRHS = #x0 (0) Source value: #x0 (0) Target value: #x1 (1)

Though the current expansion is just as incorrect for `undef`:

---------------------------------------- Name: sadd %t = sadd_overflow i8 %LHS, %RHS %v0 = extractvalue {i8, i1} %t, 0 %v1 = extractvalue {i8, i1} %t, 1 => %t = sadd_overflow i8 %LHS, %RHS %v0 = add i8 %LHS, %RHS %LHSSign = icmp sge i8 %LHS, 0 %RHSSign = icmp sge i8 %RHS, 0 %SignsMatch = icmp eq i1 %LHSSign, %RHSSign %SumSign = icmp sge i8 %v0, 0 %SumSignNE = icmp ne i1 %LHSSign, %SumSign %v1 = and i1 %SignsMatch, %SumSignNE Done: 1 Optimization is correct! ---------------------------------------- Name: ssub %t = ssub_overflow i8 %LHS, %RHS %v0 = extractvalue {i8, i1} %t, 0 %v1 = extractvalue {i8, i1} %t, 1 => %t = ssub_overflow i8 %LHS, %RHS %v0 = sub i8 %LHS, %RHS %LHSSign = icmp sge i8 %LHS, 0 %RHSSign = icmp sge i8 %RHS, 0 %SignsMatch = icmp ne i1 %LHSSign, %RHSSign %SumSign = icmp sge i8 %v0, 0 %SumSignNE = icmp ne i1 %LHSSign, %SumSign %v1 = and i1 %SignsMatch, %SumSignNE Done: 1 Optimization is correct!

---------------------------------------- Name: sadd %t = sadd_overflow i8 %LHS, %RHS %v0 = extractvalue {i8, i1} %t, 0 %v1 = extractvalue {i8, i1} %t, 1 => %t = sadd_overflow i8 %LHS, %RHS %v0 = add i8 %LHS, %RHS %LHSSign = icmp sge i8 %LHS, 0 %RHSSign = icmp sge i8 %RHS, 0 %SignsMatch = icmp eq i1 %LHSSign, %RHSSign %SumSign = icmp sge i8 %v0, 0 %SumSignNE = icmp ne i1 %LHSSign, %SumSign %v1 = and i1 %SignsMatch, %SumSignNE ERROR: Value mismatch for i1 %v1 Example: i8 %LHS = #x00 (0) i8 %RHS = undef {i8, i1} %t = { #x00 (0), #x0 (0) } [based on undef value] i8 %v0 = undef i1 %LHSSign = #x1 (1) i1 %RHSSign = undef i1 %SignsMatch = undef i1 %SumSign = undef i1 %SumSignNE = undef Source value: #x0 (0) Target value: #x1 (1) ---------------------------------------- Name: ssub %t = ssub_overflow i8 %LHS, %RHS %v0 = extractvalue {i8, i1} %t, 0 %v1 = extractvalue {i8, i1} %t, 1 => %t = ssub_overflow i8 %LHS, %RHS %v0 = sub i8 %LHS, %RHS %LHSSign = icmp sge i8 %LHS, 0 %RHSSign = icmp sge i8 %RHS, 0 %SignsMatch = icmp ne i1 %LHSSign, %RHSSign %SumSign = icmp sge i8 %v0, 0 %SumSignNE = icmp ne i1 %LHSSign, %SumSign %v1 = and i1 %SignsMatch, %SumSignNE ERROR: Value mismatch for i1 %v1 Example: i8 %LHS = undef i8 %RHS = #x00 (0) {i8, i1} %t = { #x00 (0), #x0 (0) } [based on undef value] i8 %v0 = undef i1 %LHSSign = undef i1 %RHSSign = #x1 (1) i1 %SignsMatch = undef i1 %SumSign = undef i1 %SumSignNE = undef Source value: #x0 (0) Target value: #x1 (1)

So LG i guess..

Hmm, while there, just to point out the obvious, the another approach here would be to teach DAGCombine/`TargetLowering::SimplifySetCC()` about these folds.

That is explicitly one of a few valid reasons to add optimizations into backend 'instead' of middle-end.

ChangeLog:

- Combine if-else logic.
- Adjust subtraction comment to
`is (non-zero) positive`to avoid ambiguity.

Sorry, I'm not sure to understand your comment: do you mean, as an alternative, `SimplifySetCC` can be extended to simplify the original overflow detection?

Thanks for the clarification Roman.

unless there is a lot of interest from others on landing this, I'll look into your suggestion first.

IMHO this should land as-is, and setcc folds can be implemented additionally if there are other places where they would be useful. My rationale would be that it is better to directly perform a simpler lowering than a complex lowering that then gets optimized. (Basically: If you can reduce the size of the implementing code *and* get a better result, then I think we should always be doing that.)

Hi @nikic, thanks for your comments. It also makes sense to me to do this.

If there aren't any further comments I pland to land this in the next days.