ISD::SADDO uses the suggested sequence described in the section §2.4 of the RISCV Spec v2.2. ISD::SSUBO uses the dual approach but checking for positive.
Details
Diff Detail
Event Timeline
It would probably make sense to change SelectionDAGLegalize::ExpandNode to use this lowering for SADDO/SSUBO, instead of making this target-specific. The target-independent version uses approximately the same operations anyway, just in a less efficient way.
The lowering for UADDO/USUBO appears to be essentially identical to the lowering in SelectionDAGLegalize::ExpandNode, so there isn't really any point.
lib/Target/RISCV/RISCVISelLowering.cpp | ||
---|---|---|
553 | The XOR here is a bit weird; could you just use SETGE instead? |
ChangeLog:
- Remove RISC-V dependent expansions
- Simplify current target-independent expansion of S{ADD,SUB}O. Not considering U{ADD,SUB}O anymore.
- Update tests that saw codegen changes after this
For now marking this as WIP because I want to test this some more.
Also test CodeGen/AMDGPU/saddo.ll needs updating but I'm rather clueless when it comes to that target.
The changes for vector arithmetic look nice.
llvm/test/CodeGen/RISCV/arith-with-overflow.ll | ||
---|---|---|
34 ↗ | (On Diff #213056) | It's probably worth adding a testcase where the i1 result is used as the operand to a branch, to show we correctly fold the xor into the branch. (See also https://bugs.llvm.org/show_bug.cgi?id=42876 .) |
ChangeLog:
- Update AMDGPU test.
- Add RISC-V tests to show that we fold the xor in the branch.
one minor - @efriedma / @lebedev.ri anything to add?
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp | ||
---|---|---|
6494 ↗ | (On Diff #213359) | Not sure if its necessary, but you could merge the comments and just have: SDValue ConditionRHS = DAG.getSetCC(dl, OType, RHS, Zero, IsAdd ? ISD::SETLT : ISD::SETGT); |
Correct in general case:
---------------------------------------- Name: sadd %t = sadd_overflow i8 %LHS, %RHS %v0 = extractvalue {i8, i1} %t, 0 %v1 = extractvalue {i8, i1} %t, 1 => %t = sadd_overflow i8 %LHS, %RHS %v0 = add i8 %LHS, %RHS %ResultLowerThanLHS = icmp slt i8 %v0, %LHS %ConditionRHS = icmp slt i8 %RHS, 0 %v1 = xor i1 %ConditionRHS, %ResultLowerThanLHS Done: 1 Optimization is correct! ---------------------------------------- Name: ssub %t = ssub_overflow i8 %LHS, %RHS %v0 = extractvalue {i8, i1} %t, 0 %v1 = extractvalue {i8, i1} %t, 1 => %t = ssub_overflow i8 %LHS, %RHS %v0 = sub i8 %LHS, %RHS %ResultLowerThanLHS = icmp slt i8 %v0, %LHS %ConditionRHS = icmp sgt i8 %RHS, 0 %v1 = xor i1 %ConditionRHS, %ResultLowerThanLHS Done: 1 Optimization is correct!
Invalid for undef:
---------------------------------------- Name: sadd %t = sadd_overflow i8 %LHS, %RHS %v0 = extractvalue {i8, i1} %t, 0 %v1 = extractvalue {i8, i1} %t, 1 => %t = sadd_overflow i8 %LHS, %RHS %v0 = add i8 %LHS, %RHS %ResultLowerThanLHS = icmp slt i8 %v0, %LHS %ConditionRHS = icmp slt i8 %RHS, 0 %v1 = xor i1 %ConditionRHS, %ResultLowerThanLHS ERROR: Value mismatch for i1 %v1 Example: i8 %LHS = #x00 (0) i8 %RHS = undef {i8, i1} %t = { #x00 (0), #x0 (0) } [based on undef value] i8 %v0 = undef i1 %ResultLowerThanLHS = undef i1 %ConditionRHS = undef Source value: #x0 (0) Target value: #x1 (1) ---------------------------------------- Name: ssub %t = ssub_overflow i8 %LHS, %RHS %v0 = extractvalue {i8, i1} %t, 0 %v1 = extractvalue {i8, i1} %t, 1 => %t = ssub_overflow i8 %LHS, %RHS %v0 = sub i8 %LHS, %RHS %ResultLowerThanLHS = icmp slt i8 %v0, %LHS %ConditionRHS = icmp sgt i8 %RHS, 0 %v1 = xor i1 %ConditionRHS, %ResultLowerThanLHS ERROR: Value mismatch for i1 %v1 Example: i8 %LHS = undef i8 %RHS = #x00 (0) {i8, i1} %t = { #x00 (0), #x0 (0) } [based on undef value] i8 %v0 = undef i1 %ResultLowerThanLHS = undef i1 %ConditionRHS = #x0 (0) Source value: #x0 (0) Target value: #x1 (1)
Though the current expansion is just as incorrect for undef:
---------------------------------------- Name: sadd %t = sadd_overflow i8 %LHS, %RHS %v0 = extractvalue {i8, i1} %t, 0 %v1 = extractvalue {i8, i1} %t, 1 => %t = sadd_overflow i8 %LHS, %RHS %v0 = add i8 %LHS, %RHS %LHSSign = icmp sge i8 %LHS, 0 %RHSSign = icmp sge i8 %RHS, 0 %SignsMatch = icmp eq i1 %LHSSign, %RHSSign %SumSign = icmp sge i8 %v0, 0 %SumSignNE = icmp ne i1 %LHSSign, %SumSign %v1 = and i1 %SignsMatch, %SumSignNE Done: 1 Optimization is correct! ---------------------------------------- Name: ssub %t = ssub_overflow i8 %LHS, %RHS %v0 = extractvalue {i8, i1} %t, 0 %v1 = extractvalue {i8, i1} %t, 1 => %t = ssub_overflow i8 %LHS, %RHS %v0 = sub i8 %LHS, %RHS %LHSSign = icmp sge i8 %LHS, 0 %RHSSign = icmp sge i8 %RHS, 0 %SignsMatch = icmp ne i1 %LHSSign, %RHSSign %SumSign = icmp sge i8 %v0, 0 %SumSignNE = icmp ne i1 %LHSSign, %SumSign %v1 = and i1 %SignsMatch, %SumSignNE Done: 1 Optimization is correct!
---------------------------------------- Name: sadd %t = sadd_overflow i8 %LHS, %RHS %v0 = extractvalue {i8, i1} %t, 0 %v1 = extractvalue {i8, i1} %t, 1 => %t = sadd_overflow i8 %LHS, %RHS %v0 = add i8 %LHS, %RHS %LHSSign = icmp sge i8 %LHS, 0 %RHSSign = icmp sge i8 %RHS, 0 %SignsMatch = icmp eq i1 %LHSSign, %RHSSign %SumSign = icmp sge i8 %v0, 0 %SumSignNE = icmp ne i1 %LHSSign, %SumSign %v1 = and i1 %SignsMatch, %SumSignNE ERROR: Value mismatch for i1 %v1 Example: i8 %LHS = #x00 (0) i8 %RHS = undef {i8, i1} %t = { #x00 (0), #x0 (0) } [based on undef value] i8 %v0 = undef i1 %LHSSign = #x1 (1) i1 %RHSSign = undef i1 %SignsMatch = undef i1 %SumSign = undef i1 %SumSignNE = undef Source value: #x0 (0) Target value: #x1 (1) ---------------------------------------- Name: ssub %t = ssub_overflow i8 %LHS, %RHS %v0 = extractvalue {i8, i1} %t, 0 %v1 = extractvalue {i8, i1} %t, 1 => %t = ssub_overflow i8 %LHS, %RHS %v0 = sub i8 %LHS, %RHS %LHSSign = icmp sge i8 %LHS, 0 %RHSSign = icmp sge i8 %RHS, 0 %SignsMatch = icmp ne i1 %LHSSign, %RHSSign %SumSign = icmp sge i8 %v0, 0 %SumSignNE = icmp ne i1 %LHSSign, %SumSign %v1 = and i1 %SignsMatch, %SumSignNE ERROR: Value mismatch for i1 %v1 Example: i8 %LHS = undef i8 %RHS = #x00 (0) {i8, i1} %t = { #x00 (0), #x0 (0) } [based on undef value] i8 %v0 = undef i1 %LHSSign = undef i1 %RHSSign = #x1 (1) i1 %SignsMatch = undef i1 %SumSign = undef i1 %SumSignNE = undef Source value: #x0 (0) Target value: #x1 (1)
So LG i guess..
Hmm, while there, just to point out the obvious, the another approach here would be to teach DAGCombine/TargetLowering::SimplifySetCC() about these folds.
That is explicitly one of a few valid reasons to add optimizations into backend 'instead' of middle-end.
ChangeLog:
- Combine if-else logic.
- Adjust subtraction comment to is (non-zero) positive to avoid ambiguity.
Sorry, I'm not sure to understand your comment: do you mean, as an alternative, SimplifySetCC can be extended to simplify the original overflow detection?
Thanks for the clarification Roman.
unless there is a lot of interest from others on landing this, I'll look into your suggestion first.
IMHO this should land as-is, and setcc folds can be implemented additionally if there are other places where they would be useful. My rationale would be that it is better to directly perform a simpler lowering than a complex lowering that then gets optimized. (Basically: If you can reduce the size of the implementing code and get a better result, then I think we should always be doing that.)
Hi @nikic, thanks for your comments. It also makes sense to me to do this.
If there aren't any further comments I pland to land this in the next days.
The XOR here is a bit weird; could you just use SETGE instead?