This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Avoid converting (x or/xor const) + y to (x + y) + const if benefit is unclear
ClosedPublic

Authored by aqjune on Feb 15 2023, 10:05 AM.

Details

Summary

This patch resolves suboptimal code generation reported by https://github.com/llvm/llvm-project/issues/60571 .

DAGCombiner currently converts (x or/xor const) + y to (x + y) + const if this is valid.
However, if .. + const is broken down into a sequences of adds with carries, the benefit is not clear, introducing two more add(-with-carry) ops (total 6) in the case of the reported issue whereas the optimal sequence must only have 4 add(-with-carry)s.

This patch resolves this issue by allowing this conversion only when (1) .. + const is legal or promotable, or (2) const is a sign bit because it does not introduce more adds.

Diff Detail

Event Timeline

aqjune created this revision.Feb 15 2023, 10:05 AM
aqjune requested review of this revision.Feb 15 2023, 10:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 15 2023, 10:05 AM
RKSimon added inline comments.Feb 16 2023, 6:22 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2699

You might be able to do this as NoAddCarry = isMinSignedConstant(N0.getOperand(1));

RKSimon added inline comments.Feb 16 2023, 6:42 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2695–2705

Just because the ADD isn't legal doesn't mean we're going to end up with ADC instructions

aqjune updated this revision to Diff 498249.Feb 16 2023, 9:27 PM

Use isMinSignedConstant

aqjune marked an inline comment as done.Feb 16 2023, 9:31 PM
aqjune added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2695–2705

Could you elaborate a bit about this please?
If it means that a legal ADD can still be splitted into a chain of ADDS/ADCS/ADC, is there a way to find the bitwidth of addition that will not be splitted?

Would it be possible to optimize the ADDCARRY to the same result as without this fold? Similar to combineADDCARRYDiamond. I looked at the DAG that was being produced, but it's not obvious to me how it would be sensible combined to the same result as before.

I added the fold really to handle cases like this, which can often come up after lowering geps:

or x1, x1, #1
add x1, x1, x2
ldr x0, [x1]

Which can be transformed into

add x1, x1, x2
ldr x0, [x1, #1]

If the add+add is reassociated, it makes sense for the add+add-like-or to be reassociated. I have no objections to limiting the fold if we need to though.

Hi,

Would it be possible to optimize the ADDCARRY to the same result as without this fold? Similar to combineADDCARRYDiamond. I looked at the DAG that was being produced, but it's not obvious to me how it would be sensible combined to the same result as before.

I added the fold really to handle cases like this, which can often come up after lowering geps:

or x1, x1, #1
add x1, x1, x2
ldr x0, [x1]

Which can be transformed into

add x1, x1, x2
ldr x0, [x1, #1]

If the add+add is reassociated, it makes sense for the add+add-like-or to be reassociated. I have no objections to limiting the fold if we need to though.

I think the example still optimizes with my patch because the addition is a legal op, if I understand correctly.
I wrote a sample LLVM IR function that seems to be analogous to the example above:

define void @f(i64 %ofs, ptr %p) {
  %ofs_2 = shl i64 %ofs, 1   ; Make %ofs_2 have LSB set to zero; this makes the `or i64 .., 1` below is equivalent to `add i64 .., 1`.
  %ofs_3 = or i64 %ofs_2, 1
  %p2 = getelementptr i8, ptr %p, i64 %ofs_3
  store i64 10, ptr %p2
  ret void
}

Running llc --mtriple=arm64-unknown-unknown -mcpu=neoverse-n1 -O3 b.ll -o - results in:

add     x8, x1, x0, lsl #1
mov     w9, #10
stur    x9, [x8, #1]

Which correctly puts #1 inside the store instruction's address operand.
Would allowing this fold when the addition is a legal op be sufficient, which is currently this patch is doing?

RKSimon added inline comments.Feb 20 2023, 2:08 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2695–2705

I think TLI.getTypeToExpandTo should help you ?

aqjune added inline comments.Feb 24 2023, 10:11 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2695–2705

Hi, I tried calling TargetLowering's getTypeToExpandTo, but it hit this error in the function if the integers' bitwidths were too small:

llvm_unreachable("Type is not legal nor is it to be expanded!");

Such case did not happen in the unit test of this patch, but another unit test (llvm/test/CodeGen/X86/setcc-combine.ll).

I think that my current version also deals with the add's bitwidth type because N0.getValueType() is passed as the second argument of isOperationLegal. What do you think?

RKSimon added inline comments.Feb 26 2023, 5:15 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2695–2705

I haven't had time to look at this in detail - but my concern is types that will be promoted (i4 etc.) instead of expanded (i128) - won't your legality logic assume they will need ADC as well?

aqjune updated this revision to Diff 500592.Feb 26 2023, 9:08 AM

Allow transformation or types that are to be promoted

aqjune marked an inline comment as done.Feb 26 2023, 9:12 AM
aqjune added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2695–2705

Thanks. I updated the patch.
For the vector types I wasn't sure what this patch should do. Do you have any idea how they must be dealt with?

aqjune marked an inline comment as done.Mar 4 2023, 2:36 PM

ping

RKSimon accepted this revision.Mar 7 2023, 8:40 AM

LGTM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2695–2705

Nothing comes to mind - I think your change should be OK for vector types.

This revision is now accepted and ready to land.Mar 7 2023, 8:40 AM
aqjune added a comment.Mar 7 2023, 8:51 AM

Thanks..! :)

aqjune edited the summary of this revision. (Show Details)Mar 7 2023, 9:53 AM
This revision was landed with ongoing or failed builds.Mar 8 2023, 10:14 AM
This revision was automatically updated to reflect the committed changes.