This is an archive of the discontinued LLVM Phabricator instance.

Differential D144116

[DAGCombiner] Avoid converting (x or/xor const) + y to (x + y) + const if benefit is unclear
ClosedPublic

Authored by aqjune on Feb 15 2023, 10:05 AM.

Details

Reviewers

dmgreen
RKSimon
craig.topper

Commits

rGa66bc1c4a30c: [DAGCombiner] Avoid converting (x or/xor const) + y to (x + y) + const if…

Summary

This patch resolves suboptimal code generation reported by https://github.com/llvm/llvm-project/issues/60571.

DAGCombiner currently converts (x or/xor const) + y to (x + y) + const whenever this is valid.
However, if .. + const is broken down into a sequence of adds with carries, the benefit is unclear: in the reported issue the conversion introduces two more add(-with-carry) ops (6 in total), whereas the optimal sequence needs only 4 add(-with-carry)s.

This patch resolves the issue by allowing the conversion only when (1) .. + const is legal or promotable, or (2) const is a sign bit, since in these cases the conversion does not introduce more adds.
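
As a rough illustration of the two conditions above, the following self-contained C++ sketch models the decision outside of LLVM. The `TypeAction` enum and the `isSignBit` and `shouldReassociate` helpers are hypothetical stand-ins invented for this example, not LLVM APIs, and the constant is limited to 64 bits for simplicity.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for how type legalization treats the add's type.
enum class TypeAction { Legal, PromoteInteger, ExpandInteger };

// True if `c` has only the sign bit set for a `bits`-wide integer
// (1 <= bits <= 64), i.e. it is the minimum signed value of that width.
bool isSignBit(uint64_t c, unsigned bits) {
  assert(bits >= 1 && bits <= 64);
  return c == (uint64_t{1} << (bits - 1));
}

// Models the gating described in the summary: reassociate
// (x or/xor c) + y into (x + y) + c only if the add is not expected
// to be split into an add-with-carry chain, or if c is the sign bit.
bool shouldReassociate(TypeAction action, uint64_t c, unsigned bits) {
  return action == TypeAction::Legal ||
         action == TypeAction::PromoteInteger ||
         isSignBit(c, bits);
}
```

Under this model, a type that would be expanded (for example a 64-bit add on a 32-bit target) keeps the fold only for the sign-bit constant.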

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

aqjune created this revision. Feb 15 2023, 10:05 AM

Herald added a project: Restricted Project. Feb 15 2023, 10:05 AM

Herald added subscribers: ecnelises, steven.zhang, hiraditya.

aqjune requested review of this revision. Feb 15 2023, 10:05 AM

Herald added a project: Restricted Project. Feb 15 2023, 10:05 AM

Herald added a subscriber: llvm-commits.

aqjune added reviewers: dmgreen, RKSimon. Feb 15 2023, 10:47 AM

Harbormaster completed remote builds in B213928: Diff 497718. Feb 15 2023, 10:51 AM

RKSimon added inline comments. Feb 16 2023, 6:22 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2699	You might be able to do this as NoAddCarry = isMinSignedConstant(N0.getOperand(1));

RKSimon added inline comments. Feb 16 2023, 6:42 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2695–2705	Just because the ADD isn't legal doesn't mean we're going to end up with ADC instructions

Use isMinSignedConstant
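
To illustrate why a sign-bit constant is treated as safe even when the add is split, here is a small standalone C++ sketch (not LLVM code). It models a 256-bit integer as four 64-bit limbs; `U256`, `add256`, and `xorSignBit` are names made up for this example. Every limb of the sign mask except the top one is zero, so adding the mask gives the same result as flipping the top bit of the top limb.

```cpp
#include <array>
#include <cstdint>

// 256-bit integer as four 64-bit limbs, least significant limb first.
using U256 = std::array<uint64_t, 4>;

// Limb-wise addition with carry propagation, modulo 2^256.
U256 add256(const U256 &a, const U256 &b) {
  U256 r{};
  uint64_t carry = 0;
  for (int i = 0; i < 4; ++i) {
    uint64_t s = a[i] + b[i];
    uint64_t c1 = s < a[i];    // carry out of a[i] + b[i]
    r[i] = s + carry;
    uint64_t c2 = r[i] < s;    // carry out of adding the incoming carry
    carry = c1 | c2;           // at most one of the two can be set
  }
  return r;
}

// Flipping the sign bit touches only the most significant limb.
U256 xorSignBit(U256 x) {
  x[3] ^= 0x8000000000000000ull;
  return x;
}
```

Adding the sign mask `{0, 0, 0, 0x8000000000000000}` with `add256` produces the same value as `xorSignBit`, and the three lower limbs pass through unchanged.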

aqjune marked an inline comment as done. Feb 16 2023, 9:31 PM

aqjune added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2695–2705	Could you elaborate a bit on this please? If it means that a legal ADD can still be split into a chain of ADDS/ADCS/ADC, is there a way to find the bitwidth of an addition that will not be split?

Harbormaster completed remote builds in B214324: Diff 498249. Feb 16 2023, 11:41 PM

dmgreen added a comment.

Would it be possible to optimize the ADDCARRY to the same result as without this fold, similar to combineADDCARRYDiamond? I looked at the DAG that was being produced, but it's not obvious to me how it could be sensibly combined to the same result as before.

I added the fold really to handle cases like this, which can often come up after lowering geps:

or x1, x1, #1
add x1, x1, x2
ldr x0, [x1]

Which can be transformed into

add x1, x1, x2
ldr x0, [x1, #1]

If the add+add is reassociated, it makes sense for the add+add-like-or to be reassociated. I have no objections to limiting the fold if we need to though.

Hi,

In D144116#4138209, @dmgreen wrote:
Would it be possible to optimize the ADDCARRY to the same result as without this fold, similar to combineADDCARRYDiamond? I looked at the DAG that was being produced, but it's not obvious to me how it could be sensibly combined to the same result as before.

I added the fold really to handle cases like this, which can often come up after lowering geps:
or x1, x1, #1
add x1, x1, x2
ldr x0, [x1]
Which can be transformed into
add x1, x1, x2
ldr x0, [x1, #1]
If the add+add is reassociated, it makes sense for the add+add-like-or to be reassociated. I have no objections to limiting the fold if we need to though.

I think the example still optimizes with my patch because the addition is a legal op, if I understand correctly.
I wrote a sample LLVM IR function that seems to be analogous to the example above:

define void @f(i64 %ofs, ptr %p) {
  %ofs_2 = shl i64 %ofs, 1   ; Clear the LSB of %ofs_2 so that the `or i64 .., 1` below is equivalent to `add i64 .., 1`.
  %ofs_3 = or i64 %ofs_2, 1
  %p2 = getelementptr i8, ptr %p, i64 %ofs_3
  store i64 10, ptr %p2
  ret void
}

Running llc --mtriple=arm64-unknown-unknown -mcpu=neoverse-n1 -O3 b.ll -o - results in:

add     x8, x1, x0, lsl #1
mov     w9, #10
stur    x9, [x8, #1]

Which correctly puts #1 inside the store instruction's address operand.
Would allowing this fold only when the addition is a legal op be sufficient? That is what this patch currently does.
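
The equivalence that the IR comment relies on can be checked with plain 64-bit arithmetic. This is a minimal C++ sketch, assuming hypothetical helpers `beforeFold` and `afterFold` that mirror the two forms of the address computation; it says nothing about which instructions a backend selects.

```cpp
#include <cassert>
#include <cstdint>

// Address as written in the IR: ((ofs << 1) | 1) + base.
uint64_t beforeFold(uint64_t ofs, uint64_t base) {
  uint64_t x = ofs << 1;   // LSB of x is zero
  return (x | 1) + base;
}

// Address after reassociation: ((ofs << 1) + base) + 1.
// Valid because x and the constant share no set bits, so `or` acts like `add`.
uint64_t afterFold(uint64_t ofs, uint64_t base) {
  uint64_t x = ofs << 1;
  assert((x & 1) == 0 && "or-with-1 must be add-like");
  return (x + base) + 1;
}
```

Both helpers return the same value for every input, including inputs where the unsigned arithmetic wraps around, which is why the constant can move into the addressing mode of the load or store.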

RKSimon added inline comments. Feb 20 2023, 2:08 PM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2695–2705	I think TLI.getTypeToExpandTo should help you ?

aqjune added inline comments. Feb 24 2023, 10:11 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2695–2705	Hi, I tried calling TargetLowering's `getTypeToExpandTo`, but it hit this error in the function when the integers' bitwidths were too small: llvm_unreachable("Type is not legal nor is it to be expanded!"); This case did not occur in this patch's unit test, but it did in another unit test (llvm/test/CodeGen/X86/setcc-combine.ll). I think that my current version also takes the add's bitwidth into account, because `N0.getValueType()` is passed as the second argument of `isOperationLegal`. What do you think?

RKSimon added inline comments. Feb 26 2023, 5:15 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2695–2705	I haven't had time to look at this in detail - but my concern is types that will be promoted (i4 etc.) instead of expanded (i128) - won't your legality logic assume they will need ADC as well?

Allow transformation for types that are to be promoted

aqjune marked an inline comment as done. Feb 26 2023, 9:12 AM

aqjune added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2695–2705	Thanks. I updated the patch. For the vector types I wasn't sure what this patch should do. Do you have any idea how they must be dealt with?

Harbormaster completed remote builds in B216083: Diff 500592. Feb 26 2023, 10:05 AM

ping

aqjune added a reviewer: craig.topper. Mar 7 2023, 8:22 AM

LGTM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2695–2705	Nothing comes to mind - I think your change should be OK for vector types.

This revision is now accepted and ready to land. Mar 7 2023, 8:40 AM

Thanks..! :)

aqjune edited the summary of this revision. Mar 7 2023, 9:53 AM

This revision was landed with ongoing or failed builds. Mar 8 2023, 10:14 AM

Closed by commit rGa66bc1c4a30c: [DAGCombiner] Avoid converting (x or/xor const) + y to (x + y) + const if… (authored by aqjune).

This revision was automatically updated to reflect the committed changes.

aqjune added a commit: rGa66bc1c4a30c: [DAGCombiner] Avoid converting (x or/xor const) + y to (x + y) + const if….

Revision Contents

Path                                             Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp    16 lines
llvm/test/CodeGen/AArch64/add-i256.ll            65 lines

Diff 503433

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

 SDValue DAGCombiner::visitADDLike(SDNode *N) {
   ...
   if (!reassociationCanBreakAddressingModePattern(ISD::ADD, DL, N, N0, N1)) {
     if (SDValue RADD = reassociateOps(ISD::ADD, DL, N0, N1, N->getFlags()))
       return RADD;

     // Reassociate (add (or x, c), y) -> (add add(x, y), c)) if (or x, c) is
     // equivalent to (add x, c).
     // Reassociate (add (xor x, c), y) -> (add add(x, y), c)) if (xor x, c) is
     // equivalent to (add x, c).
+    // Do this optimization only when adding c does not introduce instructions
+    // for adding carries.
     auto ReassociateAddOr = [&](SDValue N0, SDValue N1) {
       if (isADDLike(N0, DAG) && N0.hasOneUse() &&
           isConstantOrConstantVector(N0.getOperand(1), /* NoOpaque */ true)) {
-        return DAG.getNode(ISD::ADD, DL, VT,
+        // If N0's type does not split or is a sign mask, it does not introduce
+        // add carry.
+        auto TyActn = TLI.getTypeAction(*DAG.getContext(), N0.getValueType());
+        bool NoAddCarry = TyActn == TargetLoweringBase::TypeLegal ||
+                          TyActn == TargetLoweringBase::TypePromoteInteger ||
+                          isMinSignedConstant(N0.getOperand(1));
+        if (NoAddCarry)
+          return DAG.getNode(
+              ISD::ADD, DL, VT,
               DAG.getNode(ISD::ADD, DL, VT, N1, N0.getOperand(0)),
               N0.getOperand(1));
       }
       return SDValue();
     };
     if (SDValue Add = ReassociateAddOr(N0, N1))
       return Add;
     if (SDValue Add = ReassociateAddOr(N1, N0))
       return Add;
   ...

llvm/test/CodeGen/AArch64/add-i256.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mcpu=neoverse-n1 < %s | FileCheck %s
target triple = "aarch64-linux-unknown"

define void @add_i256(i64 %x0, i64 %x1, i64 %x2, i64 %x3, i64 %y1, i64 %y2, i64 %y3, i8* %store_addr_ptr) {
; CHECK-LABEL: add_i256:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: adds x8, x0, #1
; CHECK-NEXT: adcs x9, x1, x4
; CHECK-NEXT: stp x8, x9, [x7]
; CHECK-NEXT: adcs x8, x2, x5
; CHECK-NEXT: adc x9, x3, x6
; CHECK-NEXT: stp x8, x9, [x7, #16]
; CHECK-NEXT: ret
entry:
  ; Build x_256 = x0 | x1 << 64 | x2 << 128 | x3 << 192
  %temp = zext i64 %x0 to i256
  %temp57 = zext i64 %x1 to i256
  %temp58 = zext i64 %x2 to i256
  %temp59 = zext i64 %x3 to i256
  %temp_shifted = shl i256 %temp, 0
  %temp_shifted60 = shl i256 %temp57, 64
  %temp_shifted61 = shl i256 %temp58, 128
  %temp_shifted62 = shl i256 %temp59, 192
  %x = or i256 %temp_shifted, %temp_shifted60
  %x63 = or i256 %x, %temp_shifted61
  %x_big = or i256 %x63, %temp_shifted62

  ; Build y_256 = 1 | y1 << 64 | y2 << 128 | y3 << 192
  %temp65 = zext i64 %y1 to i256
  %temp66 = zext i64 %y2 to i256
  %temp67 = zext i64 %y3 to i256
  %temp_shifted68 = shl i256 %temp65, 64
  %temp_shifted69 = shl i256 %temp66, 128
  %temp_shifted70 = shl i256 %temp67, 192
  %y = or i256 1, %temp_shifted68
  %y71 = or i256 %y, %temp_shifted69
  %y_big = or i256 %y71, %temp_shifted70

  ; z_256 = x_256 + y_256
  %z_256 = add i256 %x_big, %y_big

  ; split z_256 into 4 64-bit registers
  %split_64bits = lshr i256 %z_256, 0
  %z0 = trunc i256 %split_64bits to i64
  %split_64bits74 = lshr i256 %z_256, 64
  %z1 = trunc i256 %split_64bits74 to i64
  %split_64bits76 = lshr i256 %z_256, 128
  %z2 = trunc i256 %split_64bits76 to i64
  %split_64bits78 = lshr i256 %z_256, 192
  %z3 = trunc i256 %split_64bits78 to i64

  %outptr0 = bitcast i8* %store_addr_ptr to i64*
  store i64 %z0, i64* %outptr0, align 4
  %gep = getelementptr i8, i8* %store_addr_ptr, i64 8
  %outptr1 = bitcast i8* %gep to i64*
  store i64 %z1, i64* %outptr1, align 4
  %store_addr_ofs = getelementptr i8, i8* %store_addr_ptr, i64 16
  %outptr081 = bitcast i8* %store_addr_ofs to i64*
  store i64 %z2, i64* %outptr081, align 4
  %gep82 = getelementptr i8, i8* %store_addr_ofs, i64 8
  %outptr183 = bitcast i8* %gep82 to i64*
  store i64 %z3, i64* %outptr183, align 4
  ret void
}