This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Improve v8.1-A code-gen for atomic load-subtract
Closed, Public

Authored by olista01 on Jan 24 2018, 6:57 AM.


Details

Reviewers

mcrosier
gberry
christof

Commits

rG426991730495: [AArch64] Improve v8.1-A code-gen for atomic load-subtract
rL324892: [AArch64] Improve v8.1-A code-gen for atomic load-subtract

Summary

Armv8.1-A added an atomic load-add instruction (LDADD, part of the LSE extension), but not a load-subtract
instruction. Our current code generation for atomic load-subtract therefore always
inserts a NEG instruction to negate its argument, even when the negation could be
folded into a constant or into another instruction.

This patch adds a custom lowering, run early in SelectionDAG, that converts an
atomic load-subtract into a subtraction from zero (a negation) followed by an
atomic load-add, so that the normal DAG optimisations can work on the negation.
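The rewrite is valid because fixed-width integer arithmetic wraps: subtracting V and adding (0 - V) leave the same bit pattern in memory modulo 2^N, and the returned old value is identical. A minimal sketch of that identity follows, assuming plain C++ std::atomic on a 32-bit value as a stand-in for the DAG nodes; load_sub, load_add_of_negation and the other helpers are hypothetical names, not LLVM code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Illustration only: plain C++ std::atomic, not LLVM's SelectionDAG.
// Checks the identity the lowering relies on: an atomic fetch-and-subtract
// of v has the same effect as an atomic fetch-and-add of (0 - v), because
// unsigned arithmetic wraps modulo 2^32.

struct RmwResult {
  std::uint32_t old_value;  // value returned by the read-modify-write
  std::uint32_t new_value;  // value left in memory afterwards
};

// The operation requested by the IR: atomicrmw sub.
RmwResult load_sub(std::uint32_t initial, std::uint32_t v) {
  std::atomic<std::uint32_t> mem{initial};
  std::uint32_t old = mem.fetch_sub(v, std::memory_order_seq_cst);
  return RmwResult{old, mem.load()};
}

// The rewritten form: negate the operand as (0 - v), then load-add it,
// analogous to ISD::SUB(0, RHS) feeding ISD::ATOMIC_LOAD_ADD.
RmwResult load_add_of_negation(std::uint32_t initial, std::uint32_t v) {
  std::atomic<std::uint32_t> mem{initial};
  std::uint32_t neg = 0u - v;  // e.g. v == 0xFFFFFFFF (-1) gives neg == 1
  std::uint32_t old = mem.fetch_add(neg, std::memory_order_seq_cst);
  return RmwResult{old, mem.load()};
}

// True when both forms return the same old value and leave the same new value.
bool rewrite_is_equivalent(std::uint32_t initial, std::uint32_t v) {
  RmwResult a = load_sub(initial, v);
  RmwResult b = load_add_of_negation(initial, v);
  return a.old_value == b.old_value && a.new_value == b.new_value;
}

// Spot checks, including a -1 operand like the one in the *_neg_imm tests.
void check_examples() {
  assert(rewrite_is_equivalent(10u, 3u));
  assert(rewrite_is_equivalent(0u, 0xFFFFFFFFu));
  assert(load_add_of_negation(10u, 0xFFFFFFFFu).new_value == 11u);  // adds 1
}
```

The sketch is single-threaded on purpose: it only checks the arithmetic identity, not the memory-ordering behaviour of LDADDAL.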

I've left the old TableGen patterns in because they are still needed for
GlobalISel.

Some of the tests in this patch are copied from D35375 by Chad Rosier (which was abandoned).

Diff Detail

Repository: rL LLVM

Event Timeline

olista01 created this revision. · Jan 24 2018, 6:57 AM

Herald added subscribers: kristof.beyls, javed.absar, rengolin, aemerson. · Jan 24 2018, 6:57 AM

olista01 added a child revision: D42478: [AArch64] Improve v8.1-A code-gen for atomic load-and. · Jan 24 2018, 6:59 AM

Ping.

Ping (also for the related D42478).

On the previous review (D35375), Geoff Berry suggested to do this in DAG Combine, to see whether further combines become possible, and suggested not limiting it to constants: https://reviews.llvm.org/D35375#810045

Did you consider those points?

lib/Target/AArch64/AArch64ISelLowering.cpp, line 7394 (on Diff #131261):
Is there no (easy) way to do this in tablegen? I would prefer a tablegen pattern over C code. Even though this C looks nice and tidy.

> Geoff Berry suggested to do this in DAG Combine

DAG combine runs after this lowering, so it already performs the optimisations we want: in the *_neg_imm test cases it folds the constant and the SUB into a different constant, and in the *_neg_arg test cases it folds the two SUBs away entirely.
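Because the negation is an ordinary ISD::SUB node, the existing combines can simplify it before instruction selection. Below is a toy C++ sketch of just the two folds described above, on a hypothetical expression type; Expr, fold_negation and the make_* helpers are illustrative names only, not LLVM's DAGCombiner API.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical toy expression type standing in for SelectionDAG nodes.
// It models only the two folds discussed in the review; not DAGCombiner code.
struct Expr {
  enum Kind { Const, Arg, Sub };
  Kind kind;
  std::int64_t value;              // meaningful only when kind == Const
  std::shared_ptr<Expr> lhs, rhs;  // meaningful only when kind == Sub
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr make_const(std::int64_t v) {
  return std::make_shared<Expr>(Expr{Expr::Const, v, nullptr, nullptr});
}
ExprPtr make_arg() {
  return std::make_shared<Expr>(Expr{Expr::Arg, 0, nullptr, nullptr});
}
ExprPtr make_sub(ExprPtr a, ExprPtr b) {
  return std::make_shared<Expr>(Expr{Expr::Sub, 0, a, b});
}

bool is_zero(const ExprPtr &e) {
  return e->kind == Expr::Const && e->value == 0;
}

// Simplifies the negation (sub 0, RHS) that the lowering inserts.
ExprPtr fold_negation(const ExprPtr &e) {
  if (e->kind != Expr::Sub || !is_zero(e->lhs))
    return e;
  const ExprPtr &rhs = e->rhs;
  // (sub 0, C) -> -C, negating modulo 2^64 to avoid signed overflow.
  // This is the *_neg_imm case: C == -1 becomes the constant 1.
  if (rhs->kind == Expr::Const)
    return make_const(static_cast<std::int64_t>(
        0u - static_cast<std::uint64_t>(rhs->value)));
  // (sub 0, (sub 0, X)) -> X: the *_neg_arg case, where both SUBs vanish.
  if (rhs->kind == Expr::Sub && is_zero(rhs->lhs))
    return rhs->rhs;
  return e;
}

// Usage: both folds from the review, plus a negation that must be kept.
void check_folds() {
  ExprPtr imm = fold_negation(make_sub(make_const(0), make_const(-1)));
  assert(imm->kind == Expr::Const && imm->value == 1);

  ExprPtr x = make_arg();
  ExprPtr twice = make_sub(make_const(0), make_sub(make_const(0), x));
  assert(fold_negation(twice) == x);

  ExprPtr once = make_sub(make_const(0), x);
  assert(fold_negation(once) == once);  // plain negation of an argument stays
}
```

The real combiner works on typed, fixed-width SDNodes and covers many more shapes; the sketch only mirrors the two that the *_neg_imm and *_neg_arg tests check (an `orr ..., #0x1` immediate, and the incoming argument register used directly).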

> Is there no (easy) way to do this in tablegen? I would prefer a tablegen pattern over C code. Even though this C looks nice and tidy.

That's what the current code does (I've left the patterns in for now, since GlobalISel still needs them). The problem is that the patterns are only applied at the end of the SelectionDAG pass, when MachineInstrs are emitted, so we would have to re-implement the folding of subtraction instructions at that level. The alternative would be to add extra DAG patterns that match when the operand is a negative immediate, a subtraction, etc., but I didn't do that because it would duplicate the optimisations DAGCombine already does, in a narrower context.

Thanks for the clarification. Looked at the tests and now see what this is doing.
The code looks good to me.
Thanks

This revision is now accepted and ready to land. · Feb 12 2018, 5:48 AM

Closed by commit rL324892: [AArch64] Improve v8.1-A code-gen for atomic load-subtract (authored by olista01). · Feb 12 2018, 6:23 AM

This revision was automatically updated to reflect the committed changes.

dtcxzyw mentioned this in D158673: [SDAG][RISCV] Avoid neg instructions when lowering atomic_load_sub with a constant rhs. · Aug 25 2023, 7:25 AM

Revision Contents

Path                                                     Size
llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.h      1 line
llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp    21 lines
llvm/trunk/test/CodeGen/AArch64/atomic-ops-lse.ll        112 lines

Diff 133848

llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.h

@@ … @@ private:
   SDValue LowerFP_ROUND(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerFP_TO_INT(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerINT_TO_FP(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerVectorAND(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerVectorOR(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerFSINCOS(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerVECREDUCE(SDValue Op, SelectionDAG &DAG) const;
+  SDValue LowerATOMIC_LOAD_SUB(SDValue Op, SelectionDAG &DAG) const;

   SDValue BuildSDIVPow2(SDNode *N, const APInt &Divisor, SelectionDAG &DAG,
                         std::vector<SDNode *> *Created) const override;
   SDValue getSqrtEstimate(SDValue Operand, SelectionDAG &DAG, int Enabled,
                           int &ExtraSteps, bool &UseOneConst,
                           bool Reciprocal) const override;
   SDValue getRecipEstimate(SDValue Operand, SelectionDAG &DAG, int Enabled,
                            int &ExtraSteps) const override;

llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp


@@ … @@ if (Subtarget->hasFullFP16()) {
     setOperationAction(ISD::FMAXNUM, MVT::f16, Legal);
     setOperationAction(ISD::FMINNAN, MVT::f16, Legal);
     setOperationAction(ISD::FMAXNAN, MVT::f16, Legal);
   }

   setOperationAction(ISD::PREFETCH, MVT::Other, Custom);

   setOperationAction(ISD::ATOMIC_CMP_SWAP, MVT::i128, Custom);
+  setOperationAction(ISD::ATOMIC_LOAD_SUB, MVT::i32, Custom);
+  setOperationAction(ISD::ATOMIC_LOAD_SUB, MVT::i64, Custom);

   // Lower READCYCLECOUNTER using an mrs from PMCCNTR_EL0.
   // This requires the Performance Monitors extension.
   if (Subtarget->hasPerfMon())
     setOperationAction(ISD::READCYCLECOUNTER, MVT::i64, Legal);

   if (getLibcallName(RTLIB::SINCOS_STRET_F32) != nullptr &&
       getLibcallName(RTLIB::SINCOS_STRET_F64) != nullptr) {
@@ … @@ SDValue AArch64TargetLowering::LowerOperation(SDValue Op,
   case ISD::VECREDUCE_ADD:
   case ISD::VECREDUCE_SMAX:
   case ISD::VECREDUCE_SMIN:
   case ISD::VECREDUCE_UMAX:
   case ISD::VECREDUCE_UMIN:
   case ISD::VECREDUCE_FMAX:
   case ISD::VECREDUCE_FMIN:
     return LowerVECREDUCE(Op, DAG);
+  case ISD::ATOMIC_LOAD_SUB:
+    return LowerATOMIC_LOAD_SUB(Op, DAG);
   }
 }

 //===----------------------------------------------------------------------===//
 // Calling Convention Implementation
 //===----------------------------------------------------------------------===//

 #include "AArch64GenCallingConv.inc"
@@ … @@ return DAG.getNode(
         DAG.getConstant(Intrinsic::aarch64_neon_fminnmv, dl, MVT::i32),
         Op.getOperand(0));
   }
   default:
     llvm_unreachable("Unhandled reduction");
   }
 }

+SDValue AArch64TargetLowering::LowerATOMIC_LOAD_SUB(SDValue Op,
+                                                    SelectionDAG &DAG) const {
+  auto &Subtarget = static_cast<const AArch64Subtarget &>(DAG.getSubtarget());
+  if (!Subtarget.hasLSE())
+    return SDValue();
+
+  // LSE has an atomic load-add instruction, but not a load-sub.
+  SDLoc dl(Op);
+  MVT VT = Op.getSimpleValueType();
+  SDValue RHS = Op.getOperand(2);
+  AtomicSDNode *AN = cast<AtomicSDNode>(Op.getNode());
+  RHS = DAG.getNode(ISD::SUB, dl, VT, DAG.getConstant(0, dl, VT), RHS);
+  return DAG.getAtomic(ISD::ATOMIC_LOAD_ADD, dl, AN->getMemoryVT(),
+                       Op.getOperand(0), Op.getOperand(1), RHS,
+                       AN->getMemOperand());
+}
+
 /// getTgtMemIntrinsic - Represent NEON load and store intrinsics as
 /// MemIntrinsicNodes. The associated MachineMemOperands record the alignment
 /// specified in the intrinsic calls.
 bool AArch64TargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
                                                const CallInst &I,
                                                MachineFunction &MF,
                                                unsigned Intrinsic) const {
   auto &DL = I.getModule()->getDataLayout();

llvm/trunk/test/CodeGen/AArch64/atomic-ops-lse.ll

@@ … @@
 ; CHECK: add x[[ADDR:[0-9]+]], [[TMPADDR]], {{#?}}:lo12:var64

 ; CHECK: ldaddal x[[NEG]], x[[NEW:[0-9]+]], [x[[ADDR]]]
 ; CHECK-NOT: dmb

   ret void
 }

+define i8 @test_atomic_load_sub_i8_neg_imm() nounwind {
+; CHECK-LABEL: test_atomic_load_sub_i8_neg_imm:
+  %old = atomicrmw sub i8* @var8, i8 -1 seq_cst
+
+; CHECK-NOT: dmb
+; CHECK: adrp [[TMPADDR:x[0-9]+]], var8
+; CHECK: add x[[ADDR:[0-9]+]], [[TMPADDR]], {{#?}}:lo12:var8
+; CHECK: orr w[[IMM:[0-9]+]], wzr, #0x1
+; CHECK: ldaddalb w[[IMM]], w[[NEW:[0-9]+]], [x[[ADDR]]]
+; CHECK-NOT: dmb
+
+  ret i8 %old
+}
+
+define i16 @test_atomic_load_sub_i16_neg_imm() nounwind {
+; CHECK-LABEL: test_atomic_load_sub_i16_neg_imm:
+  %old = atomicrmw sub i16* @var16, i16 -1 seq_cst
+
+; CHECK-NOT: dmb
+; CHECK: adrp [[TMPADDR:x[0-9]+]], var16
+; CHECK: add x[[ADDR:[0-9]+]], [[TMPADDR]], {{#?}}:lo12:var16
+; CHECK: orr w[[IMM:[0-9]+]], wzr, #0x1
+; CHECK: ldaddalh w[[IMM]], w[[NEW:[0-9]+]], [x[[ADDR]]]
+; CHECK-NOT: dmb
+
+  ret i16 %old
+}
+
+define i32 @test_atomic_load_sub_i32_neg_imm() nounwind {
+; CHECK-LABEL: test_atomic_load_sub_i32_neg_imm:
+  %old = atomicrmw sub i32* @var32, i32 -1 seq_cst
+
+; CHECK-NOT: dmb
+; CHECK: adrp [[TMPADDR:x[0-9]+]], var32
+; CHECK: add x[[ADDR:[0-9]+]], [[TMPADDR]], {{#?}}:lo12:var32
+; CHECK: orr w[[IMM:[0-9]+]], wzr, #0x1
+; CHECK: ldaddal w[[IMM]], w[[NEW:[0-9]+]], [x[[ADDR]]]
+; CHECK-NOT: dmb
+
+  ret i32 %old
+}
+
+define i64 @test_atomic_load_sub_i64_neg_imm() nounwind {
+; CHECK-LABEL: test_atomic_load_sub_i64_neg_imm:
+  %old = atomicrmw sub i64* @var64, i64 -1 seq_cst
+
+; CHECK-NOT: dmb
+; CHECK: adrp [[TMPADDR:x[0-9]+]], var64
+; CHECK: add x[[ADDR:[0-9]+]], [[TMPADDR]], {{#?}}:lo12:var64
+; CHECK: orr w[[IMM:[0-9]+]], wzr, #0x1
+; CHECK: ldaddal x[[IMM]], x[[NEW:[0-9]+]], [x[[ADDR]]]
+; CHECK-NOT: dmb
+
+  ret i64 %old
+}
+
+define i8 @test_atomic_load_sub_i8_neg_arg(i8 %offset) nounwind {
+; CHECK-LABEL: test_atomic_load_sub_i8_neg_arg:
+  %neg = sub i8 0, %offset
+  %old = atomicrmw sub i8* @var8, i8 %neg seq_cst
+
+; CHECK-NOT: dmb
+; CHECK: adrp [[TMPADDR:x[0-9]+]], var8
+; CHECK: add x[[ADDR:[0-9]+]], [[TMPADDR]], {{#?}}:lo12:var8
+; CHECK: ldaddalb w0, w[[NEW:[0-9]+]], [x[[ADDR]]]
+; CHECK-NOT: dmb
+
+  ret i8 %old
+}
+
+define i16 @test_atomic_load_sub_i16_neg_arg(i16 %offset) nounwind {
+; CHECK-LABEL: test_atomic_load_sub_i16_neg_arg:
+  %neg = sub i16 0, %offset
+  %old = atomicrmw sub i16* @var16, i16 %neg seq_cst
+
+; CHECK-NOT: dmb
+; CHECK: adrp [[TMPADDR:x[0-9]+]], var16
+; CHECK: add x[[ADDR:[0-9]+]], [[TMPADDR]], {{#?}}:lo12:var16
+; CHECK: ldaddalh w0, w[[NEW:[0-9]+]], [x[[ADDR]]]
+; CHECK-NOT: dmb
+
+  ret i16 %old
+}
+
+define i32 @test_atomic_load_sub_i32_neg_arg(i32 %offset) nounwind {
+; CHECK-LABEL: test_atomic_load_sub_i32_neg_arg:
+  %neg = sub i32 0, %offset
+  %old = atomicrmw sub i32* @var32, i32 %neg seq_cst
+
+; CHECK-NOT: dmb
+; CHECK: adrp [[TMPADDR:x[0-9]+]], var32
+; CHECK: add x[[ADDR:[0-9]+]], [[TMPADDR]], {{#?}}:lo12:var32
+; CHECK: ldaddal w0, w[[NEW:[0-9]+]], [x[[ADDR]]]
+; CHECK-NOT: dmb
+
+  ret i32 %old
+}
+
+define i64 @test_atomic_load_sub_i64_neg_arg(i64 %offset) nounwind {
+; CHECK-LABEL: test_atomic_load_sub_i64_neg_arg:
+  %neg = sub i64 0, %offset
+  %old = atomicrmw sub i64* @var64, i64 %neg seq_cst
+
+; CHECK-NOT: dmb
+; CHECK: adrp [[TMPADDR:x[0-9]+]], var64
+; CHECK: add x[[ADDR:[0-9]+]], [[TMPADDR]], {{#?}}:lo12:var64
+; CHECK: ldaddal x0, x[[NEW:[0-9]+]], [x[[ADDR]]]
+; CHECK-NOT: dmb
+
+  ret i64 %old
+}
+
 define i8 @test_atomic_load_and_i8(i8 %offset) nounwind {
 ; CHECK-LABEL: test_atomic_load_and_i8:
   %old = atomicrmw and i8* @var8, i8 %offset seq_cst
 ; CHECK-NOT: dmb
 ; CHECK: mvn w[[NOT:[0-9]+]], w[[OLD:[0-9]+]]
 ; CHECK: adrp [[TMPADDR:x[0-9]+]], var8
 ; CHECK: add x[[ADDR:[0-9]+]], [[TMPADDR]], {{#?}}:lo12:var8