This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Teach ComputeNumSignBitsForTargetNode about masked atomic intrinsics
ClosedPublic

Authored by asb on Jul 20 2022, 11:50 AM.

Details

Reviewers

craig.topper
jrtc27
reames

Commits

rG28f12a09ae63: [RISCV] Teach ComputeNumSignBitsForTargetNode about masked atomic intrinsics

Summary

[RISCV] Teach ComputeNumSignBitsForTargetNode about masked atomic intrinsics

An unnecessary sext.w is generated when masking the result of the
riscv_masked_cmpxchg_i64 intrinsic. Implementing handling of the
intrinsic in ComputeNumSignBitsForTargetNode allows it to be removed.

Although this isn't a particularly important optimisation, removing the
sext.w simplifies implementation of an additional cmpxchg-related
optimisation in D130192.

Although I can't produce a test with different codegen for the other
atomics intrinsics, these are added as well for completeness.
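
For readers unfamiliar with the term, here is a minimal standalone sketch (not LLVM code) of what "number of sign bits" means on RV64, and why a sext.w becomes redundant once a value is known to have at least 33 of them. The helper names numSignBits, sextW and foldSextW are made up for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Count how many of the top bits of a 64-bit value are copies of the sign
// bit. The sign bit itself counts, so the result is always in 1..64.
unsigned numSignBits(int64_t value) {
  uint64_t bits = static_cast<uint64_t>(value);
  uint64_t sign = bits >> 63;
  unsigned count = 1;
  for (int i = 62; i >= 0 && ((bits >> i) & 1) == sign; --i)
    ++count;
  return count;
}

// Model of RV64 sext.w: sign-extend the low 32 bits to 64 bits.
int64_t sextW(int64_t value) {
  uint32_t low = static_cast<uint32_t>(static_cast<uint64_t>(value));
  return static_cast<int64_t>(static_cast<int32_t>(low));
}

// Models the fold: sext.w(x) may be replaced by x, but only when x is
// already known to have at least 33 sign bits.
int64_t foldSextW(int64_t value) {
  assert(numSignBits(value) >= 33 && "fold needs at least 33 sign bits");
  return value;
}
```

Any value produced by sextW has at least 33 sign bits, so applying sextW to it a second time leaves it unchanged. That is the property the patch lets the DAG combiner see for the intrinsic's result.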

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

asb created this revision. Jul 20 2022, 11:50 AM

Herald added a project: Restricted Project. Jul 20 2022, 11:50 AM

Herald added subscribers: wingo, sunshaoce, pmatos and 30 others.

asb requested review of this revision. Jul 20 2022, 11:50 AM

Herald added subscribers: pcwang-thead, eopXD, MaskRay.

asb added a child revision: D130192: [RISCV] Avoid redundant branch-to-branch when expanding cmpxchg. Jul 20 2022, 11:52 AM

asb edited the summary of this revision.

Do we care about other masked i64 atomic intrinsics, and do the i32 ones ever see a benefit from implementing an equivalent?

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9890	Should have extra { } so it's easy to add other cases

Harbormaster completed remote builds in B176563: Diff 446225. Jul 20 2022, 12:35 PM

reames added inline comments. Jul 21 2022, 7:58 AM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9893	I don't follow this comment. Is there some documentation you can point to for what this intrinsic does? Or a good pointer in code to understand it? If I'm gathering this correctly, the and-by-mask is to handle the zext of the type less than XLEN? Not following where the sign bit below that is coming from.

asb added inline comments. Jul 21 2022, 10:08 AM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9893	The intrinsics are underdocumented - I'll loop back round and fix that. For general context on our atomics lowering approach, see here. As for this specific code comment: it may be because it's the end of the day, but I started trying to better explain and confused myself. So it may be I have a mistake, the code comment needs improving, or both. Just leaving this comment now, as it may be it's best for you to hold off on taking a proper look until I've had a chance to re-review.

craig.topper added inline comments. Jul 24 2022, 3:56 PM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9893	Isn't the output of the intrinsic only the result of the LR_W? There are ANDs in the expansion, but they are to the scratch register aren't they?
llvm/test/CodeGen/RISCV/atomic-cmpxchg-branch-on-result.ll
132	I don't think this AND is part of the intrinsic. It's a separate operation in the IR and DAG. The output of the intrinsic is a4, the lr.w.aqrl result.
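
The point above can be illustrated with a rough software model of a masked cmpxchg on one field of an aligned 32-bit word. This is only a sketch under assumptions: maskedCmpXchgModel is a hypothetical name, std::atomic stands in for the lr.w/sc.w loop, and memory-ordering details are ignored. It returns the whole loaded word sign extended to 64 bits; the AND that isolates the field is left to the caller, mirroring the separate `and a4, a4, a0` in the test output.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical model of a masked compare-exchange on an aligned 32-bit word.
// cmp and newval are expected to be pre-shifted into the masked field.
int64_t maskedCmpXchgModel(std::atomic<uint32_t> &word, uint32_t cmp,
                           uint32_t newval, uint32_t mask) {
  assert((cmp & ~mask) == 0 && "cmp must lie inside the masked field");
  uint32_t loaded = word.load();
  for (;;) {
    // Compare only the masked field; the masked copy is a scratch value.
    if ((loaded & mask) != cmp)
      break;
    // Same xor/and/xor merge as in the expansion: replace the masked field
    // of the loaded word with the corresponding bits of newval.
    uint32_t merged = loaded ^ ((loaded ^ newval) & mask);
    // On failure, loaded is refreshed with the current contents and we retry.
    if (word.compare_exchange_weak(loaded, merged))
      break;
  }
  // The result is the unmasked loaded word, sign extended from 32 to 64 bits.
  return static_cast<int64_t>(static_cast<int32_t>(loaded));
}
```

Because the returned value is a sign-extended 32-bit load, its upper 33 bits are all equal regardless of the mask, which is consistent with the intrinsic itself doing no masking of its result.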

Update to fix logic error regarding the sign bits (it's always just 33; the intrinsic doesn't do any masking itself). I improved the documentation on the intrinsics in 85c6fab, which should help avoid such mistakes.

This update also lists all masked intrinsics, even though I've been unable to produce test cases for the others that lead to different codegen.

asb marked an inline comment as done. Aug 2 2022, 2:56 AM

asb added inline comments.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9893	I've improved docs for intrinsics in 85c6fab. You're both completely right that I made a logic error here initially - the intrinsic doesn't mask its return value at all, so the number of sign bits is always just 33.

In D130191#3666436, @jrtc27 wrote:

Do we care about other masked i64 atomic intrinsics, and do the i32 ones ever see a benefit from implementing an equivalent?

I've added in the other masked i64 atomics, though I was unable to produce examples where it makes a difference. Now that I've fixed the logic error on the number of sign bits, there's no potential for benefit for the i32 ones (there would just be 1 sign bit after LR_W or atomicrmw_W of a native width value).
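
The arithmetic behind the two cases can be written out as a small sketch. The function name signBitsAfterSignExtend is hypothetical and is not part of the patch; it just states the usual rule that sign extending a width-bit value to XLEN bits guarantees XLEN - width + 1 sign bits.

```cpp
#include <cassert>

// Minimum number of sign bits guaranteed after sign extending a value of
// `width` bits to `xlen` bits: the original sign bit plus every bit added
// by the extension.
unsigned signBitsAfterSignExtend(unsigned xlen, unsigned width) {
  assert(width >= 1 && width <= xlen && "width must fit in xlen");
  return xlen - width + 1;
}
```

With a 32-bit LR_W result this gives 33 on RV64, while on RV32 the result is already native width and the guarantee degenerates to the trivial 1, so there is nothing for the i32 intrinsics to report.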

asb retitled this revision from [RISCV] Teach ComputeNumSignBitsForTargetNode about Intrinsic::riscv_masked_cmpxchg_i64 to [RISCV] Teach ComputeNumSignBitsForTargetNode about masked atomic intrinsics. Aug 2 2022, 2:58 AM

Harbormaster completed remote builds in B178728: Diff 449237. Aug 2 2022, 5:17 AM

LGTM w/requested changes.

As an optional follow up, I find myself wondering if the 64 bit versions should actually have a 64 bit return type. Having them instead have i32 operands and return types - to model the emulated access actually being done - and then explicitly sign extended afterwards would get the same effect here with less code and less confusion.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9899	Ok, I finally wrapped my head around all of the machinery here, and this is correct. However, I want to suggest some changes.

First, the comment:
// riscv_masked_atomicrmw_* represents an emulated unaligned atomicrmw operation at the minimum supported atomicrmw width whose result is then sign extended to XLEN. With +A, the minimum width is 32 for both 64 and 32.

Second, please add the asserts:
assert(XLenVT == 64)
assert(getMinCmpXchgSizeInBits() == 32);

This revision is now accepted and ready to land. Aug 2 2022, 1:08 PM

In D130191#3694550, @reames wrote:

LGTM w/requested changes.

As an optional follow up, I find myself wondering if the 64 bit versions should actually have a 64 bit return type. Having them instead have i32 operands and return types - to model the emulated access actually being done - and then explicitly sign extended afterwards would get the same effect here with less code and less confusion.

Wouldn't that require the intrinsics to go through type legalization to be promoted to a new RISCVISD node with i64 types? And then we'd need to teach computeKnownSignBits about the new RISCVISD node so we could remove the sext_inreg that got created from the sign_extend.

Unless I'm misunderstanding what you're proposing?

Yes, my recollection is that the need to handle legalisation of the intrinsics is the main barrier to defining i32 versions on RV64. I'll have another think though.

Thanks for the review! I'll tweak as suggested and land tomorrow morning.

Address review comments.

asb added inline comments. Aug 3 2022, 5:41 AM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9899	Hi Philip, I've drafted a description that is very similar to yours, but made a few small alterations: it references cmpxchg as well as atomicrmw, and it says "narrow" rather than "unaligned". I think this is close enough to your suggestion that I'm going to go ahead and commit, but obviously just shout if you think that didn't match your intent and we can resolve it in a follow-up patch. Thanks.

This revision was landed with ongoing or failed builds. Aug 3 2022, 5:46 AM

Closed by commit rG28f12a09ae63: [RISCV] Teach ComputeNumSignBitsForTargetNode about masked atomic intrinsics (authored by asb).

This revision was automatically updated to reflect the committed changes.

asb added a commit: rG28f12a09ae63: [RISCV] Teach ComputeNumSignBitsForTargetNode about masked atomic intrinsics.

Harbormaster completed remote builds in B179016: Diff 449642. Aug 3 2022, 7:03 AM

In D130191#3694662, @asb wrote:

Yes, my recollection is that the need to handle legalisation of the intrinsics is the main barrier to defining i32 versions on RV64. I'll have another think though.

Yeah, you and Craig are correct here. I was thinking we had a legal i32 type on rv64 which we don't. So yeah, ignore me.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9899	Your tweak was totally fine.

Revision Contents

Path                                                          Size
llvm/lib/Target/RISCV/RISCVISelLowering.cpp                   25 lines
llvm/test/CodeGen/RISCV/atomic-cmpxchg-branch-on-result.ll    2 lines
llvm/test/CodeGen/RISCV/atomic-signext.ll                     2 lines

Diff 449643

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

[9,876 lines not shown; context: case RISCVISD::VMV_X_S: {]
     // element type is wider than XLen, the least-significant XLEN bits are
     // taken.
     unsigned XLen = Subtarget.getXLen();
     unsigned EltBits = Op.getOperand(0).getScalarValueSizeInBits();
     if (EltBits <= XLen)
       return XLen - EltBits + 1;
     break;
   }
+  case ISD::INTRINSIC_W_CHAIN: {
+    unsigned IntNo = Op.getConstantOperandVal(1);
+    switch (IntNo) {
+    default:
+      break;
+    case Intrinsic::riscv_masked_atomicrmw_xchg_i64:
+    case Intrinsic::riscv_masked_atomicrmw_add_i64:
+    case Intrinsic::riscv_masked_atomicrmw_sub_i64:
+    case Intrinsic::riscv_masked_atomicrmw_nand_i64:
+    case Intrinsic::riscv_masked_atomicrmw_max_i64:
+    case Intrinsic::riscv_masked_atomicrmw_min_i64:
+    case Intrinsic::riscv_masked_atomicrmw_umax_i64:
+    case Intrinsic::riscv_masked_atomicrmw_umin_i64:
+    case Intrinsic::riscv_masked_cmpxchg_i64:
+      // riscv_masked_{atomicrmw_*,cmpxchg} intrinsics represent an emulated
+      // narrow atomic operation. These are implemented using atomic
+      // operations at the minimum supported atomicrmw/cmpxchg width whose
+      // result is then sign extended to XLEN. With +A, the minimum width is
+      // 32 for both 64 and 32.
+      assert(Subtarget.getXLen() == 64);
+      assert(getMinCmpXchgSizeInBits() == 32);
+      assert(Subtarget.hasStdExtA());
+      return 33;
+    }
+  }
   }

   return 1;
 }

 const Constant *
 RISCVTargetLowering::getTargetConstantFromLoad(LoadSDNode *Ld) const {
   assert(Ld && "Unexpected null LoadSDNode");
[2,853 lines not shown]

llvm/test/CodeGen/RISCV/atomic-cmpxchg-branch-on-result.ll

[123 lines not shown]
 ; RV64IA-NEXT: # in Loop: Header=BB2_3 Depth=2
 ; RV64IA-NEXT: xor a5, a4, a2
 ; RV64IA-NEXT: and a5, a5, a0
 ; RV64IA-NEXT: xor a5, a4, a5
 ; RV64IA-NEXT: sc.w.aqrl a5, a5, (a3)
 ; RV64IA-NEXT: bnez a5, .LBB2_3
 ; RV64IA-NEXT: .LBB2_5: # %do_cmpxchg
 ; RV64IA-NEXT: # in Loop: Header=BB2_1 Depth=1
 ; RV64IA-NEXT: and a4, a4, a0
-; RV64IA-NEXT: sext.w a4, a4
 ; RV64IA-NEXT: bne a1, a4, .LBB2_1
 ; RV64IA-NEXT: # %bb.2: # %exit
 ; RV64IA-NEXT: ret
 entry:
 br label %do_cmpxchg
 do_cmpxchg:
 %0 = cmpxchg i8* %ptr, i8 %cmp, i8 %val seq_cst seq_cst
 %1 = extractvalue { i8, i1 } %0, 1
[60 lines not shown]
 ; RV64IA-NEXT: xor a5, a4, a2
 ; RV64IA-NEXT: and a5, a5, a0
 ; RV64IA-NEXT: xor a5, a4, a5
 ; RV64IA-NEXT: sc.w.aqrl a5, a5, (a3)
 ; RV64IA-NEXT: bnez a5, .LBB3_3
 ; RV64IA-NEXT: .LBB3_5: # %do_cmpxchg
 ; RV64IA-NEXT: # in Loop: Header=BB3_1 Depth=1
 ; RV64IA-NEXT: and a4, a4, a0
-; RV64IA-NEXT: sext.w a4, a4
 ; RV64IA-NEXT: beq a1, a4, .LBB3_1
 ; RV64IA-NEXT: # %bb.2: # %exit
 ; RV64IA-NEXT: ret
 entry:
 br label %do_cmpxchg
 do_cmpxchg:
 %0 = cmpxchg i8* %ptr, i8 %cmp, i8 %val seq_cst seq_cst
 %1 = extractvalue { i8, i1 } %0, 1
[34 lines not shown]

llvm/test/CodeGen/RISCV/atomic-signext.ll

[3,898 lines not shown]
 ; RV64IA-NEXT: # %bb.2: # in Loop: Header=BB48_1 Depth=1
 ; RV64IA-NEXT: xor a5, a2, a0
 ; RV64IA-NEXT: and a5, a5, a4
 ; RV64IA-NEXT: xor a5, a2, a5
 ; RV64IA-NEXT: sc.w a5, a5, (a3)
 ; RV64IA-NEXT: bnez a5, .LBB48_1
 ; RV64IA-NEXT: .LBB48_3:
 ; RV64IA-NEXT: and a0, a2, a4
-; RV64IA-NEXT: sext.w a0, a0
 ; RV64IA-NEXT: xor a0, a1, a0
 ; RV64IA-NEXT: seqz a0, a0
 ; RV64IA-NEXT: ret
 %1 = cmpxchg i8* %ptr, i8 %cmp, i8 %val monotonic monotonic
 %2 = extractvalue { i8, i1 } %1, 1
 ret i1 %2
 }

[156 lines not shown]
 ; RV64IA-NEXT: # %bb.2: # in Loop: Header=BB50_1 Depth=1
 ; RV64IA-NEXT: xor a4, a2, a0
 ; RV64IA-NEXT: and a4, a4, a5
 ; RV64IA-NEXT: xor a4, a2, a4
 ; RV64IA-NEXT: sc.w a4, a4, (a3)
 ; RV64IA-NEXT: bnez a4, .LBB50_1
 ; RV64IA-NEXT: .LBB50_3:
 ; RV64IA-NEXT: and a0, a2, a5
-; RV64IA-NEXT: sext.w a0, a0
 ; RV64IA-NEXT: xor a0, a1, a0
 ; RV64IA-NEXT: seqz a0, a0
 ; RV64IA-NEXT: ret
 %1 = cmpxchg i16* %ptr, i16 %cmp, i16 %val monotonic monotonic
 %2 = extractvalue { i16, i1 } %1, 1
 ret i1 %2
 }

[113 lines not shown]