Details

Reviewers

efriedma
dmgreen
vhscampos

Commits

rG249bd9eab0aa: [ARM] Fix codegen of unaligned volatile load/store of i64

Summary

Volatile loads/stores of i64 are lowered to LDRD/STRD on ARMv5TE.
However, these instructions require the addresses to be aligned.
Unaligned loads/stores therefore should be ignored by this handling.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

momo5502 created this revision. Jun 13 2023, 2:08 AM

Herald added a project: Restricted Project. Jun 13 2023, 2:08 AM

Herald added subscribers: hiraditya, kristof.beyls.

momo5502 requested review of this revision. Jun 13 2023, 2:08 AM

Herald added a project: Restricted Project. Jun 13 2023, 2:08 AM

Herald added a subscriber: llvm-commits.

Hello

Should the condition be based on Subtarget->allowsUnalignedMem too? I'm not sure exactly which architectures require an alignment of 8 for LDRD and which require an alignment of 4. I believe in most cases it is 4, with some early architectures requiring 8 if Subtarget->allowsUnalignedMem is false.

Harbormaster completed remote builds in B238431: Diff 530822. Jun 13 2023, 3:59 AM

efriedma added inline comments. Jun 13 2023, 10:41 AM

llvm/lib/Target/ARM/ARMISelLowering.cpp
Line 10085: Using getABITypeAlign() like this is confusing; if you mean "4", just write 4.

I did have Align(4) initially, but saw that it was done differently here: https://github.com/llvm/llvm-project/blob/2a1716dec57e8b3dd668df17ecbedfc77a4112e5/llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp#L2294
I simply copied the expression. If that is not applicable here, I can just use 4 instead or respect allowsUnalignedMem.

I have no idea why the ARMLoadStoreOptimizer code was written that way; maybe fix that too, for the sake of clarity.

In D152790#4419917, @efriedma wrote:

I have no idea why the ARMLoadStoreOptimizer code was written that way; maybe fix that too, for the sake of clarity.

I adjusted it here to Align(4).

I also tried to adjust the ARMLoadStoreOptimizer, and it turns out that getABITypeAlign of i64 is 8, not 4. That seems wrong for checking load alignment. Your comment is therefore valid, and it should be fixed there too. However, that breaks quite a few tests, and I feel it exceeds the scope of this change. I recommend creating an issue for that.

@dmgreen, concerning allowsUnalignedMem, I'm not sure how to proceed with that. Only applying the alignment check from this change when strict-alignment is enabled causes the instructions in this test to be lowered to 16 one-byte loads and stores. However, 2 four-byte loads seem perfectly fine to me, at least on the system where I encountered this issue. To me, not taking allowsUnalignedMem into consideration seems fine. What do you think?
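To make the "16 one-byte loads and stores" versus "2 four-byte loads" comparison concrete, here is a minimal standalone sketch. It is not LLVM code: pieceBytes and piecesPerAccess are hypothetical names, and the model assumes that a split access uses pieces of at most one 32-bit word, that strict alignment forbids pieces wider than the known alignment, and that the alignment is a power of two.

```cpp
#include <cstdint>

// Hypothetical model, not LLVM code: width in bytes of each piece when an
// 8-byte access is split instead of being emitted as a single ldrd/strd.
// Assumption: pieces are at most one 32-bit word, and under strict alignment
// a piece may not be wider than the known alignment (a power of two >= 1).
constexpr std::uint32_t pieceBytes(std::uint32_t knownAlign, bool strictAlign) {
  return (strictAlign && knownAlign < 4) ? knownAlign : 4;
}

// Number of memory instructions needed to cover one 8-byte load or store.
constexpr std::uint32_t piecesPerAccess(std::uint32_t knownAlign,
                                        bool strictAlign) {
  return 8 / pieceBytes(knownAlign, strictAlign);
}

// align 1 with strict alignment: 8 byte loads plus 8 byte stores, 16 in total.
static_assert(2 * piecesPerAccess(1, true) == 16, "one-byte pieces");
// align 1 without strict alignment: two word loads and two word stores.
static_assert(piecesPerAccess(1, false) == 2, "word pieces");
```

Under these assumptions, a load/store pair with align 1 costs 16 instructions in strict-alignment mode and 4 otherwise, which matches the numbers quoted in the comment above.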

Harbormaster completed remote builds in B239066: Diff 531670. Jun 15 2023, 4:45 AM

re: allowsUnalignedMem:

Certain kinds of memory (e.g. memory-mapped registers, uncached memory) have restrictions that go beyond the normal rules for memory. allowsUnalignedMem() is supposed to model that: all pointer accesses have to be naturally aligned to avoid faults, even if the CPU supports unaligned access to cached memory.

I'm not 100% sure what the interaction is between that and ldrd/strd off the top of my head. But the idea would be to narrow the check to Align(Subtarget->hasV6Ops() && Subtarget->allowsUnalignedMem() ? 4 : 8), so we don't emit a ldrd such that "addr % 8 == 4".
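The "addr % 8 == 4" case is an address that is word-aligned but not doubleword-aligned. A small standalone sketch of that classification follows; isAlignedTo and isWordOnlyAligned are hypothetical helper names used only for illustration, and alignments are assumed to be powers of two.

```cpp
#include <cstdint>

// Hypothetical helper: true when addr is a multiple of align.
// Assumption: align is a power of two.
constexpr bool isAlignedTo(std::uint64_t addr, std::uint64_t align) {
  return (addr & (align - 1)) == 0;
}

// True for addresses that are 4-byte aligned but not 8-byte aligned,
// i.e. exactly the addresses with addr % 8 == 4.
constexpr bool isWordOnlyAligned(std::uint64_t addr) {
  return isAlignedTo(addr, 4) && !isAlignedTo(addr, 8);
}

static_assert(isWordOnlyAligned(0x1004), "0x1004 % 8 == 4");
static_assert(!isWordOnlyAligned(0x1008), "0x1008 is doubleword-aligned");
static_assert(!isWordOnlyAligned(0x1002), "0x1002 is not word-aligned");
```

An access known only to be align 4 may land on such an address at run time, which is why a target that needs doubleword alignment for ldrd has to require align 8 at compile time.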

I haven't checked any hardware, but from reading the reference manuals it looked like the alignment requirement would be Subtarget->hasV6Ops() || Subtarget->allowsUnalignedMem() ? 4 : 8. That is, for newer architectures 4 is always fine. For v5te, ldrd requires an alignment of 4 with U=1 and 8 with U=0. I haven't tested that though.

Spent a bit of time digging in the armv7 reference manual. Apparently on armv7, it should always be fine to use ldrd for word alignment. On armv5, as you've noted, it's "UNPREDICTABLE". For armv6, you can switch between armv5 semantics and armv7 semantics. (sections D12.3.1 and D15.3.1.) On v6 targets, if allowsUnalignedMem() is true, we can safely assume we're using v7 semantics; if we're in strict alignment mode, I don't think we want to make any assumptions. (We could theoretically make it a flag specified by the user, but that seems like overkill.)

Note that we currently assume the frontend will set the "+strict-align" target attribute if we're compiling for v4 or v5, so compiling for -mtriple=armv5e-arm-none-eabi without also adding -mattr=+strict-align will produce weird results: we assume unaligned accesses work, but they clearly won't. This is probably a bug; the backend should do something more reasonable by default on v4/v5 targets.

Given that, I guess what we want is actually something like Align(Subtarget->hasV7Ops() || Subtarget->allowsUnalignedMem() ? 4 : 8). Maybe wrap that up in a helper and stick it in ARMSubtarget.h, so we can use it for ARMLoadStoreOptimizer in a followup.
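As a rough standalone illustration of the rule proposed above, the following sketch models the subtarget as a plain struct. SubtargetModel, dualLoadStoreAlignment, and canUseDualLoadStore are hypothetical names invented for this example; the real ARMSubtarget API is not used, and only the expression hasV7Ops() || allowsUnalignedMem() ? 4 : 8 is taken from the comment.

```cpp
// Hypothetical stand-in for the two ARMSubtarget queries used by the rule.
struct SubtargetModel {
  bool hasV7Ops;
  bool allowsUnalignedMem;
};

// Minimum alignment in bytes assumed necessary before emitting ldrd/strd:
// 4 on v7 and later, or when unaligned access is allowed; otherwise 8.
constexpr unsigned dualLoadStoreAlignment(SubtargetModel st) {
  return (st.hasV7Ops || st.allowsUnalignedMem) ? 4u : 8u;
}

// True when an access with the given known alignment may use ldrd/strd.
constexpr bool canUseDualLoadStore(SubtargetModel st, unsigned accessAlign) {
  return accessAlign >= dualLoadStoreAlignment(st);
}

// v7 in strict-alignment mode still accepts word alignment.
static_assert(canUseDualLoadStore(SubtargetModel{true, false}, 4), "v7 strict");
// Pre-v7 in strict-alignment mode needs doubleword alignment.
static_assert(!canUseDualLoadStore(SubtargetModel{false, false}, 4), "v6 strict");
```

Keeping the threshold in one function means a lowering check and a later ARMLoadStoreOptimizer change could share it, which is the motivation given for putting the helper in ARMSubtarget.h.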

Thank you for investigating it. I have added a helper called getDualLoadStoreAlignment. Feel free to suggest a better name if you have one.

Harbormaster completed remote builds in B239744: Diff 532567. Jun 19 2023, 2:50 AM

I'd like to see better test coverage for align-4 case: specifically, a test that checks we produce ldrd for targets where it's legal, but a pair of ldr where it isn't.

I added an align 4 test for armv7 and armv6, with and without strict alignment.
I omitted armv5 here because, without strict-align, the test generates an ldrd, which is probably not something one wants to test for explicitly.
The question is: should I add strict-align to the armv4/v5 test cases here and adjust the test accordingly?

Added an armv7 strict case to make sure it is identical to non-strict for align 4.

Harbormaster completed remote builds in B239929: Diff 532805. Jun 20 2023, 1:03 AM

armv4/armv5 should have strict align, yes.

(Also, you might want to autogenerate the CHECK lines using update_llc_test_checks.py; it's a little more verbose, but a lot easier to update.)

Thank you for the hint. Using update_llc_test_checks.py makes writing the tests much easier.

Harbormaster completed remote builds in B240186: Diff 533179. Jun 21 2023, 2:35 AM

LGTM

(I'm guessing you don't have commit access? Please let me know the author name/email you want on the commit.)

This revision is now accepted and ready to land. Jun 21 2023, 10:19 AM

In D152790#4438421, @efriedma wrote:

LGTM

(I'm guessing you don't have commit access? Please let me know the author name/email you want on the commit.)

Thank you. You're right, I don't have commit access. In my previous commits I used this: Maurice Heumann <MauriceHeumann@gmail.com>

As soon as this is committed, I will try to adjust the ARMLoadStoreOptimizer accordingly. @eli.friedman, you have commit permissions, right? Would you mind committing this change for me with the name and email provided above?

I'll get it merged in the next couple days; sorry about the delay.

This revision was landed with ongoing or failed builds. Jun 26 2023, 10:46 AM

Closed by commit rG249bd9eab0aa: [ARM] Fix codegen of unaligned volatile load/store of i64 (authored by momo5502, committed by efriedma).

This revision was automatically updated to reflect the committed changes.

efriedma added a commit: rG249bd9eab0aa: [ARM] Fix codegen of unaligned volatile load/store of i64.

momo5502 mentioned this in D153800: [ARM] Adjust strd/ldrd codegen alignment requirements. Jun 26 2023, 11:59 AM

efriedma mentioned this in rG92a9c30c61da: [ARM] Adjust strd/ldrd codegen alignment requirements. Jul 2 2023, 2:25 PM

efriedma mentioned this in rGa1cdb323e261: [ARM] Adjust strd/ldrd codegen alignment requirements. Jul 14 2023, 12:54 PM

Diff 531670

llvm/lib/Target/ARM/ARMISelLowering.cpp

[preceding lines not shown]

 void ARMTargetLowering::LowerLOAD(SDNode *N, SmallVectorImpl<SDValue> &Results,
                                   SelectionDAG &DAG) const {
   LoadSDNode *LD = cast<LoadSDNode>(N);
   EVT MemVT = LD->getMemoryVT();
   assert(LD->isUnindexed() && "Loads should be unindexed at this point.");

   if (MemVT == MVT::i64 && Subtarget->hasV5TEOps() &&
-      !Subtarget->isThumb1Only() && LD->isVolatile()) {
+      !Subtarget->isThumb1Only() && LD->isVolatile() &&
+      LD->getAlign() >= Align(Subtarget->hasV6Ops() ? 4 : 8)) {
     SDLoc dl(N);
     SDValue Result = DAG.getMemIntrinsicNode(
         ARMISD::LDRD, dl, DAG.getVTList({MVT::i32, MVT::i32, MVT::Other}),
         {LD->getChain(), LD->getBasePtr()}, MemVT, LD->getMemOperand());
     SDValue Lo = Result.getValue(DAG.getDataLayout().isLittleEndian() ? 0 : 1);
     SDValue Hi = Result.getValue(DAG.getDataLayout().isLittleEndian() ? 1 : 0);
     SDValue Pair = DAG.getNode(ISD::BUILD_PAIR, dl, MVT::i64, Lo, Hi);
     Results.append({Pair, Result.getValue(2)});
   }
[39 lines not shown]

 static SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG,
                           const ARMSubtarget *Subtarget) {
   StoreSDNode *ST = cast<StoreSDNode>(Op.getNode());
   EVT MemVT = ST->getMemoryVT();
   assert(ST->isUnindexed() && "Stores should be unindexed at this point.");

   if (MemVT == MVT::i64 && Subtarget->hasV5TEOps() &&
-      !Subtarget->isThumb1Only() && ST->isVolatile()) {
+      !Subtarget->isThumb1Only() && ST->isVolatile() &&
+      ST->getAlign() >= Align(Subtarget->hasV6Ops() ? 4 : 8)) {
     SDNode *N = Op.getNode();
     SDLoc dl(N);

     SDValue Lo = DAG.getNode(
         ISD::EXTRACT_ELEMENT, dl, MVT::i32, ST->getValue(),
         DAG.getTargetConstant(DAG.getDataLayout().isLittleEndian() ? 0 : 1, dl,
                               MVT::i32));
     SDValue Hi = DAG.getNode(
[remaining lines not shown]

llvm/test/CodeGen/ARM/i64_volatile_load_store.ll

 ; RUN: llc -mtriple=armv5e-arm-none-eabi %s -o - | FileCheck %s --check-prefixes=CHECK-ARMV5TE,CHECK
 ; RUN: llc -mtriple=thumbv6t2-arm-none-eabi %s -o - | FileCheck %s --check-prefixes=CHECK-T2,CHECK
 ; RUN: llc -mtriple=armv4t-arm-none-eabi %s -o - | FileCheck %s --check-prefixes=CHECK-ARMV4T,CHECK

 @x = common dso_local global i64 0, align 8
 @y = common dso_local global i64 0, align 8

+@x_unaligned = common dso_local global i64 0, align 1
+@y_unaligned = common dso_local global i64 0, align 1
+
 define void @test() {
 entry:
 ; CHECK-LABEL: test:
 ; CHECK-ARMV5TE: ldr [[ADDR0:r[0-9]+]]
 ; CHECK-ARMV5TE-NEXT: ldr [[ADDR1:r[0-9]+]]
 ; CHECK-ARMV5TE-NEXT: ldrd [[R0:r[0-9]+]], [[R1:r[0-9]+]], [[[ADDR0]]]
 ; CHECK-ARMV5TE-NEXT: strd [[R0]], [[R1]], [[[ADDR1]]]
 ; CHECK-T2: movw [[ADDR0:r[0-9]+]], :lower16:x
 ; CHECK-T2-NEXT: movw [[ADDR1:r[0-9]+]], :lower16:y
 ; CHECK-T2-NEXT: movt [[ADDR0]], :upper16:x
 ; CHECK-T2-NEXT: movt [[ADDR1]], :upper16:y
 ; CHECK-T2-NEXT: ldrd [[R0:r[0-9]+]], [[R1:r[0-9]+]], [[[ADDR0]]]
 ; CHECK-T2-NEXT: strd [[R0]], [[R1]], [[[ADDR1]]]
 ; CHECK-ARMV4T: ldr [[ADDR0:r[0-9]+]]
 ; CHECK-ARMV4T-NEXT: ldr [[ADDR1:r[0-9]+]]
 ; CHECK-ARMV4T-NEXT: ldr [[R1:r[0-9]+]], [[[ADDR0]]]
 ; CHECK-ARMV4T-NEXT: ldr [[R0:r[0-9]+]], [[[ADDR0]], #4]
 ; CHECK-ARMV4T-NEXT: str [[R0]], [[[ADDR1]], #4]
 ; CHECK-ARMV4T-NEXT: str [[R1]], [[[ADDR1]]]
   %0 = load volatile i64, ptr @x, align 8
   store volatile i64 %0, ptr @y, align 8
   ret void
 }

+define void @test_unaligned() {
+entry:
+; CHECK-LABEL: test_unaligned:
+; CHECK-ARMV5TE: ldr [[ADDR0:r[0-9]+]]
+; CHECK-ARMV5TE-NEXT: ldr [[ADDR1:r[0-9]+]]
+; CHECK-ARMV5TE-NEXT: ldr [[R1:r[0-9]+]], [[[ADDR0]]]
+; CHECK-ARMV5TE-NEXT: ldr [[R0:r[0-9]+]], [[[ADDR0]], #4]
+; CHECK-ARMV5TE-NEXT: str [[R0]], [[[ADDR1]], #4]
+; CHECK-ARMV5TE-NEXT: str [[R1]], [[[ADDR1]]]
+; CHECK-T2: movw [[ADDR0:r[0-9]+]], :lower16:x_unaligned
+; CHECK-T2-NEXT: movw [[ADDR1:r[0-9]+]], :lower16:y_unaligned
+; CHECK-T2-NEXT: movt [[ADDR0]], :upper16:x_unaligned
+; CHECK-T2-NEXT: movt [[ADDR1]], :upper16:y_unaligned
+; CHECK-T2-NEXT: ldr [[R1]], [[[ADDR0]]]
+; CHECK-T2-NEXT: ldr [[R0]], [[[ADDR0]], #4]
+; CHECK-T2-NEXT: str [[R0]], [[[ADDR1]], #4]
+; CHECK-T2-NEXT: str [[R1]], [[[ADDR1]]]
+; CHECK-ARMV4T: ldr [[ADDR0:r[0-9]+]]
+; CHECK-ARMV4T-NEXT: ldr [[ADDR1:r[0-9]+]]
+; CHECK-ARMV4T-NEXT: ldr [[R1:r[0-9]+]], [[[ADDR0]]]
+; CHECK-ARMV4T-NEXT: ldr [[R0:r[0-9]+]], [[[ADDR0]], #4]
+; CHECK-ARMV4T-NEXT: str [[R0]], [[[ADDR1]], #4]
+; CHECK-ARMV4T-NEXT: str [[R1]], [[[ADDR1]]]
+  %0 = load volatile i64, ptr @x_unaligned, align 1
+  store volatile i64 %0, ptr @y_unaligned, align 1
+  ret void
+}
+
 define void @test_offset() {
 entry:
 ; CHECK-LABEL: test_offset:
 ; CHECK-ARMV5TE: ldr [[ADDR0:r[0-9]+]]
 ; CHECK-ARMV5TE-NEXT: ldr [[ADDR1:r[0-9]+]]
 ; CHECK-ARMV5TE-NEXT: ldrd [[R0:r[0-9]+]], [[R1:r[0-9]+]], [[[ADDR0]], #-4]
 ; CHECK-ARMV5TE-NEXT: strd [[R0]], [[R1]], [[[ADDR1]], #-4]
 ; CHECK-T2: movw [[ADDR0:r[0-9]+]], :lower16:x
[remaining lines not shown]

This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Fix codegen of unaligned volatile load/store of i64
ClosedPublic