This is an archive of the discontinued LLVM Phabricator instance.

Differential D53776

[DAGCombiner] Fix for big endian in ForwardStoreValueToDirectLoad
Closed · Public

Authored by bjope on Oct 26 2018, 1:23 PM.

Details

Reviewers

niravd

Commits

rGfe09a20f09a2: [DAGCombiner] Fix for big endian in ForwardStoreValueToDirectLoad
rL345636: [DAGCombiner] Fix for big endian in ForwardStoreValueToDirectLoad

Summary

Normalize the offset for endianness before checking
whether the store covers the load in ForwardStoreValueToDirectLoad.

Without this, we missed some optimizations on big-endian
targets. For example, with a 4-byte store followed by a
1-byte load that reads the least significant byte of the
stored value, the STCoversLD check would fail (see @test4 in
test/CodeGen/AArch64/load-store-forwarding.ll).
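The following is a minimal C++ sketch of that arithmetic. It assumes sizes are passed as plain bit counts instead of llvm::EVT values, and the function names are hypothetical, not LLVM API. It contrasts the pre-patch check, which ran on the raw offset and included the Offset * 8 <= LDBits term, with the patched order: normalize first, then check coverage.

```cpp
#include <cassert>
#include <cstdint>

// Sizes are plain bit counts here; in DAGCombiner they come from llvm::EVT
// (getSizeInBits / getStoreSizeInBits). All names are hypothetical.

// Map a byte offset measured from the lowest address of the store to a byte
// offset measured from the least significant byte of the stored value.
int64_t normalizeOffset(int64_t Offset, int64_t STStoreBits,
                        int64_t LDStoreBits, bool IsBigEndian) {
  assert(STStoreBits % 8 == 0 && LDStoreBits % 8 == 0);
  if (IsBigEndian)
    return (STStoreBits - LDStoreBits) / 8 - Offset;
  return Offset;
}

// Pre-patch check: applied to the raw (not yet normalized) offset and
// including the Offset * 8 <= LDBits term questioned in review.
bool oldStoreCoversLoad(int64_t RawOffset, int64_t STBits, int64_t LDBits) {
  return RawOffset >= 0 && RawOffset * 8 <= LDBits &&
         RawOffset * 8 + LDBits <= STBits;
}

// Patched check: applied to the normalized offset, shown with the
// redundant term already dropped as agreed in review.
bool storeCoversLoad(int64_t Offset, int64_t STBits, int64_t LDBits) {
  return Offset >= 0 && Offset * 8 + LDBits <= STBits;
}
```

With @test4's numbers (a 32-bit store and an 8-bit load at address offset 3 on a big-endian target), oldStoreCoversLoad(3, 32, 8) is false because 3 * 8 <= 8 fails. By contrast, normalizeOffset(3, 32, 8, true) yields 0, and storeCoversLoad(0, 32, 8) is true.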

This patch also fixes a problem seen in an out-of-tree target.
That target is big endian, has i40 as a legal type, and uses
a StoreSize of 48 bits for i40. When normalizing the offset
for endianness we therefore need to use the StoreSize
(assuming that any padding added when storing a value into a
larger StoreSize is always placed at the most significant
end).
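The next sketch is a hypothetical model of that target. Under the padding assumption above, an i40 value occupies a 6-byte slot, and the padding byte sits at the lowest address on big endian. The sketch shows that normalizing an i8 load with the store size (48 bits) picks the right byte. Normalizing with the 40-bit value size, as the pre-patch getSizeInBits() code did, is off by one byte. Function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: i40 stored in a 48-bit (6-byte) slot on a big-endian
// target, padding at the most significant end (= lowest address).

// Big-endian offset normalization, parameterized by the sizes used.
int64_t normalizeBE(int64_t Offset, int64_t STBits, int64_t LDBits) {
  return (STBits - LDBits) / 8 - Offset;
}

// Which byte of the i40 value (0 = least significant) lives at a given
// address offset inside the 6-byte slot; -1 means a padding byte.
int64_t valueByteAtAddress(int64_t AddrOffset) {
  const int64_t StoreBytes = 6, ValueBytes = 5;
  assert(AddrOffset >= 0 && AddrOffset < StoreBytes);
  int64_t FromLSB = StoreBytes - 1 - AddrOffset; // big-endian byte order
  return FromLSB < ValueBytes ? FromLSB : -1;
}
```

For an i8 load at address offset 5, valueByteAtAddress(5) is 0 (the least significant byte). normalizeBE(5, 48, 8) agrees with 0, while normalizeBE(5, 40, 8) gives -1.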

Diff Detail

Repository

rL LLVM

Build Status

Buildable 24254
Build 24253: arc lint + arc unit

Event Timeline

bjope created this revision. Oct 26 2018, 1:23 PM

Herald added subscribers: kristof.beyls, javed.absar. Oct 26 2018, 1:23 PM

Harbormaster completed remote builds in B24254: Diff 171342. Oct 26 2018, 1:24 PM

This should fix the problem discussed here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181022/596459.html
In addition, as shown by @test4 in the added test case (CHECK-BE), we now also get forwarding in that test.

bjope added inline comments.Oct 26 2018, 1:36 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12870	@niravd do you remember why we have the check `(Offset * 8 <= LDMemType.getSizeInBits())` here? From my point of view it looks wrong. Maybe it is supposed to be `(Offset * 8 <= STMemType.getSizeInBits())`, i.e. checking that the load starts before the last bit written by the store. But then I guess it is enough to check `(Offset >= 0) && (Offset * 8 + LDMemType.getSizeInBits() <= STMemType.getSizeInBits())` or are we trying to catch some special case when we get overflow in the int64_t?
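The following C++ sketch compares the conditions discussed in this comment. Offsets are already-normalized byte offsets and sizes are in bits. The function names are hypothetical. The third function is an illustrative overflow-free formulation that relates to the int64_t question; it is not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Offsets are normalized byte offsets; sizes are in bits.

// Check as it appears in Diff 171342, including the questioned term.
bool coversWithExtraTerm(int64_t Offset, int64_t STBits, int64_t LDBits) {
  return Offset >= 0 && Offset * 8 <= LDBits &&
         Offset * 8 + LDBits <= STBits;
}

// The two conditions suggested in the comment.
bool coversTwoTerms(int64_t Offset, int64_t STBits, int64_t LDBits) {
  return Offset >= 0 && Offset * 8 + LDBits <= STBits;
}

// Hypothetical equivalent that never computes Offset * 8, so a huge
// Offset cannot overflow int64_t (not part of the patch).
bool coversNoOverflow(int64_t Offset, int64_t STBits, int64_t LDBits) {
  return Offset >= 0 && LDBits <= STBits && Offset <= (STBits - LDBits) / 8;
}
```

Consider a 64-bit store and an 8-bit load at normalized byte offset 2. The extra term evaluates 16 <= 8 and rejects a load that the store fully covers, whereas the two-term form accepts it.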

Modulo removing the unnecessary condition you commented on, this looks good to me. Thanks.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12870	Huh. I have no idea how that got there. We should only need the other 2 checks. Can you take it out?

This revision is now accepted and ready to land.Oct 30 2018, 12:19 PM

bjope added inline comments.Oct 30 2018, 12:32 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12870	Thanks! I'll take it out before commit.

Closed by commit rL345636: [DAGCombiner] Fix for big endian in ForwardStoreValueToDirectLoad (authored by bjope). Oct 30 2018, 1:19 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                            Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp        21 lines
test/CodeGen/AArch64/load-store-forwarding.ll   77 lines

Diff 171342

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

(12,847 unchanged lines not shown) SDValue DAGCombiner::ForwardStoreValueToDirectLoad(LoadSDNode *LD) {
   EVT LDType = LD->getValueType(0);
   EVT LDMemType = LD->getMemoryVT();
   EVT STMemType = ST->getMemoryVT();
   EVT STType = ST->getValue().getValueType();

   BaseIndexOffset BasePtrLD = BaseIndexOffset::match(LD, DAG);
   BaseIndexOffset BasePtrST = BaseIndexOffset::match(ST, DAG);
   int64_t Offset;
+  if (!BasePtrST.equalBaseIndex(BasePtrLD, DAG, Offset))
+    return SDValue();

+  // Normalize for Endianness. After this Offset=0 will denote that the least
+  // significant bit in the loaded value maps to the least significant bit in
+  // the stored value). With Offset=n (for n > 0) the loaded value starts at the
+  // n:th least significant byte of the stored value.
+  if (DAG.getDataLayout().isBigEndian())
+    Offset = (STMemType.getStoreSizeInBits() -
+              LDMemType.getStoreSizeInBits()) / 8 - Offset;
+
+  // Check that the stored value cover all bits that are loaded.
   bool STCoversLD =
-      BasePtrST.equalBaseIndex(BasePtrLD, DAG, Offset) && (Offset >= 0) &&
+      (Offset >= 0) &&
       (Offset * 8 <= LDMemType.getSizeInBits()) &&
       (Offset * 8 + LDMemType.getSizeInBits() <= STMemType.getSizeInBits());

   if (!STCoversLD)
     return SDValue();

-  // Normalize for Endianness.
-  if (DAG.getDataLayout().isBigEndian())
-    Offset =
-        (STMemType.getSizeInBits() - LDMemType.getSizeInBits()) / 8 - Offset;
-
   // Memory as copy space (potentially masked).
   if (Offset == 0 && LDType == STType && STMemType == LDMemType) {
     // Simple case: Direct non-truncating forwarding
     if (LDType.getSizeInBits() == LDMemType.getSizeInBits())
       return CombineTo(LD, ST->getValue(), Chain);
     // Can we model the truncate and extension with an and mask?
     if (STType.isInteger() && LDMemType.isInteger() && !STType.isVector() &&
         !LDMemType.isVector() && LD->getExtensionType() != ISD::SEXTLOAD) {
(15 unchanged lines not shown) SDValue DAGCombiner::ForwardStoreValueToDirectLoad(LoadSDNode *LD) {
   // Truncate Value To Stored Memory Size.
   do {
     if (!getTruncatedStoreValue(ST, Val))
       continue;
     if (!isTypeLegal(LDMemType))
       continue;
     if (STMemType != LDMemType) {
       // TODO: Support vectors? This requires extract_subvector/bitcast.
       if (!STMemType.isVector() && !LDMemType.isVector() &&
           STMemType.isInteger() && LDMemType.isInteger())
         Val = DAG.getNode(ISD::TRUNCATE, SDLoc(LD), LDMemType, Val);
       else
         continue;
     }
     if (!extendLoadedValueToExtension(LD, Val))
       continue;
     return CombineTo(LD, Val, Chain);
(6,155 unchanged lines not shown)

test/CodeGen/AArch64/load-store-forwarding.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=aarch64_be -o - %s | FileCheck %s --check-prefix CHECK-BE
; RUN: llc -mtriple=aarch64 -o - %s | FileCheck %s --check-prefix CHECK-LE

define i8 @test1(i32 %a, i8* %pa) {
; CHECK-BE-LABEL: test1:
; CHECK-BE: // %bb.0:
; CHECK-BE-NEXT: str w0, [x1]
; CHECK-BE-NEXT: ldrb w0, [x1]
; CHECK-BE-NEXT: ret
;
; CHECK-LE-LABEL: test1:
; CHECK-LE: // %bb.0:
; CHECK-LE-NEXT: str w0, [x1]
; CHECK-LE-NEXT: ret
  %p32 = bitcast i8* %pa to i32*
  %p8 = getelementptr i8, i8* %pa, i32 0
  store i32 %a, i32* %p32
  %res = load i8, i8* %p8
  ret i8 %res
}

define i8 @test2(i32 %a, i8* %pa) {
; CHECK-BE-LABEL: test2:
; CHECK-BE: // %bb.0:
; CHECK-BE-NEXT: str w0, [x1]
; CHECK-BE-NEXT: ldrb w0, [x1, #1]
; CHECK-BE-NEXT: ret
;
; CHECK-LE-LABEL: test2:
; CHECK-LE: // %bb.0:
; CHECK-LE-NEXT: str w0, [x1]
; CHECK-LE-NEXT: ubfx w0, w0, #8, #8
; CHECK-LE-NEXT: ret
  %p32 = bitcast i8* %pa to i32*
  %p8 = getelementptr i8, i8* %pa, i32 1
  store i32 %a, i32* %p32
  %res = load i8, i8* %p8
  ret i8 %res
}

define i8 @test3(i32 %a, i8* %pa) {
; CHECK-BE-LABEL: test3:
; CHECK-BE: // %bb.0:
; CHECK-BE-NEXT: str w0, [x1]
; CHECK-BE-NEXT: ldrb w0, [x1, #2]
; CHECK-BE-NEXT: ret
;
; CHECK-LE-LABEL: test3:
; CHECK-LE: // %bb.0:
; CHECK-LE-NEXT: str w0, [x1]
; CHECK-LE-NEXT: ubfx w0, w0, #16, #8
; CHECK-LE-NEXT: ret
  %p32 = bitcast i8* %pa to i32*
  %p8 = getelementptr i8, i8* %pa, i32 2
  store i32 %a, i32* %p32
  %res = load i8, i8* %p8
  ret i8 %res
}

define i8 @test4(i32 %a, i8* %pa) {
; CHECK-BE-LABEL: test4:
; CHECK-BE: // %bb.0:
; CHECK-BE-NEXT: str w0, [x1]
; CHECK-BE-NEXT: ret
;
; CHECK-LE-LABEL: test4:
; CHECK-LE: // %bb.0:
; CHECK-LE-NEXT: str w0, [x1]
; CHECK-LE-NEXT: lsr w0, w0, #24
; CHECK-LE-NEXT: ret
  %p32 = bitcast i8* %pa to i32*
  %p8 = getelementptr i8, i8* %pa, i32 3
  store i32 %a, i32* %p32
  %res = load i8, i8* %p8
  ret i8 %res
}
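Each of the four functions above stores an i32 and then loads an i8 at address offset 0 to 3. The hypothetical C++ helper below computes the normalized offset times 8, which is the right shift that extracts the loaded byte from %a. It models only the offset arithmetic and says nothing about which combine produces the final code. On little endian, the shifts for @test2 to @test4 (8, 16, 24) match the ubfx #8, ubfx #16 and lsr #24 in the CHECK-LE lines. On big endian, @test4 needs a shift of 0, which is consistent with returning w0 unchanged in CHECK-BE.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bit position of the byte read by an i8 load at
// AddrOffset (0..3) from a preceding i32 store, after normalizing the
// offset for endianness as the patch does.
int64_t shiftForByteLoad(int64_t AddrOffset, bool IsBigEndian) {
  const int64_t STBits = 32, LDBits = 8;
  assert(AddrOffset >= 0 && AddrOffset * 8 + LDBits <= STBits);
  int64_t Offset = AddrOffset;
  if (IsBigEndian)
    Offset = (STBits - LDBits) / 8 - Offset;
  return Offset * 8;
}
```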