This is an archive of the discontinued LLVM Phabricator instance.

Differential D137201

[AggressiveInstCombine] Handle the insert point of the merged load correctly.
ClosedPublic

Authored by bipmis on Nov 1 2022, 3:12 PM.

Details

Reviewers

dmgreen
spatel
nikic
eaeltsin

Commits

rGe9393789a9fa: [AggressiveInstCombine] Handle the insert point of the merged load correctly.

Summary

This patch updates the insert point of the merged load created by the AggressiveInstCombine load-merging transform implemented in
https://reviews.llvm.org/D135137
This fixes the reported test breakages.

Event Timeline

bipmis requested review of this revision. Nov 1 2022, 3:12 PM

bipmis created this revision.

Harbormaster completed remote builds in B195567: Diff 472423. Nov 1 2022, 4:32 PM

asmok-g added a subscriber: asmok-g. Nov 2 2022, 7:14 AM

asbirlea added a subscriber: asbirlea. Nov 2 2022, 11:12 AM

asbirlea added inline comments.

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
783	Is it possible for InsertPoint to be `nullptr` here?

dmgreen mentioned this in D135137: [AggressiveInstCombine] Load merge the reverse load pattern of consecutive loads. Nov 3 2022, 3:02 AM

We found another case, shown below, where p3 is read out of order. We are looking into getting that fixed too.

define i32 @loadCombine_4consecutive_badinsert(ptr %p) {
; LE-LABEL: @loadCombine_4consecutive_badinsert(
; LE-NEXT:    [[P3:%.*]] = getelementptr i8, ptr [[P:%.*]], i32 3
; LE-NEXT:    [[L1:%.*]] = load i32, ptr [[P]], align 1
; LE-NEXT:    store i8 0, ptr [[P3]], align 1
; LE-NEXT:    ret i32 [[L1]]
;
  %p1 = getelementptr i8, ptr %p, i32 1
  %p2 = getelementptr i8, ptr %p, i32 2
  %p3 = getelementptr i8, ptr %p, i32 3
  %l2 = load i8, ptr %p1
  store i8 0, ptr %p3, align 1
  %l3 = load i8, ptr %p2
  %l4 = load i8, ptr %p3
  %l1 = load i8, ptr %p

  %e1 = zext i8 %l1 to i32
  %e2 = zext i8 %l2 to i32
  %e3 = zext i8 %l3 to i32
  %e4 = zext i8 %l4 to i32

  %s2 = shl i32 %e2, 8
  %s3 = shl i32 %e3, 16
  %s4 = shl i32 %e4, 24

  %o1 = or i32 %e1, %s2
  %o2 = or i32 %o1, %s3
  %o3 = or i32 %o2, %s4
  ret i32 %o3
}
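
To see why the placement of the merged load matters in this case, here is a toy C++ model of the test above. It is not LLVM code: the 4-byte array stands in for the memory at %p, every function name is made up for illustration, and a little-endian target is assumed. It compares the original instruction order with a wide load placed before and after the store to %p3.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy stand-in for the four bytes at %p .. %p+3 (hypothetical model, not LLVM).
using Mem = std::array<std::uint8_t, 4>;

// Little-endian 32-bit read of all four bytes, i.e. the merged "load i32".
std::uint32_t wideLoad(const Mem &M) {
  return std::uint32_t(M[0]) | (std::uint32_t(M[1]) << 8) |
         (std::uint32_t(M[2]) << 16) | (std::uint32_t(M[3]) << 24);
}

// The original sequence: byte loads interleaved with "store i8 0, ptr %p3".
std::uint32_t originalOrder(Mem M) {
  std::uint32_t L2 = M[1]; // %l2 = load i8, ptr %p1
  M[3] = 0;                // store i8 0, ptr %p3
  std::uint32_t L3 = M[2]; // %l3 = load i8, ptr %p2
  std::uint32_t L4 = M[3]; // %l4 = load i8, ptr %p3 (sees the stored 0)
  std::uint32_t L1 = M[0]; // %l1 = load i8, ptr %p
  return L1 | (L2 << 8) | (L3 << 16) | (L4 << 24);
}

// Merged load placed at the first load, i.e. before the store.
std::uint32_t mergedBeforeStore(Mem M) {
  std::uint32_t V = wideLoad(M);
  M[3] = 0;
  return V;
}

// Merged load placed at the last load, i.e. after the store.
std::uint32_t mergedAfterStore(Mem M) {
  M[3] = 0;
  return wideLoad(M);
}

// Usage: only the after-store placement agrees with the original order.
void checkPlacements() {
  const Mem M = {0x11, 0x22, 0x33, 0x44};
  assert(originalOrder(M) == 0x00332211u);
  assert(mergedAfterStore(M) == 0x00332211u);
  assert(mergedBeforeStore(M) == 0x44332211u);
}
```

In this model the wide load placed before the store still sees the old byte at offset 3, which is the out-of-order read of p3 described above.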

bipmis mentioned this in D137333: [AggressiveInstCombine] Avoid load merge/widen if stores are present b/w loads. Nov 3 2022, 5:48 AM

Updated the patch to handle several alias-analysis corner cases by tracking the insert point of the merged load and its associated pointer. For two loads, the insert point moves to whichever load occurs first. Additionally, check for a clobber of the merged load when the merged load occurs later.
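
A minimal sketch of the two ideas in this update, written against a toy instruction list rather than the LLVM API. `Inst`, `pickInsertPoint`, `overlaps`, and `clobberedBetween` are hypothetical names introduced here for illustration; the actual patch operates on `LoadInst` and alias analysis, and the byte-range overlap test below only stands in for an alias query.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction: a load or store of Size bytes at byte Offset from a common
// base pointer. Hypothetical model; not an LLVM type.
enum class Kind { Load, Store };
struct Inst {
  Kind K;
  unsigned Offset;
  unsigned Size;
};

// The merged load is inserted at whichever candidate load comes first in the
// block. Positions are indices into the block.
std::size_t pickInsertPoint(const std::vector<std::size_t> &LoadPositions) {
  assert(!LoadPositions.empty() && "need at least one load");
  return *std::min_element(LoadPositions.begin(), LoadPositions.end());
}

// Byte-range overlap, standing in for an alias-analysis query.
bool overlaps(unsigned OffA, unsigned SizeA, unsigned OffB, unsigned SizeB) {
  return OffA < OffB + SizeB && OffB < OffA + SizeA;
}

// Returns true if any store strictly between positions Begin and End may write
// to the bytes [Offset, Offset + Size) that the merged load would read.
bool clobberedBetween(const std::vector<Inst> &Block, std::size_t Begin,
                      std::size_t End, unsigned Offset, unsigned Size) {
  for (std::size_t I = Begin + 1; I < End; ++I)
    if (Block[I].K == Kind::Store &&
        overlaps(Block[I].Offset, Block[I].Size, Offset, Size))
      return true;
  return false;
}
```

For a block shaped like the four-load test case (load at offset 1, store at offset 3, then loads at offsets 2, 3, and 0), `pickInsertPoint({0, 2, 3, 4})` selects position 0, and `clobberedBetween(Block, 0, 4, 0, 4)` reports the intervening store, so in this model hoisting the wide load to the first load would not be safe.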

Harbormaster completed remote builds in B196881: Diff 474234. Nov 9 2022, 7:05 AM

dmgreen added inline comments. Nov 10 2022, 9:03 AM

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
823	If the pointer operand isn't in the same block, is that a problem? If they are in different blocks, and we know all the loads are in the same block, then I think we know the pointer operand dominates the RootInsert. It won't need the moveBefore below.

bipmis added inline comments. Nov 18 2022, 7:52 AM

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
823	Good Point. Will update the patch with this change.
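
The dominance argument in this exchange can be sketched with a toy model. `Pos` and `pointerNeedsMove` are hypothetical and not part of LLVM: an instruction is identified only by its block and its index within that block, and the sketch assumes the pointer operand already dominates the original load (as valid SSA requires) and that the insert point lies in the loads' block.

```cpp
#include <cassert>

// Toy location of an instruction: which block it lives in and its index there.
struct Pos {
  int Block;
  int Index;
};

// Decide whether the pointer's defining instruction must be moved up so that
// it precedes the new insert point.
bool pointerNeedsMove(Pos Ptr, Pos InsertPoint) {
  assert(Ptr.Index >= 0 && InsertPoint.Index >= 0 && "indices are positions");
  // Different block: since Ptr dominates a load in the insert point's block,
  // Ptr's block dominates that whole block, so no move is needed.
  if (Ptr.Block != InsertPoint.Block)
    return false;
  // Same block: only a definition that comes after the insert point is a
  // problem.
  return Ptr.Index > InsertPoint.Index;
}
```

Under these assumptions, a pointer defined in another block never needs moving, while a pointer defined later in the same block than the insert point does.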

Handled the review comments about the load and the load's pointer operand being in separate basic blocks.
@dstuttard It would be good if the patch could be tested with non-opaque pointers as well. I have done a sanity test and it should work OK. Thanks.

Harbormaster completed remote builds in B198749: Diff 476842. Nov 21 2022, 2:09 PM

Thanks for the update. This LGTM.

This revision is now accepted and ready to land. Nov 23 2022, 6:45 AM

This revision was landed with ongoing or failed builds. Nov 29 2022, 2:54 AM

Closed by commit rGe9393789a9fa: [AggressiveInstCombine] Handle the insert point of the merged load correctly. (authored by bipmis).

This revision was automatically updated to reflect the committed changes.

Hi, this patch is creating malformed IR for some cases. Test case here: https://github.com/llvm/llvm-project/issues/62756

Herald added a subscriber: StephenFan. May 16 2023, 5:37 PM

In D137201#4348346, @mnadeem wrote:

Hi, this patch is creating malformed IR for some cases. Test case here: https://github.com/llvm/llvm-project/issues/62756

Thanks for reporting. I can see the issue and have fixed it in https://reviews.llvm.org/D150864.
I have not handled this optimisation within the AggressiveInstCombine pass itself, because InstCombine already does the same thing: it reduces the nested patterns to a single GEP, and AggressiveInstCombine then merges the loads.

Revision Contents

Path	Size
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp	7 lines
llvm/test/Transforms/AggressiveInstCombine/AArch64/or-load.ll	2 lines
llvm/test/Transforms/AggressiveInstCombine/X86/or-load.ll	4 lines

Diff 472423

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp

@@ ... @@ static bool tryToRecognizeTableBasedCttz(Instruction &I) {
   return true;
 }

 /// This is used by foldLoadsRecursive() to capture a Root Load node which is
 /// of type or(load, load) and recursively build the wide load. Also capture the
 /// shift amount, zero extend type and loadSize.
 struct LoadOps {
   LoadInst *Root = nullptr;
+  LoadInst *InsertPoint = nullptr;
   bool FoundRoot = false;
   uint64_t LoadSize = 0;
   Value *Shift = nullptr;
   Type *ZextType;
   AAMDNodes AATags;
 };

 // Identify and Merge consecutive loads recursively which is of the form
@@ ... @@ if ((Shift2 - Shift1) != ShiftDiff || (Offset2 - Offset1) != PrevSize)
     return false;

   // Update LOps
   AAMDNodes AATags1 = LOps.AATags;
   AAMDNodes AATags2 = LI2->getAAMetadata();
   if (LOps.FoundRoot == false) {
     LOps.FoundRoot = true;
     AATags1 = LI1->getAAMetadata();
-  }
+    LOps.InsertPoint = Start;
+  } else if (LOps.InsertPoint && Start->comesBefore(LOps.InsertPoint))
+    LOps.InsertPoint = Start;
   LOps.LoadSize = LoadSize1 + LoadSize2;

   // Concatenate the AATags of the Merged Loads.
   LOps.AATags = AATags1.concat(AATags2);

   LOps.Root = LI1;
   LOps.Shift = ShAmt1;
   LOps.ZextType = X->getType();
@@ ... @@ static bool foldConsecutiveLoads(Instruction &I, const DataLayout &DL,
   bool Fast = false;
   Allowed = TTI.allowsMisalignedMemoryAccesses(I.getContext(), LOps.LoadSize,
                                                AS, LI1->getAlign(), &Fast);
   if (!Allowed || !Fast)
     return false;

   // New load can be generated
   Value *Load1Ptr = LI1->getPointerOperand();
-  Builder.SetInsertPoint(LI1);
+  Builder.SetInsertPoint(LOps.InsertPoint);
   Value *NewPtr = Builder.CreateBitCast(Load1Ptr, WiderType->getPointerTo(AS));
   NewLoad = Builder.CreateAlignedLoad(WiderType, NewPtr, LI1->getAlign(),
                                       LI1->isVolatile(), "");
   NewLoad->takeName(LI1);
   // Set the New Load AATags Metadata.
   if (LOps.AATags)
     NewLoad->setAAMetadata(LOps.AATags);

llvm/test/Transforms/AggressiveInstCombine/AArch64/or-load.ll

@@ ... @@ ;
   %s2 = shl i16 %e2, 8
   %o1 = or i16 %e1, %s2
   ret i16 %o1
 }

 define i32 @loadCombine_4consecutive_badinsert(ptr %p) {
 ; LE-LABEL: @loadCombine_4consecutive_badinsert(
 ; LE-NEXT:    [[P1:%.*]] = getelementptr i8, ptr [[P:%.*]], i32 1
-; LE-NEXT:    store i8 0, ptr [[P1]], align 1
 ; LE-NEXT:    [[L1:%.*]] = load i32, ptr [[P]], align 1
+; LE-NEXT:    store i8 0, ptr [[P1]], align 1
 ; LE-NEXT:    ret i32 [[L1]]
 ;
 ; BE-LABEL: @loadCombine_4consecutive_badinsert(
 ; BE-NEXT:    [[P1:%.*]] = getelementptr i8, ptr [[P:%.*]], i32 1
 ; BE-NEXT:    [[P2:%.*]] = getelementptr i8, ptr [[P]], i32 2
 ; BE-NEXT:    [[P3:%.*]] = getelementptr i8, ptr [[P]], i32 3
 ; BE-NEXT:    [[L2:%.*]] = load i8, ptr [[P1]], align 1
 ; BE-NEXT:    [[L3:%.*]] = load i8, ptr [[P2]], align 1

llvm/test/Transforms/AggressiveInstCombine/X86/or-load.ll

@@ ... @@ ;
   %o2 = or i32 %o1, %s3
   %o3 = or i32 %o2, %s4
   ret i32 %o3
 }

 define i16 @loadCombine_2consecutive_badinsert(ptr %p) {
 ; LE-LABEL: @loadCombine_2consecutive_badinsert(
 ; LE-NEXT:    [[P1:%.*]] = getelementptr i8, ptr [[P:%.*]], i32 1
-; LE-NEXT:    store i8 0, ptr [[P1]], align 1
 ; LE-NEXT:    [[L1:%.*]] = load i16, ptr [[P]], align 1
+; LE-NEXT:    store i8 0, ptr [[P1]], align 1
 ; LE-NEXT:    ret i16 [[L1]]
 ;
 ; BE-LABEL: @loadCombine_2consecutive_badinsert(
 ; BE-NEXT:    [[P1:%.*]] = getelementptr i8, ptr [[P:%.*]], i32 1
 ; BE-NEXT:    [[L2:%.*]] = load i8, ptr [[P1]], align 1
 ; BE-NEXT:    store i8 0, ptr [[P1]], align 1
 ; BE-NEXT:    [[L1:%.*]] = load i8, ptr [[P]], align 1
 ; BE-NEXT:    [[E1:%.*]] = zext i8 [[L1]] to i16
@@ ... @@ ;
   %s2 = shl i16 %e2, 8
   %o1 = or i16 %e1, %s2
   ret i16 %o1
 }

 define i32 @loadCombine_4consecutive_badinsert(ptr %p) {
 ; LE-LABEL: @loadCombine_4consecutive_badinsert(
 ; LE-NEXT:    [[P1:%.*]] = getelementptr i8, ptr [[P:%.*]], i32 1
-; LE-NEXT:    store i8 0, ptr [[P1]], align 1
 ; LE-NEXT:    [[L1:%.*]] = load i32, ptr [[P]], align 1
+; LE-NEXT:    store i8 0, ptr [[P1]], align 1
 ; LE-NEXT:    ret i32 [[L1]]
 ;
 ; BE-LABEL: @loadCombine_4consecutive_badinsert(
 ; BE-NEXT:    [[P1:%.*]] = getelementptr i8, ptr [[P:%.*]], i32 1
 ; BE-NEXT:    [[P2:%.*]] = getelementptr i8, ptr [[P]], i32 2
 ; BE-NEXT:    [[P3:%.*]] = getelementptr i8, ptr [[P]], i32 3
 ; BE-NEXT:    [[L2:%.*]] = load i8, ptr [[P1]], align 1
 ; BE-NEXT:    [[L3:%.*]] = load i8, ptr [[P2]], align 1