This is an archive of the discontinued LLVM Phabricator instance.

Add ability to detect no-alias between different fields of the same structure
Needs Revision · Public

Authored by pawosm01 on Sep 28 2021, 10:53 AM.

Details

Reviewers

tstellar
jdoerfert
nikic
lebedev.ri

Summary

BasicAliasAnalysis tends to give up on proving no-alias between
two memory accesses when they share the same underlying object (e.g.
when they are fields of the same structure). This simplified approach
loses optimization opportunities that competing compilers are able to
exploit. Specifically, when a loop-invariant value is loaded from the
i-th element of an array that is a field of a global structure, and
each iteration writes to the i-th element of another array that is a
different field of the same structure, the load is not hoisted. The
result is much worse performance at -O3 than other compilers achieve
at the same optimization level.
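
The pattern described above can be sketched roughly as follows. This is a hypothetical reduction, not the loop from the original workload: the names `Data`, `G`, `accumulate` and `accumulateHoisted` are made up for illustration. The second function shows by hand what hoisting the invariant load out of the inner loop would look like.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 100;

// Hypothetical global structure with two array fields.
struct Data {
  int In[N];
  int Out[N];
};

Data G;

// As written in source: G.In[I] is invariant in the inner loop, but it can
// only be loaded once if the compiler proves that the stores to G.Out[I]
// never modify it.
void accumulate(int M) {
  for (std::size_t I = 0; I < N; ++I) {
    G.Out[I] = 0;
    for (int J = 0; J < M; ++J)
      G.Out[I] += G.In[I] * J;
  }
}

// Hand-hoisted form: the invariant load is performed once per outer
// iteration, before the inner loop starts.
void accumulateHoisted(int M) {
  for (std::size_t I = 0; I < N; ++I) {
    const int Invariant = G.In[I];
    G.Out[I] = 0;
    for (int J = 0; J < M; ++J)
      G.Out[I] += Invariant * J;
  }
}
```

Both functions compute the same values as long as every index stays within its own array; the difference is only in how many loads the inner loop performs.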

This patch adds some additional logic that checks just for that:
it returns NoAlias when accessed elements of the same underlying
object are two different fields of the same structure.

Some of the test cases in global_alias.ll, which cover the situation
addressed by this patch, had to be modified.

This patch may seem somewhat constrained, as it focuses on improving
the performance of the particular loop I was working with. Making it
any more general causes a number of test cases to fail.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	180 ms	x64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::trivial-cxa-atexit.S
	180 ms	x64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::trivial-static-initializer.S
	180 ms	x64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::trivial-tls.S

Event Timeline

pawosm01 created this revision. · Sep 28 2021, 10:53 AM

Herald added subscribers: jeroen.dobbelaere, hiraditya. · Sep 28 2021, 10:53 AM

pawosm01 requested review of this revision. · Sep 28 2021, 10:53 AM

Herald added a subscriber: llvm-commits. · Sep 28 2021, 10:53 AM

pawosm01 edited the summary of this revision. · Sep 28 2021, 10:56 AM

pawosm01 added subscribers: tstellar, jdoerfert.

pawosm01 added reviewers: tstellar, jdoerfert. · Sep 28 2021, 11:04 AM

pawosm01 removed a subscriber: jeroen.dobbelaere.

BasicAA doesn't do this because it's wrong. It is legal to overindex an array in a structure, as long as the bounds of the underlying object are not exceeded, even for inbounds GEP.
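
To illustrate the point about over-indexing, here is a minimal sketch that only does address arithmetic and never dereferences anything. The struct `Pair` and the helper names are hypothetical, and the sketch assumes there is no padding between the two `int` arrays, which holds on common ABIs but is not something the review itself states.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t Size = 100;

// Hypothetical stand-in for a structure with two array fields.
struct Pair {
  int A[Size];
  int B[Size];
};

// Byte offset from the start of the struct that indexing field A with Idx
// resolves to. This mirrors offset computation only; no memory is accessed.
constexpr std::size_t offsetInA(std::size_t Idx) {
  return offsetof(Pair, A) + Idx * sizeof(int);
}

// Same computation for field B.
constexpr std::size_t offsetInB(std::size_t Idx) {
  return offsetof(Pair, B) + Idx * sizeof(int);
}

// True when an int-sized access at Off still lies inside the whole struct,
// i.e. inside the underlying object.
constexpr bool insideObject(std::size_t Off) {
  return Off + sizeof(int) <= sizeof(Pair);
}
```

Under that layout assumption, `offsetInA(Size)` equals `offsetInB(0)` and is still inside the underlying object, so two accesses that name different fields can refer to the same bytes.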

This revision now requires changes to proceed. · Sep 28 2021, 11:06 AM

Harbormaster completed remote builds in B126143: Diff 375644. · Sep 28 2021, 11:14 AM

This is valid given C++ semantics, but not LLVM IR semantics.
I have it in my queue to encode the C++ semantics, hang on.

jeroen.dobbelaere added a subscriber: jeroen.dobbelaere. · Sep 28 2021, 11:27 AM

I do realize that this patch isn't the best solution for this problem, but I wonder what the best one would be. Maybe LICM (where the decision whether to hoist is made) should rely on something more than BasicAA, but which pass should do such analysis, and based on what criteria?

In D110642#3030411, @pawosm01 wrote:

I do realize that this patch isn't the best solution for this problem, but I wonder what the best one would be. Maybe LICM (where the decision whether to hoist is made) should rely on something more than BasicAA, but which pass should do such analysis, and based on what criteria?

The correct solution for this is implemented in D109746, but still needs to be combined with additional range information encoded by the frontend and/or stronger range analysis.

bsmith added a subscriber: bsmith. · Feb 28 2022, 3:00 AM

This review may be stuck/dead, consider abandoning if no longer relevant.
Removing myself as reviewer in attempt to clean dashboard.

Herald added a project: Restricted Project. · Jan 12 2023, 4:58 PM

Herald added subscribers: pcwang-thead, StephenFan.

Revision Contents

Path	Size
llvm/lib/Analysis/BasicAliasAnalysis.cpp	39 lines
llvm/test/Transforms/LoopVectorize/global_alias.ll	14 lines

Diff 375644

llvm/lib/Analysis/BasicAliasAnalysis.cpp

[1,554 lines not shown]  if (O1 != O2) {
     // location if that memory location doesn't escape. Or it may pass a
     // nocapture value to other functions as long as they don't capture it.
     if (isEscapeSource(O1) &&
         AAQI.CI->isNotCapturedBeforeOrAt(O2, cast<Instruction>(O1)))
       return AliasResult::NoAlias;
     if (isEscapeSource(O2) &&
         AAQI.CI->isNotCapturedBeforeOrAt(O1, cast<Instruction>(O2)))
       return AliasResult::NoAlias;
+  } else {
+    // In case the underlying object is a pointer to a structure and the GEP
+    // operators refer two different fields of it.
+    // Let's return NoAlias only when there is no doubt about it.
+    if (const PointerType *PtrTy = dyn_cast<PointerType>(O1->getType())) {
+      if (!(PtrTy->isOpaque())) {
+        if (PtrTy->getElementType()->isStructTy()) {
+          const GEPOperator *GEP1 = dyn_cast<GEPOperator>(V1);
+          const GEPOperator *GEP2 = dyn_cast<GEPOperator>(V2);
+          if (GEP1 && GEP2) {
+            if ((GEP1->hasIndices() && GEP2->hasIndices()) &&
+                ((GEP1->getNumIndices() <= 3U) &&
+                 (GEP2->getNumIndices() <= 3U))) {
+              const auto &Ids1 = GEP1->idx_begin();
+              const auto &Ids2 = GEP2->idx_begin();
+              for (int i = 0; i < 2; i++) {
    Lint (pre-merge checks), clang-tidy: warning: invalid case style for variable 'i' [readability-identifier-naming]
+                const ConstantInt *Idx1 = dyn_cast<ConstantInt>(Ids1[i]);
+                const ConstantInt *Idx2 = dyn_cast<ConstantInt>(Ids2[i]);
+                if (Idx1 && Idx2) {
+                  int64_t id1 = Idx1->getSExtValue();
    Lint (pre-merge checks), clang-tidy: warning: invalid case style for variable 'id1' [readability-identifier-naming]
+                  int64_t id2 = Idx2->getSExtValue();
    Lint (pre-merge checks), clang-tidy: warning: invalid case style for variable 'id2' [readability-identifier-naming]
+                  switch (i) {
+                  case 0: // Make sure it is 'the same' struct
+                    if (id1 == id2)
+                      continue;
+                    break;
+                  case 1: // Make sure it is different field of the same struct
+                    if (id1 != id2) // It is not: no alias possible!
+                      return AliasResult::NoAlias;
+                    break;
+                  }
+                }
+                break; // We can't continue analyzing that...
+              }
+            }
+          }
+        }
+      }
+    }
   }

   // If the size of one access is larger than the entire object on the other
   // side, then we know such behavior is undefined and can assume no alias.
   bool NullIsValidLocation = NullPointerIsDefined(&F);
   if ((isObjectSmallerThan(
           O2, getMinimalExtentFrom(*V1, V1Size, DL, NullIsValidLocation), DL,
           TLI, NullIsValidLocation)) ||
[316 lines not shown]
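
The decision logic of the added `else` branch can be restated outside of LLVM as a small standalone model. This is only a sketch: `Result`, `Indices` and `differentFieldCheck` are hypothetical names, a GEP index is modelled as an optional constant (empty when the index is not a constant), and the check that the underlying object is a pointer to a struct is left out. The patch as posted appears to read the second index after only checking that at least one index exists, so the sketch adds an explicit size guard that the patch does not have.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// NoAlias corresponds to the early return in the patch; Unknown corresponds
// to falling through to the rest of the alias check.
enum class Result { NoAlias, Unknown };

// One entry per GEP index: a value when the index is a constant,
// std::nullopt otherwise.
using Indices = std::vector<std::optional<std::int64_t>>;

Result differentFieldCheck(const Indices &Idx1, const Indices &Idx2) {
  // Mirrors the hasIndices() and getNumIndices() <= 3 conditions.
  if (Idx1.empty() || Idx2.empty() || Idx1.size() > 3 || Idx2.size() > 3)
    return Result::Unknown;
  // Extra guard, not present in the patch: both index lists need a second
  // entry before it can be inspected.
  if (Idx1.size() < 2 || Idx2.size() < 2)
    return Result::Unknown;
  // Case 0: the first indices must be equal constants ("the same" struct).
  if (!Idx1[0] || !Idx2[0] || *Idx1[0] != *Idx2[0])
    return Result::Unknown;
  // Case 1: constant second indices that differ name different fields.
  if (Idx1[1] && Idx2[1] && *Idx1[1] != *Idx2[1])
    return Result::NoAlias;
  return Result::Unknown;
}
```

Note that the model, like the patch, never looks at the third index, which is exactly the array subscript that the reviewer's over-indexing objection is about.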

llvm/test/Transforms/LoopVectorize/global_alias.ll

[763 lines not shown]  for.end: ; preds = %for.cond
   %7 = load i32, i32* %arrayidx5, align 4
   ret i32 %7
 }


 ;; === Now, the tests that we could vectorize with induction changes or run-time checks ===


-; /// Different objects, swapped induction, alias at the end
+; /// Different objects, swapped induction, assuming no alias
 ; int mayAlias01 (int a) {
 ;   int i;
 ;   for (i=0; i<SIZE; i++)
 ;     Foo.A[i] = Foo.B[SIZE-i-1] + a;
 ;   return Foo.A[a];
 ; }
 ; CHECK-LABEL: define i32 @mayAlias01(
-; CHECK-NOT: add nsw <4 x i32>
+; CHECK: add nsw <4 x i32>
 ; CHECK: ret

 define i32 @mayAlias01(i32 %a) nounwind {
 entry:
   %a.addr = alloca i32, align 4
   %i = alloca i32, align 4
   store i32 %a, i32* %a.addr, align 4
   store i32 0, i32* %i, align 4
[25 lines not shown]

 for.end: ; preds = %for.cond
   %6 = load i32, i32* %a.addr, align 4
   %arrayidx3 = getelementptr inbounds [100 x i32], [100 x i32]* getelementptr inbounds (%struct.anon, %struct.anon* @Foo, i32 0, i32 0), i32 0, i32 %6
   %7 = load i32, i32* %arrayidx3, align 4
   ret i32 %7
 }

-; /// Different objects, swapped induction, alias at the beginning
+; /// Different objects, swapped induction, assuming no alias
 ; int mayAlias02 (int a) {
 ;   int i;
 ;   for (i=0; i<SIZE; i++)
 ;     Foo.A[SIZE-i-1] = Foo.B[i] + a;
 ;   return Foo.A[a];
 ; }
 ; CHECK-LABEL: define i32 @mayAlias02(
-; CHECK-NOT: add nsw <4 x i32>
+; CHECK: add nsw <4 x i32>
 ; CHECK: ret

 define i32 @mayAlias02(i32 %a) nounwind {
 entry:
   %a.addr = alloca i32, align 4
   %i = alloca i32, align 4
   store i32 %a, i32* %a.addr, align 4
   store i32 0, i32* %i, align 4
[90 lines not shown]

 ; int mustAlias01 (int a) {
 ;   int i;
 ;   for (i=0; i<SIZE; i++)
 ;     Foo.A[i+10] = Foo.B[SIZE-i-1] + a;
 ;   return Foo.A[a];
 ; }
 ; CHECK-LABEL: define i32 @mustAlias01(
-; CHECK-NOT: add nsw <4 x i32>
+; CHECK: add nsw <4 x i32>
 ; CHECK: ret

 define i32 @mustAlias01(i32 %a) nounwind {
 entry:
   %a.addr = alloca i32, align 4
   %i = alloca i32, align 4
   store i32 %a, i32* %a.addr, align 4
   store i32 0, i32* %i, align 4
[33 lines not shown]

 ; int mustAlias02 (int a) {
 ;   int i;
 ;   for (i=0; i<SIZE; i++)
 ;     Foo.A[i] = Foo.B[SIZE-i-10] + a;
 ;   return Foo.A[a];
 ; }
 ; CHECK-LABEL: define i32 @mustAlias02(
-; CHECK-NOT: add nsw <4 x i32>
+; CHECK: add nsw <4 x i32>
 ; CHECK: ret

 define i32 @mustAlias02(i32 %a) nounwind {
 entry:
   %a.addr = alloca i32, align 4
   %i = alloca i32, align 4
   store i32 %a, i32* %a.addr, align 4
   store i32 0, i32* %i, align 4
[32 lines not shown]

 ; int mustAlias03 (int a) {
 ;   int i;
 ;   for (i=0; i<SIZE; i++)
 ;     Foo.A[i+10] = Foo.B[SIZE-i-10] + a;
 ;   return Foo.A[a];
 ; }
 ; CHECK-LABEL: define i32 @mustAlias03(
-; CHECK-NOT: add nsw <4 x i32>
+; CHECK: add nsw <4 x i32>
 ; CHECK: ret

 define i32 @mustAlias03(i32 %a) nounwind {
 entry:
   %a.addr = alloca i32, align 4
   %i = alloca i32, align 4
   store i32 %a, i32* %a.addr, align 4
   store i32 0, i32* %i, align 4
[33 lines not shown]
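
The `mustAlias01` loop above is a case where treating the two fields as non-aliasing matters. The following sketch checks this with offset arithmetic only, without touching memory. It assumes that `%struct.anon` is laid out as two adjacent `[100 x i32]` arrays with no padding (the struct definition is not visible in this excerpt); `Anon`, `elemOffset` and `firstConflictingIteration` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

constexpr int Size = 100;

// Assumed layout of the global Foo: two adjacent arrays of 100 ints.
struct Anon {
  int A[Size];
  int B[Size];
};

// Byte offset of element Idx of an int array that starts at byte Base.
constexpr std::size_t elemOffset(std::size_t Base, int Idx) {
  return Base + static_cast<std::size_t>(Idx) * sizeof(int);
}

// Models: for (i = 0; i < SIZE; i++) Foo.A[i+10] = Foo.B[SIZE-i-1] + a;
// Returns the first iteration I whose store address is read again by a
// later iteration J, or -1 when no such pair exists.
int firstConflictingIteration() {
  for (int I = 0; I < Size; ++I) {
    const std::size_t Store = elemOffset(offsetof(Anon, A), I + 10);
    for (int J = I + 1; J < Size; ++J)
      if (Store == elemOffset(offsetof(Anon, B), Size - J - 1))
        return I;
  }
  return -1;
}
```

With that layout, iteration 90 stores to `A[100]`, which has the same address as `B[0]`, and iteration 99 reads `B[0]`. That is a dependence between iterations, which a NoAlias answer for the two fields would hide from the vectorizer.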