This is an archive of the discontinued LLVM Phabricator instance.

[AliasAnalysis] Mark fences as not Mod'ing unescaped local values
Needs Review · Public

Authored by aeubanks on Jun 27 2023, 5:40 PM.

Details

Summary

Addresses a regression reported in D145210.

Unescaped local values cannot be Mod'd by fences: other threads either
cannot know the address (unescaped alloca), or it would be UB for another
thread to access the memory (noalias).
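
For readers without the diff open: judging from the summary and the function
names referenced in the review comments below, the change amounts to roughly
the following early exit in AAResults::getModRefInfo for FenceInst. This is a
sketch, not the exact patch; in particular, reaching the capture information
through AAQI.CI is an assumption based on how BasicAA queries it.

// Sketch only, not the exact diff under review.
// In AAResults::getModRefInfo(const FenceInst *S, const MemoryLocation &Loc,
//                             AAQueryInfo &AAQI):
if (Loc.Ptr &&
    AAQI.CI->isNotCapturedBeforeOrAt(getUnderlyingObject(Loc.Ptr), S))
  // No other thread can hold the address of a still-unescaped local object,
  // so the fence cannot be ordering a modification of it; treat the fence
  // as at most reading the location.
  return ModRefInfo::Ref;
return ModRefInfo::ModRef;  // conservative default for fences, as before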

Diff Detail

Event Timeline

aeubanks created this revision. · Jun 27 2023, 5:40 PM
Herald added a project: Restricted Project. · View Herald Transcript · Jun 27 2023, 5:40 PM
aeubanks requested review of this revision. · Jun 27 2023, 5:40 PM
efriedma added inline comments.
llvm/lib/Analysis/AliasAnalysis.cpp
513–518

The bit about the ModRef mask doesn't make sense in the rewritten comment.

518

"before" in isNotCapturedBeforeOrAt is not happens-before... so if the value is captured after the fence, operations could appear to happen on it before the fence. That might mean reordering read operations across the fence isn't actually legal? Not sure.

nikic added inline comments.
llvm/lib/Analysis/AliasAnalysis.cpp
517

isNotCapturedBeforeOrAt() implies isIdentifiedFunctionLocal().

aeubanks updated this revision to Diff 535442. · Jun 28 2023, 9:30 AM

address comments

perhaps @hboehm can comment on the legality of this?

Unfortunately, I'm not an LLVM expert, so let me try to answer in terms of
C++ semantics.
We can certainly have

atomic<int*> y(nullptr);

Thread 1:
{
  x = 17;
  atomic_thread_fence(memory_order::release);  // Address of x not yet taken.
  y.store(&x, memory_order::relaxed);
}

Thread 2:
int* tmp = y.load(memory_order::acquire);
if (tmp != nullptr) assert(*tmp == 17);

And that assertion should not fail. Without the fence, it may fail, so the
fence orders the store to x. And I think this is not an especially weird
use case.

Does that answer the question? Or did I misinterpret it?

Hans

The relevant scenario would be a bit more complicated, I think, since with this patch we return ModRefInfo::Ref for the fence. Basically, that means non-atomic loads can move across the fence, but stores and atomic ops can't. But consider transforming something like the following:

atomic<int*> y(nullptr);

// Thread 1:
x = 17;
z = x;
atomic_thread_fence(memory_order::release);  // Address of x not yet taken.
y.store(&x, memory_order::relaxed);
assert(z == 17);

// Thread 2:
int* tmp = y.load(memory_order::acquire);
if (tmp != nullptr) *tmp += 100;

To something like:

atomic<int*> y(nullptr);

// Thread 1:
x = 17;
atomic_thread_fence(memory_order::release);  // Address of x not yet taken.
z = x;
y.store(&x, memory_order::relaxed);
assert(z == 17);

// Thread 2:
int* tmp = y.load(memory_order::acquire);
if (tmp != nullptr) *tmp += 100;

The original is legal C++, but the modified version has undefined behavior, I think: once z = x is moved below the release fence, that load is no longer ordered before thread 2's *tmp += 100 by the fence/acquire synchronization, so the two accesses to x form a data race.

And actually, I guess this reasoning also applies to hoisting/sinking across arbitrary calls, since any function can contain a fence. So BasicAAResult::getModRefInfo for CallInst also needs to be fixed.

Thanks for the correction. Subject to my limited understanding of LLVM
conventions, that sounds correct to me.

The behavior of release/acquire is completely symmetric with respect to a
store before the release and a matching load after the acquire, vs. a load
before the release, and a store (that must not be seen by the load) after
the acquire. So the whole notion of letting a load move below the release
fence, but not a store, doesn't make sense to me. There are clearly machine
architectures that have fences specific to one or the other, but the C/C++
memory model, quite intentionally, does not.
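
To make the symmetry concrete, the second case described above (a load before the
release fence, a conflicting store after the acquire) looks like this; it is an
illustration added here, not part of the original reply, reusing the variable
names from the earlier examples:

atomic<int*> y(nullptr);
int x = 0;

// Thread 1:
int r = x;                                   // non-atomic load before the release fence
atomic_thread_fence(memory_order::release);
y.store(&x, memory_order::relaxed);
assert(r == 0);                              // must not observe thread 2's store

// Thread 2:
int* tmp = y.load(memory_order::acquire);
if (tmp != nullptr) *tmp = 42;

The fence/acquire pairing guarantees r == 0 because the load of x happens-before
thread 2's store. Letting that load sink below the fence would remove the
ordering and introduce a race, exactly parallel to the store case, which is the
point of the symmetry argument.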

reverse ping

I've been busy, probably won't get to this for a while