This is an archive of the discontinued LLVM Phabricator instance.

Fix EarlyCSE to intersect aliasing metadata.
Needs ReviewPublic

Authored by sheredom on Jan 24 2020, 3:52 AM.

Download Raw Diff

Details

Reviewers

hfinkel
jeroen.dobbelaere

Summary

Further to my post on the mailing list, EarlyCSE was not merging the aliasing information of two loads/stores that it wanted to fold, meaning that it was random (program order) which one survived. This meant that aliasing information would be sometimes propagated from inlined-restrict functions, when it should not be.

Diff Detail

Event Timeline

sheredom created this revision.Jan 24 2020, 3:52 AM

Herald added subscribers: llvm-commits, hiraditya. · View Herald TranscriptJan 24 2020, 3:52 AM

I'm not familiar with the code of this pass, but is there a cheap way of identifying that the two operations are in the same basic block?
If so, you could take the intersection of the aliasing information rather than the union. Because if both ops are guaranteed to execute then the tightest aliasing still has to hold.
(being in the same BB doesn't imply that both instructions are executed, but there's code in ValueTracking perhaps that can check that)

Can you explain in more detail why the current behavior is incorrect? Why is using the aliasing information in program order wrong? If the first load is noalias, then we should be able to deduplicate to that noalias load. If the second load is noalias, then generally we can't assume the first load is also noalias. Unless there is a must-exec relation between them, as @nlopes mentioned.

The patch looks ok to me. Some extra testcases to ensure that common !noalias / !alias.scope metadata is kept could be useful ?

In D73342#1838953, @nikic wrote:

Can you explain in more detail why the current behavior is incorrect? Why is using the aliasing information in program order wrong? If the first load is noalias, then we should be able to deduplicate to that noalias load. If the second load is noalias, then generally we can't assume the first load is also noalias. Unless there is a must-exec relation between them, as @nlopes mentioned.

Hal Finkel provided following example on the mailing list:

int * restrict r = a;
...
int x = noaliasing ? *r : *a;

This shows that the intersection of '*r' and '*a' must be taken.

In D73342#1838953, @nikic wrote:

Can you explain in more detail why the current behavior is incorrect? Why is using the aliasing information in program order wrong? If the first load is noalias, then we should be able to deduplicate to that noalias load. If the second load is noalias, then generally we can't assume the first load is also noalias. Unless there is a must-exec relation between them, as @nlopes mentioned.

We have two issues here;

For any pass, if it changes any instruction with metadata that it does not understand, it should drop that metadata. That's the contract that we need to maintain.
For this AA metadata in particular, we need to intersect the metadata, and this is true to preserve semantics for almost all other known metadata (AA, profiling, fp-precision, debug) although likely for it is not correct to keep only one, or the other, because of potential dynamic dependencies on the results.

For AA, as an example, we can have something like this:

int * restrict r = a;
int x = should_alias ? *a : *r;

(essentially the same is true using casts as TBAA).

In any case, we should be calling the function llvm::combineMetadataForCSE here. Please update the patch to do that.

In D73342#1838974, @jeroen.dobbelaere wrote:
Hal Finkel provided following example on the mailing list:
int * restrict r = a;
...
int x = noaliasing ? *r : *a;
This shows that the intersection of '*r' and '*a' must be taken.

So, I guess that the reasoning here is that noalias loads are speculatable and noalias violation is thus poison (not immediate UB), so we can't make inferences from one load to the other. Does that capture it?

In D73342#1839006, @nikic wrote:
In D73342#1838974, @jeroen.dobbelaere wrote:
Hal Finkel provided following example on the mailing list:
int * restrict r = a;
...
int x = noaliasing ? *r : *a;
This shows that the intersection of '*r' and '*a' must be taken.
So, I guess that the reasoning here is that noalias loads are speculatable and noalias violation is thus poison (not immediate UB), so we can't make inferences from one load to the other. Does that capture it?

Yes. They don't need to be speculatable, but they certainly could be.

In D73342#1839002, @hfinkel wrote:
In D73342#1838953, @nikic wrote:

Can you explain in more detail why the current behavior is incorrect? Why is using the aliasing information in program order wrong? If the first load is noalias, then we should be able to deduplicate to that noalias load. If the second load is noalias, then generally we can't assume the first load is also noalias. Unless there is a must-exec relation between them, as @nlopes mentioned.

We have two issues here;

For any pass, if it changes any instruction with metadata that it does not understand, it should drop that metadata. That's the contract that we need to maintain.

For this AA metadata in particular, we need to intersect the metadata, and this is true to preserve semantics for almost all other known metadata (AA, profiling, fp-precision, debug) although likely for it is not correct to keep only one, or the other, because of potential dynamic dependencies on the results.

For AA, as an example, we can have something like this:
int * restrict r = a;
int x = should_alias ? *a : *r;
(essentially the same is true using casts as TBAA).

In any case, we should be calling the function llvm::combineMetadataForCSE here. Please update the patch to do that.

So when I use combineMetadataForCSE it breaks the invariant loads tests because the code there just says 'if one thing is invariant load and the other is not, throw the invariant load metadata away'.

What would be the correct fix for this? I don't know a ton about the intracacies of invariant loads.

In D73342#1841992, @sheredom wrote:
In D73342#1839002, @hfinkel wrote:
In D73342#1838953, @nikic wrote:

Can you explain in more detail why the current behavior is incorrect? Why is using the aliasing information in program order wrong? If the first load is noalias, then we should be able to deduplicate to that noalias load. If the second load is noalias, then generally we can't assume the first load is also noalias. Unless there is a must-exec relation between them, as @nlopes mentioned.

We have two issues here;

For any pass, if it changes any instruction with metadata that it does not understand, it should drop that metadata. That's the contract that we need to maintain.

For this AA metadata in particular, we need to intersect the metadata, and this is true to preserve semantics for almost all other known metadata (AA, profiling, fp-precision, debug) although likely for it is not correct to keep only one, or the other, because of potential dynamic dependencies on the results.

For AA, as an example, we can have something like this:
int * restrict r = a;
int x = should_alias ? *a : *r;
(essentially the same is true using casts as TBAA).

In any case, we should be calling the function llvm::combineMetadataForCSE here. Please update the patch to do that.
So when I use combineMetadataForCSE it breaks the invariant loads tests because the code there just says 'if one thing is invariant load and the other is not, throw the invariant load metadata away'.

What would be the correct fix for this? I don't know a ton about the intracacies of invariant loads.

I believe that invariant loads are a bit special because the invariantness of a load is not allowed to have any control dependencies. The LangRef says, "If a load instruction tagged with the !invariant.load metadata is executed, the optimizer may assume the memory location referenced by the load contains the same value at all points in the program where the memory location is known to be dereferenceable; otherwise, the behavior is undefined." Thus, I think that it's valid for (invariant + nothing) to be (invariant) so long as both are guaranteed to be executed.

I would, however, do this as two separate patches. One which updates EarlyCSE to use combineMetadataForCSE (and XFAILs the test), and a second which proposes to update combineMetadataForCSE and fixes the test (and then we can figure out in that review what to do, because we may need a dominance check or a guaranteed-to-execute check or similar).

Use combineMetadataForCSE to combine the metadata.

@hfinkel ping?

In D73342#1854172, @sheredom wrote:

@hfinkel ping?

@hfinkel ping 2?

hfinkel added inline comments.Feb 18 2020, 8:50 AM

llvm/lib/Transforms/Scalar/EarlyCSE.cpp
1099	No { } around single statement if.
1100	Why is DoesKMove true here? OpInst dominated Inst in the original IR, no?

sheredom marked an inline comment as done.Feb 19 2020, 2:25 AM

sheredom added inline comments.

llvm/lib/Transforms/Scalar/EarlyCSE.cpp
1100	My understanding (albeit naive) of combineMetadataForCSE comes from the fact that at the definition the parameter is called KDominatesJ - which for the load case K does dominate J (whereas for the store case it does not, so I set it false there). The fact the declaration and definition have different terms for the same boolean seems troublesome now I think about it.

hfinkel added inline comments.Feb 28 2020, 7:34 PM

llvm/lib/Transforms/Scalar/EarlyCSE.cpp
1100	The fact the declaration and definition have different terms for the same boolean seems troublesome now I think about it. Yes, and I'm confused by the naming here. The comments in llvm/include/llvm/Transforms/Utils/Local.h say, "Some metadata can only be kept if K dominates J. For this to be correct, K cannot be hoisted." The comment on combineMetadata likewise says, "Some metadata kinds can only be kept if K does not move, meaning it dominated J in the original IR." The parameter is named DoesKMove, implying that when false, K did not originally dominate J. However, in Local.cpp, the parameter to combineMetadataForCSE is named KDominatesJ. This seems backward (i.e., DoesKMove seems like it should be false if K dominates J). In any case, the implementation of llvm::combineMetadata seems to do the more conservative thing when DoesKMove is true (when DoesKMove is false, it will prefer the metadata from K in some cases). Thus, I think that it should be false here.

Revision Contents

Path

Size

llvm/

lib/

Transforms/

Scalar/

EarlyCSE.cpp

6 lines

test/

Transforms/

EarlyCSE/

aliasing.ll

75 lines

invariant-loads.ll

5 lines

Diff 240896

llvm/lib/Transforms/Scalar/EarlyCSE.cpp

Show First 20 Lines • Show All 1,089 Lines • ▼ Show 20 Lines	if (MemInst.isValid() && MemInst.isLoad()) {
<< " to: " << *InVal.DefInst << '\n');		<< " to: " << *InVal.DefInst << '\n');
if (!DebugCounter::shouldExecute(CSECounter)) {		if (!DebugCounter::shouldExecute(CSECounter)) {
LLVM_DEBUG(dbgs() << "Skipping due to debug counter\n");		LLVM_DEBUG(dbgs() << "Skipping due to debug counter\n");
continue;		continue;
}		}
if (!Inst->use_empty())		if (!Inst->use_empty())
Inst->replaceAllUsesWith(Op);		Inst->replaceAllUsesWith(Op);
removeMSSA(Inst);		removeMSSA(Inst);

		if (auto OpInst = dyn_cast<Instruction>(Op)) {
		hfinkelUnsubmitted Not Done Reply Inline Actions No { } around single statement if. hfinkel: No { } around single statement if.
		combineMetadataForCSE(OpInst, Inst, true);
		hfinkelUnsubmitted Not Done Reply Inline Actions Why is DoesKMove true here? OpInst dominated Inst in the original IR, no? hfinkel: Why is DoesKMove true here? OpInst dominated Inst in the original IR, no?
		sheredomAuthorUnsubmitted Done Reply Inline Actions My understanding (albeit naive) of combineMetadataForCSE comes from the fact that at the definition the parameter is called KDominatesJ - which for the load case K does dominate J (whereas for the store case it does not, so I set it false there). The fact the declaration and definition have different terms for the same boolean seems troublesome now I think about it. sheredom: My understanding (albeit naive) of combineMetadataForCSE comes from the fact that at the…
		hfinkelUnsubmitted Not Done Reply Inline Actions The fact the declaration and definition have different terms for the same boolean seems troublesome now I think about it. Yes, and I'm confused by the naming here. The comments in llvm/include/llvm/Transforms/Utils/Local.h say, "Some metadata can only be kept if K dominates J. For this to be correct, K cannot be hoisted." The comment on combineMetadata likewise says, "Some metadata kinds can only be kept if K does not move, meaning it dominated J in the original IR." The parameter is named DoesKMove, implying that when false, K did not originally dominate J. However, in Local.cpp, the parameter to combineMetadataForCSE is named KDominatesJ. This seems backward (i.e., DoesKMove seems like it should be false if K dominates J). In any case, the implementation of llvm::combineMetadata seems to do the more conservative thing when DoesKMove is true (when DoesKMove is false, it will prefer the metadata from K in some cases). Thus, I think that it should be false here. hfinkel: > The fact the declaration and definition have different terms for the same boolean seems…
		}

Inst->eraseFromParent();		Inst->eraseFromParent();
Changed = true;		Changed = true;
++NumCSELoad;		++NumCSELoad;
continue;		continue;
}		}
}		}

// Otherwise, remember that we have this instruction.		// Otherwise, remember that we have this instruction.
▲ Show 20 Lines • Show All 115 Lines • ▼ Show 20 Lines	if (Inst->mayWriteToMemory()) {
"Violated invariant");		"Violated invariant");
if (LastStoreMemInst.isMatchingMemLoc(MemInst)) {		if (LastStoreMemInst.isMatchingMemLoc(MemInst)) {
LLVM_DEBUG(dbgs() << "EarlyCSE DEAD STORE: " << *LastStore		LLVM_DEBUG(dbgs() << "EarlyCSE DEAD STORE: " << *LastStore
<< " due to: " << *Inst << '\n');		<< " due to: " << *Inst << '\n');
if (!DebugCounter::shouldExecute(CSECounter)) {		if (!DebugCounter::shouldExecute(CSECounter)) {
LLVM_DEBUG(dbgs() << "Skipping due to debug counter\n");		LLVM_DEBUG(dbgs() << "Skipping due to debug counter\n");
} else {		} else {
removeMSSA(LastStore);		removeMSSA(LastStore);
		combineMetadataForCSE(Inst, LastStore, false);
LastStore->eraseFromParent();		LastStore->eraseFromParent();
Changed = true;		Changed = true;
++NumDSE;		++NumDSE;
LastStore = nullptr;		LastStore = nullptr;
}		}
}		}
// fallthrough - we can exploit information about this store		// fallthrough - we can exploit information about this store
}		}
▲ Show 20 Lines • Show All 191 Lines • Show Last 20 Lines

llvm/test/Transforms/EarlyCSE/aliasing.ll

This file was added.

				; RUN: opt < %s -S -early-cse \| FileCheck %s
				; RUN: opt < %s -S -basicaa -early-cse-memssa \| FileCheck %s
				; RUN: opt < %s -S -passes=early-cse \| FileCheck %s

				; CHECK-LABEL: @pre.load.alias.scope(
				define i32 @pre.load.alias.scope(i32* %a) {
				; CHECK-NOT: !alias.scope
				%non_alias_load = load i32, i32* %a, align 4, !alias.scope !0
				%aliased_load = load i32, i32* %a, align 4
				%add = add i32 %non_alias_load, %aliased_load
				ret i32 %add
				}

				; CHECK-LABEL: @post.load.alias.scope(
				define i32 @post.load.alias.scope(i32* %a) {
				; CHECK-NOT: !alias.scope
				%non_alias_load = load i32, i32* %a, align 4
				%aliased_load = load i32, i32* %a, align 4, !alias.scope !0
				%add = add i32 %non_alias_load, %aliased_load
				ret i32 %add
				}

				; CHECK-LABEL: @pre.load.noalias(
				define i32 @pre.load.noalias(i32* %a) {
				; CHECK-NOT: !noalias
				%non_alias_load = load i32, i32* %a, align 4, !noalias !0
				%aliased_load = load i32, i32* %a, align 4
				%add = add i32 %non_alias_load, %aliased_load
				ret i32 %add
				}

				; CHECK-LABEL: @post.load.noalias(
				define i32 @post.load.noalias(i32* %a) {
				; CHECK-NOT: !noalias
				%non_alias_load = load i32, i32* %a, align 4
				%aliased_load = load i32, i32* %a, align 4, !noalias !0
				%add = add i32 %non_alias_load, %aliased_load
				ret i32 %add
				}

				; CHECK-LABEL: @pre.store.alias.scope(
				define void @pre.store.alias.scope(i32* %a) {
				; CHECK-NOT: !alias.scope
				store i32 42, i32* %a, !alias.scope !0
				store i32 13, i32* %a
				ret void
				}

				; CHECK-LABEL: @post.store.alias.scope(
				define void @post.store.alias.scope(i32* %a) {
				; CHECK-NOT: !alias.scope
				store i32 42, i32* %a
				store i32 13, i32* %a, !alias.scope !0
				ret void
				}

				; CHECK-LABEL: @pre.store.noalias(
				define void @pre.store.noalias(i32* %a) {
				; CHECK-NOT: !noalias
				store i32 42, i32* %a, !noalias !0
				store i32 13, i32* %a
				ret void
				}

				; CHECK-LABEL: @post.store.noalias(
				define void @post.store.noalias(i32* %a) {
				; CHECK-NOT: !noalias
				store i32 42, i32* %a
				store i32 13, i32* %a, !noalias !0
				ret void
				}

				!0 = !{!1}
				!1 = distinct !{!1, !2, !"called: argument 0"}
				!2 = distinct !{!2, !"called"}

llvm/test/Transforms/EarlyCSE/invariant-loads.ll

	; RUN: opt -S -early-cse < %s \| FileCheck %s			; RUN: opt -S -early-cse < %s \| FileCheck %s
	; RUN: opt -S -basicaa -early-cse-memssa < %s \| FileCheck %s			; RUN: opt -S -basicaa -early-cse-memssa < %s \| FileCheck %s

				; FIXME: D73342 fixes EarlyCSE to use combineMetadataForCSE,
				; which does not propagate !invariant.load correctly among
				; two loads where only one has !invariant.load.
				; XFAIL: *

	declare void @clobber_and_use(i32)			declare void @clobber_and_use(i32)

	define void @f_0(i32* %ptr) {			define void @f_0(i32* %ptr) {
	; CHECK-LABEL: @f_0(			; CHECK-LABEL: @f_0(
	; CHECK: %val0 = load i32, i32* %ptr, !invariant.load !0			; CHECK: %val0 = load i32, i32* %ptr, !invariant.load !0
	; CHECK: call void @clobber_and_use(i32 %val0)			; CHECK: call void @clobber_and_use(i32 %val0)
	; CHECK: call void @clobber_and_use(i32 %val0)			; CHECK: call void @clobber_and_use(i32 %val0)
	; CHECK: call void @clobber_and_use(i32 %val0)			; CHECK: call void @clobber_and_use(i32 %val0)
	▲ Show 20 Lines • Show All 147 Lines • Show Last 20 Lines