This is an archive of the discontinued LLVM Phabricator instance.

Differential D133486

[LICM] Consider sret as writable object
Abandoned · Public

Authored by nikic on Sep 8 2022, 6:00 AM.

Details

Reviewers

reames
fhahn
asbirlea
efriedma

Summary

LangRef explicitly guarantees that sret memory can be both read and written (which makes sense, given how the whole point of sret is that it will be written to):

> This pointer must be guaranteed by the caller to be valid: loads and stores to the structure may be assumed by the callee not to trap and to be properly aligned.

Together with the noalias attribute this makes store promotion on sret memory legal, even if there are no unconditional stores.
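As a rough source-level illustration of what store promotion does to a loop like the one in the `sret_cond_store` test, the sketch below compares a conditional-store loop with its promoted form. The function names are hypothetical and this is plain C++, not the LICM implementation; it only shows why the promoted form needs the memory to be writable: the store on exit runs even when the original loop would never have stored.

```cpp
#include <cassert>
#include <cstdint>

// Source-level picture of the loop before promotion: the store runs only on
// iterations where the loaded value is below 10.
void condStoreOriginal(std::uint32_t *Ptr) {
  for (;;) {
    std::uint32_t V = *Ptr; // load on every iteration
    if (V >= 10)
      break;
    *Ptr = V + 1; // conditional store
  }
}

// The same loop after scalar promotion: one load before the loop, the value
// kept in a local, and one store on exit. That store runs even when the
// original loop would not have stored at all, so *Ptr must be writable.
void condStorePromoted(std::uint32_t *Ptr) {
  std::uint32_t V = *Ptr;
  while (V < 10)
    ++V;
  *Ptr = V;
}

// Both versions leave the same value in memory for a writable location.
void checkSameResult(std::uint32_t Start) {
  std::uint32_t A = Start, B = Start;
  condStoreOriginal(&A);
  condStorePromoted(&B);
  assert(A == B);
}
```

Calling `checkSameResult(42)` exercises the interesting case: the original loop performs no store, while the promoted loop still writes 42 back.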

Event Timeline

nikic created this revision. Sep 8 2022, 6:00 AM

Herald added a project: Restricted Project. Sep 8 2022, 6:00 AM

Herald added a subscriber: hiraditya.

nikic requested review of this revision. Sep 8 2022, 6:00 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B185603: Diff 458712. Sep 8 2022, 6:46 AM

efriedma added a comment.

Do we use the sret attribute for optimization anywhere else?

Since sret has ABI implications, I'd prefer to use sret specifically for those ABI implications, and use other attributes for any other semantic meaning, if possible. (Maybe we need some stronger form of "dereferenceable"?)

reames added inline comments. Sep 8 2022, 1:38 PM

llvm/lib/Transforms/Scalar/LICM.cpp
Line 1899: If you want to be a bit more aggressive here, I believe that every dereferenceable argument satisfies this requirement. Note that this is specific to the "deref globally" semantics, not the "deref at point" semantics. Since the split between the two never got through review, you probably don't want to rely on it. You could directly implement the "at point" semantics here, which would be safe, by using dereferenceability in combination with !Value::canBeFreed. We have precedent for this in several places already.

nikic added a comment.

In D133486#3777778, @efriedma wrote:

> Do we use the sret attribute for optimization anywhere else?
>
> Since sret has ABI implications, I'd prefer to use sret specifically for those ABI implications, and use other attributes for any other semantic meaning, if possible. (Maybe we need some stronger form of "dereferenceable"?)

Hm, yes, a separate attribute would be cleaner. I would probably frame this as a writable attribute with semantics "The underlying object of the parameter must be writable, otherwise the behavior is undefined". I believe that is sufficient, in that it implies that any location that is dereferenceable is also writable. The dereferenceable bytes might be either implied by a dereferenceable attribute, or just from an observed load, which is exactly what we need for scalar promotion.

That said, I'm not sure adding a new attribute is actually worthwhile, as I didn't have a specific use-case in mind for this patch. I guess a nice side effect of having a writable attribute is that it can be applied not only to sret parameters, but also to all &mut parameters in Rust (modulo details). As these are also noalias, this would allow scalar store promotion on all &mut parameters without unwinding interference.
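The proposed semantics can be read as a two-part check: the object is marked writable, and the bytes to be stored are known to be dereferenceable, either from a dereferenceable attribute or from a load that is known to execute. The following is a toy model of that reading only. `ParamFacts` and `isKnownWritableRange` are hypothetical names, not LLVM types or APIs, and the model ignores offsets and thread-safety.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Toy stand-in for facts known about a pointer parameter. Not an LLVM type.
struct ParamFacts {
  bool Writable;             // the proposed `writable` attribute is present
  std::uint64_t DerefBytes;  // from a dereferenceable(N) attribute, 0 if absent
  std::uint64_t LoadedBytes; // bytes covered by a load known to execute, 0 if none
};

// A store of Size bytes at the start of the object may be introduced only if
// the object is writable and those bytes are known to be dereferenceable.
bool isKnownWritableRange(const ParamFacts &P, std::uint64_t Size) {
  std::uint64_t KnownDeref = std::max(P.DerefBytes, P.LoadedBytes);
  return P.Writable && Size <= KnownDeref;
}

void example() {
  // Writable parameter with an observed 4-byte load, as in an i32 loop.
  ParamFacts Observed{true, 0, 4};
  assert(isKnownWritableRange(Observed, 4));

  // dereferenceable(8) alone says nothing about stores.
  ParamFacts ReadOnly{false, 8, 4};
  assert(!isKnownWritableRange(ReadOnly, 4));
}
```

Keeping writability and dereferenceability as separate inputs mirrors the point made in the comment: neither fact alone is enough to introduce a store.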

nikic added inline comments.

llvm/lib/Transforms/Scalar/LICM.cpp
Line 1899: I don't think dereferenceable is sufficient here, because it only guarantees that loads don't trap; it makes no statement about stores. From LangRef: "This attribute may only be applied to pointer typed parameters. A pointer that is dereferenceable can be loaded from speculatively without a risk of trapping."

reames added inline comments. Sep 12 2022, 12:05 PM

llvm/lib/Transforms/Scalar/LICM.cpp
Line 1899: I really don't think that's a reasonable reading of the spec. The attribute affects whether memory is dereferenceable (i.e. will not fault). Having dereferenceability depend on the type of access is insane. I read that as being an attempt at explaining the semantics, not a restriction on them. It would disallow existing transformations such as hoisting a store to a probably dereferenceable and thread local location out of an if block (i.e. flattening as done in simplify-cfg for e.g. results of allocations).

efriedma added inline comments. Sep 12 2022, 12:12 PM

llvm/lib/Transforms/Scalar/LICM.cpp
Line 1899: Read-only memory is commonly available to programs in the form of constant globals in LLVM IR. Stating that constants are dereferenceable doesn't seem that strange?

nikic added inline comments. Sep 12 2022, 1:29 PM

llvm/lib/Transforms/Scalar/LICM.cpp
Line 1899: The dereferenceable attribute as currently used definitely does not imply writability -- e.g. clang itself marks const references (which might point to readonly const globals) as dereferenceable. Other frontends do the same (e.g. Rust uses it for non-mut references).

The background here is that many optimizations want to speculate loads, and dereferenceable with current semantics is sufficient for that. Very few optimizations want to speculate stores -- I believe this LICM optimization and the SimplifyCFG store merge optimizations may well be the only ones in LLVM. These optimizations go out of their way to prove that the location is both writable and that inserting the write is thread-safe.

> It would disallow existing transformations such as hoisting a store to a probably dereferenceable and thread local location out of an if block. (i.e. flattening as done in simplify-cfg for e.g. results of allocations)

We already don't perform this optimization unless we know the location is also writable. I just checked the code, and it's actually limited to allocas (if there is no guaranteed-to-execute preceding store). We could expand that to the same logic used here in LICM, but it would still need to check writability as a separate predicate.
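The distinction between dereferenceable and writable can be seen at the C++ level. The sketch below is a hypothetical example, not taken from the patch: a reference parameter (which clang marks dereferenceable) is bound to a constant global, and the store through it is guarded by a flag. The program is well-defined as long as the guarded store does not execute, so a transformation that made the store unconditional would be wrong here even though a load from the same location can be speculated.

```cpp
#include <cassert>

// A constant global; toolchains commonly place such objects in read-only
// memory.
static const int Limit = 10;

// Clang marks reference parameters as dereferenceable, so a load from Slot
// could be speculated. The store, however, only runs when Enabled is true.
void maybeReset(int &Slot, bool Enabled) {
  if (Enabled)
    Slot = 0;
}

void example() {
  // Well-defined: the reference is bound to a const object, but no store
  // executes because Enabled is false.
  maybeReset(const_cast<int &>(Limit), false);
  assert(Limit == 10);

  // With a genuinely writable object the store is fine.
  int Mutable = 5;
  maybeReset(Mutable, true);
  assert(Mutable == 0);
}
```

This is why a store-speculating transform needs a separate writability fact for the location rather than relying on dereferenceability.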

Moving this off the review queue for now -- maybe I'll get around to adding a writable attribute at some point.

Superseded by D158081, which adds the writable attribute.

Herald added a subscriber: StephenFan. Aug 17 2023, 12:11 AM

Revision Contents

Path                                           Size
llvm/lib/Transforms/Scalar/LICM.cpp            3 lines
llvm/test/Transforms/LICM/scalar-promote.ll    4 lines

Diff 458712

llvm/lib/Transforms/Scalar/LICM.cpp

 }

 bool isWritableObject(const Value *Object) {
   // TODO: Alloca might not be writable after its lifetime ends.
   // See https://github.com/llvm/llvm-project/issues/51838.
   if (isa<AllocaInst>(Object))
     return true;

-  // TODO: Also handle sret.
   if (auto *A = dyn_cast<Argument>(Object))
-    return A->hasByValAttr();
+    return A->hasByValAttr() || A->hasStructRetAttr();

   // TODO: Noalias has nothing to do with writability, this should check for
   // an allocator function.
   return isNoAliasCall(Object);
 }

 bool isThreadLocalObject(const Value *Object, const Loop *L,
                          DominatorTree *DT) {

llvm/test/Transforms/LICM/scalar-promote.ll

 loop.1.latch:
   br i1 %c, label %loop.1.header, label %exit

 exit:
   ret void
 }

-; TODO: The store can be promoted, as sret memory is writable.
 define void @sret_cond_store(i32* sret(i32) noalias %ptr) {
 ; CHECK-LABEL: @sret_cond_store(
 ; CHECK-NEXT: [[PTR_PROMOTED:%.*]] = load i32, i32* [[PTR:%.*]], align 4
 ; CHECK-NEXT: br label [[LOOP:%.*]]
 ; CHECK: loop:
 ; CHECK-NEXT: [[V_INC1:%.*]] = phi i32 [ [[V_INC:%.*]], [[LOOP_LATCH:%.*]] ], [ [[PTR_PROMOTED]], [[TMP0:%.*]] ]
 ; CHECK-NEXT: [[C:%.*]] = icmp ult i32 [[V_INC1]], 10
 ; CHECK-NEXT: br i1 [[C]], label [[LOOP_LATCH]], label [[EXIT:%.*]]
 ; CHECK: loop.latch:
 ; CHECK-NEXT: [[V_INC]] = add i32 [[V_INC1]], 1
-; CHECK-NEXT: store i32 [[V_INC]], i32* [[PTR]], align 4
 ; CHECK-NEXT: br label [[LOOP]]
 ; CHECK: exit:
+; CHECK-NEXT: [[V_INC1_LCSSA:%.*]] = phi i32 [ [[V_INC1]], [[LOOP]] ]
+; CHECK-NEXT: store i32 [[V_INC1_LCSSA]], i32* [[PTR]], align 4
 ; CHECK-NEXT: ret void
 ;
   br label %loop

 loop:
   %v = load i32, i32* %ptr
   %c = icmp ult i32 %v, 10
   br i1 %c, label %loop.latch, label %exit
 [remaining 16 lines not shown]