Details

Reviewers

rsmith
jdoerfert
jeroen.dobbelaere
efriedma
nikic

Summary

This can be used to opt out of TBAA for particular pointers without
breaking out the big hammer of asm volatile("" : : "r"(ptr) : "memory").
The underlying motivation is to allow the implementation of C++23's
start_lifetime_as.

We name this a TBAA-fence instead of something more generic to make it
clear that other alias analyses are allowed to see through the fence.
Additionally, the intrinsic returns a "safe-to-use" pointer rather than
a semantic guarantee of being a completely opaque ArgMemOnly intrinsic.
This enables some optimization opportunities: it lets us track uses more
precisely, so we could (e.g.) strip TBAA metadata from associated loads
and stores to enable store-to-load forwarding where beneficial. More
immediately, we don't need the stronger guarantees for
start_lifetime_as, so there's no reason to provide them out of the gate.
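
For illustration, the "big hammer" mentioned above might be wrapped roughly as follows. This is only a sketch: start_lifetime_as_sketch and read_as_float are hypothetical names, not the C++23 library function or anything in this patch. The inline asm assumes a GCC- or Clang-compatible compiler, and the example assumes a 32-bit IEEE-754 float. The empty asm with a "memory" clobber makes the compiler assume any memory may have been read or written, so it blocks all memory optimizations across it, not only TBAA-based ones.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper, NOT std::start_lifetime_as and not part of the patch.
// Assumes GCC/Clang extended inline asm.
template <class T>
T* start_lifetime_as_sketch(void* p) {
    // Empty asm taking p as an input and clobbering "memory": the compiler
    // must assume arbitrary memory was read and written at this point.
    asm volatile("" : : "r"(p) : "memory");
    return static_cast<T*>(p);
}

// Example use: view four bytes holding an integer bit pattern as a float.
float read_as_float(std::uint32_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sketch assumes a 32-bit float");
    alignas(float) unsigned char storage[sizeof(float)];
    std::memcpy(storage, &bits, sizeof bits);          // write the raw bytes
    return *start_lifetime_as_sketch<float>(storage);  // read them as a float
}
```

The cost of this approach is that unrelated loads and stores around the call are pessimized as well, which is what a narrower, pointer-specific fence is meant to avoid.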

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

davidtgoldblatt created this revision. Mar 27 2023, 8:33 PM

Herald added a project: Restricted Project. Mar 27 2023, 8:33 PM

Herald added subscribers: jeroen.dobbelaere, kosarev, hiraditya.

Harbormaster completed remote builds in B222158: Diff 508879. Mar 27 2023, 9:39 PM

cor3ntin added a subscriber: cor3ntin. Mar 28 2023, 6:30 AM

Reword.

davidtgoldblatt edited the summary of this revision. Mar 28 2023, 10:28 AM

davidtgoldblatt added a child revision: D147021: [Builtins] Add __builtin_implicit_object_fence..

davidtgoldblatt added a reviewer: rsmith. Mar 28 2023, 10:30 AM

davidtgoldblatt added a reviewer: nikic.

Herald added a subscriber: StephenFan. Mar 28 2023, 10:31 AM

davidtgoldblatt published this revision for review. Mar 28 2023, 10:32 AM

Herald added a project: Restricted Project. Mar 28 2023, 10:32 AM

Herald added subscribers: llvm-commits, jdoerfert.

Harbormaster completed remote builds in B222293: Diff 509068. Mar 28 2023, 6:05 PM

Could you please add a LangRef entry for the new intrinsic? I don't really get what it does / how it is supposed to interact with TBAA.

What exactly are the wanted semantics for such a fence? And how would it be used?
Once a pointer is stored into memory, you lose the fence dependency, and TBAA does not look into that.

Assume the following code:

int bar(int *p, short *q) {
  *p = 42;
  *q = 99;
  return *p;
}
int foo(int *p) {
  short *q = (short *)tbaa_fence(p);
  return bar(p, q);
}

This will still return 42.

Would we want to use this intrinsic as part of the code generation for C++ new expressions? See https://github.com/llvm/llvm-project/issues/54878 .

Add LangRef entry.

Added a LangRef entry that I hope clarifies things.

In D147020#4232988, @jeroen.dobbelaere wrote:
What exactly are the wanted semantics for such a fence? And how would it be used?
Once a pointer is stored into memory, you lose the fence dependency, and TBAA does not look into that.

Assume the following code:
int bar(int *p, short *q) {
  *p = 42;
  *q = 99;
  return *p;
}
int foo(int *p) {
  short *q = (short *)tbaa_fence(p);
  return bar(p, q);
}
This will still return 42.

The fence only applies between loads/stores before the fence and those based on its result (that is, under the newly written-down semantics that previously existed only in my head; sorry, I should have had documentation to start with). So the optimization you're describing still being allowed matches those semantics. This is fine for the goal of letting us implement start_lifetime_as: there's still C++-level UB when bar tries to access a short at an address that contains an int.

Would we want to use this intrinsic as part of the code generation for C++ new expressions? See https://github.com/llvm/llvm-project/issues/54878 .

Hmm, I think this is a plausible fix but I'm not sufficiently confident in my understanding of all the subtleties of LICM to say for sure.

Harbormaster completed remote builds in B222849: Diff 509816. Mar 30 2023, 5:38 PM

(I don't have the necessary C++ background to review this.)

TBAA typically looks only at the metadata to make an alias decision. If it also has to follow the pointer path, it is easy to prove that there is an llvm.tbaa.fence in the pointer path, but hard to prove that there is not: a fence dependency could be hidden by a store/load of the pointer through memory. Making this requirement too strict might make TBAA analysis useless; making it not strict enough might make the fence useless.

This doesn't necessitate any changes to TBAA (i.e. it doesn't have to try to identify the "source" of a pointer to know if it comes from a fence or not). Instead, what prevents incorrect behavior is that, since llvm.tbaa.fence's memory effects are opaque (well, aside from being ArgMemOnly), the subsequent operation can't be reordered or eliminated. In this case, TBAA will still report NoAlias if asked about the two operations; but that result doesn't matter because you can't optimize based on that fact with a fence in between them (since the fence could, as far as the optimizer is concerned, do arbitrary loads and stores to the region of memory).
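
The argument above can be sketched as a toy model. None of the names below are LLVM APIs; Op, tbaaSaysNoAlias, and mayHoistLoadAboveStore are hypothetical, and the model assumes every access in the sequence targets the same address. It only illustrates the claim that a metadata-only TBAA query can keep answering NoAlias while an intervening fence, treated as possibly reading and writing that memory, is what actually stops the transformation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: these are NOT LLVM types or functions.
enum class OpKind { Store, Load, Fence };

struct Op {
    OpKind kind;
    int tbaaType;  // stand-in for a TBAA type tag; ignored for Fence
};

// What a metadata-only TBAA query would answer for two accesses:
// differing type tags are reported as not aliasing.
bool tbaaSaysNoAlias(const Op& a, const Op& b) {
    return a.tbaaType != b.tbaaType;
}

// May the load at loadIdx be moved above the store at storeIdx?
// Precondition: storeIdx < loadIdx < ops.size().
// Any fence in between is treated as possibly reading and writing the
// memory, so it blocks the motion regardless of the TBAA answer.
bool mayHoistLoadAboveStore(const std::vector<Op>& ops,
                            std::size_t storeIdx, std::size_t loadIdx) {
    for (std::size_t i = storeIdx + 1; i < loadIdx; ++i) {
        if (ops[i].kind == OpKind::Fence)
            return false;
    }
    return tbaaSaysNoAlias(ops[storeIdx], ops[loadIdx]);
}
```

With a store of type tag 1, a fence, and a load of type tag 2, tbaaSaysNoAlias still returns true for the store/load pair, but mayHoistLoadAboveStore returns false; without the fence it returns true.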

This intrinsic would work just as well for these purposes if it returned void and we got rid of the constraints about the subsequent accesses happening via the result of the call (and was truly just an opaque memory fence). I'd be happy to make that change if it'd be less confusing.

The only thing we get from having this return a value is the option to implement some optimizations later on; for example, in certain cases we can turn "store, fence, load" into a bitcast, letting us get rid of the fence's effects entirely.
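
As a rough source-level picture of that "store, fence, load" folding: the two functions below compute the same value, and the second needs no fence at all. This is a sketch under stated assumptions, not the proposed optimization itself: fence_sketch is a hypothetical stand-in built from GCC/Clang inline asm rather than llvm.tbaa.fence, and the code assumes a 32-bit float.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Hypothetical stand-in for a pointer-returning fence (GCC/Clang inline asm).
inline void* fence_sketch(void* p) {
    asm volatile("" : : "r"(p) : "memory");
    return p;
}

// Unfolded form: store an integer, fence, then load a float via the result.
float storeFenceLoad(std::uint32_t bits) {
    alignas(float) unsigned char storage[sizeof(float)];
    std::memcpy(storage, &bits, sizeof bits);  // the store
    void* fenced = fence_sketch(storage);      // the fence
    float out;
    std::memcpy(&out, fenced, sizeof out);     // the load, through the result
    return out;
}

// Folded form: a plain bit reinterpretation of the value, with no fence.
float foldedBitcast(std::uint32_t bits) {
    float out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}
```

Because the load goes through the fence's result, the def-use chain ties the three operations together, which is what would let an optimizer recognize the pattern; a void-returning fence gives up that link.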

Rewrite the LangRef entry. This makes the operation of the fence clearer (it doesn't really change the behavior of TBAA at all; instead it just prevents any other optimization pass from being able to make use of TBAA for the given pointer).

I'm also happy to just strengthen this to a pointer-specific fence (that returns void), that acts otherwise just like asm volatile("" : : "r"(ptr) : "memory") if that would make this more palatable. The only optimization I have in mind for the pointer-returning version is something to allow folding "store/fence/load" to a bitcast, which I don't think is important enough to be worth spending a lot of time on.

Harbormaster completed remote builds in B223713: Diff 510981. Apr 4 2023, 7:27 PM

I don't think returning the pointer is actually helpful. Passes dealing with memory aliasing will be walking backwards using something like MemorySSA, not pointer def-use lists, so they'll find it anyway if it's relevant.

Maybe the fence should have a size argument, to indicate the size of the object being allocated? (It's currently not clear exactly what bytes are affected.)

I think we do need to settle the question of whether "new" also needs to call this fence... if it does, we need to measure the compile-time/performance impact on existing code, and if it doesn't, we'll likely end up with two overlapping solutions for very similar issues.

llvm/lib/CodeGen/CodeGenPrepare.cpp
2357	This is killing off the fence too early; we have to keep the fence around as long as we have TBAA data, and that still exists at least through ISel. (Not sure off the top of my head if we encode TBAA into MachineInstrs.)

Just return void instead of an opaque pointer, take a size argument, drop early CodeGenPrepare lowering.

I think we do need to settle the question of whether "new" also needs to call this fence... if it does, we need to measure the compile-time/performance impact on existing code, and if it doesn't, we'll likely end up with two overlapping solutions for very similar issues.

I'm happy to run some tests; any particular benchmark you'd like to see? llvm-test-suite doesn't really use placement-new (which I think is the only place where this would be needed?).

I think that two similar-ish solutions may actually be correct; there's a real semantic difference: after a placement new, any prior stores to the memory are dead, while after a tbaa_fence, they're not. So this will necessarily be an over-conservative solution to fix the placement-new/TBAA issues.

Harbormaster completed remote builds in B227846: Diff 516571. Apr 24 2023, 5:06 PM

For perf, maybe bootstrap LLVM? I think LLVM itself uses placement new in a number of places...

I hadn't really considered the memory being dead after placement new, but the only optimization that would really open up is dead store elimination, I think? Not sure how important that is; we currently don't do that optimization anyway.

I think @hubert.reinterpretcast may have opinions on this PR.

Herald added a subscriber: wenlei. May 8 2023, 10:07 AM

Diff 508879

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

@@ InstructionCost getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
 case Intrinsic::coro_end:
 case Intrinsic::coro_frame:
 case Intrinsic::coro_size:
 case Intrinsic::coro_align:
 case Intrinsic::coro_suspend:
 case Intrinsic::coro_subfn_addr:
 case Intrinsic::threadlocal_address:
 case Intrinsic::experimental_widenable_condition:
+case Intrinsic::tbaa_fence:
 // These intrinsics don't actually represent code after lowering.
 return 0;
 }
 return 1;
 }

 InstructionCost getCallInstrCost(Function *F, Type *RetTy,
 ArrayRef<Type *> Tys,

llvm/include/llvm/IR/Intrinsics.td

@@ def int_launder_invariant_group : DefaultAttrsIntrinsic<[llvm_anyptr_ty],
 [LLVMMatchType<0>],
 [IntrInaccessibleMemOnly, IntrSpeculatable, IntrWillReturn]>;


 def int_strip_invariant_group : DefaultAttrsIntrinsic<[llvm_anyptr_ty],
 [LLVMMatchType<0>],
 [IntrSpeculatable, IntrNoMem, IntrWillReturn]>;

+def int_tbaa_fence : DefaultAttrsIntrinsic<[llvm_ptr_ty],
+[llvm_ptr_ty],
+[IntrArgMemOnly, IntrWillReturn,
+NoCapture<ArgIndex<0>>]>;
+
 //===------------------------ Stackmap Intrinsics -------------------------===//
 //
 def int_experimental_stackmap : DefaultAttrsIntrinsic<[],
 [llvm_i64_ty, llvm_i32_ty, llvm_vararg_ty],
 [Throws]>;
 def int_experimental_patchpoint_void : Intrinsic<[],
 [llvm_i64_ty, llvm_i32_ty,
 llvm_ptr_ty, llvm_i32_ty,

llvm/lib/CodeGen/CodeGenPrepare.cpp

@@ case Intrinsic::fshr:
 return optimizeFunnelShift(II);
 case Intrinsic::dbg_assign:
 case Intrinsic::dbg_value:
 return fixupDbgValue(II);
 case Intrinsic::masked_gather:
 return optimizeGatherScatterInst(II, II->getArgOperand(0));
 case Intrinsic::masked_scatter:
 return optimizeGatherScatterInst(II, II->getArgOperand(1));
+case Intrinsic::tbaa_fence: {
+Value *ArgVal = II->getArgOperand(0);
+replaceAllUsesWith(II, ArgVal, FreshBBs, IsHugeFunc);
+II->eraseFromParent();
+return true;
+}
 }

 SmallVector<Value *, 2> PtrOps;
 Type *AccessTy;
 if (TLI->getAddrModeArguments(II, PtrOps, AccessTy))
 while (!PtrOps.empty()) {
 Value *PtrVal = PtrOps.pop_back_val();
 unsigned AS = PtrVal->getType()->getPointerAddressSpace();

llvm/lib/CodeGen/IntrinsicLowering.cpp

@@ void IntrinsicLowering::LowerIntrinsicCall(CallInst *CI) {
 case Intrinsic::lifetime_start:
 // Discard region information.
 CI->replaceAllUsesWith(UndefValue::get(CI->getType()));
 break;
 case Intrinsic::invariant_end:
 case Intrinsic::lifetime_end:
 // Discard region information.
 break;
+case Intrinsic::tbaa_fence:
+CI->replaceAllUsesWith(CI->getArgOperand(0));
 }

 assert(CI->use_empty() &&
 "Lowering should have eliminated any uses of the intrinsic call!");
 CI->eraseFromParent();
 }

 bool IntrinsicLowering::LowerToByteSwap(CallInst *CI) {

llvm/lib/CodeGen/SelectionDAG/FastISel.cpp

@@ bool FastISel::selectIntrinsicCall(const IntrinsicInst *II) {
 case Intrinsic::objectsize:
 llvm_unreachable("llvm.objectsize.* should have been lowered already");

 case Intrinsic::is_constant:
 llvm_unreachable("llvm.is.constant.* should have been lowered already");

 case Intrinsic::launder_invariant_group:
 case Intrinsic::strip_invariant_group:
-case Intrinsic::expect: {
+case Intrinsic::expect:
+case Intrinsic::tbaa_fence: {
 Register ResultReg = getRegForValue(II->getArgOperand(0));
 if (!ResultReg)
 return false;
 updateValueMap(II, ResultReg);
 return true;
 }
 case Intrinsic::experimental_stackmap:
 return selectStackmap(II);

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

@@ #include "llvm/IR/VPIntrinsics.def"

 case Intrinsic::is_constant:
 llvm_unreachable("llvm.is.constant.* should have been lowered already");

 case Intrinsic::annotation:
 case Intrinsic::ptr_annotation:
 case Intrinsic::launder_invariant_group:
 case Intrinsic::strip_invariant_group:
+case Intrinsic::tbaa_fence:
 // Drop the intrinsic, but forward the value
 setValue(&I, getValue(I.getOperand(0)));
 return;

 case Intrinsic::assume:
 case Intrinsic::experimental_noalias_scope_decl:
 case Intrinsic::var_annotation:
 case Intrinsic::sideeffect:

llvm/test/Analysis/TypeBasedAliasAnalysis/tbaa-fence.ll

This file was added.

; RUN: opt < %s -aa-pipeline=tbaa -passes=aa-eval -evaluate-aa-metadata \
; RUN: -print-no-aliases -print-modref -disable-output 2>&1 | FileCheck %s

declare ptr @llvm.tbaa.fence(ptr)

define void @simple(ptr %p) {
entry:
; CHECK-LABEL: simple
; CHECK: NoAlias: %x = load i8, ptr %fenced, align 1, !tbaa !3 <-> store i8 1, ptr %p, align 1, !tbaa !0
; CHECK: Both ModRef: Ptr: i8* %p <-> %fenced = call ptr @llvm.tbaa.fence(ptr %p)
; CHECK: Both ModRef: Ptr: i8* %fenced <-> %fenced = call ptr @llvm.tbaa.fence(ptr %p)
store i8 1, ptr %p, !tbaa !3
%fenced = call ptr @llvm.tbaa.fence(ptr %p)
%x = load i8, ptr %fenced, !tbaa !4
ret void
}

!0 = !{!"root"}
!1 = !{!"type1", !0, i64 0}
!2 = !{!"type2", !0, i64 0}
!3 = !{!1, !1, i64 0, i64 1}
!4 = !{!2, !2, i64 0, i64 1}

llvm/test/CodeGen/Generic/tbaa-fence.ll

This file was added.

; RUN: llc < %s

declare ptr @llvm.tbaa.fence(ptr)

define ptr @call_fence(ptr %p) {
%ret = call ptr @llvm.tbaa.fence(ptr %p)
ret ptr %ret
}

This is an archive of the discontinued LLVM Phabricator instance.

[AA] Add a tbaa-fence intrinsic.
Needs Review · Public
