This is an archive of the discontinued LLVM Phabricator instance.

[AA] Add a tbaa-fence intrinsic.
Needs ReviewPublic

Authored by davidtgoldblatt on Mar 27 2023, 8:33 PM.

Details

Summary

This can be used to opt out of TBAA for particular pointers without
breaking out the big hammer of asm volatile("" : : "r"(ptr) : "memory").
The underlying motivation is to allow the implementation of C++23's
start_lifetime_as.

We name this a TBAA-fence instead of something more generic to make it
clear that other alias analyses are allowed to see through the fence.
Additionally, the intrinsic returns a "safe-to-use" pointer rather than
a semantic guarantee of being a completely opaque ArgMemOnly intrinsic.
This enables some optimization opportunities: it lets us track uses more
precisely, so we could (e.g.) strip TBAA metadata from associated loads
and stores to enable store-to-load forwarding where beneficial. More
immediately, we don't need the stronger guarantees for
start_lifetime_as, so there's no reason to provide it out of the gate.

Diff Detail

Event Timeline

davidtgoldblatt edited the summary of this revision. (Show Details)

Reword.

davidtgoldblatt published this revision for review.Mar 28 2023, 10:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 28 2023, 10:32 AM

Could you please add a LangRef entry for the new intrinsic? I don't really get what it does / how it is supposed to interact with TBAA.

What exactly are the wanted semantics for such a fence ? And how would it be used ?
Once a pointer is stored into memory, you loose the fence dependency and tbaa does not look into that.

Assume following code:

int bar (int *p, short *q) {
  *p =42;
  *q =99;
  return *p;
}
int foo(int * p) {
   short * q = (short *)tbaa_fence(p);
   return bar(p, q);
}

This will still return 42

Would we want to use this intrinsic as part of the code generation for C++ new expressions? See https://github.com/llvm/llvm-project/issues/54878 .

Add LangRef entry.

Added a LangRef entry that I hope clarifies things.

What exactly are the wanted semantics for such a fence ? And how would it be used ?
Once a pointer is stored into memory, you loose the fence dependency and tbaa does not look into that.

Assume following code:

int bar (int *p, short *q) {
  *p =42;
  *q =99;
  return *p;
}
int foo(int * p) {
   short * q = (short *)tbaa_fence(p);
   return bar(p, q);
}

This will still return 42

The fence only applies between loads/stores before the fence and those based on its result (that is; in the newly written-down semantics that only existed in my head before now. Sorry, should have had documentation to start with). So, the optimization you're describing still being allowed matches those semantics. This is fine for the goal of letting us implement start_lifetime_as -- there's still C++-level UB when bar tries to read a short from an address that contains an int.

Would we want to use this intrinsic as part of the code generation for C++ new expressions? See https://github.com/llvm/llvm-project/issues/54878 .

Hmm, I think this is a plausible fix but I'm not sufficiently confident in my understanding of all the subtleties of LICM to say for sure.

nikic resigned from this revision.Mar 31 2023, 1:40 AM

(I don't have the necessary C++ background to review this.)

TBAA typically only looks at the metadata to make an alias decision. When adding the need to follow the pointer path, it is easy to proof that there is a llvm.,tbaa.fence in the pointer path. It is hard to proof that there is not. A fence dependency could be hidden by a store/load of the pointer into memory. Make this requirement too hard might make TBAA analysis useless. Making it not hard enough, might make the fence useless.

This doesn't necessitate any changes to TBAA (i.e. it doesn't have to try to identify the "source" of a pointer to know if it comes from a fence or not). Instead, what prevents incorrect behavior is that, since llvm.tbaa.fence's memory effects are opaque (well, aside from being ArgMemOnly), the subsequent operation can't be reordered or eliminated. In this case, TBAA will still report NoAlias if asked about the two operations; but that result doesn't matter because you can't optimize based on that fact with a fence in between them (since the fence could, as far as the optimizer is concerned, do arbitrary loads and stores to the region of memory).

This intrinsic would work just as well for these purposes if it returned void and we got rid of the constraints about the subsequent accesses happening via the result of the call (and was truly just an opaque memory fence). I'd be happy to make that change if it'd be less confusing.

The only thing we get from having this return a value is the option to implement some optimizations later on; for example, in certain cases we can turn "store, fence, load" into a bitcast, letting us get rid of the fence's effects entirely.

Rewrite the LangRef entry. This makes the operation of the fence clearer (it doesn't really change the behavior of TBAA at all; instead it just prevents any other optimization pass from being able to make use of TBAA for the given pointer).

I'm also happy to just strengthen this to a pointer-specific fence (that returns void), that acts otherwise just like asm volatile("" : : "r"(ptr) : "memory") if that would make this more palatable. The only optimization I have in mind for the pointer-returning version is something to allow folding "store/fence/load" to a bitcast, which I don't think is important enough to be worth spending a lot of time on.

I don't think returning the pointer is actually helpful. Passes dealing with memory aliasing will be walking backwards using something like MemorySSA, not pointer def-use lists, so they'll find it anyway if it's relevant.

Maybe the fence should have a size argument, to indicate the size of the object being allocated? (It's currently not clear exactly what bytes are affected.)

I think we do need to settle the question of whether "new" also needs to call this fence... if it does, we need to measure the compile-time/performance impact on existing code, and if it doesn't, we'll likely end up with two overlapping solutions for very similar issues.

llvm/lib/CodeGen/CodeGenPrepare.cpp
2357 ↗(On Diff #510981)

This is killing off the fence too early; we have to keep the fence around as long as we have TBAA data, and that still exists at least though ISel. (Not sure off the top of my head if we encode TBAA into MachineInstrs.)

Just return void instead of an opaque pointer, take a size argument, drop early CodeGenPrepare lowering.

I think we do need to settle the question of whether "new" also needs to call this fence... if it does, we need to measure the compile-time/performance impact on existing code, and if it doesn't, we'll likely end up with two overlapping solutions for very similar issues.

I'm happy to run some tests; any particular benchmark you'd like to see? llvm-test-suite doesn't really use placement-new (which I think is the only place where this would be needed?).

I think that two similar-ish solutions may actually be correct; there's a real semantic difference: after a placement new, any prior stores to the memory is dead, while after a tbaa_fence, they're not. So this will necessarily be an over-conservative solution to fix the placement-new/tbaa issues.

For perf, maybe bootstrap LLVM? I think LLVM itself uses placement new in a number of places...

I hadn't really considered the memory being dead after placement new, but the only optimization that would really open up is dead store elimination, I think? Not sure how important that is; we currently don't do that optimization anyway.