All current analyses ignore the context. We make the argument mandatory
for analyses, but optional for the query interface.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This (and subsequent diffs in the stack) is something I've been experimenting with at Meta (hence the "facebook" comment markers, which are just an accident of our internal patch tracking and will get removed if this direction gets support).
The end goal is to allow library code to express aliasing invariants that it knows but that cannot be easily deduced by alias analyses. The simplest example of these is std::vector; modifications to its contents can't change the vector itself, but there's no way to deduce this in general (see e.g. this blog post: https://travisdowns.github.io/blog/2019/08/26/vector-inc.html). Instead, within the vector implementation, we can add __builtin_experimental_separate_storage(this, this->_M_data) in accessors; this is a new builtin that results in UB if the two arguments did not come from the same allocation.
This isn't ready to merge, because of the statelessness issue called out in invalidate(). If there is buy-in for the overall approach, I'm happy to generalize the assumption cache to support more than just assumptions, allowing an efficient implementation.
With a couple judicious insertions, it's shown promising results in some extracted performance-sensitive code kernels (around a 5% speed improvement in a thrift serialization function when we assert that the output buffer does not alias the thrift parse state, and around a 30% speed improvement to a numerical loop that became vectorizable when the array contents don't alias). Just inserting the hints into std::vector results in around an 0.05% improvement in generated code size in a variety of large server workloads. We can also add additional builtins to support things like std::string or SmallVector; using the current builtin would not be correct for them as written (because they embed their storage alongside their control data down certain paths).
This is vaguely similar to restrict, but restrict can't address some of the issues that these hints can. In particular, restrict is local to a function, and so can't propagate its semantic requirements to callers and callees after inlining.
Removing from draft to get some feedback, but still more or less an RFC at the moment.
Added as reviewer anyone subscribed to AA changes + @wenlei, who's seen some of the evolution here.
Some initial notes:
- Rather than adding a separate aliasAt() API, we should add an extra optional argument to alias(). Especially when it comes to the Concept/Model/Base I don't see a value in having both alias() and aliasAt().
- An important limitation (which your patches do get right) is that BasicAA must not make use of the context instruction, otherwise it would invalidate our caching strategy. This does mean that recursive queries will not be able to benefit from the context instruction (or they would need to use a different context instruction that only depends on the memory locations, which is tricky in itself due to symmetry requirements).
- I don't really understand your invalidation troubles in the new AA pass. Your current implementation has a high compile-time impact (1% geomean on ctmark), but I think this mainly comes down to the fact that you are unnecessarily recalculating the dominator tree many times -- why doesn't this fetch the DT from the pass manager? BasicAA already does this, so it should be available and work fine. (Note that this is not just a performance concern, your current implementation is incorrect if the DT is updated within a pass -- this will update the DT from the PM, but not your internal instance.)
- It might make sense to use an assume operand bundle rather than a separate intrinsic, something like call void @llvm.assume(i1 true) ["noalias"(ptr %o1, ptr %o2)]. This could give you various things like ephemeral value skipping for "free". And probably also integrates with AssumptionCache already. Not sure if there's any downsides to this representation.
- The second patch does not apply cleanly, because your base sources seem to register an additional objc AA pass.
Generally, the approach looks viable to me though. To suggest a possible alternative, it should be possible to make some other pass add scoped noalias metadata to accesses based on this intrinsic. Not sure this would be better though.
Rather than adding a separate aliasAt() API, we should add an extra optional argument to alias(). Especially when it comes to the Concept/Model/Base I don't see a value in having both alias() and aliasAt().
Agreed, that's better.
An important limitation (which your patches do get right) is that BasicAA must not make use of the context instruction, otherwise it would invalidate our caching strategy. This does mean that recursive queries will not be able to benefit from the context instruction (or they would need to use a different context instruction that only depends on the memory locations, which is tricky in itself due to symmetry requirements).
Yeah. I think it's possible to extend it in certain cases, but didn't feel sure enough to try it.
I don't really understand your invalidation troubles in the new AA pass. Your current implementation has a high compile-time impact (1% geomean on ctmark), but I think this mainly comes down to the fact that you are unnecessarily recalculating the dominator tree many times -- why doesn't this fetch the DT from the pass manager? BasicAA already does this, so it should be available and work fine. (Note that this is not just a performance concern, your current implementation is incorrect if the DT is updated within a pass -- this will update the DT from the PM, but not your internal instance.)
I think practically the reason is that BasicAA is still fairly useful without the dominator tree, while SeparateStorageAA is much less so. But a hard DT dependency is somewhat burdensome because any AA result being invalidated acts to invalidate all of them (just because of the invalidate implementation in AAResults). But I think it's just a bullet this has to bite given the correctness/compile-time problems.
It might make sense to use an assume operand bundle rather than a separate intrinsic, something like call void @llvm.assume(i1 true) ["noalias"(ptr %o1, ptr %o2)]. This could give you various things like ephemeral value skipping for "free". And probably also integrates with AssumptionCache already. Not sure if there's any downsides to this representation.
This is a good idea, I'll do that. This also nicely solves various other annoyances: duplicated special-casing logic, the possibility of failing to notice intrinsic creations due to duplication when cloning, etc.
The second patch does not apply cleanly, because your base sources seem to register an additional objc AA pass.
Will fix.
To suggest a possible alternative, it should be possible to make some other pass add scoped noalias metadata to accesses based on this intrinsic. Not sure this would be better though.
I did some experiments in this direction; it turned out trickier than I expected because of the impedance mismatch with noalias scopes. Example: changes in control flow (inlining, DCE, etc.) can cause a llvm.experimental.separate.storage call to start dominating an instruction it previously didn't, but we'd need to re-run a metadata propagation pass for subsequent AA queries to notice. Entirely possible it's just due to me being dumb, but the similarity between the semantics of this and assumes make the operand-bundle approach feel right.
I don't really understand your invalidation troubles in the new AA pass. Your current implementation has a high compile-time impact (1% geomean on ctmark), but I think this mainly comes down to the fact that you are unnecessarily recalculating the dominator tree many times -- why doesn't this fetch the DT from the pass manager? BasicAA already does this, so it should be available and work fine. (Note that this is not just a performance concern, your current implementation is incorrect if the DT is updated within a pass -- this will update the DT from the PM, but not your internal instance.)
I think practically the reason is that BasicAA is still fairly useful without the dominator tree, while SeparateStorageAA is much less so. But a hard DT dependency is somewhat burdensome because any AA result being invalidated acts to invalidate all of them (just because of the invalidate implementation in AAResults). But I think it's just a bullet this has to bite given the correctness/compile-time problems.
BasicAA currently has a hard dependency on DominatorTree, it's not an optional analysis. So it should really be in the same boat as SeparateStorageAA... (We could make DT optional in BasicAA, but that's not how it is right now.)
llvm/include/llvm/Analysis/AliasAnalysis.h | ||
---|---|---|
318 |
Which I is this referring to? |
llvm/include/llvm/Analysis/AliasAnalysis.h | ||
---|---|---|
318 | Woops, that should have been down below, on the internal query function. |
ping @nikic, would you mind taking another look? I think with the changes you suggested this ends up as a much cleaner stack (being able to avoid duplicating so much of the assume handling really cuts down). My followup plan is to expand this to handle the possibility of inline storage (e.g. the small string optimization), which is where we incur more of the costs, but I'd rather build on something stable before going much further.
This looks fine, just some style nits.
llvm/include/llvm/Analysis/AliasAnalysis.h | ||
---|---|---|
624–629 | We'd usually call this CxtI (the "context instruction"). | |
760 | You probably don't need = nullptr here and basically anywhere that is not public API. | |
llvm/include/llvm/Analysis/CFLSteensAliasAnalysis.h | ||
70 | I don't think such comments are typical in LLVM. |
llvm/include/llvm/Analysis/CFLSteensAliasAnalysis.h | ||
---|---|---|
80 | Just a note: You pass through CtxI to AAResultBase::alias() in some places, but not in others. Of course, this doesn't make any practical difference, because AAResultBase::alias() is a no-op that ignores all arguments. We should probably still either do one or the other for consistency. (I'm kind of wondering if we shouldn't drop all these AAResultBase calls and just replace them with the appropriate values. I get the impression that these are causing a lot more confusion than they clarify, because people think they perform recursive AA, which they don't.) |
llvm/include/llvm/Analysis/CFLSteensAliasAnalysis.h | ||
---|---|---|
80 | Will fix; and agreed that it's better to just have all these return MayAlias instead of returning the base implementation and doing it indirectly (I think a vestige of when this stuff was CRTP-ified?). Happy to put up a cleanup diff once it won't trigger any merge conflicts. |
I pushed the first 2 (accepted) patches on behalf of David, since David doesn't have commit access yet.
Which I is this referring to?