Further to my post on the mailing list, EarlyCSE was not merging the aliasing information of two loads/stores that it wanted to fold, meaning that it was random (program order) which one survived. This meant that aliasing information would be sometimes propagated from inlined-restrict functions, when it should not be.
I'm not familiar with the code of this pass, but is there a cheap way of identifying that the two operations are in the same basic block?
If so, you could take the intersection of the aliasing information rather than the union. Because if both ops are guaranteed to execute then the tightest aliasing still has to hold.
(being in the same BB doesn't imply that both instructions are executed, but there's code in ValueTracking perhaps that can check that)
Can you explain in more detail why the current behavior is incorrect? Why is using the aliasing information in program order wrong? If the first load is noalias, then we should be able to deduplicate to that noalias load. If the second load is noalias, then generally we can't assume the first load is also noalias. Unless there is a must-exec relation between them, as @nlopes mentioned.
We have two issues here;
- For any pass, if it changes any instruction with metadata that it does not understand, it should drop that metadata. That's the contract that we need to maintain.
- For this AA metadata in particular, we need to intersect the metadata, and this is true to preserve semantics for almost all other known metadata (AA, profiling, fp-precision, debug) although likely for it is not correct to keep only one, or the other, because of potential dynamic dependencies on the results.
For AA, as an example, we can have something like this:
int * restrict r = a; int x = should_alias ? *a : *r;
(essentially the same is true using casts as TBAA).
In any case, we should be calling the function llvm::combineMetadataForCSE here. Please update the patch to do that.
So, I guess that the reasoning here is that noalias loads are speculatable and noalias violation is thus poison (not immediate UB), so we can't make inferences from one load to the other. Does that capture it?
So when I use combineMetadataForCSE it breaks the invariant loads tests because the code there just says 'if one thing is invariant load and the other is not, throw the invariant load metadata away'.
What would be the correct fix for this? I don't know a ton about the intracacies of invariant loads.
I believe that invariant loads are a bit special because the invariantness of a load is not allowed to have any control dependencies. The LangRef says, "If a load instruction tagged with the !invariant.load metadata is executed, the optimizer may assume the memory location referenced by the load contains the same value at all points in the program where the memory location is known to be dereferenceable; otherwise, the behavior is undefined." Thus, I think that it's valid for (invariant + nothing) to be (invariant) so long as both are guaranteed to be executed.
I would, however, do this as two separate patches. One which updates EarlyCSE to use combineMetadataForCSE (and XFAILs the test), and a second which proposes to update combineMetadataForCSE and fixes the test (and then we can figure out in that review what to do, because we may need a dominance check or a guaranteed-to-execute check or similar).
My understanding (albeit naive) of combineMetadataForCSE comes from the fact that at the definition the parameter is called KDominatesJ - which for the load case K does dominate J (whereas for the store case it does not, so I set it false there).
The fact the declaration and definition have different terms for the same boolean seems troublesome now I think about it.
Yes, and I'm confused by the naming here. The comments in llvm/include/llvm/Transforms/Utils/Local.h say, "Some metadata can only be kept if K dominates J. For this to be correct, K cannot be hoisted." The comment on combineMetadata likewise says, "Some metadata kinds can only be kept if K does not move, meaning it dominated J in the original IR." The parameter is named DoesKMove, implying that when false, K did not originally dominate J. However, in Local.cpp, the parameter to combineMetadataForCSE is named KDominatesJ. This seems backward (i.e., DoesKMove seems like it should be false if K dominates J). In any case, the implementation of llvm::combineMetadata seems to do the more conservative thing when DoesKMove is true (when DoesKMove is false, it will prefer the metadata from K in some cases).
Thus, I think that it should be false here.