Yet another case that shows up in eembc, see PR24288.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/Analysis/IPA/GlobalsModRef.cpp | ||
---|---|---|
654 ↗ | (On Diff #32240) | It seems like we could make this more general is two ways:
If I'm reading this code correctly, one possible way of handling this, is just to do this: push.Inputs(LI->getPointerOperand()); continue; which seems to accomplish both things, and keeps an integrated Depth constraint. |
lib/Analysis/IPA/GlobalsModRef.cpp | ||
---|---|---|
654 ↗ | (On Diff #32240) | You're saying a pointer loaded from an escaping pointer is itself escaping? I think I agree, but it makes this surprisingly more powerful. This will chase a chain of loads until it hits control flow or a known underlying object. Nifty, but maybe surprising. However, unless I'm mistaken, we still need to call GetUnderlyingObject? The worklist doesn't do that. I'm a bit scared about using GetUnderlyingObjects making this significantly more slow... |
lib/Analysis/IPA/GlobalsModRef.cpp | ||
---|---|---|
654 ↗ | (On Diff #32240) | Responses below, but actually, it seems that we can do even better than this: A non-addr-taken global does not have its address stored anywhere, ever. Thus, if we get a pointer from a load, it cannot be a non-addr-taken global (except for those in AllocsForIndirectGlobals). No?
Yes, I think that's correct.
With a depth limit of 4, it is not going to do very *much* chasing ;) - but, yes, that's the idea.
Ah, yes, correct. The existing worklist does not look though GEPs. Should it?
Indeed; this was my motivation for suggesting using the existing worklist (to get the unified depth limit). |
Thanks for the idea, Hal.
But, unless I'm missing something, this doesn't quite work, because the base condition is different.
Let's say you have:
%v = load i32, i32* @g1
isNonEscapingGlobalNoAlias(@g1, @g1) should return false, but isNonEscapingGlobalNoAlias(@g1, %v) should return true.
So we can't just push @g1 onto the worklist.
Let's start with the patch as-is. I'm sure that this at least is correct and unlikely to be problematic for compile time, while it is sufficient to solve real world performance problems.
We can revisit other designs as follow-up patches.
Michael, feel free to submit, and we can keep poking at what the exact right model here is.