This is an archive of the discontinued LLVM Phabricator instance.

[LAA] Don't require stride one for inbounds GEP
Needs ReviewPublic

Authored by nikic on May 5 2023, 1:28 AM.

Details

Reviewers
reames
fhahn
Summary

I'm pretty sure I'm missing something here, but I'm not sure what it is... As far as I can tell, if we have an inbounds GEP, then we don't need the stride to be 1 or -1 specifically to assume it doesn't wrap in a nusw sense. If it wraps, then we will go from one half of the address space to the other, and no allocated object can be that large (or at least, using inbounds on such an object is not legal).

This is still WIP as some additional test updates are needed.

Diff Detail

Event Timeline

nikic created this revision.May 5 2023, 1:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 5 2023, 1:28 AM
nikic requested review of this revision.May 5 2023, 1:28 AM
reames added a comment.May 8 2023, 8:56 AM

I think this is correct, but for different reasoning.

"The successive addition of the current address, interpreted as an unsigned number, and an offset, interpreted as a signed number, does not wrap the unsigned address space and remains in bounds of the allocated object. As a corollary, if the added offset is non-negative, the addition does not wrap in an unsigned sense (nuw)."

A non-negative stride is clearly and explicitly okay from the corollary. The wording on the first part is slightly hard to follow, but I think this also disallows underflow *as integers* for negative offsets. However, the condition is the corollary maybe hints I'm missing something here? I *think* this is the distinction between arithmetic overflow (where a negative offset does NUW wrap), and address space overflow (where it doesn't) I *think* the later is the semantic we're actually proving here, but the definition of IncrementWrapFlags is a bit hard to follow for me.

nikic added a comment.May 8 2023, 10:06 AM

@reames What LangRef is describing there is basically "nusw" in SCEV terms, which is I believe is also the property LAA is interested in (or at least, that's what it adds to PSE). What LangRef is trying to emphasize here is that inbounds is not nsw, it is nusw, which reduces to nuw for non-negative offsets.

@reames What LangRef is describing there is basically "nusw" in SCEV terms, which is I believe is also the property LAA is interested in (or at least, that's what it adds to PSE). What LangRef is trying to emphasize here is that inbounds is not nsw, it is nusw, which reduces to nuw for non-negative offsets.

I think I agree. If so, your patch here is definitely sound.

Might be good to update the SCEV docs or the inbounds docs to explicitly link the two definitions. Would make this type of discussion much easier going forward.

We should also maybe consider singing "nusw" into base SCEV wrap enum so we can be explicit about the difference. SCEV should conceptually be able to prove it on any inbounds gep which is known to reach a UB-if-poison use. (The use case here doesn't have to prove the existence of the use.) That may (or may not) actually be worth doing.

nikic added a comment.Jul 13 2023, 7:45 AM

I've been thinking about this again and had a crazy thought: Do we need inbounds at all?

As far as I understand, everything here is working under the assumption that the pointer actually gets accessed on each iteration (otherwise a lot of the reasoning would not be valid anyway). The underlying object of an addrec only depends on the addrec start, so the underlying object must stay the same across all iterations. This means that, on each iteration, the pointer address must be part of that single underlying object -- otherwise we would not have provenance to perform the access, and behavior would be undefined. As no allocated object can cross the address space boundary, this implies that the pointer cannot wrap.