This is an archive of the discontinued LLVM Phabricator instance.

[LangRef] State that the based-on relation is for pointer typed values only
Needs ReviewPublic

Authored by aqjune on Jun 10 2021, 2:03 AM.

Details

Summary

This patch (1) clarifies that the definition of based-on relation is for pointer typed values only (2) removes the statement about inttoptr saying that its result is based on pointers that contribute to the resulting pointer.
Also, this patch adds a statement stating that inttoptr(ptrtoint p) isn't based on p.

While trying to convey meaningful updates, I also tried to make changes small as well.
For example, the precise meaning of inttoptr isn't suggested in this patch.
Also, the term 'provenance' is intentionally not introduced here because it isn't necessary for this patch.

Diff Detail

Event Timeline

aqjune created this revision.Jun 10 2021, 2:03 AM
aqjune requested review of this revision.Jun 10 2021, 2:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 10 2021, 2:03 AM
aqjune added inline comments.Jun 10 2021, 2:04 AM
llvm/docs/LangRef.rst
2659

This item is moved to the bottom.

Matt added a subscriber: Matt.Jun 10 2021, 2:49 AM
hvdijk added a subscriber: hvdijk.Jun 10 2021, 12:29 PM
hvdijk added inline comments.
llvm/docs/LangRef.rst
2666

Nit: cast is an irregular verb, "casted" should be "cast". (The same mistake is already present in other places.)

2689

This looks wrong, q is based on "all pointer values that contribute (directly or indirectly) to the computation of the pointer's value", and p's value clearly contributes to that of q, so q is based on p. What I think you're trying to say here is that it is incorrect to assume that q is only based on p.

2700

This implies that storing a pointer value in memory and loading it as a pointer type does preserve its address range. Whether it does is unclear; this idea is assumed by some existing optimisations and incompatible with other existing optimisations. In my opinion, it is not a good idea to update the documentation to clarify this either way until it is decided how LLVM should behave.

If you have raised this not to update the documentation directly, but only to as a starting point for discussion, could you also include what happens when an integer is stored to memory and is then loaded as a pointer value?

nhaehnle added inline comments.Jun 12 2021, 12:28 AM
llvm/docs/LangRef.rst
2666

Some targets have address spaces in which 0 is a valid pointer, so this should probably be adjusted while we're at it? It depends on the address space.

2689

I agree. I wonder if this bullet point couldn't be folded into the bullet "A pointer cast from an integer constant ..." above.

Is there a good reason for distinguishing between an inttoptr from constant vs. inttoptr from anything else? I'd naively assume they behave the same.

2700

This implies that storing a pointer value in memory and loading it as a pointer type does preserve its address range. Whether it does is unclear; this idea is assumed by some existing optimisations and incompatible with other existing optimisations

If store of pointer followed by load of pointer *doesn't* preserve the address range, then mem2reg is likely incorrect as-is, so... :)

Which existing optimizations are incompatible with this?

hvdijk added inline comments.Jun 12 2021, 3:47 AM
llvm/docs/LangRef.rst
2700

Consider

define i8* @f(i8* %p) {
  %buf = alloca i8*
  %buf.i32 = bitcast i8** %buf to i32*
  store i8* %p, i8** %buf
  %i = load i32, i32* %buf.i32
  store i32 %i, i32* %buf.i32
  %q = load i8*, i8** %buf
  ret i8* %q
}

I think it is clear that this needs to return inttoptr(ptrtoint(%p)), SROA optimizes it to exactly that, but Early CSE then changes it to return %p (tested in LLVM/clang 12.0.0). That looks wrong. Let's suppose it didn't do that, let's suppose we run existing optimisations in a different order. Given in the above

%i = load i32, i32* %buf.i32
store i32 %i, i32* %buf.i32

Early CSE is able to remove this store by looking at only these two instructions without any other context: the store is a store of exactly the value that was just loaded. If storing a pointer value in memory and loading it as a pointer type preserves the address range, though, this is invalid: the previous store to that address was a pointer with provenance, and this store is an integer without provenance, so the store still had an effect. This optimisation is only correct if pointers in memory lose their address ranges.

nhaehnle added inline comments.Jun 12 2021, 10:34 PM
llvm/docs/LangRef.rst
2700

Okay, thanks. To repeat the last part, an integer load followed by storing back of the same value may not actually be a no-op. That's pretty horrible :(

aqjune added inline comments.Jun 13 2021, 7:18 PM
llvm/docs/LangRef.rst
2700

Hi @hvdijk ,

This implies that storing a pointer value in memory and loading it as a pointer type does preserve its address range. Whether it does is unclear; this idea is assumed by some existing optimisations and incompatible with other existing optimisations. In my opinion, it is not a good idea to update the documentation to clarify this either way until it is decided how LLVM should behave.

If you have raised this not to update the documentation directly, but only to as a starting point for discussion, could you also include what happens when an integer is stored to memory and is then loaded as a pointer value?

A straightforward solution would be to define it as yielding poison. Since poison is more undefined than any other value, the EarlyCSE example can be explained.
Similarly, defining the reversed case (pointer stored to memory and is then loaded as an integer value) as poison explains an analogous transformation too.
With this poison semantics, the concerned optimizations can be supported.

But, I think the validity of the statement is orthogonal from its concrete semantics.
It is because based-on is a pointer-to-pointer relation only; it is natural to conclude that the loaded integer will lose its associated address range.
The sentence was here for clarification - readers may naturally ask what happens when a pointer was read as integer via memory punning as well.

aqjune updated this revision to Diff 351764.Jun 13 2021, 7:20 PM

Address comments

aqjune marked 4 inline comments as done.Jun 13 2021, 7:21 PM

Thank you very much for the comments!

llvm/docs/LangRef.rst
2689

Thanks, I simply removed the inttoptr part, and moved the inttoptr(ptrtoint) warning below.

aqjune marked an inline comment as done.Jun 13 2021, 7:21 PM

Thank you for your changes.

llvm/docs/LangRef.rst
2666

Should the "constant" be removed?

jdoerfert added inline comments.Jun 14 2021, 9:14 PM
llvm/docs/LangRef.rst
2666

^ yes. And I would not say null pointer but use the wording we have elsewhere: "if a null pointer is defined". My worry is that (void*)0 is always a null pointer, no matter the AS but the AS determines if this has the special or non special meaning.

aqjune updated this revision to Diff 352042.Jun 14 2021, 9:54 PM

Address comments

aqjune updated this revision to Diff 352043.Jun 14 2021, 10:00 PM
aqjune marked 2 inline comments as done.

Use a slightly different wording for the null pointer case

aqjune added inline comments.Jun 14 2021, 10:01 PM
llvm/docs/LangRef.rst
2666

I updated the sentence by using the wording at the description of 'null_pointer_is_valid' attribute.

This makes sense to me, wording seems to be alright, though it's late. Thx for working on this!

hvdijk added inline comments.Jun 15 2021, 2:50 PM
llvm/docs/LangRef.rst
2689

If you take out the inttoptr part without any replacement, this isn't a clarification of the rules, it's a change to the rules. In that case, please update the description accordingly so that this is also obvious to other reviewers.

aqjune edited the summary of this revision. (Show Details)Jun 16 2021, 3:47 AM

ping

If the changes about loading pointer bytes as an integer type (reinterpreting the pointer value stored in the memory by loading it as a non-pointer type loses its associated ...) is a concern, maybe I can remove it and readd it when its semantics becomes concrete enough.

Since you pinged, I will repeat my previous objection that you have not yet addressed: you removed the rule stating "A pointer value formed by an `inttoptr` is *based* on all pointer values that contribute (directly or indirectly) to the computation of the pointer's value." without a replacement as far as I can see, yet say in your description that this is just a clarification. If you want to modify the rules, please make your description say so. If you just want to clarify the rules, please make sure your changes only do that.

aqjune edited the summary of this revision. (Show Details)Jun 21 2021, 11:28 AM
reames resigned from this revision.Nov 30 2021, 10:00 AM