Page MenuHomePhabricator

[deref] Implement initial set of inference rules for deref-at-point

Authored by reames on Mar 22 2021, 7:42 PM.



Building block towards D99100.

This implements the initial set of inference rules proposed in the llvm-dev thread "RFC: Decomposing deref(N) into deref(N) + nofree". I've already stumbled across a few more when examining test cases, but it's good to get this checked in and tested.

Diff Detail

Event Timeline

reames created this revision.Mar 22 2021, 7:42 PM
reames requested review of this revision.Mar 22 2021, 7:42 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 22 2021, 7:42 PM
nlopes requested changes to this revision.Mar 23 2021, 4:19 AM
nlopes added inline comments.

an Alloca can be killed with the lifetime intrinsics. Storing to an alloca after lifetime_end is UB.


I don't understand this argument of why hasNoSync isn't needed.
All argument attributes are with respect to what the function does, not the environment. It would be impossible to infer attributes if the environment was considered.

My understanding is that readonly implies nofree. Can't see why nofree would have those special semantics


FWIW missing byval case (can't be freed).

This revision now requires changes to proceed.Mar 23 2021, 4:19 AM
reames added inline comments.Mar 23 2021, 9:14 AM

Right, but how does that affect dereferenceability? The memory is still dereferenceable. The contents may be undefined, but the access won't fault.

p.s. Lifetime intrinsics are very badly specified, and are inconsistent across the optimizer - including in the existing deref code. I really don't want them to be a blocker here.


The focus here isn't on nofree implying nosync. It's noalias which implies nosync. In order to for there to be a written (freed) copy of the pointer accessible in the other thread, the noalias fact must be false. If so, that's already UB.


Handled in follow up patch.

reames added inline comments.Mar 23 2021, 9:21 AM

JFYI, the wording in the LangRef on semantics of a dead object (after lifetime end) was recently changed in c821ef4. I objected to the wording at the time, though the patch appears not to have been adjusted. This is why.

lifetime intrinsics do not change dereferenceability. The relevant wording from c821ef4 needs removed from the LangRef. It is wrong. Both with current and proposed semantics.

nlopes added inline comments.Mar 23 2021, 9:58 AM

I agree that doing a store after lifetime_end intrinsic doesn't cause a crash (ATM), but it may write to another object.
For example:

%p = alloca i8
%q = alloca i8

lifetime_start %p
lifetime_end %p
lifetime_start %q
store i8 1, %q
store i8 0, %p ; UB: %p is dead

%v = load i8, %q

What's the value of %v? In practice, it's going to be 0 because %p & %q will end up allocated at the same address. And this is why storing after lifetime_end must be UB.

So now it depends on how this analysis is used. My concern is that it may be used to move a store past lifetime_end and then we have a miscompilation. Because if a pointer is dereferenceable for the size of the store and it's guaranteed to be alive throughout the function, then we can conclude that loads & stores to that pointer can be moved freely.

A simple solution is to check if the particular alloca is an argument to a lifetime_end intrinsic. More complex ones require an additional argument like a BB or specific instruction, such that the query becomes: is the object still alive at this BB/instruction?

reames added inline comments.Mar 23 2021, 10:05 AM

This has turned into a discussion of the lifetime intrinsics. I assert that is unrelated to this change (as the implementation here exactly matches the existing deref implementation for this case). I ask that this discussion be removed from this review thread and taken elsewhere.

I'll also note that you are commingling concerns in your discussion. Dereferenceability does NOT imply it's safe to insert a store. That's a higher level property with additional concerns (e.g. concurrency) which need to be satisfied.

nlopes added inline comments.Mar 23 2021, 11:26 AM

Ok, I confess I'm less familiar with the noalias attribute. Though memcpy's signature is something like:

declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #1

The caller of memcpy may have had the src/dst pointers escaped to other threads. Even then, both arguments are marked as noalias because the C spec says the arguments cannot overlap.
This contradicts the global definition of noalias you are suggesting.

reames added inline comments.Mar 23 2021, 11:44 AM

I'm also fairly unfamiliar with noalias. I'm going off existing wording, and what the optimizer appears to actually do.

I don't think it actually is contradictory. Your concern involves a *writeable* copy of the pointer. The wording of noalias talks about *access*. It doesn't appear to be UB to have a writeable copy of the pointer on another thread (or even within the callee). It does appear to be UB to *actually write to it* during the execution of the callee.

Note that this definition does make noalias argument attributes very hard to infer. I'll note that we only have noalias return inference which might be exactly because of that. :) I'm not sure.

nlopes added inline comments.Mar 23 2021, 12:40 PM

Unfortunately we need to settle on the lifetime semantics if we want to have code inferring properties about allocas here.
BTW, my concern is about movement of load/stores, not of introduction.

This canFree is called from getPointerDereferenceableBytes(), which will infer for how many bytes a pointer is dereferenceable in the whole function. Not just in a particular BB. Hence once the dereferenceable attribute is inferred, "free" movement of memory accesses becomes legal. But it's not, as we have to respect the boundaries of lifetime_start/end.
(I'm even ignoring segmented stacks to avoid extra complication).

Fundamentally, I think that if we go with a definition of dereferenceability per function, we are always going to have our hands tied. We may need to refine the abstraction and do it per BB, for example. Without knowing what every single user of getPointerDereferenceableBytes() does with the result of the analysis, I'm not comfortable declaring an alloca'ed object as always dereferenceable.


I see; thanks for the explanation. LLVM doesn't seem to assume that noalias pointers are distinct indeed:
So I'm good on this now :)

nikic added a subscriber: nikic.Mar 23 2021, 1:38 PM
nikic added inline comments.

Movement of accesses across lifetime intrinsics is already prevented because lifetime intrinsics are modeled as clobbering the location.

I really don't think we should get lifetime intrinsics caught up in this if we can avoid it. Requiring context to handle allocas complicates things a lot.

Do you have some specific example of a miscompile that would occur if lifetime intrinsics are not modeled here?

nlopes added inline comments.Mar 24 2021, 4:38 AM

That's a good point, thanks! Since the intrinsics clobber the objects then movement is already _likely_ prevented. I tried and couldn't trigger a miscompilation by hand in ~15 min of fiddling with IR.

I'm still uncomfortable declaring allocas dereferenceable in the whole function when they are not. Who knows what the users of getPointerDereferenceableBytes will do with this information (now or in the future).. We don't have documentation about these analyses.

Anyway, to unblock this, I'll just suggest a comment is added to getPointerDereferenceableBytes regarding allocas. Meaning, just because an alloca'ed pointer isn't escaped and is "dereferenceable" for x bytes, it doesn't mean we can introduce a store to it (as one would suppose!). One needs to consider lifetime as well.


Just re-read LangRef a few times again regarding noalias (
I can't interpret what's written there as anything related with pointers not being writable in another thread. It just says that if I access object o1, and that object is pointed to by a noalias argument %p, then object o1 can't be accessed through any other pointer that is not based on %p. There's no word about concurrent accesses.
So either this code is incorrect or LangRef needs updating.

reames added inline comments.Mar 24 2021, 9:12 AM


I will not put an incorrect comment into the source code to reflect a mistaken understanding of the code. As explained by both myself and Nikita previously, you are commingling concerns of dereferenability, concurrency, and memory aliasing.

I have already asked you to move the discussion of lifetime intrinsics elsewhere. As I asserted before, the code as written here matches the existing implementation it's replacing. If you want to change that implementation or the semantics thereof, feel free to put forth proposals. I will not.

I am frustrated by this sub-thread. TBH IMO, it seems like you are not discussing the issue *in this patch* in good faith. It seems like you're trying to insist I broaden the scope of work substantially. That bothers me.


I'm going to move this inference rule to it's own review. We probably need input from Johannes or someone more knowledgeable on noalias than myself. I want to unblock the rest of the review.

nlopes resigned from this revision.Mar 24 2021, 9:24 AM
nlopes added inline comments.

The whole point of your RFC & patches was to fix bugs in LLVM, not to match the implementation it is replacing AFAIU.

Anyway, since apparently I'm acting in bad faith regarding this patch, I'll step down immediately.

This revision now requires review to proceed.Mar 24 2021, 9:24 AM
reames updated this revision to Diff 333144.Mar 24 2021, 3:05 PM

Strip out the argument attribute cases for now. I'll move them to their own review since they seem to need more discussion.

Can I get an LGTM on this? This is blocking a bunch of follow up patches.

nikic added inline comments.Mar 24 2021, 3:22 PM

GlobalValue is a subtype of Constant, no need to check it separately.


nit: it's -> its


I don't really get why this code is specific to arguments. Shouldn't this hold for any SSA value inside the function? Sure, it can get freed after the function returns/unwinds, but not while the SSA value is active. Am I missing something here?

nikic added a comment.Mar 24 2021, 3:23 PM

You might want to split out the GC part as well, or at least have someone else look at that -- I don't know anything about it.

I looked at the GC part and it looks good to me.

reames updated this revision to Diff 333163.Mar 24 2021, 4:04 PM

Address review comments, and make a stylistic adjustment to make it really clear that this patch doesn't effect alloca handling at all.

nikic accepted this revision.Mar 24 2021, 4:10 PM



Probably doesn't make sense to keep this as a bullet point list if there's only the one element :)

This revision is now accepted and ready to land.Mar 24 2021, 4:10 PM
This revision was landed with ongoing or failed builds.Mar 24 2021, 4:21 PM
This revision was automatically updated to reflect the committed changes.
reames added inline comments.Mar 25 2021, 11:56 AM

To close the loop here for third parties reading along, Nuno and I had an offline conversation on this sub-point today.

This discussion did trigger a clarification response to the original RFC to be much more specific about the scope of the proposal. That seems to have been the original confusion which let us down this path of miscommunication.

The conversation also triggered me to change my handling of a related discussion thread about changes which recently changed the semantics of lifetime intrinsics.