Page MenuHomePhabricator

[IR] `byval` arguments cause reads at call sites
AcceptedPublic

Authored by jdoerfert on May 5 2020, 4:19 PM.

Details

Summary

Even if a call site or function is marked readnone (or writeonly), a
byval argument is still "read"/copied at the call site. We now
indicate so in Instruction::mayReadOrWriteMemory and remove the
temporary workaround in the Attributor. Test coverage is provided by the
Attributor tests.

Diff Detail

Event Timeline

jdoerfert created this revision.May 5 2020, 4:19 PM
Herald added a reviewer: uenoku. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
arsenm added inline comments.May 5 2020, 4:46 PM
llvm/lib/IR/Instruction.cpp
542

It technically reads memory, but does it really read memory from an IR perspective? This isn't supposed to be a program visible read?

rnk accepted this revision.May 5 2020, 4:54 PM
rnk added a subscriber: aeubanks.

lgtm

IIUC, Attributor may infer that a function with byval args is readnone. This makes sense, the body may contain no loads. But a call instruction with a byval argument reads memory, even if the callee doesn't.

Does this actually matter, though? I would expect you could delete all the argument stores to the byval memory being passed in, and so long as the callee doesn't load from it, that's fine.

@aeubanks
Do we need equivalent handling for the preallocated and (soon to be removed) inalloca attributes? Maybe we don't, since I think we want to model preallocated functions as having side effects (adjusting SP).

This revision is now accepted and ready to land.May 5 2020, 4:54 PM

I'm a little skeptical this is a good idea. Yes, if you examine the assembly generated by most backends, you'll find that the code which performs the copy is written in the caller. But we don't promise that in general, and at the IR level we've been modeling the "byval" as part of the callee for a long time. And the incentive to change this seems weak.

If you are going to change this, please audit all the other places that query the readnone/readonly attributes directly; I'm not convinced you've found them all. And please document this in LangRef.

Do we need equivalent handling for the preallocated and (soon to be removed) inalloca attributes?

With inalloca/preallocated, the callee explicitly reads/modifies memory allocated in the caller. It isn't any different from a normal non-byval pointer parameter in this respect.

llvm/lib/IR/Instruction.cpp
542

In the callee, the parameter points to a distinct memory allocation that can be read/modified independently of the memory pointed by the caller. This is definitely visible to the program.

Yes, this is a messy edge case of LLVM IR; arguably, it isn't a good idea, and we should kill off byval in favor of explicitly writing out the allocation/copy in the caller (https://reviews.llvm.org/D74651).

Let me explain why I came to the conclusion this is the right choice, we can go from there:

In my mind, a function without memory accesses should be considered readnone. Even if memory
accesses are present, readnone is fine *iff* the accessed memory is local (alloca) (or constant,
e.g., via tbaa [this is what I've seen other parts do]).

I always assumed byval is a copy on the call edge. It does not happen "in the caller before the call"
or "in the callee at the beginning of the function" but "on the call edge" (similar to phi assignment on a CFG edge).
With that mental model (there is a copy made on the call edge) I went to design the access behavior of byval in the Attributor.
It will mark byval call site arguments *always* as readonly as they are copied. In the callee, the byval argument is refering
to the copy, thus considered "local memory", basically like any alloca. Accesses to the byval argument are not treated as argument
accesses but as local memory accesses.

Now I realized today that some tests we had for the two ideas above match the wrong thing nowadays.
We currently do derive readnone for byval call site arguments. That is in my mind a problem:

define void @empty(i8* byval %rn) readnone {
  ret void
}

define void @caller(i8* %arg) {      ; neither @caller nor %arg should be considered readnone
  call void @empty(i8* byval %arg) readnone
  ret void
}

My local fix for the Attributor was ugly as we filter "interesting" instructions early with mayReadOrWrite(). The two changes you
see in the patch are those ugly hacks. I thought if the logic I had in mind is sound mayReadOrWrite is the right place to check this.


Responses:

Kill of byval/make the copy explicit:

Fine with me, doesn't solve the current problem though.

Is byval part of the calller/do I change anything:

I don't think so. As far as I know, all attributes on an argument are also at the call site, and vice versa. So the attribute is
on both sides of the call edge and the question is what it means. As mentioned above, I think the copy is neither in the callee nor
in the caller. I would assume my interpretation is not really breaking with your that it happens in the callee, assuming it happens "before anything else".

jdoerfert added a comment.EditedMay 6 2020, 9:07 AM

EDIT: It seems Clang actually already performs the thing I describe below o.O.

One more thought:

I don't think it is preferable to say the copy happens in the callee because the user could reasonably write something like this: https://godbolt.org/z/8vxz7P and we would need to resolve the discrepancy between the pure attribute and the byval passing in Clang. Though, having it "conceptually" happen on the call edge doesn't mean we cannot implement it differently.

EDIT: It seems Clang actually already performs the thing I describe below o.O.

Yes, like I mentioned, we have a long history of making the current semantics work. :)

I think I can see that changing the semantics like this makes the overall model a little cleaner. But do we actually get any concrete benefit from it? I don't think this actually makes Attributor more powerful; it just changes the way we represent the end result a bit.

jdoerfert added a comment.EditedMay 6 2020, 10:36 AM

EDIT: It seems Clang actually already performs the thing I describe below o.O.

Yes, like I mentioned, we have a long history of making the current semantics work. :)

I think I can see that changing the semantics like this makes the overall model a little cleaner. But do we actually get any concrete benefit from it? I don't think this actually makes Attributor more powerful; it just changes the way we represent the end result a bit.

Not more powerful no (see EDIT below). It makes it more intuitive I'd say. For one, not only the Attributor but also other passes would right now assume readnone for an empty function. Clang might not but run my example with O1 (https://godbolt.org/z/CjxNhN) to see how we infer readnone. That said, it is clear there is inconsistency. I would propose to adapt the language reference first.
I still believe this patch (=this model) is best suited to get consistent semantics that work. We can continue the intuitive readnone deduction for empty functions but if a call with a byval argument is present it is not considered readnone for everyone using the Instruction::mayReadMemory interface. Basically, if you look at attributes yourself you have to look at all of them. The only alternative I can see is to avoid readnone for functions with byval arguments everywhere. I would assume that is more complicated and a burden on downstream users as well. Also, we don't do that right now anyway ;)

EDIT: If we would go with the byval prevents readnone route, we really need the verifier to check that.
EDIT: We might actually loose out if we prevent readnone in the presence of byval arguments if, the user provided the readnone/ony annotation, or it was generated by something like HTO (or maybe part of a LTO summary). Basically whenever we cannot rediscover the fact via IPO and not inline. So having readnone, meaning no accesses to non-local memory that could be observed by the caller, seems reasonable to me.

I don't think it is preferable to say the copy happens in the callee because the user could reasonably write something like this: https://godbolt.org/z/8vxz7P and we would need to resolve the discrepancy between the pure attribute and the byval passing in Clang.

I think that's a good point. It does introduce a discrepancy where the callsite attributes/properties and those of the callee. This isn't novel, I suppose, because we have that situation with operand bundles too. That all having been said, if we have a situation where a byval argument is never accessed, perhaps we really want the optimizer to just remove the byval attribute?

That all having been said, if we have a situation where a byval argument is never accessed, perhaps we really want the optimizer to just remove the byval attribute?

While we can and should do that, I would really like it to be representable without requiring transformtions.

I also fear there might be non-trivial corner cases just waiting to be explored.


I doubt the following is possible but maybe the idea becomes clear:

// optnone to simulate complicated arithmetic we do not fold (int2ptr, etc)
__attribute__((noinline,optnone))
static int make_the_compiler_complicated(struct S s, int o, int p) {
  return &s.arr[o] == &s.arr[p];
}
int caller(int o, int p) {
   int r = 0;
   struct S *s = new S;
   r = make_the_compiler_complicated(*s, o, p);
   delete s;
   return r;
}

We do not access s but it is not trival to remove the uses of s. Eventually we want code motion to move the geps and comparison out of the function but for now we cannot.
If we remove the byval we have a readnone function we can move after the delete, I think, which would cause poison in the result (due to the inbounds gep). Now that I wrote this I am concerned we already have this problem without byval :(

Now that I wrote this I am concerned we already have this problem without byval

If you do an inbounds GEP on a deallocated pointer, I think the "inbounds"-ness should apply to what the allocation looked like during its lifetime. (The only alternative is that gep effectively reads memory, and that's obviously a lot more problematic.)

perhaps we really want the optimizer to just remove the byval attribute?

In general, we want the optimizer to remove the byval attribute; it's terrible for optimization and code generation. But removing it changes the calling convention, so we can only do it in limited circumstances.

Now that I wrote this I am concerned we already have this problem without byval

If you do an inbounds GEP on a deallocated pointer, I think the "inbounds"-ness should apply to what the allocation looked like during its lifetime. (The only alternative is that gep effectively reads memory, and that's obviously a lot more problematic.)

We drift off here: The trick here is that there is no memory dependence but one on the implicit dereferenceable state which we need to obey as it could cause poison. It seems we are doing the right thing here even if readnone is present.


To the issue at hand, @efriedma what are your thoughts on https://reviews.llvm.org/D79454#2021902 and https://reviews.llvm.org/D79454#2023133 ? What we have now is arguably broken.

I think it's a fair argument that readnone/etc. should only apply to memory operations actually written in the callee, not the implicit byval memory operation (which is "on the call edge"). But I don't think the existing semantics fail in any important respect. The rules are self-consistent, straightforward to explain, and are reasonably convenient for optimizations. And really, any result we arrive at is going to feel sort of weird: outside of byval, there usually aren't any reads to user-visible memory between the call instruction and the first instruction of the callee, and byval is the one exception. I agree your proposed model is slightly better, but I'm not enthusiastic about changing this because the incentive to change it is weak.

(Also, I guess this makes mayReadFromMemory() slightly slower, but that's probably not going to lead to any visible difference.)

The other concern is that it's backward-compatible. I think we don't need to do anything special here; we were previously being conservative, and not adding readnone to calls with byval arguments, so old IR should just work, I think.

Please stage this in the correct order: first fix LangRef, then change existing optimizations to respect byval properly, then add your new optimization to Attributor.

EDIT: We might actually loose out if we prevent readnone in the presence of byval arguments if, the user provided the readnone/ony annotation

I think you can get similar results with argmemonly.