Page MenuHomePhabricator

An llvm.noalias intrinsic
AcceptedPublic

Authored by hfinkel on Apr 30 2015, 8:02 AM.

Details

Summary

This is the first in a series of patches to extend the scoped-noalias metadata implementation by providing an intrinsic for implicit alias scope assignment (in contrast to the alias.scope metadata which provides explicit scope assignment).

The general design for this was discussed at the end of last year (http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-November/078786.html), and while there will need to be further extensions (to support dynamic predicates, at least), this should address two core motivations:

  1. Provide a way to retain dominance information which is necessary to answer loop-sensitive queries
  2. Make it practical to use scoped-noalias metadata to model block-level restrict-qualified pointers from Clang.

To summarize from the llvmdev e-mail, the problem underlying (1) is that, when using the existing metadata scheme, we cannot differentiate between these two after inlining:

void foo(float * restrict a, float * restrict b) {
  for (int i = 0; i < 1600; ++i)
    a[i] = b[i] + 1;
}

and:

void inner(float * restrict a, float * restrict b) {
  *a = *b + 1;
}
void foo(float * a, float * b) {
  for (int i = 0; i < 1600; ++i)
    inner(a+i, b+i);
}

and we currently cheat and assume the former to enable vectorization. This is not correct, however, and the underlying issue is that the current scheme cannot differentiate directly between:

  1. Does access A alias with access B, both in loop L, within a loop iteration
  2. Does access A alias with the memory accessed by B is any loop iteration

the existing metadata really only can answer (a) but the vectorizer needs (b). The existing noalias attribute (like the restrict-qualifier on which it is based) does provide the necessary information (because it applies to all derived pointers, and thus those used in every loop iteration), but we loose it when converting to the existing metadata scheme.

Regarding problem (2), while in theory block-level restrict-qualified pointers can be modeled using the existing metadata scheme, it suffers from a lack of dominance information (from problem (1)), and is also difficult to apply in practice. The core issue is that to determine whether or not a variable is derived from some particular restrict-qualified pointer, you need to determine which accesses might be derived from that pointer via some capture. This is much easier to do after optimization (mem2reg, funcattrs to add nocapture attributes, etc.) than in the frontend directly. In practice, we really want to delay this determination until later in the pipeline (and not try to do it in the frontend).

To address these concerns, we'll add a new intrinsic for implicit scope assignment. The scope, in the sense of modeling restrict-qualified pointers, represents the set of restrict-qualified pointers from which some access might derive. This can be explicitly represented with alias.scope metadata. Then the noalias metadata specified with which scopes the tagged access does not alias. To use the new intrinsic, llvm.noalias, you tag all potentially relevant accesses with noalias metadata and derive all pointers in a particular scope from the return value of the llvm.noalias intrinsic with that scope as an argument. The reason that all relevant accesses need to have noalias metadata, including those using pointers derived from the noalias intrinsic, is a compile-time optimization: we don't need to search for noalias intrinsics, a potentially expensive operation, if there is no noalias metadata tag, and thus, there should be no significant cost to the feature if you're not using it.

One issue (directly relevant in a later patch providing the implementation), is that the scope list provided as the metadata argument can have only one item. This is because we need the ability to search for other intrinsics with the same scope argument, and because metadata no longer have use lists, we can do this only by searching the use list of the MetadataAsValue wrapper (and, as a result, can only find complete matches, so we need to have trivial lists). This may be something we wish to address is some different way, because as it stands, we cannot combine adjacent noalias intrinsics.

To bring up a naming issue: I named the intrinsic llvm.noalias because it is like the noalias function argument. However, conceptually, is serves the same purpose as the alias.scope metadata (and works along with the noalias metadata, which is still needed), so maybe it should be named llvm.alias.scope?

Also, on the data dependence, while poses obvious difficulties on the optimizer (and may of the patches are to address this), a strict control dependence is not sufficient (so we can't make it more like the lifetime intrinsics which take a pointer argument). Specifically, we need to be able to represent this kind of construct:

void foo(T * restrict x, T * restrict y) {
  for (int i = 0; i < 1600; ++i)
    x[2*i+1] = y[2*i] + 1;
}

when inlined from the call foo(q, q).

Diff Detail

Event Timeline

hfinkel updated this revision to Diff 24708.Apr 30 2015, 8:02 AM
hfinkel retitled this revision from to An llvm.noalias intrinsic.
hfinkel updated this object.
hfinkel edited the test plan for this revision. (Show Details)
hfinkel added a subscriber: Unknown Object (MLST).
dberlin edited edge metadata.May 12 2015, 1:00 PM

This looks 100% reasonable to me otherwise ;)

docs/LangRef.rst
13834

This wording is a little odd.
I realize you are trying to say something about other pointers, but this intrinsic is really saying something about the accesses with the right metadata, not about their pointers.
(unless there's something i missed, and the fact that a pointer has the right noalias data in one access also implies something about that pointer when used in an access *not* in the right scope)

So i would say "llvm.noalias allows the assume that accesses using pointers derived from the return value don't alias with those accesses tagged with associated noalias scopes"
or something

You also need to very carefully define "derived".
Derived could mean a lot of things (IE if i ptrtoint it, add 6, and intoptr it, is it still derived?)

13854

Similar wording issue
You probably mean:
`llvm.noalias` allows the optimizer to assume that memory accesses using pointers derived from the
return value don't alias with memory accesses tagged with
`noalias` metadata with compatible scopes.

reames edited edge metadata.May 12 2015, 2:45 PM

After thinking about this a bit, I think this is a reasonable approach. I don't claim to be an expert on the semantics of restrict, but am willing to trust that Hal is. :)

I'd mildly prefer we generalize the intrinsic so that's not specific to it's current use case, but I'm completely willing to see that happen at a later date.

I've started going through the "simple" reviews in the following sequence. I'm not going to have lots of review bandwidth to spare, but I'll help where I can.

reames resigned from this revision.Oct 8 2015, 10:24 AM
reames removed a reviewer: reames.

Resigning as a reviewer to get a very stale review off my list of blocking tasks in phabricator. Please reopen when desired.

hfinkel updated this revision to Diff 63035.Jul 7 2016, 1:21 AM
hfinkel edited edge metadata.

Wording changes (and rebased)

hfinkel updated this revision to Diff 63036.Jul 7 2016, 1:29 AM

Rebase to the current ArgMemOnly spelling.

nhaehnle added inline comments.
docs/LangRef.rst
13842

Typo: The first argument *is* the pointer (I haven't really looked at anything, my email client just happened to open here and I started reading out of curiosity...)

hfinkel updated this revision to Diff 63064.Jul 7 2016, 6:38 AM

Fixed typo (pointed out by @nhaehnle)

hfinkel updated this revision to Diff 63432.Jul 10 2016, 11:50 AM

Added the 'Returned' property (from D22201).

Seems fine to me but an alias analysis expert needs to take a closer look.

So, the summary says:

Also, on the data dependence, while poses obvious difficulties on the optimizer (and may of the patches are to address this), a strict control dependence is not sufficient (so we can't make it more like the lifetime intrinsics which take a pointer argument). Specifically, we need to be able to represent this kind of construct:
"void foo(T * restrict x, T * restrict y) {

for (int i = 0; i < 1600; ++i)
  x[2*i+1] = y[2*i] + 1;

}"

Is this summary still correct?
If so, i want to understand this. I don't want to make you do the work, i just want to understand what the issue is.

The inlined version will still have noalias calls that will not be able to be hoisted past the pointer arguments they take, no?

Why won't that just work?

Is there any way you could just provide me a bunch of examples that show "what broken thing will happen"?

(I'd say we should whiteboard it, but that seems hard :P)

(I'm also wondering, offhand, if memoryssa's memory versioning does not already automatically provide the thing we want - something that says "this thing goes with this version of memory state, and you can't hoist it above that version of memory state". You'd take the memory version as the argument, and they would stay where they should - below that memory version)

In particular, note that recent issues with llvm.assume are precisely because it is defined as a construct with control flow scope, but is not itself part of control flow.

If instead, llvm.assume's semantics were like noalias, where it was "this intrinsic takes a value as a first argument, a condition as a second argument, and returns a value. Any use of the value it returns may assume the condition is true", it would not matter whether llvm.assume calls got hoisted or not, even all the way to right after the argument, the values being used would still have correct semantics.

(and the only way something could validly replace a non-assume value with an assume value is if it can prove the condition holds).

Of course, if the *goal* is to be able to have some scope in llvm that something holds, you need real control flow, and you just can't avoid this, and we shouldn't avoid it :P

The only solution i can see for that, and to actually have them try to define "scopes", is to either take the control flow as inputs to the lifetime markers, or attach it to existing control flow (IE lifetime start markers, assume, etc, can be properly thought of as attributes of either the top/bottom of a basic block, or an edge, depending how granular you want to be).

(These types of control flow intrinsics are much like having an analysis pass where you cheat because you don't want everything to have to do the work to preserve what you meant. lifetime start/end are precisely like having dominator tree markers in the IR)

So, the summary says:

Also, on the data dependence, while poses obvious difficulties on the optimizer (and may of the patches are to address this), a strict control dependence is not sufficient (so we can't make it more like the lifetime intrinsics which take a pointer argument). Specifically, we need to be able to represent this kind of construct:
"void foo(T * restrict x, T * restrict y) {

for (int i = 0; i < 1600; ++i)
  x[2*i+1] = y[2*i] + 1;

}"

Is this summary still correct?
If so, i want to understand this. I don't want to make you do the work, i just want to understand what the issue is.

Certainly. The summary says:

Also, on the data dependence, while poses obvious difficulties on the optimizer (and may of the patches are to address this), a strict control dependence is not sufficient (so we can't make it more like the lifetime intrinsics which take a pointer argument). > Specifically, we need to be able to represent this kind of construct:

void foo(T * restrict x, T * restrict y) {
  for (int i = 0; i < 1600; ++i)
    x[2*i+1] = y[2*i] + 1;
}

when inlined from the call foo(q, q).

and you didn't quote that last line, so I wonder if you missed it. So the problem here is that, if we have foo(q,q), and I had some intrinsic that only had a control dependence, like our current @llvm.noalias but without the return value, then we'd end up with:

call @llvm.noalias(%q, !scope)
call @llvm.noalias(%q, !scope)
%p1 = gep %q, ... (2*i)
%p2 = gep %q, ... (2*i+1)
%v = load %p1, !scope
store %v, %p2, !scope

In this example the indexing is simple and we'd get the (lack of) aliasing anyway, but in the general case the problem is that we lose which pointer values are derived from one restrict pointer and which are derived from the other once we inline. That's why I have @llvm.noalias return a value.

In this example the indexing is simple and we'd get the (lack of) aliasing anyway, but in the general case the problem is that we lose which pointer values are derived from one restrict pointer and which are derived from the other once we inline. That's why I have @llvm.noalias return a value.

Sorry, i misunderstood.
I thought you were saying that llvm.noalias *still* has this problem, in the current implementation, and, as i think i pointed out, it should not, *because* it returns a value. It seems like we are in violent agreement.

But that leads me to the other question:
What control dependence does it have that requires it be marked as argmemonly and not "readnone"?
I don't see any tests attached where it would change the semantics if they were hoisted to wherever, and i don't see how they could be DSE'd/DCE'd if the pointers themselves are not dead :)

So what am i missing?

In this example the indexing is simple and we'd get the (lack of) aliasing anyway, but in the general case the problem is that we lose which pointer values are derived from one restrict pointer and which are derived from the other once we inline. That's why I have @llvm.noalias return a value.

Sorry, i misunderstood.
I thought you were saying that llvm.noalias *still* has this problem, in the current implementation, and, as i think i pointed out, it should not, *because* it returns a value. It seems like we are in violent agreement.

Looks that way.

But that leads me to the other question:
What control dependence does it have that requires it be marked as argmemonly and not "readnone"?
I don't see any tests attached where it would change the semantics if they were hoisted to wherever, and i don't see how they could be DSE'd/DCE'd if the pointers themselves are not dead :)

So what am i missing?

There are no tests because I can't write them yet ;) -- But this certainly a good point, and definitely a good thing to discuss. This goes back to an earlier part of the description (which I've edited a bit to make the point more explicit):

To summarize from the llvmdev e-mail, the problem underlying (1) is that, when using the existing metadata scheme, we cannot differentiate between these two after inlining:

void foo(float * restrict a, float * restrict b) {
  for (int i = 0; i < 1600; ++i)
    a[i] = b[i] + 1;
}

and:

void inner(float * restrict a, float * restrict b, int i) {
  a[i] = b[i] + 1;
}
void foo(float * a, float * b) {
  for (int i = 0; i < 1600; ++i)
    inner(a, b);
}

and we currently cheat and assume the former to enable vectorization. This is not correct, however, and the underlying issue is that the current scheme cannot differentiate directly between:

  1. Does access A alias with access B, both in loop L, within a loop iteration
  2. Does access A alias with the memory accessed by B is any loop iteration

After inlining, the only difference here is that, the two noalias intrinsics:

%... = @llvm.noalias(%a, !scope)
%... = @llvm.noalias(%b, !scope)

will, in the first case, be outside the loop and, in the second case, be inside the loop. There's no other difference, but I need to be able to distinguish them in order to correctly answer the queries the vectorizer wants to make. The vectorizer needs to be able to do alias(a[i], b[i], L), where L is the loop, and the associated MemLocs have sizes that are infinite (or corresponding somehow to the entire range of indices), instead of just referring to the individual accesses. If the intrinsics can be hosted out of loops, then this won't work.

I can't write a test for this yet because we have no such interface yet. Proposing that is what I plan for the next step. Maybe we'll think of a better way, and we can make the intrinsics readnone.

hfinkel updated this revision to Diff 73346.Oct 3 2016, 2:51 PM

Rebased

hfinkel accepted this revision.Oct 12 2016, 2:45 PM
hfinkel added a reviewer: hfinkel.

Marking this as accepted, based on okay by @dberlin (http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160815/382593.html), under the condition that the intrinsic be named llvm.noalias.scope (I'll rebase everything for the rename).

This revision is now accepted and ready to land.Oct 12 2016, 2:45 PM
nhaehnle added inline comments.Sep 29 2017, 8:01 AM
docs/LangRef.rst
13883

-from

13894

-from

marksl added a subscriber: marksl.Jul 31 2018, 11:11 AM
troyj added a subscriber: troyj.Aug 27 2018, 8:52 AM

Hi Hal,

I have a question about this: as an 'llvm.noalias' intrinsic is introduced for every 'assignment to a restrict pointer', it is not possible to see a difference between 'foo' and 'bar':

void foo(int* pA, int* pC, int n) {
  int* __restrict rp;

  for (int i=0; i<n; ++i) {
    rp=pA+i;
    *rp=42;
    pC[i]=99;
  }
}

void bar(int* pA, int* pC, int n) {
  for (int i=0; i<n; ++i) {
    int* __restrict rp;
 
    rp=pA+i;
    *rp=42;
    pC[i]=99;
  }
}

The problem occurs when unrolling the loop: for 'bar', the metadata must be cloned, as the restrict scope starts and ends with the loop body. For 'foo', the restrict scope resides outside the loop body and the metadata must not be cloned when unrolling.
In the more complex case, the metadata must be partially cloned (= the part that covers restrict scopes introduced inside the loop body). Any idea on an approach for differentiating between those two cases ?

lzutao added a subscriber: lzutao.Mar 26 2019, 10:04 AM
TijmenW added a subscriber: TijmenW.