Page MenuHomePhabricator

Introduce llvm.noalias.decl intrinsic
ClosedPublic

Authored by jeroen.dobbelaere on Dec 10 2020, 8:25 AM.

Details

Summary

The `llvm.experimental.noalias.scope.decl` intrinsic identifies where a noalias
scope is declared. When the intrinsic is duplicated, a decision must
also be made about the scope: depending on the reason of the duplication,
the scope might need to be duplicated as well.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
llvm/docs/LangRef.rst
19619

The initial recommendation during the AA tech call was to already add all the arguments even if they are not needed yet. I guess that omitting them for now could work. Do you know the exact policy for backwards compatibility ? (released vs unreleased)

19628

At this level, we can indeed omit references to restrict. For the larger explanation (full restrict), imho, it makes sense to use restrict as an example for explaining all the logic.

For the intrinsic: it is not a 'start of a scope'. The intrinsic indicates the location (in the code) where the scope was declared. Instructions metadata are allowed to refer to that scope, even if the intrinsic is not dominating that instruction. Optimizations are allowed to move instructions that refer to this scope over the intrinsic.

19651

Yes (to the second): one of the ideas is that a future optimization might combine llvm.noalias.decl into a single instruction, referring to the union of the scopes.

llvm/include/llvm/IR/Intrinsics.td
558

That might also work.

While the concept of the intrinsic is intuitive, I think we're missing something here in terms of how it is formalized. I have a bit of a hard time formulating the rules.

The rules that I have in mind have more freedom. They are more like:

  • @llvm.noalias.decl is something like a barrier that indicates where a noalias scope originates
  • It should normally not be moved across block boundaries. (This is to avoid a declaration crossing a loop boundary)
  • Moving it inside a block should pose no problems.
  • When it is duplicated, care must be taken to see what to do with the noalias scope.
    • cases like loop unrolling and loop rotation: scope must be duplicated
    • cases like loop unswitching: no need to duplicate the scope
    • it should also not block optimizing away an 'almost empty loop' (in that case, the empty loop should be replaced by the @llvm.noalias.decl)
  • instructions that have !alias.scope or !noalias metadata referring to the declared scope can be moved freely across the intrinsic.
  • A @llvm.noalias.decl can be omitted if there is no !alias.scope metadata referring to the scope.
    • An !alias.scope usage, corresponds to a @llvm.noalias/@llvm.ptr_provenance.noalias usage in the full restrict version. When there are no more dependencies, the @llvm.noalias.decl intrinsic can be omitted.
  • A missing @llvm.noalias.decl inside a function can lead to undefined behavior after inlining (and loop unrolling).
  • Multiple @llvm.noalias.decl intrinsics in the same block, referring to the same scope can be merged.
  • Multiple @llvm.noalias.decl intrinsics in the same block, referring to a different scopes can also be merged (but I would not yet do this).

if there is no @llvm.noalias.decl` call for a certain scope contained in the function, the behavior is as if such a call were located at the start of the function.`

Not sure if we need to enforce this. For backwards compatibility it might be convenient (and expensive to check) to do this during inlining. But I would rather not do it there.

nikic added a comment.Tue, Jan 5, 9:34 AM

While the concept of the intrinsic is intuitive, I think we're missing something here in terms of how it is formalized. I have a bit of a hard time formulating the rules.

The rules that I have in mind have more freedom. They are more like:

  • @llvm.noalias.decl is something like a barrier that indicates where a noalias scope originates
  • It should normally not be moved across block boundaries. (This is to avoid a declaration crossing a loop boundary)
  • Moving it inside a block should pose no problems.
  • When it is duplicated, care must be taken to see what to do with the noalias scope.
    • cases like loop unrolling and loop rotation: scope must be duplicated
    • cases like loop unswitching: no need to duplicate the scope
    • it should also not block optimizing away an 'almost empty loop' (in that case, the empty loop should be replaced by the @llvm.noalias.decl)
  • instructions that have !alias.scope or !noalias metadata referring to the declared scope can be moved freely across the intrinsic.
  • A @llvm.noalias.decl can be omitted if there is no !alias.scope metadata referring to the scope.
    • An !alias.scope usage, corresponds to a @llvm.noalias/@llvm.ptr_provenance.noalias usage in the full restrict version. When there are no more dependencies, the @llvm.noalias.decl intrinsic can be omitted.
  • A missing @llvm.noalias.decl inside a function can lead to undefined behavior after inlining (and loop unrolling).
  • Multiple @llvm.noalias.decl intrinsics in the same block, referring to the same scope can be merged.
  • Multiple @llvm.noalias.decl intrinsics in the same block, referring to a different scopes can also be merged (but I would not yet do this).

I think it is generally beneficial to make the rules stricter rather than laxer. The reason is that we can implement checks in the IR Verifier and then automatically find optimizations that break the invariants. For example, if we have a rule that says:

If a noalias.decl for !scope is dominated by another noalias.decl for !scope, then the IR is malformed.

Then, performing loop unrolling without proper handling of alias scopes, will automatically result in a verifier error. Without the rule the IR remains valid, just incorrect. Similarly, a rule that says:

If a noalias.decl for !scope exists in the function, then all uses of !scope must be dominated by the noalias.decl.

Then this will also automatically detect certain incorrect code motion optimizations. (I'm less sure about this one, but I think we generally already drop noalias metadata when moving loads upwards.)

Does this make some sense?

llvm/docs/LangRef.rst
19619

Not sure on policy for unreleased versions, but it should be simple enough to autoupgrade it in either case.

jdoerfert added inline comments.Tue, Jan 5, 10:05 AM
llvm/docs/LangRef.rst
19659

Maybe:
_ exact location
+ scope


I don't like the loop sentence. Isn't the real problem that we need to duplicate + uniquify whenever we "duplicate" a noalias decl? So for inlining it is the same as for loop unrolling, isn't it? Loop unrolling can be mentioned afterwards as an example.

llvm/include/llvm/IR/Intrinsics.td
558

I don't think it should be DCEd if used. NoMem seems logical to me.

penzn added a subscriber: penzn.Tue, Jan 5, 10:09 AM
uenoku added a subscriber: uenoku.Tue, Jan 5, 4:19 PM
jeroen.dobbelaere edited the summary of this revision. (Show Details)

As discussed during the LLVM AA Tech Call of 2021/1/5:

  • switched to the minimal version with '@llvm.experimental.noalias.scope.decl', only using a single argument.

Note: verifier changes are not yet included. I first wanted to provide an update with the other changes before trying to formalize the rule checking in the verifier.

jeroen.dobbelaere marked 5 inline comments as done.Thu, Jan 7, 7:31 AM
jeroen.dobbelaere added inline comments.
llvm/include/llvm/IR/Intrinsics.td
558

NoMem would allow to optimize it away. IntrInaccessibleMemOnly seems the closest fit today.

nikic added a comment.Thu, Jan 7, 1:22 PM

Just some minor notes.

llvm/docs/LangRef.rst
19617

Maybe name it !id.scope.list to make it clear that it's not a single scope?

19640

", also a decision must be made about the scope" -> ", a decision must also be made about the scope"

19644

associate -> associated

19665

The %a here should be %b.

llvm/include/llvm/IR/Intrinsics.td
550

At the technical call @jdoerfert also mentioned the possibility of marking it noduplicate, but after looking at some of the usages of cannotDuplicate() I'm pretty sure this will cause more issues than it will solve.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6429

Spurious change.

llvm/lib/IR/IRBuilder.cpp
461

I believe you can just pass {Scope} here and drop the SmallVector.

jeroen.dobbelaere updated this revision to Diff 315335.EditedFri, Jan 8, 3:35 AM
jeroen.dobbelaere marked 3 inline comments as done.
  • Minor changes as suggested by Nikita
  • Added verifier part.
NOTE: this will now show that there is an issue with looprotate (which was in fact expected, as it has a similar issue as loopunroll). I'll see if I can backport that fix from the Full Restrict version as wel.
jeroen.dobbelaere marked 3 inline comments as done.Fri, Jan 8, 3:39 AM
jeroen.dobbelaere added a comment.EditedFri, Jan 8, 7:06 AM

I added D94306 for supporting loop rotation. The test-suite still triggers some verifier crashes due to one declaration dominating another one. It seems that the SLP Vectorizer is the culprit.

Another observation/question I have is about possible optimizations for omitting the @llvm.experimental.noalias.scope.decl: In the full restrict patches, the declarations are thrown away when there are no users any more. The equivalent with the implementation here, is that we can throw away the declaration if the scope it declares is not used in any !alias.scope metadata. Would this be something that is easy (and fast) to check ?

nikic added a comment.Fri, Jan 8, 9:23 AM

Another observation/question I have is about possible optimizations for omitting the @llvm.experimental.noalias.scope.decl: In the full restrict patches, the declarations are thrown away when there are no users any more. The equivalent with the implementation here, is that we can throw away the declaration if the scope it declares is not used in any !alias.scope metadata. Would this be something that is easy (and fast) to check ?

As far as I can tell metadata does not have use lists, so I don't think this can be done without a full scan of the function to collect used scopes.

llvm/docs/LangRef.rst
19624

Same as below "also a decision must be made" -> "a decision must also be made"

19636

The domination rule enforced by the verifier should be mentioned here as well.

llvm/lib/IR/Verifier.cpp
5540

const auto *?

5553

It might make sense to temporarily put the domination check behind a default disabled cl::opt. Unless we want to land everything at the same time (which I wouldn't recommend) there will be an intermediate time where the verifier check will not succeed.

5557

auto *

5566

llvm::sort(NoAliasScopeDecls, Compare)

5593

Can you please also add a test for this in llvm/test/Verifier?

Another observation/question I have is about possible optimizations for omitting the @llvm.experimental.noalias.scope.decl: In the full restrict patches, the declarations are thrown away when there are no users any more. The equivalent with the implementation here, is that we can throw away the declaration if the scope it declares is not used in any !alias.scope metadata. Would this be something that is easy (and fast) to check ?

As far as I can tell metadata does not have use lists, so I don't think this can be done without a full scan of the function to collect used scopes.

Any optimization pass where we could piggy back on to do this ? Or should something like this be a separate pass ? We would need to trace all !noalias/!alias.scope usages in a function and combine this data with the llvm.experimental.noalias.scope.decl calls.

  • Added verifier testcase
  • put domination check behind an option
jeroen.dobbelaere marked 8 inline comments as done.Sat, Jan 9, 9:52 AM

Thanks for the reviews @nikic. I hope I didn't miss anything.

nikic accepted this revision.Sat, Jan 9, 1:22 PM

LGTM, but please wait for @jdoerfert to check this as well.

llvm/docs/LangRef.rst
19624

nit: reason of -> reason for

This revision is now accepted and ready to land.Sat, Jan 9, 1:22 PM
nikic added a comment.Mon, Jan 11, 2:40 PM

@jdoerfert It would be great if you could look over this as well.

jdoerfert accepted this revision.Mon, Jan 11, 3:19 PM

A few minor wording suggestions and comments, nothing that would require another round of review. Adopt my suggestions as you see fit, add TODOs where it makes sense and something should be investigated in the future. LGTM

llvm/docs/LangRef.rst
19624–19625
19641

Same wording as above.

19643–19646
19650
19678–19680
19681

Nit: Version or version, not both.

llvm/include/llvm/IR/Intrinsics.td
550

Interesting, at some point I'd like to hear more. Let's go with IntrInaccessibleMemOnly for now but we should be on the lookout for better solutions here. This is not the only intrinsic and the less restrictive our annotations are the better.

llvm/lib/IR/Verifier.cpp
5571

If this sorts based on pointer values it is non-deterministic. While potentially OK in the verifier, I'd recommend a TODO. We should sort by "metadata ids" (!0, !1, ...) or something that is deterministic.

5597

FWIW, I would set this up the other way around, maybe worth considering but not strictly necessary:

  1. Collect all scopes from the intrinsics in a map<Scope, SmallVector<Instruction>>
  2. For each scope, take the last and all other instructions in the vector and check dominance.
  3. If there is more than one, drop the last instruction and go to 2.

A few minor wording suggestions and comments, nothing that would require another round of review. Adopt my suggestions as you see fit, add TODOs where it makes sense and something should be investigated in the future. LGTM

Thanks for the feedback. I plan to do the small updates today and commit it so that we have a view on possible (unexpected) fallout.
@jdoerfert, @nikic, can you also spend some time on the other open reviews on the stack ?

llvm/lib/IR/Verifier.cpp
5571

Yes that's true. In the 'good' case, this will not make a difference of course. I'll add the TODO.

5597

My initial idea was to use a map of vectors as you suggest, but I settled on doing a sort and avoiding a bunch of allocations. I am going to keep this for now. Having a nested for-loop does not make a big difference wrt complexity as we need to check the dominance for A vs B and for B vs A.

jeroen.dobbelaere edited the summary of this revision. (Show Details)

Adapted to comments.

This revision was automatically updated to reflect the committed changes.

Forgot to send this nits, feel free to add in a follow up.

llvm/lib/IR/Verifier.cpp
5571

+1, please let's avoid non-deterministic behavior, follow-up works.

Potentially also store and reuse the list of scopes from the above for loop.

5586

s/[/]

llvm/lib/IR/Verifier.cpp
5586

ItCurrent inclusive, ItNext exclusive.
Or [ ItCurrent, ItNext-1 ].

asbirlea added inline comments.Fri, Jan 22, 10:22 AM
llvm/lib/IR/Verifier.cpp
5586

Thank you for the clarification, I realized the interval from the code, but then didn't connect it to the comment notation.
This format is new to me. I was taught the mathematical notation is [ItCurrent, ItNext) in this case, or, as you said, [ItCurrent, ItNext-1].

nikic added inline comments.Sun, Jan 24, 7:29 AM
llvm/lib/IR/Verifier.cpp
121

Can we flip this flag now, or are you aware of any further issues?

Will you do the flip or shall I do it ?

llvm/lib/IR/Verifier.cpp
121

I just did a test-suite run and a stage2 release build using an assertion build with the flag flipped, and both went fine. IMHO it makes sense to try it out...

fhahn added a subscriber: fhahn.Sun, Jan 24, 9:06 AM
fhahn added inline comments.
llvm/lib/IR/Verifier.cpp
121

What's the impact on compile-time? If it is potentially expensive, it should probably be true for EXPENSIVE_CHECKS builds.

nikic added inline comments.Sun, Jan 24, 9:10 AM
llvm/lib/IR/Verifier.cpp
121

I see no reason for it to be expensive at least.

llvm/lib/IR/Verifier.cpp
5586

@asbirlea Both are valid it seems https://en.wikipedia.org/wiki/Interval_(mathematics)#Including_or_excluding_endpoints
and https://en.wikipedia.org/wiki/ISO_31-11
I (and my children) learned the notation using only '[' and ']' ;)

xbolva00 added inline comments.
llvm/lib/IR/Verifier.cpp
5586

I would prefer more common notation too. It may look like a typo, and somebody in the future may “fix” it to ].

llvm/lib/IR/Verifier.cpp
121

@fhahn, @nikic, It's O(N log(N)) (sorting of N = number of scope declarations)
Next to that, it is roughly k * O((M-1)^2) * O(dominance checking) with M the the number of identical scope declarations, and k the number if different scopes.
In practice M should be pretty low (mostly 1) and O(N log(N)) should be the main part.

Is that considered to be too expensive ?

5586

@xbolva00 That probably depends on what you learned and might go in both directions. Do we have an llvm guidance on what notation to use ? I have no problem with using the '[' ')' style, but it might make sense that it is documented as the preferred style for llvm ?

fhahn added inline comments.Sun, Jan 24, 9:35 AM
llvm/lib/IR/Verifier.cpp
121

IIUC we need to potentially check whether any llvm.experimental.noalias.scope.decl dominates any other llvm.experimental.noalias.scope.decl?

Is it possible to have a function with N basic blocks which do not dominate each other, each with N llvm.experimental.noalias.scope.decl for the same scope?

llvm/lib/IR/Verifier.cpp
121

Is it possible to have a function with N basic blocks which do not dominate each other, each with N llvm.experimental.noalias.scope.decl for the same scope?

It is allowed to construct such llvm-ir manually.
I am not aware on how to trigger this by an optimization pass, just based on declarations introduced by inlining.
My understanding is that most optimizations will not duplicate the intrinsic. The extra check is there to catch those optimizations that do and do it wrongly.

fhahn added inline comments.Sun, Jan 24, 10:01 AM
llvm/lib/IR/Verifier.cpp
121

In practice M should be pretty low (mostly 1) and O(N log(N)) should be the main part.

I am not worried really worried about about the 'common' case here, it is unlikely to be noticeable in common C/C++ programs. But if there isn't anything preventing the worst case, (~M in the above being roughly the number of instructions in a function, excluding branches), then I think it would be better to have this as expensive check.

This might be overly cautious for 'sane' code, but unfortunately there are plenty of code-generators that like to generate highly unusual IR. For example, we surprisingly often get reports of builds taking 24+ hours, because someone tries to compile a file with 500k stores in a single basic block without running any IR store optimizations, which one could argue is not 'common'.

It is allowed to construct such llvm-ir manually.
I am not aware on how to trigger this by an optimization pass, just based on declarations introduced by inlining.

Unfortunately LLVM IR & the verifier is used in many contexts, frontends are free to generate whatever IR they want and also are free to use whatever combination of optimizations (or no optimizations at all). Personally I am weary when it comes to assuming IR passed to LLVM is 'reasonable'.

llvm/lib/IR/Verifier.cpp
121

@fhahn Would D95335 be acceptable ?