The `llvm.experimental.noalias.scope.decl` intrinsic identifies where a noalias
scope is declared. When the intrinsic is duplicated, a decision must
also be made about the scope: depending on the reason of the duplication,
the scope might need to be duplicated as well.
Details
- Reviewers
nikic jdoerfert asbirlea - Commits
- rG668827b64856: Introduce llvm.noalias.decl intrinsic
Diff Detail
Event Timeline
llvm/docs/LangRef.rst | ||
---|---|---|
19591 | The p.alloca and p.objId arguments don't make a lot of sense without the remainder of the full restrict patch. The objId refers to a notion that doesn't exist in current LLVM. I think it would be better to make the intrinsics just declare void @llvm.noalias.decl(metadata !p.scope) to start with. Adding additional arguments via AutoUpgrade should be straightfoward down the line. | |
19600 | We should probably avoid using "restrict" terminology on the LLVM IR level, as restrict is a C/C++ specific concept. Rather than "restrict scope" I would call this a "noalias scope". On that note, I wonder whether calling this intrinsic @llvm.noalias.scope.start rather than @llvm.noalias.decl might make sense. (The name mirrors llvm.lifetime.start and llvm.invariant.start -- but then again, those also have an "end", so maybe the parallel does not make sense.) | |
19623 | If only a single scope can be listed, might it make sense to directly point to the scope, rather than a list with a single scope? Or is the idea here that this might be extended to declare multiple scopes at the same time in the future? | |
19631 | I think adding an example for this intrinsic would be helpful. I'm thinking something along these lines: ; This examples shows two possible positions for noalias.decl: ; If it is outside the loop (version 1), then %a and %b are noalias across *all* iterations. ; If it is inside the loop (version 2), then %a and %b are noalias only within *one* iteration. declare void @decl_in_loop(i8* %a.base, i8* %b.base) { entry: ; call void @llvm.noalias.decl(metadata !2) ; Version 1: noalias.decl outside loop br label %loop loop: %a = phi i8* [ %a.base, %entry ], [ %a.inc, %loop ] %b = phi i8* [ %b.base, %entry ], [ %b.inc, %loop ] ; call void @llvm.noalias.decl(metadata !2) ; Version 2: noalias.decl inside loop %val = load i8, i8* %a, !alias.scope !2 store i8 %val, i8* %b, !noalias !2 %a.inc = getelementptr inbounds i8, i8* %a, i64 1 %b.inc = getelementptr inbounds i8, i8* %a, i64 1 %cond = call i1 @cond() br i1 %cond, label %loop, label %exit exit: ret void } !0 = !{!0} ; domain !1 = !{!1, !0} ; scope !2 = !{!1} ; scope list | |
llvm/include/llvm/IR/Intrinsics.td | ||
558 | The ArgMemOnly here seems a bit dubious. This means it can read/write the p.alloca argument, which I assume is not intended (even if it's unused now). I guess we can't make this NoMem, because then it could simply be DCEd right? Maybe it should be InaccessibleMemOnly? |
While the concept of the intrinsic is intuitive, I think we're missing something here in terms of how it is formalized. I have a bit of a hard time formulating the rules.
Let's consider this code:
call @llvm.noalias.decl(!metadata !2) load i8, i8* %a, !alias.scope !2 call @llvm.noalias.decl(!metadata !2) store i8 0, i8* %a, !noalias !2
Something along the lines you could get from a naive unrolling. Now, clean semantics would be to say that noalias.decl acts as a barrier, and !2 before noalias.decl and !2 after it refer to separate scopes. But we can't do that from a technical perspective.
So I guess instead, we say that the above code invoked undefined behavior.
However, is this undefined behavior as well?
br i1 %c, label %if, label %else if: call @llvm.noalias.decl(!metadata !2) load i8, i8* %a, !alias.scope !2 br label %end else: call @llvm.noalias.decl(!metadata !2) store i8 0, i8* %a, !noalias !2 br label %end end: ret void }
I would say "no", this code is fine. So the rule would be:
If an @llvm.noalias.decl intrinsic for a certain scope is dominated by another @llvm.noalias.decl intrinsic for the same scope, the behavior is undefined.
Loop unrolling etc then need to perform the renaming to avoid that UB.
Next, what about this code?
load i8, i8* %a, !alias.scope !2 ; before noalias.decl call @llvm.noalias.decl(!metadata !2) store i8 0, i8* %a, !noalias !2 // and this if: call @llvm.noalias.decl(!metadata !2) br label %end else: br label %end end: store i8 0, i8* %a, !noalias !2 ; after noalias.decl, but not dominated ret void
I would assume this is also UB:
Any usage of a certain scope must be dominated by a @llvm.noalias.decl intrinsic for that scope, otherwise the behavior is undefined. As an exception, if there is no @llvm.noalias.decl call for a certain scope contained in the function, the behavior is as if such a call were located at the start of the function.
The exception here is a bit ugly. Without it existing IR would have to be auto-upgraded by inserting @llvm.noalias.decl for any scopes that are used.
llvm/docs/LangRef.rst | ||
---|---|---|
19591 | The initial recommendation during the AA tech call was to already add all the arguments even if they are not needed yet. I guess that omitting them for now could work. Do you know the exact policy for backwards compatibility ? (released vs unreleased) | |
19600 | At this level, we can indeed omit references to restrict. For the larger explanation (full restrict), imho, it makes sense to use restrict as an example for explaining all the logic. For the intrinsic: it is not a 'start of a scope'. The intrinsic indicates the location (in the code) where the scope was declared. Instructions metadata are allowed to refer to that scope, even if the intrinsic is not dominating that instruction. Optimizations are allowed to move instructions that refer to this scope over the intrinsic. |
The rules that I have in mind have more freedom. They are more like:
- @llvm.noalias.decl is something like a barrier that indicates where a noalias scope originates
- It should normally not be moved across block boundaries. (This is to avoid a declaration crossing a loop boundary)
- Moving it inside a block should pose no problems.
- When it is duplicated, care must be taken to see what to do with the noalias scope.
- cases like loop unrolling and loop rotation: scope must be duplicated
- cases like loop unswitching: no need to duplicate the scope
- it should also not block optimizing away an 'almost empty loop' (in that case, the empty loop should be replaced by the @llvm.noalias.decl)
- instructions that have !alias.scope or !noalias metadata referring to the declared scope can be moved freely across the intrinsic.
- A @llvm.noalias.decl can be omitted if there is no !alias.scope metadata referring to the scope.
- An !alias.scope usage, corresponds to a @llvm.noalias/@llvm.ptr_provenance.noalias usage in the full restrict version. When there are no more dependencies, the @llvm.noalias.decl intrinsic can be omitted.
- A missing @llvm.noalias.decl inside a function can lead to undefined behavior after inlining (and loop unrolling).
- Multiple @llvm.noalias.decl intrinsics in the same block, referring to the same scope can be merged.
- Multiple @llvm.noalias.decl intrinsics in the same block, referring to a different scopes can also be merged (but I would not yet do this).
if there is no @llvm.noalias.decl` call for a certain scope contained in the function, the behavior is as if such a call were located at the start of the function.`
Not sure if we need to enforce this. For backwards compatibility it might be convenient (and expensive to check) to do this during inlining. But I would rather not do it there.
I think it is generally beneficial to make the rules stricter rather than laxer. The reason is that we can implement checks in the IR Verifier and then automatically find optimizations that break the invariants. For example, if we have a rule that says:
If a noalias.decl for !scope is dominated by another noalias.decl for !scope, then the IR is malformed.
Then, performing loop unrolling without proper handling of alias scopes, will automatically result in a verifier error. Without the rule the IR remains valid, just incorrect. Similarly, a rule that says:
If a noalias.decl for !scope exists in the function, then all uses of !scope must be dominated by the noalias.decl.
Then this will also automatically detect certain incorrect code motion optimizations. (I'm less sure about this one, but I think we generally already drop noalias metadata when moving loads upwards.)
Does this make some sense?
llvm/docs/LangRef.rst | ||
---|---|---|
19591 | Not sure on policy for unreleased versions, but it should be simple enough to autoupgrade it in either case. |
llvm/docs/LangRef.rst | ||
---|---|---|
19631 | Maybe: I don't like the loop sentence. Isn't the real problem that we need to duplicate + uniquify whenever we "duplicate" a noalias decl? So for inlining it is the same as for loop unrolling, isn't it? Loop unrolling can be mentioned afterwards as an example. | |
llvm/include/llvm/IR/Intrinsics.td | ||
558 | I don't think it should be DCEd if used. NoMem seems logical to me. |
As discussed during the LLVM AA Tech Call of 2021/1/5:
- switched to the minimal version with '@llvm.experimental.noalias.scope.decl', only using a single argument.
Note: verifier changes are not yet included. I first wanted to provide an update with the other changes before trying to formalize the rule checking in the verifier.
llvm/include/llvm/IR/Intrinsics.td | ||
---|---|---|
558 | NoMem would allow to optimize it away. IntrInaccessibleMemOnly seems the closest fit today. |
Just some minor notes.
llvm/docs/LangRef.rst | ||
---|---|---|
19589 | Maybe name it !id.scope.list to make it clear that it's not a single scope? | |
19612 | ", also a decision must be made about the scope" -> ", a decision must also be made about the scope" | |
19616 | associate -> associated | |
19637 | The %a here should be %b. | |
llvm/include/llvm/IR/Intrinsics.td | ||
550 | At the technical call @jdoerfert also mentioned the possibility of marking it noduplicate, but after looking at some of the usages of cannotDuplicate() I'm pretty sure this will cause more issues than it will solve. | |
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
6407 | Spurious change. | |
llvm/lib/IR/IRBuilder.cpp | ||
461 | I believe you can just pass {Scope} here and drop the SmallVector. |
- Minor changes as suggested by Nikita
- Added verifier part.
I added D94306 for supporting loop rotation. The test-suite still triggers some verifier crashes due to one declaration dominating another one. It seems that the SLP Vectorizer is the culprit.
Another observation/question I have is about possible optimizations for omitting the @llvm.experimental.noalias.scope.decl: In the full restrict patches, the declarations are thrown away when there are no users any more. The equivalent with the implementation here, is that we can throw away the declaration if the scope it declares is not used in any !alias.scope metadata. Would this be something that is easy (and fast) to check ?
As far as I can tell metadata does not have use lists, so I don't think this can be done without a full scan of the function to collect used scopes.
llvm/docs/LangRef.rst | ||
---|---|---|
19596 | Same as below "also a decision must be made" -> "a decision must also be made" | |
19608 | The domination rule enforced by the verifier should be mentioned here as well. | |
llvm/lib/IR/Verifier.cpp | ||
5540 | const auto *? | |
5553 | It might make sense to temporarily put the domination check behind a default disabled cl::opt. Unless we want to land everything at the same time (which I wouldn't recommend) there will be an intermediate time where the verifier check will not succeed. | |
5557 | auto * | |
5566 | llvm::sort(NoAliasScopeDecls, Compare) | |
5593 | Can you please also add a test for this in llvm/test/Verifier? |
Any optimization pass where we could piggy back on to do this ? Or should something like this be a separate pass ? We would need to trace all !noalias/!alias.scope usages in a function and combine this data with the llvm.experimental.noalias.scope.decl calls.
LGTM, but please wait for @jdoerfert to check this as well.
llvm/docs/LangRef.rst | ||
---|---|---|
19596 | nit: reason of -> reason for |
A few minor wording suggestions and comments, nothing that would require another round of review. Adopt my suggestions as you see fit, add TODOs where it makes sense and something should be investigated in the future. LGTM
llvm/docs/LangRef.rst | ||
---|---|---|
19596–19597 | ||
19613 | Same wording as above. | |
19615–19618 | ||
19622 | ||
19650–19652 | ||
19653 | Nit: Version or version, not both. | |
llvm/include/llvm/IR/Intrinsics.td | ||
550 | Interesting, at some point I'd like to hear more. Let's go with IntrInaccessibleMemOnly for now but we should be on the lookout for better solutions here. This is not the only intrinsic and the less restrictive our annotations are the better. | |
llvm/lib/IR/Verifier.cpp | ||
5571 | If this sorts based on pointer values it is non-deterministic. While potentially OK in the verifier, I'd recommend a TODO. We should sort by "metadata ids" (!0, !1, ...) or something that is deterministic. | |
5597 | FWIW, I would set this up the other way around, maybe worth considering but not strictly necessary:
|
Thanks for the feedback. I plan to do the small updates today and commit it so that we have a view on possible (unexpected) fallout.
@jdoerfert, @nikic, can you also spend some time on the other open reviews on the stack ?
llvm/lib/IR/Verifier.cpp | ||
---|---|---|
5571 | Yes that's true. In the 'good' case, this will not make a difference of course. I'll add the TODO. | |
5597 | My initial idea was to use a map of vectors as you suggest, but I settled on doing a sort and avoiding a bunch of allocations. I am going to keep this for now. Having a nested for-loop does not make a big difference wrt complexity as we need to check the dominance for A vs B and for B vs A. |
llvm/lib/IR/Verifier.cpp | ||
---|---|---|
5586 | ItCurrent inclusive, ItNext exclusive. |
llvm/lib/IR/Verifier.cpp | ||
---|---|---|
5586 | Thank you for the clarification, I realized the interval from the code, but then didn't connect it to the comment notation. |
llvm/lib/IR/Verifier.cpp | ||
---|---|---|
121 | Can we flip this flag now, or are you aware of any further issues? |
Will you do the flip or shall I do it ?
llvm/lib/IR/Verifier.cpp | ||
---|---|---|
121 | I just did a test-suite run and a stage2 release build using an assertion build with the flag flipped, and both went fine. IMHO it makes sense to try it out... |
llvm/lib/IR/Verifier.cpp | ||
---|---|---|
121 | What's the impact on compile-time? If it is potentially expensive, it should probably be true for EXPENSIVE_CHECKS builds. |
llvm/lib/IR/Verifier.cpp | ||
---|---|---|
121 | I see no reason for it to be expensive at least. |
llvm/lib/IR/Verifier.cpp | ||
---|---|---|
5586 | @asbirlea Both are valid it seems https://en.wikipedia.org/wiki/Interval_(mathematics)#Including_or_excluding_endpoints |
llvm/lib/IR/Verifier.cpp | ||
---|---|---|
5586 | I would prefer more common notation too. It may look like a typo, and somebody in the future may “fix” it to ]. |
llvm/lib/IR/Verifier.cpp | ||
---|---|---|
121 | @fhahn, @nikic, It's O(N log(N)) (sorting of N = number of scope declarations) Is that considered to be too expensive ? | |
5586 | @xbolva00 That probably depends on what you learned and might go in both directions. Do we have an llvm guidance on what notation to use ? I have no problem with using the '[' ')' style, but it might make sense that it is documented as the preferred style for llvm ? |
llvm/lib/IR/Verifier.cpp | ||
---|---|---|
121 | IIUC we need to potentially check whether any llvm.experimental.noalias.scope.decl dominates any other llvm.experimental.noalias.scope.decl? Is it possible to have a function with N basic blocks which do not dominate each other, each with N llvm.experimental.noalias.scope.decl for the same scope? |
llvm/lib/IR/Verifier.cpp | ||
---|---|---|
121 |
It is allowed to construct such llvm-ir manually. |
llvm/lib/IR/Verifier.cpp | ||
---|---|---|
121 |
I am not worried really worried about about the 'common' case here, it is unlikely to be noticeable in common C/C++ programs. But if there isn't anything preventing the worst case, (~M in the above being roughly the number of instructions in a function, excluding branches), then I think it would be better to have this as expensive check. This might be overly cautious for 'sane' code, but unfortunately there are plenty of code-generators that like to generate highly unusual IR. For example, we surprisingly often get reports of builds taking 24+ hours, because someone tries to compile a file with 500k stores in a single basic block without running any IR store optimizations, which one could argue is not 'common'.
Unfortunately LLVM IR & the verifier is used in many contexts, frontends are free to generate whatever IR they want and also are free to use whatever combination of optimizations (or no optimizations at all). Personally I am weary when it comes to assuming IR passed to LLVM is 'reasonable'. |
Maybe name it !id.scope.list to make it clear that it's not a single scope?