Page MenuHomePhabricator

[PATCH 01/38] [noalias] LangRef: noalias intrinsics and noalias_sidechannel documentation.
Needs ReviewPublic

Authored by jeroen.dobbelaere on Oct 4 2019, 1:59 PM.

Details

Reviewers
hfinkel
jdoerfert
Summary

This patch is the first of a series that introduces full restrict support
in LLVM and clang. The full support is based on the original local restrict
patches from Hal Finkel and is an implementation of the
'RFC: Full 'restrict' support in LLVM' [1].

In order to show the dependencies, in what follows, most of the time
a non-functional rebased patch from Hal Finkel is provided, followed
by a patch that enhances the full restrict support and makes everything
compile and run again.

[1] https://lists.llvm.org/pipermail/llvm-dev/2019-October/135672.html

Notes:

  • The mechanism with the noalias_sidechannel is such that passes that don't know about it will either work (and maybe miss certain optimizations) or crash. This is considered to be better than producing wrong code.
  • This set of patches is at the moment not complete. It is tested and works for the use cases of my company. But it is to be expected that some optimization passes will not interact well with it. In our experience, a number of optimization passes do have problems with the optional extra argument for load and store instructions, and they are normally easy to fix. It is possible that we did not yet catch all of those in passes that we don't use.
  • One item that is known to be missing, is LLVM IR bitcode support for the noalias_sidechannel of the load/store instruction (ascii LLVM IR is supported).
  • The new pass manager support has been fixed (D68507).
  • SLPVectorizer issues also have been fixed. (D68517)
  • The options enabling/disabling full restrict have been improved. (D68484)

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
jeroen.dobbelaere edited the summary of this revision. (Show Details)Oct 4 2019, 3:41 PM
jeroen.dobbelaere edited the summary of this revision. (Show Details)Oct 5 2019, 2:34 AM

The patches in this patch series are not generated with full context (-U99999) (and not fully clang-formatted) :/

This mentions something it calls <channel> but as far as i can tell it is not actually explained *what* that is, what type it is.

This mentions something it calls <channel> but as far as i can tell it is not actually explained *what* that is, what type it is.

As the syntax description shows, it is the same as the type of the pointer operand <ty>*.
I should maybe have mentioned it explicitly.

Document the type of the noalias_sidechannel for load and store.

craig.topper added inline comments.
llvm/docs/LangRef.rst
16476

Extra space after "path"

16479

in -> into?

16496

"allows to track" reads funny. Maybe "allows tracking of"?

16502

intrinsics*

lzutao added a subscriber: lzutao.Oct 7 2019, 1:14 PM

Thank you very much for working on this and putting all this into motion!

I started to look at this patch in isolation but with the rough idea of the approach in mind.
I did add various comments, from small wording changes to proposals on how we conceptually describe things.
Given that many comments would be repeated for each intrinsic, I stopped after the first and want to see what people think.

llvm/docs/LangRef.rst
16439

Nit:

of load and store instructions.

The "of" sounds weird to me.


The documentation below explains how it works, with `restrict` in mind.

I personally dislike sentences like this and would just remove it.

The intrinsics can also be used to specify alias assumptions that are not restrict based.

Arguably that is always true. The section describes the semantics of the alias stuff and how that can be used to model restrict. It is implied that other things can be modeled as well.

16456

I think you mix the "templated" definition (XXX) with instantiations (i8**, %struct.FOO*, ...). I would prefer we pick either. Precedence says you replace XXX with the types of that instantiation.

16482

introduces alias assumptions

plural vs singular

in the normal computation path

this isn't a "known" term for me (see below)

of a pointer and it will be opaque for most optimizations

this is the hope but it is questionable if it is true and why it is here

I would replace the first sentence with:
"The `llvm.noalias` intrinsic attaches alias assumptions to its first argument."


The whole pass thing and splitting comes to early (IMHO). I don't know yet what these intrinsics mean but I learn that they are transformed. That said, llvm.side.noalias is not described here.

16488

Nit: remove "is used to", just "identifies"
Nit: remove "exact" (what does it mean given that we actually move stuff around under the normal "as if" rules)

It's not "done inside" and loops are only one example of this. What you want to say is more general:
"Whenever a llvm.noalias.decl intrinsic is duplicated through code transformations, care must be taken to duplicate and uniquify the scopes and intrinsics. These steps are described in the following."
To be honest, I'm not sure if it makes sense to say something like that here already.

16493

Not: stray "this" in 16284
The inlining sentence does not really clear up anything here, partially because we don't know what is happening.

16497

remove "a blob of"

16508

Either a real object, a constant where the value is relative to 0 or `null`.

There is a word missing and 0 is null.

16510

This seems odd, why introduce two things that do the same thing.

16513

"entries with a single element each."

It represents the variable declaration that contains one or more restrict pointers.

I do not understand this sentence.

16517

For both items above:
No need for "points to".

a restrict variable.

Maybe more specific: "the restrict pointer %p."

16521

Maybe:
"the address of an object with at least one restrict pointer constituent.

16524

I did not understand the above wording.

16541

Nit: "by the following"

16545

"an extra number" is not helpful
" related to " -> "describing"/"identifying"

16560

"related to" -> "referencing" or "scoped in"

16690

I mentioned that before, the lady of the lake says XXX is specialized here:
https://llvm.org/docs/LangRef.html#llvm-memcpy-intrinsic

(do we add attributes here? if so, we need attributes here, e.g., nocapture).

16699

I dislike the loop body and alloca sentence because they do not convey information. To be honest, I don't know what the second sentence is saying. What I would prefer is to say something like:
"The intrinsic identifies the scope of the restrict restrict pointer through a virtual side-effect that ensures the control dependences are preserved. This virtual side-effect will also keep allocations alive and explicit."
(I guessed what you want to say wrt. alloca)

(allocas and mallocs should not be treated differently anyway, we have a heap-2-stack transformation now and for the purpose of this discussion there should not be a difference anyway)

16703

The above reads funny, maybe:
"The returned value is a handle to track dependences on the declaration. There is no explicit relationship to the value of the arguments."
Also, why do we want an i8* then? We have tokens and we have i32, I'd prefer either over an i8* which is more confusing in this context full of i8* that are actually pointers (IMHO).

16712
  1. I would love to remove this duplication, is that possible?
  2. Why do we need to talk about alloca and "optimized away"? Can't we say:

"The first argument %p.alloca points to an object in memory with one or more restrict pointers constituents or null."

16716

Is it a list with one element or a list with entries that have one element each? What I read earlier sounded different from what I read here.

16725

Copy and paste from somewhere above. I'd avoid duplication if possible in favor of references.

16731

The above sentence is broken somewhere (I think). Maybe make it two.

jeroen.dobbelaere marked 4 inline comments as done.Oct 7 2019, 3:38 PM

Thanks for all the feedback ! I added some explanations.

llvm/docs/LangRef.rst
16456

That's true, but imho the full intrinsic name become very long, cluttering the display., For clarity I replaced the type encodings with XXX. This makes it easier to focus on the intrinsics and the actual arguments. I agree that this is not perfect.

16510

The original idea was to treat '%p.addr' sometimes as a pointer to an object and sometimes as an offset. Later it needed to be separated: SROA first splits alloca's into multiple smaller alloca's. Each separate restrict pointer now points to its own alloca (%p.addr), and there is no place to put the offset. You can differentiate by splitting the p.scope, but that would imply duplicating scopes all over the place. The p.objId serves as a convenient and less costly solution to differentiate the pointers in this case.

16513

hmm. Not sure how to explain it further. What I want to say is (shown with an example:)

int *restrict A;  // one !p.scope, one restrict pointer
int *restrict B[10]; // another (single) !p.scope, ten restrict pointers
struct FOO { int* restrict mA; int * mB; int* restrict mC; } C; // yet another !p.scope, 2 restrict pointers
16703

I think a token has to many restrictions (no PHI, no select). i32 might do. I didn't think too much about it and just settled on i8*.

jdoerfert added inline comments.Oct 7 2019, 4:11 PM
llvm/docs/LangRef.rst
16510

So objId is an offset into p.addr? If so, let's document it that way.

How does this work if there are multiple restrict pointers in the object, e.g. struct { restrict *a; restrict *b }? Maybe it would help if you point me towards the place where I can see this intrinsic in action. At least then I might be able to provide better feedback on the wording.

16513

In that example, how doe the p.scopes look like? Or, asked differently, is the p.scope a consequence of the declaration, hence does it uniquely identifies a declaration?

16703

If the token is too restrictive I'd still prefer an i32 (or similar) to avoid confusion with all the i8 pointers that fly around. The wording will then make it clear that these are tokens.

jeroen.dobbelaere marked 3 inline comments as done.Oct 8 2019, 8:24 AM

Here is an example test.c:

struct FOO {
  int* restrict pA;
  int* pB;
  int* restrict pC;
};

void bar(int* a, int* b, int* c) {
  struct FOO tmp = { a, b, c };
  *tmp.pA=42;
  *tmp.pB=43;
  *tmp.pC=44;
}

Compiled as:

clang -mllvm --print-before-all -mllvm -debug -emit-llvm -O2 test.c -S -o -

Before SROA:

%tmp = alloca %struct.FOO, align 8
...
%1 = call i8* @llvm.noalias.decl.p0i8.p0s_struct.FOOs.i64(%struct.FOO* %tmp, i64 0, metadata !6)
...
%pA1 = getelementptr inbounds %struct.FOO, %struct.FOO* %tmp, i32 0, i32 0
%5 = load i32*, i32** %pA1, align 8, !tbaa !9, !noalias !6
%6 = call i32* @llvm.noalias.p0i32.p0i8.p0p0i32.i64(i32* %5, i8* %1, i32** %pA1, i64 0, metadata !6), !tbaa !9, !noalias !6
...
%pC3 = getelementptr inbounds %struct.FOO, %struct.FOO* %tmp, i32 0, i32 2
%8 = load i32*, i32** %pC3, align 8, !tbaa !12, !noalias !6
%9 = call i32* @llvm.noalias.p0i32.p0i8.p0p0i32.i64(i32* %8, i8* %1, i32** %pC3, i64 0, metadata !6), !tbaa !12, !noalias !6

During SROA: Notice how llvm.noalias.decl and llvm.noalias is split, using 0, 8 and 16 for the p.objId :

...
Rewriting alloca partition [0,8) to:   %tmp.sroa.0 = alloca i32*
Found llvm.noalias.decl:   %1 = call i8* @llvm.noalias.decl.p0i8.p0s_struct.FOOs.i64(%struct.FOO* %tmp, i64 0, metadata !6)
New   llvm.noalias.decl:   %1 = call i8* @llvm.noalias.decl.p0i8.p0p0i32.i64(i32** %tmp.sroa.0, i64 0, metadata !6)
 [...]
  rewriting [0,8) slice #2
    original:   %7 = call i32* @llvm.noalias.p0i32.p0i8.p0p0i32.i64(i32* %tmp.sroa.0.0., i8* %2, i32** %pA1, i64 0, metadata !6), !tbaa !9, !noalias !6
          to:   %7 = call i32* @llvm.noalias.p0i32.p0i8.p0p0i32.i64(i32* %tmp.sroa.0.0., i8* %1, i32** %tmp.sroa.0, i64 0, metadata !6), !tbaa !9, !noalias !6
 [...]
  rewriting split [0,24) slice #4 (splittable)
    original:   %2 = call i8* @llvm.noalias.decl.p0i8.p0s_struct.FOOs.i64(%struct.FOO* %tmp, i64 0, metadata !6)
          to:   %1 = call i8* @llvm.noalias.decl.p0i8.p0p0i32.i64(i32** %tmp.sroa.0, i64 0, metadata !6)
 [...]
Rewriting alloca partition [8,16) to:   %tmp.sroa.6 = alloca i32*
Found llvm.noalias.decl:   %2 = call i8* @llvm.noalias.decl.p0i8.p0s_struct.FOOs.i64(%struct.FOO* %tmp, i64 0, metadata !6)
New   llvm.noalias.decl:   %2 = call i8* @llvm.noalias.decl.p0i8.p0p0i32.i64(i32** %tmp.sroa.6, i64 8, metadata !6)
  [...]
  rewriting split [0,24) slice #4 (splittable)
    original:   %3 = call i8* @llvm.noalias.decl.p0i8.p0s_struct.FOOs.i64(%struct.FOO* %tmp, i64 0, metadata !6)
          to:   %2 = call i8* @llvm.noalias.decl.p0i8.p0p0i32.i64(i32** %tmp.sroa.6, i64 8, metadata !6)
  [...]
  rewriting [8,16) slice #7
    original:   %9 = load i32*, i32** %pB2, align 8, !tbaa !11, !noalias !6
          to:   %tmp.sroa.6.8. = load i32*, i32** %tmp.sroa.6, !tbaa !11, !noalias !6
Rewriting alloca partition [16,24) to:   %tmp.sroa.8 = alloca i32*
Found llvm.noalias.decl:   %3 = call i8* @llvm.noalias.decl.p0i8.p0s_struct.FOOs.i64(%struct.FOO* %tmp, i64 0, metadata !6)
New   llvm.noalias.decl:   %3 = call i8* @llvm.noalias.decl.p0i8.p0p0i32.i64(i32** %tmp.sroa.8, i64 16, metadata !6)
  [...]
  rewriting split [0,24) slice #4 (splittable)
    original:   %4 = call i8* @llvm.noalias.decl.p0i8.p0s_struct.FOOs.i64(%struct.FOO* %tmp, i64 0, metadata !6)
          to:   %3 = call i8* @llvm.noalias.decl.p0i8.p0p0i32.i64(i32** %tmp.sroa.8, i64 16, metadata !6)
  [...]
  rewriting [16,24) slice #10
    original:   %12 = call i32* @llvm.noalias.p0i32.p0i8.p0p0i32.i64(i32* %tmp.sroa.8.16., i8* %4, i32** %pC3, i64 0, metadata !6), !tbaa !12, !noalias !6
          to:   %12 = call i32* @llvm.noalias.p0i32.p0i8.p0p0i32.i64(i32* %tmp.sroa.8.16., i8* %3, i32** %tmp.sroa.8, i64 16, metadata !6), !tbaa !12, !noalias !6
  Speculating PHIs
  Speculating Selects

Then later on:

Promoting allocas with mem2reg...
Zeoring noalias.decl dep:   %0 = call i8* @llvm.noalias.decl.p0i8.p0p0i32.i64(i32** %tmp.sroa.0, i64 0, metadata !6)
Zeroing operand 2 of   %3 = call i32* @llvm.noalias.p0i32.p0i8.p0p0i32.i64(i32* %tmp.sroa.0.0., i8* %0, i32** %tmp.sroa.0, i64 0, metadata !6), !tbaa !9, !noalias !6
[...]
Zeoring noalias.decl dep:   %2 = call i8* @llvm.noalias.decl.p0i8.p0p0i32.i64(i32** %tmp.sroa.8, i64 16, metadata !2)
Zeroing operand 2 of   %4 = call i32* @llvm.noalias.p0i32.p0i8.p0p0i32.i64(i32* %tmp.sroa.8.16., i8* %2, i32** %tmp.sroa.8, i64 16, metadata !2), !tbaa !10, !noalias !2
[...]
Zeoring noalias.decl dep:   %1 = call i8* @llvm.noalias.decl.p0i8.p0p0i32.i64(i32** %tmp.sroa.6, i64 8, metadata !2)
[...]

(aargh, 'Zeoring' should of course be 'Zeroing' ;) )

After this pass, we get:

define dso_local void @bar(i32* %a, i32* %b, i32* %c) #0 {
entry:
  %0 = call i8* @llvm.noalias.decl.p0i8.p0p0i32.i64(i32** null, i64 0, metadata !2)
  %1 = call i8* @llvm.noalias.decl.p0i8.p0p0i32.i64(i32** null, i64 8, metadata !2)  ; This one will be removed later on, as it is not used anywhere.
  %2 = call i8* @llvm.noalias.decl.p0i8.p0p0i32.i64(i32** null, i64 16, metadata !2)
  %3 = call i32* @llvm.noalias.p0i32.p0i8.p0p0i32.i64(i32* %a, i8* %0, i32** null, i64 0, metadata !2), !tbaa !5, !noalias !2
  store i32 42, i32* %3, align 4, !tbaa !10, !noalias !2
  store i32 43, i32* %b, align 4, !tbaa !10, !noalias !2
  %4 = call i32* @llvm.noalias.p0i32.p0i8.p0p0i32.i64(i32* %c, i8* %2, i32** null, i64 16, metadata !2), !tbaa !12, !noalias !2
  store i32 44, i32* %4, align 4, !tbaa !10, !noalias !2
  ret void
}
[...]
!2 = !{!3} ; p.scope: 'tmp' in function 'bar', also recycled by !noalias, as it is the only restrict declaration
!3 = distinct !{!3, !4, !"bar: tmp"}
!4 = distinct !{!4, !"bar"}
llvm/docs/LangRef.rst
16510

This is the confusing part for me for the LangRef vs the usage: should the LangRef describe only the high level effect, or can it also describe how llvm treats/optimizes stuff internally ? I have somehow the feeling the we might want to have a separate restrict handling document, describing how the intrincs and metadata work together. Or do you think such a thing also belongs to the LangRef ?

16513

Yes, the p.scope is a result of the declaration and uniquely identifies one.

16703

ok. We can consider that.

a.elovikov added inline comments.
llvm/docs/LangRef.rst
16448

I find it strange to see %side.p on both left and right sides. Is it a typo or does it have some special meaning?

After reading till the intrinsics' description I believe it should be just "%p" on the right side.

16782

the `noalias_sidechannel` path

Not sure about terminology, but are @llvm.noalias.arg.guard/@llvm.noalias.copy.guard considered as noalias_sidechannel? I'd suggest not to use the spelling from the load/store instructions and have a more general moved onto the "side" path (if my understanding is correct here).

16822

No explicit "or null" here. Is that intentional?

jeroen.dobbelaere marked 3 inline comments as done.Oct 10 2019, 1:34 PM
jeroen.dobbelaere added inline comments.
llvm/docs/LangRef.rst
16448

yes, that's a typo. the second %side.p should be %p:

%side.p = i8* @llvm.side.noalias.XXX(i8* %p, ...)
16782

The @llvm.noalias.arg.guard combines the normal path with the noalias_sidechannel path. The @llvm.noalias.copy.guard resides on the normal path and adds extra information to a copy operation (memcpy, load/store).
I tried to be consistent in terminology when referring to the 'noalias_sidechannel' path. (but I could also use the 'noalias side channel' or something similar).

16822

It can be 'null'

a.elovikov added inline comments.Oct 10 2019, 1:46 PM
llvm/docs/LangRef.rst
16782

How about this:

It will be transformed into a `llvm.side.noalias` intrinsic and moved onto
the `noalias_sidechannel` path for loads/stores and fed into the @llvm.noalias.arg.guard/@llvm.noalias.copy.guard intrinsics for function boundaries/copies respectively.

jeroen.dobbelaere marked an inline comment as done.Oct 10 2019, 1:53 PM
jeroen.dobbelaere added inline comments.
llvm/docs/LangRef.rst
16782

... and fed into the @llvm.noalias.arg.guard intrinsics for function boundaries.

(The @llvm.noalias.copy.guard is generated by the clang frontend)

izik1 added a subscriber: izik1.Oct 18 2019, 11:39 AM
jeroen.dobbelaere edited the summary of this revision. (Show Details)
aqjune added a subscriber: aqjune.Oct 30 2019, 7:52 PM
uenoku added a subscriber: uenoku.Nov 2 2019, 6:48 AM
simoll added a subscriber: simoll.Nov 5 2019, 6:29 AM