This is an archive of the discontinued LLVM Phabricator instance.

Handle invariant.group.barrier in BasicAA
ClosedPublic

Authored by Prazek on Apr 2 2017, 12:57 PM.

Details

Summary

llvm.invariant.group.barrier returns a pointer that must-aliases the
pointer it takes. It can't be marked with the returned attribute,
because it would then be removed too easily. The other reason is that
only Alias Analysis should know about this: if any other
pass knew it, the result would be replaced with its
argument, which would be invalid.

We can think of the returned pointer as something that must-aliases the
argument, but it doesn't have to be bitwise identical to it.
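
For concreteness, a minimal sketch (assuming the 2017-era typed-pointer syntax and the i8* intrinsic signature) of the relationship this patch teaches BasicAA:

declare i8* @llvm.invariant.group.barrier(i8*)

define void @example(i8* %p) {
  %q = call i8* @llvm.invariant.group.barrier(i8* %p)
  ; With this patch BasicAA answers MustAlias for %p and %q: they point at the
  ; same memory, even though %q must not simply be replaced by %p in the IR.
  ret void
}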

Diff Detail

Repository
rL LLVM

Event Timeline

Prazek created this revision.Apr 2 2017, 12:57 PM
Prazek updated this revision to Diff 93809.Apr 2 2017, 12:59 PM

Removed empty lines

Prazek updated this revision to Diff 93810.Apr 2 2017, 1:00 PM

Add Krzysztof to subscribers

Prazek added a subscriber: amharc.Apr 2 2017, 1:00 PM
dberlin edited edge metadata.EditedApr 2 2017, 1:08 PM

Note: Nothing currently removes things with the returned attribute, so it wouldn't be removed because of that.

" because if any other pass would know it, then the result would be replaced with it's
argument, which would be invalid."

Invalid or just lose optimization/have to drop the attribute?
If invalid, this sounds, honestly, like a serious design flaw.
SSA values are meant to be propagated, and don't die once created.
(Memory does, but ssa doesn't).

I should always be able to propagate an ssa value past anything i like.
If you've created an intrinsic where ssa values die when passed to it, that seems very broken.

sanjoy edited edge metadata.Apr 2 2017, 1:21 PM

Hi,

I'm not sure if this is correct. What if we transform:

i8* ptr_0 = ...
i8 old_val = load ptr_0, !invariant.group
store vptr to ptr_0
i8* ptr_1 = barrier(ptr_0)
i8 val = load ptr_1, !invariant.group

to (since ptr_0 must-alias ptr_1 and, by the rules of invariant.group, the loads through them are not clobbered by the store to ptr_0)

i8* ptr_0 = ...
i8 old_val = load ptr_0, !invariant.group
store vptr to ptr_0
i8* ptr_1 = barrier(ptr_0)
i8 val = old_val

In other words, given what you're teaching BasicAA, even though barrier(x) cannot be replaced with x in general, since x must-alias barrier(x) we can do the replacement for load and store operations, which can break the data dependency you're trying to preserve.

This seems pretty broken.

It seems if i add an assume that the beforeptr == afterptr
it will propagate it
if i put it in a loop with an if (beforeptr == afterptr) condition, it'll propagate it, etc.

You can't prevent the compiler from propagating your beforeptr to afterptr, and it's only ever going to become smarter about proving they are the same.
Thus, things need to lose optimization, not break, when it does that.

Prazek added a comment.Apr 2 2017, 1:50 PM

Hi,

I'm not sure if this is correct. What if we transform:

i8* ptr_0 = ...
i8 old_val = load ptr_0, !invariant.group
store vptr to ptr_0
i8* ptr_1 = barrier(ptr_0)
i8 val = load ptr_1, !invariant.group

to (since ptr_0 must-alias ptr_1 and, by the rules of invariant.group, the loads through them are not clobbered by the store to ptr_0)

i8* ptr_0 = ...
i8 old_val = load ptr_0, !invariant.group
store vptr to ptr_0
i8* ptr_1 = barrier(ptr_0)
i8 val = old_val

In other words, given what you're teaching BasicAA, even though barrier(x) cannot be replaced with x in general, since x must-alias barrier(x) we can do the replacement for load and store operations, which can break the data dependency you're trying to preserve.

It can't, because the optimizations based on invariant.group will not strip barriers; they only work on SSA values.

The outcome of all the optimizations will be:

i8* ptr_0 = ...
i8 old_val = load ptr_0, !invariant.group
store vptr to ptr_0
i8* ptr_1 = barrier(ptr_0)
i8 val = vptr

Which is totally valid, because barrier doesn't change any memory.

Note: Nothing currently removes things with the returned attribute, so it wouldn't be removed because of that.

" because if any other pass would know it, then the result would be replaced with it's
argument, which would be invalid."

Invalid or just lose optimization/have to drop the attribute?
If invalid, this sounds, honestly, like a serious design flaw.
SSA values are meant to be propagated, and don't die once created.
(Memory does, but ssa doesn't).

I should always be able to propagate an ssa value past anything i like.
If you've created an intrinsic where ssa values die when passed to it, that seems very broken.

This patch is meant to solve the "barriers stopping optimizations" problem.
llvm.invariant.group.barrier is a readonly argmemonly intrinsic, which means that if I marked it with the returned attribute, then it would be
removed as quickly as llvm.ssa.copy.

Check out this: https://reviews.llvm.org/D31531
There is also another way you can think about barriers (Chandler's idea):

Let's say that the lower 32 bits of a pointer are the physical address, and the high 32 bits encode the dynamic type.
The barrier (or std::launder, if that helps) can read the pointer and, based on the dynamic type of the object (read from the vptr),
return a pointer with the same lower 32 bits, but whose high 32 bits can be different from the pointer it took.

Another problem that needs to be mentioned is skipping the barrier based on comparisons:

Base *a = new Base();
a->foo();
Base *b = new(a) Derived();  
assert(a == b);
b->foo();

So after the assert, LLVM could replace the SSA value of b with a,
which would skip the barrier and cause wrong devirtualization.
That is another reason why we need to add barriers before comparisons, like:
barrier(a) == barrier(b)

but it would not be enough if we knew that barrier(a) produces the same value
as barrier(b).
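
A rough sketch (hypothetical IR, not necessarily the exact form the frontend would emit) of what "barriers before the comparison" could look like:

%ba = call i8* @llvm.invariant.group.barrier(i8* %a)
%bb = call i8* @llvm.invariant.group.barrier(i8* %b)
%eq = icmp eq i8* %ba, %bb
; Only the "after barrier" values are compared, so proving %ba == %bb does not
; by itself let a pass substitute the "before barrier" values for each other.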

I should always be able to propagate an ssa value past anything i like.
If you've created an intrinsic where ssa values die when passed to it, that seems very broken.

This patch is meant to solve the "barriers stopping optimizations" problem.

I think this needs to be solved another way.

llvm.invariant.group.barrier is a readonly argmemonly intrinsic, which means that if I marked it with the returned attribute, then it would be
removed as quickly as llvm.ssa.copy.

Only one pass removes llvm.ssa.copy :)

Check out this: https://reviews.llvm.org/D31531
There is also another argument on how you can think about barriers (Chandler's idea):

Let's say that the lower 32 bits of a pointer are the physical address, and the high 32 bits encode the dynamic type.
The barrier (or std::launder, if that helps) can read the pointer and, based on the dynamic type of the object (read from the vptr),
return a pointer with the same lower 32 bits, but whose high 32 bits can be different from the pointer it took.

but again, if i add comparisons, assumes, or asserts that prove otherwise, it'll propagate.
You simply aren't going to be able to stop this propagation, nor should you be able to.
It violates a fundamental property of ssa - values never die :)

My problem with your barrier is simply that *when propagation happens*, it generates wrong code, instead of "losing optimization"

Compare to the noalias intrinsic:
If i propagate the argument of llvm.noalias to its result, i lose optimization, but i don't break the program.

Another problem that needs to be mentioned is skipping the barrier based on comparisons:

Base *a = new Base();
a->foo();
Base *b = new(a) Derived();  
assert(a == b);
b->foo();

So after the assert, LLVM could replace the SSA value of b with a,

Yes, and it will.

which would skip the barrier and cause wrong devirtualization.
That is another reason why we need to add barriers before comparisons, like:
barrier(a) == barrier(b)

but it would not be enough if we knew that barrier(a) produces the same value
as barrier(b).

I don't think this is going to work out.
Honestly.
It is always possible to find ways to break it based on value equivalence.

You can't stop all forms of proving value equivalence.

Prazek added a comment.Apr 2 2017, 2:15 PM

I should always be able to propagate an ssa value past anything i like.
If you've created an intrinsic where ssa values die when passed to it, that seems very broken.

This patch is meant to solve the "barriers stopping optimizations" problem.

I think this needs to be solved another way.

llvm.invariant.group.barrier is a readonly argmemonly intrinsic, which means that if I marked it with the returned attribute, then it would be
removed as quickly as llvm.ssa.copy.

Only one pass removes llvm.ssa.copy :)

Check out this: https://reviews.llvm.org/D31531
There is also another argument on how you can think about barriers (Chandler's idea):

Let's say that the lower 32 bits of a pointer are the physical address, and the high 32 bits encode the dynamic type.
The barrier (or std::launder, if that helps) can read the pointer and, based on the dynamic type of the object (read from the vptr),
return a pointer with the same lower 32 bits, but whose high 32 bits can be different from the pointer it took.

but again, if i add comparisons, assumes, or asserts that prove otherwise, it'll propagate.
You simply aren't going to be able to stop this propagation, nor should you be able to.
It violates a fundamental property of ssa - values never die :)

My problem with your barrier is simply that *when propagation happens*, it generates wrong code, instead of "losing optimization"

Compare to the noalias intrinsic:
If i propagate the argument of llvm.noalias to its result, i lose optimization, but i don't break the program.

Another problem that needs to be mentioned is skipping the barrier based on comparisons:

Base *a = new Base();
a->foo();
Base *b = new(a) Derived();  
assert(a == b);
b->foo();

So after the assert, LLVM could replace the SSA value of b with a,

Yes, and it will.

which would skip the barrier and cause wrong devirtualization.
That is another reason why we need to add barriers before comparisons, like:
barrier(a) == barrier(b)

but it would not be enough if we knew that barrier(a) produces the same value
as barrier(b).

I don't think this is going to work out.
Honestly.
It is always possible to find ways to break it based on value equivalence.

You can't stop all forms of proving value equivalence.

I've talked with Chandler and we think that adding the barrier in:

  • any cast from a dynamic type to a non-dynamic one (void*, int64, etc.)
  • any place where a pointer to a dynamic object is compared in any way
  • the standard places (ctor, dtor, placement new, union)

will be enough to not propagate the value. So as long as you change A to B, and both A and B are after the barrier, then we will never skip the barrier (replace an SSA value from "after the barrier" with one from "before the barrier"); a sketch of the first case follows.
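
As a sketch of the first bullet (hypothetical IR; %struct.Derived and @use are made-up names), a cast away from the dynamic type would go through the barrier:

%d = alloca %struct.Derived
%raw = bitcast %struct.Derived* %d to i8*
%laundered = call i8* @llvm.invariant.group.barrier(i8* %raw)
; every "non-dynamic" use (comparison, hashing, printing the address, ...)
; only ever sees the value from after the barrier
call void @use(i8* %laundered)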

I'll make you a deal:
If you are willing to fix this when it inevitably breaks, without hacking up the optimizations to do so, i'm happy to remove my objection :)

Prazek added a comment.Apr 2 2017, 2:32 PM

I'll make you a deal:
If you are willing to fix this when it inevitably breaks, without hacking up the optimizations to do so, i'm happy to remove my objection :)

invariant.group is my baby, ofc I would fix it :) I am also very happy to learn a better way to model this. So far, after a couple of years, I don't see any better solution - we can't
mark this intrinsic with the returned attribute, because we can't expose it to SSA. The only place that should know about it is AA. This knowledge shouldn't break anything, because
we won't replace the returned value based on MustAlias, and we want to perform the same optimizations as if we stripped all the invariant.group metadata and barriers.

The only optimizations that invariant.group.barrier stops are the ones optimizing memory.
I was also planning to add a "mustalias" attribute to the barrier after figuring it out with Hal and Chandler, so I would look for functions with this attribute instead of the barriers, but
it doesn't really matter that much.
You can check out the proposal, which specifies the problem and a bunch of optimizations that I care about, here:
https://docs.google.com/document/d/1bY9edrxgEMnqo0VxlF8cBmzu9Avrue7RqC5S3HISBiQ/edit?usp=sharing

"The only optimizations that invariant.group.barrier stops are the ones optimizing memory.
"
These are literally our most important optimizations.
Really.
Scalar optimizations just about don't matter anymore.
We can perform billions of wasted scalar calculations per second, but not so much on the memory front.

Prazek updated this revision to Diff 93816.Apr 2 2017, 2:39 PM

reorder PSK enum and perform clang-format

Prazek added a comment.EditedApr 2 2017, 2:46 PM

"The only optimizations that invariant.group.barrier stops are the ones optimizing memory.
"
These are literally our most important optimizations.
Really.
Scalar optimizations just about don't matter anymore.
We can perform billions of wasted scalar calculations per second, but not so much on the memory front.

I agree, but I hope that this patch will fix most of that :) There might be some missing places, like the DSE case you saw, which doesn't use AA and relies on SSA values only.

edit: I know that there is also GlobalOpt that I have to take a look at, but it is not as important as GVN or DSE

"The only optimizations that invariant.group.barrier stops are the ones optimizing memory.
"
These are literally our most important optimizations.
Really.
Scalar optimizations just about don't matter anymore.
We can perform billions of wasted scalar calculations per second, but not so much on the memory front.

I agree, but I hope that this patch will fix most of that :) There might be some missing places like the DSE that you saw that doesn't use AA and rely on SSA values only.

again, they shouldn't have to rely on AA to get *conservatively correct answers about the values*, only to optimize *the memory it touches* better.

edit: I know that there is also GlobalOpt that I have to take a look, but it is not as important as GVN or DSE

I'm fine with this for now, i just suspect it's going to break in highly creative ways as we get better about optimization.

Prazek added a comment.Apr 2 2017, 3:00 PM

"The only optimizations that invariant.group.barrier stops are the ones optimizing memory.
"
These are literally our most important optimizations.
Really.
Scalar optimizations just about don't matter anymore.
We can perform billions of wasted scalar calculations per second, but not so much on the memory front.

I agree, but I hope that this patch will fix most of that :) There might be some missing places like the DSE that you saw that doesn't use AA and rely on SSA values only.

again, they shouldn't have to rely on AA to get *conservatively correct answers about the values*, only to optimize *the memory it touches* better.

edit: I know that there is also GlobalOpt that I have to take a look, but it is not as important as GVN or DSE

I'm fine with this for now, i just suspect it's going to break in highly creative ways as we get better about optimization.

"The only optimizations that invariant.group.barrier stops are the ones optimizing memory.
"
These are literally our most important optimizations.
Really.
Scalar optimizations just about don't matter anymore.
We can perform billions of wasted scalar calculations per second, but not so much on the memory front.

I agree, but I hope that this patch will fix most of that :) There might be some missing places like the DSE that you saw that doesn't use AA and rely on SSA values only.

again, they shouldn't have to rely on AA to get *conservatively correct answers about the values*, only to optimize *the memory it touches* better.

It is not about correctness, it is only about optimizations. If I remove this patch, then it will be as correct as it was before.

edit: I know that there is also GlobalOpt that I have to take a look, but it is not as important as GVN or DSE

I'm fine with this for now, i just suspect it's going to break in highly creative ways as we get better about optimization.

The only bad thing that could happen is if someone used stripPointerCastsWithBarriers() outside of AA, which would be invalid. Besides this, I don't see
how the information that two pointers must-alias could break anything. Do you have some thoughts on what could break?
Anyway, as I said, I feel responsible for the invariant.group topics, so I will always be happy to fix them.

"The only optimizations that invariant.group.barrier stops are the ones optimizing memory.
"
These are literally our most important optimizations.
Really.
Scalar optimizations just about don't matter anymore.
We can perform billions of wasted scalar calculations per second, but not so much on the memory front.

I agree, but I hope that this patch will fix most of that :) There might be some missing places like the DSE that you saw that doesn't use AA and rely on SSA values only.

again, they shouldn't have to rely on AA to get *conservatively correct answers about the values*, only to optimize *the memory it touches* better.

edit: I know that there is also GlobalOpt that I have to take a look, but it is not as important as GVN or DSE

I'm fine with this for now, i just suspect it's going to break in highly creative ways as we get better about optimization.

"The only optimizations that invariant.group.barrier stops are the ones optimizing memory.
"
These are literally our most important optimizations.
Really.
Scalar optimizations just about don't matter anymore.
We can perform billions of wasted scalar calculations per second, but not so much on the memory front.

I agree, but I hope that this patch will fix most of that :) There might be some missing places like the DSE that you saw that doesn't use AA and rely on SSA values only.

again, they shouldn't have to rely on AA to get *conservatively correct answers about the values*, only to optimize *the memory it touches* better.

It is not about correctness, it is only about optimizations. If I remove this patch, then it will be as correct as it was before.

edit: I know that there is also GlobalOpt that I have to take a look, but it is not as important as GVN or DSE

I'm fine with this for now, i just suspect it's going to break in highly creative ways as we get better about optimization.

The only bad thing that could happen is if someone would use stripPointerCastsWithBarriers() outside of AA, which would be invalid. Beside this, I don't see
how the information that 2 pointers mustalias could break anything. Do you have some thoughts on what could break?

I suspect nicer gvn's will break.
There are way too many ways to use mustalias.
While we can say mustalias does not prove value equivalence directly, it can be used indirectly.
For example (and remember, i can ignore whatever metadata i like :P)
store q
a = load b
d = b
c = load d
if b == d, and b mustalias q, then d mustalias q.

IE in:
store q
a = load b
c = load d
if b mustalias q, at least in our system, we expect b == d
(TBAA is the canonical example of what breaks this, but we still rely on it in some cases)

This travels backwards in time, too
store 5, b
c = load b
store 5, a
d = load a
if a mustalias b, the first store and load are dead, etc.

So chalk my feeling up to "general unease having gone down this path before".

Prazek added a comment.Apr 2 2017, 3:39 PM

"The only optimizations that invariant.group.barrier stops are the ones optimizing memory.
"
These are literally our most important optimizations.
Really.
Scalar optimizations just about don't matter anymore.
We can perform billions of wasted scalar calculations per second, but not so much on the memory front.

I agree, but I hope that this patch will fix most of that :) There might be some missing places like the DSE that you saw that doesn't use AA and rely on SSA values only.

again, they shouldn't have to rely on AA to get *conservatively correct answers about the values*, only to optimize *the memory it touches* better.

edit: I know that there is also GlobalOpt that I have to take a look, but it is not as important as GVN or DSE

I'm fine with this for now, i just suspect it's going to break in highly creative ways as we get better about optimization.

"The only optimizations that invariant.group.barrier stops are the ones optimizing memory.
"
These are literally our most important optimizations.
Really.
Scalar optimizations just about don't matter anymore.
We can perform billions of wasted scalar calculations per second, but not so much on the memory front.

I agree, but I hope that this patch will fix most of that :) There might be some missing places like the DSE that you saw that doesn't use AA and rely on SSA values only.

again, they shouldn't have to rely on AA to get *conservatively correct answers about the values*, only to optimize *the memory it touches* better.

It is not about correctness, it is only about optimizations. If I remove this patch, then it will be as correct as it was before.

edit: I know that there is also GlobalOpt that I have to take a look, but it is not as important as GVN or DSE

I'm fine with this for now, i just suspect it's going to break in highly creative ways as we get better about optimization.

The only bad thing that could happen is if someone would use stripPointerCastsWithBarriers() outside of AA, which would be invalid. Beside this, I don't see
how the information that 2 pointers mustalias could break anything. Do you have some thoughts on what could break?

I suspect nicer gvn's will break.
There are way too many ways to use mustalias.
While we can say mustalias does not prove value equivalence directly, it can be used indirectly.
For example (and remember, i can ignore whatever metadata i like :P)
store q
a = load b
d = b
c = load d
if b == d, and b mustalias q, then d mustalias q.

Yep, this is totally valid reasoning, but as long as we go from pointer equality to mustalias, the barriers should not break :)

IE in:
store q
a = load b
c = load d
if b mustalias q, at least in our system, we expect b == d
(TBAA is the canonical example of what breaks this, but we still rely on it in some cases)

I don't think we should go from mustalias to equality. It is valid in some sense, but imho we should treat it as a property that we can't derive from mustalias.

This travels backwards in time, too
store 5, b
c = load b
store 5, a
d = load a
if a mustalias b, the first store and load are dead, etc.

yep, this is exactly what I am looking for, and it should not break the barriers :)

So chalk my feeling up to "general unease having gone down this path before".

"I don't think we should go from mustalias to equality. It is valid in some sense, but imho we should treat it as a property, that we can't derive from mustalias.

"

I don't actually disagree, of course.
aliasing is a property of abstract memory locations
value equivalence is a property of bits.
B implies A but A does not imply B :)

However, outside of memoryssa, which makes this very explicit (IE SSA form for abstract memory locations), the distinction has a strong tendency to bleed through optimizations.

Prazek added a comment.Apr 2 2017, 3:55 PM

"I don't think we should go from mustalias to equality. It is valid in some sense, but imho we should treat it as a property, that we can't derive from mustalias.

"

I don't actually disagree, of course.
aliasing is a property of abstract memory locations
value equivalence is a property of bits.
B implies A but A does not imply B :)

However, outside of memoryssa, which makes this very explicit (IE SSA form form for abstract memory locations), the distinction has a strong tendency to bleed through optimizations .

"I don't think we should go from mustalias to equality. It is valid in some sense, but imho we should treat it as a property, that we can't derive from mustalias.

"

I don't actually disagree, of course.
aliasing is a property of abstract memory locations
value equivalence is a property of bits.
B implies A but A does not imply B :)

Cool, I am happy we agree on this :)

However, outside of memoryssa, which makes this very explicit (IE SSA form for abstract memory locations), the distinction has a strong tendency to bleed through optimizations.

I guess we just have to be careful not to get from mustalias to equality in this case.
What about adding a test like:

; Run with -O3 and other passes that are not enabled by default that could break it
%b = call i8* @barrier(i8* %a)
; Check here that it wasn't constant folded
%r = icmp eq i8* %b, %a
call void @use(i1 %r)

This will not prove someone broke it, but it should help keep the semantics right.
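
Something along these lines, perhaps (a hypothetical FileCheck sketch; the exact pipeline and the helper names are assumptions, not part of this patch):

; RUN: opt -O3 -S < %s | FileCheck %s
declare i8* @llvm.invariant.group.barrier(i8*)
declare void @use(i1)

define void @barrier_not_folded(i8* %a) {
; CHECK-LABEL: @barrier_not_folded(
; CHECK: call i8* @llvm.invariant.group.barrier
; CHECK: icmp eq
  %b = call i8* @llvm.invariant.group.barrier(i8* %a)
  %r = icmp eq i8* %b, %a
  call void @use(i1 %r)
  ret void
}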

sanjoy added a comment.Apr 2 2017, 4:38 PM

It can't, because the optimizations based on invariant.group will not strip barriers; they only work on SSA values.

The outcome of all the optimizations will be:

i8* ptr_0 = ...
i8 old_val = load ptr_0, !invariant.group
store vptr to ptr_0
i8* ptr_1 = barrier(ptr_0)
i8 val = vptr

Which is totally valid, because barrier doesn't change any memory.

Are you saying that if I have (forget about barrier for a second):

%v0 = load %X
...
%v1 = load %Y

then even if %X must-alias %Y I cannot transform the above to:

%v0 = load %X
...
%v1 = load %X

which seems pretty weird. What does "must alias" even mean then?

The other thing here is that http://llvm.org/docs/AliasAnalysis.html#the-alias-method says: "A MustAlias response implies that the pointers compare equal." However, I think this is a bug in our AA specification, and the documentation should be fixed to say that the pointers may not be equal.

sanjoy added a comment.Apr 2 2017, 6:01 PM

(This is devolving into a general conversation about invariant.group; so please let me know if I should move this elsewhere.)

One more thing to document is that with invariant.group in the picture, certain kinds of "specializations" are no longer correct. For instance, I cannot transform:

call @f(i8* ptr)

to

if (ptr == X)
  call @f(i8* ptr)
else
  call @f(i8* ptr)

since it is possible that ptr == barrier(X) (in a way that isn't visible), and therefore the above can be GVN'ed into

if (ptr == X)
  call @f(i8* X)
else
  call @f(i8* ptr)

which is a problem if @f loads the vtable of X.

This needs to be carefully documented somewhere, though I'm not sure where.

It can't, because the optimizations based on invariant.group will not strip barriers; they only work on SSA values.

The outcome of all the optimizations will be:

i8* ptr_0 = ...
i8 old_val = load ptr_0, !invariant.group
store vptr to ptr_0
i8* ptr_1 = barrier(ptr_0)
i8 val = vptr

Which is totally valid, because barrier doesn't change any memory.

Are you saying that if I have (forget about barrier for a second):

%v0 = load %X
...
%v1 = load %Y

then even if %X must-alias %Y I cannot transform the above to:

%v0 = load %X
...
%v1 = load %X

Yes.
If you can prove them value equivalent, you can transform them, but not just because they refer to the same memory location.

which seems pretty weird. What does "must alias" even mean then?

It means the language says the memory pointed to by these two locations is the same memory.
That does not imply the pointers are equal.

IE it's about the pointed-to memory, *not* about the pointer to that memory.

IE i could have a language where different areas of memory are considered the exact same, and carefully manage them so it's not user-visible.
They are must-alias. You can't necessarily replace one with the other.

For example,

load %x
I move area pointed to by %x to where %y is
load %y

These are must-alias, but the pointers are not value equivalent.

It is certainly possible to have an IR where must-alias implies value equivalence.
It's not even uncommon.
So if we want to go that route, we certainly can.
But it's definitely also valid to not do so.

Note you can go further if you want.

If you have must-alias != pointer value equivalence, then in a world where they are different, realistically the only thing i have to prove is: at any point in the program where something accesses memory, they access the same memory.

Thus, the following could be legal (i'm going to do it in non-ssa, just imagine it's two-level pointers)

%x = a place
load %x
%x = nonsense
%x = %z
load %z

where %z and %x could still be must-alias.

(Note this is pretty aggressive, i don't think we would go this far)

The converse is true, FWIW:
If they must be pointer-equal, situations like the above can't be must-alias; if they are, you may transform the program incorrectly, particularly if you allow that equivalence to go back in time and affect earlier statements.

sanjoy added a comment.Apr 2 2017, 6:21 PM

One more thing to document is that with invariant.group in the picture, certain kinds of "specializations" are no longer correct. For instance, I cannot transform:

Okay, false alarm (for some definition of "alarm" :) ), this is not new to the @llvm.invariant.group.barrier intrinsic -- we already can't do this kind of transform because of realloc.

Say we have a program that prints 20:

new = realloc(old);
*new = 20;
print *new;

to

new = realloc(old);
if (new == old) {
  *new = 20;
  print *new;
} else {
  *new = 20;
  print *new;
}

to

new = realloc(old);
if (new == old) {
  *old = 20;
  print *new;
} else {
  *new = 20;
  print *new;
}

to (since realloc returns noalias)

new = realloc(old);
if (new == old) {
  print *new;
  *old = 20;
} else {
  *new = 20;
  print *new;
}

and now if realloc manages to reallocate in place, we'll print garbage instead of 20.

sanjoy added a comment.Apr 2 2017, 7:08 PM

It can't, because the optimizations based on invariant.group will not strip barriers; they only work on SSA values.

The outcome of all the optimizations will be:

i8* ptr_0 = ...
i8 old_val = load ptr_0, !invariant.group
store vptr to ptr_0
i8* ptr_1 = barrier(ptr_0)
i8 val = vptr

Which is totally valid, because barrier doesn't change any memory.

Are you saying that if I have (forget about barrier for a second):

%v0 = load %X
...
%v1 = load %Y

then even if %X must-alias %Y I cannot transform the above to:

%v0 = load %X
...
%v1 = load %X

Yes.
If you can prove them value equivalent, you can transform them, but not just because they refer to the same memory location.

which seems pretty weird. What does "must alias" even mean then?

It means the language says the memory pointed to by these two locations is the same memory.
That does not imply the pointers are equal.

IE it's about the pointed-to memory, *not* about the pointer to that memory.

I was not using pointer equality in the above example. I meant to say that if the two SSA values %X and %Y (of type i32*, say) are MustAlias, then they must refer to the same memory location by definition (whether they're pointer equal or not is irrelevant). Which means loading from one should be the same as loading from the other (otherwise what does "referring to the same memory location" even mean?).

IE i could have language where different areas of memory are considered the exact same, and carefully manage them so it's not user-visible.
They are must-alias. You can't necessarily replace one with the other.

I understand that I can't replace one with the other in arbitrary computation (like pointer comparisons). But it seems to me that we should be able to replace them in contexts that only load / store through those pointers.

For example,

load %x
I move area pointed to by %x to where %y is
load %y

These are must-alias, but the pointers are not value equivalent.

True, but this isn't contradictory with my previous example.

It is certainly possible to have an IR where must-alias implies value equivalence.
It's not even uncommon.
So if we want to go that route, we certainly can.

As I mentioned on IRC, in http://llvm.org/docs/AliasAnalysis.html#must-may-and-no-alias-responses we specifically say "A MustAlias response implies that the pointers compare equal." However, I agree that that's just a specification bug.

But it's definitely also valid to not do so.

Note you can go further if you want.

If you have must-alias != pointer value equivalence, then in a world where they are different, realistically the only thing i have to prove is: at any point in the program where something accesses memory, they access the same memory.

Sure - that makes sense. I imagine multi-mapping scenarios where bitwise inequal pointers may be accessing the same memory.

Thus, the following could be legal (i'm going to do it in non-ssa, just imagine it's two-level pointers)

%x = a place
load %x
%x = nonsense
%x = %z
load %z

where %z and %x could still be must-alias.

The example wasn't clear to me -- when you say %x and %z are must-alias, do you mean the value stored in %x, or the value %x itself (of type int** presumably)?

(Note this is pretty aggressive, i don't think we would go this far)

The converse is true, FWIW:
If they must be pointer-equal, situations like the above can't be must-alias, if they are, you may transform the program incorrectly, particularly if you allow that equivalence to go back in time and affect earlier statements.

Prazek added a comment.Apr 3 2017, 2:48 AM

It can't, because the optimizations based on invariant.group will not strip barriers; they only work on SSA values.

The outcome of all the optimizations will be:

i8* ptr_0 = ...
i8 old_val = load ptr_0, !invariant.group
store vptr to ptr_0
i8* ptr_1 = barrier(ptr_0)
i8 val = vptr

Which is totally valid, because barrier doesn't change any memory.

Are you saying that if I have (forget about barrier for a second):

%v0 = load %X
...
%v1 = load %Y

then even if %X must-alias %Y I cannot transform the above to:

%v0 = load %X
...
%v1 = load %X

Yes.
If you can prove them value equivalent, you can transform them, but not just because they refer to the same memory location.

which seems pretty weird. What does "must alias" even mean then?

It means the language says the memory pointed to by these two locations is the same memory.
That does not imply the pointers are equal.

IE it's about the pointed-to memory, *not* about the pointer to that memory.

I was not using pointer equality in the above example. I meant to say that if the two SSA values %X and %Y (of type i32*, say) are MustAlias, then they must refer to the same memory location by definition (whether they're pointer equal or not is irrelevant). Which means loading from one should be the same as loading from the other (otherwise what does "referring to the same memory location" even mean?).

I would say that if you have these two loads of different SSA pointers, but you know they must-alias, then you know that the load will produce the same value as a load through the second pointer.
With that information you could (sometimes) remove the load, but you can't replace %X with %Y.
Actually it would probably be safe to replace only this one use, but if you are not planning to remove the load, then it should not make any difference for optimizations.
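
For example, a sketch (hypothetical IR) of the transform that stays legal under this model -- the redundant load is removed and its value reused, but the pointer SSA values themselves are left alone:

%v0 = load i8, i8* %X
%v1 = load i8, i8* %Y        ; %X and %Y are MustAlias, no clobber in between
call void @use(i8 %v1)

; =>

%v0 = load i8, i8* %X
call void @use(i8 %v0)       ; %Y is not rewritten to %X anywhere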

sanjoy added a comment.Apr 3 2017, 9:07 AM

I was not using pointer equality in the above example. I meant to say that if the two SSA values %X and %Y (of type i32*, say) are MustAlias, then they must refer to the same memory location by definition (whether they're pointer equal or not is irrelevant). Which means loading from one should be the same as loading from the other (otherwise what does "referring to the same memory location" even mean?).

I would say that if you have these two loads of different SSA pointers, but you know they must-alias, then you know that the load will produce the same value as a load through the second pointer.
With that information you could (sometimes) remove the load, but you can't replace %X with %Y.

That seems a bit counter-intuitive to me, but I'd be happy with an edit to docs/AliasAnalysis.rst clearly explaining this.

Btw, I realized my example was misleading -- there is no need for the first load. I could have just said:

%t = load %Y

to (with %X must-alias %Y)

%t = load %X

Actually it would be probably safe to replace only this one use, but if you are not planning to remove the load, then it should not make any difference for optimizations.

I presume you'll at least need to drop the invariant.group metadata from the load whose pointer you're transforming. But, as I said, I won't be unhappy with disallowing the above transform altogether.

One more thing -- I suppose you can't use this MustAlias together with the properties conferred by the invariant.group metadata, right? E.g. in

vtbl_0 = load ptr0, !invariant.group
// placement new
*ptr0 = new_vtable
ptr1 = barrier(ptr0)
vtbl_1 = load ptr1, !invariant.group

you can't apply both of these rules

a. ptr0 must-alias ptr1
b. vtbl_1 is an invariant.group, so the store of the new vtable can be ignored

since that will let you forward vtbl_0 to vtbl_1, which isn't what you want.

If that's true, I think it too is subtle enough to put in writing (probably as a comment).

I was not using pointer equality in the above example. I meant to say that if the two SSA values %X and %Y (of type i32*, say) are MustAlias, then they must refer to the same memory location by definition (whether they're pointer equal or not is irrelevant). Which means loading from one should be the same as loading from the other (otherwise what does "referring to the same memory location" even mean?).

I would say that if you have these two loads of different SSA pointers, but you know they must-alias, then you know that the load will produce the same value as a load through the second pointer.
With that information you could (sometimes) remove the load, but you can't replace %X with %Y.

That seems a bit counter-intuitive to me, but I'd be happy with an edit to docs/AliasAnalysis.rst clearly explaining this.

Btw, I realized my example was misleading -- there is no need for the first load. I could have just said:

%t = load %Y

to (with %X must-alias %Y)

%t = load %X

Actually it would be probably safe to replace only this one use, but if you are not planning to remove the load, then it should not make any difference for optimizations.

I presume you'll at least need to drop the invariant.group metadata from the load whose pointer you're transforming. But, as I said, I won't be unhappy with disallowing the above transform altogether.

One more thing -- I suppose you can't use this MustAlias together with the properties conferred by the invariant.group metadata, right? E.g. in

vtbl_0 = load ptr0, !invariant.group
// placement new
*ptr0 = new_vtable
ptr1 = barrier(ptr0)
vtbl_1 = load ptr1, !invariant.group

you can't apply both of these rules

a. ptr0 must-alias ptr1
b. vtbl_1 is an invariant.group, so the store of the new vtable can be ignored

since that will let you forward vtbl_0 to vtbl_1, which isn't what you want.

If that's true, I think it too is subtle enough to put in writing (probably as a comment).

But what advantage does it give us that we can replace one value with another in loads/stores? I don't think it gives any, because if a pass wanted to take advantage of it, it could do the same thing by just using AA. It adds unnecessary complexity and we would lose some optimizations (like the dropping of invariant.group).
I think we should keep the model we have right now, in which optimizations can know that loads of different pointers that must-alias produce the same value, but they can't replace the SSA values.

But what advantage does it give us that we can replace one value with another in loads/stores?

The same redundancy elimination advantage you get anywhere else.

I don't think it gives any, because if a pass wanted to take advantage of it, it could do the same thing by just using AA.

Err, if the address is expensive to compute, it definitely gives an advantage. It also helps prove conditionals, and a whole host of things.
We canonicalize and redundancy eliminate not just for speed, but to make the IR look simpler for other things.

It adds unnecessary complexity

To what?
Invariant.group?
It's metadata, vs regular IR semantics :)
I'm okay with adding complexity to metadata if it makes regular IR simpler to reason about.
It's complexity to say "it's okay to infer this but not that".
Complexity cuts both ways here

and we would lose some optimizations (like the dropping of invariant.group).

Honestly, i think you are a bit blinded here.
We lose optimization either way. You just happen to like the optimizations it helps with. But you also assume we'll lose them if we don't do it that way.
That's probably wrong. We can almost always invent better ways to represent things.
:)

I think we should keep the model we have right now, in which optimizations can know that loads of different pointers that must-alias produce the same value, but they can't replace the SSA values.

Maybe. But i'm going to be honest. I don't think your argument for why we should do that seems very strong so far, and sanjoy is making a pretty cogent argument.

Does that mean we should do it? No, but we need to have a real discussion of the benefits/tradeoffs.
The benefits are not "we can devirtualize", because there are lots of ways to make that happen. The main benefit is it's already done. But that doesn't mean we shouldn't redesign it if we need to.
In the meantime, if it means invariant.group can't be as useful in as many places, we'll get over it.
We always find better ways to express things.

We are rarely getting metadata exactly right on the first try, and that's okay.
We shouldn't stick with things just because we did it.
(we shouldn't change things just because we can, either)

I don't think there is any advantage to canonicalizing like this based on mustalias - it is possible, but if there were a real advantage, someone would have already figured it out and we would be doing it. My suspicion is that every intrinsic that returns its argument, and doesn't care about keeping the value, is already marked with the returned attribute, and we do already replace those.

There is another argument for why we should not replace the values.

%x = call i32 @foo(i8* %ptr)

If we know that @foo will produce the same value if we gave it another pointer, %ptr2 (where we don't know whether %ptr == %ptr2), is that a valid reason to replace it with %ptr2?

%x = call i32 @foo(i8* %ptr2)

In general I don't think it is. Even if @foo were readonly or readnone, that shouldn't be a good enough reason to do so. But the knowledge is still very useful if we can eliminate calls to @foo based
on it, like:

%x = call i32 @foo(i8* %ptr)
%t = call i32 @foo(i8* %ptr2)

=>
%x = call i32 @foo(i8* %ptr)
%t = %x

I am really open to other ideas and I hope Chandler will join the discussion.

Hi Piotr,

My point was less that "doing the replacement is super important for performance" and more that "I'm worried that if we cannot do that transform we must have changed something fundamental".

It is the same as, hypothetically, not being able to transform %t = add i32 5, 10 to %t = add i32 10, 5. Obviously, the transformation is not very useful (since you'll normally just fold %t to 15) but the fact that we (hypothetically) cannot do that is surprising.

In any case, I think the replacement is fine as long as you also drop the !invariant.group associated with the load, if any. That is, the following is fine I think:

load i32 %X, !invariant.group

(%X must-alias %Y) =>

load i32 %Y

However, that needs to be documented, and part of the spec for !invariant.group.
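
Concretely, a minimal before/after sketch (assuming typed pointers and some group metadata node !0, both hypothetical here) of that rule:

; before (%X must-alias %Y):
%v = load i32, i32* %X, !invariant.group !0

; after: the pointer operand is changed, and the !invariant.group is dropped
%v = load i32, i32* %Y

!0 = !{!"group"}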

Prazek added a comment.Apr 3 2017, 1:53 PM

Hi Piotr,

My point was less that "doing the replacement is super important for performance", but that "I'm worried that if we cannot do that transform we must have changed something fundamental".

It is the same as, hypothetically, not being able to transform %t = add i32 5, 10 to %t = add i32 10, 5. Obviously, the transformation is not very useful (since you'll normally just fold %t to 15) but the fact that we (hypothetically) cannot do that is surprising.

In any case, I think the replacement is fine as long as you also drop the !invariant.group associated with the load, if any. That is, the following is fine I think:

load i32 %X, !invariant.group

(%X must-alias %Y) =>

load i32 %Y

However, that needs to be documented, and part of the spec for !invariant.group.

Ok, if this is the case then I totally agree - the invariant.group is associated with the SSA value, so we need to drop it if we performed such a transformation. I will try to send a patch for review about the documentation tomorrow.

Ok, the documentation has been fixed in https://reviews.llvm.org/D31758. Is there anything I should fix before submitting this patch?

ping, it is blocking 2 other patches

sanjoy accepted this revision.Apr 24 2017, 10:58 AM

While I'm still not 100% sure about the approach, at this point I'm okay with this going in.

One thing you can do to assuage my concerns quite a bit is specify the invariant.group feature as "experimental" (after discussing on llvm-dev). That way if we find issues with the design later on we can change it in backward incompatible ways without breaking users already using it for things other than devirtualization.

include/llvm/IR/Value.h
490 ↗(On Diff #93816)

I'd s/stripPointerCastsWithBarriers/stripPointerCastsAndBarriers/

This revision is now accepted and ready to land.Apr 24 2017, 10:58 AM

While I'm still not 100% sure about the approach, at this point I'm okay with this going in.

One thing you can do to assuage my concerns quite a bit is specify the invariant.group feature as "experimental" (after discussing on llvm-dev). That way if we find issues with the design later on we can change it in backward incompatible ways without breaking users already using it for things other than devirtualization.

I think it is a good idea. I will do so

This revision was automatically updated to reflect the committed changes.
Prazek marked an inline comment as done.