Page MenuHomePhabricator

Make llvm.launder.invariant.group writeonly
Needs RevisionPublic

Authored by aeubanks on Sep 9 2021, 3:20 PM.

Details

Summary

-fstrict-vtable-pointers causes many functions to no longer be
considered writeonly. This is because basic-aa thinks launder may read
from the pointer operand.

From the LangRef on writeonly:
If a writeonly function reads memory visible to the program, or has
other side-effects, the behavior is undefined. If a function reads from
a writeonly pointer argument, the behavior is undefined.

launder fits this criteria.

Diff Detail

Unit TestsFailed

TimeTest
700 msx64 debian > libomp.lock::omp_init_lock.c
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -I /var/lib/buildkite-agent/builds/llvm-project/openmp/runtime/test -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -I /var/lib/buildkite-agent/builds/llvm-project/openmp/runtime/test/ompt /var/lib/buildkite-agent/builds/llvm-project/openmp/runtime/test/lock/omp_init_lock.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/test/lock/Output/omp_init_lock.c.tmp -lm -latomic && /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/test/lock/Output/omp_init_lock.c.tmp

Event Timeline

aeubanks created this revision.Sep 9 2021, 3:20 PM
aeubanks requested review of this revision.Sep 9 2021, 3:20 PM
Herald added a reviewer: sstefan1. · View Herald Transcript
Herald added a reviewer: baziotis. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
nikic added a subscriber: nikic.Sep 10 2021, 2:01 AM

Do you have any examples of where this is relevant in practice?

The change looks sound to me, but should probably apply to all intrinsics that exist to model control dependence, i.e. assume, noalias.scope.decl, annotation etc.

Prazek accepted this revision.Sep 10 2021, 2:19 AM

This seems to be valid!
Can any func attr expert can comment on what sort of optimizations/reordering this enables? I am a bit worried we might be missing something.

This revision is now accepted and ready to land.Sep 10 2021, 2:19 AM

Do you have any examples of where this is relevant in practice?

The change looks sound to me, but should probably apply to all intrinsics that exist to model control dependence, i.e. assume, noalias.scope.decl, annotation etc.

+1, it is nice to propagate more function attributes, but I am not sure if it is going go give us much.

Prazek added inline comments.Sep 10 2021, 2:44 AM
llvm/include/llvm/IR/Intrinsics.td
1156

Could you expand the comment here with writeonly?
Optional: s/barrier/launder

Prazek requested changes to this revision.Sep 10 2021, 5:10 AM

Actually, as Krzysztof Pszeniczny pointed out offline, I think we can't mark it as writeonly, because it would let optimizer to optimize:

%a = launder(%0)
%b = launder(%0)

As it could assume that both launders store the same value. This could bite us in cases like (https://godbolt.org/z/YaGrrWosj):

void foo(A *a) {
    auto *b = new(a)B;
    use(b);
    auto *c = new(a)A;
}

Here we will have both launders use the same pointer argument (a), so call like c->foo() could be optimzied to b->foo() if two launders would be merged.

However, for somre reason LLVM does not do that if I mark the launder as writeonly - probably a missed optimization? (but perhaps not that important).
Does is make sense, or am I misuderstanding the writeonly?

This revision now requires changes to proceed.Sep 10 2021, 5:10 AM
nikic added a comment.Sep 10 2021, 6:06 AM

@Prazek writeonly only says that no memory is read, it does not determine which location is actually written. launder.invariant.groupdoes not read or write the pointer operands (it is inaccessiblememonly), it "writes" some inaccessible location to avoid code motion optimizations. To perform DSE, we need to know the location being written.

nikic added a comment.Sep 10 2021, 8:43 AM

Generally though, I think we should consider adding a new attribute to mark intrinsics like this, rather than using inaccessiblememonly. Ideally we don't want to mix these up with memory effects, and just want to say "don't remove or CSE this intrinsic" ("do not hoist" is already implied by lack of speculatable).

Generally though, I think we should consider adding a new attribute to mark intrinsics like this, rather than using inaccessiblememonly. Ideally we don't want to mix these up with memory effects, and just want to say "don't remove or CSE this intrinsic" ("do not hoist" is already implied by lack of speculatable).

I think we should revisit our attribute system for effects and make it flexible enough that "don't remove or cse this intrinsic" falls out as a special case which is not interacting with anything else.
We'll send an RFC today for global effects, after that I want to start a discussion on a flexible system.

@Prazek writeonly only says that no memory is read, it does not determine which location is actually written. launder.invariant.groupdoes not read or write the pointer operands (it is inaccessiblememonly), it "writes" some inaccessible location to avoid code motion optimizations. To perform DSE, we need to know the location being written.

If a writeonly function is called multiple times, can it can only write a constant to a constant location or the argument? To be able to write to different locations across calls or write a different value, it'd have to do read memory to be able to determine when to do something different right? e.g. a global counter to write a different on multiple invocations.
So could we prove that it does the exact same thing across multiple back-to-back invocations given the same argument? Or could we consider this intrinsic to not really read to memory but still not produce the same results upon back-to-back invocations with the same argument?

jdoerfert added a comment.EditedSep 10 2021, 2:21 PM

@Prazek writeonly only says that no memory is read, it does not determine which location is actually written. launder.invariant.groupdoes not read or write the pointer operands (it is inaccessiblememonly), it "writes" some inaccessible location to avoid code motion optimizations. To perform DSE, we need to know the location being written.

If a writeonly function is called multiple times, can it can only write a constant to a constant location or the argument? To be able to write to different locations across calls or write a different value, it'd have to do read memory to be able to determine when to do something different right? e.g. a global counter to write a different on multiple invocations.
So could we prove that it does the exact same thing across multiple back-to-back invocations given the same argument? Or could we consider this intrinsic to not really read to memory but still not produce the same results upon back-to-back invocations with the same argument?

"write a constant to a constant location" is only true if you have no arguments. Basically, constants + arguments is what you have to decide what and where to write. Same arguments + write only => same effect.

[EDIT] On second though. We might have intrinsics that make this more complicated as they allow you to get input without reading things. I'm not 100% sure but the "read thread ID" intrinsics on GPUs might fall into that category.

@Prazek writeonly only says that no memory is read, it does not determine which location is actually written. launder.invariant.groupdoes not read or write the pointer operands (it is inaccessiblememonly), it "writes" some inaccessible location to avoid code motion optimizations. To perform DSE, we need to know the location being written.

If a writeonly function is called multiple times, can it can only write a constant to a constant location or the argument? To be able to write to different locations across calls or write a different value, it'd have to do read memory to be able to determine when to do something different right? e.g. a global counter to write a different on multiple invocations.
So could we prove that it does the exact same thing across multiple back-to-back invocations given the same argument? Or could we consider this intrinsic to not really read to memory but still not produce the same results upon back-to-back invocations with the same argument?

"write a constant to a constant location" is only true if you have no arguments. Basically, constants + arguments is what you have to decide what and where to write. Same arguments + write only => same effect.

[EDIT] On second though. We might have intrinsics that make this more complicated as they allow you to get input without reading things. I'm not 100% sure but the "read thread ID" intrinsics on GPUs might fall into that category.

Basically can we optimize

%b = @llvm.writeonly(%a)
%c = @llvm.writeonly(%a)

to just

%b = @llvm.writeonly(%a)

and RAUW %c with %b

@Prazek writeonly only says that no memory is read, it does not determine which location is actually written. launder.invariant.groupdoes not read or write the pointer operands (it is inaccessiblememonly), it "writes" some inaccessible location to avoid code motion optimizations. To perform DSE, we need to know the location being written.

If a writeonly function is called multiple times, can it can only write a constant to a constant location or the argument? To be able to write to different locations across calls or write a different value, it'd have to do read memory to be able to determine when to do something different right? e.g. a global counter to write a different on multiple invocations.
So could we prove that it does the exact same thing across multiple back-to-back invocations given the same argument? Or could we consider this intrinsic to not really read to memory but still not produce the same results upon back-to-back invocations with the same argument?

"write a constant to a constant location" is only true if you have no arguments. Basically, constants + arguments is what you have to decide what and where to write. Same arguments + write only => same effect.

[EDIT] On second though. We might have intrinsics that make this more complicated as they allow you to get input without reading things. I'm not 100% sure but the "read thread ID" intrinsics on GPUs might fall into that category.

Basically can we optimize

%b = @llvm.writeonly(%a)
%c = @llvm.writeonly(%a)

to just

%b = @llvm.writeonly(%a)

and RAUW %c with %b

I'd argue that we can't without further information -- the write may be volatile.

@Prazek writeonly only says that no memory is read, it does not determine which location is actually written. launder.invariant.groupdoes not read or write the pointer operands (it is inaccessiblememonly), it "writes" some inaccessible location to avoid code motion optimizations. To perform DSE, we need to know the location being written.

If a writeonly function is called multiple times, can it can only write a constant to a constant location or the argument? To be able to write to different locations across calls or write a different value, it'd have to do read memory to be able to determine when to do something different right? e.g. a global counter to write a different on multiple invocations.
So could we prove that it does the exact same thing across multiple back-to-back invocations given the same argument? Or could we consider this intrinsic to not really read to memory but still not produce the same results upon back-to-back invocations with the same argument?

"write a constant to a constant location" is only true if you have no arguments. Basically, constants + arguments is what you have to decide what and where to write. Same arguments + write only => same effect.

[EDIT] On second though. We might have intrinsics that make this more complicated as they allow you to get input without reading things. I'm not 100% sure but the "read thread ID" intrinsics on GPUs might fall into that category.

Basically can we optimize

%b = @llvm.writeonly(%a)
%c = @llvm.writeonly(%a)

to just

%b = @llvm.writeonly(%a)

and RAUW %c with %b

I'd argue that we can't without further information -- the write may be volatile.

If that is the case, then it seems that writeonly is useless (and probably should be specified that it can't do volatile writes).
For example, readonly says:
"A readonly function always returns the same value (or unwinds an exception identically) when called with the same set of arguments and global state. "
One could argue that readonly function could do volatile load, and thus can't be CSEd.

volatile reads are considered write effects, IIRC: https://godbolt.org/z/73Tqjj4rc

volatile reads are considered write effects, IIRC: https://godbolt.org/z/73Tqjj4rc

But a volatile store is considered to read and write?
https://godbolt.org/z/hjr7hrscP

This one shows that function doing volatile store is not marked with writeonly, which adds to my point that in my understanding, writeonly function with identical arguments will store the same values (which means it can be CSEd)