Currently, isSafeToPromoteArgument only scans the function's entry block to find indices for loads which are safe to move from callee to the caller. In this patch, we extend this search to all the blocks which must be executed upon entry into the function.
Diff Detail
Event Timeline
Hi all, most likely this is not the final version of the patch, but I am posting this initial version to get some suggestions/feedback. Soon I will try to collect some statistics to see how much impact on performance this change has.
Using post-dominator information is not safe. Take this code:
static int foo(int c, int *a) { if (c) { while (1) {} } else { my_personal_exit(); // < no return } return *a; // < post dominator is dead }
What you can use is the must-be-executed context as it was introduced for this reason, e.g.,:
MustBeExecutedContextExplorer Explorer(...); auto EIt = Explorer.begin(&EntryBB.front()), EEnd = Explorer.end(&EntryBB.front()); do { Instruction *ExecutedI = EIt.getCurrentInst(); // extract properties from ExecutedI. } while (++EIt != EEnd);
FWIW, I'm strongly in favor of removing ArgumentPromotion as a pass, the first step towards this goal is D68852, I just haven't had the time to rebase and merge it.
I want this for various reasons, including the bugs in ArgumentPromotion, which are fixable of course. (See the bugzilla PR42852, PR887, PR42683). Generally speaking,
a lot of things happening in this pass are, or should be, part of other analyses/transformations.
If you would be interested to make the pointer privatization in the Attributor stronger, I'd be happy to help with that.
@jdoerfert I see, thanks for the example - I did not think about that. I will try the "MustBeExecutedContextExplorer".
@jdoerfert I updated the patch as you suggested. Also, I added your example as a test.
This should fix PR887: https://bugs.llvm.org/show_bug.cgi?id=887
Please verify that and include the test case from there. We should mention it in the commit message and close the bug if it works.
This should also fix PR42039: https://bugs.llvm.org/show_bug.cgi?id=42039
Please verify that and include the test case from there. We should mention it in the commit message and close the bug if it works.
I don't see the example.
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp | ||
---|---|---|
637 | I think this is correct but it can be expensive. Since this runs by default in O3 you might want to verify the compile time impact and potentially put the construction under a flag. |
My patch does not make any difference. It looks like 887 is related to pass ordering: the load can be hoisted up into the entry block, but hoisting only runs after argpromotion
This should also fix PR42039: https://bugs.llvm.org/show_bug.cgi?id=42039
Please verify that and include the test case from there. We should mention it in the commit message and close the bug if it works.
With my patch argpromotion does not happen, so I'll add this to the test cases and mention it in commit message.
I don't see the example.
It's at the very end of control-flow3.ll:
; CHECK-LABEL: define internal i32 @callee2(i32* %P) {
define internal i32 @callee2(i32* %P) {
entry:
br label %bb1
bb1:
%gep0 = getelementptr i32, i32* %P, i64 0 ; CHECK: %X = load i32, i32* %gep0 %X = load i32, i32* %gep0 br label %bb1
bb2:
%gep1 = getelementptr i32, i32* %P, i64 1 ; CHECK: %Y = load i32, i32* %gep1 %Y = load i32, i32* %gep1 ret i32 %X
}
define i32 @caller2() {
%A = alloca i32 store i32 17, i32* %A ; CHECK: %X = call i32 @callee2(i32* %A) %X = call i32 @callee2(i32* %A) ret i32 %X
}
Right. Sorry. We are still missing DD65593. Anyway, all this logic should not be here in the first place so never mind ;)
This should also fix PR42039: https://bugs.llvm.org/show_bug.cgi?id=42039
Please verify that and include the test case from there. We should mention it in the commit message and close the bug if it works.With my patch argpromotion does not happen, so I'll add this to the test cases and mention it in commit message.
Yes, thanks.
I don't see the example.
It's at the very end of control-flow3.ll:
; CHECK-LABEL: define internal i32 @callee2(i32* %P) {
define internal i32 @callee2(i32* %P) {
entry:br label %bb1bb1:
%gep0 = getelementptr i32, i32* %P, i64 0 ; CHECK: %X = load i32, i32* %gep0 %X = load i32, i32* %gep0 br label %bb1bb2:
%gep1 = getelementptr i32, i32* %P, i64 1 ; CHECK: %Y = load i32, i32* %gep1 %Y = load i32, i32* %gep1 ret i32 %X}
define i32 @caller2() {
%A = alloca i32 store i32 17, i32* %A ; CHECK: %X = call i32 @callee2(i32* %A) %X = call i32 @callee2(i32* %A) ret i32 %X}
Isn't that an endless loop without side-effects? We should avoid test like these. If you run it through the attributor (https://godbolt.org/z/xTPcmT) you see why this is not too meaningful.
I think this is correct but it can be expensive. Since this runs by default in O3 you might want to verify the compile time impact and potentially put the construction under a flag.