This is an archive of the discontinued LLVM Phabricator instance.

[ArgPromotion] Extend search for SafeToUnconditionallyLoad indices to the blocks that must be executed upon entry into the function.
Needs ReviewPublic

Authored by mgudim on Jan 7 2020, 6:29 PM.

Details

Summary

Currently, isSafeToPromoteArgument only scans the function's entry block to find indices for loads which are safe to move from callee to the caller. In this patch, we extend this search to all the blocks which must be executed upon entry into the function.

Diff Detail

Event Timeline

mgudim created this revision.Jan 7 2020, 6:29 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 7 2020, 6:29 PM
mgudim added a comment.Jan 7 2020, 6:37 PM

Hi all, most likely this is not the final version of the patch, but I am posting this initial version to get some suggestions/feedback. Soon I will try to collect some statistics to see how much impact on performance this change has.

jdoerfert requested changes to this revision.Jan 9 2020, 8:24 AM
jdoerfert added a reviewer: jdoerfert.
jdoerfert added a subscriber: jdoerfert.

Using post-dominator information is not safe. Take this code:

static int foo(int c, int *a) {
  if (c) {
    while (1) {}
  } else {
    my_personal_exit();    // < no return
  }
  return *a; // < post dominator is dead
}

What you can use is the must-be-executed context as it was introduced for this reason, e.g.,:

MustBeExecutedContextExplorer Explorer(...);
auto EIt = Explorer.begin(&EntryBB.front()), EEnd = Explorer.end(&EntryBB.front());
do {
  Instruction *ExecutedI = EIt.getCurrentInst();
  // extract properties from ExecutedI.
} while (++EIt != EEnd);

FWIW, I'm strongly in favor of removing ArgumentPromotion as a pass, the first step towards this goal is D68852, I just haven't had the time to rebase and merge it.
I want this for various reasons, including the bugs in ArgumentPromotion, which are fixable of course. (See the bugzilla PR42852, PR887, PR42683). Generally speaking,
a lot of things happening in this pass are, or should be, part of other analyses/transformations.

If you would be interested to make the pointer privatization in the Attributor stronger, I'd be happy to help with that.

@jdoerfert I see, thanks for the example - I did not think about that. I will try the "MustBeExecutedContextExplorer".

mgudim updated this revision to Diff 238560.Jan 16 2020, 11:55 AM
mgudim retitled this revision from [ArgPromotion] Extend search for SafeToUnconditionallyLoad indices to all post-dominators of the entry block. to [ArgPromotion] Extend search for SafeToUnconditionallyLoad indices to the blocks that must be executed upon entry into the function..
mgudim edited the summary of this revision. (Show Details)

@jdoerfert I updated the patch as you suggested. Also, I added your example as a test.

This should fix PR887: https://bugs.llvm.org/show_bug.cgi?id=887
Please verify that and include the test case from there. We should mention it in the commit message and close the bug if it works.


This should also fix PR42039: https://bugs.llvm.org/show_bug.cgi?id=42039
Please verify that and include the test case from there. We should mention it in the commit message and close the bug if it works.

@jdoerfert I updated the patch as you suggested. Also, I added your example as a test.

I don't see the example.

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
637

I think this is correct but it can be expensive. Since this runs by default in O3 you might want to verify the compile time impact and potentially put the construction under a flag.

mgudim updated this revision to Diff 238581.Jan 16 2020, 12:45 PM

updated the comment.

This should fix PR887: https://bugs.llvm.org/show_bug.cgi?id=887
Please verify that and include the test case from there. We should mention it in the commit message and close the bug if it works.

My patch does not make any difference. It looks like 887 is related to pass ordering: the load can be hoisted up into the entry block, but hoisting only runs after argpromotion


This should also fix PR42039: https://bugs.llvm.org/show_bug.cgi?id=42039
Please verify that and include the test case from there. We should mention it in the commit message and close the bug if it works.

With my patch argpromotion does not happen, so I'll add this to the test cases and mention it in commit message.

@jdoerfert I updated the patch as you suggested. Also, I added your example as a test.

I don't see the example.

It's at the very end of control-flow3.ll:
; CHECK-LABEL: define internal i32 @callee2(i32* %P) {
define internal i32 @callee2(i32* %P) {
entry:

br label %bb1

bb1:

%gep0 = getelementptr i32, i32* %P, i64 0
; CHECK: %X = load i32, i32* %gep0
%X = load i32, i32* %gep0
br label %bb1

bb2:

%gep1 = getelementptr i32, i32* %P, i64 1
; CHECK: %Y = load i32, i32* %gep1
%Y = load i32, i32* %gep1
ret i32 %X

}

define i32 @caller2() {

%A = alloca i32
store i32 17, i32* %A
; CHECK: %X = call i32 @callee2(i32* %A)
%X = call i32 @callee2(i32* %A)
ret i32 %X

}

This should fix PR887: https://bugs.llvm.org/show_bug.cgi?id=887
Please verify that and include the test case from there. We should mention it in the commit message and close the bug if it works.

My patch does not make any difference. It looks like 887 is related to pass ordering: the load can be hoisted up into the entry block, but hoisting only runs after argpromotion

Right. Sorry. We are still missing DD65593. Anyway, all this logic should not be here in the first place so never mind ;)


This should also fix PR42039: https://bugs.llvm.org/show_bug.cgi?id=42039
Please verify that and include the test case from there. We should mention it in the commit message and close the bug if it works.

With my patch argpromotion does not happen, so I'll add this to the test cases and mention it in commit message.

Yes, thanks.

@jdoerfert I updated the patch as you suggested. Also, I added your example as a test.

I don't see the example.

It's at the very end of control-flow3.ll:
; CHECK-LABEL: define internal i32 @callee2(i32* %P) {
define internal i32 @callee2(i32* %P) {
entry:

br label %bb1

bb1:

%gep0 = getelementptr i32, i32* %P, i64 0
; CHECK: %X = load i32, i32* %gep0
%X = load i32, i32* %gep0
br label %bb1

bb2:

%gep1 = getelementptr i32, i32* %P, i64 1
; CHECK: %Y = load i32, i32* %gep1
%Y = load i32, i32* %gep1
ret i32 %X

}

define i32 @caller2() {

%A = alloca i32
store i32 17, i32* %A
; CHECK: %X = call i32 @callee2(i32* %A)
%X = call i32 @callee2(i32* %A)
ret i32 %X

}

Isn't that an endless loop without side-effects? We should avoid test like these. If you run it through the attributor (https://godbolt.org/z/xTPcmT) you see why this is not too meaningful.