This is an archive of the discontinued LLVM Phabricator instance.

[GlobalOpt] Perform store->dominated load forwarding for stored once globals
ClosedPublic

Authored by aeubanks on Jun 18 2022, 4:17 PM.

Details

Summary

The initial land incorrectly optimized forwarding non-Constants in non-nosync/norecurse functions. Bail on non-Constants since norecurse should cause global -> alloca promotion anyway.

The initial land also incorrectly assumed that StoredOnceStore was the only store to the global, but it actually means that only one value other than the global initializer is stored. Add a check that there's only one store.

Compile time tracker:
https://llvm-compile-time-tracker.com/compare.php?from=c80b88ee29f34078d2149de94e27600093e6c7c0&to=ef2c2b7772424b6861a75e794f3c31b45167304a&stat=instructions

Diff Detail

Event Timeline

aeubanks created this revision.Jun 18 2022, 4:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 18 2022, 4:17 PM
aeubanks requested review of this revision.Jun 18 2022, 4:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 18 2022, 4:17 PM
aeubanks planned changes to this revision.Jun 18 2022, 7:49 PM
aeubanks edited the summary of this revision. (Show Details)Jun 18 2022, 10:52 PM
aeubanks added a reviewer: asbirlea.
nikic accepted this revision.Jun 19 2022, 1:49 AM

LGTM

I noticed that currently GlobalOpt invalidates all analyses if it makes any change. Can't we preserve at least all the CFG analyses, such as the domtree computed here?

llvm/lib/Transforms/IPO/GlobalOpt.cpp
1447

getValueOperand

This revision is now accepted and ready to land.Jun 19 2022, 1:49 AM
This revision was landed with ongoing or failed builds.Jun 19 2022, 10:27 AM
This revision was automatically updated to reflect the committed changes.
jdoerfert added a subscriber: jdoerfert.EditedJun 21 2022, 6:28 AM

This breaks in the presence of synchronization.

static int X = 42;

int foo(int Arg) {
  sync();
  X = Arg;
  sync();
  return X;
}

Two threads call foo with two values of Arg and the sync will ensure they proceed "one after another":

    X = 42;
sync
T1: X = Arg1;
sync
T2: X = Arg2;
sync
T1, T2: return Arg2

Whereas replacing the load will result

T1, T2, return Arg{1,2}

Right? If foo is nosync this should work though.

jdoerfert reopened this revision.Jun 21 2022, 6:29 AM
This revision is now accepted and ready to land.Jun 21 2022, 6:29 AM
aeubanks updated this revision to Diff 438749.Jun 21 2022, 10:25 AM

bail on forwarding non-Constants in non-nosync functions

aeubanks edited the summary of this revision. (Show Details)Jun 21 2022, 10:25 AM
jdoerfert accepted this revision.Jun 21 2022, 11:58 AM

LG, I think :)

nikic added inline comments.Jun 21 2022, 12:05 PM
llvm/lib/Transforms/IPO/GlobalOpt.cpp
1452

Possible complication here might be storing the address of a thread-local, which is Constant but not necessarily the same between two invocations. (Though that would be solved by https://reviews.llvm.org/D125291.)

jdoerfert added inline comments.Jun 21 2022, 1:26 PM
llvm/lib/Transforms/IPO/GlobalOpt.cpp
1452

Good point. Constant::isThreadDependent. Though. I don't know why we require constant in the first place. What is the motivation here?

aeubanks updated this revision to Diff 438827.Jun 21 2022, 2:03 PM
aeubanks edited the summary of this revision. (Show Details)

treat thread local constants as non-constants

I think I misread it earlier. Constant & !thread_dependent *or* nosync. This makes more sense.

aeubanks updated this revision to Diff 438853.Jun 21 2022, 3:33 PM

remove check for thread_local global. GlobalStatus::analyzeGlobal() actually already bails if it sees a store of a thread_local variable
keep thread_local tests just to make sure

asbirlea accepted this revision.Jun 21 2022, 6:13 PM

Latest diff looks good to me.

It seems to me following test showed unexpected result

target datalayout = "e-m:e-i64:64-p:64:64-n32:64-S128-v256:256:256-v512:512:512"
target triple = "powerpc64le-unknown-linux-gnu"

@lwvar1 = internal global i32 0, align 4

define dso_local signext i32 @main() {
plmix5_entry:
  %T_4 = alloca i32, align 4
  %"#87" = alloca i32, align 4
  store i32 1, i32* @lwvar1, align 4
  store i32 0, i32* @lwvar1, align 4
  %_val_lwvar1_212 = load i32, i32* @lwvar1, align 4
  store i32 %_val_lwvar1_212, i32* %"#87", align 4
  unreachable
}

Tried with D128128.id438853.diff: opt -globalopt -S < %s

; ModuleID = '<stdin>'
source_filename = "<stdin>"
target datalayout = "e-m:e-i64:64-p:64:64-n32:64-S128-v256:256:256-v512:512:512"
target triple = "powerpc64le-unknown-linux-gnu"

define dso_local signext i32 @main() local_unnamed_addr {
plmix5_entry:
  %T_4 = alloca i32, align 4
  %"#87" = alloca i32, align 4
  store i32 1, i32* %"#87", align 4
  unreachable
}

^^^ so much for stored once

Reduced TC that fail with -O3 -flto:

int counter = 0;

struct S {
        int m;
        S(int i=0) : m(i) { }
        S(S&) { counter++; }
        S& operator=(S&) { counter++; return *this; }
} s;

int main(void)
{
        try {   
                try {   
                        throw s;
                }
                catch (...) {
                        counter = 0;
                        throw;
                }
        }
        catch (...) {
        }

        if (counter != 0) {
                return 1;
        }

        return  0;
}
aeubanks planned changes to this revision.Jun 22 2022, 9:49 AM

turns out GlobalStatus::StoredOnce means "one store that's not the initializer but potentially other stores of the initializer"
will fix

aeubanks edited the summary of this revision. (Show Details)Jun 22 2022, 1:54 PM
aeubanks edited the summary of this revision. (Show Details)
This revision is now accepted and ready to land.Jun 22 2022, 1:57 PM

A quick try on B171416: Diff 439149 shows one case still fails. Please allow me some time to look into the details and reduce the error if there is any. I will update later.

A quick try on B171416: Diff 439149 shows one case still fails. Please allow me some time to look into the details and reduce the error if there is any. I will update later.

This time, I saw error on recursive function.

target datalayout = "e-m:e-i64:64-p:64:64-n32:64-S128-v256:256:256-v512:512:512"
target triple = "powerpc64le-unknown-linux-gnu"

@j = internal unnamed_addr global i32 0, align 4

; Function Attrs: nobuiltin nofree noinline nosync
define dso_local signext i32 @recursive_sub(i32* noalias nocapture dereferenceable(4) %.i) local_unnamed_addr #4 {
recursive_sub_entry:
  %_val_i_ = load i32, i32* %.i, align 4
  %_add_tmp = sub i32 6, %_val_i_
  store i32 %_add_tmp, i32* @j, align 4
  %_add_tmp8 = add nsw i32 %_val_i_, 1
  store i32 %_add_tmp8, i32* %.i, align 4
  %0 = tail call i32 @recursive_sub(i32* nonnull %.i)
  %_val_ptr_ = load i32, i32* @j, align 4
  %_add_tmp11 = add nsw i32 %_val_ptr_, %0
  ret i32 1
}

attributes #4 = { nobuiltin nofree noinline nosync }

opt -globalopt -S < %s

; ModuleID = '<stdin>'
source_filename = "<stdin>"
target datalayout = "e-m:e-i64:64-p:64:64-n32:64-S128-v256:256:256-v512:512:512"
target triple = "powerpc64le-unknown-linux-gnu"

; Function Attrs: nobuiltin nofree noinline nosync
define dso_local signext i32 @recursive_sub(i32* noalias nocapture dereferenceable(4) %.i) local_unnamed_addr #0 {
recursive_sub_entry:
  %_val_i_ = load i32, i32* %.i, align 4
  %_add_tmp = sub i32 6, %_val_i_
  %_add_tmp8 = add nsw i32 %_val_i_, 1
  store i32 %_add_tmp8, i32* %.i, align 4
  %0 = tail call i32 @recursive_sub(i32* nonnull %.i)
  %_add_tmp11 = add nsw i32 %_add_tmp, %0
  ret i32 1
}

attributes #0 = { nobuiltin nofree noinline nosync }
nikic added a comment.Jun 23 2022, 5:26 AM

Yeah, the non-constant case needs both nosync and norecurse.

Yeah, the non-constant case needs both nosync and norecurse.

Attributor calls this "dynamically unique" and I could have thought of this before.

aeubanks edited the summary of this revision. (Show Details)Jun 23 2022, 9:51 AM
aeubanks updated this revision to Diff 439442.Jun 23 2022, 9:52 AM

bail on non Constants, if we need norecurse then at that point global -> alloca promotion should kick in anyway

nikic accepted this revision.Jun 24 2022, 6:13 AM

LG, fingers crossed.

This revision was landed with ongoing or failed builds.Jun 24 2022, 9:09 AM
This revision was automatically updated to reflect the committed changes.