This is an archive of the discontinued LLVM Phabricator instance.

[SimpifyCFG] Speculate a store preceded by a local non-escaping load
ClosedPublic

Authored by chill on Aug 2 2021, 7:45 AM.

Details

Summary

In SimplifyCFG we may simplify the CFG by speculatively executing
certain stores, when they are preceded by a store to the same
location. This patch allows such speculation also when the stores are
similarly preceded by a load.

In order for this transformation to be correct we need to ensure that
the memory location is writable and the store in the new location does
not introduce a data race.

Local objects (created by an alloca instruction) are always
writable, so once we are past a read from a location it is valid to
also write to that same location.

Seeing just a load does not guarantee absence of a data race (unlike
if we see a store) - the load may still be part of a race, just not
causing undefined behaviour
(cf. https://llvm.org/docs/Atomics.html#optimization-outside-atomic).

In the original program, a data race might have been prevented by the
condition, but once we move the store outside the condition, we must
be sure a data race wasn't possible anyway, no matter what the
condition evaluates to.

One way to be sure that a local object is never concurrently
read/written is check that its address never escapes the function.

Hence this transformation is restricted to local, non-escaping
objects.

Diff Detail

Event Timeline

chill created this revision.Aug 2 2021, 7:45 AM
chill requested review of this revision.Aug 2 2021, 7:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 2 2021, 7:45 AM
nikic added a subscriber: nikic.Aug 2 2021, 7:54 AM
nikic added inline comments.
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
2196

Why does this not use PointerMayBeCaptured or similar?

chill added inline comments.Aug 2 2021, 8:08 AM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
2196

Thanks, I'll look at it.

chill updated this revision to Diff 363648.Aug 3 2021, 1:40 AM
chill marked an inline comment as done.

Please add a negative test that you believe showcases the problem the PointerMayBeCaptured() is solving.

On the profitability of this transform, it looks like this is now always profitable given the restrictions and limiations of isSafeToSpeculateStore? Its description of says:

/// Determine if we can hoist sink a sole store instruction out of a
/// conditional block.

and looking where this was checked I found:

++SpeculatedInstructions;
  if (SpeculatedInstructions > 1)

But perhaps it's good to check this and have another (negative) case for that.

Just checking if we haven't missed anything, did you ran benchmarks and didn't find any regressions related to this?

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
2232

Unrelated, but nice magic number here.

2261

The description of PointerMayBeCaptured says its an expensive function, but with magic number 9 that MaxNumInstToLookAt is set to that doesn't seem to be an issue.

chill added a comment.Aug 3 2021, 6:59 AM

Please add a negative test that you believe showcases the problem the PointerMayBeCaptured() is solving.

Will do.

On the profitability of this transform, it looks like this is now always profitable given the restrictions and limiations of isSafeToSpeculateStore? Its description of says:

/// Determine if we can hoist sink a sole store instruction out of a
/// conditional block.

and looking where this was checked I found:

++SpeculatedInstructions;
  if (SpeculatedInstructions > 1)

But perhaps it's good to check this and have another (negative) case for that.

OK.

Just checking if we haven't missed anything, did you ran benchmarks and didn't find any regressions related to this?

I've got these results (1 - Time_new/Time_old, higher is better):

500.perlbench_r-0.19%
502.gcc_r0.26%
505.mcf_r0.21%
520.omnetpp_r0.73%
523.xalancbmk_r0.29%
525.x264_r0.00%
531.deepsjeng_r0.00%
541.leela_r4.50%
557.xz_r-0.19%
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
2261

That and also the third parameter to PointerMayBeCaptured which limits the number of uses of the alloca to explore to no more than 20.

chill updated this revision to Diff 363757.Aug 3 2021, 8:44 AM

Added a couple of negative tests: for a store to an address that escapes and for more than one instruction in the block.

chill marked 2 inline comments as done.Aug 3 2021, 8:45 AM

Nice speedup for leela! And probably motivation case, right? I remember gcc’s:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89430

I'm afraid i'm not seeing why @load_before_store_escape must not be transformed.
Alive2 says it's fine: https://alive2.llvm.org/ce/z/F7ctum
I'm not seeing any syncronization primitives, so i'm not really seeing yet why we'd be introducing a race?

chill added a comment.Aug 3 2021, 10:15 AM

I'm afraid i'm not seeing why @load_before_store_escape must not be transformed.
Alive2 says it's fine: https://alive2.llvm.org/ce/z/F7ctum
I'm not seeing any syncronization primitives, so i'm not really seeing yet why we'd be introducing a race?

Once we've made the address of the local variable visible to other functions, we cannot discard the possibility that those other functions may spawn threads
that write to the variable. In the original program, a data race might have been prevented by the condition, but once we move the store outside the
condition, we must be sure a data race wasn't possible anyway, no matter what the condition evaluates to.

Seeing just a load does not guarantee absence of a data race (unlike if we see a store) - the load may still be a part of a race, just not being UB (cf. https://llvm.org/docs/Atomics.html#optimization-outside-atomic)

lebedev.ri accepted this revision.Aug 3 2021, 10:26 AM

Posted https://github.com/AliveToolkit/alive2/issues/735,
but i guess this LG to me. Unless there are other comments.

This revision is now accepted and ready to land.Aug 3 2021, 10:26 AM
nikic accepted this revision.Aug 3 2021, 12:29 PM

LG to me as well.

I'm afraid i'm not seeing why @load_before_store_escape must not be transformed.
Alive2 says it's fine: https://alive2.llvm.org/ce/z/F7ctum
I'm not seeing any syncronization primitives, so i'm not really seeing yet why we'd be introducing a race?

Once we've made the address of the local variable visible to other functions, we cannot discard the possibility that those other functions may spawn threads
that write to the variable. In the original program, a data race might have been prevented by the condition, but once we move the store outside the
condition, we must be sure a data race wasn't possible anyway, no matter what the condition evaluates to.

Seeing just a load does not guarantee absence of a data race (unlike if we see a store) - the load may still be a part of a race, just not being UB (cf. https://llvm.org/docs/Atomics.html#optimization-outside-atomic)

Forgot to say, please add something along those lines at least into the patch description.

chill updated this revision to Diff 364029.Aug 4 2021, 3:42 AM
chill edited the summary of this revision. (Show Details)

Updated description and comments.

chill added a comment.Aug 4 2021, 3:42 AM

Thanks for the comments and the reviews, much appreciated!

Looks like some boot failures on Android are bisecting to this change. See https://bugs.llvm.org/show_bug.cgi?id=52390 for more info.