This is an archive of the discontinued LLVM Phabricator instance.

Allow value forwarding past release fences in EarlyCSE
ClosedPublic

Authored by reames on Jul 22 2015, 3:22 PM.

Details

Summary

A release fence acts as a publication barrier for stores within the current thread to become visible to other threads which might observe the release fence. It does not require the current thread to observe stores performed on other threads. As a result, we can allow store-load and load-store forwarding across a release fence.

We do need to make sure that stores before the fence can be eliminated even if there's an otherwise store to the same location after the fence. In theory, we could reorder the second store above the fence and *then* eliminate the former, but we can't do this if the stores are on opposite sides of the fence.

I have a similar change, D11436, for MemoryDependenceAnalysis, but I consider this change much lower risk than that one.

p.s. The LangRef indicates only atomic loads and stores are effected by fences. This patch chooses to be far more conservative then that. I'm not even really sure the LangRef's definition is helpful.

Diff Detail

Repository
rL LLVM

Event Timeline

reames updated this revision to Diff 30410.Jul 22 2015, 3:22 PM
reames retitled this revision from to Allow value forwarding past release fences in EarlyCSE.
reames updated this object.
reames added a subscriber: llvm-commits.
jfb added a reviewer: dvyukov.Jul 22 2015, 4:17 PM
jfb updated this object.
jfb updated this object.
jfb added inline comments.Jul 22 2015, 4:47 PM
lib/Transforms/Scalar/EarlyCSE.cpp
635 ↗(On Diff #30410)

I'm not sure I understand why this isn't firing when there's an intervening release fence (test4 below). I think it could: there was a race anyways, so screw the code and keep one of the store (storing the last value, of course)? Another thread is guaranteed to observe a store to that memory location, but they're racing with each other so the first one isn't guaranteed to ever be observed.

The second store is the more hilarious one to keep, but you could just store its value before the fence?

e.g.:

*loc = 42;
std::atomic_thread_fence(std::memory_order_release);
*loc = 0;

Becomes:

std::atomic_thread_fence(std::memory_order_release);
*loc = 0;

Or:

*loc = 0;
std::atomic_thread_fence(std::memory_order_release);

?

test/Transforms/EarlyCSE/fence.ll
43 ↗(On Diff #30410)

You can't forward %a and eliminate %a2, but you could eliminate %a, and replace it by %a2 if there are no intervening uses, no?

dvyukov added inline comments.Jul 23 2015, 6:30 AM
test/Transforms/EarlyCSE/fence.ll
43 ↗(On Diff #30410)

In this particular case we can do whatever we want with accesses to %addr.i. No other thread can access it without a race.

57 ↗(On Diff #30410)

Is it a full test? Or is it something that we don't handle yet?
I am asking because in this particular case we can do whatever we want with accesses to %addr. Since we store to %addr both before and after the fence, we can conclude that this fence is not used to synchronize accesses to %addr, so we can eliminate/combine/etc accesses to %addr.

reames added inline comments.Jul 23 2015, 3:47 PM
lib/Transforms/Scalar/EarlyCSE.cpp
635 ↗(On Diff #30410)

See my inline response for test4 below.

test/Transforms/EarlyCSE/fence.ll
43 ↗(On Diff #30410)

@jfb - Yes, but that's not what the current optimization tries to do. Nor does EarlyCSE have a mechanism to do that in general even for unordered loads.

@dvyukov - Yes, but we need to actually prove that and my change to EarlyCSE is no where near that aggressive. And frankly, I'm really not interested in exploiting the undefined behaviour part of this.

Do you have suggestions for how to restructure the test in a way which still tests EarlyCSE but avoids disallowing the optimization you pointed out? Would simply removing the first load from the check sequence be sufficient?

57 ↗(On Diff #30410)

In general, I don't believe it is legal to eliminate the first store. With all memory initialized to zero, consider these two threads:

T1:
b = 2;
a = 1;
release fence
b = 4;

T2:
while (a == 0) {}
if (b == 0) abort();

In this *particular case* I think you're reasoning is sound because we can tell the store to a doesn't exist.

(Just to be clear, your concern here is an unnecessarily restricted optimization, not a correctness concern right?)

Would splitting this test into two - one with a store to 'a' which is a negative test and one as is which is marked as a missed optimization - satisfy you?

dvyukov added inline comments.Jul 24 2015, 12:04 AM
test/Transforms/EarlyCSE/fence.ll
43 ↗(On Diff #30410)

I can't answer your question. I don't understand what EarlyCSE does, change description talks only about release fences but this test contains an acquire fence. I am trying to figure out what this change does by reading the tests. And the tests says that we cannot do something that we actually can do...

57 ↗(On Diff #30410)

I don't understand. Your example is a one large data race. Behavior is undefined.

jfb added inline comments.Jul 24 2015, 8:58 AM
test/Transforms/EarlyCSE/fence.ll
43 ↗(On Diff #30410)

I think the main problem Dmitry and I have is the comment: it says the transformation can't happen but it in fact is allowed to happen and simply ins't implemented here. You're testing the implementation knowing what it does, but your comment hints to future contributors that doing this optimization isn't valid.

I'm assuming that the test can't be restructured because it would require a loop, and that's something that the current implementation won't see through. You're testing a very minimal thing instead (an overly conservative on, since it's a race!), and ensuring that the current code can't optimize it.

reames added inline comments.Jul 24 2015, 10:43 AM
test/Transforms/EarlyCSE/fence.ll
43 ↗(On Diff #30410)

Guys, would simply adding a comment to the test explaining the alternate path by which this could be optimized satisfy you? The code in earlycse needs to be specific to release fences. I don't know of an easy way to exercise the code to form a negative test without something of similar form.

p.s. I get that there is some optimization which can recognize this as a data race. However, simple load-load forwarding as implemented in EarlyCSE is not it. Nor is likely to ever be so.

jfb added inline comments.Jul 24 2015, 10:48 AM
test/Transforms/EarlyCSE/fence.ll
43 ↗(On Diff #30410)

Yes.

reames updated this revision to Diff 33260.Aug 26 2015, 3:36 PM

Adjust test comments to indicate where the negative tests are testing implementation rather than the memory model per review comments.

jfb added inline comments.Aug 26 2015, 4:01 PM
lib/Transforms/Scalar/EarlyCSE.cpp
617 ↗(On Diff #33260)

Getting re-acquainted with the code: it doesn't seem like this change exercises a test?

test/Transforms/EarlyCSE/fence.ll
59 ↗(On Diff #33260)

principle

reames added inline comments.Aug 26 2015, 4:31 PM
lib/Transforms/Scalar/EarlyCSE.cpp
617 ↗(On Diff #33260)

Huh? Not sure what you're asking. The first two tests in the included file cover cases this triggers.

test/Transforms/EarlyCSE/fence.ll
59 ↗(On Diff #33260)

Will fix in next round.

jfb added inline comments.Aug 26 2015, 5:04 PM
lib/Transforms/Scalar/EarlyCSE.cpp
617 ↗(On Diff #33260)

True, they check that store/fence/load and store/fence/store are handled properly with release.

What I mean is: the code changes, but all of the tests are *exactly the same*. If code changes but tests remains the same, then what changed? :-)

reames added inline comments.Aug 26 2015, 5:28 PM
lib/Transforms/Scalar/EarlyCSE.cpp
617 ↗(On Diff #33260)

I'm still not clear what you're trying to ask. We remove a load from both test1 and test2 which is not removed without the patch. (Well, at the time I wrote it. Haven't checked today, but I don't see any obvious new additions.) Can you clarify?

jfb added inline comments.Aug 26 2015, 5:38 PM
lib/Transforms/Scalar/EarlyCSE.cpp
617 ↗(On Diff #33260)

Ah... The diff is wrong :-)
fence.ll is new, but the current phabricator diff shows it as modified. It doesn't show modifications to test1 and test2, so it looks like they the same as current tip-of-tree.

Could you fix the diff?

reames updated this revision to Diff 33290.Aug 26 2015, 6:06 PM

Diff should be fixed. Sorry for the confusion.

jfb accepted this revision.Aug 26 2015, 6:16 PM
jfb edited edge metadata.

OK now this lgtm! I'd forgotten that fence.ll was new, so the bad diff was *very* confusing!

This revision is now accepted and ready to land.Aug 26 2015, 6:16 PM
This revision was automatically updated to reflect the committed changes.