This is an archive of the discontinued LLVM Phabricator instance.

Prevent X86IselLowering from merging volatile loads
ClosedPublic

Authored by niravd on Mar 29 2016, 6:07 AM.

Details

Summary

In EltsFromConsecutiveLoads we check to see if a vector of values coming
from a set of loads can be merged into a single large load, but did not
check that the loads are non-volatile.

This issue is exposed by the Merge Consecutive Store cleanups in D14834

Diff Detail

Repository
rL LLVM

Event Timeline

niravd updated this revision to Diff 51894.Mar 29 2016, 6:07 AM
niravd retitled this revision from to Prevent X86IselLowering from merging volatile loads.
niravd added a reviewer: RKSimon.
niravd updated this object.
niravd added a subscriber: llvm-commits.
RKSimon added a subscriber: spatel.Mar 29 2016, 2:05 PM
RKSimon edited edge metadata.Mar 29 2016, 2:12 PM

I'm not sure if this is necessary - we already check that we're not merging consecutive loads where any/all the loads are volatiles. DAG.isConsecutiveLoad handles these cases for us (and its why we already have tests for them) - we shouldn't need to check again in EltsFromConsecutiveLoads.

If you wish to add additional volatile load tests that'd be great; but please note, we auto-generate the merge-consecutive-loads-*.ll test files (something we're trying to encourage) - manually editting these files with entwined DAG checks is not easy to maintain.

I'm not sure if this is necessary - we already check that we're not merging consecutive loads where any/all the loads are volatiles. DAG.isConsecutiveLoad handles these cases for us (and its why we already have tests for them) - we shouldn't need to check again in EltsFromConsecutiveLoads.

This is currently not caught by the current tests, but if you apply D18062 and D14834 which clean up Merging of Consecutive Stores (and willl hopefully be commited soon). merge-consecutive-loads-128.ll will produce wrong code. Specifically in "merge_4f32_f32_2345_volatile", there are 4 consecutive loads with the first being volatile preventing merging them but are merged into a single volatile load (this happens in the legalize-types of SelectionDAGISel after dag-combine1).

It doesn't look like isConsecutiveLoad does any checking for volatiles though perhaps it should (though we should rename it to reflect that).

We should also probably dictate that the newLd is always non-volatile.

If you wish to add additional volatile load tests that'd be great; but please note, we auto-generate the merge-consecutive-loads-*.ll test files (something we're trying to encourage) - manually editting these files with entwined DAG checks is not easy to maintain.

Good point. I'll undo these test changes and rerun the generation for D14834 when behavior changes.

I'm not sure if this is necessary - we already check that we're not merging consecutive loads where any/all the loads are volatiles. DAG.isConsecutiveLoad handles these cases for us (and its why we already have tests for them) - we shouldn't need to check again in EltsFromConsecutiveLoads.

Except, it doesn't actually check that.

The only reason why "load volatile; load; load; load" is not being merged by that code already is the incoming Chain must match on the nodes for isConsecutiveLoad to return true. With the current code, the chains do not get rearranged to the proper shape.

With the new code, they do: all 4 loads can be determined to not alias, and thus they are all given the same incoming Chain (EntryToken). This is correct. Then, isConsecutiveLoad says yes.

spatel added a subscriber: hfinkel.Mar 30 2016, 8:42 AM

With the new code, they do: all 4 loads can be determined to not alias, and thus they are all given the same incoming Chain (EntryToken). This is correct. Then, isConsecutiveLoad says yes.

This is currently not caught by the current tests, but if you apply D18062 and D14834 which clean up Merging of Consecutive Stores (and willl hopefully be commited soon). merge-consecutive-loads-128.ll will produce wrong code. Specifically in "merge_4f32_f32_2345_volatile", there are 4 consecutive loads with the first being volatile preventing merging them but are merged into a single volatile load (this happens in the legalize-types of SelectionDAGISel after dag-combine1).

First, thanks to Nirav and James for working on this.

When I looked at the behavior of volatiles in relation to load merging, it looked to me like a volatile access was always guaranteed to have a different chain than a non-volatile access. Therefore, because SelectionDAG::isConsecutiveLoad() bails out when the chains are not equal, I thought we were safe. James's comment clears up that misconception.

But the test that you referenced raises a question about volatile semantics (at least to me since I don't have much experience here). Here's the test case for reference:

define <4 x float> @merge_4f32_f32_2345_volatile(float* %ptr) nounwind uwtable noinline ssp {
  %ptr0 = getelementptr inbounds float, float* %ptr, i64 2
  %ptr1 = getelementptr inbounds float, float* %ptr, i64 3
  %ptr2 = getelementptr inbounds float, float* %ptr, i64 4
  %ptr3 = getelementptr inbounds float, float* %ptr, i64 5
  %val0 = load volatile float, float* %ptr0                    <--- DANGER!
  %val1 = load float, float* %ptr1
  %val2 = load float, float* %ptr2
  %val3 = load float, float* %ptr3
  %res0 = insertelement <4 x float> undef, float %val0, i32 0
  %res1 = insertelement <4 x float> %res0, float %val1, i32 1
  %res2 = insertelement <4 x float> %res1, float %val2, i32 2
  %res3 = insertelement <4 x float> %res2, float %val3, i32 3
  ret <4 x float> %res3
}

And here's the resulting codegen after applying D18062 and D14834:

vmovss	8(%rdi), %xmm0       <--- this is the load volatile float specified by the IR
vmovups	8(%rdi), %xmm0       <--- this is a 16-byte merged load that includes the volatile region

We still have the volatile access specified by the IR. I'm guessing this is wrong though because we've accessed the volatile region a 2nd time. The LangRef seems vague to me about this:
http://llvm.org/docs/LangRef.html#volatile-memory-accesses

Assuming the extra load of the volatile region is wrong, I would prefer that the check for volatility goes in isConsecutiveLoad(). It could be triggered by an optional bool param. If we only check volatility in EltsFromConsecutiveLoads(), I'm scared that other targets may have already mutated that code and wouldn't get the fix.

And here's the resulting codegen after applying D18062 and D14834:

vmovss	8(%rdi), %xmm0       <--- this is the load volatile float specified by the IR
vmovups	8(%rdi), %xmm0       <--- this is a 16-byte merged load that includes the volatile region

We still have the volatile access specified by the IR. I'm guessing this is wrong though because we've accessed the volatile region a 2nd time. The LangRef seems vague to me about this:
http://llvm.org/docs/LangRef.html#volatile-memory-accesses

I think that code is wrong because the *value* is coming from the second load, not the first. The code says to do a 4-byte volatile read of the pointer, and so it should do exactly that, and use that value, not something else. The 4-byte load is only being preserved through the chain.

Of course, at least in some circumstances, it's okay to widen a normal load to cover memory that wasn't explicitly loaded (e.g. with undef in the middle of the vector). And obviously, if it was okay to widen a normal load *without* the volatile load being present, adding a volatile load in the middle wouldn't make that transform suddenly invalid. So, with a slightly different test case (e.g. if the second pointer was volatile), you could potentially validly generate a merged 16-byte load that overlapped the volatile region, a separate 4-byte volatile load, and then combine those to the result vector. But even so

Assuming the extra load of the volatile region is wrong, I would prefer that the check for volatility goes in isConsecutiveLoad(). It could be triggered by an optional bool param. If we only check volatility in EltsFromConsecutiveLoads(), I'm scared that other targets may have already mutated that code and wouldn't get the fix.

That makes sense to me. The only other caller in-tree of this function already checks isVolatile, anyways.

That makes sense to me. The only other caller in-tree of this function already checks isVolatile, anyways.

Thanks for checking that. Note that the PPC backend has this:

// Like SelectionDAG::isConsecutiveLoad, but also works for stores, and does
// not enforce equality of the chain operands.
static bool isConsecutiveLS(SDNode *N, LSBaseSDNode *Base,

...I didn't look closely, but it might have related problems.

Also, independent of volatile but maybe affected by the other patches-in-progress:
https://llvm.org/bugs/show_bug.cgi?id=10114

niravd updated this revision to Diff 52104.Mar 30 2016, 12:20 PM
niravd edited edge metadata.

Revert tests, move check to ConsecutiveLoad. Simplify callers

It looks like the bug (https://llvm.org/bugs/show_bug.cgi?id=10114) is easy to fix and I have a patch that appears to fix an equivalent problem in merge-consecutive-loads-128.ll with D14834 and D18062, but not this patch. However, the example in the bug doesn't exercise this anymore so I'll do it separately after I cons up.

niravd updated this revision to Diff 52105.Mar 30 2016, 12:26 PM

Apply clang-format

jyknight added inline comments.Mar 30 2016, 1:44 PM
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
6962 ↗(On Diff #52105)

The coding style changed somewhat recently to not have the name of the function repeated at the beginning of the doc. So, if you're changing a function with that, you can/should remove it.

6970 ↗(On Diff #52105)

Um, you mean to check if both are !isVolatile, not if their volatility is equal, right?

Is there any way to expose this bug in a test case before applying D14384?

include/llvm/CodeGen/SelectionDAG.h
1287 ↗(On Diff #52105)

Don't repeat the function name in the documentation comment.
"true loads" -> "true if loads"

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
6962–6965 ↗(On Diff #52105)

You can delete the whole comment block from the implementation file. The doxygen should be generated from the header file, and users should be using that as the interface.
http://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments

6970–6971 ↗(On Diff #52105)

Shouldn't this check if *either* load is volatile? Ie, we'll get past this check if both are volatile.

niravd updated this revision to Diff 52168.Mar 30 2016, 6:57 PM
niravd marked 5 inline comments as done.

remove embarassing bug and fix comments

RKSimon accepted this revision.Mar 31 2016, 3:06 AM
RKSimon edited edge metadata.

LGTM - couple of minors.

PPCISelLowering refers to SelectionDAG::isConsecutiveLoad in a comment, please can you update it so that it still makes sense?

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7257 ↗(On Diff #52168)

This is really horrid to grok - please can you pull out 'LD1VT.getSizeInBits() / 8' as a 'LD1Bytes' or something so we can have the call on one line?

This revision is now accepted and ready to land.Mar 31 2016, 3:06 AM
This revision was automatically updated to reflect the committed changes.