Page MenuHomePhabricator

[LV] Avoid adding into interleaved group in presence of WAW dependency
Needs ReviewPublic

Authored by anna on Jan 24 2019, 12:29 PM.

Details

Summary

Fix for miscompile when we ignore the store ordering when adding stores
into the interleaved store group.
Fix for PR40291.
Test case shows we now add stores into the interleaved store group only
if we prove there is no WAW dependency.
This is a cleaned up fix suggested by @hsaito.

Diff Detail

Event Timeline

anna created this revision.Jan 24 2019, 12:29 PM
fhahn added a comment.Jan 29 2019, 1:18 PM

I plan on having a look later this week. I am a little worried that the checks in-line here are already quite complex and I would like to have a think if that could be improved in some way.

dorit added a subscriber: dorit.Jan 31 2019, 11:19 PM

I plan on having a look later this week. I am a little worried that the checks in-line here are already quite complex and I would like to have a think if that could be improved in some way.

I agree; The algorithm makes sure that we visit everything between B and A, including C, before we visit A; so we have a chance to identify the (potentially) interfering store C before we reach A; This is what allows the algorithm to only compare the pairs (A,B) without having each time to also scan everything in between.

So I think the bug is that when we visited C, and found that it could be inserted into B's group dependence-wise, but wasn't inserted due to other reasons, we should have either:

  1. Invalidated the group (which is over aggressive but better than wrong code)
  2. Recorded in B's Group the index where C could be inserted, to "burn" that index from allowing some other instruction A to become a group member at that index; so when we reach A we see its spot is taken. (I think this will have the same effect as the proposed patch but without the extra scan.)
  3. Same as above but instead of bailing out on grouping A with B, make sure that C is also sunk down with that group (as I think Hideki mentioned in the PR) (maybe a future improvement).
Herald added a project: Restricted Project. · View Herald TranscriptJan 31 2019, 11:19 PM

I plan on having a look later this week. I am a little worried that the checks in-line here are already quite complex and I would like to have a think if that could be improved in some way.

I agree; The algorithm makes sure that we visit everything between B and A, including C, before we visit A; so we have a chance to identify the (potentially) interfering store C before we reach A; This is what allows the algorithm to only compare the pairs (A,B) without having each time to also scan everything in between.

So I think the bug is that when we visited C, and found that it could be inserted into B's group dependence-wise, but wasn't inserted due to other reasons, we should have either:

  1. Invalidated the group (which is over aggressive but better than wrong code)
  2. Recorded in B's Group the index where C could be inserted, to "burn" that index from allowing some other instruction A to become a group member at that index; so when we reach A we see its spot is taken. (I think this will have the same effect as the proposed patch but without the extra scan.)
  3. Same as above but instead of bailing out on grouping A with B, make sure that C is also sunk down with that group (as I think Hideki mentioned in the PR) (maybe a future improvement).

If you don't like the current approach, I agree 2) achieves the same thing, with extra bookkeeping (or extra state in the existing bookkeeping). I think 1) is too conservative. Even if C is next to B, B's group can still be extended to the other direction. 3) should be done separately from the bug fix. Anyway, do we ever deal with so many loads/stores for this efficiency to avoid extra scanning to actually matter? I'm just curious.

fhahn added a comment.Feb 1 2019, 5:55 AM

! In D57180#1380108, @hsaito wrote:
Anyway, do we ever deal with so many loads/stores for this efficiency to avoid extra scanning to actually matter? I'm just curious.

I'd say usually the number would be quite small and there would be no significant compile time problem. But it is not impossible to hit a worst case scenario, given the wide range of frontends and source code that gets thrown at LLVM. Also, compile time is a key metric for some users and unnecessarily wasting a bit here and there leads to a death by a thousand paper cuts :)

fhahn added a comment.Feb 1 2019, 5:57 AM

(I am saying that after recently investigating a few cases of hour-long compile times on user code that were caused by unnecessary scanning)

Ayal added a comment.Feb 1 2019, 8:04 AM

I plan on having a look later this week. I am a little worried that the checks in-line here are already quite complex and I would like to have a think if that could be improved in some way.

I agree; The algorithm makes sure that we visit everything between B and A, including C, before we visit A; so we have a chance to identify the (potentially) interfering store C before we reach A; This is what allows the algorithm to only compare the pairs (A,B) without having each time to also scan everything in between.

So I think the bug is that when we visited C, and found that it could be inserted into B's group dependence-wise, but wasn't inserted due to other reasons, we should have either:

  1. Invalidated the group (which is over aggressive but better than wrong code)
  2. Recorded in B's Group the index where C could be inserted, to "burn" that index from allowing some other instruction A to become a group member at that index; so when we reach A we see its spot is taken. (I think this will have the same effect as the proposed patch but without the extra scan.)
  3. Same as above but instead of bailing out on grouping A with B, make sure that C is also sunk down with that group (as I think Hideki mentioned in the PR) (maybe a future improvement).

If you don't like the current approach, I agree 2) achieves the same thing, with extra bookkeeping (or extra state in the existing bookkeeping). I think 1) is too conservative. Even if C is next to B, B's group can still be extended to the other direction. 3) should be done separately from the bug fix. Anyway, do we ever deal with so many loads/stores for this efficiency to avoid extra scanning to actually matter? I'm just curious.

Rather than over aggressive or too conservative, 1) seems to match the current behavior which forbids store groups with gaps; extending in the other direction will also break the vector WAW dependence, right? 2) could potentially "burn" the index with minimal extra bookkeeping or state by inserting a nullptr in its place; in any case, it's worth doing only when/after introducing support for store groups with gaps.

hsaito added a comment.Feb 1 2019, 9:45 AM

(I am saying that after recently investigating a few cases of hour-long compile times on user code that were caused by unnecessary scanning)

Fair enough. If the size based threshold is not in place in the interleaved access optimization, that would be a good defensive improvement. Can be separate from this bug fix, though.

hsaito added a comment.Feb 1 2019, 9:48 AM

Rather than over aggressive or too conservative, 1) seems to match the current behavior which forbids store groups with gaps; extending in the other direction will also break the vector WAW dependence, right?

OK. Hitting the gap case, yes. Else, the other direction should also hit WAW dep.

in any case, it's worth doing only when/after introducing support for store groups with gaps.

Agree.

Ayal added a comment.Feb 1 2019, 1:14 PM

Also, regarding

  1. Same as above but instead of bailing out on grouping A with B, make sure that C is also sunk down with that group (as I think Hideki mentioned in the PR) (maybe a future improvement).

consider eliminating such WAW dependencies by sinking the stores and folding them into one, producing a single interleave group (by a future, separate patch).

(The issue here is somewhat reminiscent of the WAW dependence caused by multiple invariant stores, as in https://reviews.llvm.org/D54538, but there the dependence is both inside and across loop iterations.)