This is an archive of the discontinued LLVM Phabricator instance.

[LV] Do not add load to group if it moves across conflicting store.
ClosedPublic

Authored by fhahn on Jul 2 2023, 2:45 PM.

Details

Summary

This patch prevents invalid load groups from being formed, where a load
needs to be moved across a conflicting store.

Once we hit a store that conflicts with a load with an existing
interleave group, we need to stop adding earlier loads to the group, as
this would force hoisting the previous stores in the group across the
conflicting load.

To detect such cases, add a new CompletedLoadGroups set, which is used
to keep track of load groups to which no earlier loads can be added.

Fixes https://github.com/llvm/llvm-project/issues/63602

Diff Detail

Event Timeline

fhahn created this revision.Jul 2 2023, 2:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 2 2023, 2:45 PM
fhahn requested review of this revision.Jul 2 2023, 2:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 2 2023, 2:45 PM
fhahn updated this revision to Diff 536895.Jul 3 2023, 2:13 PM

Include test changes

anna accepted this revision.Jul 5 2023, 11:46 AM

LGTM with comment. Thanks!

llvm/lib/Analysis/VectorUtils.cpp
1150

Can we move this above the insertion to LoadGroups since this Group should already be present in LoadGroups?

// Skip B if no new instructions can be added to its load group.
 if (CompletedLoadGroups.contains(Group))
      continue;
LoadGroups.insert(Group)
This revision is now accepted and ready to land.Jul 5 2023, 11:46 AM
fhahn marked an inline comment as done.Jul 7 2023, 3:06 AM
fhahn added inline comments.
llvm/lib/Analysis/VectorUtils.cpp
1150

Thanks, adjusted in the committed version.

This revision was automatically updated to reflect the committed changes.
fhahn marked an inline comment as done.
Ayal added a comment.Jul 9 2023, 8:31 AM

This raises some thoughts (post-commit), discussed with @gilr, including the need for a more thorough fix.

llvm/lib/Analysis/VectorUtils.cpp
1090

Augment the above explanation to address CompletedLoadGroups?

1141

Can hoist it further - a newly created group is surely not completed.

1175

Sketch of one option to fix insertion of Group into CompletedLoadGroups whenever/as-soon-as needed.

1188

Note that it may suffice to take A out of its StoreGroup, along with all other members that precede B, but not necessarily dismantle StoreGroup completely. Worth adding some tests.

The case of a store obstructing a store-group seems symmetric to the case of a store obstructing a load-group, if store-groups are collected top-down (separately from continuing to collect load-groups bottom-up), and may then be better formed along with an analogous CompletedStoreGroups. WDYT?

1191

"(along with all other loads in B's interleave group)"

Unfortunately, earlier loads may already have been added to B's interleave group at this point. Those could either be filtered out now or prevented from insertion earlier. One way to perform the latter is sketched above.

1194–1195

We already got the interleave group of B, aka Group, and better rename it GroupB (along with renaming above StoreGroup to GroupA)?

1207

"we can't add additional instructions to B's group" - i.e., should mark B's group as completed.

llvm/test/Transforms/LoopVectorize/X86/interleaved-accesses-hoist-load-across-store.ll
105–106

Swapping these two loads circumvents the current CompletedLoadGroups, and deserves a separate test case.

This is because only the load which creates an interleaved group (the one appearing last in program order) is compared with obstructing stores.

anna added inline comments.Jul 13 2023, 5:40 PM
llvm/test/Transforms/LoopVectorize/X86/interleaved-accesses-hoist-load-across-store.ll
105–106

Ayal, I had added this test locally along with the more complete fix you suggested for checking all loads in the interleave load group. I didn't see any change in the output with the complete fix versus what we have currently in the patch (checking only the single load which would be %l2 in this case that doesn't obstruct the store).
I'll place what I have for review.

(motivation is another miscompile that looks related to interleaving and I was hoping this more complete fix handles it. Doesn't though. Will add a reproducer upstream).

Ayal added inline comments.Jul 14 2023, 6:39 AM
llvm/lib/Analysis/VectorUtils.cpp
1175

Anna, here's a more concrete sketch, which also breaks out of the enclosing "for AI" loop as needed:

if (Group && isa<LoadInst>(B)) {
  uint32_t Index = 0, Factor = Group->getFactor();
  for (; Index < Factor; ++Index) {
    Instruction *MemberOfGroupB = Group->getMember(Index);
    if (MemberOfGroupB &&
        !canReorderMemAccessesForInterleavedGroups(
            &*AI, &*AccessStrideInfo.find(MemberOfGroupB)))
      break;
  }
  if (Index < Factor) {
    CompletedLoadGroups.insert(Group);
    break;
  }
}

Curious to learn if it helps.

anna added inline comments.Jul 14 2023, 8:11 AM
llvm/lib/Analysis/VectorUtils.cpp
1175

thanks Ayal! My fix had missed that we need to break out of the "for AI" loop (the original fix was automatically doing that when we check for canReorderMemAccessesForInterleavedGroups for single BI at line 1208).
With this the modified testcase you suggested optimizes correctly.