This is an archive of the discontinued LLVM Phabricator instance.

[LV] Do not add load to group if it moves across conflicting store.

Authored by fhahn on Jul 2 2023, 2:45 PM.



This patch prevents invalid load groups from being formed, where a load
needs to be moved across a conflicting store.

Once we hit a store that conflicts with a load with an existing
interleave group, we need to stop adding earlier loads to the group, as
this would force hoisting the previous stores in the group across the
conflicting load.

To detect such cases, add a new CompletedLoadGroups set, which is used
to keep track of load groups to which no earlier loads can be added.


Diff Detail

Event Timeline

fhahn created this revision.Jul 2 2023, 2:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 2 2023, 2:45 PM
fhahn requested review of this revision.Jul 2 2023, 2:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 2 2023, 2:45 PM
fhahn updated this revision to Diff 536895.Jul 3 2023, 2:13 PM

Include test changes

anna accepted this revision.Jul 5 2023, 11:46 AM

LGTM with comment. Thanks!


Can we move this above the insertion to LoadGroups since this Group should already be present in LoadGroups?

// Skip B if no new instructions can be added to its load group.
 if (CompletedLoadGroups.contains(Group))
This revision is now accepted and ready to land.Jul 5 2023, 11:46 AM
fhahn marked an inline comment as done.Jul 7 2023, 3:06 AM
fhahn added inline comments.

Thanks, adjusted in the committed version.

This revision was automatically updated to reflect the committed changes.
fhahn marked an inline comment as done.
Ayal added a comment.Jul 9 2023, 8:31 AM

This raises some thoughts (post-commit), discussed with @gilr, including the need for a more thorough fix.


Augment the above explanation to address CompletedLoadGroups?


Can hoist it further - a newly created group is surely not completed.


Sketch of one option to fix insertion of Group into CompletedLoadGroups whenever/as-soon-as needed.


Note that it may suffice to take A out of its StoreGroup, along with all other members that precede B, but not necessarily dismantle StoreGroup completely. Worth adding some tests.

The case of a store obstructing a store-group seems symmetric to the case of a store obstructing a load-group, if store-groups are collected top-down (separately from continuing to collect load-groups bottom-up), and may then be better formed along with an analogous CompletedStoreGroups. WDYT?


"(along with all other loads in B's interleave group)"

Unfortunately, earlier loads may already have been added to B's interleave group at this point. Those could either be filtered out now or prevented from insertion earlier. One way to perform the latter is sketched above.


We already got the interleave group of B, aka Group, and better rename it GroupB (along with renaming above StoreGroup to GroupA)?


"we can't add additional instructions to B's group" - i.e., should mark B's group as completed.


Swapping these two loads circumvents the current CompletedLoadGroups, and deserves a separate test case.

This is because only the load which creates an interleaved group (the one appearing last in program order) is compared with obstructing stores.

anna added inline comments.Jul 13 2023, 5:40 PM

Ayal, I had added this test locally along with the more complete fix you suggested for checking all loads in the interleave load group. I didn't see any change in the output with the complete fix versus what we have currently in the patch (checking only the single load which would be %l2 in this case that doesn't obstruct the store).
I'll place what I have for review.

(motivation is another miscompile that looks related to interleaving and I was hoping this more complete fix handles it. Doesn't though. Will add a reproducer upstream).

Ayal added inline comments.Jul 14 2023, 6:39 AM

Anna, here's a more concrete sketch, which also breaks out of the enclosing "for AI" loop as needed:

if (Group && isa<LoadInst>(B)) {
  uint32_t Index = 0, Factor = Group->getFactor();
  for (; Index < Factor; ++Index) {
    Instruction *MemberOfGroupB = Group->getMember(Index);
    if (MemberOfGroupB &&
            &*AI, &*AccessStrideInfo.find(MemberOfGroupB)))
  if (Index < Factor) {

Curious to learn if it helps.

anna added inline comments.Jul 14 2023, 8:11 AM

thanks Ayal! My fix had missed that we need to break out of the "for AI" loop (the original fix was automatically doing that when we check for canReorderMemAccessesForInterleavedGroups for single BI at line 1208).
With this the modified testcase you suggested optimizes correctly.