This is an archive of the discontinued LLVM Phabricator instance.


Differential D154309

[LV] Do not add load to group if it moves across conflicting store.
ClosedPublic

Authored by fhahn on Jul 2 2023, 2:45 PM.

Details

Reviewers

Ayal
gilr
anna

Commits

rG4d847bf4d065: [LV] Do not add load to group if it moves across conflicting store.

Summary

This patch prevents invalid load groups from being formed, where a load
needs to be moved across a conflicting store.

Once we hit a store that conflicts with a load that already belongs to
an interleave group, we need to stop adding earlier loads to that group,
as doing so would force hoisting the group's existing loads across the
conflicting store.

To detect such cases, add a new CompletedLoadGroups set, which is used
to keep track of load groups to which no earlier loads can be added.

Fixes https://github.com/llvm/llvm-project/issues/63602
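The summary above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the code in VectorUtils.cpp: `Access`, `Addr`, `Family`, `canReorder` and `formGroups` are hypothetical stand-ins, "same `Addr`" stands in for a real dependence check, and store-group release is left out.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for a strided access; not an LLVM type.
struct Access {
  bool IsStore;
  int Addr;   // location touched per iteration; equal Addr means "conflicts"
  int Family; // accesses with the same Family could share an interleave group
};

// Stand-in for canReorderMemAccessesForInterleavedGroups(A, B), where A
// precedes B: an earlier load can always be reordered (WAR is allowed), an
// earlier store cannot if it touches the same location.
bool canReorder(const Access &A, const Access &B) {
  return !A.IsStore || A.Addr != B.Addr;
}

// Bottom-up grouping over accesses in program order. Returns the group id of
// each access. TrackCompleted toggles the CompletedLoadGroups check.
std::vector<int> formGroups(const std::vector<Access> &Accs,
                            bool TrackCompleted) {
  const int N = static_cast<int>(Accs.size());
  std::vector<int> GroupOf(N, -1);
  std::set<int> CompletedLoadGroups;
  int NextId = 0;
  for (int B = N - 1; B >= 0; --B) {
    if (GroupOf[B] == -1)
      GroupOf[B] = NextId++; // B starts a new group.
    else if (TrackCompleted && !Accs[B].IsStore &&
             CompletedLoadGroups.count(GroupOf[B]))
      continue; // Skip B: no earlier loads may join its group.
    for (int A = B - 1; A >= 0; --A) {
      if (!canReorder(Accs[A], Accs[B])) {
        if (!Accs[B].IsStore)
          CompletedLoadGroups.insert(GroupOf[B]);
        break;
      }
      // Only ungrouped accesses of the same kind and family may join.
      if (GroupOf[A] == -1 && Accs[A].IsStore == Accs[B].IsStore &&
          Accs[A].Family == Accs[B].Family)
        GroupOf[A] = GroupOf[B];
    }
  }
  return GroupOf;
}
```

With the accesses l1, store, l2, l3 in program order, where the store and l3 touch the same location, the model without the completed set puts l1 into l3's group when l2 is visited, which would hoist l3 above the store. With the set, l2 is skipped and l1 ends up in a separate group.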

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn created this revision. Jul 2 2023, 2:45 PM

Herald added a project: Restricted Project. Jul 2 2023, 2:45 PM

Herald added subscribers: artagnon, StephenFan, hiraditya.

fhahn requested review of this revision. Jul 2 2023, 2:45 PM

Harbormaster completed remote builds in B242711: Diff 536631. Jul 2 2023, 3:28 PM

Include test changes

Harbormaster completed remote builds in B242886: Diff 536895. Jul 3 2023, 4:45 PM

LGTM with comment. Thanks!

llvm/lib/Analysis/VectorUtils.cpp
1150	Can we move this above the insertion to LoadGroups since this `Group` should already be present in LoadGroups?

    // Skip B if no new instructions can be added to its load group.
    if (CompletedLoadGroups.contains(Group))
      continue;
    LoadGroups.insert(Group);

This revision is now accepted and ready to land. Jul 5 2023, 11:46 AM

Herald added a subscriber: wangpc. Jul 5 2023, 11:46 AM

fhahn marked an inline comment as done. Jul 7 2023, 3:06 AM

fhahn added inline comments.

llvm/lib/Analysis/VectorUtils.cpp
1150	Thanks, adjusted in the committed version.

Closed by commit rG4d847bf4d065: [LV] Do not add load to group if it moves across conflicting store. (authored by fhahn). Jul 7 2023, 3:07 AM

This revision was automatically updated to reflect the committed changes.

fhahn marked an inline comment as done.

fhahn added a commit: rG4d847bf4d065: [LV] Do not add load to group if it moves across conflicting store.

This raises some thoughts (post-commit), discussed with @gilr, including the need for a more thorough fix.

llvm/lib/Analysis/VectorUtils.cpp
1090	Augment the above explanation to address CompletedLoadGroups?
1141	Can hoist it further - a newly created group is surely not completed.
1175	Sketch of one option to fix insertion of Group into CompletedLoadGroups whenever/as-soon-as needed.
1188	Note that it may suffice to take `A` out of its StoreGroup, along with all other members that precede `B`, but not necessarily dismantle `StoreGroup` completely. Worth adding some tests. The case of a store obstructing a store-group seems symmetric to the case of a store obstructing a load-group, if store-groups are collected top-down (separately from continuing to collect load-groups bottom-up), and may then be better formed along with an analogous `CompletedStoreGroups`. WDYT?
1191	"(along with all other loads in B's interleave group)" Unfortunately, earlier loads may already have been added to B's interleave group at this point. Those could either be filtered out now or prevented from insertion earlier. One way to perform the latter is sketched above.
1194–1195	We already got the interleave group of B, aka `Group`, and better rename it `GroupB` (along with renaming above `StoreGroup` to `GroupA`)?
1207	"we can't add additional instructions to B's group" - i.e., should mark B's group as completed.
llvm/test/Transforms/LoopVectorize/X86/interleaved-accesses-hoist-load-across-store.ll
105–106	Swapping these two loads circumvents the current `CompletedLoadGroups`, and deserves a separate test case. This is because only the load which creates an interleaved group (the one appearing last in program order) is compared with obstructing stores.

anna added inline comments. Jul 13 2023, 5:40 PM

llvm/test/Transforms/LoopVectorize/X86/interleaved-accesses-hoist-load-across-store.ll
105–106	Ayal, I had added this test locally along with the more complete fix you suggested for checking all loads in the interleave load group. I didn't see any change in the output with the complete fix versus what we have currently in the patch (checking only the single load which would be `%l2` in this case that doesn't obstruct the store). I'll place what I have for review. (motivation is another miscompile that looks related to interleaving and I was hoping this more complete fix handles it. Doesn't though. Will add a reproducer upstream).

Ayal added inline comments. Jul 14 2023, 6:39 AM

llvm/lib/Analysis/VectorUtils.cpp

1175

Anna, here's a more concrete sketch, which also breaks out of the enclosing "for AI" loop as needed:

if (Group && isa<LoadInst>(B)) {
  uint32_t Index = 0, Factor = Group->getFactor();
  for (; Index < Factor; ++Index) {
    Instruction *MemberOfGroupB = Group->getMember(Index);
    if (MemberOfGroupB &&
        !canReorderMemAccessesForInterleavedGroups(
            &*AI, &*AccessStrideInfo.find(MemberOfGroupB)))
      break;
  }
  if (Index < Factor) {
    CompletedLoadGroups.insert(Group);
    break;
  }
}

Curious to learn if it helps.
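To see why checking every member of B's group matters for the swapped-load case, here is a small standalone model. It is a sketch under assumptions, not the LLVM code: `Access`, `Addr`, `Family`, `canReorder` and `formGroups` are hypothetical, and "same `Addr`" stands in for the real dependence check.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for a strided access; not an LLVM type.
struct Access {
  bool IsStore;
  int Addr;   // equal Addr means the two accesses conflict
  int Family; // accesses with the same Family could share a group
};

// A precedes B. An earlier load can always be reordered; an earlier store
// cannot be reordered with an access to the same location.
bool canReorder(const Access &A, const Access &B) {
  return !A.IsStore || A.Addr != B.Addr;
}

// Bottom-up grouping with a completed-load-group set. CheckAllMembers
// additionally compares A against every member already in B's load group.
std::vector<int> formGroups(const std::vector<Access> &Accs,
                            bool CheckAllMembers) {
  const int N = static_cast<int>(Accs.size());
  std::vector<int> GroupOf(N, -1);
  std::set<int> CompletedLoadGroups;
  int NextId = 0;
  for (int B = N - 1; B >= 0; --B) {
    if (GroupOf[B] == -1)
      GroupOf[B] = NextId++;
    else if (!Accs[B].IsStore && CompletedLoadGroups.count(GroupOf[B]))
      continue;
    for (int A = B - 1; A >= 0; --A) {
      bool Blocked = !canReorder(Accs[A], Accs[B]);
      if (CheckAllMembers && !Accs[B].IsStore)
        for (int M = A + 1; M < N && !Blocked; ++M)
          if (GroupOf[M] == GroupOf[B])
            Blocked = !canReorder(Accs[A], Accs[M]);
      if (Blocked) {
        if (!Accs[B].IsStore)
          CompletedLoadGroups.insert(GroupOf[B]);
        break; // Leave the "for A" loop, as in the sketch above.
      }
      if (GroupOf[A] == -1 && Accs[A].IsStore == Accs[B].IsStore &&
          Accs[A].Family == Accs[B].Family)
        GroupOf[A] = GroupOf[B];
    }
  }
  return GroupOf;
}
```

With the order l1, store, l3, l2, where the store and l3 touch the same location, comparing the store only against l2 (the load that created the group) lets l1 join the group. Comparing against all members stops at the store and keeps l1 out.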

anna added inline comments. Jul 14 2023, 8:11 AM

llvm/lib/Analysis/VectorUtils.cpp
1175	thanks Ayal! My fix had missed that we need to break out of the "for AI" loop (the original fix was automatically doing that when we check for `canReorderMemAccessesForInterleavedGroups` for single `BI` at line 1208). With this the modified testcase you suggested optimizes correctly.

anna mentioned this in rGdfaf4587e4ce: Precommit follow-up testcase for interleaved miscompile. Jul 14 2023, 1:05 PM

anna mentioned this in rG9675e3fa81e5: [LV] Address post-commit NFC comments in interleave. Jul 14 2023, 1:24 PM

anna mentioned this in rGa5573bf030e8: [LV] Precommit test for interleaving miscompile. Jul 17 2023, 2:25 PM

anna mentioned this in D155520: [LV] Complete load groups and release store groups in presence of dependency. Jul 17 2023, 3:10 PM

anna mentioned this in rGeaf6117f3388: [LV] Complete load groups and release store groups in presence of dependency. Jul 25 2023, 2:32 PM

anna mentioned this in rG3cf24dbbdde0: [LV] Complete load groups and release store groups. Try 2. Aug 8 2023, 3:10 PM

Revision Contents

Path	Size
llvm/lib/Analysis/VectorUtils.cpp	20 lines
llvm/test/Transforms/LoopVectorize/X86/interleaved-accesses-hoist-load-across-store.ll	46 lines

Diff 538056

llvm/lib/Analysis/VectorUtils.cpp

[1,081 lines not shown]

 //
 // E.g., for the WAW dependence:  A[i] = a;      // (1)
 //                                A[i] = b;      // (2)
 //                                A[i + 1] = c;  // (3)
 //
 // We will first create a store group with (3) and (2). (1) can't be added to
 // this group because it and (2) are dependent. However, (1) can be grouped
 // with other accesses that may precede it in program order. Note that a
 // bottom-up order does not imply that WAW dependences should not be checked.

Ayal (Not Done):
Augment the above explanation to address CompletedLoadGroups?

 void InterleavedAccessInfo::analyzeInterleaving(
     bool EnablePredicatedInterleavedMemAccesses) {
   LLVM_DEBUG(dbgs() << "LV: Analyzing interleaved accesses...\n");
   const auto &Strides = LAI->getSymbolicStrides();
   // Holds all accesses with a constant stride.
   MapVector<Instruction *, StrideDescriptor> AccessStrideInfo;
   collectConstStrideAccesses(AccessStrideInfo, Strides);
   if (AccessStrideInfo.empty())
     return;
   // Collect the dependences in the loop.
   collectDependences();
   // Holds all interleaved store groups temporarily.
   SmallSetVector<InterleaveGroup<Instruction> *, 4> StoreGroups;
   // Holds all interleaved load groups temporarily.
   SmallSetVector<InterleaveGroup<Instruction> *, 4> LoadGroups;
+  // Groups added to this set cannot have new members added.
+  SmallPtrSet<InterleaveGroup<Instruction> *, 4> CompletedLoadGroups;
   // Search in bottom-up program order for pairs of accesses (A and B) that can
   // form interleaved load or store groups. In the algorithm below, access A
   // precedes access B in program order. We initialize a group for B in the
   // outer loop of the algorithm, and then in the inner loop, we attempt to
   // insert each A into B's group if:
   //
   //  1. A and B have the same stride,

[13 lines not shown; context: for (auto BI = AccessStrideInfo.rbegin(), E = AccessStrideInfo.rend();]

     InterleaveGroup<Instruction> *Group = nullptr;
     if (isStrided(DesB.Stride) &&
         (!isPredicated(B->getParent()) || EnablePredicatedInterleavedMemAccesses)) {
       Group = getInterleaveGroup(B);
       if (!Group) {
         LLVM_DEBUG(dbgs() << "LV: Creating an interleave group with:" << *B
                           << '\n');
         Group = createInterleaveGroup(B, DesB.Stride, DesB.Alignment);
       }

Ayal (Not Done):
              Group = createInterleaveGroup(B, DesB.Stride, DesB.Alignment);
    -       }
    -       if (B->mayWriteToMemory())
    +       } else if (CompletedLoadGroups.contains(Group)) {
    +         // Skip B if no new instructions can be added to its load group.
    +         continue;
    +       }
    +       if (B->mayWriteToMemory())
Can hoist it further - a newly created group is surely not completed.

       if (B->mayWriteToMemory())
         StoreGroups.insert(Group);
-      else
+      else {
+        // Skip B if no new instructions can be added to its load group.
+        if (CompletedLoadGroups.contains(Group))
+          continue;
         LoadGroups.insert(Group);
+      }
     }

anna (Done):
Can we move this above the insertion to LoadGroups since this Group should already be present in LoadGroups?
    // Skip B if no new instructions can be added to its load group.
    if (CompletedLoadGroups.contains(Group))
      continue;
    LoadGroups.insert(Group);

fhahn (Author, Done):
Thanks, adjusted in the committed version.

     for (auto AI = std::next(BI); AI != E; ++AI) {
       Instruction *A = AI->first;
       StrideDescriptor DesA = AI->second;
       // Our code motion strategy implies that we can't have dependences
       // between accesses in an interleaved group and other accesses located
       // between the first and last member of the group. Note that this also
       // means that a group can't have more than one member at a given offset.
       // The accesses in a group can have dependences with other accesses, but
       // we must ensure we don't extend the boundaries of the group such that
       // we encompass those dependent accesses.
       //
       // For example, assume we have the sequence of accesses shown below in a
       // stride-2 loop:
       //
       // (1, 2) is a group | A[i]   = a;  // (1)
       //                   | A[i-1] = b;  // (2) |
       //                     A[i-3] = c;  // (3)
       //                     A[i]   = d;  // (4) | (2, 4) is not a group
       //
       // Because accesses (2) and (3) are dependent, we can group (2) with (1)
       // but not with (4). If we did, the dependent access (3) would be within
       // the boundaries of the (2, 4) group.
       if (!canReorderMemAccessesForInterleavedGroups(&*AI, &*BI)) {

Ayal (Not Done):
            // the boundaries of the (2, 4) group.
    +       if (GroupB && isa<LoadInst>(B) &&
    +           [any member MBI of GroupB !canReorderMemAccessesForInterleavedGroups(&*AI, MBI))
    +         CompletedLoadGroups.insert(GroupB);
            if (!canReorderMemAccessesForInterleavedGroups(&*AI, &*BI)) {
              // If a dependence exists and A is already in a group, we know that A
Sketch of one option to fix insertion of Group into CompletedLoadGroups whenever/as-soon-as needed.

Ayal (Not Done):
Anna, here's a more concrete sketch, which also breaks out of the enclosing "for AI" loop as needed:

if (Group && isa<LoadInst>(B)) {
  uint32_t Index = 0, Factor = Group->getFactor();
  for (; Index < Factor; ++Index) {
    Instruction *MemberOfGroupB = Group->getMember(Index);
    if (MemberOfGroupB &&
        !canReorderMemAccessesForInterleavedGroups(
            &*AI, &*AccessStrideInfo.find(MemberOfGroupB)))
      break;
  }
  if (Index < Factor) {
    CompletedLoadGroups.insert(Group);
    break;
  }
}

Curious to learn if it helps.

anna (Not Done):
thanks Ayal! My fix had missed that we need to break out of the "for AI" loop (the original fix was automatically doing that when we check for canReorderMemAccessesForInterleavedGroups for single BI at line 1208).
With this the modified testcase you suggested optimizes correctly.

         // If a dependence exists and A is already in a group, we know that A
         // must be a store since A precedes B and WAR dependences are allowed.
         // Thus, A would be sunk below B. We release A's group to prevent this
         // illegal code motion. A will then be free to form another group with
         // instructions that precede it.
         if (isInterleaved(A)) {
           InterleaveGroup<Instruction> *StoreGroup = getInterleaveGroup(A);
           LLVM_DEBUG(dbgs() << "LV: Invalidated store group due to "
                                "dependence between " << *A << " and "<< *B << '\n');
           StoreGroups.remove(StoreGroup);
           releaseGroup(StoreGroup);

Ayal (Not Done):
Note that it may suffice to take A out of its StoreGroup, along with all other members that precede B, but not necessarily dismantle StoreGroup completely. Worth adding some tests.
The case of a store obstructing a store-group seems symmetric to the case of a store obstructing a load-group, if store-groups are collected top-down (separately from continuing to collect load-groups bottom-up), and may then be better formed along with an analogous CompletedStoreGroups. WDYT?

         }
+        // If B is a load and part of an interleave group, no earlier loads can
+        // be added to B's interleave group, because this would mean the load B

Ayal (Not Done):
"(along with all other loads in B's interleave group)"
Unfortunately, earlier loads may already have been added to B's interleave group at this point. Those could either be filtered out now or prevented from insertion earlier. One way to perform the latter is sketched above.

+        // would need to be moved across store A. Mark the interleave group as
+        // complete.
+        if (isInterleaved(B) && isa<LoadInst>(B)) {
+          InterleaveGroup<Instruction> *LoadGroup = getInterleaveGroup(B);

Ayal (Not Done):
              // complete.
    -         if (isInterleaved(B) && isa<LoadInst>(B)) {
    -           InterleaveGroup<Instruction> *LoadGroup = getInterleaveGroup(B);
    +         if (GroupB && isa<LoadInst>(B)) {
                LLVM_DEBUG(dbgs() << "LV: Marking interleave group for " << *B
We already got the interleave group of B, aka Group, and better rename it GroupB (along with renaming above StoreGroup to GroupA)?

+          LLVM_DEBUG(dbgs() << "LV: Marking interleave group for " << *B
+                            << " as complete.\n");
+          CompletedLoadGroups.insert(LoadGroup);
+        }
         // If a dependence exists and A is not already in a group (or it was
         // and we just released it), B might be hoisted above A (if B is a
         // load) or another store might be sunk below A (if B is a store). In
         // either case, we can't add additional instructions to B's group. B
         // will only form a group with instructions that it precedes.

Ayal (Not Done):
              // either case, we can't add additional instructions to B's group. B
    -         // will only form a group with instructions that it precedes.
    +         // will only form a group with instructions that A precedes.
              break;
"we can't add additional instructions to B's group" - i.e., should mark B's group as completed.

         break;
       }
       // At this point, we've checked for illegal code motion. If either A or B
       // isn't strided, there's nothing left to do.
       if (!isStrided(DesA.Stride) || !isStrided(DesB.Stride))
         continue;

[283 lines not shown]

llvm/test/Transforms/LoopVectorize/X86/interleaved-accesses-hoist-load-across-store.ll

[22 lines not shown]

 ; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[OFFSET_IDX]], 3
 ; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[OFFSET_IDX]], 6
 ; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[OFFSET_IDX]], 9
 ; CHECK-NEXT: [[TMP5:%.*]] = mul i64 [[INDEX]], 3
 ; CHECK-NEXT: [[OFFSET_IDX2:%.*]] = add i64 1, [[TMP5]]
 ; CHECK-NEXT: [[TMP6:%.*]] = add i64 [[OFFSET_IDX2]], 0
 ; CHECK-NEXT: [[TMP7:%.*]] = add nuw nsw i64 [[TMP6]], 4
 ; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, ptr [[ARR]], i64 [[TMP7]]
-; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, ptr [[TMP8]], i32 -2
+; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, ptr [[TMP8]], i32 0
 ; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <12 x i32>, ptr [[TMP9]], align 4
 ; CHECK-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <12 x i32> [[WIDE_VEC]], <12 x i32> poison, <4 x i32> <i32 0, i32 3, i32 6, i32 9>
-; CHECK-NEXT: [[STRIDED_VEC3:%.*]] = shufflevector <12 x i32> [[WIDE_VEC]], <12 x i32> poison, <4 x i32> <i32 1, i32 4, i32 7, i32 10>
-; CHECK-NEXT: [[STRIDED_VEC4:%.*]] = shufflevector <12 x i32> [[WIDE_VEC]], <12 x i32> poison, <4 x i32> <i32 2, i32 5, i32 8, i32 11>
 ; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, ptr [[ARR]], i64 [[TMP1]]
 ; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, ptr [[ARR]], i64 [[TMP2]]
 ; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, ptr [[ARR]], i64 [[TMP3]]
 ; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, ptr [[ARR]], i64 [[TMP4]]
-; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x i32> [[STRIDED_VEC4]], i32 0
+; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x i32> [[STRIDED_VEC]], i32 0
 ; CHECK-NEXT: store i32 [[TMP14]], ptr [[TMP10]], align 4
-; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x i32> [[STRIDED_VEC4]], i32 1
+; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x i32> [[STRIDED_VEC]], i32 1
 ; CHECK-NEXT: store i32 [[TMP15]], ptr [[TMP11]], align 4
-; CHECK-NEXT: [[TMP16:%.*]] = extractelement <4 x i32> [[STRIDED_VEC4]], i32 2
+; CHECK-NEXT: [[TMP16:%.*]] = extractelement <4 x i32> [[STRIDED_VEC]], i32 2
 ; CHECK-NEXT: store i32 [[TMP16]], ptr [[TMP12]], align 4
-; CHECK-NEXT: [[TMP17:%.*]] = extractelement <4 x i32> [[STRIDED_VEC4]], i32 3
+; CHECK-NEXT: [[TMP17:%.*]] = extractelement <4 x i32> [[STRIDED_VEC]], i32 3
 ; CHECK-NEXT: store i32 [[TMP17]], ptr [[TMP13]], align 4
-; CHECK-NEXT: [[TMP18:%.*]] = add <4 x i32> [[STRIDED_VEC3]], [[STRIDED_VEC]]
-; CHECK-NEXT: [[TMP19:%.*]] = extractelement <4 x i32> [[TMP18]], i32 0
-; CHECK-NEXT: store i32 [[TMP19]], ptr [[TMP10]], align 4
-; CHECK-NEXT: [[TMP20:%.*]] = extractelement <4 x i32> [[TMP18]], i32 1
-; CHECK-NEXT: store i32 [[TMP20]], ptr [[TMP11]], align 4
-; CHECK-NEXT: [[TMP21:%.*]] = extractelement <4 x i32> [[TMP18]], i32 2
-; CHECK-NEXT: store i32 [[TMP21]], ptr [[TMP12]], align 4
-; CHECK-NEXT: [[TMP22:%.*]] = extractelement <4 x i32> [[TMP18]], i32 3
-; CHECK-NEXT: store i32 [[TMP22]], ptr [[TMP13]], align 4
+; CHECK-NEXT: [[TMP18:%.*]] = add nuw nsw i64 [[TMP6]], 2
+; CHECK-NEXT: [[TMP19:%.*]] = getelementptr inbounds i32, ptr [[ARR]], i64 [[TMP18]]
+; CHECK-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, ptr [[TMP19]], i32 0
+; CHECK-NEXT: [[WIDE_VEC3:%.*]] = load <12 x i32>, ptr [[TMP20]], align 4
+; CHECK-NEXT: [[STRIDED_VEC4:%.*]] = shufflevector <12 x i32> [[WIDE_VEC3]], <12 x i32> poison, <4 x i32> <i32 0, i32 3, i32 6, i32 9>
+; CHECK-NEXT: [[STRIDED_VEC5:%.*]] = shufflevector <12 x i32> [[WIDE_VEC3]], <12 x i32> poison, <4 x i32> <i32 1, i32 4, i32 7, i32 10>
+; CHECK-NEXT: [[TMP21:%.*]] = add <4 x i32> [[STRIDED_VEC5]], [[STRIDED_VEC4]]
+; CHECK-NEXT: [[TMP22:%.*]] = extractelement <4 x i32> [[TMP21]], i32 0
+; CHECK-NEXT: store i32 [[TMP22]], ptr [[TMP10]], align 4
+; CHECK-NEXT: [[TMP23:%.*]] = extractelement <4 x i32> [[TMP21]], i32 1
+; CHECK-NEXT: store i32 [[TMP23]], ptr [[TMP11]], align 4
+; CHECK-NEXT: [[TMP24:%.*]] = extractelement <4 x i32> [[TMP21]], i32 2
+; CHECK-NEXT: store i32 [[TMP24]], ptr [[TMP12]], align 4
+; CHECK-NEXT: [[TMP25:%.*]] = extractelement <4 x i32> [[TMP21]], i32 3
+; CHECK-NEXT: store i32 [[TMP25]], ptr [[TMP13]], align 4
 ; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
-; CHECK-NEXT: [[TMP23:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16
-; CHECK-NEXT: br i1 [[TMP23]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK-NEXT: [[TMP26:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16
+; CHECK-NEXT: br i1 [[TMP26]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK: middle.block:
-; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 17, 16
-; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK-NEXT: br label [[SCALAR_PH]]
 ; CHECK: scalar.ph:
 ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 49, [[MIDDLE_BLOCK]] ], [ 1, [[ENTRY:%.*]] ]
 ; CHECK-NEXT: [[BC_RESUME_VAL1:%.*]] = phi i64 [ 52, [[MIDDLE_BLOCK]] ], [ 4, [[ENTRY]] ]
 ; CHECK-NEXT: br label [[LOOP:%.*]]
 ; CHECK: loop:
 ; CHECK-NEXT: [[IV_1:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_1_NEXT:%.*]], [[LOOP]] ]
 ; CHECK-NEXT: [[IV_2:%.*]] = phi i64 [ [[BC_RESUME_VAL1]], [[SCALAR_PH]] ], [ [[IV_2_NEXT:%.*]], [[LOOP]] ]
 ; CHECK-NEXT: [[IV_1_NEXT]] = add nuw nsw i64 [[IV_1]], 3
 ; CHECK-NEXT: [[IV_1_PLUS_4:%.*]] = add nuw nsw i64 [[IV_1]], 4
 ; CHECK-NEXT: [[GEP_IV_1_PLUS_4:%.*]] = getelementptr inbounds i32, ptr [[ARR]], i64 [[IV_1_PLUS_4]]
 ; CHECK-NEXT: [[L1:%.*]] = load i32, ptr [[GEP_IV_1_PLUS_4]], align 4
 ; CHECK-NEXT: [[GEP_IV_2:%.*]] = getelementptr inbounds i32, ptr [[ARR]], i64 [[IV_2]]
 ; CHECK-NEXT: store i32 [[L1]], ptr [[GEP_IV_2]], align 4
 ; CHECK-NEXT: [[IV_1_PLUS_2:%.*]] = add nuw nsw i64 [[IV_1]], 2
 ; CHECK-NEXT: [[GEP_IV_1_PLUS_2:%.*]] = getelementptr inbounds i32, ptr [[ARR]], i64 [[IV_1_PLUS_2]]
 ; CHECK-NEXT: [[L2:%.*]] = load i32, ptr [[GEP_IV_1_PLUS_2]], align 4
 ; CHECK-NEXT: [[L3:%.*]] = load i32, ptr [[GEP_IV_2]], align 4
 ; CHECK-NEXT: [[ADD:%.*]] = add i32 [[L3]], [[L2]]
 ; CHECK-NEXT: store i32 [[ADD]], ptr [[GEP_IV_2]], align 4
 ; CHECK-NEXT: [[IV_2_NEXT]] = add nuw nsw i64 [[IV_2]], 3
 ; CHECK-NEXT: [[ICMP:%.*]] = icmp ugt i64 [[IV_2]], 50
-; CHECK-NEXT: br i1 [[ICMP]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NEXT: br i1 [[ICMP]], label [[EXIT:%.*]], label [[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
 ; CHECK: exit:
 ; CHECK-NEXT: ret void
 ;
 entry:
   br label %loop
 loop:
   %iv.1 = phi i64 [ 1, %entry ], [ %iv.1.next, %loop ]
   %iv.2 = phi i64 [ 4, %entry ], [ %iv.2.next, %loop ]
   %iv.1.next = add nuw nsw i64 %iv.1, 3
   %iv.1.plus.4 = add nuw nsw i64 %iv.1, 4
   %gep.iv.1.plus.4 = getelementptr inbounds i32, ptr %arr, i64 %iv.1.plus.4
   %l1 = load i32, ptr %gep.iv.1.plus.4
   %gep.iv.2 = getelementptr inbounds i32, ptr %arr, i64 %iv.2
   store i32 %l1, ptr %gep.iv.2
   %iv.1.plus.2 = add nuw nsw i64 %iv.1, 2
   %gep.iv.1.plus.2= getelementptr inbounds i32, ptr %arr, i64 %iv.1.plus.2
   %l2 = load i32, ptr %gep.iv.1.plus.2
   %l3 = load i32, ptr %gep.iv.2

Ayal (Not Done):
        %gep.iv.1.plus.2= getelementptr inbounds i32, ptr %arr, i64 %iv.1.plus.2
    -   %l2 = load i32, ptr %gep.iv.1.plus.2
        %l3 = load i32, ptr %gep.iv.2
    +   %l2 = load i32, ptr %gep.iv.1.plus.2
        %add = add i32 %l3 , %l2
Swapping these two loads circumvents the current CompletedLoadGroups, and deserves a separate test case.
This is because only the load which creates an interleaved group (the one appearing last in program order) is compared with obstructing stores.

anna (Not Done):
Ayal, I had added this test locally along with the more complete fix you suggested for checking all loads in the interleave load group. I didn't see any change in the output with the complete fix versus what we have currently in the patch (checking only the single load which would be %l2 in this case that doesn't obstruct the store).
I'll place what I have for review.
(motivation is another miscompile that looks related to interleaving and I was hoping this more complete fix handles it. Doesn't though. Will add a reproducer upstream).

   %add = add i32 %l3 , %l2
   store i32 %add, ptr %gep.iv.2
   %iv.2.next = add nuw nsw i64 %iv.2, 3
   %icmp = icmp ugt i64 %iv.2, 50
   br i1 %icmp, label %exit, label %loop
 exit:
   ret void
 }
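
For readers less used to IR, the scalar loop in this test can be written out in C++ to show what goes wrong when %l3 is grouped with %l1. This is a rough scalar sketch under assumptions, not vectorizer output: the function names, the 54-element array and the input values are made up for illustration, and the "hoisted" variant only models %l3 being read before the first store.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// 54 elements cover the highest index read by the loop (iv.1 + 4 == 53).
using Arr = std::array<std::int32_t, 54>;

// Hypothetical input: A[i] == i.
Arr makeInput() {
  Arr A{};
  for (std::size_t I = 0; I < A.size(); ++I)
    A[I] = static_cast<std::int32_t>(I);
  return A;
}

// The loop from the test, executed in program order.
Arr runInProgramOrder(Arr A) {
  for (std::size_t Iv1 = 1, Iv2 = 4;; Iv1 += 3, Iv2 += 3) {
    std::int32_t L1 = A[Iv1 + 4];
    A[Iv2] = L1;                  // store i32 %l1, ptr %gep.iv.2
    std::int32_t L2 = A[Iv1 + 2];
    std::int32_t L3 = A[Iv2];     // reads the value just stored
    A[Iv2] = L3 + L2;
    if (Iv2 > 50)                 // icmp ugt i64 %iv.2, 50
      break;
  }
  return A;
}

// Same loop, but %l3 is read next to %l1, before the store it depends on.
Arr runWithHoistedL3(Arr A) {
  for (std::size_t Iv1 = 1, Iv2 = 4;; Iv1 += 3, Iv2 += 3) {
    std::int32_t L1 = A[Iv1 + 4];
    std::int32_t L3 = A[Iv2];     // stale: the store below has not run yet
    A[Iv2] = L1;
    std::int32_t L2 = A[Iv1 + 2];
    A[Iv2] = L3 + L2;
    if (Iv2 > 50)
      break;
  }
  return A;
}
```

With this input, `runInProgramOrder` leaves 8 in element 4 (5 + 3), while `runWithHoistedL3` leaves 7 (4 + 3), which matches the old CHECK lines adding the stale `STRIDED_VEC3` lane of the single wide load.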
