This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Initialize VectorizedValue when gathering
ClosedPublic

Authored by mssimpso on Aug 11 2016, 10:37 AM.

Details

Summary

We abort building vectorizable trees in some cases (e.g., if the maximum recursion depth is reached, if the region size is too large, etc.). If this happens for a reduction, we can be left with a root entry that needs to be gathered. For these cases, we need make sure we actually set VectorizedValue to the resulting vector.

This patch ensures we properly set VectorizedValue and it also ensures the insertelement sequence generated for the gathers is inserted at the correct location (after the last instruction in the bundle). Please let me know what you think.

Reference: https://llvm.org/bugs/show_bug.cgi?id=28330

Diff Detail

Event Timeline

mssimpso updated this revision to Diff 67704.Aug 11 2016, 10:37 AM
mssimpso retitled this revision from to [SLP] Initialize VectorizedValue when gathering.
mssimpso updated this object.
mssimpso added reviewers: nadav, mzolotukhin, mkuper.
mssimpso added subscribers: llvm-commits, hans, mcrosier.
mkuper added inline comments.Aug 15 2016, 2:51 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
2174

Just to make sure I understand this correctly - you're running from Leader, inclusive, so that if VL[0] is the last instruction in program order, you'll end up with LastInst == Leader, right?
If so, it's probably worth a comment - or just initialize LastInst to Leader instead of null.

Also, won't we get worst-case quardatic behavior here, if VL[0] is consistently last (so we end up running to the end of the block for each bundle)?

mssimpso added inline comments.Aug 15 2016, 3:01 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
2174

Right, if VL[0] is consistently last, we'll end up running to the end of the block a lot. I think the common case here should be that VL[0] is first in program order? (But I could be wrong about this). Is there a better way to do this? setInsertPointAfterBundle doesn't really work as expected.

For this particular bug, another way to fix the insertion issue might be to modify Gather to change the insert point before generating each insertlement instruction. What do you think?

mkuper added inline comments.Aug 15 2016, 3:17 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
2174

I agree, the common case is that VL[0] is first - but there are at least two cases where it may be last:

  1. For reductions, we may end up collecting things in reverse order (that is, VL[0] ought to be first, but it ends up last).
  2. If the root if a bundle of stores, we'll sort the bundle so that consecutive stores in the bundle have consecutive pointers.

And changing the order of the root bundle affects the order of all bundles in the tree, since the lanes have to be consistent, so having things in reversed order propagates.

I think changing the insertion point before generating each insertelement instruction doesn't really solve the problem even for gathers, because the inserts need to be in the bundle order (they form a use-def chain), so we can't just put every insert after its VL[i].

I don't have a good solution off the top of my head, though.

mssimpso added inline comments.Aug 16 2016, 7:52 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
2174

I think changing the insertion point before generating each insertelement instruction doesn't really solve the problem even for gathers, because the inserts need to be in the bundle order (they form a use-def chain), so we can't just put every insert after its VL[i].

OK, right. This won't work.

I'm now thinking that it may be possible to use ScheduleData for this. I'll give it a shot.

mssimpso updated this revision to Diff 68223.Aug 16 2016, 10:59 AM

Addressed Michael's comments.

  • Added more details in comments
  • Used ScheduleData to optimize the common case. This prevents us from always iterating over all instructions in the block.
mssimpso marked 2 inline comments as done.Aug 16 2016, 11:00 AM
mkuper added inline comments.Aug 16 2016, 12:04 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
2178

Could you please change this to be something like

} else {
 // If BB hasn't...
...
}

?

The dangling else really bugs me. :-)

2178

Also, how can it happen that the BB hasn't been scheduled?
I thought this only gets called from within vectorizeTree(), which calls scheduleBlock() first?

mssimpso added inline comments.Aug 16 2016, 12:46 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
2178

No problem!

2178

That's a good question... For the test case, an entry for BB is never created in BlocksSchedules because buildTree_rec exits early (max recursion depth). The BlocksSchedules entry isn't created until later on in the function, right before the "practice" scheduling. In vectorizeTree, the "real" scheduling only happens for the blocks in BlocksSchedules.

I bet if we move the BlockSchedules entry creation to the top of buildTree_rec, we can eliminate the "else" block here. I'll update the patch. Thanks!

mssimpso updated this revision to Diff 68248.Aug 16 2016, 1:43 PM

Addressed Michael's comments.

Michael,

I'm not sure we can avoid the "else" block like I previously guessed we might. We can end up with uninitialized schedule data if we exit early from buildTree_rec (in addition to a missing entry in BlocksSchedules). I've detailed the cases in the new comments I added. The "else" block is replaced by a new "if" to check for the null schedule data case.

Iterating over the block to find the insert point should be extremely rare I think, and probably only happens in exceptional cases such as the one demonstrated in the test case where we hit some hard limit and give up building the tree. I tested the current patch on the test suite and spec2000/2006, and we didn't have to iterate over the block a single time.

mkuper edited edge metadata.Aug 16 2016, 2:07 PM

I'm still slightly concerned. One of the reasons we'd bail out of buildTree_rec is if we hit one of the thresholds - whose only reason to exist in the first place is to provide an upper bound on compile time. So if bailing out of buildTree_rec because we exceeded a threshold causes us to take a compile-time hit, that's probably not good.
But I may just be paranoid - we'll take a hit only if we both exceeded the thresholds *and* the bundle is not in-order, and that may be extremely rare.

Other than that, LGTM.

Thanks, Michael.

It looks like we used to number the instructions in the block, and then use the numbering for this purpose. There was even a "getLastInstruction(ArrayRef<Value *> VL)" but this was all removed in rL214494.

Matt.

Erik, do you have any thoughts on this change? Thanks!

hans added a comment.Aug 17 2016, 4:06 PM

This is now the final release-blocking PR for 3.9. No pressure :-)

If there are no objections then, I'll commit this patch with Michael's LGTM to unblock the release. I'll also add a comment describing Michael's compile-time reservations.

hans added a comment.Aug 18 2016, 10:35 AM

If there are no objections then, I'll commit this patch with Michael's LGTM to unblock the release. I'll also add a comment describing Michael's compile-time reservations.

Sounds good to me.

mssimpso accepted this revision.Aug 18 2016, 1:06 PM
mssimpso added a reviewer: mssimpso.
This revision is now accepted and ready to land.Aug 18 2016, 1:06 PM
mssimpso closed this revision.Aug 18 2016, 1:07 PM

Committed in rL279125.

vitalybuka reopened this revision.Aug 20 2016, 12:18 AM
vitalybuka added a subscriber: vitalybuka.

Reverted by r279363

This revision is now accepted and ready to land.Aug 20 2016, 12:18 AM
mssimpso closed this revision.Aug 20 2016, 8:33 AM

Reapplied in rL279369 and rL279370 with a change to the test case that avoids hitting the unsigned behavior. The unsigned behavior will be addressed in D23723.