This is an archive of the discontinued LLVM Phabricator instance.

LoopDistribute/LAA: Respect convergent
ClosedPublic

Authored by arsenm on May 29 2019, 9:08 AM.

Details

Summary

This case is slightly tricky, because loop distribution should be
allowed in some cases, and not others. As long as runtime dependency
checks don't need to be introduced, this should be OK. This is further
complicated by the fact that LoopDistribute partially ignores if LAA
says that vectorization is safe, and then does its own runtime pointer
legality checks.

Note this pass still does not handle noduplicate correctly, as this
should always be forbidden with it. I'm not going to bother trying to
fix it, as it would require more effort and I think noduplicate should
be removed.

Diff Detail

Event Timeline

arsenm created this revision.May 29 2019, 9:08 AM
arsenm updated this revision to Diff 201954.May 29 2019, 9:08 AM

Rephrase debug printing

jdoerfert added inline comments.May 29 2019, 9:54 AM
lib/Analysis/LoopAccessAnalysis.cpp
1783

I would have assumed this to be true initially and set to false if a convergent call was found. That would also remove one of the variables. If there is a reason agains it, please explain.

arsenm marked an inline comment as done.May 29 2019, 10:00 AM
arsenm added inline comments.
lib/Analysis/LoopAccessAnalysis.cpp
1783

The loop scanning for this has early returns, so it may never reach the convergent instruction. Since at least LoopDistribution partially ignores the final result of the CanVecMem analysis, I thought it would be better to keep this as the conservative answer even if the loop bails out early.

Meinersbur added inline comments.May 29 2019, 10:01 AM
lib/Analysis/LoopAccessAnalysis.cpp
1798

What is the reason to not set CanInsertRuntimeCheck = false directly?

If it's because of the early exist in the following loop, since we are no computing two orthogonal properties instead of one, shouldn't we ensure that that we iterate over all instructions. That is, we may early-exit because it is not vectorizable, but it might still be distributable if there are no convergent instructions.

1987

This message carries no information. Remove entirely/move the check from line 2012 before this? The entire code after this seems useless when a convergent call has been found anyway.

lib/Transforms/Scalar/LoopDistribute.cpp
694

[serious] Isn't PredicatedScalarEvolution another source of runtime checks? We should ensure that it's predicate is always true. That is, LoopDistribute's condition is !Pred.isAlwaysTrue() || !Checks.empty().

695–696

The error message does not match the API canInsertRuntimeCheck. We might add more reasons why we cannot have a runtime check. If the analyzeLoop exists early before reaching CanInsertRuntimeCheck = !HasConvergentOp, there might not even be a convergent operation in the loop.

arsenm marked an inline comment as done.May 29 2019, 10:16 AM
arsenm added inline comments.
lib/Analysis/LoopAccessAnalysis.cpp
1987

It's not useless due to the fact that LoopDistribute barely cares about the final result computed here. It still depends on the analyses done later

arsenm marked an inline comment as done.May 29 2019, 10:38 AM
arsenm added inline comments.
lib/Transforms/Scalar/LoopDistribute.cpp
694

I haven't heard of PredicatedScalarEvolution. It seems if I set loop-distribute-scev-check-threshold=0, the only test that fails is symbolic-stride.ll, in the run line where -enable-mem-access-versioning=1. It seems like there are missing tests here?

arsenm marked an inline comment as done.May 29 2019, 1:12 PM
arsenm added inline comments.
lib/Transforms/Scalar/LoopDistribute.cpp
694

I've spent a while trying to break this with no success. Any ideas on how to make SCEV need a check that wouldn't also have a corresponding memory dependency?

kbarton added inline comments.May 30 2019, 9:04 AM
lib/Analysis/LoopAccessAnalysis.cpp
1796

I don't think the braces are necessary here.

1798

I think I agree with this. If I understand things correctly, it's probably better to look for the convergent op separately from this loop, and ensure we iterate over all the instructions in the loop since we're now looking for two, conceptually different, things.

2027

Maybe I'm missing something here, but shouldn't this case (HasConvergentOp) go in the condition above (isDependencyCheckNeeded)?
In other words, is it not possible that we do not need a runtime check and thus can still vectorize?

test/Analysis/LoopAccessAnalysis/unsafe-and-rt-checks-convergent.ll
10

I don't think the triple is needed.

arsenm updated this revision to Diff 203579.Jun 7 2019, 10:47 AM
arsenm marked 5 inline comments as done.

Address comments. Fixes still inserting runtime checks based on SCEV. Also fixes not distributing when runtime checks are necessary according to LAA, but partitioning decides they aren't needed after the loop is distributed

lib/Analysis/LoopAccessAnalysis.cpp
2027

This is what I tried initially, but moved it down here. LoopDistribute has the strange property of partially ignoring the memory dependencies here, and picking out the ones it really cares about. The runtime checks here still need to be detected for that to work

Meinersbur requested changes to this revision.Jun 7 2019, 3:02 PM
Meinersbur added inline comments.
lib/Analysis/LoopAccessAnalysis.cpp
1799

Could you add a small comment like With both, a non-vectorizable memory instruction and a convergent operation, found in this loop, no reason to continue the search?

You could alternatively add a if (HasComplexMemInst && HasConvergentOp) break after this check to break searching as soon as both conditions are true, independently of which one is found first.

1805

[typo] ben

1877

[serious] Is a if (!CanVecMem) return; missing before this? This would switch it back on again independently of whether HasComplexMemInst is set.

lib/Transforms/Scalar/LoopDistribute.cpp
805

Is there a reason to not check Pred.isAlwaysTrue() and !Checks.empty() in the same conditional?

This revision now requires changes to proceed.Jun 7 2019, 3:02 PM
arsenm marked an inline comment as done.Jun 7 2019, 5:00 PM
arsenm added inline comments.
lib/Analysis/LoopAccessAnalysis.cpp
1877

This made me find 2 broken things not covered by tests (D63035)

arsenm updated this revision to Diff 203942.Jun 10 2019, 5:13 PM
arsenm marked 3 inline comments as done.

Fix various failures

lib/Transforms/Scalar/LoopDistribute.cpp
805

This is trying to early return to avoid all of the partitioning

Meinersbur accepted this revision.Jun 11 2019, 4:19 PM

LGTM. Thank you.

lib/Analysis/LoopAccessAnalysis.cpp
1919

Leftover code? Please completely remove commented-out code.

lib/Transforms/Scalar/LoopDistribute.cpp
694

test/Analysis/ScalarEvolution/predicated-trip-count.ll has a case.

This revision is now accepted and ready to land.Jun 11 2019, 4:19 PM
arsenm closed this revision.Jun 12 2019, 6:31 AM
arsenm marked 2 inline comments as done.

r363160

lib/Transforms/Scalar/LoopDistribute.cpp
694

I found an example and based r362854 on this