Currently we may do iterleaving by more than estimated trip count
coming from the profile or computed maximum trip count. The solution is to
use "best known" trip count instead of exact one in interleaving analysis.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/test/Transforms/LoopVectorize/X86/no_fpmath_with_hotness.ll | ||
---|---|---|
6 | according to profile original loop has 99 iterations thus interleaving is disabled by short trip count heuristics controlled by tiny-trip-count-interleave-threshold |
This looks reasonable to me, but I'm not a qualified reviewer in this area.
llvm/test/Transforms/LoopVectorize/interleave_short_tc.ll | ||
---|---|---|
1 ↗ | (On Diff #221470) | I would suggest using an autogenerate checks for these. It'll be more verbose, but also more clear about the expected output. See utils/update_test_checks.py and FileCheck's --check-prefix option for the three variations. |
65 ↗ | (On Diff #221470) | Please remove unnecessary aspects of test. |
llvm/test/Transforms/LoopVectorize/interleave_short_tc.ll | ||
---|---|---|
1 ↗ | (On Diff #221470) |
Not sure I understand what exactly you mean but this. I know it is possible to have a dedicated prefix for common checks of two run lines and provide several prefixes to FileCheck. For example: In my case I have three different runs with one common check. Is there a way to combine them? |
llvm/test/Transforms/LoopVectorize/interleave_short_tc.ll | ||
---|---|---|
1 ↗ | (On Diff #221470) |
In fact I did use update_test_checks.py to generate initial checks and then manually removed all unrelated ones. I think that makes test less sensitive to side changes and it's easy to see what we actually care about. On the other hand I do see benefit of having wider context without the need to run 'opt' by hands. It would be interesting to hear what others think in this regard. |
65 ↗ | (On Diff #221470) | Done |
llvm/test/Transforms/LoopVectorize/interleave_short_tc.ll | ||
---|---|---|
65 ↗ | (On Diff #221470) | +1 Just leave metadata which you really need for this test. (Lines 56-70 can be removed) |
Vectorizer code change looks fine with me. I'd like to see the comments updated, though. Any more changes needed for the LIT tests?
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5146–5147 | small estimated or constant trip count | |
5212–5213 | If trip count is expected to be small, limit the interleave count to be less than the trip count divided by VF |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5147–5150 | This assumes constant trip count case is handled well by getSmallConstantMaxTripCount called from getSmallBestKnownTC ---- but if that is not the case, that would be a bug on the SCEV side. I traced a little bit but could not verify it myself as I'm not familiar with SCEV code. As such, I'm just pointing out a different SCEV function will be called as a result of this change. |
Minor changes requested by reviewers
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5146–5147 | Done | |
5147–5150 | Maybe I'm missing your concern but constant trip count case is processed first in getSmallBestKnownTC by a call to getSmallConstantTripCount. Thus I don't see any change for constant trip count case at all. | |
5212–5213 | There is some ambiguity in using "small" through out the code. For getSmallBestKnownTC "small" is if it fits 32-bit. For "if (BestKnownTC && *BestKnownTC < TinyTripCountInterleaveThreshold)" check "small" is what less than TinyTripCountInterleaveThreshold. Here "small" should refer to the meaning defined by getSmallBestKnownTC . | |
llvm/test/Transforms/LoopVectorize/interleave_short_tc.ll | ||
65 ↗ | (On Diff #221470) | I tried to remove as much as I can but not all of them can be actually removed. |
@ebrevnov, I recommend you visit http://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access and obtain your own commit access. Your committed and under review patches deserve it.
Thank you very much for your contribution. Let me know if you still would like me to commit this one.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5212–5213 | Sorry for being unclear. I was suggesting an update to the comment. With this patch, BestKnownTC is not constant, right? |
I was going to do that right after this patch lands. Please assist me (hopefully last time) in landing the patch.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5212–5213 | I see what you are talking about. Another ambiguity here :-) BestKnownTC returns compile time constant value which may be exact runtime constant or estimated non-constant. In this case I believe "constant" means that we were able to get a compile time constant value for the trip count. How about the following wording? // If trip count is known or estimated compile time constant, limit .... |
OK. Please update the comment and then I'll land the patch.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5212–5213 | That's fine as well. |
Got it. First time committing through git (SVN is read-only now). Expecting some learning curve. If you'd like this in sooner, you might want to ask someone familiar with the process. Will see if I hit any issues.
Needed to have done this process before SVN went readonly. Please ask @reames to commit. He must be better prepared. Sorry about that.
https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access-to-the-github-repository
I failed to commit interleave_short_tc.ll initially. I fixed that yesterday, but then had to move it into the X86 test directory to appease the build bots.
We have observed some performance regressions, presumably because the vectorized code started to kick-in on short estimated trip count loops (as opposed to skipping vector code and execute scalar code). We'll try following up with cost model tuning. I'm not too surprised if others also hit a similar issue. Overall, though, that's the right direction to head to.