The previous SLP-vectorizer tried to vectorize for only one kind of vector register size, MinVecRegSize.
With this patch, we can try to vectorize in the range from MaxVecRegSize to MinVecRegSize.
Details
- Reviewers
mzolotukhin jmolloy
Diff Detail
Event Timeline
lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
3427 | I'm thinking this should be a TTI hook, so each target can define the MinVecRegSize. | |
3723 | Maybe use VecRegSize, rather than Size here? | |
3747 | I believe you've addressed this fix me, correct? | |
3987 | Same. Remove FIXME. | |
4342 | Maybe use VecRegSize, rather than Size here? | |
4419 | I assume this should be deleted, rather than commented out. | |
4489 | I don't think you need the temporary variable. for () { if (tryToVectorizeList()) { Changed = true; it = BB->begin(); e = BB->end(); break; } } | |
4586 | Please remove. | |
4587 | I don't think you need this temporary value. You can just do something like the below, correct? for (...) if (tryToVectorizeList()) { Changed = true; break; } |
Hi Jongwon,
Thanks for working on this, I think that's a natural improvement over what we have now. I agree with Chad's remarks and have some more comments.
The patch now includes two semantical parts:
- Change default MinVecRegSize value for AArch64.
- Actually use MinVecRegSize in tryToVectorizeList and friends.
They look to me completely separated (and, in the end, needed), so I'd suggest tackling them in separate patches.
For the first part - changing the default value - I totally agree with Chad that it should be done via a target hook. You could take a look at how MaxVecRegSize is implemented. In general, SLPVectorizer.cpp shouldn't contain any target-specific code, and all target-dependent info should be requested via hooks (i.e. having a variable IsAArch64 might be a bad idea).
Also, for this kind of change it would be nice to show the impact on compilation and execution time. LLVM test-suite is good for this, results on other popular testsuites (e.g. SPEC) are also valueable.
The second part is good in general, one possible improvement you could do (if you have time for this) is to refactor vectorizeStoreChain and tryToVectorizeList functions. They'll be very similar when tryToVectorizeList is parametrized with a size too. That potentially should remove a lot of duplication in the code, which is always good.
Also, you'll need new tests for the second part, currently I assume the test you have is for the first part.
Thanks,
Michael
lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
4279 | Fixed the code to call tryToVectorizeList() when NumElts > 1 is satisfied. |
I should have suggested this earlier (sorry), but do you think it make sense to push the loops into tryToVectorizeList and canMatchHorizontalReduction? I think it does; it avoids a lot of duplicated code at each call site.
lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
4279 | No that this logic is cleaned up you don't need the temporary bool. for (unsigned VecRegSize = MaxVecRegSize; VecRegSize >= MinVecRegSize; VecRegSize /= 2) { if (tryToVectorizeList(makeArrayRef(IncIt, NumElts), R, None, false, VecRegSize)) { // Success start over because instructions might have been changed. HaveVectorizedPhiNodes = true; Changed = true; break; } } |
do you think it make sense to push the loops into tryToVectorizeList and canMatchHorizontalReduction?
Actually, it might make sense to keep it as is, the reason being the current form is closer to what we have in vectorizeStoreChain. Eventually, it would be cool to merge these two functions together to reduce code duplication. What do you think?
Michael
lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
4279 | Removed the temporary bool. |
Removed the function 'vectorizeStoreChain' and modified the function 'tryToVectroizeList'.
Hi Jongwon,
Thanks for the work, please find some comments below.
Michael
lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
3696 | Nitpick: I'd rather swap if and else blocks to avoid negation in the condition. | |
3705–3706 | I think this check should be combined with the one below: if (!isPowerOf2_32(OpsWidth) || OpsWidth < 2) break; and it should be done independently on vectorizeStoreChain flag. | |
4545 | SLPCostThreshold disappeared after this change. Was it intentional? |
lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
4545 | SLPCostThreshold is a command line option, so it doesn't need to be passed as a function parameter. |
lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
4545 | Ah, right, thanks for pointing that out! |
lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
4545 | Yes. mssimpso is right. |
Hi Jongwon,
I've inlined a few comments. And I agree with Michael about the compile-time experiments. It would be helpful to see what kind of impact trying different register sizes will have. Thanks!
lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
3501 | Unless I missed something, it looks to me like every use of tryToVectorizeList passes VecRegSize. Why make the parameter optional? | |
4222 | I think it would be less confusing and more consistent if MinRegSize was renamed to VecRegSize here. |
This patch doesn't change the default value of MinVecRegSize, so it does not affect the performance.
(On top of this patch, another patch(http://reviews.llvm.org/D19151) is needed to change the default value of MinVecRegSize.)
In commercial benchmark, I found that clang lost the chance of 64-bit SLP vectorization in AArch64.
Although there would be side effect of extension of 64-bit SLP vectorization, the compiler should handle 64-bit SLP vectorization in AArch64. The side effect, if any, should be handled with another patch.
I tried to upload the data about compile-time and execution-time, but .csv file is not attached in the comment box. Please guide me how to upload .csv file if possible.
Hey Jongwon,
I think it's common to just paste in the most significant execution and compile time improvements and regressions. It's not necessary to report every data point.
Even though you're not changing the value of MinVecRegSize here, I think it makes sense to evaluate this patch and D19151 together so we can see the effect (particularly in compile time) that looping over potential register sizes will have.
In SPEC benchmark, compile-time increases by 2.8%. In LLVM test-suite, compile-time decreases by 9 %. The data are measured from the average of three times executions.
I'm thinking this should be a TTI hook, so each target can define the MinVecRegSize.