This patch prevents increases in the number of instructions, pre-instcombine, due to induction variable scalarization. An increase in instructions can lead to an increase in the compile-time required to simplify the induction variables. We now maintain a new map for scalarized induction variables to prevent us from converting between the scalar and vector forms.
Details
Diff Detail
Event Timeline
Hi Matt,
Looks basically good.
A bit more high-level info about the approach would be nice next time. I think I understand that you are using the new Map in the scalarizeInst.
You're missing tests.
Adam
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
604–607 | Please explain the relation to WidenMap. It might also be a good idea to move it closer there. |
Addressed Adam's comments.
- Renamed ScalarMap to ScalarIVMap since we're currently only using it for induction variables (though it's possible we may want to reuse this in the future to further limit the scalar/vector conversions for other cases).
- Updated ScalarIVMap comment and moved declaration.
- Added comment in vectorizeMemoryInstruction
- Added some pre-instcombine tests.
- Verified compile-time improvement. I see significant speedups in >25 tests from the test-suite with no slowdowns.
Adam,
Thanks for the feedback, and sorry for the lack of explanation. Yes, you understand correctly; here's a quick overview.
When we build the steps for the induction variables we want to be scalar, instead of inserting them into a vector to be maintained in WidenMap, we now leave them as-is and store the steps in ScalarIVMap. Then, whenever we need to scalarize a use of an induction variable, instead of accessing WidenMap and creating an extractelement instruction, we just grab the appropriate value from ScalarIVMap. The relevant extracts are in two places: one in scalarizeInstruction, and one in vectorizeMemoryInstruction.
The approach cuts the number of instructions, pre-instcombine, related to induction variable scalarization by 2/3 since we eliminate, for each step, an insertelement and an extractelement instruction. Compared to always vectorizing the induction variables, scalarizing now doesn't increase the number of instructions. This is because where we previously would have had an extract from the induction variable vector, we now have a scalar step computation. I hope that makes sense!
Thanks for the changes, just one more thing. For absolute paranoia, can you please also add a test with interleaving>1 (no instcombine)?
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
2475 | This sounds a bit confusing. How about a comment talking about this IV having a scalarized version, like what you say in the next hunk? |
Addressed Adam's comments.
- Updated comment about GepOperand
- Added tests with interleave > 1 and no instcombine
Thanks!
Please explain the relation to WidenMap. It might also be a good idea to move it closer there.