This patch modifies the cost calculation of predicated instructions (div and rem) to avoid the accumulation of rounding errors due to multiple truncating integer divisions. The calculation for predicated stores will be addressed in a follow-on patch, since we currently don't scale the cost of predicated stores by block probability.
Details
Diff Detail
- Build Status
Buildable 256 Build 256: arc lint + arc unit
Event Timeline
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
6531 | Not sure the use of "auto" instead of "unsigned" adheres to the coding guidelines for auto. |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
6531 | Good point, I'll make this explicit. Thanks, Gil! |
test/Transforms/LoopVectorize/AArch64/predication_costs.ll | ||
---|---|---|
5 | I'm a bit uncomfortable with this test being under a target-specific sub-directory as it checks a target-independent fix to the cost-model. OTOH I don't know if there's a better way to write a test for the cost-model. It seems the common solution in LV's tests is to duplicate: gather-cost.ll is duplicated into both AArch64 and X86, specifying the datalayout in the file and the target in the RUN command-line argument. The interleaved_cost.ll test seems to copy the ARM test into the larger AArch64 test. This is more of a general question. It shouldn't delay this commit if there's no clear answer. I'm fine with duplicating the test (with a comment in each variant referencing the others). I'm also fine with just adding a "not AArch64-specific" comment in the test. |
test/Transforms/LoopVectorize/AArch64/predication_costs.ll | ||
---|---|---|
5 | In this case, we could actually probably use the -force-target-instruction-cost flag and then move this up to the target-independent directory. The actual costs don't really matter here, just the way we do the division. I'll give that a shot. Thanks! |
Addressed Gil's comments
- Added a comment to the test indicating that the functionality being tested is not AArch64-specific.
I had originally intended to make the test target-independent using the force-target-instruction-cost flag to set the instruction costs. But this flag resets the costs after calculating the scalarization overhead and scaling the costs by block probability. So that doesn't help. I'm not sure of the best way to go about checking multiple targets without excessive duplication, so I went with adding the suggested "not AArch64-specific" comment for now. Thanks!
Not sure the use of "auto" instead of "unsigned" adheres to the coding guidelines for auto.