This is an archive of the discontinued LLVM Phabricator instance.

[LV] Avoid rounding errors for predicated instruction costs
ClosedPublic

Authored by mssimpso on Oct 6 2016, 10:21 AM.

Details

Summary

This patch modifies the cost calculation of predicated instructions (div and rem) to avoid the accumulation of rounding errors due to multiple truncating integer divisions. The calculation for predicated stores will be addressed in a follow-on patch, since we currently don't scale the cost of predicated stores by block probability.

Diff Detail

Repository
rL LLVM

Event Timeline

mssimpso updated this revision to Diff 73816.Oct 6 2016, 10:21 AM
mssimpso retitled this revision from to [LV] Avoid rounding errors for predicated instruction costs.
mssimpso updated this object.
mssimpso added reviewers: anemet, mkuper, gilr.
mssimpso added subscribers: llvm-commits, mcrosier.
gilr added inline comments.Oct 7 2016, 8:09 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
6528 ↗(On Diff #73816)

Not sure the use of "auto" instead of "unsigned" adheres to the coding guidelines for auto.

mssimpso added inline comments.Oct 7 2016, 8:11 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
6528 ↗(On Diff #73816)

Good point, I'll make this explicit. Thanks, Gil!

mssimpso updated this revision to Diff 73938.Oct 7 2016, 8:57 AM

Addressed Gil's comment.

  • Made cost type explicit instead of using "auto".
gilr added inline comments.Oct 8 2016, 2:06 AM
test/Transforms/LoopVectorize/AArch64/predication_costs.ll
5 ↗(On Diff #73938)

I'm a bit uncomfortable with this test being under a target-specific sub-directory as it checks a target-independent fix to the cost-model. OTOH I don't know if there's a better way to write a test for the cost-model.

It seems the common solution in LV's tests is to duplicate: gather-cost.ll is duplicated into both AArch64 and X86, specifying the datalayout in the file and the target in the RUN command-line argument. The interleaved_cost.ll test seems to copy the ARM test into the larger AArch64 test.
Is there a way to use multiple prefixes in the test to check multiple platforms, e.g.:
; REQUIRES-TARGET: <TARGET>
; RUN-TARGET: -loop-vectorize -mtriple=<TARGET's tiple> -mcpu=<TARGET's cpu> ...
; CHECK-TARGET: Found an estimated cost of <TARGET's cost> ...
and if so, is it in the spirit of the testing guide?

This is more of a general question. It shouldn't delay this commit if there's no clear answer. I'm fine with duplicating the test (with a comment in each variant referencing the others). I'm also fine with just adding a "not AArch64-specific" comment in the test.

mssimpso added inline comments.Oct 11 2016, 7:54 AM
test/Transforms/LoopVectorize/AArch64/predication_costs.ll
5 ↗(On Diff #73938)

In this case, we could actually probably use the -force-target-instruction-cost flag and then move this up to the target-independent directory. The actual costs don't really matter here, just the way we do the division. I'll give that a shot. Thanks!

mssimpso updated this revision to Diff 74401.Oct 12 2016, 10:54 AM

Addressed Gil's comments

  • Added a comment to the test indicating that the functionality being tested is not AArch64-specific.

I had originally intended to make the test target-independent using the force-target-instruction-cost flag to set the instruction costs. But this flag resets the costs after calculating the scalarization overhead and scaling the costs by block probability. So that doesn't help. I'm not sure of the best way to go about checking multiple targets without excessive duplication, so I went with adding the suggested "not AArch64-specific" comment for now. Thanks!

gilr accepted this revision.Oct 12 2016, 11:48 AM
gilr edited edge metadata.

LGTM - Thanks, Matt!

This revision is now accepted and ready to land.Oct 12 2016, 11:48 AM
This revision was automatically updated to reflect the committed changes.