This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Improve cost model to handle sdiv by a pow-of-two.
AbandonedPublic

Authored by mcrosier on Aug 14 2015, 9:47 AM.

Details

Summary

This patch improves the target-specific cost model to better handle signed
division by a power of two. The immediate result is that this enables the SLP
vectorizer and loop vectorizer to do a better job.

Just something I saw in passing. This is already done for the X86 and AArch64 backends.

Diff Detail

Repository
rL LLVM

Event Timeline

mcrosier updated this revision to Diff 32159.Aug 14 2015, 9:47 AM
mcrosier retitled this revision from to [ARM] Improve cost model to handle sdiv by a pow-of-two..
mcrosier updated this object.
mcrosier added reviewers: t.p.northover, jmolloy, Gerolf.
mcrosier set the repository for this revision to rL LLVM.
mcrosier added subscribers: gberry, mssimpso, junbuml, bmakam.

It seems like we may want to factor this code out (perhaps into a protected TargetTransformInfo function) so that all targets that use this DAG combine (which looks to be most of them) can use it as well.

I see a couple of potential issues with the cost estimation itself:

  1. the DAG combine adds an additional SUB if the divisor is negative which isn't accounted for
  2. the OperandValueKinds (OpInfo1, OpInfo2) are used for all of the operations, but these do not describe the actual operands for the intermediate instructions since they are not operating on the original operands of the SDIV. For example, the ADD instruction is an add of two intermediate results which we know nothing about (so they should be marked as OK_AnyValue), but operand 2 of the ADD will be marked as OK_UniformConstant. This may not make a difference for this particular pattern on the targets in question, but could potentially be incorrect for other targets.

Thanks for the review, Geoff. Unfortunately, I don't have time revisit this change (again, mostly a drive by patch). Perhaps someone else in the community would like to push this one through.. :)

I agree with Geoff that this should be factored out and thought well.

Not only such a cost calculation should be used everywhere, but it looks like a special case in its own. Without knowing the effect of this cost change into other patterns, it's hard to decide if this is truly beneficial overall, or just a local spike in this use-case.

I agree that the cost tables are not relying enough on knowledge of the instructions and timings and there's a lot of fudge already to help the vectorizer do a better job, but that fudge is in place after general agreements from benchmarks and large code bases. I don't like it, but I also don't have a better solution right now.

Unfortunately, any change needs to come with some benchmarking results, even those that would make the cost table better. :/

For the patch, specifically, I'd suggest this to be factored out into a higher level (for all scalar divide needs), and also take into account platforms that have HW divide.

@haicheng: Feel free to take a look at this..

mcrosier abandoned this revision.Oct 6 2015, 6:49 AM