This is an archive of the discontinued LLVM Phabricator instance.

[x86] fix cost model inaccuracy for vector memory ops
ClosedPublic

Authored by spatel on Mar 9 2016, 10:28 AM.

Details

Summary

The irony of this patch is that the one CPU that is affected is AMD Jaguar, and Jaguar has a completely double-pumped AVX implementation. But getting the cost model to reflect that is a much bigger problem. The small goal here is simply to improve on the lie that !AVX2 == SandyBridge.

Diff Detail

Event Timeline

spatel updated this revision to Diff 50157.Mar 9 2016, 10:28 AM
spatel retitled this revision from to [x86] fix cost model inaccuracy for vector memory ops.
spatel updated this object.
spatel added reviewers: DavidKreitzer, RKSimon, zansari.
spatel added a subscriber: llvm-commits.
zansari accepted this revision.Mar 9 2016, 11:34 AM
zansari edited edge metadata.

lgtm for being an improvement over the existing code.

This revision is now accepted and ready to land.Mar 9 2016, 11:34 AM
DavidKreitzer edited edge metadata.Mar 9 2016, 2:28 PM

This seems reasonable to me also. It might be nice to have a separate X86Subtarget property for double pumped 32-byte load/stores, but that may be overkill for this one use.

This revision was automatically updated to reflect the committed changes.
spatel added a comment.Mar 9 2016, 2:41 PM

This seems reasonable to me also. It might be nice to have a separate X86Subtarget property for double pumped 32-byte load/stores, but that may be overkill for this one use.

Thanks, Zia and Dave. I agree a separate property would be good if we really want to model this better. For reference, I noticed this bug as part of the TTI cost model discussion in PR26837:
https://llvm.org/bugs/show_bug.cgi?id=26837

There's a lot of bigger stuff that could be modeled better. :)
Ideally, I think we would lift latency/throughput data from the SchedMachineModel, but I'm not sure how to do that yet.