This is an archive of the discontinued LLVM Phabricator instance.

[TTI] Refine default cost for interleaved load groups with gaps
Closed, Public

Authored by mssimpso on Jun 1 2016, 11:26 AM.

Details

Summary

This patch refines the default costs for interleaved load groups having gaps. If a load group has gaps, the legalized instructions corresponding to the unused elements will be dead. Thus, we don't need to account for them in the cost model. Instead, we only need to account for the fraction of legalized loads that will actually be used.

This change will have the greatest impact on the cost of interleaved load groups with large factors and few accessed members (e.g., accessing only one member of an eight-element struct).
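The following is a minimal C++ sketch of the refined cost, for illustration only. The helpers `countUsedLegalLoads` and `scaledLoadCost` are hypothetical and are not the code in `BasicTTIImpl.h`. The sketch assumes that the wide vector of `Factor * VF` elements splits into equally sized legal loads. It also rounds the scaled cost up so that a group using any load never costs zero; the patch's own rounding is not reproduced here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not an LLVM API): counts how many of the legal loads
// that a wide interleaved load splits into feed at least one accessed member.
//   Factor        - interleave factor (members per group)
//   VF            - vectorization factor (elements loaded per member)
//   Indices       - members of the group that are actually accessed
//   NumLegalLoads - number of legal loads the wide vector is split into
unsigned countUsedLegalLoads(unsigned Factor, unsigned VF,
                             const std::vector<unsigned> &Indices,
                             unsigned NumLegalLoads) {
  assert(Factor > 0 && VF > 0 && NumLegalLoads > 0);
  const unsigned NumElts = Factor * VF; // lanes in the wide vector
  // Lanes covered by each legal load, rounded up.
  const unsigned EltsPerLoad = (NumElts + NumLegalLoads - 1) / NumLegalLoads;
  std::vector<bool> Used(NumLegalLoads, false);
  for (unsigned Index : Indices) {
    assert(Index < Factor);
    for (unsigned Elt = 0; Elt < VF; ++Elt)
      // Member `Index` of group `Elt` sits at lane Index + Elt * Factor.
      Used[(Index + Elt * Factor) / EltsPerLoad] = true;
  }
  unsigned Count = 0;
  for (bool U : Used)
    Count += U ? 1 : 0;
  return Count;
}

// Scale the cost of the full wide load by the fraction of legal loads that
// are used. Multiplying before dividing avoids truncating the fraction to 0.
unsigned scaledLoadCost(unsigned WideLoadCost, unsigned UsedLoads,
                        unsigned NumLegalLoads) {
  assert(NumLegalLoads > 0);
  return (WideLoadCost * UsedLoads + NumLegalLoads - 1) / NumLegalLoads;
}
```

For the eight-member struct case, assume a target on which `<16 x i64>` legalizes to eight `v2i64` loads. Then `countUsedLegalLoads(8, 2, {0}, 8)` returns 2, because lanes 0 and 8 fall in loads 0 and 4, and a full-width cost of 8 scales to `scaledLoadCost(8, 2, 8) == 2`. By contrast, a factor-4 group with VF 4 split into four loads touches every load even when only member 0 is accessed. Gaps therefore do not always lower the cost.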

Diff Detail

Repository
rL LLVM

Event Timeline

mssimpso updated this revision to Diff 59257. Jun 1 2016, 11:26 AM
mssimpso retitled this revision from to [TTI] Refine default cost for interleaved load groups with gaps.
mssimpso updated this object.
mssimpso added subscribers: llvm-commits, mcrosier.
sbaranga added inline comments.
include/llvm/CodeGen/BasicTTIImpl.h
560 ↗(On Diff #59257)

It might be better to use the returned MVT instead of the returned cost (which might not be related to the number of legal sub-vectors of our wide type).

mssimpso updated this revision to Diff 59578. Jun 3 2016, 10:32 AM
mssimpso updated this object.

Use the returned MVT to compute the number of legalized instructions that will be generated for a wide load.
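The reviewer's concern is that a legalization cost need not equal the number of legal pieces. A target could, for instance, charge more than one unit per split. Below is a hedged sketch of the size-based alternative. `LegalizedType` and `numLegalLoads` are hypothetical stand-ins for the legalized MVT and the computation, not LLVM APIs.

```cpp
#include <cassert>

// Hypothetical stand-in for what type legalization reports. In LLVM this
// information would come from the legalized MVT; here it is just a size.
struct LegalizedType {
  unsigned StoreSizeBytes; // bytes covered by one legal vector load
};

// Number of legal loads needed to cover a wide vector of WideSizeBytes,
// derived from the legal type's size rather than from a cost value.
unsigned numLegalLoads(unsigned WideSizeBytes, LegalizedType Legal) {
  assert(Legal.StoreSizeBytes > 0);
  if (WideSizeBytes <= Legal.StoreSizeBytes)
    return 1; // already fits in one legal load; nothing to split
  // Ceiling division: a partial trailing piece still needs its own load.
  return (WideSizeBytes + Legal.StoreSizeBytes - 1) / Legal.StoreSizeBytes;
}
```

For example, a 128-byte `<16 x i64>` with a 16-byte legal type gives `numLegalLoads(128, {16}) == 8`, regardless of what the target charges per load.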

mssimpso marked an inline comment as done. Jun 3 2016, 10:32 AM
mssimpso edited reviewers, added: sbaranga; removed: silviu.baranga. Jun 3 2016, 10:47 AM
mssimpso removed a subscriber: sbaranga.
sbaranga edited edge metadata. Jun 8 2016, 6:10 AM

This seems reasonable to me. Do you have performance data for this change?

Cheers,
Silviu

include/llvm/CodeGen/BasicTTIImpl.h
552 ↗(On Diff #59578)

Would it be better to use (A + B - 1) / B?

mssimpso updated this revision to Diff 60049. Jun 8 2016, 9:23 AM
mssimpso edited edge metadata.

Simplified ceiling computation.

Hi Silviu,

Other than a 9% improvement in spec2000/art, I didn't observe any non-noise performance differences in the test suite, spec2000, or spec2006. Some loops in spec2000/art access only a few elements of a large struct. This patch enables those loops to be vectorized, even though we don't map the accesses to ldN instructions.

mssimpso marked an inline comment as done. Jun 8 2016, 9:24 AM
mssimpso added inline comments.
include/llvm/CodeGen/BasicTTIImpl.h
552 ↗(On Diff #60049)

A + B should never wrap, so this sounds good to me.
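Below is a small sketch of the two ceiling-division forms. `(A + B - 1) / B` is correct only while `A + B - 1` fits in the unsigned type, which the author expects to hold for the values used in this patch. `A / B + (A % B != 0)` avoids the addition entirely. Both function names are hypothetical.

```cpp
#include <cassert>

// Ceiling of A / B using the (A + B - 1) / B idiom suggested in review.
// Correct only while A + B - 1 does not exceed the unsigned range;
// otherwise the sum wraps around (well-defined for unsigned, but wrong).
unsigned ceilDivAdd(unsigned A, unsigned B) {
  assert(B != 0);
  return (A + B - 1) / B;
}

// Overflow-free alternative: the quotient, plus one if there is a remainder.
unsigned ceilDivSafe(unsigned A, unsigned B) {
  assert(B != 0);
  return A / B + (A % B != 0 ? 1u : 0u);
}
```

`ceilDivAdd(17, 8)` and `ceilDivSafe(17, 8)` both return 3. With `A = ~0u` and `B = 2`, however, the first form wraps and returns 0, while the second returns `~0u / 2 + 1`.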

sbaranga accepted this revision. Jun 9 2016, 8:30 AM
sbaranga edited edge metadata.

LGTM!

This revision is now accepted and ready to land. Jun 9 2016, 8:30 AM
mssimpso marked an inline comment as done. Jun 9 2016, 8:35 AM

Thanks for the review!

This revision was automatically updated to reflect the committed changes.