This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Use type-legalization cost for code size memop cost.
Closed · Public

Authored by fhahn on Apr 12 2021, 2:22 AM.

Details

Summary

At the moment, getMemoryOpCost returns 1 for all inputs if CostKind is
CodeSize or SizeAndLatency. This fools LoopUnroll into thinking memory
operations on large vectors have a cost of one, even if they will get
expanded to a large number of memory operations in the backend.

This patch updates getMemoryOpCost to return the type-legalization cost
for both CodeSize and SizeAndLatency. This should more accurately
reflect the number of memory operations required.
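
To illustrate the difference, here is a minimal standalone sketch. It is not the code from this patch. It assumes 128-bit legal vector registers (as with AArch64 NEON) and approximates the type-legalization cost as the number of 128-bit pieces a vector is split into, which only holds for simple power-of-two vector sizes. The names legalizationParts, oldSizeCost, and newSizeCost are hypothetical and are not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Assumption for this sketch: the widest legal vector register is 128 bits.
constexpr std::uint64_t kLegalVectorBits = 128;

// Approximate type-legalization cost: the number of legal-width pieces a
// vector of `bits` bits is split into (rounded up).
std::uint64_t legalizationParts(std::uint64_t bits) {
  assert(bits > 0 && "type must have a non-zero size");
  return (bits + kLegalVectorBits - 1) / kLegalVectorBits;
}

// Behaviour described in the summary before the patch: a flat cost of 1
// for the CodeSize and SizeAndLatency cost kinds, regardless of type size.
std::uint64_t oldSizeCost(std::uint64_t bits) {
  (void)bits; // the size of the type is ignored
  return 1;
}

// Behaviour described in the summary after the patch: the cost follows
// the (approximated) type-legalization cost.
std::uint64_t newSizeCost(std::uint64_t bits) {
  return legalizationParts(bits);
}
```

Under these assumptions, a memory operation on a 4096-bit vector (for example, 64 doubles) is costed at 32 instead of 1, while a 128-bit vector still costs 1.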

From the description, I am not sure how latency should properly be
included in SizeAndLatency, but returning the size cost should clearly
be more accurate than a flat cost of 1.

This does not cause any binary changes when building
MultiSource/SPEC2000/SPEC2006 with -O3 -flto for AArch64, likely because
large vector memops are not really formed by code emitted from Clang.
But using the C/C++ matrix extension can easily result in code with very
large vector operations directly from Clang, e.g.
https://clang.godbolt.org/z/6xzxcTGvb

Event Timeline

fhahn created this revision. Apr 12 2021, 2:22 AM
fhahn requested review of this revision. Apr 12 2021, 2:22 AM
Herald added a project: Restricted Project. Apr 12 2021, 2:22 AM
samparker accepted this revision. Apr 12 2021, 2:31 AM

Makes sense to me.

This revision is now accepted and ready to land. Apr 12 2021, 2:31 AM
fhahn updated this revision to Diff 337658. Apr 15 2021, 1:22 AM

Rebase on top of recent changes before committing.

This revision was landed with ongoing or failed builds. Apr 15 2021, 2:11 AM
This revision was automatically updated to reflect the committed changes.