The benefit of converting two narrow loads into a single wider load (r251438) may be
microarchitecture-dependent, because it assumes that one wide load followed by two
bitfield extracts is cheaper than two narrow loads. For now, this conversion is enabled
only for Cortex-A57, where the performance benefit was verified.
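To make the trade-off concrete, here is a minimal C++ sketch of the two shapes being compared. It is not code from the patch: the function names `loadNarrow` and `loadWide` are made up for illustration, and the extraction order assumes a little-endian target.

```cpp
#include <cstdint>
#include <cstring>
#include <utility>

// Shape 1: two narrow loads, one 16-bit load per value.
// (Hypothetical helper, for illustration only.)
std::pair<uint16_t, uint16_t> loadNarrow(const unsigned char *P) {
  uint16_t A, B;
  std::memcpy(&A, P, sizeof A);      // first halfword
  std::memcpy(&B, P + 2, sizeof B);  // adjacent halfword
  return {A, B};
}

// Shape 2: one 32-bit load followed by two bitfield extracts.
// Assumes little-endian byte order, so the low 16 bits are the first halfword.
std::pair<uint16_t, uint16_t> loadWide(const unsigned char *P) {
  uint32_t W;
  std::memcpy(&W, P, sizeof W);                     // single wide load
  uint16_t A = static_cast<uint16_t>(W & 0xFFFF);   // extract bits [15:0]
  uint16_t B = static_cast<uint16_t>(W >> 16);      // extract bits [31:16]
  return {A, B};
}
```

Both functions return the same pair on a little-endian machine. Whether the second shape is actually faster depends on how the core handles loads versus extracts, which is why the patch gates the conversion per CPU.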
Details
- Reviewers: mzolotukhin, ab, jmolloy, mcrosier
lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp

Line | Comment
---|---
1196 | Why not call enableNarrowLdMerge(MBB->getParent()) here? You would need to change the logic in enableNarrowLdMerge since you're passing in a pointer to the MF, but that should be trivial.
1379 | How about enableNarrowLdMerge, rather than couldNarrowLdMergeEnabled?
1395 | I'd still prefer we sink this check into the optimizeBlock() function.
LGTM. Thanks, Jun.
lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp

Line | Comment
---|---
1395 | Jun and I discussed this offline. We shouldn't sink this because we don't need to call this predicate for every function. The answer will always be the same, so we only call it once in the runOnFunction function.
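The pattern settled on in the comment above, evaluating the predicate once and handing the result to the per-block code, can be sketched roughly as follows. This is a self-contained illustration, not the LLVM code: `Block` and `Function` are stand-ins for `MachineBasicBlock` and `MachineFunction`, the CPU-name comparison inside `enableNarrowLdMerge` and the extra `optimizeBlock` parameter are assumptions, and `PredicateCalls` exists only to show how often the predicate runs.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for MachineBasicBlock / MachineFunction.
struct Block { int NumNarrowLoads; };
struct Function {
  std::string CPU;
  std::vector<Block> Blocks;
};

// Counts predicate evaluations (illustration only).
static int PredicateCalls = 0;

// Assumed shape of the predicate: the answer depends only on the target CPU.
bool enableNarrowLdMerge(const Function &F) {
  ++PredicateCalls;
  return F.CPU == "cortex-a57";
}

// Per-block work receives the already-computed answer instead of re-querying.
bool optimizeBlock(Block &B, bool EnableNarrowLdOpt) {
  if (!EnableNarrowLdOpt || B.NumNarrowLoads < 2)
    return false;
  B.NumNarrowLoads -= 1;  // pretend two narrow loads became one wide load
  return true;
}

// The predicate is evaluated once here, then passed down to every block.
bool runOnFunction(Function &F) {
  const bool EnableNarrowLdOpt = enableNarrowLdMerge(F);
  bool Modified = false;
  for (Block &B : F.Blocks)
    Modified |= optimizeBlock(B, EnableNarrowLdOpt);
  return Modified;
}
```

With this arrangement the predicate runs once per `runOnFunction` call regardless of how many blocks are visited. Sinking the check into `optimizeBlock()` would instead repeat the same query for every block.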