Download Raw Diff

Details

Reviewers

sebpop
rengolin
t.p.northover
gberry
jmolloy
MatzeB
evandro
mcrosier

Commits

rG2c8de192a1ea: [AArch64] Guard Misaligned 128-bit store penalty by subtarget feature
rL289845: [AArch64] Guard Misaligned 128-bit store penalty by subtarget feature

Summary

This patch checks that the SlowMisaligned128Store subtarget feature is set when penalizing misaligned stores in getMemoryOpCost TTI function.

Diff Detail

Repository: rL LLVM

Event Timeline

mssimpso updated this revision to Diff 81093.Dec 12 2016, 9:09 AM

mssimpso retitled this revision from to [AArch64] Add feature for disabling unaligned quadword store penalty.

mssimpso updated this object.

mssimpso added reviewers: MatzeB, rengolin, mcrosier, gberry, t.p.northover.

mssimpso added a subscriber: llvm-commits.

Herald added a subscriber: aemerson. · View Herald TranscriptDec 12 2016, 9:09 AM

Isn't this the same as "FeatureSlowMisaligned128Store"?

lib/Target/AArch64/AArch64Subtarget.cpp
74 ↗	(On Diff #81093)	Use target a target feature (or rather reuse the existing one) and let tablegen deal with a boolean flag instead of manually setting it here. (This switch is only needed for integer values because the tablegen logic only deals with booleans at the moment).

In D27677#620116, @MatzeB wrote:

Isn't this the same as "FeatureSlowMisaligned128Store"?

I wasn't aware of this, thanks! It looks like Cyclone is the only subtarget that sets FeatureSlowMisaligned128Store at the moment. To preserve existing behavior for all subtargets (except Kryo and Falkor), should we also set this feature for the others?

In D27677#620131, @mssimpso wrote:

In D27677#620116, @MatzeB wrote:

Isn't this the same as "FeatureSlowMisaligned128Store"?

I wasn't aware of this, thanks! It looks like Cyclone is the only subtarget that sets FeatureSlowMisaligned128Store at the moment. To preserve existing behavior for all subtargets (except Kryo and Falkor), should we also set this feature for the others?

If you set it for most of the other CPUs then you will change behavior of AArch64TargetLowering::allowsMisalignedMemoryAccesses() and that performStoreCombine() thing. It may very well be one of the many artifacts showing that the aarch64 target was initially designed for cyclone, however someone working on the other ARM chips should comment on that.

Right. I'll update the patch to just use isMisaligned128StoreSlow() in getMemoryOpCost, and add some more folks to the review. Thanks!

Addressed Matthias's comments. Thanks!

We now use the existing FeatureSlowMisaligned128Store instead of adding a new feature.
I added James and Sebastian to the review for opinions about non-Cyclone CPUs.

sebpop added a reviewer: evandro.Dec 12 2016, 12:20 PM

This feature is rather a subset of FeatureAvoidQuadLdStPairs, used in AArch64InstrInfo::isCandidateToMergeOrPair(), which effectively avoids any 128 load or store, aligned or misaligned.

I actually wonder if, for uniformity's sake, FeatureAvoidQuadLdStPairs should be renamed FeatureSlow256LoadStore. Thoughts?

BTW, is FeatureSlowMisaligned128Store really meant for v2i64 only and for no other 128-bit vector type?

In D27677#620254, @evandro wrote:

I actually wonder if, for uniformity's sake, FeatureAvoidQuadLdStPairs should be renamed FeatureSlow256LoadStore. Thoughts?

We can certainly discuss renaming that feature, but I think we should probably do so in a separate review/thread.

BTW, is FeatureSlowMisaligned128Store really meant for v2i64 only and for no other 128-bit vector type?

It doesn't look like the feature is limited to v2i64. The fact that the TTI function explicitly checks for an integer type seems wrong to me. But I think Matthias is probably right in that this heuristic was originally intended for Cyclone. It looks fairly clear to me that the heuristic should be guarded by FeatureSlowMisaligned128Store. This will preserve the existing functionality for Cyclone, but will change the other subtargets. Would you have any concerns for Exynos were this the case?

In D27677#621093, @mssimpso wrote:

It doesn't look like the feature is limited to v2i64. The fact that the TTI function explicitly checks for an integer type seems wrong to me. But I think Matthias is probably right in that this heuristic was originally intended for Cyclone. It looks fairly clear to me that the heuristic should be guarded by FeatureSlowMisaligned128Store. This will preserve the existing functionality for Cyclone, but will change the other subtargets. Would you have any concerns for Exynos were this the case?

Since this patch qualifies this transformation with FeatureSlowMisaligned128Store, I'm comfortable with verifying if Exynos needs this feature or not in a follow up patch, if needed.

In D27677#621103, @evandro wrote:

Since this patch qualifies this transformation with FeatureSlowMisaligned128Store, I'm comfortable with verifying if Exynos needs this feature or not in a follow up patch, if needed.

Great. Thanks very much for the feedback, Evandro.

Updated test case comment.

LGTM! Internally, the performance results for SPEC200X looked good with a few 1-3% gains. One minor regression reported on an internal benchmark.

lib/Target/AArch64/AArch64TargetTransformInfo.cpp
471 ↗	(On Diff #81442)	A comment to the Cyclone owners and not this particular patch: Is it possible for the Alignment to be > 16? If so, I'm wondering if the alignment check should be 'Alignment < 16'? Just something I saw in passing.
test/Analysis/CostModel/AArch64/store.ll
2 ↗	(On Diff #81442)	Perhaps put this RUN line first, since it's the default behavior. We might also consider changing the arm64 to aarch64 on both RUN lines.
6 ↗	(On Diff #81442)	I'm not a fan of the function name matching an IR keyword. store -> @store( ..or we might consider a different name for the test.

mcrosier accepted this revision.Dec 15 2016, 7:23 AM

mcrosier edited edge metadata.

This revision is now accepted and ready to land.Dec 15 2016, 7:23 AM

Addressed Chad's comments. Thanks!

Closed by commit rL289845: [AArch64] Guard Misaligned 128-bit store penalty by subtarget feature (authored by mssimpso). · Explain WhyDec 15 2016, 10:47 AM

This revision was automatically updated to reflect the committed changes.

evandro added a child revision: D27998: [AArch64] Consider all vector types for FeatureSlowMisaligned128Store.Dec 20 2016, 1:13 PM

This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Guard Misaligned 128-bit store penalty by subtarget feature
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 81618

llvm/trunk/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

llvm/trunk/test/Analysis/CostModel/AArch64/store.ll

This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Guard Misaligned 128-bit store penalty by subtarget featureClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 81618

llvm/trunk/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

llvm/trunk/test/Analysis/CostModel/AArch64/store.ll

[AArch64] Guard Misaligned 128-bit store penalty by subtarget feature
ClosedPublic