This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Guard Misaligned 128-bit store penalty by subtarget feature
ClosedPublic

Authored by mssimpso on Dec 12 2016, 9:09 AM.

Details

Summary

This patch checks that the SlowMisaligned128Store subtarget feature is set when penalizing misaligned stores in getMemoryOpCost TTI function.

Event Timeline

mssimpso updated this revision to Diff 81093.Dec 12 2016, 9:09 AM
mssimpso retitled this revision from to [AArch64] Add feature for disabling unaligned quadword store penalty.
mssimpso updated this object.
mssimpso added a subscriber: llvm-commits.
MatzeB edited edge metadata.Dec 12 2016, 10:57 AM

Isn't this the same as "FeatureSlowMisaligned128Store"?

lib/Target/AArch64/AArch64Subtarget.cpp
74 ↗(On Diff #81093)

Use target a target feature (or rather reuse the existing one) and let tablegen deal with a boolean flag instead of manually setting it here. (This switch is only needed for integer values because the tablegen logic only deals with booleans at the moment).

Isn't this the same as "FeatureSlowMisaligned128Store"?

I wasn't aware of this, thanks! It looks like Cyclone is the only subtarget that sets FeatureSlowMisaligned128Store at the moment. To preserve existing behavior for all subtargets (except Kryo and Falkor), should we also set this feature for the others?

Isn't this the same as "FeatureSlowMisaligned128Store"?

I wasn't aware of this, thanks! It looks like Cyclone is the only subtarget that sets FeatureSlowMisaligned128Store at the moment. To preserve existing behavior for all subtargets (except Kryo and Falkor), should we also set this feature for the others?

If you set it for most of the other CPUs then you will change behavior of AArch64TargetLowering::allowsMisalignedMemoryAccesses() and that performStoreCombine() thing. It may very well be one of the many artifacts showing that the aarch64 target was initially designed for cyclone, however someone working on the other ARM chips should comment on that.

Right. I'll update the patch to just use isMisaligned128StoreSlow() in getMemoryOpCost, and add some more folks to the review. Thanks!

mssimpso updated this revision to Diff 81123.Dec 12 2016, 12:10 PM
mssimpso retitled this revision from [AArch64] Add feature for disabling unaligned quadword store penalty to [AArch64] Guard Misaligned 128-bit store penalty by subtarget feature.
mssimpso updated this object.
mssimpso added reviewers: jmolloy, sebpop.
mssimpso marked an inline comment as done.

Addressed Matthias's comments. Thanks!

  • We now use the existing FeatureSlowMisaligned128Store instead of adding a new feature.
  • I added James and Sebastian to the review for opinions about non-Cyclone CPUs.
evandro edited edge metadata.EditedDec 12 2016, 12:50 PM

This feature is rather a subset of FeatureAvoidQuadLdStPairs, used in AArch64InstrInfo::isCandidateToMergeOrPair(), which effectively avoids any 128 load or store, aligned or misaligned.

I actually wonder if, for uniformity's sake, FeatureAvoidQuadLdStPairs should be renamed FeatureSlow256LoadStore. Thoughts?

BTW, is FeatureSlowMisaligned128Store really meant for v2i64 only and for no other 128-bit vector type?

I actually wonder if, for uniformity's sake, FeatureAvoidQuadLdStPairs should be renamed FeatureSlow256LoadStore. Thoughts?

We can certainly discuss renaming that feature, but I think we should probably do so in a separate review/thread.

BTW, is FeatureSlowMisaligned128Store really meant for v2i64 only and for no other 128-bit vector type?

It doesn't look like the feature is limited to v2i64. The fact that the TTI function explicitly checks for an integer type seems wrong to me. But I think Matthias is probably right in that this heuristic was originally intended for Cyclone. It looks fairly clear to me that the heuristic should be guarded by FeatureSlowMisaligned128Store. This will preserve the existing functionality for Cyclone, but will change the other subtargets. Would you have any concerns for Exynos were this the case?

It doesn't look like the feature is limited to v2i64. The fact that the TTI function explicitly checks for an integer type seems wrong to me. But I think Matthias is probably right in that this heuristic was originally intended for Cyclone. It looks fairly clear to me that the heuristic should be guarded by FeatureSlowMisaligned128Store. This will preserve the existing functionality for Cyclone, but will change the other subtargets. Would you have any concerns for Exynos were this the case?

Since this patch qualifies this transformation with FeatureSlowMisaligned128Store, I'm comfortable with verifying if Exynos needs this feature or not in a follow up patch, if needed.

Since this patch qualifies this transformation with FeatureSlowMisaligned128Store, I'm comfortable with verifying if Exynos needs this feature or not in a follow up patch, if needed.

Great. Thanks very much for the feedback, Evandro.

mssimpso updated this revision to Diff 81442.Dec 14 2016, 12:44 PM
mssimpso edited edge metadata.

Updated test case comment.

mcrosier edited edge metadata.Dec 15 2016, 7:23 AM

LGTM! Internally, the performance results for SPEC200X looked good with a few 1-3% gains. One minor regression reported on an internal benchmark.

lib/Target/AArch64/AArch64TargetTransformInfo.cpp
471

A comment to the Cyclone owners and not this particular patch:

Is it possible for the Alignment to be > 16? If so, I'm wondering if the alignment check should be 'Alignment < 16'? Just something I saw in passing.

test/Analysis/CostModel/AArch64/store.ll
2

Perhaps put this RUN line first, since it's the default behavior. We might also consider changing the arm64 to aarch64 on both RUN lines.

5

I'm not a fan of the function name matching an IR keyword.

store -> @store(

..or we might consider a different name for the test.

mcrosier accepted this revision.Dec 15 2016, 7:23 AM
mcrosier edited edge metadata.
This revision is now accepted and ready to land.Dec 15 2016, 7:23 AM
mssimpso updated this revision to Diff 81585.Dec 15 2016, 7:56 AM
mssimpso edited edge metadata.
mssimpso marked 2 inline comments as done.

Addressed Chad's comments. Thanks!

This revision was automatically updated to reflect the committed changes.