This patch checks that the SlowMisaligned128Store subtarget feature is set when penalizing misaligned stores in getMemoryOpCost TTI function.
Isn't this the same as "FeatureSlowMisaligned128Store"?
|74 ↗||(On Diff #81093)|
Use target a target feature (or rather reuse the existing one) and let tablegen deal with a boolean flag instead of manually setting it here. (This switch is only needed for integer values because the tablegen logic only deals with booleans at the moment).
I wasn't aware of this, thanks! It looks like Cyclone is the only subtarget that sets FeatureSlowMisaligned128Store at the moment. To preserve existing behavior for all subtargets (except Kryo and Falkor), should we also set this feature for the others?
If you set it for most of the other CPUs then you will change behavior of AArch64TargetLowering::allowsMisalignedMemoryAccesses() and that performStoreCombine() thing. It may very well be one of the many artifacts showing that the aarch64 target was initially designed for cyclone, however someone working on the other ARM chips should comment on that.
Addressed Matthias's comments. Thanks!
- We now use the existing FeatureSlowMisaligned128Store instead of adding a new feature.
- I added James and Sebastian to the review for opinions about non-Cyclone CPUs.
This feature is rather a subset of FeatureAvoidQuadLdStPairs, used in AArch64InstrInfo::isCandidateToMergeOrPair(), which effectively avoids any 128 load or store, aligned or misaligned.
I actually wonder if, for uniformity's sake, FeatureAvoidQuadLdStPairs should be renamed FeatureSlow256LoadStore. Thoughts?
We can certainly discuss renaming that feature, but I think we should probably do so in a separate review/thread.
BTW, is FeatureSlowMisaligned128Store really meant for v2i64 only and for no other 128-bit vector type?
It doesn't look like the feature is limited to v2i64. The fact that the TTI function explicitly checks for an integer type seems wrong to me. But I think Matthias is probably right in that this heuristic was originally intended for Cyclone. It looks fairly clear to me that the heuristic should be guarded by FeatureSlowMisaligned128Store. This will preserve the existing functionality for Cyclone, but will change the other subtargets. Would you have any concerns for Exynos were this the case?
LGTM! Internally, the performance results for SPEC200X looked good with a few 1-3% gains. One minor regression reported on an internal benchmark.
A comment to the Cyclone owners and not this particular patch:
Is it possible for the Alignment to be > 16? If so, I'm wondering if the alignment check should be 'Alignment < 16'? Just something I saw in passing.
Perhaps put this RUN line first, since it's the default behavior. We might also consider changing the arm64 to aarch64 on both RUN lines.
I'm not a fan of the function name matching an IR keyword.
store -> @store(
..or we might consider a different name for the test.