This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Consider all vector types for FeatureSlowMisaligned128Store
ClosedPublic

Authored by evandro on Dec 20 2016, 1:11 PM.

Details

Summary

The original code considered only v2i64 as slow for this feature. This patch consider all 128-bit long vector types as slow candidates.

It's not clear why Cyclone, the first processor to use this feature, restricted it to just v2i64 and excluded the other 128-bit vector types. But, in internal tests, extending this feature to all 128-bit vector types resulted in an overall improvement of 1% on Exynos M1.

Diff Detail

Repository
rL LLVM

Event Timeline

evandro updated this revision to Diff 82137.Dec 20 2016, 1:11 PM
evandro retitled this revision from to [AArch64] Consider all vector types for FeatureSlowMisaligned128Store.
evandro updated this object.
evandro set the repository for this revision to rL LLVM.
evandro added subscribers: MatzeB, jmolloy.
mssimpso edited edge metadata.Dec 20 2016, 1:54 PM

This makes sense to me, but it's probably a good idea to have someone with Cyclone knowledge comment.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
474–478

Would it make sense to check the legalized type here? Something like LT.second.is128BitVector()?

rengolin accepted this revision.Dec 22 2016, 10:31 AM
rengolin added a reviewer: rengolin.

This makes sense. LGTM. Thanks!

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
474–478

I'm not sure the type would make any difference on the store. All that matters is that this is a 128-bit store and the required alignment is larger than that. The previous conditional was weird, I think.

This revision is now accepted and ready to land.Dec 22 2016, 10:31 AM

I'll wait a few more days for a Cyclone maintainer to comment before committing this patch.

mssimpso added inline comments.Dec 23 2016, 1:12 PM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
474–478

I guess I was thinking about the case where Src is something like a <4 x i64>. I believe getPrimativeSizeInBits would be 256. But this should get legalized into two 128 bit stores. So isn't the size of the legalized type what matters?

No comments on how this change could affect Cyclone. Does anyone object if I commit this patch in 12h?

mssimpso requested changes to this revision.Jan 5 2017, 4:36 PM
mssimpso edited edge metadata.

Evandro,

Unless I'm missing something, I still think the size of the legalized type is what matters. Do you disagree? Also, please add a test case for the <4 x i64> store I mentioned previously, so we know we have the right cost.

This revision now requires changes to proceed.Jan 5 2017, 4:36 PM

... Also, please add a test case for the <4 x i64> store I mentioned previously, so we know we have the right cost.

Will do.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
474–478

I believe that at this point the type is already legalized, but I'll double check it.

mssimpso added inline comments.Jan 6 2017, 8:53 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
474–478

LT.second is the legalized version of Src. As I previously commented, you should just use LT.second.is128BitVector(). The cost of <4 x i64> should be 24. LT.first is the legalization cost (the number of stores, 2 in this case). So 2*2*6 = 24.

ab added a subscriber: ab.

Digging around, here's my summary of the various comments:

  • with FeatureSlowMisaligned128Store, 128-bit misaligned stores are split because they're slow
  • v2i64 is special because it's generated by memcpy lowering; splitting was disabled for that type because it turned out to cause regressions
  • to minimize the amount of other (non-memcpy) misaligned v2i64 stores, the cost model was taught to penalize them heavily; otherwise, these aren't split and cause slowdowns.

From that perspective, penalizing other 128-bit types sounds reasonable to me, but I don't have any concrete cyclone numbers.

Thanks for digging it up, @ab.

evandro updated this revision to Diff 83859.Jan 10 2017, 1:38 PM
evandro edited edge metadata.
evandro marked an inline comment as done.

Hey Evandro,

It looks like you may have forgotten the test case for unlegalized types (e.g., <4 x i64>).

evandro updated this revision to Diff 83878.Jan 10 2017, 2:47 PM
evandro edited edge metadata.

Included 256-bit vector types in the test.

mssimpso accepted this revision.Jan 10 2017, 2:52 PM
mssimpso edited edge metadata.

LGTM now. Thanks!

This revision is now accepted and ready to land.Jan 10 2017, 2:52 PM
This revision was automatically updated to reflect the committed changes.