Page MenuHomePhabricator

[AArch64] Add support to loop vectorization for non temporal loads
Needs ReviewPublic

Authored by zjaffal on Aug 16 2022, 6:36 AM.

Details

Reviewers
fhahn
dmgreen
ab
Summary

Currently, AArch64 doesn't support vectorization for non temporal loads because isLegalNTLoad is not implemented for the target.
This patch applies similar functionality as D73158 but for non temporal loads

Diff Detail

Event Timeline

zjaffal created this revision.Aug 16 2022, 6:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 16 2022, 6:36 AM
zjaffal requested review of this revision.Aug 16 2022, 6:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 16 2022, 6:36 AM

Could you please update the description to provide some details about the fix? It will also require building on top of D131773 to also add support for generating LDNP for smaller types than 256 bits.

dmgreen added inline comments.Aug 16 2022, 9:11 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
329

This can be dyn_cast<FixedVectorType>, which avoids the need for the cast<..> below. The same thing can be done above in isLegalNTStore too.

zjaffal edited the summary of this revision. (Show Details)Aug 16 2022, 9:34 AM
zjaffal updated this revision to Diff 453054.Aug 16 2022, 9:56 AM

Updating D131964: [AArch64] Add support to loop vectorization for non temporal loads

zjaffal marked an inline comment as done.Aug 16 2022, 9:57 AM
Matt added a subscriber: Matt.Aug 16 2022, 1:56 PM

This sounds good to me if Florian has not further comments.

llvm/test/Transforms/LoopVectorize/AArch64/nontemporal-load-store.ll
283

Is it worth adding a quick check line for the code that is produced? ; CHECK: = load <16 x i8>, <16 x i8>* {{.*}}, align 1, !nontemporal !0 so that the new load is tested.

fhahn added inline comments.Aug 19 2022, 11:05 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
331

Is there much difference between the load and store version? Could they just share the same code?

I think this would also require support for generating LDNP for types smaller than 256 bits. @zjaffal is currently looking into this.

zjaffal updated this revision to Diff 460109.Wed, Sep 14, 8:32 AM

Add a check for isLittleEndian