This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Add support to loop vectorization for non temporal loads
ClosedPublic

Authored by zjaffal on Aug 16 2022, 6:36 AM.

Details

Summary

Currently, AArch64 doesn't support vectorization for non temporal loads because isLegalNTLoad is not implemented for the target.
This patch applies similar functionality as D73158 but for non temporal loads

Diff Detail

Event Timeline

zjaffal created this revision.Aug 16 2022, 6:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 16 2022, 6:36 AM
zjaffal requested review of this revision.Aug 16 2022, 6:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 16 2022, 6:36 AM

Could you please update the description to provide some details about the fix? It will also require building on top of D131773 to also add support for generating LDNP for smaller types than 256 bits.

dmgreen added inline comments.Aug 16 2022, 9:11 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
329

This can be dyn_cast<FixedVectorType>, which avoids the need for the cast<..> below. The same thing can be done above in isLegalNTStore too.

zjaffal edited the summary of this revision. (Show Details)Aug 16 2022, 9:34 AM
zjaffal updated this revision to Diff 453054.Aug 16 2022, 9:56 AM

Updating D131964: [AArch64] Add support to loop vectorization for non temporal loads

zjaffal marked an inline comment as done.Aug 16 2022, 9:57 AM
Matt added a subscriber: Matt.Aug 16 2022, 1:56 PM

This sounds good to me if Florian has not further comments.

llvm/test/Transforms/LoopVectorize/AArch64/nontemporal-load-store.ll
283–284

Is it worth adding a quick check line for the code that is produced? ; CHECK: = load <16 x i8>, <16 x i8>* {{.*}}, align 1, !nontemporal !0 so that the new load is tested.

fhahn added inline comments.Aug 19 2022, 11:05 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
331

Is there much difference between the load and store version? Could they just share the same code?

I think this would also require support for generating LDNP for types smaller than 256 bits. @zjaffal is currently looking into this.

zjaffal updated this revision to Diff 460109.Sep 14 2022, 8:32 AM

Add a check for isLittleEndian

zjaffal updated this revision to Diff 463866.Sep 29 2022, 6:19 AM

Refactor common code between isLegalNTStore and isLegalNTLoad and move it to a seperate function.
Add extra checks in nontemporal-load-store.ll to look for the generated load instruction

fhahn added inline comments.Sep 30 2022, 4:53 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
324

This could also be folded into isLegalNTStoreLoad, same for return BaseT::isLegalNTStore(DataType, Alignment);?

zjaffal updated this revision to Diff 464297.Sep 30 2022, 9:00 AM

fold isLegalNTStore logic into isLegalNTStoreLoad

fhahn added inline comments.Oct 3 2022, 1:59 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
312

This should be also moved to isLegalNTStoreLoad

zjaffal added inline comments.Oct 3 2022, 2:05 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
312

This line only or the whole comment block ?

zjaffal updated this revision to Diff 464632.Oct 3 2022, 2:11 AM

move comment to isLegalNTStoreLoad

fhahn accepted this revision.Oct 3 2022, 5:43 AM

LGTM, thanks!

This revision is now accepted and ready to land.Oct 3 2022, 5:43 AM
This revision was landed with ongoing or failed builds.Oct 3 2022, 9:07 AM
This revision was automatically updated to reflect the committed changes.