This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Use rq gather/scatters for smaller v4 vectors
ClosedPublic

Authored by dmgreen on Jun 4 2021, 1:04 AM.

Details

Summary

A pointer will always fit into an i32, so an rq offset gather/scatter can be used for v4i8 and v4i16 gathers, using a base of 0 and the Ptr as the offsets. The rq gather can then correctly extend the type, allowing us to use the gathers without falling back to scalarizing.

This patch rejigs tryCreateMaskedGatherOffset in the MVEGatherScatterLowering pass to decompose the Ptr into Base:0 + Offset:Ptr (with a scale of 1) if the Ptr could not be decomposed from a GEP. v4i32 gathers already use qi gathers; this extends that to v4i8 and v4i16 gathers using the extending rq variants.
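The fallback decomposition can be illustrated with a small standalone model. This is only a sketch and not code from the patch: the names gatherBytesRQ and gatherBytesFromPtrs, the byte-vector "memory", and the choice of zero extension are all assumptions made for illustration. It shows that gathering with Base 0, Offset Ptr and a scale of 1 addresses exactly the bytes the pointers name, with each byte widened into a 32-bit lane.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a 32-bit address space: an "address" is an index into this
// byte vector. It stands in for real memory and is purely illustrative.
using Memory = std::vector<std::uint8_t>;
using Lanes = std::array<std::uint32_t, 4>;

// Hypothetical model of an extending byte gather with a scalar base and a
// vector of offsets: each lane loads one byte from Base + Offsets[I] * Scale
// and zero-extends it to 32 bits (zero extension is an assumption here).
Lanes gatherBytesRQ(const Memory &Mem, std::uint32_t Base,
                    const Lanes &Offsets, std::uint32_t Scale) {
  Lanes Result{};
  for (std::size_t I = 0; I < Result.size(); ++I) {
    std::uint32_t Addr = Base + Offsets[I] * Scale;
    assert(Addr < Mem.size() && "address outside the toy memory");
    Result[I] = Mem[Addr]; // implicit zero extension from 8 to 32 bits
  }
  return Result;
}

// The fallback described in the summary: when the pointer vector cannot be
// split through a GEP, treat it as Base 0 + Offset Ptr with a scale of 1.
Lanes gatherBytesFromPtrs(const Memory &Mem, const Lanes &Ptrs) {
  return gatherBytesRQ(Mem, /*Base=*/0, /*Offsets=*/Ptrs, /*Scale=*/1);
}
```

Because the pointers are 32 bits wide, they fit directly in the offset lanes, which is what makes the Base 0 form usable for the narrower element types.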

Event Timeline

dmgreen created this revision. Jun 4 2021, 1:04 AM
dmgreen requested review of this revision. Jun 4 2021, 1:04 AM
SjoerdMeijer added inline comments. Jun 4 2021, 1:59 AM
llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp
84

Typo?
to -> too.

222

Your description:

decompose Ptr into Base:0 + Offset:Ptr (with a scale of 1), if the Ptr could not be decomposed from a GEP. v4i32 gathers will already use qi gathers, this extends that to v4i8 and v4i16 gathers using the extending rq variants. A pointer will always fit into an i32, so a rq offset gather/scatter can be used with v4i8 and v4i16 gathers, using a base of 0 and the Ptr as the offsets.

would be useful here as a comment to describe decomposePtr.

239

Was wondering if this should be >= 32. Or does that not make any sense, and we never see 64-bit types/values here?

dmgreen updated this revision to Diff 350185. Jun 7 2021, 12:12 AM
dmgreen marked an inline comment as done.

Update comment

llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp
222

Yeah, that was meant to be on the declaration above, the sentence just didn't get that far.

239

We have never supported i64 gathers in this pass before, mostly because the vectorizer doesn't usually vectorize i64s. We will only get this far with i32, i16 or i8 element types.

SjoerdMeijer accepted this revision. Jun 11 2021, 1:39 AM

Cheers, looks like a good change to me.

This revision is now accepted and ready to land. Jun 11 2021, 1:39 AM
This revision was landed with ongoing or failed builds. Jun 15 2021, 9:06 AM
This revision was automatically updated to reflect the committed changes.