This is an archive of the discontinued LLVM Phabricator instance.

Implement vector rotations on AArch64 using shift-insert instructions.
Abandoned · Public

Authored by resistor on Jul 28 2023, 10:23 PM.

Details

Reviewers
efriedma
Summary

XFAIL the RAX1 pattern-matching test for the time being, as this change
breaks the pattern matching for it. RAX1 still works properly via
intrinsics.
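As a rough illustration of the technique in the title, one way to express a per-lane rotate with shift-insert instructions is an unsigned shift right (USHR) followed by a shift left and insert (SLI). The sketch below models a single 64-bit lane in scalar C++. The helpers `sli64` and `rotl_via_sli` are hypothetical names, and this is not the code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one 64-bit lane of AArch64 SLI (shift left and insert):
// src << shift replaces every bit of dst except the low `shift` bits.
uint64_t sli64(uint64_t dst, uint64_t src, unsigned shift) {
  assert(shift < 64 && "SLI immediate is 0..63 for 64-bit lanes");
  uint64_t keep = (shift == 0) ? 0 : ((uint64_t{1} << shift) - 1);  // dst bits kept
  return (src << shift) | (dst & keep);
}

// Rotate left by c (0 < c < 64) with two instructions:
//   USHR t, x, #(64 - c)   ; t = x >> (64 - c), so only the low c bits can be set
//   SLI  t, x, #c          ; t = (x << c) | (t & low-c-bits mask)
uint64_t rotl_via_sli(uint64_t x, unsigned c) {
  assert(c > 0 && c < 64 && "model covers rotate amounts 1..63");
  uint64_t t = x >> (64 - c);  // USHR
  return sli64(t, x, c);       // SLI
}
```

Because USHR leaves only the low c bits of t populated, and SLI preserves exactly those bits, the OR inside SLI reassembles the rotated value without a separate ORR.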

Event Timeline

resistor created this revision. Jul 28 2023, 10:23 PM
Herald added a project: Restricted Project. Jul 28 2023, 10:23 PM
resistor requested review of this revision. Jul 28 2023, 10:23 PM

Maybe instead of specifically looking at rotates, it makes sense to try to generically pattern-match (orr x, (srl y, C)) using known bits? We already have code along these lines in tryLowerToSLI(); it should be straightforward to make it more generic.

If that isn't straightforward, we can go with this approach for now; there isn't really anything wrong with it.
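A minimal sketch of the known-bits idea, assuming 64-bit lanes: (orr x, (srl y, C)) can be emitted as SRI x, y, #C whenever the low 64 - C bits of x are known to be zero, because SRI only overwrites those bits and keeps the top C bits of the destination. The names `sri64` and `try_or_srl_as_sri` are hypothetical, and `known_zero_x` stands in for a KnownBits zero mask; this is a value-level model, not tryLowerToSLI() itself.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Scalar model of one 64-bit lane of AArch64 SRI (shift right and insert):
// the top `shift` bits of dst are kept, the rest come from src >> shift.
uint64_t sri64(uint64_t dst, uint64_t src, unsigned shift) {
  assert(shift > 0 && shift < 64 && "model covers immediates 1..63");
  uint64_t insert_mask = ~uint64_t{0} >> shift;  // bits written by SRI
  return (src >> shift) | (dst & ~insert_mask);
}

// Hypothetical known-bits rule for (or x, (srl y, c)): if every bit that SRI
// would overwrite is known to be zero in x, the OR equals SRI(x, y, c).
// Assumes known_zero_x is consistent with x, i.e. (x & known_zero_x) == 0.
std::optional<uint64_t> try_or_srl_as_sri(uint64_t x, uint64_t known_zero_x,
                                          uint64_t y, unsigned c) {
  uint64_t overwritten = ~uint64_t{0} >> c;
  if ((known_zero_x & overwritten) != overwritten)
    return std::nullopt;  // cannot prove the low bits of x are zero
  return sri64(x, y, c);
}
```

A rotate right by C fits this shape: x = (shl y, 64 - C) has its low 64 - C bits known zero, so the rotate needs no special case.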

XFAIL the RAX1 pattern matching test for the time being, as this breaks the pattern matching for it.

It should be easy enough to update the pattern, or explicitly handle this case in your new code.
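For reference when updating the pattern: RAX1 (from the SHA3 extension) computes, in each 64-bit lane, Vn XOR (Vm rotated left by 1). A scalar model follows; the helper name `rax1` and the `V2x64` type are hypothetical. Whichever node the rotate ends up lowered to, the pattern (or explicit handling in the new code) has to recognize this lane-wise computation.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V2x64 = std::array<uint64_t, 2>;

// Scalar model of RAX1 Vd.2D, Vn.2D, Vm.2D: each lane is n ^ rotl(m, 1).
V2x64 rax1(const V2x64& n, const V2x64& m) {
  V2x64 d{};
  for (std::size_t i = 0; i < d.size(); ++i)
    d[i] = n[i] ^ ((m[i] << 1) | (m[i] >> 63));  // rotate left by 1, then XOR
  return d;
}
```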

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp, line 6133:

Since we're dealing with power-of-two integer sizes here, there's no need to check for negative shift amounts. You can just mask the shift amount: Splat.zextOrTrunc(32) & APInt::getLowBitsSet(32, Log2_32(LaneWidth)).
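The suggested masking amounts to reducing the splatted amount modulo the lane width. A plain-integer sketch, assuming a power-of-two lane width of at most 64; the function name `mask_rotate_amount` is hypothetical, and it models the APInt expression rather than calling it.

```cpp
#include <cassert>
#include <cstdint>

// Reduce a splatted rotate amount to [0, lane_width) for a power-of-two lane
// width. getLowBitsSet(32, Log2_32(lane_width)) has the value lane_width - 1,
// so the AND is the amount modulo lane_width, which also maps a two's
// complement "negative" amount to the equivalent positive rotate.
uint32_t mask_rotate_amount(uint64_t splat, unsigned lane_width) {
  assert(lane_width != 0 && (lane_width & (lane_width - 1)) == 0 &&
         "lane width must be a power of two");
  uint32_t amount = static_cast<uint32_t>(splat);  // like zextOrTrunc(32)
  return amount & (lane_width - 1);
}
```

For example, a rotate left by -1 on 32-bit lanes becomes a rotate left by 31, with no separate sign check.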

dmgreen added a subscriber: dmgreen. Aug 8 2023, 1:44 PM

Yeah, we should do this using known bits (and at the very least it should apply to all funnel shifts, not just rotates). I have some patches; I can try to put them up for review when I get some time to check the details.
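To see why funnel shifts fit the same lowering, note that a rotate is just fshl(x, x, c). The sketch below models LLVM's fshl on a single 64-bit lane and builds the same result from a USHR and an SLI; the helper names `fshl64` and `fshl64_via_sli` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// fshl(a, b, c) on one 64-bit lane: the high 64 bits of the 128-bit
// concatenation a:b shifted left by c. A rotate is fshl(x, x, c).
uint64_t fshl64(uint64_t a, uint64_t b, unsigned c) {
  c &= 63;  // llvm.fshl treats the shift amount modulo the bit width
  return c == 0 ? a : (a << c) | (b >> (64 - c));
}

// The same funnel shift as a shift-insert sequence, for 0 < c < 64:
//   USHR t, b, #(64 - c)   ; low c bits of t = high c bits of b
//   SLI  t, a, #c          ; t = (a << c) | (t & low-c-bits mask)
uint64_t fshl64_via_sli(uint64_t a, uint64_t b, unsigned c) {
  assert(c > 0 && c < 64 && "model covers amounts 1..63");
  uint64_t t = b >> (64 - c);             // USHR
  uint64_t low = (uint64_t{1} << c) - 1;  // bits SLI keeps from t
  return (a << c) | (t & low);            // SLI
}
```

The only difference from the rotate case is that the USHR source is b instead of a, so nothing in the shift-insert sequence depends on the two operands being equal.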

It looks like c782e3497d catches all the testcases in this patch?

resistor abandoned this revision. Aug 8 2023, 6:56 PM

It looks like c782e3497d catches all the testcases in this patch?

I was surprised that it helped so much; it was only going to be part 1 of 3. An ASRA can already work well in a lot of cases.