This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Lower shuffles to permute instructions: rev/revb/revh/revw
ClosedPublic

Authored by wwei on Dec 2 2021, 7:15 AM.

Details

Summary

Attempt to lower a shuffle as a permute instruction(rev/revb/revh/revw) for fixed length SVE.

Diff Detail

Event Timeline

wwei created this revision.Dec 2 2021, 7:15 AM
wwei requested review of this revision.Dec 2 2021, 7:15 AM
Matt added a subscriber: Matt.Dec 2 2021, 9:34 AM
wwei added a comment.Dec 7 2021, 12:30 AM

@paulwalker-arm, could you help to review this patch and D113376?

@paulwalker-arm, could you help to review this patch and D113376?

Sorry @wwei, I was hoping to review both patches today but that hasn't really worked out. I'll take a proper look tomorrow. One observation: The patch could do with some floating point tests to verify the BITCAST logic for those types.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
19322–19323

Calling LowerToPredicatedOp feels like overkill here compared to DAG.getNode(RevOp, DL, NewVT, ..., DAG.getUNDEF(NewVT)

wwei updated this revision to Diff 392701.Dec 8 2021, 3:27 AM

Add some float test cases

wwei added inline comments.Dec 8 2021, 4:05 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
19322–19323

Since there's no unpredicated revb/revh/revw for SVE, LowerToPredicatedOp can help to construct a predicate operand, and can handle merge passthru opcode also. If using DAG.getNode(RevOp, DL, NewVT, ..., DAG.getUNDEF(NewVT), we need some extra code to get a predicate operand and pass correct merge passthru operands.

paulwalker-arm added inline comments.Dec 8 2021, 10:46 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
19322–19323

That's a good point, fair enough.

19332

Is it valid to use class Instruction methods this far into code generation? That said, ShuffleVectorInst::isReverseMask looks potentially unsafe in this context anyway because given a 4 element mask it'll return true for both 3, 2, 1, 0 and 7, 6, 5, 4 but as mentioned below ISD::VECTOR_REVERSE only supports a single operand and so you need to know specifically which of these scenarios apply.

Do you care about the 7, 6, 5, 4 case? I can see test_revv8i32v8i32 is a test for this but I think that is being simplified to a 3, 2, 1, 0 based shuffle before you get here.

I'll note that ARMISelLowering.cpp has the isReverseMask helper function that could be useful.

19333

This looks wrong because ISD::VECTOR_REVERSE only takes a single operand? I presume Op2 it just been ignored and there's nothing in getNode to ensure only a single operand.

llvm/test/CodeGen/AArch64/sve-fixed-length-permute-rev.ll
203–204

I originally wrote a comment asking why the i32 case emitted worse code than the f32 one but then spotted the attribute difference, so this seems more like a negative test. It's worth adding a function comment to make this clear to the user, which also goes for the other "negative" tests. Perhaps it's also worth adding _vl256 to functions where vscale_range is set just to drive the point home?

271

Is there a need for fadd here? I'm kind of assuming that without it the float shuffles are being converted to integer ones and thus you lost test coverage? but then I can see test_revv8f32 doesn't need it so figured I'd ask.

wwei updated this revision to Diff 393171.Dec 9 2021, 8:16 AM
wwei added inline comments.Dec 9 2021, 8:38 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
19332

Maybe we use ShuffleVectorInst::isReverseMask here will be better, because it will call isSingleSourceMask to ensure the shuffle mask chooses elements from exactly one source vector(isReverseMask in ARMISelLowering.cpp will not check this).
I tried various test case forms, but I couldn’t construct this kind(7, 6, 5, 4) of case. It would always be converted to 3, 2, 1, 0 form. To be on the safe side, I still added a check-- Op2.isUndef(), and Op1 will be the only valid operand for VECTOR_REVERSE

19333

yeah, you're right. I removed Op2 since it will always be Undef

llvm/test/CodeGen/AArch64/sve-fixed-length-permute-rev.ll
203–204

I rearranged the order of the test cases, sorted according to attributes, and added some comments for the rev instructions.

271

fadd is not needed, I removed it

wwei added inline comments.Dec 10 2021, 4:25 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
19332

And for REVB/REVH/REVW cases, I added Op2.isUndef() check too.

paulwalker-arm accepted this revision.Dec 14 2021, 7:20 AM

Functionally this looks good but I've added a few stylistic comments below to consider before committing. I found the tests harder to follow than necessary as it wasn't always clear what the intent of a specific test was, especially those when rev/b/h/w instructions are not emitted. More comments would help and I also think having two separate test files, one for rev (permute-rev) and another for revb/revh/revw (permute-rev-elts) would be advantageous.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
19311

In this instance I don't believe this is required because isREVMask does what you need. It's only the uses of ShuffleVectorInst::isReverseMask that need extra protection, which I see you've covered below.

19322–19324

Personally I think the following looks better:

Op = DAG.getNode(ISD::BITCAST, DL, NewVT, Op1);
Op = LowerToPredicatedOp(Op, DAG, RevOp);
Op = DAG.getNode(ISD::BITCAST, DL, ContainerVT, Op);

but then I'm also not a fan of how the function's input parameter Op is overwritten.

llvm/test/CodeGen/AArch64/sve-fixed-length-permute-rev.ll
98–101

This test either looks misplaced (belongs with the other rev tests towards the bottom of this file?) or is perhaps redundant?

223

Can this be moved to the bottom with the other attributes?

284–286

This test looks like it belongs in the top half after test_revhv32i16? (i.e. so it part of the other element shuffle tests as apposed to this blocks of tests which is testing whole vector reversals).

This revision is now accepted and ready to land.Dec 14 2021, 7:20 AM
wwei added a comment.Dec 15 2021, 6:00 AM

@paulwalker-arm Thanks for your comments, the code has been modified based on your review comments, and the test file has been modified too, adding some necessary comments