This is an archive of the discontinued LLVM Phabricator instance.

[DAG][PowerPC] Combine shuffle(bitcast(X), Mask) to bitcast(shuffle(X, Mask'))
ClosedPublic

Authored by dmgreen on Apr 14 2022, 9:21 AM.

Details

Summary

If the mask is made up of elements that form a mask in the higher type we can convert shuffle(bitcast into the bitcast type, simplifying the instruction sequence. A v4i32 2,3,0,1 for example can be treated as a 1,0 v2i64 shuffle. This helps clean up some of the AArch64 concat load combines, along with helping simplify a number of other tests.

The PowerPC combine for v16i8 splat vector loads needed some fixes to keep it working for v16i8 vectors. This improves the handling of v2i64 shuffles to match too, hopefully improving them in general.

Diff Detail

Event Timeline

dmgreen created this revision.Apr 14 2022, 9:21 AM
dmgreen requested review of this revision.Apr 14 2022, 9:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 14 2022, 9:21 AM
ktkachov added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21969

typo "back"

This looks like a partial inverse of VectorCombine::foldBitcastShuf(). Would it help to limit that transform instead of or in addition to this?

The x86 diffs seem neutral, but it's a little surprising that we get any changes given that we have a bunch of code to combine shuffles that allows peeking through bitcasts. @RKSimon may have more to say on that.

RKSimon added inline comments.Apr 16 2022, 7:05 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21968

We need a TLI.isShuffleMaskLegal check?

22381

We already have this - maybe we'd be better off adding support for binary bitcasted shuffles here?

This looks like a partial inverse of VectorCombine::foldBitcastShuf(). Would it help to limit that transform instead of or in addition to this?

This is coming from the expansion of concats as a DAG combine (see 1ba8f4f67dcf52) but this seemed to help in general. In that insert-extend.ll case there are shuffle(bitcast(buildvectors involved, but it sounds beneficial to canonicalize towards the larger type if the shuffles are otherwise equivalent.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21968

I can add one. I'm pretty sure that isShuffleMaskLegal really means "is this shuffle cheap", with the way it is used at the moment. We can always convert a shuffle back (providing we haven't made anything less undef), so it might not really need to be cheap.

22381

Do you mean move the combineShuffleOfBitcast call here? I can do that.
This looks like it's performing a shuffle(bitcast(shuffle(.. combine.

dmgreen retitled this revision from [DAG] Combine shuffle(bitcast(X), Mask) to bitcast(shuffle(X, Mask')) to [DAG][PowerPC] Combine shuffle(bitcast(X), Mask) to bitcast(shuffle(X, Mask')).Apr 20 2022, 2:36 AM

Ping. Any thoughts on this, especially the PowerPC parts?

RKSimon added inline comments.May 1 2022, 8:24 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21967

If we moved scaleShuffleElements from X86ISelLowering.cpp inside ShuffleVectorSDNode as a static helper (like commuteMask) - could we reuse it here?

dmgreen added inline comments.May 2 2022, 7:31 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21967

Long story short... yes I think it would work, but it handles the X86 specific SM_SentinelZero and the way undef is propagated is different. It could take extra args to specify those behaviours.

But then I found that llvm::widenShuffleMaskElts already does what we want here, so I've used that instead of this loop.

dmgreen updated this revision to Diff 426416.May 2 2022, 7:32 AM
RKSimon added inline comments.May 2 2022, 8:59 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21925

Can this can be dropped now?

dmgreen updated this revision to Diff 426447.May 2 2022, 10:41 AM

Remove little-endian restriction, which seems OK.

LGTM - @nemanjai do you have any comments?

shchenz added inline comments.May 4 2022, 9:09 PM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9875–9880

Can we integrate splat load check for 2 elements to isSplatShuffleMask() and getSplatIdxForPPCMnemonics()?

dmgreen added inline comments.May 5 2022, 12:11 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9875–9880

Yeah I think I can make that work. It is used in more places, but it shouldn't be an issue if they are only handling v16i8 types.

dmgreen updated this revision to Diff 427217.May 5 2022, 12:14 AM
dmgreen edited the summary of this revision. (Show Details)

Update isSplatShuffleMask and getSplatIdxForPPCMnemonics for v2i64.

shchenz accepted this revision as: shchenz.May 5 2022, 3:20 AM

Thanks. The PowerPC part changes are improvements and LGTM.

But please wait for some days in case @nemanjai has some comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
2140

I think we may need to handle all other types(1/2/4/8 bytes) of splat load for shuffle_vector with same masks in all lanes, like what we do for build_vector on PowerPC. But that should not be this patch's scope.

This revision is now accepted and ready to land.May 5 2022, 3:20 AM
dmgreen added a comment.EditedMay 6 2022, 2:24 AM

Thanks. Let me know if there are any problems.

This revision was landed with ongoing or failed builds.May 6 2022, 2:50 AM
This revision was automatically updated to reflect the committed changes.

I am sorry that I didn't get to this within a more reasonable time. This LGTM for PPC and thanks for the patch.

dyung added a subscriber: dyung.May 8 2022, 9:28 PM

@dmgreen, just a heads up that we are seeing an infinite loop in the compiler on an internal test which I have bisected to this change. I'm working on a repro at the moment and will post it when I can get it.

dyung added a comment.May 8 2022, 11:10 PM

@dmgreen, just a heads up that we are seeing an infinite loop in the compiler on an internal test which I have bisected to this change. I'm working on a repro at the moment and will post it when I can get it.

I have filed issue #55345 for this problem and put a repro in the bug.

I've stumbled on the (seemingly) infinite execution with this patch as well. I found it with code generated by llvm-stress.
So

llc -march=x86-64 -mcpu=corei7 -o /dev/null foo.ll

hangs in

[2022-05-09 08:43:33.796049513] 0x80f9960     Executing Pass 'X86 DAG->DAG Instruction Selection' on Function 'autogen_SD18620'...

Thanks for the reports. They were hopefully fixed in rG02f851950244. Let me know if anything else comes up.