As mentioned on D127115, this patch attempts to recognise shuffle masks that could be simplified to an AND mask. We already have a similar transform that will fold AND -> 'clear mask' shuffle, but this patch handles cases where the referenced elements are not from the same lane indices but are known to be zero.
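To make the idea concrete, here is a minimal standalone sketch of the check described above. It is not the DAGCombiner code from the patch: the helper name `shuffleToAndMask`, the 32-bit lane type, and the plain `std::vector` inputs are all assumptions chosen for illustration (C++17). A lane can be expressed with an AND mask if it either keeps the first operand's element in place, or reads any element that is known to be zero.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper for illustration only; not an LLVM API.
// mask:  shuffle mask for N lanes. 0..N-1 selects from operand A,
//        N..2N-1 selects from operand B, and a negative value is undef.
// zeroA/zeroB: true where that lane of the operand is known to be zero.
// Returns the per-lane AND mask to apply to A (all-ones keeps the lane,
// zero clears it), or std::nullopt if the shuffle is not equivalent to
// "A & constant".
std::optional<std::vector<uint32_t>> shuffleToAndMask(
    const std::vector<int>& mask,
    const std::vector<bool>& zeroA,
    const std::vector<bool>& zeroB) {
  const int n = static_cast<int>(mask.size());
  if (zeroA.size() != mask.size() || zeroB.size() != mask.size())
    return std::nullopt;

  std::vector<uint32_t> andMask(mask.size());
  for (int i = 0; i < n; ++i) {
    const int m = mask[i];
    if (m < 0) {            // Undef lane: assumed here to be forced to zero.
      andMask[i] = 0;
      continue;
    }
    if (m >= 2 * n)         // Out-of-range index: reject.
      return std::nullopt;
    if (m == i) {           // Lane i of A stays in place: keep it.
      andMask[i] = 0xFFFFFFFFu;
      continue;
    }
    // Any other source is fine only if that element is known to be zero,
    // regardless of which lane or operand it comes from.
    const bool knownZero = (m < n) ? zeroA[m] : zeroB[m - n];
    if (!knownZero)
      return std::nullopt;
    andMask[i] = 0;
  }
  return andMask;
}
```

For example, with four lanes, mask `{0, 5, 2, 7}` and lanes 1 and 3 of the second operand known to be zero, the sketch yields the AND mask `{~0, 0, ~0, 0}`; with mask `{0, 4, 2, 7}` it gives up, because lane 0 of the second operand is not known to be zero.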
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
22618–22619 | Not very important, but this doesn't match the LLVM style. |
llvm/test/CodeGen/X86/vselect-constants.ll | ||
---|---|---|
304 | Do you know what's up with this guy? This seems objectively worse. |
llvm/test/CodeGen/X86/vselect-constants.ll | ||
---|---|---|
304 | We currently reuse the <1,0,0,0> vector constant, purely by chance, not by design. Anything that attempts to exploit the zero bits tends to break that lucky pattern. What's really annoying is that the <1,0,0,0> was originally <i1 true, i1 false>, but we zero-extended it to <i64 1, i64 0> during promotion instead of sign-extending it. That made it a lot harder to fold with the 'all sign bits' elements from the compare. With a little luck this would have folded away entirely as part of shuffle combining :-( |
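As a small illustration of the zero-extend versus sign-extend point in the comment above, the sketch below promotes a 1-bit boolean to 64 bits both ways. The function names are hypothetical and this is plain C++, not LLVM code. Only the sign-extended form has the 'all sign bits' shape (every bit equal to the sign bit) that a vector compare result has.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only.

// Zero-extend an i1 to i64: true becomes 1.
constexpr uint64_t zextI1(bool b) { return b ? 1u : 0u; }

// Sign-extend an i1 to i64: true becomes all-ones (i.e. -1).
constexpr uint64_t sextI1(bool b) { return b ? ~uint64_t{0} : 0u; }

// A value is "all sign bits" when every bit equals the sign bit,
// which for 64 bits means it is either 0 or all-ones.
constexpr bool isAllSignBits(uint64_t v) {
  return v == 0 || v == ~uint64_t{0};
}

static_assert(!isAllSignBits(zextI1(true)), "zext of true is 1");
static_assert(isAllSignBits(sextI1(true)), "sext of true is all-ones");
```

With the sign-extended constant, each lane is either zero or all-ones, the same form as a compare result lane, which is the property the comment suggests would have made the fold easier.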
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
22679–22680 | Why is a zero mask better than an undef mask for undef shuffle mask elements? |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
11814 | Is this saying that MOV #0 + LegalShuffle is always better than create mask + and? I think that sounds OK, so long as it doesn't destroy any BIC patterns. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
22679–22680 | This was from before I added the isVectorClearMaskLegal handling; I'll see if I can relax it again. However, XformToShuffleWithZero always forces undef mask elements to zero as "X & undef --> 0 (not undef)", and IIRC I was trying to keep the behaviours as similar as possible. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
11814 | There's nothing enforcing it, but M should always be a 'select/blend' style mask (+ undefs) - afaict it will only ever match in isShuffleMaskLegal against 2-element zip style patterns? I think those were the regressions I saw. |
LGTM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
11814 | Yeah I think this sounds OK to enable, from what I can see. The testcase it changes is a little odd as it has quite a few undefs, but in general I've not seen any issues from it from experimenting. |