This is an alternative to D59669 which more aggressively extracts i1 elements from vXi1 bool vectors using a MOVMSK.
Repository: rL LLVM
Event Timeline
test/CodeGen/X86/avx2-masked-gather.ll | ||
---|---|---|
55 | Annoyingly, this doesn't fall through the code, skipping the zero-element gathers; instead it repeats the tests+branches. | |
393 | Repeated PACKSS+MOVMSK instructions - I assume due to it having an undef argument in the xmm0 slot. | |
test/CodeGen/X86/avx512-insert-extract.ll | ||
1024 | We should be able to replace both shifts and the cmp with a single BT $32, %RAX? | |
test/CodeGen/X86/masked_compressstore.ll | ||
51 | More repeated PACKSS+MOVMSK | |
test/CodeGen/X86/movmsk-cmp.ll | ||
4376 | We should be able to reduce this to a TEST | |
4655 | Repeated comparisons | |
test/CodeGen/X86/setcc-combine.ll | ||
10 | I think this is effectively a NOT that we should be able to handle somehow. |
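To illustrate the BT suggestion on avx512-insert-extract.ll, here is a minimal scalar sketch. It assumes the two shifts isolate bit 32 of RAX (a left shift by 31 followed by a logical right shift by 63) and that the cmp then tests the result against zero; the exact instructions in the test output are not shown in this review, so treat the shift amounts and the function names as hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed model of the current sequence: move bit 32 up to bit 63,
// bring it back down to bit 0, then compare against zero.
bool bit32ViaShifts(uint64_t x) {
    uint64_t isolated = (x << 31) >> 63;
    return isolated != 0;
}

// What a single bit test of bit 32 (BT $32, %RAX, then reading CF)
// would compute.
bool bit32ViaBitTest(uint64_t x) {
    return ((x >> 32) & 1u) != 0;
}
```

Under that assumption both functions agree for every input, which is why a single bit test could stand in for the shift/shift/cmp sequence.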
Updated version, which actually looks pretty similar to D59669.
I'm performing all the extractions at the same time and only combining if (a) the source vector is used only by extractions and (b) there's more than one extract (I'd like to remove this limitation in the future).
The big difference is that D59669 tries to limit itself to setcc usage only, which is probably unnecessary with our new SimplifyDemandedBits support.
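The combining rule described above can be sketched with a scalar model. This is not the code from the patch: `movmsk`, `extractAll`, and the `onlyUsedByExtracts` flag are hypothetical names, and the bool array stands in for a vXi1 vector.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar stand-in for MOVMSK: pack the boolean in lane i into bit i.
template <std::size_t N>
uint32_t movmsk(const std::array<bool, N>& lanes) {
    static_assert(N <= 32, "model only covers up to 32 lanes");
    uint32_t bits = 0;
    for (std::size_t i = 0; i < N; ++i)
        if (lanes[i]) bits |= (1u << i);
    return bits;
}

// Extract several i1 elements. The extracts are combined only when
// (a) the source is used only by extractions and (b) there is more
// than one extract; otherwise each lane is read individually.
template <std::size_t N>
std::vector<bool> extractAll(const std::array<bool, N>& lanes,
                             const std::vector<unsigned>& indices,
                             bool onlyUsedByExtracts) {
    std::vector<bool> out;
    const bool combine = onlyUsedByExtracts && indices.size() > 1;
    if (!combine) {
        for (unsigned idx : indices) {
            assert(idx < N);
            out.push_back(lanes[idx]);
        }
        return out;
    }
    const uint32_t bits = movmsk(lanes);  // one shared MOVMSK
    for (unsigned idx : indices) {
        assert(idx < N);
        const uint32_t mask = 1u << idx;
        out.push_back((bits & mask) == mask);
    }
    return out;
}
```

Both paths return the same values; the point of the combined path is that every extract reuses a single mask computation.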
test/CodeGen/X86/movmsk-cmp.ll | ||
---|---|---|
4543 | Interesting that we merge this OR chain but fail with the AND chain on movmsk_v2i64 |
This looks like a good refinement of the earlier patch, so I'm happy to abandon D59669 and move forward here.
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
34837 | Matter of taste, but don't need to explicitly use "llvm::" here. | |
34843–34844 | Could use a formula comment of the transform around here such as: // extelt vXi1 X, MaskIdx --> ((movmsk X) & Mask) == Mask | |
34849 | Did framing this as: ((x & Mask) == Mask) rather than: ((x & Mask) != 0) make a difference in the output? If so, add a TODO comment about trying to avoid that problem. |
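As a small illustration of the two framings raised in the inline comment on line 34849, the sketch below compares them on a plain integer standing in for the MOVMSK result. The function names are hypothetical and nothing here is taken from the patch itself.

```cpp
#include <cstdint>

// extelt vXi1 X, Idx --> ((movmsk X) & Mask) == Mask, with Mask = 1 << Idx
bool extractEqMask(uint32_t movmskBits, uint32_t mask) {
    return (movmskBits & mask) == mask;  // every bit in mask is set
}

// Alternative framing: ((movmsk X) & Mask) != 0
bool extractNeZero(uint32_t movmskBits, uint32_t mask) {
    return (movmskBits & mask) != 0;     // at least one bit in mask is set
}
```

For a single-bit mask the two forms always agree, so the choice only affects which instruction pattern later combines see. They differ for multi-bit masks, where the first is an all-of test and the second an any-of test. The explicit parentheses matter in C++, because `x & Mask == Mask` parses as `x & (Mask == Mask)`.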