From the description of the x86 masked compress instruction:
"The opmask register k1 selects the active elements (partial vector or possibly non-contiguous if less than 16 active
elements) from the source operand to compress into a contiguous vector. The contiguous vector is written to the
destination starting from the low element of the destination operand.
Memory destination version: Only the contiguous vector is written to the destination memory location. EVEX.z
must be zero."
This means that the memory-destination form of the instruction leaves the memory past the compressed elements untouched.
The current pattern assumes that the instruction pads the rest of the destination with zeros, and thus replaces a select against ImmAllZeros with the masked compress instruction. That assumption does not hold for the memory form.
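To make the difference concrete, here is a minimal scalar sketch of the two behaviours for 16 lanes of 32-bit elements. The names `compressStoreModel` and `compressZeroModel` are hypothetical and exist only for illustration; this models the semantics quoted above, not the actual lowering code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 16;
using Vec = std::array<std::int32_t, kLanes>;

// Model of the memory-destination form: only the packed active elements
// are written. Memory past them keeps whatever it held before.
// Returns the number of elements written.
inline std::size_t compressStoreModel(std::int32_t *dst, const Vec &src,
                                      std::uint16_t mask) {
  assert(dst != nullptr);
  std::size_t out = 0;
  for (std::size_t i = 0; i < kLanes; ++i)
    if (mask & (1u << i))
      dst[out++] = src[i];
  return out;
}

// Model of what the old pattern assumed: packed active elements first,
// every remaining lane set to zero.
inline Vec compressZeroModel(const Vec &src, std::uint16_t mask) {
  Vec result{}; // value-initialized, so all lanes start at zero
  std::size_t out = 0;
  for (std::size_t i = 0; i < kLanes; ++i)
    if (mask & (1u << i))
      result[out++] = src[i];
  return result;
}
```

With a mask selecting lanes 0 and 2, the first model writes two elements and leaves the rest of the buffer alone, while the second produces a full vector whose remaining lanes are zero. Storing the second result to memory would clobber 14 elements that the real instruction never touches.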
test/CodeGen/X86/compress-maskz.ll | ||
---|---|---|
10 ↗ | (On Diff #69536) | I don't see any store instruction in your check. Could you please use the "update_" utility to generate the full picture? |
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
18861–18862 | Chain should be changed to LoadAddress | |
test/CodeGen/X86/compress-maskz.ll | ||
---|---|---|
1 ↗ | (On Diff #69536) | Could you please move the test to avx512vl-intrinsics.ll? |
I'll let Igor/Elena review the logic, just a couple of comments on style.
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
18862 | Why is this a dyn_cast? If DAG.getLoad() is guaranteed to return a LoadSDNode (and I'd assume it is), you want a cast<>. If it's not, then you need to check the result of the dyn_cast. | |
18863 | The formatting here looks wrong. |
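The distinction raised in the comment on line 18862 can be sketched without the LLVM headers. The types and helpers below (`Node`, `LoadNode`, `checkedCast`, `dynCast`) are hypothetical stand-ins, and they use the standard `dynamic_cast` where LLVM uses its own `isa<>`/`classof` machinery; the point is only the contract of each cast.

```cpp
#include <cassert>

// Hypothetical stand-ins for SDNode and two of its subclasses.
struct Node {
  virtual ~Node() = default;
};
struct LoadNode : Node {
  int alignment = 4;
};
struct StoreNode : Node {};

// Analogue of a checked cast: the caller asserts the kind is known,
// so the result is never null and needs no test.
template <typename To> To *checkedCast(Node *n) {
  To *p = dynamic_cast<To *>(n);
  assert(p && "checkedCast<> argument of incompatible type");
  return p;
}

// Analogue of a dynamic cast: returns null on a kind mismatch,
// so the caller must test the result before using it.
template <typename To> To *dynCast(Node *n) {
  return dynamic_cast<To *>(n);
}

// Kind is guaranteed by construction: use the checked cast.
inline int alignmentOfKnownLoad(Node *n) {
  return checkedCast<LoadNode>(n)->alignment;
}

// Kind is not guaranteed: use the dynamic cast and handle null.
inline int alignmentOrDefault(Node *n, int fallback) {
  if (auto *load = dynCast<LoadNode>(n))
    return load->alignment;
  return fallback;
}
```

Using the dynamic form without testing the result combines the drawbacks of both: it implies the kind may vary, yet dereferences a possibly null pointer.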
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
18862 | Igor and I discussed this and came to the conclusion that this solution is not safe. The existing solution has the same problem. I suggest adding an "IsCompressed" flag to MaskedStoreSDNode and then using it for COMPRESS_TO_MEM. Igor implemented masked_truncstore; you can take a look at that. |
Changed the approach entirely, as Elena suggested.
A masked compressed store is now lowered to a single masked store node with a new isCompressed flag.
Later on, the node is replaced with the specific X86 avx3 compress machine instruction.
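As a rough illustration of why a flag on the masked store node is a natural fit, the scalar model below treats both store kinds as one operation that differs only in where each active element lands. `maskedStoreModel` and its parameters are hypothetical names for this sketch, not the MaskedStoreSDNode interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of a masked store carrying a "compressed" flag.
// In both modes, inactive lanes never cause a write to memory.
inline void maskedStoreModel(std::int32_t *mem,
                             const std::vector<std::int32_t> &value,
                             const std::vector<bool> &mask,
                             bool isCompressed) {
  assert(mem != nullptr && value.size() == mask.size());
  std::size_t out = 0;
  for (std::size_t i = 0; i < value.size(); ++i) {
    if (!mask[i])
      continue;
    if (isCompressed)
      mem[out++] = value[i]; // active elements are packed contiguously
    else
      mem[i] = value[i];     // active elements keep their lane position
  }
}
```

Because neither mode writes anything for inactive lanes, the node describes the memory effect exactly and does not need a select against zeros to represent the untouched part.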
wrong code alignment