Details
Diff Detail
Event Timeline
Thanks Volkan.
Adding a zip to the test above might make the testing stronger, i.e something like shufflevector <8 x i8> %tmp1, <8 x i8> %tmp2, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
Also, would you consider adding a test for arm (32-bit) as well (need not be a big one).
Best Regards.
We don't need to do this now, but should we encode the mask in the G_SHUFFLE_VECTOR itself? Either as multiple Imm MOs, or a single new 'VectorMask' operand of some sort?
test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll | ||
---|---|---|
1423 | Should we turn this into an extractelt in the IRTranslator? Seems like having vector ops 0,1,2 is a nice invariant. (which would be nice to check in the verifier, or even in the MIR parser, if we ever add type constraints in GenericOpcodes.td) |
I think we should probably stick with a single operand to represent the mask since some targets can use vector registers to specify the shuffle mask (e.g. VSHF.df on Mips), but I do agree that a single operand containing a whole constant mask would be nicer to match than the current G_MERGE_VALUES of G_CONSTANTS since there's lots of ways of writing the same constant right now. For example:
(G_MERGE_VALUES (G_CONSTANT i32 0), (G_CONSTANT i32 1), (G_CONSTANT i32 2), (G_CONSTANT i32 3))
and:
(G_MERGE_VALUES (G_MERGE_VALUES (G_CONSTANT i32 0), (G_CONSTANT i32 1)), (G_MERGE_VALUES (G_CONSTANT i32 2), (G_CONSTANT i32 3)))
are the same. The snag is that we would need to limit any flattening to the supportable masks (or be able to revert the transformation).
While I'm thinking about legalizing shuffles, it would be nice in some ways if LegalizeAction were an object so we could write something like:
extern bool isReversedElements(const MachineOperand &); extern bool isInterleavedElements(const MachineOperand &); for (const auto &Ty : {v2s64, v4s32, v8s16, v16s8}) { setAction({G_SHUFFLE_VECTOR, Ty}, ContentSensitiveLegalizer(/* Default case */ Lower)); auto &Legalizer = getAction({G_SHUFFLE_VECTOR, Ty}); if (hasFoo()) { Legalizer.addCase(isReversedElements(), Legal); Legalizer.addCase(isInterleavedElements(), Legal); } if (hasBar()) { Legalizer.addCase([](const MachineOperand &Op) { ... each 4-elements sub-sequence only shuffles within itself ... }, Legal); // Register-based shuffles handle everything. Legalizer.setDefaultCase(Legal); } }
We can achieve the same effect with Custom but it seems nice to keep the high-level conditions together in AArch64LegalizerInfo::AArch64LegalizerInfo() instead of distributing them across various callbacks and re-checking them for every MachineInstr.
I agree, a single operand would be better. In this way, we can simplify constant vectors as well and this problem would be fixed automatically as the mask is a constant vector.
(G_MERGE_VALUES (G_CONSTANT i32 0), (G_CONSTANT i32 1), (G_CONSTANT i32 2), (G_CONSTANT i32 3))and:
(G_MERGE_VALUES (G_MERGE_VALUES (G_CONSTANT i32 0), (G_CONSTANT i32 1)), (G_MERGE_VALUES (G_CONSTANT i32 2), (G_CONSTANT i32 3)))are the same. The snag is that we would need to limit any flattening to the supportable masks (or be able to revert the transformation).
While I'm thinking about legalizing shuffles, it would be nice in some ways if LegalizeAction were an object so we could write something like:
extern bool isReversedElements(const MachineOperand &); extern bool isInterleavedElements(const MachineOperand &); for (const auto &Ty : {v2s64, v4s32, v8s16, v16s8}) { setAction({G_SHUFFLE_VECTOR, Ty}, ContentSensitiveLegalizer(/* Default case */ Lower)); auto &Legalizer = getAction({G_SHUFFLE_VECTOR, Ty}); if (hasFoo()) { Legalizer.addCase(isReversedElements(), Legal); Legalizer.addCase(isInterleavedElements(), Legal); } if (hasBar()) { Legalizer.addCase([](const MachineOperand &Op) { ... each 4-elements sub-sequence only shuffles within itself ... }, Legal); // Register-based shuffles handle everything. Legalizer.setDefaultCase(Legal); } }We can achieve the same effect with Custom but it seems nice to keep the high-level conditions together in AArch64LegalizerInfo::AArch64LegalizerInfo() instead of distributing them across various callbacks and re-checking them for every MachineInstr.
Should we turn this into an extractelt in the IRTranslator? Seems like having vector ops 0,1,2 is a nice invariant. (which would be nice to check in the verifier, or even in the MIR parser, if we ever add type constraints in GenericOpcodes.td)