To make ISD::VSELECT available(legal) so long as there are altivec instruction, otherwise it's default behavior is expanding,
which is legalized at type-legalization phase. Use xxsel to match vselect if vsx is open, or use vsel.
Details
- Reviewers
hfinkel nemanjai echristo - Commits
- rG6a3c279d1cdc: [PowerPC] Enhance the selection(ISD::VSELECT) of vector type
rG5b9a4f8ee5d1: [PowerPC] Enhance the selection(ISD::VSELECT) of vector type
rL346824: [PowerPC] Enhance the selection(ISD::VSELECT) of vector type
rL339779: [PowerPC] Enhance the selection(ISD::VSELECT) of vector type
Diff Detail
Event Timeline
llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
590 ↗ | (On Diff #160816) | I don't think Promote does the right thing here; VectorLegalizer::Promote on a VSELECT just bitcasts the operands/result, and the documentation for ISD::VSELECT say "The condition follows the BooleanContent format of the target." So DAGCombine might transform it incorrectly in some cases involving v16i8 or v8i16. Granted, I'm not sure DAGCombine actually does any relevant transforms on the condition of a VSELECT, but it seems like a bad idea to rely on that. |
Add some more detail explanation.
llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
590 ↗ | (On Diff #160816) | Well, for v16i8 the initial DAG is like this SelectionDAG has 15 nodes: t0: ch = EntryToken t6: v16i8,ch = CopyFromReg t0, Register:v16i8 %2 t8: v16i8,ch = CopyFromReg t0, Register:v16i8 %3 **t10: v16i1 = setcc t6, t8, seteq:ch** t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %0 t4: v16i8,ch = CopyFromReg t0, Register:v16i8 %1 t11: v16i8 = vselect t10, t2, t4 t13: ch,glue = CopyToReg t0, Register:v16i8 $v2, t11 t14: ch = PPCISD::RET_FLAG t13, Register:v16i8 $v2, t13:1 The setcc result type is v16i1, but legalize type phase will change it to v16i8 as you said that follow the BooleanContent format of the target. So it 's like blow. t0: ch = EntryToken t6: v16i8,ch = CopyFromReg t0, Register:v16i8 %2 t8: v16i8,ch = CopyFromReg t0, Register:v16i8 %3 ** t15: v16i8 = setcc t6, t8, seteq:ch** t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %0 t4: v16i8,ch = CopyFromReg t0, Register:v16i8 %1 ** t16: v16i8 = vselect t15, t2, t4** t13: ch,glue = CopyToReg t0, Register:v16i8 $v2, t16 t14: ch = PPCISD::RET_FLAG t13, Register:v16i8 $v2, t13:1 What promote does is just make v16i8 to v4i32 consistently or canonically, so that we can just write one single pattern in td file to select vselect since the xxsel instruction is suitable for all vector type like v16i8/v8i16 because it's bit selection. After legalize operation phase: SelectionDAG has 19 nodes: t0: ch = EntryToken t6: v16i8,ch = CopyFromReg t0, Register:v16i8 %2 t8: v16i8,ch = CopyFromReg t0, Register:v16i8 %3 t15: v16i8 = setcc t6, t8, seteq:ch ** t17: v4i32 = bitcast t15** t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %0 ** t18: v4i32 = bitcast t2** t4: v16i8,ch = CopyFromReg t0, Register:v16i8 %1 ** t19: v4i32 = bitcast t4** ** t20: v4i32 = vselect t17, t18, t19** t21: v16i8 = bitcast t20 t13: ch,glue = CopyToReg t0, Register:v16i8 $v2, t21 t14: ch = PPCISD::RET_FLAG t13, Register:v16i8 $v2, t13:1 |
It's reverted before. Reopen it.
Author: nemanjai
Date: Mon Aug 27 13:20:42 2018 UTC (4 weeks ago)
Changed paths: 6
Log Message:[PowerPC] Revert commit r339779
This commit has caused failures in some internal benchmarks. Temporarily
reverting this patch until the issue can be diagnosed and fixed.
llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
590 ↗ | (On Diff #160816) | The SelectionDAG DAG is a target-independent IR, just like LLVM IR; it has target-independent opcodes with target-independent meanings. It's not defined quite as formally as LLVM IR, but the concept is the same. And DAGCombine makes transforms based on that target-independent meaning. And if the target-independent meaning for an operation is that it returns an undefined value, you can't "extend" it to return something defined just because it would be convenient for your target. (IIRC the MIPS backend tried to do this with vector shifts, and ended up with miscompiles.) The issue here, specifically, is that ISD::VSELECT is defined as a per-element select, and the result is undefined if any of the elements is not a boolean of the correct form (on PPC, BooleanContent for a vector requires an all-zeros or all-ones value). There are a couple of ways to solve this: one, introduce a target-specific opcode (PPCISD::XXSEL, or something like that, which has the correct semantics). Two, you could change the promotion logic; promoting a v4i32 vselect to a v16i8 vselect should be fine. |
llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
590 ↗ | (On Diff #160816) |
It's already correct form boolean content in PPC (all-ones) after the type legalization.( That's why the result type of setcc is v16i8 not v16i1). |
llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
590 ↗ | (On Diff #160816) | No, it's not the correct boolean content: the elements of the condition of a v4i32 VSELECT are required to have value 0x00000000 or 0xFFFFFFFF, but your v4i32 can have elements of the form 0x0000FF00, which is not a boolean. |
llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
590 ↗ | (On Diff #160816) | Yes, it's not a perfect way to simply the pattern matching lazily. It's better to write every legal pattern in td file. |
I think it's a workaround that use vsel instead of xxsel for v16i8 and v8i16. So maybe we can change back in another patch after fix the encountered failures in check-all to make the selection consistently.
lib/Target/PowerPC/PPCInstrVSX.td | ||
---|---|---|
1159 | I can not use this pattern since the register class of XXSEL operand is vsrc, and it does not accept v16i8 or v8i16 as following registerclass definition. def VRRC : RegisterClass<"PPC", [v16i8,v8i16,v4i32,v2i64,v1i128,v4f32,v2f64, f128], 128, (add V2, V3, V4, V5, V0, V1, V6, V7, V8, V9, V10, V11, V12, V13, V14, V15, V16, V17, V18, V19, V31, V30, V29, V28, V27, V26, V25, V24, V23, V22, V21, V20)>; // VSX register classes (the allocation order mirrors that of the corresponding // subregister classes). def VSLRC : RegisterClass<"PPC", [v4i32,v4f32,v2f64,v2i64], 128, (add (sequence "VSL%u", 0, 13), (sequence "VSL%u", 31, 14))>; def VSRC : RegisterClass<"PPC", [v4i32,v4f32,v2f64,v2i64], 128, (add VSLRC, VRRC)>; if I add v16i8 and v8i16 into VSRC, it will cause many failures in check-all. But now v16i8 and v8i16 can not use xxsel but vsel although VSX is available. Why does VSRC RegisterClass not contain v16i8 and v8i16 like VRRC? Should I keep the patch and add v16i8 and v8i16 into VSRC and fix the failures in check-all? | |
test/CodeGen/PowerPC/vsx.ll | ||
485 ↗ | (On Diff #173125) | Since the reason I comment above, xxsel can not be selected. |
lib/Target/PowerPC/PPCInstrVSX.td | ||
---|---|---|
1159 | There's no need to add the traditional Altivec types to VSX vectors. There are very few VSX instructions that can operate on byte or halfword elements. So for the vast majority of cases, allowing them in VSX registers won't help at all. def : Pat<(v16i8 (vselect v16i8:$vA, v16i8:$vB, v16i8:$vC)), (XXSEL (COPY_TO_REGCLASS $vC, VSRC), (COPY_TO_REGCLASS $vB, VSRC), (COPY_TO_REGCLASS $vA, VSRC))>; |
lib/Target/PowerPC/PPCInstrVSX.td | ||
---|---|---|
1159 | Thanks a lot for the information. |
lib/Target/PowerPC/PPCInstrVSX.td | ||
---|---|---|
1159 | I am afraid it can not work. I use following pattern to hack. def : Pat<(v16i8 (vselect v16i8:$vA, v16i8:$vB, v16i8:$vC)), (COPY_TO_REGCLASS (XXSEL (COPY_TO_REGCLASS $vC, VSRC), (COPY_TO_REGCLASS $vB, VSRC), (COPY_TO_REGCLASS $vA, VSRC)), VRRC)>; |
I can not use this pattern since the register class of XXSEL operand is vsrc, and it does not accept v16i8 or v8i16 as following registerclass definition.
if I add v16i8 and v8i16 into VSRC, it will cause many failures in check-all. But now v16i8 and v8i16 can not use xxsel but vsel although VSX is available. Why does VSRC RegisterClass not contain v16i8 and v8i16 like VRRC? Should I keep the patch and add v16i8 and v8i16 into VSRC and fix the failures in check-all?
@nemanjai @hfinkel