This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Enhance the selection(ISD::VSELECT) of vector type
ClosedPublic

Authored by wuzish on Jul 18 2018, 10:25 PM.

Details

Summary

To make ISD::VSELECT available(legal) so long as there are altivec instruction, otherwise it's default behavior is expanding,
which is legalized at type-legalization phase. Use xxsel to match vselect if vsx is open, or use vsel.

Diff Detail

Repository
rL LLVM

Event Timeline

wuzish created this revision.Jul 18 2018, 10:25 PM

Hi, all.
Could you please have a review if you don't mind :)?

nemanjai accepted this revision.Aug 13 2018, 6:57 AM

LGTM.

This revision is now accepted and ready to land.Aug 13 2018, 6:57 AM
wuzish added a comment.EditedAug 13 2018, 7:03 PM

Could someone commit for me since I don't have svn access. Thank you very much.

Could someone commit for me since I don't have svn access. Thank you very much.

I'll commit it for you later today.

This revision was automatically updated to reflect the committed changes.
efriedma added inline comments.
llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
590

I don't think Promote does the right thing here; VectorLegalizer::Promote on a VSELECT just bitcasts the operands/result, and the documentation for ISD::VSELECT say "The condition follows the BooleanContent format of the target." So DAGCombine might transform it incorrectly in some cases involving v16i8 or v8i16.

Granted, I'm not sure DAGCombine actually does any relevant transforms on the condition of a VSELECT, but it seems like a bad idea to rely on that.

Add some more detail explanation.

llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
590

Well, for v16i8 the initial DAG is like this

SelectionDAG has 15 nodes:
  t0: ch = EntryToken
        t6: v16i8,ch = CopyFromReg t0, Register:v16i8 %2
        t8: v16i8,ch = CopyFromReg t0, Register:v16i8 %3
      **t10: v16i1 = setcc t6, t8, seteq:ch**
      t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %0
      t4: v16i8,ch = CopyFromReg t0, Register:v16i8 %1
    t11: v16i8 = vselect t10, t2, t4
  t13: ch,glue = CopyToReg t0, Register:v16i8 $v2, t11
  t14: ch = PPCISD::RET_FLAG t13, Register:v16i8 $v2, t13:1

The setcc result type is v16i1, but legalize type phase will change it to v16i8 as you said that follow the BooleanContent format of the target. So it 's like blow.

  t0: ch = EntryToken
        t6: v16i8,ch = CopyFromReg t0, Register:v16i8 %2
        t8: v16i8,ch = CopyFromReg t0, Register:v16i8 %3
**      t15: v16i8 = setcc t6, t8, seteq:ch**
      t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %0
      t4: v16i8,ch = CopyFromReg t0, Register:v16i8 %1
**    t16: v16i8 = vselect t15, t2, t4**
  t13: ch,glue = CopyToReg t0, Register:v16i8 $v2, t16
  t14: ch = PPCISD::RET_FLAG t13, Register:v16i8 $v2, t13:1

What promote does is just make v16i8 to v4i32 consistently or canonically, so that we can just write one single pattern in td file to select vselect since the xxsel instruction is suitable for all vector type like v16i8/v8i16 because it's bit selection.

After legalize operation phase:

 SelectionDAG has 19 nodes:
  t0: ch = EntryToken
            t6: v16i8,ch = CopyFromReg t0, Register:v16i8 %2
             t8: v16i8,ch = CopyFromReg t0, Register:v16i8 %3
           t15: v16i8 = setcc t6, t8, seteq:ch
 **        t17: v4i32 = bitcast t15**
           t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %0
**        t18: v4i32 = bitcast t2**
           t4: v16i8,ch = CopyFromReg t0, Register:v16i8 %1
**        t19: v4i32 = bitcast t4**
 **      t20: v4i32 = vselect t17, t18, t19**
     t21: v16i8 = bitcast t20
   t13: ch,glue = CopyToReg t0, Register:v16i8 $v2, t21
   t14: ch = PPCISD::RET_FLAG t13, Register:v16i8 $v2, t13:1
wuzish reopened this revision.EditedSep 24 2018, 8:19 PM

It's reverted before. Reopen it.

Author: nemanjai
Date: Mon Aug 27 13:20:42 2018 UTC (4 weeks ago)
Changed paths: 6
Log Message:

[PowerPC] Revert commit r339779

This commit has caused failures in some internal benchmarks. Temporarily
reverting this patch until the issue can be diagnosed and fixed.

This revision is now accepted and ready to land.Sep 24 2018, 8:19 PM

It should be recommitted after the patch https://reviews.llvm.org/D52449

efriedma added inline comments.Sep 25 2018, 12:55 PM
llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
590

The SelectionDAG DAG is a target-independent IR, just like LLVM IR; it has target-independent opcodes with target-independent meanings. It's not defined quite as formally as LLVM IR, but the concept is the same. And DAGCombine makes transforms based on that target-independent meaning. And if the target-independent meaning for an operation is that it returns an undefined value, you can't "extend" it to return something defined just because it would be convenient for your target. (IIRC the MIPS backend tried to do this with vector shifts, and ended up with miscompiles.)

The issue here, specifically, is that ISD::VSELECT is defined as a per-element select, and the result is undefined if any of the elements is not a boolean of the correct form (on PPC, BooleanContent for a vector requires an all-zeros or all-ones value).

There are a couple of ways to solve this: one, introduce a target-specific opcode (PPCISD::XXSEL, or something like that, which has the correct semantics). Two, you could change the promotion logic; promoting a v4i32 vselect to a v16i8 vselect should be fine.

wuzish added inline comments.Sep 25 2018, 8:07 PM
llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
590

The issue here, specifically, is that ISD::VSELECT is defined as a per-element select, and the result is undefined if any of the elements is not a boolean of the correct form (on PPC, BooleanContent for a vector requires an all-zeros or all-ones value).

It's already correct form boolean content in PPC (all-ones) after the type legalization.( That's why the result type of setcc is v16i8 not v16i1).
So all vector type bitcast to v4i32 is just a kind of canonical so that we do not need to write many patterns in td.

efriedma added inline comments.Sep 26 2018, 11:15 AM
llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
590

No, it's not the correct boolean content: the elements of the condition of a v4i32 VSELECT are required to have value 0x00000000 or 0xFFFFFFFF, but your v4i32 can have elements of the form 0x0000FF00, which is not a boolean.

wuzish added inline comments.Nov 4 2018, 10:15 PM
llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
590

Yes, it's not a perfect way to simply the pattern matching lazily.

It's better to write every legal pattern in td file.

wuzish requested review of this revision.Nov 8 2018, 12:57 AM
wuzish updated this revision to Diff 173125.
wuzish edited the summary of this revision. (Show Details)

I think it's a workaround that use vsel instead of xxsel for v16i8 and v8i16. So maybe we can change back in another patch after fix the encountered failures in check-all to make the selection consistently.

lib/Target/PowerPC/PPCInstrVSX.td
1159 ↗(On Diff #173125)

I can not use this pattern since the register class of XXSEL operand is vsrc, and it does not accept v16i8 or v8i16 as following registerclass definition.

def VRRC : RegisterClass<"PPC",
                         [v16i8,v8i16,v4i32,v2i64,v1i128,v4f32,v2f64, f128],
                         128,
                         (add V2, V3, V4, V5, V0, V1, V6, V7, V8, V9, V10, V11,
                             V12, V13, V14, V15, V16, V17, V18, V19, V31, V30,
                             V29, V28, V27, V26, V25, V24, V23, V22, V21, V20)>;

// VSX register classes (the allocation order mirrors that of the corresponding
// subregister classes).
def VSLRC : RegisterClass<"PPC", [v4i32,v4f32,v2f64,v2i64], 128,
                          (add (sequence "VSL%u", 0, 13),
                               (sequence "VSL%u", 31, 14))>;
def VSRC  : RegisterClass<"PPC", [v4i32,v4f32,v2f64,v2i64], 128,
                          (add VSLRC, VRRC)>;

if I add v16i8 and v8i16 into VSRC, it will cause many failures in check-all. But now v16i8 and v8i16 can not use xxsel but vsel although VSX is available. Why does VSRC RegisterClass not contain v16i8 and v8i16 like VRRC? Should I keep the patch and add v16i8 and v8i16 into VSRC and fix the failures in check-all?

@nemanjai @hfinkel

test/CodeGen/PowerPC/vsx.ll
485 ↗(On Diff #173125)

Since the reason I comment above, xxsel can not be selected.

nemanjai added inline comments.Nov 9 2018, 1:24 PM
lib/Target/PowerPC/PPCInstrVSX.td
1159 ↗(On Diff #173125)

There's no need to add the traditional Altivec types to VSX vectors. There are very few VSX instructions that can operate on byte or halfword elements. So for the vast majority of cases, allowing them in VSX registers won't help at all.
All you need to do in this case is use COPY_TO_REGCLASS such as:

def : Pat<(v16i8 (vselect v16i8:$vA, v16i8:$vB, v16i8:$vC)),
          (XXSEL (COPY_TO_REGCLASS $vC, VSRC), 
                 (COPY_TO_REGCLASS $vB, VSRC),
                 (COPY_TO_REGCLASS $vA, VSRC))>;
wuzish added inline comments.Nov 9 2018, 7:28 PM
lib/Target/PowerPC/PPCInstrVSX.td
1159 ↗(On Diff #173125)

Thanks a lot for the information.

wuzish added inline comments.Nov 12 2018, 2:47 AM
lib/Target/PowerPC/PPCInstrVSX.td
1159 ↗(On Diff #173125)

I am afraid it can not work. I use following pattern to hack.

def : Pat<(v16i8 (vselect v16i8:$vA, v16i8:$vB, v16i8:$vC)),
          (COPY_TO_REGCLASS
                 (XXSEL (COPY_TO_REGCLASS $vC, VSRC),
                        (COPY_TO_REGCLASS $vB, VSRC),
                        (COPY_TO_REGCLASS $vA, VSRC)), VRRC)>;
wuzish marked an inline comment as done.Nov 13 2018, 7:12 AM

@efriedma @nemanjai

Gentle ping.

nemanjai accepted this revision.Nov 13 2018, 8:22 AM

LGTM now that we don't bitcast types any longer.

This revision is now accepted and ready to land.Nov 13 2018, 8:22 AM
wuzish edited the summary of this revision. (Show Details)Nov 13 2018, 6:36 PM
This revision was automatically updated to reflect the committed changes.