
[LV] Logical and/or select costs

Authored by dmgreen on Apr 5 2021, 7:22 AM.



D99674 stopped the folding of certain select operations into and/or, because that folding is incorrect in the presence of poison. D97360 added some costs to account for the change, but only at the getUserCost level, not in getCmpSelInstrCost, which the vectorizer uses directly. This patch adds similar logic to the vectorizer to handle these logical and/or selects, costing them like and/or directly.

This fixes 60% performance regressions from code like the attached test case.
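
As a rough illustration of why the select-to-and/or fold is unsound (this is not part of the patch), the sketch below models an i1 value as true, false, or poison. The names I1, Poison, bitwiseAnd, selectI1, logicalAnd and logicalOr are hypothetical, and the model is deliberately simplified: it assumes that and/or are poison if either operand is poison, while select is poison only if the condition or the chosen arm is poison.

```cpp
#include <cassert>
#include <optional>

// Toy model of an i1 value: true, false, or poison (std::nullopt).
using I1 = std::optional<bool>;
constexpr I1 Poison = std::nullopt;

constexpr bool isPoison(I1 v) { return !v.has_value(); }

// `and i1 a, b`: poison in either operand poisons the result.
constexpr I1 bitwiseAnd(I1 a, I1 b) {
  if (!a || !b) return Poison;
  return *a && *b;
}

// `select i1 c, i1 t, i1 f`: a poison condition poisons the result,
// otherwise only the chosen arm is observed.
constexpr I1 selectI1(I1 c, I1 t, I1 f) {
  if (!c) return Poison;
  return *c ? t : f;
}

// Logical and: select i1 a, i1 b, i1 false
constexpr I1 logicalAnd(I1 a, I1 b) { return selectI1(a, b, false); }

// Logical or: select i1 a, i1 true, i1 b
constexpr I1 logicalOr(I1 a, I1 b) { return selectI1(a, true, b); }

// With a == false and b == poison the select form is well defined,
// but the bitwise form is poison, so replacing one with the other
// would make the result more poisonous.
static_assert(!isPoison(logicalAnd(false, Poison)));
static_assert(isPoison(bitwiseAnd(false, Poison)));
```

The two forms agree whenever neither operand is poison, which is why it is reasonable for a cost model to price the select form like a plain and/or even though the IR must keep it as a select.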

Event Timeline

dmgreen created this revision. Apr 5 2021, 7:22 AM
dmgreen requested review of this revision. Apr 5 2021, 7:22 AM
Herald added a project: Restricted Project. Apr 5 2021, 7:22 AM
aqjune added inline comments. Apr 5 2021, 6:29 PM

If the select is loop invariant, it is still considered expensive - is there a reason why it is so?

dmgreen added inline comments. Apr 6 2021, 2:43 AM

That would become:
%a5 = select i1 %c, <4 x i1> <i1 1, i1 1, i1 1, i1 1>, <4 x i1> %a4
which I don't think would then be converted to an and/or.

At least under MVE, it becomes a branch, as that is how we lower vector SELECTs (as opposed to VSELECTs, which are lowered to an MVE instruction using the predicate register). That branch can end up in the loop:

        vmsr    p0, r5
        vldrw.u32       q1, [r4], #16
        adds    r6, #16
        vpsel   q0, q0, q1
        vstrb.8 q0, [r9], #16
        subs.w  lr, lr, #1
        bne     .LBB0_10
        b       .LBB0_12
        vldrw.u32       q0, [r6]
        cmp.w   r12, #0
        ble     .LBB0_8
        vcmp.s32        gt, q0, zr
        b       .LBB0_9
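
To illustrate the distinction drawn in the reply above, here is a toy C++ classifier (not LLVM code). ToyType, ToySelect, Arm, Kind and classify are hypothetical names, and the rule it encodes is an assumption consistent with the comment: a select is treated as a logical and/or only when its condition and its result have the same i1 or vector-of-i1 type and one arm is a constant all-true or all-false value.

```cpp
#include <cassert>

// Hypothetical, simplified type description: element width and lane count.
struct ToyType {
  unsigned bits;   // element width in bits
  unsigned lanes;  // 1 for a scalar, N for <N x iK>
  constexpr bool operator==(const ToyType &o) const {
    return bits == o.bits && lanes == o.lanes;
  }
};

enum class Arm { Value, AllTrue, AllFalse };
enum class Kind { LogicalAnd, LogicalOr, PlainSelect };

// Hypothetical summary of a select instruction.
struct ToySelect {
  ToyType condTy;
  ToyType resultTy;
  Arm trueArm;
  Arm falseArm;
};

constexpr Kind classify(const ToySelect &s) {
  // Assumed rule: the result must be i1 (or a vector of i1) and the
  // condition must have exactly the same type as the result.
  if (s.resultTy.bits != 1 || !(s.condTy == s.resultTy))
    return Kind::PlainSelect;
  if (s.falseArm == Arm::AllFalse) return Kind::LogicalAnd;  // select c, x, false
  if (s.trueArm == Arm::AllTrue) return Kind::LogicalOr;     // select c, true, x
  return Kind::PlainSelect;
}

// select <4 x i1> %c, <4 x i1> %b, <4 x i1> zeroinitializer
constexpr ToySelect vectorAnd{{1, 4}, {1, 4}, Arm::Value, Arm::AllFalse};
// select <4 x i1> %c, <4 x i1> all-true, <4 x i1> %b
constexpr ToySelect vectorOr{{1, 4}, {1, 4}, Arm::AllTrue, Arm::Value};
// select i1 %c, <4 x i1> all-true, <4 x i1> %a4 (loop-invariant scalar condition)
constexpr ToySelect invariantCond{{1, 1}, {1, 4}, Arm::AllTrue, Arm::Value};

static_assert(classify(vectorAnd) == Kind::LogicalAnd);
static_assert(classify(vectorOr) == Kind::LogicalOr);
static_assert(classify(invariantCond) == Kind::PlainSelect);
```

Under this assumed rule the loop-invariant case keeps the ordinary select cost, which matches the branch-based lowering shown in the assembly.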
aqjune accepted this revision. Apr 7 2021, 5:20 PM

LGTM, Thanks!

This revision is now accepted and ready to land. Apr 7 2021, 5:20 PM
This revision was landed with ongoing or failed builds. Apr 8 2021, 2:40 AM
This revision was automatically updated to reflect the committed changes.