This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Improve code gen for vector add
ClosedPublic

Authored by lei on Jul 4 2023, 7:51 AM.

Details

Summary

Improve codegen for vectors modulo additions.

Diff Detail

Event Timeline

lei created this revision.Jul 4 2023, 7:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 4 2023, 7:51 AM
lei requested review of this revision.Jul 4 2023, 7:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 4 2023, 7:51 AM
lei updated this revision to Diff 537104.Jul 4 2023, 7:55 AM

Patch by Nemanja

I think that this patch is fine but I think there may be something missing in terms of vaddudm . Currently a test case like this one:

define dso_local <2 x i64> @x2d(<2 x i64> noundef %x) {
entry:
  %add = shl <2 x i64> %x, <i64 1, i64 1>
  ret <2 x i64> %add
}

Produces some fairly inefficient code:

x2d:                                    # @x2d
.Lfunc_begin3:
        .cfi_startproc
.Lfunc_gep3:
        addis 2, 12, .TOC.-.Lfunc_gep3@ha
        addi 2, 2, .TOC.-.Lfunc_gep3@l
.Lfunc_lep3:
        .localentry     x2d, .Lfunc_lep3-.Lfunc_gep3
# %bb.0:                                # %entry
        addis 3, 2, .LCPI3_0@toc@ha
        addi 3, 3, .LCPI3_0@toc@l
        lxv 35, 0(3)
        vsld 2, 2, 3
        blr

We even do a TOC access.

Unfortunately, this isn't just a case of adding:

def : Pat<(v2i64 (shl v2i64:$vA, (v2i64 (immEQOneV)))),
          (v2i64 (VADDUDM $vA, $vA))>;

like the others but I think it may be worth doing.

At this point maybe just add the test case and we can deal with the issue at a later date.

Unfortunately, this isn't just a case of adding:

def : Pat<(v2i64 (shl v2i64:$vA, (v2i64 (immEQOneV)))),
          (v2i64 (VADDUDM $vA, $vA))>;

like the others but I think it may be worth doing.

Yeah, this is because we don't have a way of materializing the <1, 1> vector so we end up with a constant pool load. We can provide custom legalization:
setOperationAction(ISD::SHL, MVT::v2i64, Custom); for Power8 and up where we would just leave the node alone if it's a shift by 1.

At this point maybe just add the test case and we can deal with the issue at a later date.

I agree this can be done in a follow-up patch.

lei updated this revision to Diff 539655.Jul 12 2023, 11:31 AM

add new tc and update to test for pwr8 since we are more interested
in the new tc behaviour for pwr8 and orig tc is same for pwr8 vs pwr7.

nemanjai accepted this revision.Jul 13 2023, 8:08 AM

LGTM.

This revision is now accepted and ready to land.Jul 13 2023, 8:08 AM
This revision was automatically updated to reflect the committed changes.