Improve codegen for vector modulo additions.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I think that this patch is fine, but there may be something missing in terms of vaddudm. Currently, a test case like this one:
define dso_local <2 x i64> @x2d(<2 x i64> noundef %x) {
entry:
  %add = shl <2 x i64> %x, <i64 1, i64 1>
  ret <2 x i64> %add
}
produces some fairly inefficient code:
x2d:                                    # @x2d
.Lfunc_begin3:
        .cfi_startproc
.Lfunc_gep3:
        addis 2, 12, .TOC.-.Lfunc_gep3@ha
        addi 2, 2, .TOC.-.Lfunc_gep3@l
.Lfunc_lep3:
        .localentry     x2d, .Lfunc_lep3-.Lfunc_gep3
# %bb.0:                                # %entry
        addis 3, 2, .LCPI3_0@toc@ha
        addi 3, 3, .LCPI3_0@toc@l
        lxv 35, 0(3)
        vsld 2, 2, 3
        blr
We even do a TOC access.
Unfortunately, this isn't just a case of adding:
def : Pat<(v2i64 (shl v2i64:$vA, (v2i64 (immEQOneV)))),
          (v2i64 (VADDUDM $vA, $vA))>;
like the others, but I think it may be worth doing.
At this point maybe just add the test case and we can deal with the issue at a later date.
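The pattern above relies on a lane-wise shift left by one giving the same result as adding the vector to itself modulo 2^64. A minimal standalone sketch of that equivalence follows; `V2i64`, `shlByOne`, `addSelf` and `checkEquivalence` are hypothetical names used only for illustration and are not part of LLVM.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a <2 x i64> vector; not an LLVM type.
using V2i64 = std::array<std::uint64_t, 2>;

// Lane-wise shift left by one, as in `shl <2 x i64> %x, <i64 1, i64 1>`.
V2i64 shlByOne(const V2i64 &x) {
  return {x[0] << 1, x[1] << 1};
}

// Lane-wise modulo-2^64 addition of x with itself, which is what
// vaddudm computes when both source operands are the same register.
V2i64 addSelf(const V2i64 &x) {
  return {x[0] + x[0], x[1] + x[1]};
}

// Asserts that both forms agree for one input, including wrap-around.
void checkEquivalence(const V2i64 &x) {
  assert(shlByOne(x) == addSelf(x));
}
```

Calling `checkEquivalence` with a lane that has its top bit set exercises the wrap-around case, where the shifted-out bit and the overflowing carry are both discarded.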
Yeah, this is because we don't have a way of materializing the <1, 1> vector so we end up with a constant pool load. We can provide custom legalization:
setOperationAction(ISD::SHL, MVT::v2i64, Custom);
for Power8 and up, where we would just leave the node alone if it's a shift by 1.
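The decision that such a custom legalization hook would make can be modelled outside of LLVM. The sketch below is only an illustration of the idea in this comment: `ShiftAmounts`, `ShlLowering`, `isSplatOfOne` and `classifyShl` are hypothetical names rather than LLVM APIs, and falling back to the existing shift handling in every other case is an assumption.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model types; none of these are LLVM APIs.
using ShiftAmounts = std::array<std::uint64_t, 2>;

enum class ShlLowering {
  KeepForAddPattern, // leave the node alone so a vaddudm pattern can match
  DefaultShift       // assumed fallback: handle the shift as before
};

// True when every lane of the shift amount is 1 (a splat of one).
bool isSplatOfOne(const ShiftAmounts &amt) {
  for (std::uint64_t lane : amt)
    if (lane != 1)
      return false;
  return true;
}

// Models the choice described above: only on Power8 and up, and only
// for a shift by a splat of one, is the node left untouched.
ShlLowering classifyShl(const ShiftAmounts &amt, bool isPower8OrLater) {
  if (isPower8OrLater && isSplatOfOne(amt))
    return ShlLowering::KeepForAddPattern;
  return ShlLowering::DefaultShift;
}
```

Keeping the splat check separate from the subtarget check makes it clear that both conditions must hold before the node is left alone.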
At this point maybe just add the test case and we can deal with the issue at a later date.
I agree this can be done in a follow-up patch.
Add a new test case and update the test to target pwr8, since we are more interested
in the new test case's behaviour on pwr8, and the original test case produces the same output on pwr8 and pwr7.