Improve codegen for vector modulo additions.
Repository: rG LLVM Github Monorepo
Event Timeline
I think this patch is fine, but there may be something missing in terms of vaddudm. Currently, a test case like this one:
define dso_local <2 x i64> @x2d(<2 x i64> noundef %x) {
entry:
  %add = shl <2 x i64> %x, <i64 1, i64 1>
  ret <2 x i64> %add
}
Produces some fairly inefficient code:
x2d:                                    # @x2d
.Lfunc_begin3:
        .cfi_startproc
.Lfunc_gep3:
        addis 2, 12, .TOC.-.Lfunc_gep3@ha
        addi 2, 2, .TOC.-.Lfunc_gep3@l
.Lfunc_lep3:
        .localentry x2d, .Lfunc_lep3-.Lfunc_gep3
# %bb.0:                                # %entry
        addis 3, 2, .LCPI3_0@toc@ha
        addi 3, 3, .LCPI3_0@toc@l
        lxv 35, 0(3)
        vsld 2, 2, 3
        blr
We even do a TOC access.
Unfortunately, this isn't just a case of adding:
def : Pat<(v2i64 (shl v2i64:$vA, (v2i64 (immEQOneV)))), (v2i64 (VADDUDM $vA, $vA))>;
like the others but I think it may be worth doing.
At this point maybe just add the test case and we can deal with the issue at a later date.
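The rewrite discussed above rests on a lane-wise identity: shifting a 64-bit lane left by one gives the same result as adding the lane to itself modulo 2^64. The following is a minimal scalar sketch of that identity in C++, not LLVM code; the type alias V2I64 and the helpers shl_by_one and add_to_self are hypothetical names introduced only for illustration.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for a <2 x i64> value; this is not an LLVM type.
using V2I64 = std::array<std::uint64_t, 2>;

// Lane-wise shift left by one, modelling `shl <2 x i64> %x, <i64 1, i64 1>`.
V2I64 shl_by_one(const V2I64 &x) {
    return {x[0] << 1, x[1] << 1};
}

// Lane-wise addition of a vector to itself. Unsigned 64-bit arithmetic wraps
// modulo 2^64, which models a doubleword modulo add with both inputs equal.
V2I64 add_to_self(const V2I64 &x) {
    return {x[0] + x[0], x[1] + x[1]};
}
```

Both helpers agree on every input, including lanes whose top bit is set, because the bit shifted out and the carry out of the addition are discarded in the same way. The add form needs no <1, 1> constant operand, which is the point of the proposed pattern.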
Yeah, this is because we don't have a way of materializing the <1, 1> vector, so we end up with a constant pool load. We can provide custom legalization:
setOperationAction(ISD::SHL, MVT::v2i64, Custom);
for Power8 and up, where we would just leave the node alone if it's a shift by 1.
> At this point maybe just add the test case and we can deal with the issue at a later date.
I agree this can be done in a follow-up patch.
Add a new test case and update the test to target pwr8, since we are more interested
in the new test case's behaviour on pwr8 and the original test case behaves the same on pwr8 and pwr7.