This patch aims to start a conversation about how we should approach forming VW variants (operations that widen their input arguments) more aggressively.
Currently we fold sign/zero extensions into instructions that support widening only when the result of the extension has a single use.
The current (WIP) patch lifts this limitation by checking whether all the users of the extension support the folding, and allows the transformation when that is the case.
The patch is far from perfect because it doesn't actually check that the folding will happen for all the instructions (and, in true SDISel fashion, it will be defeated by basic block boundaries), but it demonstrates what could be achieved, codegen-wise, with the added test:
--- old_codegen.s 2022-09-13 00:12:48.989575265 +0000
+++ new_codegen.s 2022-09-13 00:13:02.134793836 +0000
@@ -16,30 +16,28 @@
 .Lfunc_end0:
     .size vwmul_v2i16, .Lfunc_end0-vwmul_v2i16
     .cfi_endproc
                                         # -- End function
     .globl vwmul_v2i16_multiple_users   # -- Begin function vwmul_v2i16_multiple_users
     .p2align 2
     .type vwmul_v2i16_multiple_users,@function
 vwmul_v2i16_multiple_users:             # @vwmul_v2i16_multiple_users
     .cfi_startproc
 # %bb.0:
-    vsetivli zero, 2, e16, mf4, ta, mu
+    vsetivli zero, 2, e8, mf8, ta, mu
     vle8.v v8, (a0)
     vle8.v v9, (a1)
     vle8.v v10, (a2)
-    vsext.vf2 v11, v8
-    vsext.vf2 v8, v9
-    vsext.vf2 v9, v10
-    vmul.vv v8, v11, v8
-    vmul.vv v9, v11, v9
-    vor.vv v8, v8, v9
+    vwmul.vv v11, v8, v9
+    vwmul.vv v9, v8, v10
+    vsetvli zero, zero, e16, mf4, ta, mu
+    vor.vv v8, v11, v9
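For illustration, the difference between the single-use rule and the relaxed all-users rule from the summary can be sketched on a toy graph. This is only a standalone model, not the code in the patch: `Node`, `Opcode`, `hasWideningForm`, `canFoldSingleUse` and `canFoldAllUsers` are hypothetical names, and the set of opcodes treated as having a widening form is an assumption made for the example.

```cpp
#include <algorithm>
#include <vector>

// Toy stand-in for a DAG node (hypothetical; this is not LLVM's SDNode).
enum class Opcode { SExt, ZExt, Mul, Add, Sub, Or };

struct Node {
  Opcode Op;
  std::vector<const Node *> Users; // nodes that consume this node's result
};

// Assumption for this sketch: only these opcodes have a widening form.
bool hasWideningForm(Opcode Op) {
  return Op == Opcode::Mul || Op == Opcode::Add || Op == Opcode::Sub;
}

// Single-use rule: fold only if the extension has exactly one user
// and that user has a widening form.
bool canFoldSingleUse(const Node &Ext) {
  return Ext.Users.size() == 1 && hasWideningForm(Ext.Users.front()->Op);
}

// Relaxed rule: fold if the extension has at least one user and every
// user has a widening form, so the extension becomes dead afterwards.
bool canFoldAllUsers(const Node &Ext) {
  return !Ext.Users.empty() &&
         std::all_of(Ext.Users.begin(), Ext.Users.end(),
                     [](const Node *U) { return hasWideningForm(U->Op); });
}
```

In this model, an extension feeding two multiplies (as in vwmul_v2i16_multiple_users) is rejected by the single-use rule but accepted by the all-users rule, while an extension that also feeds an `Or` is rejected by both. Like the WIP patch, the sketch only checks that each user could fold, not that the fold will actually happen.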
@craig.topper How do you think we should approach forming VW instructions?
Can this be something like llvm::all_of(Val->uses(), ...)?