Page MenuHomePhabricator

[InstCombine] don't widen most selects by hoisting an extend
AbandonedPublic

Authored by spatel on Nov 11 2016, 12:28 PM.

Details

Summary

This is related to the discussion in PR28160:
https://llvm.org/bugs/show_bug.cgi?id=28160

...but it's not the same example. It will help PR30773 more directly:
https://llvm.org/bugs/show_bug.cgi?id=30773

...because we handle selects with a constant operand on a different path than selects with two variables. Assuming this is the right thing to do, the next step will be to allow shrinking selects by sinking an extend after the select. That's easy because we already do that transform in InstCombiner::foldSelectExtConst(), but it's artificially limited to i1 types currently.

An example of the AVX2 codegen improvement from this patch using one of the vector regression tests:

Sext before:

define <4 x i64> @g1vec(<4 x i32> %a, <4 x i1> %cmp) {
  %ext = sext <4 x i32> %a to <4 x i64>
  %sel = select <4 x i1> %cmp, <4 x i64> %ext, <4 x i64> <i64 42, i64 42, i64 42, i64 42>
  ret <4 x i64> %ext
}

vpslld	$31, %xmm1, %xmm1
vpmovsxdq	%xmm1, %ymm1
vpmovsxdq	%xmm0, %ymm0
vbroadcastsd	LCPI1_0(%rip), %ymm2
vblendvpd	%ymm1, %ymm0, %ymm2, %ymm0
retq

Sext after:

define <4 x i64> @g1vec(<4 x i32> %a, <4 x i1> %cmp) {
  %sel = select <4 x i1> %cmp, <4 x i32> %a, <4 x i32> <i32 42, i32 42, i32 42, i32 42>
  %ext = sext <4 x i32> %sel to <4 x i64>
  ret <4 x i64> %ext
}

vpslld	$31, %xmm1, %xmm1
vbroadcastss	LCPI1_0(%rip), %xmm2  <-- smaller load
vblendvps	%xmm1, %xmm0, %xmm2, %xmm0 <-- smaller select
vpmovsxdq	%xmm0, %ymm0
retq

The check for a leading trunc is to avoid regressing this test in test/Transforms/InstCombine/sext.ll:

define i32 @test8(i8 %a, i32 %f, i1 %p, i32* %z) {
; CHECK-LABEL: @test8(
; CHECK-NEXT:    [[D:%.*]] = lshr i32 %f, 24
; CHECK-NEXT:    [[N:%.*]] = select i1 %p, i32 [[D]], i32 0
; CHECK-NEXT:    ret i32 [[N]]
;
  %d = lshr i32 %f, 24
  %e = select i1 %p, i32 %d, i32 0
  %s = trunc i32 %e to i16
  %n = sext i16 %s to i32
  ret i32 %n
}

Diff Detail

Event Timeline

spatel updated this revision to Diff 77644.Nov 11 2016, 12:28 PM
spatel retitled this revision from to [InstCombine] don't widen most selects by hoisting an extend .
spatel updated this object.
spatel added reviewers: efriedma, mkuper, majnemer.
spatel added a subscriber: llvm-commits.
filcab added a subscriber: filcab.Nov 15 2016, 3:21 PM
filcab added inline comments.
lib/Transforms/InstCombine/InstructionCombining.cpp
797

Do you want to match type sizes, though? Or at least make sure you're truncating more (or the same) as you're extending?
Like this:

[build-debug]% cat | ./bin/opt -O3 - -o - -S
define <4 x i64> @g3vec(<4 x i32> %_a, <4 x i1> %cmp) {
  %a = trunc <4 x i32> %_a to <4 x i24>
  %sel = select <4 x i1> %cmp, <4 x i24> %a, <4 x i24> <i24 42, i24 42, i24 42, i24 42>
  %ext = zext <4 x i24> %sel to <4 x i64>
  ret <4 x i64> %ext
}


; ModuleID = '<stdin>'
source_filename = "<stdin>"

; Function Attrs: norecurse nounwind readnone
define <4 x i64> @g3vec(<4 x i32> %_a, <4 x i1> %cmp) local_unnamed_addr #0 {
  %1 = and <4 x i32> %_a, <i32 16777215, i32 16777215, i32 16777215, i32 16777215>
  %2 = zext <4 x i32> %1 to <4 x i64>
  %ext = select <4 x i1> %cmp, <4 x i64> %2, <4 x i64> <i64 42, i64 42, i64 42, i64 42>
  ret <4 x i64> %ext
}

attributes #0 = { norecurse nounwind readnone }

vs just select + zext (using sext will make it even worse :-)

spatel added inline comments.Nov 15 2016, 4:09 PM
lib/Transforms/InstCombine/InstructionCombining.cpp
797

This case would be another improvement over the current behavior, right? Ok if I add a 'TODO' comment in this patch and follow up with another test case plus that refinement?

spatel updated this revision to Diff 78633.Nov 19 2016, 9:57 AM

Patch updated:
0. Preliminary: added a pile of tests for permutations of trunc/sel/ext with rL287400 .

  1. Added a function specifically to handle widening of select, so (in theory) we have this transform in one place and can do it in a principled way.
  2. But some of the tests still show the (unwanted?) changes noted in Filipe's example.
  3. Added TODO comments where those happen (we treat vectors differently than scalars).
  4. Added a FIXME because we're dropping profile metadata for all of these select transforms.
spatel updated this revision to Diff 79334.Nov 26 2016, 8:37 AM

Ping.

Patch updated:

  1. Rebase after rL287980 (no need to add FIXME comment now).
  2. Propagate metadata with SelectInst::Create() ( rL287976 ).
  3. Update trunc_sel_equal_zext / trunc_sel_equal_zext_vec tests to show that metadata is not dropped.
efriedma edited edge metadata.Dec 12 2016, 11:37 AM

I'm not sure this approach is really right... narrower isn't always better. You're just getting lucky with your AVX2 example: <4 x i1> as an argument happens to get passed as a 128-bit vector. If the compare operand were a 64-bit comparison, you'd be making the code worse; consider:

define <4 x i64> @f(<4 x i32> %a, <4 x i64> %b, <4 x i64> %c) {
  %cmp = icmp sgt <4 x i64> %b, %c
  %ext = sext <4 x i32> %a to <4 x i64>
  %sel = select <4 x i1> %cmp, <4 x i64> %ext, <4 x i64> <i64 42, i64 42, i64 42, i64 42>
  ret <4 x i64> %sel
}

This is perfectly straightforward code of the sort you could write using intrinsics in C. Move the sext, and the generated code becomes worse.

I'm not sure this approach is really right... narrower isn't always better. You're just getting lucky with your AVX2 example: <4 x i1> as an argument happens to get passed as a 128-bit vector. If the compare operand were a 64-bit comparison, you'd be making the code worse; consider:

define <4 x i64> @f(<4 x i32> %a, <4 x i64> %b, <4 x i64> %c) {
  %cmp = icmp sgt <4 x i64> %b, %c
  %ext = sext <4 x i32> %a to <4 x i64>
  %sel = select <4 x i1> %cmp, <4 x i64> %ext, <4 x i64> <i64 42, i64 42, i64 42, i64 42>
  ret <4 x i64> %sel
}

This is perfectly straightforward code of the sort you could write using intrinsics in C. Move the sext, and the generated code becomes worse.

Would you say this is a backend pattern-matching hole, or do you object to the IR transform itself? Ie, if we can fix the backend, would this be a valid patch? My view is that a narrower op in IR is better in terms of value tracking and could be thought of as a strength reduction optimization, so this is the theoretically correct approach to the IR...but as always, let me know if I'm off in the weeds. :)

For reference, if we're looking at AVX2 codegen, we have this:

define <4 x i64> @f(<4 x i32> %a, <4 x i64> %b, <4 x i64> %c) {
  %cmp = icmp sgt <4 x i64> %b, %c
  %ext = sext <4 x i32> %a to <4 x i64>
  %sel = select <4 x i1> %cmp, <4 x i64> %ext, <4 x i64> <i64 42, i64 42, i64 42, i64 42>
  ret <4 x i64> %sel
}

define <4 x i64> @g(<4 x i32> %a, <4 x i64> %b, <4 x i64> %c) {
  %cmp = icmp sgt <4 x i64> %b, %c
  %sel = select <4 x i1> %cmp, <4 x i32> %a, <4 x i32> <i32 42, i32 42, i32 42, i32 42>
  %ext = sext <4 x i32> %sel to <4 x i64>
  ret <4 x i64> %ext
}

Which becomes:

_f:                                     ## @f
vpcmpgtq	%ymm2, %ymm1, %ymm1
vpmovsxdq	%xmm0, %ymm0
vbroadcastsd	LCPI0_0(%rip), %ymm2
vblendvpd	%ymm1, %ymm0, %ymm2, %ymm0
retq
_g:                                     ## @g
vpcmpgtq	%ymm2, %ymm1, %ymm1
vextracti128	$1, %ymm1, %xmm2
vpacksswb	%xmm2, %xmm1, %xmm1
vbroadcastss	LCPI1_0(%rip), %xmm2
vblendvps	%xmm1, %xmm0, %xmm2, %xmm0
vpmovsxdq	%xmm0, %ymm0
retq

If we could pattern-match our way out, it might be okay... but I don't think that's realistic in more complicated cases. The sign-extend could be pushed forward through another operation or land in a different basic block. I think it makes more sense to just try to make selects use the "right" width for the target (the width of the compare operands for most targets).

spatel abandoned this revision.May 21 2018, 5:22 PM

If we could pattern-match our way out, it might be okay... but I don't think that's realistic in more complicated cases. The sign-extend could be pushed forward through another operation or land in a different basic block. I think it makes more sense to just try to make selects use the "right" width for the target (the width of the compare operands for most targets).

Sorry for the 18 month delay. Let's do that. :)
D47163

Abandoning.