This is an archive of the discontinued LLVM Phabricator instance.

[X86, AVX] adjust tablegen patterns to generate better code for scalar insertion into zero vector (PR23073)
ClosedPublic

Authored by spatel on Apr 1 2015, 5:52 PM.

Details

Summary

As noted in PR23073 ( https://llvm.org/bugs/show_bug.cgi?id=23073 ),
for code like this:

define <8 x i32> @load_v8i32() {
  ret <8 x i32> <i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
}

We produce this AVX code:

_load_v8i32:                            ## @load_v8i32
  movl	$7, %eax
  vmovd	%eax, %xmm0
  vxorps	%ymm1, %ymm1, %ymm1
  vblendps	$1, %ymm0, %ymm1, %ymm0 ## ymm0 = ymm0[0],ymm1[1,2,3,4,5,6,7]
  retq

There are at least 2 bugs in play here:

  1. We're generating a blend when a move scalar does the same job using 2 less instruction bytes.
  2. We're not matching an existing pattern that would eliminate the xor and blend entirely. The zero bytes are free with vmovd.

The 2nd fix involves an adjustment of "AddedComplexity" [1] and masks the 1st problem, but I went ahead with a partial fix for that problem in case we ever match that pattern. I'm not sure how to do it, so I don't have an additional test case for it. I'll address the remaining FIXMEs if nobody sees any problems with this patch.

[1] AddedComplexity has close to no documentation in the source. The best we have is this comment: "roughly corresponds to the number of nodes that are covered". It appears that x86 has bastardized this definition by inflating its values for some other undocumented reason. For example, we have a pattern with "AddedComplexity = 400" (!). I searched my way to this page:
https://groups.google.com/forum/#!topic/llvm-dev/5UX-Og9M0xQ

Diff Detail

Event Timeline

spatel updated this revision to Diff 23112.Apr 1 2015, 5:52 PM
spatel retitled this revision from to [X86, AVX] adjust tablegen patterns to generate better code for scalar insertion into zero vector (PR23073).
spatel updated this object.
spatel edited the test plan for this revision. (Show Details)
spatel added reviewers: ab, qcolombet, craig.topper, andreadb.
spatel added a subscriber: Unknown Object (MLST).
spatel updated this object.Apr 1 2015, 5:57 PM
craig.topper added inline comments.Apr 1 2015, 10:54 PM
test/CodeGen/X86/vector-shuffle-256-v4.ll
847

Why is this not ALL-NEXT anymore?

spatel added inline comments.Apr 2 2015, 7:13 AM
test/CodeGen/X86/vector-shuffle-256-v4.ll
847

Hi Craig -

We now print a 'kill' ahead of the xor. I didn't see any other tests matching on that in this file, so I loosened the check. But I see that other regression test files do have that kind of check, so I'll add another check line:

_insert_reg_and_zero_v4f64:             ## @insert_reg_and_zero_v4f64
## BB#0:
                                     ## kill: XMM0<def> XMM0<kill> YMM0<def>
vxorpd	%xmm1, %xmm1, %xmm1
vmovsd	%xmm0, %xmm1, %xmm0     ## xmm0 = xmm0[0],xmm1[1]
retq
spatel updated this revision to Diff 23148.Apr 2 2015, 7:20 AM

Patch updated to include new CHECK line for 'kill' instruction.

spatel updated this revision to Diff 23154.Apr 2 2015, 8:22 AM

Patch updated with additional tests that check that we'll generate a movs[d,s] instead of blendp[d,s] for the exact patterns that I've modified.

spatel updated this revision to Diff 23156.Apr 2 2015, 8:28 AM

Patch updated again: the old patterns were commented out rather than deleted in the last rev.

spatel updated this revision to Diff 23162.Apr 2 2015, 10:01 AM

Patch updated again:

I removed all of the changes related to blend vs. movs, so this patch is now purely about adjusting the AddedComplexity to fix PR23073.

I did some svn blaming and see the reasoning for the blend patterns. These were added in r219022 by Chandler. But I think that change overstepped, so I've put some FIXMEs in here. I think the procedure is to follow-up on the commit mail for that checkin, so I'll do that next.

qcolombet accepted this revision.Apr 2 2015, 10:13 AM
qcolombet edited edge metadata.

Hi Sanjay,

LGTM as a partial fix.

Thanks,
-Quentin

This revision is now accepted and ready to land.Apr 2 2015, 10:13 AM
This revision was automatically updated to reflect the committed changes.

Thanks, Quentin! Checked in at r233931.