Page MenuHomePhabricator

Catch combine opportunities for redundant imuls
ClosedPublic

Authored by zansari on Oct 14 2015, 1:28 PM.

Details

Summary

When we fold (mul (add x, c1), c1) -> (add (mul x, c2), c1*c2), we bail if (add x, c1) has multiple users, leaving and extra add instruction. In such cases, this patch adds a check to see if we can eliminate a multiply instruction in exchange for the extra add. combine-multiplies.ll illustrates this issue in a little more detail.

Diff Detail

Repository
rL LLVM

Event Timeline

zansari updated this revision to Diff 37382.Oct 14 2015, 1:28 PM
zansari retitled this revision from to Catch combine opportunities for redundant imuls.
zansari updated this object.
zansari added reviewers: mkuper, delena, spatel, hfinkel.
zansari added a subscriber: llvm-commits.
delena edited edge metadata.Oct 14 2015, 11:57 PM

I agree with this change. I think it should be profitable also for FP operations.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2213 ↗(On Diff #37382)

You can use isConstOrConstSplat() .

spatel added inline comments.Oct 15 2015, 8:57 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2152–2154 ↗(On Diff #37382)

Please move this into a helper function or two. That should reduce the indentation, the parameters can take on the new variable names, and the high-level comments can apply to a whole function.

test/CodeGen/X86/combine-multiplies.ll
1 ↗(On Diff #37382)

I didn't step through the debug output, but I see that this optimization doesn't fire for x86-64. Did you look into that case?

24 ↗(On Diff #37382)

I would prefer to see more of the correct output here rather than a CHECK-NOT. Eg:

CHECK: imull $400, %ecx, %edx # imm = 0x190
CHECK-NEXT: leal (%edx,%eax), %esi
CHECK-NEXT: movl $11, 2020(%esi,%ecx,4)
CHECK-NEXT: movl $22, 2080(%edx,%eax)
CHECK-NEXT: movl $33, 10080(%edx,%eax)

You can commit this test case and check for the current codegen before proceeding with this patch; that will make the functional difference from the new combine clearer.

42 ↗(On Diff #37382)

Remove unnecessary attributes.

44 ↗(On Diff #37382)

Looks like the test case text was duplicated when you created the diff.

Should there be a vector version of the test case?

zansari updated this revision to Diff 37511.Oct 15 2015, 1:10 PM
zansari edited edge metadata.

New patch with changes made based on code review comments. Factored out profitability code, and beefed up lit test.

Thanks Elena and Sanjay for the review and comments.

  • I liked the refactoring comment, and I made those changes. Looks much nicer now. Thanks.
  • The code doesn't kick in for 64bit stuff, due to intermediate extends (coincidentally, given your recent patch :) ). That's why I didn't include 64bit in the lit test. I might be worth looking into this later to see if anything can be done.
  • I beefed up the lit test, as you suggested, but I generalized it a little, since I'm not a fan of relying too much on register allocation and symbol table layout.
  • I have no idea what happened to the lit test to duplicate the code. Sorry about that.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2221 ↗(On Diff #37511)

Will this replace both isConstantSplatVector and isa<Constan...> ? Or just the latter? I had a hard time trying to figure out the differences between the former, and isConstOrConstSplay to see if they can be both replaced.

RKSimon added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2221 ↗(On Diff #37511)

You should be able to replace both with isConstOrConstSplat calls. In fact I think you could use isConstantIntBuildVectorOrConstantInt instead and match all constants not just those with a splatted value. This would definitely need a vector test case though.

Thanks, Simon... Let me work on that, and hopefully that will also answer Sanjay's last question regarding needing a vector version in the lit test, to which I didn't comment on earlier.

Zia.

spatel edited edge metadata.Oct 16 2015, 9:36 AM

Thanks, Simon... Let me work on that, and hopefully that will also answer Sanjay's last question regarding needing a vector version in the lit test, to which I didn't comment on earlier.

Thanks for the updates, Zia. Yes, we should have a vector test since this code is designed to handle vectors. Alternatively, you could bail out on vectors in this patch, add a TODO comment for that case, and make vector handling a small follow-on patch.

And yes, I'd certainly like to make this work for 64-bit too and see if D13757 will help in that case.

zansari updated this revision to Diff 37820.Oct 19 2015, 5:57 PM
zansari edited edge metadata.

New patch includes changes to use "isConstOrConstSplat" to consolidate the 2 constant checks, as suggested.

Also, as suggested, I added a test to make sure we catch the vector multiply by constant case. I verified that before this patch we generate 4 pmulduqs, compared to just 2 now (i.e. we eliminate an extra vector multiply after the patch).

Thanks,
Zia.

Before:

movdqa  .LCPI1_0, %xmm1         # xmm1 = [11,11,11,11]
paddd   %xmm0, %xmm1
movdqa  .LCPI1_1, %xmm2         # xmm2 = [22,22,22,22]
pshufd  $245, %xmm1, %xmm3      # xmm3 = xmm1[1,1,3,3]
movdqa  %xmm1, x
pmuludq %xmm2, %xmm1
pshufd  $232, %xmm1, %xmm1      # xmm1 = xmm1[0,2,2,3]
pmuludq %xmm2, %xmm3
pshufd  $232, %xmm3, %xmm3      # xmm3 = xmm3[0,2,2,3]
punpckldq       %xmm3, %xmm1    # xmm1 = xmm1[0],xmm3[0],xmm1[1],xmm3[1]
movdqa  %xmm1, v2
pshufd  $245, %xmm0, %xmm1      # xmm1 = xmm0[1,1,3,3]
pmuludq %xmm2, %xmm0
pshufd  $232, %xmm0, %xmm0      # xmm0 = xmm0[0,2,2,3]
pmuludq %xmm2, %xmm1
pshufd  $232, %xmm1, %xmm1      # xmm1 = xmm1[0,2,2,3]
punpckldq       %xmm1, %xmm0    # xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
paddd   .LCPI1_2, %xmm0
movdqa  %xmm0, v3
retl

After:

movdqa  .LCPI1_0, %xmm1         # xmm1 = [11,11,11,11]
paddd   %xmm0, %xmm1
movdqa  .LCPI1_1, %xmm2         # xmm2 = [22,22,22,22]
pshufd  $245, %xmm0, %xmm3      # xmm3 = xmm0[1,1,3,3]
pmuludq %xmm2, %xmm0
pshufd  $232, %xmm0, %xmm0      # xmm0 = xmm0[0,2,2,3]
pmuludq %xmm2, %xmm3
pshufd  $232, %xmm3, %xmm2      # xmm2 = xmm3[0,2,2,3]
punpckldq       %xmm2, %xmm0    # xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
movdqa  .LCPI1_2, %xmm2         # xmm2 = [242,242,242,242]
paddd   %xmm0, %xmm2
movdqa  %xmm2, v2
paddd   .LCPI1_3, %xmm0
movdqa  %xmm0, v3
movdqa  %xmm1, x
retl

Thanks, Zia. Sadly, the scalar case won't fire on x86-64 even after r250560 because of the sexts, but that can be a follow-on patch.

Just a couple of nitpicks, otherwise LGTM.

I think Simon was concerned with the non-splat vector case though, so I'll let him give this another look if that scenario needs a different test case.

test/CodeGen/X86/combine-multiplies.ll
92 ↗(On Diff #37820)

multiple -> multiply

119 ↗(On Diff #37820)

Would a "-mattr=sse2" on the RUN line constrain this enough? I prefer to specify necessary attributes rather than CPU models as proxies for those attributes.

zansari updated this revision to Diff 37906.Oct 20 2015, 12:13 PM

Thanks, Sanjay.

I've incorporated your comments.

Also, apologies to Simon as I seem to have missed the point about "non-splatted" cases. I made the suggested change to use the more general vector constant check (these were never being caught before due to a conservative constant check which I had to change), and I also added another test case to check for the non-splatted cases which verifies the desired elimination of the extra two multiplies.

Thanks, again, and I appreciate your patience with me as I'm slowly learning llvm code and process.
Zia.

The vectors test look fine to me - although please can you give them more descriptive names than 'foo'?

zansari updated this revision to Diff 38018.Oct 21 2015, 8:21 AM

Thanks, Simon.. Will do.

This revision was automatically updated to reflect the committed changes.