When we fold (mul (add x, c1), c1) -> (add (mul x, c2), c1*c2), we bail if (add x, c1) has multiple users, leaving and extra add instruction. In such cases, this patch adds a check to see if we can eliminate a multiply instruction in exchange for the extra add. combine-multiplies.ll illustrates this issue in a little more detail.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
I agree with this change. I think it should be profitable also for FP operations.
| lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
|---|---|---|
| 2213 ↗ | (On Diff #37382) | You can use isConstOrConstSplat() . | 
| lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
|---|---|---|
| 2152–2154 ↗ | (On Diff #37382) | Please move this into a helper function or two. That should reduce the indentation, the parameters can take on the new variable names, and the high-level comments can apply to a whole function. | 
| test/CodeGen/X86/combine-multiplies.ll | ||
| 1 ↗ | (On Diff #37382) | I didn't step through the debug output, but I see that this optimization doesn't fire for x86-64. Did you look into that case? | 
| 24 ↗ | (On Diff #37382) | I would prefer to see more of the correct output here rather than a CHECK-NOT. Eg: CHECK:           imull	$400, %ecx, %edx        # imm = 0x190 You can commit this test case and check for the current codegen before proceeding with this patch; that will make the functional difference from the new combine clearer. | 
| 42 ↗ | (On Diff #37382) | Remove unnecessary attributes. | 
| 44 ↗ | (On Diff #37382) | Looks like the test case text was duplicated when you created the diff. Should there be a vector version of the test case? | 
New patch with changes made based on code review comments. Factored out profitability code, and beefed up lit test.
Thanks Elena and Sanjay for the review and comments.
- I liked the refactoring comment, and I made those changes. Looks much nicer now. Thanks.
- The code doesn't kick in for 64bit stuff, due to intermediate extends (coincidentally, given your recent patch :) ). That's why I didn't include 64bit in the lit test. I might be worth looking into this later to see if anything can be done.
- I beefed up the lit test, as you suggested, but I generalized it a little, since I'm not a fan of relying too much on register allocation and symbol table layout.
- I have no idea what happened to the lit test to duplicate the code. Sorry about that.
| lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
|---|---|---|
| 2221 ↗ | (On Diff #37511) | Will this replace both isConstantSplatVector and isa<Constan...> ? Or just the latter? I had a hard time trying to figure out the differences between the former, and isConstOrConstSplay to see if they can be both replaced. | 
| lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
|---|---|---|
| 2221 ↗ | (On Diff #37511) | You should be able to replace both with isConstOrConstSplat calls. In fact I think you could use isConstantIntBuildVectorOrConstantInt instead and match all constants not just those with a splatted value. This would definitely need a vector test case though. | 
Thanks, Simon... Let me work on that, and hopefully that will also answer Sanjay's last question regarding needing a vector version in the lit test, to which I didn't comment on earlier.
Zia.
Thanks for the updates, Zia. Yes, we should have a vector test since this code is designed to handle vectors. Alternatively, you could bail out on vectors in this patch, add a TODO comment for that case, and make vector handling a small follow-on patch.
And yes, I'd certainly like to make this work for 64-bit too and see if D13757 will help in that case.
New patch includes changes to use "isConstOrConstSplat" to consolidate the 2 constant checks, as suggested.
Also, as suggested, I added a test to make sure we catch the vector multiply by constant case. I verified that before this patch we generate 4 pmulduqs, compared to just 2 now (i.e. we eliminate an extra vector multiply after the patch).
Thanks,
Zia.
Before:
movdqa .LCPI1_0, %xmm1 # xmm1 = [11,11,11,11] paddd %xmm0, %xmm1 movdqa .LCPI1_1, %xmm2 # xmm2 = [22,22,22,22] pshufd $245, %xmm1, %xmm3 # xmm3 = xmm1[1,1,3,3] movdqa %xmm1, x pmuludq %xmm2, %xmm1 pshufd $232, %xmm1, %xmm1 # xmm1 = xmm1[0,2,2,3] pmuludq %xmm2, %xmm3 pshufd $232, %xmm3, %xmm3 # xmm3 = xmm3[0,2,2,3] punpckldq %xmm3, %xmm1 # xmm1 = xmm1[0],xmm3[0],xmm1[1],xmm3[1] movdqa %xmm1, v2 pshufd $245, %xmm0, %xmm1 # xmm1 = xmm0[1,1,3,3] pmuludq %xmm2, %xmm0 pshufd $232, %xmm0, %xmm0 # xmm0 = xmm0[0,2,2,3] pmuludq %xmm2, %xmm1 pshufd $232, %xmm1, %xmm1 # xmm1 = xmm1[0,2,2,3] punpckldq %xmm1, %xmm0 # xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1] paddd .LCPI1_2, %xmm0 movdqa %xmm0, v3 retl
After:
movdqa .LCPI1_0, %xmm1 # xmm1 = [11,11,11,11] paddd %xmm0, %xmm1 movdqa .LCPI1_1, %xmm2 # xmm2 = [22,22,22,22] pshufd $245, %xmm0, %xmm3 # xmm3 = xmm0[1,1,3,3] pmuludq %xmm2, %xmm0 pshufd $232, %xmm0, %xmm0 # xmm0 = xmm0[0,2,2,3] pmuludq %xmm2, %xmm3 pshufd $232, %xmm3, %xmm2 # xmm2 = xmm3[0,2,2,3] punpckldq %xmm2, %xmm0 # xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1] movdqa .LCPI1_2, %xmm2 # xmm2 = [242,242,242,242] paddd %xmm0, %xmm2 movdqa %xmm2, v2 paddd .LCPI1_3, %xmm0 movdqa %xmm0, v3 movdqa %xmm1, x retl
Thanks, Zia. Sadly, the scalar case won't fire on x86-64 even after r250560 because of the sexts, but that can be a follow-on patch.
Just a couple of nitpicks, otherwise LGTM.
I think Simon was concerned with the non-splat vector case though, so I'll let him give this another look if that scenario needs a different test case.
| test/CodeGen/X86/combine-multiplies.ll | ||
|---|---|---|
| 92 ↗ | (On Diff #37820) | multiple -> multiply | 
| 119 ↗ | (On Diff #37820) | Would a "-mattr=sse2" on the RUN line constrain this enough? I prefer to specify necessary attributes rather than CPU models as proxies for those attributes. | 
Thanks, Sanjay.
I've incorporated your comments.
Also, apologies to Simon as I seem to have missed the point about "non-splatted" cases. I made the suggested change to use the more general vector constant check (these were never being caught before due to a conservative constant check which I had to change), and I also added another test case to check for the non-splatted cases which verifies the desired elimination of the extra two multiplies.
Thanks, again, and I appreciate your patience with me as I'm slowly learning llvm code and process.
Zia.
The vectors test look fine to me - although please can you give them more descriptive names than 'foo'?