When we fold (mul (add x, c1), c1) -> (add (mul x, c2), c1*c2), we bail if (add x, c1) has multiple users, leaving and extra add instruction. In such cases, this patch adds a check to see if we can eliminate a multiply instruction in exchange for the extra add. combine-multiplies.ll illustrates this issue in a little more detail.
|2152–2154 ↗||(On Diff #37382)|
Please move this into a helper function or two. That should reduce the indentation, the parameters can take on the new variable names, and the high-level comments can apply to a whole function.
|1 ↗||(On Diff #37382)|
I didn't step through the debug output, but I see that this optimization doesn't fire for x86-64. Did you look into that case?
|24 ↗||(On Diff #37382)|
I would prefer to see more of the correct output here rather than a CHECK-NOT. Eg:
CHECK: imull $400, %ecx, %edx # imm = 0x190
You can commit this test case and check for the current codegen before proceeding with this patch; that will make the functional difference from the new combine clearer.
|42 ↗||(On Diff #37382)|
Remove unnecessary attributes.
|44 ↗||(On Diff #37382)|
Looks like the test case text was duplicated when you created the diff.
Should there be a vector version of the test case?
Thanks Elena and Sanjay for the review and comments.
- I liked the refactoring comment, and I made those changes. Looks much nicer now. Thanks.
- The code doesn't kick in for 64bit stuff, due to intermediate extends (coincidentally, given your recent patch :) ). That's why I didn't include 64bit in the lit test. I might be worth looking into this later to see if anything can be done.
- I beefed up the lit test, as you suggested, but I generalized it a little, since I'm not a fan of relying too much on register allocation and symbol table layout.
- I have no idea what happened to the lit test to duplicate the code. Sorry about that.
|2221 ↗||(On Diff #37511)|
Will this replace both isConstantSplatVector and isa<Constan...> ? Or just the latter? I had a hard time trying to figure out the differences between the former, and isConstOrConstSplay to see if they can be both replaced.
|2221 ↗||(On Diff #37511)|
You should be able to replace both with isConstOrConstSplat calls. In fact I think you could use isConstantIntBuildVectorOrConstantInt instead and match all constants not just those with a splatted value. This would definitely need a vector test case though.
Thanks, Simon... Let me work on that, and hopefully that will also answer Sanjay's last question regarding needing a vector version in the lit test, to which I didn't comment on earlier.
Thanks for the updates, Zia. Yes, we should have a vector test since this code is designed to handle vectors. Alternatively, you could bail out on vectors in this patch, add a TODO comment for that case, and make vector handling a small follow-on patch.
And yes, I'd certainly like to make this work for 64-bit too and see if D13757 will help in that case.
New patch includes changes to use "isConstOrConstSplat" to consolidate the 2 constant checks, as suggested.
Also, as suggested, I added a test to make sure we catch the vector multiply by constant case. I verified that before this patch we generate 4 pmulduqs, compared to just 2 now (i.e. we eliminate an extra vector multiply after the patch).
movdqa .LCPI1_0, %xmm1 # xmm1 = [11,11,11,11] paddd %xmm0, %xmm1 movdqa .LCPI1_1, %xmm2 # xmm2 = [22,22,22,22] pshufd $245, %xmm1, %xmm3 # xmm3 = xmm1[1,1,3,3] movdqa %xmm1, x pmuludq %xmm2, %xmm1 pshufd $232, %xmm1, %xmm1 # xmm1 = xmm1[0,2,2,3] pmuludq %xmm2, %xmm3 pshufd $232, %xmm3, %xmm3 # xmm3 = xmm3[0,2,2,3] punpckldq %xmm3, %xmm1 # xmm1 = xmm1,xmm3,xmm1,xmm3 movdqa %xmm1, v2 pshufd $245, %xmm0, %xmm1 # xmm1 = xmm0[1,1,3,3] pmuludq %xmm2, %xmm0 pshufd $232, %xmm0, %xmm0 # xmm0 = xmm0[0,2,2,3] pmuludq %xmm2, %xmm1 pshufd $232, %xmm1, %xmm1 # xmm1 = xmm1[0,2,2,3] punpckldq %xmm1, %xmm0 # xmm0 = xmm0,xmm1,xmm0,xmm1 paddd .LCPI1_2, %xmm0 movdqa %xmm0, v3 retl
movdqa .LCPI1_0, %xmm1 # xmm1 = [11,11,11,11] paddd %xmm0, %xmm1 movdqa .LCPI1_1, %xmm2 # xmm2 = [22,22,22,22] pshufd $245, %xmm0, %xmm3 # xmm3 = xmm0[1,1,3,3] pmuludq %xmm2, %xmm0 pshufd $232, %xmm0, %xmm0 # xmm0 = xmm0[0,2,2,3] pmuludq %xmm2, %xmm3 pshufd $232, %xmm3, %xmm2 # xmm2 = xmm3[0,2,2,3] punpckldq %xmm2, %xmm0 # xmm0 = xmm0,xmm2,xmm0,xmm2 movdqa .LCPI1_2, %xmm2 # xmm2 = [242,242,242,242] paddd %xmm0, %xmm2 movdqa %xmm2, v2 paddd .LCPI1_3, %xmm0 movdqa %xmm0, v3 movdqa %xmm1, x retl
Thanks, Zia. Sadly, the scalar case won't fire on x86-64 even after r250560 because of the sexts, but that can be a follow-on patch.
Just a couple of nitpicks, otherwise LGTM.
I think Simon was concerned with the non-splat vector case though, so I'll let him give this another look if that scenario needs a different test case.
|92 ↗||(On Diff #37820)|
multiple -> multiply
|119 ↗||(On Diff #37820)|
Would a "-mattr=sse2" on the RUN line constrain this enough? I prefer to specify necessary attributes rather than CPU models as proxies for those attributes.
I've incorporated your comments.
Also, apologies to Simon as I seem to have missed the point about "non-splatted" cases. I made the suggested change to use the more general vector constant check (these were never being caught before due to a conservative constant check which I had to change), and I also added another test case to check for the non-splatted cases which verifies the desired elimination of the extra two multiplies.
Thanks, again, and I appreciate your patience with me as I'm slowly learning llvm code and process.