@RKSimon Sorry, I should have mentioned in the summary that I don't have commit rights.
Jan 19 2019
Jan 6 2019
Rebased on top of D56372.
Dec 6 2018
@RKSimon I am sorry for the long radio silence. It is entirely my fault.
Sep 25 2018
Made a wrapper as discussed.
@lebedev.ri Sorry, couldn't find any instances of 4 x i16 in the nonsplat tests; I think I only had those in the splat ones. The patch no longer applied cleanly, so I rebased it again.
Sep 21 2018
@RKSimon should I make any other changes to this?
Sep 13 2018
Re-added previously committed tests.
Sep 12 2018
Sep 11 2018
Thank you for the review, @RKSimon. Made changes as directed.
Sep 10 2018
Apologies again for the delay.
- Made cosmetic changes as directed by @lebedev.ri and added vector tests.
- Disabled the fold for vector divisors with even values (see inline comment and test_urem_even_vec_i16).
Aug 31 2018
Updated AArch64 tests.
Aug 30 2018
Comments addressed. The minsize condition seems to need some tweaking: with it, the generated code actually comes out longer on X86. Perhaps there should be something like isIntDivShort.
After a rebase and bisect, it turned out that the current form does rely on D50222. The extra mpy nodes come from this combine on the srem, which would not be reached with the proposed SREM optimization:
Aug 29 2018
Thank you for the review, @lebedev.ri, addressed:
- Added isIntDivCheap as an additional condition preventing this optimization. If this isn't customizable enough, we could probably do something like if (isIntDivCheap || (minsize && isIntDivShort)) where isIntDivShort can be overridden per-target.
- I'm not an LLVM developer; as far as I understand, I can't commit anything
- Added tests for AArch64
- Removed the signed bits
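
To make the gating condition from the first bullet concrete, here is a minimal standalone sketch. It is not the patch's code: TargetHooksSketch, ShortDivTargetSketch and shouldSkipRemFold are hypothetical names, the hooks take no arguments (a simplification), and isIntDivShort is only the hook proposed above, not an existing one.

```cpp
// Hypothetical stand-in for per-target hooks; simplified to take no arguments.
struct TargetHooksSketch {
  virtual ~TargetHooksSketch() = default;
  // Division is cheap enough that replacing it is not worthwhile.
  virtual bool isIntDivCheap() const { return false; }
  // Proposed hook (assumption): the division sequence is shorter than the fold.
  virtual bool isIntDivShort() const { return false; }
};

// Example target that overrides only the proposed hook.
struct ShortDivTargetSketch : TargetHooksSketch {
  bool isIntDivShort() const override { return true; }
};

// Returns true when the fold should NOT be applied.
bool shouldSkipRemFold(const TargetHooksSketch &Target, bool MinSize) {
  return Target.isIntDivCheap() || (MinSize && Target.isIntDivShort());
}
```

With this shape, a target whose division is merely short only opts out when optimizing for minimum size; everything else still gets the fold.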
Aug 20 2018
Thank you for the review, @lebedev.ri. Sorry for the delay; resolved most of the issues. Quick summary of the changes:
Aug 18 2018
Aug 16 2018
Aug 7 2018
As pointed out by @lebedev.ri, tests were inadequate. Added new tests and stepped through existing ones, adding descriptions. This uncovered two more bugs, which were also fixed here (an extraneous Q.lshrInPlace() and division by D instead of D0). As far as I can see, the coverage now seems reasonable; please point out any cases I missed.
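
For readers following along, here is a minimal standalone sketch of the standard multiply-and-compare divisibility check that this kind of fold is based on, for unsigned 32-bit values. It is not the patch's code: the function names are hypothetical, and it assumes the usual decomposition D = D0 * 2^K with D0 odd. The multiplier comes from D0, while the comparison bound Q is computed from the full D.

```cpp
#include <cassert>
#include <cstdint>

// Multiplicative inverse of an odd value modulo 2^32 (Newton iteration).
static uint32_t inverseMod2_32(uint32_t D0) {
  uint32_t Inv = D0;            // D0 * D0 == 1 (mod 8), so 3 bits are correct
  for (int I = 0; I < 4; ++I)   // each step doubles the number of correct bits
    Inv *= 2u - D0 * Inv;
  return Inv;
}

// Rotate right by K bits (K is reduced modulo 32).
static uint32_t rotr32(uint32_t V, unsigned K) {
  K &= 31;
  return K == 0 ? V : (V >> K) | (V << (32 - K));
}

// Returns true iff X % D == 0, without performing a division on X.
bool isDivisibleSketch(uint32_t X, uint32_t D) {
  assert(D != 0 && "divisor must be non-zero");
  unsigned K = 0;
  uint32_t D0 = D;
  while ((D0 & 1u) == 0) {      // split D into D0 * 2^K with D0 odd
    D0 >>= 1;
    ++K;
  }
  uint32_t P = inverseMod2_32(D0); // multiplier uses the odd part D0
  uint32_t Q = UINT32_MAX / D;     // bound uses the full divisor D
  return rotr32(X * P, K) <= Q;
}
```

In a real fold P and Q would be computed once at compile time, leaving only a multiply, a rotate and a compare at run time.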
Aug 5 2018
@majnemer Thanks; this was, in fact, incorrect. Now, to simplify the logic, the absolute value of D is taken and lshr is used.
Aug 3 2018
Made changes as requested.