This is a more general solution to the SimplifyCFG transform in D30910 as suggested by Eli. It's not quite an "elimination", so we may be stretching the definition of "CSE". I've tried to make the intent clear in the code comments.
- I'm re-using the existing "DivOpInfo" struct as the hash table key, so the BypassSlowDivision changes only move that struct to a header to make it visible to EarlyCSE. We could rename it (DivRemKey?), but I didn't want to pollute this patch with cosmetic diffs. Let me know if I should do the header move or the rename as a pre-commit.
- There's a TTI-cost-based bailout for targets where hoisting a remainder could be more expensive than we'd like. That was another suggestion from D30910. The last test triggers that check: an i128 mul is not legal on x86-64, so that mul has cost = 2. We could probably make the cost calculation more flexible, but this should conservatively prevent the transform if a target doesn't have an appropriate mul instruction.
- The first 7 tests (4 positive and 3 negative) are copied from D30910. I think this solution covers anything we could do in SimplifyCFG, so I'll remove those tests from SimplifyCFG and abandon that patch if this is approved. We get the same improvements to the final x86 and AArch64 asm with this patch.
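
For reviewers who want the idea in miniature, here is a standalone sketch of the two pieces described above: a table keyed on (signedness, dividend, divisor), and a remainder rebuilt from an already-computed quotient as X - (X / Y) * Y. It is not the patch code. It assumes that is the shape of the rewrite (the mul cost check suggests it), and every name in it (the toy DivRemKey, DivTable, sremFromQuotient, uremFromQuotient, shouldRewriteRem) is hypothetical. Integer IDs stand in for LLVM values, and the cost limit of 1 is an assumption; the only data point from the tests is that cost = 2 bails out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>

// Hypothetical stand-in for the hash table key. The patch keys on LLVM
// values; plain integer IDs are used here to keep the sketch standalone.
struct DivRemKey {
  bool IsSigned;
  int DividendId;
  int DivisorId;
  bool operator<(const DivRemKey &O) const {
    return std::tie(IsSigned, DividendId, DivisorId) <
           std::tie(O.IsSigned, O.DividendId, O.DivisorId);
  }
};

// Toy table that remembers which quotient (by ID) was computed for a key.
class DivTable {
  std::map<DivRemKey, int> Seen;

public:
  void recordDiv(const DivRemKey &K, int QuotientId) {
    Seen.emplace(K, QuotientId);
  }

  // Returns true and sets QuotientId if a matching division was recorded.
  bool lookup(const DivRemKey &K, int &QuotientId) const {
    auto It = Seen.find(K);
    if (It == Seen.end())
      return false;
    QuotientId = It->second;
    return true;
  }
};

// Signed remainder rebuilt from an existing quotient: X - (X / Y) * Y.
// C++ integer division truncates toward zero, so this equals X % Y.
int64_t sremFromQuotient(int64_t X, int64_t Y, int64_t Quotient) {
  assert(Y != 0 && "divisor must be non-zero");
  return X - Quotient * Y;
}

// Unsigned variant of the same identity.
uint64_t uremFromQuotient(uint64_t X, uint64_t Y, uint64_t Quotient) {
  assert(Y != 0 && "divisor must be non-zero");
  return X - Quotient * Y;
}

// Hypothetical cost gate: skip the rewrite when the multiply is expensive.
// The default limit of 1 is an assumption made for this sketch.
bool shouldRewriteRem(int MulCost, int CostLimit = 1) {
  return MulCost <= CostLimit;
}
```

The signedness flag is part of the key because a udiv must not be paired with an srem on the same operands (or vice versa); the two operations produce different results for negative inputs.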