Adds more divrem folds to try and get in sync with InstructionSimplify
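For readers unfamiliar with this kind of fold, here is a minimal, self-contained sketch of the generic div/rem identities that a simplifier can apply without creating new instructions. It is not the code in this patch: the `Operand` type and `foldDivRem` helper are hypothetical stand-ins, and the list of identities is an assumption about the general category, not a statement of which folds the patch adds.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical toy operand: a known constant, undef, or an opaque value
// identified by an id. This is NOT an LLVM or SelectionDAG type.
struct Operand {
  enum Kind { Constant, Undef, Opaque } kind;
  int64_t value; // constant value, or an id for Opaque operands
};

inline bool isConst(const Operand &op, int64_t v) {
  return op.kind == Operand::Constant && op.value == v;
}

inline bool sameValue(const Operand &a, const Operand &b) {
  return a.kind == b.kind && a.kind != Operand::Undef && a.value == b.value;
}

// Returns the simplified operand, or std::nullopt when no identity applies.
// isDiv selects division (true) or remainder (false).
std::optional<Operand> foldDivRem(bool isDiv, const Operand &lhs,
                                  const Operand &rhs) {
  // X / undef and X / 0 are undefined, so any result is acceptable: undef.
  if (rhs.kind == Operand::Undef || isConst(rhs, 0))
    return Operand{Operand::Undef, 0};
  // undef / X -> 0 and 0 / X -> 0 (likewise for the remainder).
  if (lhs.kind == Operand::Undef || isConst(lhs, 0))
    return Operand{Operand::Constant, 0};
  // X / X -> 1 and X % X -> 0 (the divisor is known non-zero here).
  if (sameValue(lhs, rhs))
    return Operand{Operand::Constant, isDiv ? 1 : 0};
  // X / 1 -> X and X % 1 -> 0.
  if (isConst(rhs, 1))
    return isDiv ? lhs : Operand{Operand::Constant, 0};
  return std::nullopt;
}
```

The order of the checks matters in this sketch: the undefined-divisor cases are handled first, so the later identities can assume a usable divisor.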
This affects a number of reduced tests, so I've had to tweak them; judging by a mixture of visual inspection and debugging, they still seem to reach the same points at which the original tests failed.
I tried again against bleeding-edge trunk and I still see this change. Without the volatile (which prevents the loads from being combined), the checks reduce to:
define void @pr32372(i8*) {
; CHECK-LABEL: pr32372:
; CHECK:       # %bb.0: # %BB
; CHECK-NEXT:    mvhhi 0(%r1), -3825
; CHECK-NEXT:  .LBB0_1: # %CF251
; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
; CHECK-NEXT:    j .LBB0_1