AArch64 is clearly a regression, will need a separate fix
AMDGPU change looks bad either way - bfe is not used
X86 changes look neutral-positive, except vector cases, those are clearly regressions
Looked at changes:
- I'll leave x86 vector stuff for later. since i actually wanted to look into reverse trasnform, and looked into this only for consistency.
- I don't know what to do with AArch64 regression. I can hide it with shouldFoldConstantShiftPairToMask(), but it is there regardless (tests added). Thoughts?
- That leaves AMDGPU?
After actually looking at -debug output, this regression happens because of SimplifyDemandedBits(),
t18: i32 = srl t15, Constant:i64<2> t15: i32 = or t26, t24 t26: i32 = and t7, Constant:i32<1073741816> t7: i32,ch = load<(load 4 from %ir.y, align 8)> t0, t2, undef:i64 t2: i64,ch = CopyFromReg t0, Register:i64 %0 t1: i64 = Register %0 t6: i64 = undef t25: i32 = Constant<1073741816> t24: i32 = and t12, Constant:i32<4> t12: i32 = srl t4, Constant:i64<16> t4: i32,ch = CopyFromReg t0, Register:i32 %1 t3: i32 = Register %1 t11: i64 = Constant<16> t23: i32 = Constant<4> t17: i64 = Constant<2>
I'm not sure how to turn this pattern on it's head to produce ubfx again.
@arsenm will AMDGPU prefer 2 shifts or shift+mask here?
Looks like AMDGPU changes are neutral too.
And now that i think about it, the AArch64 regression should be solvable (hidable) by an inverse transform.
Should i look into that before or after this patch?