This patch fixes cases like the following:
t50: i64 = add t44:1, Constant:i64<3>
t16: i64,ch = load<(load 8 from %ir.uglygep910.cast, !tbaa !3)> t0, t50, undef:i64
Currently we generate:
t50: i64 = ADDI8 t55:1, TargetConstant:i64<3>
t16: i64,ch = LD<Mem:(load 8 from %ir.uglygep910.cast, !tbaa !3)> TargetConstant:i64<0>, t50, t0
This is not optimal. We should use an X-form load here instead:
t46: i64 = LI8 TargetConstant:i64<3>
t18: i64,ch = LDX<Mem:(load 8 from %ir.uglygep910.cast, !tbaa !3)> t55:1, t46, t0
The X-form is better because, if the load is inside a loop, LI8 is loop-invariant and a later LICM pass can hoist it out of the loop, subject to register pressure.
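For illustration, a loop of roughly the following shape is the kind of source that could lead to the DAG above. This is a hypothetical sketch, not the original reproducer: the function name, parameters, and stride are assumptions, and the exact DAG depends on earlier passes. The offset 3 matters because LD is a DS-form instruction whose displacement must be a multiple of 4, so the constant cannot be folded into the load and needs either an ADDI8 on every iteration or an LI8 that can be hoisted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical example: sum 8-byte values read at byte offset 3 from a
   base address that advances by `stride` bytes each iteration. */
uint64_t sum_unaligned(const unsigned char *base, size_t n, size_t stride) {
    assert(base != NULL);
    uint64_t sum = 0;
    for (size_t i = 0; i < n; ++i) {
        uint64_t v;
        /* 8-byte load at (moving base + 3). With a D-form load the +3 is
           recomputed in every iteration; with an X-form load the constant 3
           sits in a register that is invariant across the loop. */
        memcpy(&v, base + i * stride + 3, sizeof v);
        sum += v;
    }
    return sum;
}
```

Only the index register holding the constant is loop-invariant here; the base still changes every iteration, which matches the LDX operands t55:1 and t46 above.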
This gains 10% on a small test case and 3% on 526.blender_r from SPEC 2017 on PWR8.