For P10, we have DQ-form paired load/store instructions. This patch makes the PPCLoopInstrFormPrep pass prepare more DQ-form chains.
This is important when a loop has many IV users, because many of the current search-space narrowing heuristics target the number of registers, whereas on PowerPC the search space should be narrowed with the number of instructions as the target.
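To illustrate what a "chain" means here, below is a minimal sketch, not the code in PPCLoopInstrFormPrep. It assumes the IV users have already been reduced to constant byte offsets from a common base, and it relies on the fact that a DQ-form displacement must be a multiple of 16. The helper name groupByDQRemainder and the constant DQAlign are hypothetical and exist only for this example.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical helper, not LLVM code: group byte offsets (relative to a
// common base) by their remainder modulo 16. Offsets that share a remainder
// differ by a multiple of 16, so they could share one base pointer and be
// addressed with DQ-form displacements.
std::map<int64_t, std::vector<int64_t>>
groupByDQRemainder(const std::vector<int64_t> &Offsets) {
  constexpr int64_t DQAlign = 16; // DQ-form displacement granularity
  std::map<int64_t, std::vector<int64_t>> Chains;
  for (int64_t Off : Offsets) {
    // Normalize so that negative offsets also map into [0, DQAlign).
    int64_t Rem = ((Off % DQAlign) + DQAlign) % DQAlign;
    Chains[Rem].push_back(Off);
  }
  return Chains;
}
```

In this simplified model, each map entry corresponds to one candidate chain. More IV users with distinct remainders or distinct bases means more chains, which is the situation described above.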
I tried to reduce the test case, but the issue seems to occur only when there are enough IV users that LSR cannot find the best set of formulae. The issue is reproducible with 5 chains; I used 7 chains to track the code generation in some internal tests. Please focus on the code change in loop .LBB0_4: # %_loop_2_do_
With this patch, we get more dform paired loads/stores for P10 in some internal tests, and slight gains (about 1%) on some cpu2017 benchmarks on P9.
Comments update ?