This patch tries to transform the following patterns:
// Pattern 1:
//   A = FSUB X, Y        (Leaf)
//   C = FMA A31, M31, A  (Root)
// M31 is const -->
//   A = FMA A31, Y, -M31
//   C = FMA A, X, M31
//
// Pattern 2:
//   A = FSUB X, Y        (Leaf)
//   C = FMA A31, A, M32  (Root)
// M32 is const -->
//   A = FMA A31, Y, -M32
//   C = FMA A, X, M32
//
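As a sanity check on the algebra, the patterns can be modelled in a few lines of C++. This is only a sketch, not code from the patch: it assumes that `FMA a, b, c` in the notation above means `a + b * c` (which is how the example further down reads), and the helper names `FMA`, `pattern1Before`, `pattern2Before` and `patternAfter` are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Assumed reading of the notation: FMA a, b, c computes a + b * c.
inline double FMA(double a, double b, double c) { return std::fma(b, c, a); }

// Pattern 1 before the transform: A = FSUB X, Y; C = FMA A31, M31, A
inline double pattern1Before(double X, double Y, double A31, double M31) {
  double A = X - Y;
  return FMA(A31, M31, A);
}

// Pattern 2 before the transform: A = FSUB X, Y; C = FMA A31, A, M32
inline double pattern2Before(double X, double Y, double A31, double M32) {
  double A = X - Y;
  return FMA(A31, A, M32);
}

// Both patterns after the transform: A = FMA A31, Y, -M; C = FMA A, X, M
inline double patternAfter(double X, double Y, double A31, double M) {
  double A = FMA(A31, Y, -M);
  return FMA(A, X, M);
}

// Small integer inputs are exactly representable, so the results match exactly.
inline void checkPatterns() {
  assert(pattern1Before(5, 2, 10, 3) == patternAfter(5, 2, 10, 3));
  assert(pattern2Before(5, 2, 10, 3) == patternAfter(5, 2, 10, 3));
}
```

For general floating-point inputs the two forms are algebraically equal but can round differently, because the subtraction is no longer rounded on its own.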
On PowerPC, FMA instructions are destructive: the def is always assigned the same physical register as one of the operands. We can use this property to generate more FMA instructions and produce code that is friendlier to register allocation.
For example, consider the following case:
T = A * B + Const1 * (C - D) + Const2 * (E - F)
Without this patch, LLVM generates:
t0 = mul A, B
t1 = sub C, D
t2 = sub E, F
....
t3 = FMA t0, Const1, t1
T = FMA t3, Const2, t2
t0, t1 and t2 must be assigned different registers.
With this patch, we get:
t0 = mul A, B
t1 = FMA t0, Const1, C
t2 = FMA t1, -Const1, D
t3 = FMA t2, Const2, E
T = FMA t3, -Const2, F
Now t0, t1, t2, t3 and T must all be assigned the same physical register.
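The two instruction sequences can be written out in C++ to confirm they compute the same T. This is an illustrative sketch only: `FMA a, b, c` is assumed to mean `a + b * c`, and the `Inputs` struct and the function names `sequenceWithoutPatch` and `sequenceWithPatch` are hypothetical.

```cpp
#include <cmath>

// Assumed reading of the notation: FMA a, b, c computes a + b * c.
inline double FMA(double a, double b, double c) { return std::fma(b, c, a); }

// Operands of T = A * B + Const1 * (C - D) + Const2 * (E - F).
struct Inputs {
  double A, B, C, D, E, F, Const1, Const2;
};

// Sequence without the patch: two standalone subtractions feed the FMAs.
inline double sequenceWithoutPatch(const Inputs &in) {
  double t0 = in.A * in.B;
  double t1 = in.C - in.D;
  double t2 = in.E - in.F;
  double t3 = FMA(t0, in.Const1, t1);
  return FMA(t3, in.Const2, t2);
}

// Sequence with the patch: each subtraction is folded into a pair of FMAs,
// so every value simply accumulates into the previous one.
inline double sequenceWithPatch(const Inputs &in) {
  double t0 = in.A * in.B;
  double t1 = FMA(t0, in.Const1, in.C);
  double t2 = FMA(t1, -in.Const1, in.D);
  double t3 = FMA(t2, in.Const2, in.E);
  return FMA(t3, -in.Const2, in.F);
}
```

With A=2, B=3, C=7, D=4, E=9, F=5, Const1=2, Const2=3, both sequences give 24. In the second sequence each temporary is dead as soon as the next one is defined, which is what lets them share one register.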
We only do this transformation when register pressure is high, because the transformation reduces ILP.
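The ILP cost can be made concrete by comparing the longest dependency chain of the two sequences from the example. The sketch below is a simplified model, not part of the patch: it counts one unit per instruction, ignores real latencies, and the names `Deps`, `criticalPath`, `depsWithoutPatch` and `depsWithPatch` are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// deps[i] lists the indices of earlier instructions that instruction i reads.
using Deps = std::vector<std::vector<std::size_t>>;

// Length of the longest dependency chain, one unit per instruction.
inline std::size_t criticalPath(const Deps &deps) {
  std::vector<std::size_t> depth(deps.size(), 1);
  std::size_t longest = 0;
  for (std::size_t i = 0; i < deps.size(); ++i) {
    for (std::size_t d : deps[i])
      depth[i] = std::max(depth[i], depth[d] + 1);
    longest = std::max(longest, depth[i]);
  }
  return longest;
}

// Without the patch: 0 = mul, 1 = sub, 2 = sub, 3 = FMA(t0, t1), 4 = FMA(t3, t2).
inline Deps depsWithoutPatch() { return Deps{{}, {}, {}, {0, 1}, {3, 2}}; }

// With the patch: 0 = mul, then four FMAs, each reading the previous result.
inline Deps depsWithPatch() { return Deps{{}, {0}, {1}, {2}, {3}}; }
```

Under this model the original sequence has a chain of 3 (the mul and both subs can issue together), while the transformed one is a fully serial chain of 5, which is why the transform is restricted to the high register pressure case.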
We saw clear improvements on some SPEC CPU2017 benchmarks.