SPE doesn't have an fmadd instruction, so don't bother hoisting a
multiply and add sequence into one, as it would just become a library call.
Hoisting happens too late for the CTR usability test to veto using the CTR
in a loop, and results in an assert "Invalid PPC CTR loop!".
This happens too late for the CTR usability test to veto using the CTR in a loop, and results in an assert "Invalid PPC CTR loop!".
Does this mean that not hoisting fmul causes some case to hit the "Invalid PPC CTR loop!" assertion? That is a little surprising to me. Whether or not fmul is hoisted should not affect the CTR register.
Looking forward to your test case.
Sorry, on a further reading of the summary I can see it can be a little confusing. *Hoisting* fmul + fadd to fma results in a library call, because SPE doesn't have an fma instruction. Unfortunately, this transform is performed long after a loop has been transformed into a CTR loop with bdnz, so the call can't be seen at loop transform time, when it would have blocked the CTR loop. In addition to triggering that assert, hoisting two instructions into a function call is quite a pessimization anyway :)
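For illustration only, here is a rough C++ sketch of the kind of source pattern involved. It is not the test case from this patch; the function names `dot` and `dot_fused` are made up. The first loop has a multiply followed by an add in its body. The second shows what the fused form amounts to on a target with no fma instruction: a call to the library routine on every iteration, inside what was a simple counted loop.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical example: a counted loop whose body is a multiply
// followed by an add. When FP contraction is allowed, a backend may
// combine a[i] * b[i] + acc into a single fma operation.
double dot(const double *a, const double *b, std::size_t n) {
  double acc = 0.0;
  for (std::size_t i = 0; i < n; ++i)
    acc = a[i] * b[i] + acc; // candidate for fmul + fadd -> fma
  return acc;
}

// Roughly what the combined form means without a hardware fma:
// each iteration performs a library call.
double dot_fused(const double *a, const double *b, std::size_t n) {
  double acc = 0.0;
  for (std::size_t i = 0; i < n; ++i)
    acc = std::fma(a[i], b[i], acc); // library call on such a target
  return acc;
}
```

Both functions compute the same dot product for small exactly representable inputs; the difference is only in what the loop body contains once the combine has run.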
I'll update the summary to clarify this.
I think this is the right thing to do regardless of whether it affects CTR loops or not. The FMA is never faster with SPE and that should be marked as such. But yes, we still need a test case.
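As a toy model of "marking FMA as not faster" on SPE, the following self-contained sketch uses stand-in types. `ToySubtarget`, `ToyFPType`, and `toyIsFMAFaster` are hypothetical names invented here; this is not the code in the patch and not LLVM's API, just the shape of a profitability query that answers "no" when the subtarget is SPE.

```cpp
// Toy stand-ins for illustration; these are not LLVM classes.
struct ToySubtarget {
  bool HasSPE;
  bool hasSPE() const { return HasSPE; }
};

enum class ToyFPType { F32, F64, Other };

// Hypothetical analogue of a target query that tells the combiner
// whether forming an fma from fmul + fadd is worthwhile.
bool toyIsFMAFaster(const ToySubtarget &ST, ToyFPType Ty) {
  // Assumption being modelled: with SPE there is no fma instruction,
  // so the fused form would turn into a library call.
  if (ST.hasSPE())
    return false;
  switch (Ty) {
  case ToyFPType::F32:
  case ToyFPType::F64:
    return true;
  default:
    return false;
  }
}
```

With a query like this returning false, the combine never forms the fma in the first place, so no call appears in the loop after the CTR decision has been made.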
Add test case into fma-assoc test. This appears to be the simplest test to validate the change.