This is an archive of the discontinued LLVM Phabricator instance.

PowerPC: Don't hoist float multiply + add to fused operation on SPE
ClosedPublic

Authored by jhibbits on Apr 6 2020, 8:12 AM.

Details

Reviewers
shchenz
nemanjai
Group Reviewers
Restricted Project
Summary

SPE doesn't have a fmadd instruction, so don't bother hoisting a
multiply and add sequence to this, as it'd become just a library call.
Hoisting happens too late for the CTR usability test to veto using the CTR
in a loop, and results in an assert "Invalid PPC CTR loop!".

Diff Detail

Event Timeline

jhibbits created this revision.Apr 6 2020, 8:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 6 2020, 8:12 AM

This happens too late for the CTR usability test to veto using the CTR in a loop, and results in an assert "Invalid PPC CTR loop!".

This means after not hoisting fmul makes a case have "Invalid PPC CTR loop!" assertion? This is a little surprised for me. Hoist or not hoist fmul should not impact CTR register.
Looking forward to your case.

This happens too late for the CTR usability test to veto using the CTR in a loop, and results in an assert "Invalid PPC CTR loop!".

This means after not hoisting fmul makes a case have "Invalid PPC CTR loop!" assertion? This is a little surprised for me. Hoist or not hoist fmul should not impact CTR register.
Looking forward to your case.

Sorry, on a further reading of the summary I can see it can be a little confusing. *hoisting* fmul + fadd to fma results in a library call, because SPE doesn't have a fma instruction. Unfortunately, this transform is performed long after a loop is transformed to a CTR with bdnz loop, so it can't be caught at the loop transform time, and block the loop. In addition to triggering that assert, hoisting two instructions into a function call is quite a pessimization anyway :)

I'll update the summary to clarify this.

jhibbits edited the summary of this revision. (Show Details)Apr 7 2020, 7:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 5 2022, 12:20 PM

I think this is the right thing to do regardless of whether it affects CTR loops or not. The FMA is never faster with SPE and that should be marked as such. But yes, we still need a test case.

jhibbits updated this revision to Diff 442752.Jul 6 2022, 7:07 PM
jhibbits edited the summary of this revision. (Show Details)
jhibbits added a reviewer: nemanjai.

Add test case into fma-assoc test. This appears to be the simplest test to validate the change.

@nemanjai Any further comment on this, or is it good to go?

nemanjai accepted this revision.Aug 9 2022, 8:01 AM

LGTM.

This revision is now accepted and ready to land.Aug 9 2022, 8:01 AM