May 29 2020
Given the constraints in SDAG, we should choose the (fma(fma)) variant by default (assuming, as we do here, that the target has fma instructions). For example, on x86, our best perf heuristic at this stage of compilation on any recent Intel or AMD core is the number of uops. The option with separate fmul and fadd always has more uops, so it would be backwards to choose that sequence here and then try to undo it later.
Not sure if I'm understanding the question. Is there a target or a code pattern with a known disadvantage for the 2 fma variant?
Wouldn't it be better to choose between what you have here, fmadd(a,b,fma(c,d,n)), and a*b + fmadd(c,d,n), for targets that perform worse with FMA chains?
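For reference, the two candidate sequences for a*b + c*d + n discussed in this thread can be sketched at the source level as below. This is only an illustration, not the SDAG combine itself: the function names `two_fma` and `fma_mul_add` are hypothetical, and `std::fma` stands in for the target's fma node.

```cpp
#include <cmath>

// Variant 1: fma(a, b, fma(c, d, n)).
// Two fused operations, but the outer fma must wait for the inner one,
// so the two form a dependency chain.
double two_fma(double a, double b, double c, double d, double n) {
    return std::fma(a, b, std::fma(c, d, n));
}

// Variant 2: a*b + fma(c, d, n).
// Three operations (fmul, fma, fadd); the fmul and the fma are
// independent of each other and only the final fadd joins them.
// Note: depending on -ffp-contract settings, a compiler may fuse the
// multiply and add here back into an fma.
double fma_mul_add(double a, double b, double c, double d, double n) {
    return a * b + std::fma(c, d, n);
}
```

Variant 1 is the fewer-operations form argued for above; variant 2 trades an extra operation for a shorter dependency chain, which is the case the question about "FMA chains" is asking about. The two can also round differently, since variant 2 rounds the product a*b before the add.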