If a floating-point scalar is loaded and then used both as a scalar and as a vector broadcast, perform the load as a broadcast load and then extract the scalar for 'free' from element 0.
This involved switching the order of the X86ISD::BROADCAST combines so we only convert to X86ISD::BROADCAST_LOAD once all other canonicalizations have been attempted.
Fixes PR43217.
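To make the intent of the combine concrete, here is a minimal, portable sketch of the equivalence it relies on. It is not the patch itself and uses no LLVM APIs: `Vec4`, `Uses`, `broadcastLoad`, `extractElt0`, `beforeCombine` and `afterCombine` are hypothetical names that model a 4-lane float vector and the two DAG shapes in plain C++.

```cpp
#include <array>

// Hypothetical stand-in for a 4 x float vector value.
using Vec4 = std::array<float, 4>;

// The two results the surrounding code needs: the scalar and its splat.
struct Uses {
  float Scalar;
  Vec4 Splat;
};

// Models a broadcast load: one memory read, splatted to every lane.
inline Vec4 broadcastLoad(const float *Ptr) {
  const float X = *Ptr; // the single memory access
  return Vec4{X, X, X, X};
}

// Models extracting element 0. The description above treats this as
// 'free' because the scalar is taken from the low lane of the vector.
inline float extractElt0(const Vec4 &V) { return V[0]; }

// Shape before the combine: a scalar load feeds both the scalar user
// and a separate broadcast of that scalar.
inline Uses beforeCombine(const float *Ptr) {
  const float S = *Ptr;
  const Vec4 B{S, S, S, S};
  return {S, B};
}

// Shape after the combine: a single broadcast load, with the scalar
// recovered from lane 0 of the broadcast result.
inline Uses afterCombine(const float *Ptr) {
  const Vec4 B = broadcastLoad(Ptr);
  return {extractElt0(B), B};
}
```

Both functions return the same values for any input. The sketch only shows that the rewrite preserves results; it says nothing about the instructions actually selected.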
Should we be using the same pattern we use for forming ExtLoads and truncating other users?
```
// If the load value is used only by N, replace it via CombineTo N.
bool NoReplaceTrunc = SDValue(LN0, 0).hasOneUse();
Combiner.CombineTo(N, ExtLoad);
if (NoReplaceTrunc) {
  DAG.ReplaceAllUsesOfValueWith(SDValue(LN0, 1), ExtLoad.getValue(1));
  Combiner.recursivelyDeleteUnusedNodes(LN0);
} else {
  SDValue Trunc =
      DAG.getNode(ISD::TRUNCATE, SDLoc(N0), N0.getValueType(), ExtLoad);
  Combiner.CombineTo(LN0, Trunc, ExtLoad.getValue(1));
}
return SDValue(N, 0); // Return N so it doesn't get rechecked!
```