Page MenuHomePhabricator

[AArch64] Change order of candidate FMLS patterns
ClosedPublic

Authored by mssimpso on Dec 26 2017, 12:15 PM.

Details

Summary

A few floating-point SPEC benchmarks are regressed on Falkor after rL319980. That patch added new patterns to the machine combiner for transforming (fsub (fmul x y) z) into (fmla (fneg z) x y). That is, fsub's where the first source operand is an fmul are transformed. We previously only matched the case where the second source operand of an fsub was an fmul, transforming (fsub z (fmul x y)) into (fmls z x y). Now, if we have an fsub where both source operands are fmuls, both of the above patterns are applicable.

However, the order in which we add the patterns to the list of candidates determines the transformation that takes place, since only the first pattern that matches will be used. This patch changes the order these two patterns are added to the list of candidates such that we prefer the case where the second source operand is an fmul (the fmls case), rather than the other one (the fmla/fneg case). When both source operands are fmuls, this ordering results in fewer instructions. Please refer to the added test cases for examples.

Event Timeline

mssimpso created this revision.Dec 26 2017, 12:15 PM
fhahn accepted this revision.Dec 26 2017, 12:34 PM

Thanks for spotting this!
The reordering looks good to me, give the current behavior of the MachineCombiner it makes sense to move up the more profitable patterns.

However, the order in which we add the patterns to the list of candidates determines the transformation that takes place, since only the first pattern that matches will be used.

This behavior is not ideal though, especially as some order of patterns might be good for one micro architecture, whereas a different order is better for a different micro architecture.

I'll try to look into this. Maybe it's worth to change the MachineCombiner to try all available patterns.

This revision is now accepted and ready to land.Dec 26 2017, 12:34 PM

Thanks for spotting this!
The reordering looks good to me, give the current behavior of the MachineCombiner it makes sense to move up the more profitable patterns.

However, the order in which we add the patterns to the list of candidates determines the transformation that takes place, since only the first pattern that matches will be used.

This behavior is not ideal though, especially as some order of patterns might be good for one micro architecture, whereas a different order is better for a different micro architecture.

I'll try to look into this. Maybe it's worth to change the MachineCombiner to try all available patterns.

Thanks Florian! Yes, I agree with your comment. There's actually a note about the limitations of the current behavior in MachineCombiner::combineInstructions.

This revision was automatically updated to reflect the committed changes.