Previously, we only checked for another commutable operand if the first commute was an aggressive commute.
But if we have two kill operands and neither is tied to the def at the start, we should consider each of them as a candidate for the new def.
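To make the idea concrete, here is a minimal toy sketch of the candidate search, assuming a simplified model of a two-address instruction. It is not the actual pass code: `SrcOperand` and `commuteCandidates` are hypothetical names invented for illustration, and the real pass applies further profitability checks before picking one candidate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model of a source operand (not LLVM's MachineOperand).
struct SrcOperand {
  unsigned Reg; // virtual register number
  bool IsKill;  // true if this is the last use of Reg
};

// Returns the indices of every source operand that could be commuted into
// the tied slot so that no copy is needed. If the tied operand is already
// killed, no commute is required and the result is empty.
std::vector<std::size_t>
commuteCandidates(const std::vector<SrcOperand> &Srcs, std::size_t TiedIdx) {
  assert(TiedIdx < Srcs.size() && "tied index out of range");
  std::vector<std::size_t> Candidates;
  if (Srcs[TiedIdx].IsKill)
    return Candidates;
  // Collect all killed operands, not just the first one found.
  for (std::size_t I = 0; I < Srcs.size(); ++I)
    if (I != TiedIdx && Srcs[I].IsKill)
      Candidates.push_back(I);
  return Candidates;
}
```

For an FMA-like instruction whose tied operand is still live but whose other two sources are both killed, this returns both indices, which is the case where stopping after the first candidate can miss the better choice.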
This improves the loop in the fma-commute-loop.ll test, which is derived from this Discourse post: https://llvm.discourse.group/t/unnecessary-vmovapd-instructions-generated-can-you-hint-in-favor-of-vfmadd231pd/582
It does regress some of the fastmath tests, but that is probably due to the known problems with our decision making when physical register constraints come from both above and below in small code with multiple two-address instructions.
Any chance that we can avoid this regression (and the other ones above)?