This is an archive of the discontinued LLVM Phabricator instance.

[X86CmovConversion] Make heuristic for optimized cmov depth more conservative (PR44539)
ClosedPublic

Authored by nikic on Feb 6 2020, 12:24 PM.

Details

Summary

Fix/workaround for https://bugs.llvm.org/show_bug.cgi?id=44539. As discussed there, this pass makes some overly optimistic assumptions, as it does not have access to actual branch weights.

This patch makes the computation of the depth of the optimized cmov more conservative, by assuming a distribution of 75/25 rather than 50/50 and placing the weights to get the more conservative result (larger depth). The fully conservative choice would be std::max(TrueOpDepth, FalseOpDepth), but that would break at least one existing test (which may or may not be an issue in practice).

Diff Detail

Event Timeline

nikic created this revision.Feb 6 2020, 12:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 6 2020, 12:24 PM
nikic edited the summary of this revision. (Show Details)Feb 6 2020, 12:24 PM
lebedev.ri added a subscriber: lebedev.ri.

Could you please pull in

  • llvm/test/Transforms/SimplifyCFG/signbit-like-value-extension.ll,
  • llvm/test/Transforms/SimplifyCFG/safe-abs.ll
  • llvm/test/Transforms/SimplifyCFG/safe-low-bit-extract.ll

as codegen tests to see that we still produce cmov in those?

andreadb accepted this revision.Feb 6 2020, 12:40 PM

LGTM (modulo Roman comment). It fixes the regression in libgav1. So I am happy.

Thanks Nikita!

This revision is now accepted and ready to land.Feb 6 2020, 12:40 PM

Forgot to say that this should also be backported to the release branch.

nikic added a comment.Feb 6 2020, 12:57 PM

@lebedev.ri The cmov conversion pass only operates on innermost loops, so it doesn't apply to any of those tests, at least in their current form. There needs to be a loop and a loop carried dependency on the cmov to exercise this pass.

Okay. Then this needs tests.
There's a lot of bugs referenced in https://bugs.llvm.org/show_bug.cgi?id=44539,
at least one of those should be affected, right?

Okay. Then this needs tests.
There's a lot of bugs referenced in https://bugs.llvm.org/show_bug.cgi?id=44539,
at least one of those should be affected, right?

The problem is that it is unclear how to write a good reliable test for this issue.

That is because there is no way currently to tell if a branch obtained from the conversion of a CMOV is going to be predictable or not. The cmov conversion only models whether a conversion is profitable under the assumption of a reasonably well-predicted branch. There will always be cases where the lack of knowledge about branch probabilities will negatively affect the work done by this pass.

Nikita's patch is at least trying to make the cmov cost heuristic a bit more conservative, and I see it as a good short-term workaround. We know that this is not meant to be a proper fix in practice. But it is enough to fix all the known regressions in libgav1 (and it doesn't regress the rates of any SPEC CPU 2017 benchmark).

Even with this patch, cmov conversion is still performed in libgav1 ( function @_ZN7libgav13dsp12_GLOBAL__N_112CdefFilter_CILi8EhEEvPKvliiiiiiiiiiPvl) on three CMOVs. It just so happen (by luck) that those three expanded CMOVs are not the problematic (i.e. unpredictable) ones.

Before this patch:

===-------------------------------------------------------------------------===
                          ... Statistics Collected ...
===-------------------------------------------------------------------------===

48 x86-cmov-conversion - Number of CMOV-group candidates
 1 x86-cmov-conversion - Number of CMOV-conversion profitable loops
11 x86-cmov-conversion - Number of optimized CMOV-groups
 2 x86-cmov-conversion - Number of unsupported CMOV-groups

After this patch:

===-------------------------------------------------------------------------===
                          ... Statistics Collected ...
===-------------------------------------------------------------------------===

48 x86-cmov-conversion - Number of CMOV-group candidates
 1 x86-cmov-conversion - Number of CMOV-conversion profitable loops
 3 x86-cmov-conversion - Number of optimized CMOV-groups
 2 x86-cmov-conversion - Number of unsupported CMOV-groups

We could write a test that checks for example the statistics related to the number of expanded CMOVs for a function. However, that test would be extremely fragile, and therefore not very good. So I personally don't particularly mind if this change doesn't come with a test case.

That being said, I don't have a strong opinion on this. If the absence of a test case is too controversial, then there is always the less controversial option of fully disabling this pass for now.

That being said, I don't have a strong opinion on this. If the absence of a test case is too controversial, then there is always the less controversial option of fully disabling this pass for now.

No strong opinion here, just asking.

Ideally we should wait a bit for @chandlerc's opinion.

there is always the less controversial option of fully disabling this pass for now

I think this would be more controversial in RC1/RC2 stage.

reverse ping - @nikic can we get this committed to trunk please?

nikic added a comment.Feb 11 2020, 8:03 AM

@RKSimon Sorry, I thought this was waiting on more feedback (though not sure about what). I'll apply it asap.

This revision was automatically updated to reflect the committed changes.
hans added a subscriber: hans.Feb 19 2020, 5:06 AM

@RKSimon Sorry, I thought this was waiting on more feedback (though not sure about what). I'll apply it asap.

Since this seems to have been in a week without any issues, pushed to 10.x as 222de784df453659144c6d07b530e7858c980021