This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/lib/Target/X86/
-
lib/
-
Target/
-
X86/
-
X86CmovConversion.cpp

Differential D74155

[X86CmovConversion] Make heuristic for optimized cmov depth more conservative (PR44539)
ClosedPublic

Authored by nikic on Feb 6 2020, 12:24 PM.

Download Raw Diff

Details

Reviewers

andreadb
RKSimon
spatel
craig.topper
lebedev.ri
chandlerc

Commits

rG222de784df45: [X86CmovConversion] Make heuristic for optimized cmov depth more conservative…
rG5eb19bf4a2b0: [X86CmovConversion] Make heuristic for optimized cmov depth more conservative…

Summary

Fix/workaround for https://bugs.llvm.org/show_bug.cgi?id=44539. As discussed there, this pass makes some overly optimistic assumptions, as it does not have access to actual branch weights.

This patch makes the computation of the depth of the optimized cmov more conservative, by assuming a distribution of 75/25 rather than 50/50 and placing the weights to get the more conservative result (larger depth). The fully conservative choice would be std::max(TrueOpDepth, FalseOpDepth), but that would break at least one existing test (which may or may not be an issue in practice).

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nikic created this revision.Feb 6 2020, 12:24 PM

Herald added a project: Restricted Project. · View Herald TranscriptFeb 6 2020, 12:24 PM

Herald added subscribers: llvm-commits, hiraditya. · View Herald Transcript

nikic edited the summary of this revision. (Show Details)Feb 6 2020, 12:24 PM

Could you please pull in

llvm/test/Transforms/SimplifyCFG/signbit-like-value-extension.ll,
llvm/test/Transforms/SimplifyCFG/safe-abs.ll
llvm/test/Transforms/SimplifyCFG/safe-low-bit-extract.ll

as codegen tests to see that we still produce cmov in those?

LGTM (modulo Roman comment). It fixes the regression in libgav1. So I am happy.

Thanks Nikita!

This revision is now accepted and ready to land.Feb 6 2020, 12:40 PM

Forgot to say that this should also be backported to the release branch.

@lebedev.ri The cmov conversion pass only operates on innermost loops, so it doesn't apply to any of those tests, at least in their current form. There needs to be a loop and a loop carried dependency on the cmov to exercise this pass.

Okay. Then this needs tests.
There's a lot of bugs referenced in https://bugs.llvm.org/show_bug.cgi?id=44539,
at least one of those should be affected, right?

In D74155#1862443, @lebedev.ri wrote:

Okay. Then this needs tests.
There's a lot of bugs referenced in https://bugs.llvm.org/show_bug.cgi?id=44539,
at least one of those should be affected, right?

The problem is that it is unclear how to write a good reliable test for this issue.

That is because there is no way currently to tell if a branch obtained from the conversion of a CMOV is going to be predictable or not. The cmov conversion only models whether a conversion is profitable under the assumption of a reasonably well-predicted branch. There will always be cases where the lack of knowledge about branch probabilities will negatively affect the work done by this pass.

Nikita's patch is at least trying to make the cmov cost heuristic a bit more conservative, and I see it as a good short-term workaround. We know that this is not meant to be a proper fix in practice. But it is enough to fix all the known regressions in libgav1 (and it doesn't regress the rates of any SPEC CPU 2017 benchmark).

Even with this patch, cmov conversion is still performed in libgav1 ( function @_ZN7libgav13dsp12_GLOBAL__N_112CdefFilter_CILi8EhEEvPKvliiiiiiiiiiPvl) on three CMOVs. It just so happen (by luck) that those three expanded CMOVs are not the problematic (i.e. unpredictable) ones.

Before this patch:

===-------------------------------------------------------------------------===
                          ... Statistics Collected ...
===-------------------------------------------------------------------------===

48 x86-cmov-conversion - Number of CMOV-group candidates
 1 x86-cmov-conversion - Number of CMOV-conversion profitable loops
11 x86-cmov-conversion - Number of optimized CMOV-groups
 2 x86-cmov-conversion - Number of unsupported CMOV-groups

After this patch:

===-------------------------------------------------------------------------===
                          ... Statistics Collected ...
===-------------------------------------------------------------------------===

48 x86-cmov-conversion - Number of CMOV-group candidates
 1 x86-cmov-conversion - Number of CMOV-conversion profitable loops
 3 x86-cmov-conversion - Number of optimized CMOV-groups
 2 x86-cmov-conversion - Number of unsupported CMOV-groups

We could write a test that checks for example the statistics related to the number of expanded CMOVs for a function. However, that test would be extremely fragile, and therefore not very good. So I personally don't particularly mind if this change doesn't come with a test case.

That being said, I don't have a strong opinion on this. If the absence of a test case is too controversial, then there is always the less controversial option of fully disabling this pass for now.

In D74155#1863700, @andreadb wrote:

In D74155#1862443, @lebedev.ri wrote:

That being said, I don't have a strong opinion on this. If the absence of a test case is too controversial, then there is always the less controversial option of fully disabling this pass for now.

No strong opinion here, just asking.

xbolva00 added a reviewer: chandlerc.Feb 7 2020, 4:50 AM

Ideally we should wait a bit for @chandlerc's opinion.

there is always the less controversial option of fully disabling this pass for now

I think this would be more controversial in RC1/RC2 stage.

reverse ping - @nikic can we get this committed to trunk please?

@RKSimon Sorry, I thought this was waiting on more feedback (though not sure about what). I'll apply it asap.

Closed by commit rG5eb19bf4a2b0: [X86CmovConversion] Make heuristic for optimized cmov depth more conservative… (authored by nikic). · Explain WhyFeb 11 2020, 8:36 AM

This revision was automatically updated to reflect the committed changes.

In D74155#1869787, @nikic wrote:

@RKSimon Sorry, I thought this was waiting on more feedback (though not sure about what). I'll apply it asap.

Since this seems to have been in a week without any issues, pushed to 10.x as 222de784df453659144c6d07b530e7858c980021

Revision Contents

Path

Size

llvm/

lib/

Target/

X86/

X86CmovConversion.cpp

13 lines

Diff 243889

llvm/lib/Target/X86/X86CmovConversion.cpp

Show First 20 Lines • Show All 358 Lines • ▼ Show 20 Lines	bool X86CmovConverterPass::collectCmovCandidates(
NumOfCmovGroupCandidate += CmovInstGroups.size();		NumOfCmovGroupCandidate += CmovInstGroups.size();
return !CmovInstGroups.empty();		return !CmovInstGroups.empty();
}		}

/// \returns Depth of CMOV instruction as if it was converted into branch.		/// \returns Depth of CMOV instruction as if it was converted into branch.
/// \param TrueOpDepth depth cost of CMOV true value operand.		/// \param TrueOpDepth depth cost of CMOV true value operand.
/// \param FalseOpDepth depth cost of CMOV false value operand.		/// \param FalseOpDepth depth cost of CMOV false value operand.
static unsigned getDepthOfOptCmov(unsigned TrueOpDepth, unsigned FalseOpDepth) {		static unsigned getDepthOfOptCmov(unsigned TrueOpDepth, unsigned FalseOpDepth) {
//===--------------------------------------------------------------------===//		// The depth of the result after branch conversion is
// With no info about branch weight, we assume 50% for each value operand.		// TrueOpDepth * TrueOpProbability + FalseOpDepth * FalseOpProbability.
// Thus, depth of optimized CMOV instruction is the rounded up average of		// As we have no info about branch weight, we assume 75% for one and 25% for
// its True-Operand-Value-Depth and False-Operand-Value-Depth.		// the other, and pick the result with the largest resulting depth.
//===--------------------------------------------------------------------===//		return std::max(
return (TrueOpDepth + FalseOpDepth + 1) / 2;		divideCeil(TrueOpDepth * 3 + FalseOpDepth, 4),
		divideCeil(FalseOpDepth * 3 + TrueOpDepth, 4));
}		}

bool X86CmovConverterPass::checkForProfitableCmovCandidates(		bool X86CmovConverterPass::checkForProfitableCmovCandidates(
ArrayRef<MachineBasicBlock *> Blocks, CmovGroups &CmovInstGroups) {		ArrayRef<MachineBasicBlock *> Blocks, CmovGroups &CmovInstGroups) {
struct DepthInfo {		struct DepthInfo {
/// Depth of original loop.		/// Depth of original loop.
unsigned Depth;		unsigned Depth;
/// Depth of optimized loop.		/// Depth of optimized loop.
▲ Show 20 Lines • Show All 480 Lines • Show Last 20 Lines