This is an archive of the discontinued LLVM Phabricator instance.

add unpredictable metadata type for control flow
ClosedPublic

Authored by spatel on Aug 25 2015, 2:54 PM.

Details

Summary

This patch defines 'unpredictable' metadata. This metadata can be used to signal to the optimizer or backend that a branch or switch is unpredictable, and therefore, it's probably better to not split a compound predicate into multiple branches such as in CodeGenPrepare::splitBranchCondition(). This was discussed in:
https://llvm.org/bugs/show_bug.cgi?id=23827

See the dependent patches (to CodeGenPrepare and SelectionDAGBuilder) that use the metadata when deciding to split branch conditions. There will also be a patch to clang to make this available in C.

Diff Detail

Event Timeline

spatel updated this revision to Diff 33127.Aug 25 2015, 2:54 PM
spatel retitled this revision from to add llvm.unpredictable intrinsic and lower it to metadata.
spatel updated this object.
spatel added reviewers: hfinkel, chandlerc, kparzysz.
spatel added a subscriber: llvm-commits.
kparzysz edited edge metadata.Aug 27 2015, 8:11 AM

Minor nitpicking (not always directly related to the code you've changed). The change itself LGTM.

lib/Transforms/Scalar/LowerPredictionIntrinsic.cpp
71

See comment at 136.

82

The expected value doesn't have to be 1 in the non-optimized case. I'm not sure if optimized code will always have 1, but a check for non-zero would be more general.

114

Are we just hoping that the comparison will be against 0? :)

136

This part (136-140) is not really necessary here. The pass will remove all prediction intrinsics later on anyways, including replacing appropriate values. Handling that in one place would be more logical.

Thanks, Krzysztof!

I agree that the existing code in LowerExpectIntrinsic can be improved, but I didn't want to overlap more changes in this patch. How about if I mark them with 'FIXME' comments for now.

I'm also wondering if anyone knows why we have an intrinsic and special lowering for llvm.expect in the first place? Couldn't this be metadata from the start? I looked back at the original commit, but I don't see any discussion/example of the use-case for an intrinsic rather than metadata.

No problem, your changes look good as they are. The original commit was in 2011, so it's probably safe to assume that the reason was "it seemed like a good idea at the time" and move on. :)

hfinkel edited edge metadata.Aug 27 2015, 10:04 AM

Can you please explain why you're adding this intrinsic, instead of just adding branch-probability metadata from the frontend?

Can you please explain why you're adding this intrinsic, instead of just adding branch-probability metadata from the frontend?

Hi Hal -

See my earlier question. My reason for implementing it this way was simply because this is the way it's already done for llvm.expect...so I assume there's some case that's better served by having an intrinsic? If that's not true, then certainly this unpredictable hint could be a lot simpler and so could the expect hint?

Can you please explain why you're adding this intrinsic, instead of just adding branch-probability metadata from the frontend?

Hi Hal -

See my earlier question. My reason for implementing it this way was simply because this is the way it's already done for llvm.expect...so I assume there's some case that's better served by having an intrinsic? If that's not true, then certainly this unpredictable hint could be a lot simpler and so could the expect hint?

My understanding is that the way that llvm.expect was implemented is considered to be a mistake that we're now stuck with for backwards-compatibility reasons (and we could fix this and auto-upgrade, but no one has bothered to do this yet, in part, because until recently we still had too many problems with dropping the profiling metadata, and so there were bigger fish to fry).

My understanding is that the way that llvm.expect was implemented is considered to be a mistake that we're now stuck with for backwards-compatibility reasons (and we could fix this and auto-upgrade, but no one has bothered to do this yet, in part, because until recently we still had too many problems with dropping the profiling metadata, and so there were bigger fish to fry).

[Adding Bob Wilson (quoted below) and Jakub Staszak (author of the original llvm.expect patch)]

The only reference for the llvm.expect implementation I found was here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20110704/123438.html
"One of the reasons we decided to implement builtin_expect this way is to allow some simple propagation of builtin_expect calls that are not directly inside conditionals. You haven't implemented that yet, but early CSE should help when the time comes."

If everyone agrees that metadata from the front-end is now sufficient, then I'll certainly abandon this patch for something much simpler. I still think we would want a distinct metadata type for 'unpredictable' rather than using 'prof' metadata because there are no special values of prof data that could be used to indicate unpredictability. Ie, just because both sides of a branch have the same weight, does not necessarily make them unpredictable.

kparzysz added a comment.EditedAug 27 2015, 11:45 AM

My understanding is that the way that llvm.expect was implemented is considered to be a mistake that we're now stuck with for backwards-compatibility reasons

One downside of using metadata is that it makes it harder to edit IR by hand. Debugging or aliasing information is pretty much static, but branch prediction information could be manipulated by hand for experiments. I don't know how important that is though.

My understanding is that the way that llvm.expect was implemented is considered to be a mistake that we're now stuck with for backwards-compatibility reasons

One downside of using metadata is that it makes it harder to edit IR by hand. Debugging or aliasing information is pretty much static, but branch prediction information could be manipulated by hand for experiments. I don't know how important that is though.

The problem with the intrinsic is that it is an optimization hint that needs to be removed/transformed before the code can be optimized. That's not very useful. Plus, it confuses information about a value with information about a branch based on that value.

To your point: The proposed metadata is pretty simple, but I agree that metadata can be a pain to manipulate by hand because you often need to renumber things. I'd not be opposed to some IR syntax enhancement that made it easier to write by hand. That's a separate matter, however.

I still think we would want a distinct metadata type for 'unpredictable' rather than using 'prof' metadata because there are no special values of prof data that could be used to indicate unpredictability. Ie, just because both sides of a branch have the same weight, does not necessarily make them unpredictable.

I agree that branch probabilities and unpredictability are separate concepts. It is easy to arrange for a 50/50% branch to be almost perfectly predictable (just have longs runs of each possibility). In the long run, we might want this not to be a Boolean setting (it does not seem far fetched to me to set metadata like this based on branch-prediction miss rates based on actual profiling data), but I'm okay with starting with this for now. That having been said, it might be worthwhile, conceptually, to simply mirror the existing profiling data so that we can share the relevant code. Meaning that we have metadata, that instead of counting the number of relative branches, counts the relative number of mispredicated branches (or predicted branches). Thoughts?

Meaning that we have metadata, that instead of counting the number of relative branches, counts the relative number of mispredicated branches (or predicted branches). Thoughts?

The number of times the branch changed directions and (divided by?) the number of times it was executed sounds ok. Do we want to differentiate between AAAAABAB and AABBAABB (which this scheme won't)?

I don't think you want to do the number of times that the branch changed directions. Hal made the comment that even 50% either way branches can have perfect prediction. The example he gave was of long runs of one direction, and then long runs of another direction.

But branch predictors are usually much smarter than that. Often they can also perfectly predict
ABABABABAB, branches which change direction with every other time. How good and what patterns a predictor is able to do well
is a function both of how many bits the HW devotes to maintaining various pieces of prediction state, as well as the sophistication of
the branch prediction algorithms themselves.

I don't think you want to attempt to embed anything that needs to have deep knowledge of the predictor itself. I think a simple
percentage of instances the "condition" is unpredictable would get the granularity (compared to bool) that you are thinking about,
without having any dependence on any specific prediction algorithm or implementation.

Feeding that information back into the IR from the HW can be interesting though. Changes to the number and direction of branches
can indirectly impact other branch's prediction rates. :-). Also, after various optimizations have happened, the relationship between
the predicitability of the final branch can, and that of the IR that it started out as can be hard to track, and thus feed back well.

Kevin Smith

I don't think you want to do the number of times that the branch changed directions. Hal made the comment that even 50% either way branches can have perfect prediction. The example he gave was of long runs of one direction, and then long runs of another direction.

So the branch changes direction once, which (assuming large number of executions) gives the probability of changing direction close to 0. With a branch that alternates each time, the value would be close to 1. We don't want to mimic any specific behavior of a branch predictor, but we need to settle on a some model. This one assumes knowledge of the single, most recent execution of the branch. It loses information about the direction of the change itself, i.e. whether it was A->B or B->A. Are you suggesting longer history? More complete information, e.g. all of P(A|A), P(A|B), P(B|A) and P(B|B)?

No, I am not suggesting you have more history. I am suggesting that you only have a single number that represents the percentage of time that the direction is mispredicted. It would make a lot of sense for it to be a simple 0-100 value, and would represent the percentage of mispredictions. Anything else takes more space, and is much too closely tied to the branch predictor.

ABAB can be perfectly predicted often, AABB can be perfectly predicted often. The number of transitions from true to false or vice-versa has a poor
correlation to branch predictability, just as the branch probability has a poor correlation to branch predictability.

  • Original Message -----

From: "Kevin B. Smith" <kevin.b.smith@intel.com>
To: spatel@rotateright.com, chandlerc@gmail.com, kparzysz@codeaurora.org, hfinkel@anl.gov, "bob wilson"
<bob.wilson@apple.com>, kubastaszak@gmail.com
Cc: "kevin b smith" <kevin.b.smith@intel.com>, llvm-commits@lists.llvm.org
Sent: Thursday, August 27, 2015 5:13:48 PM
Subject: Re: [PATCH] D12341: add llvm.unpredictable intrinsic and lower it to metadata

kbsmith1 added a comment.

No, I am not suggesting you have more history. I am suggesting that
you only have a single number that represents the percentage of time
that the direction is mispredicted. It would make a lot of sense
for it to be a simple 0-100 value, and would represent the
percentage of mispredictions. Anything else takes more space, and is
much too closely tied to the branch predictor.

I agree.

From a profiling perspective, we can likely extract this number from sampling (on modern Intel cores, for example, this information is recorded in the MSR_LASTBRANCH_x_FROM_IP MSR (read using RDPMC, etc.), and in the BTS if enabled). I doubt we can efficiently capture anything more.

In practice, as with the profiling data, we might want to store two numbers so that we can store raw sample counts from both sides of a branch without worrying about rescaling. I don't have a strong opinion here.

-Hal

ABAB can be perfectly predicted often, AABB can be perfectly
predicted often. The number of transitions from true to false or
vice-versa has a poor
correlation to branch predictability, just as the branch probability
has a poor correlation to branch predictability.

http://reviews.llvm.org/D12341

The problem with the intrinsic is that ... it confuses information about a value with information about a branch based on that value.

Do you see this being different than __builtin_expect() in Clang? If it's information about the branch itself, do we have:

__builtin_unpredictable_if( MyCondition ) { ... }

If it's about the condition, then it looks like the existing __builtin_expect:

if ( __builtin_unpredictable( MyCondition ) { ... }

...but we'll have to propagate the metadata from the condition value to branches that use it in Clang? Or does unpredictable metadata stick to a value in Clang and then the backend is responsible for tracking back to that metadata when looking at branches?

  • Original Message -----

From: "Sanjay Patel" <spatel@rotateright.com>
To: spatel@rotateright.com, chandlerc@gmail.com, kparzysz@codeaurora.org, hfinkel@anl.gov, "bob wilson"
<bob.wilson@apple.com>, kubastaszak@gmail.com
Cc: "kevin b smith" <kevin.b.smith@intel.com>, llvm-commits@lists.llvm.org
Sent: Thursday, August 27, 2015 6:38:39 PM
Subject: Re: [PATCH] D12341: add llvm.unpredictable intrinsic and lower it to metadata

spatel added a comment.

In http://reviews.llvm.org/D12341#234428, @hfinkel wrote:

The problem with the intrinsic is that ... it confuses information
about a value with information about a branch based on that value.

Do you see this being different than __builtin_expect() in Clang? If
it's information about the branch itself, do we have:

__builtin_unpredictable_if( MyCondition ) { ... }

If it's about the condition, then it looks like the existing
__builtin_expect:

if ( __builtin_unpredictable( MyCondition ) { ... }

...but we'll have to propagate the metadata from the condition value
to branches that use it in Clang?

I think that the intrinsic can look like __builtin_expect, in that it wraps a value (that's easiest in C), but the propagation should be done to the branches in Clang. We don't gain anything from doing it later because we run LowerExpectIntrinsic before inlining regardless (this might technically be untrue, because we do an initial SROA run, but I've never seen this used outside of its documented form inside an if/while/for statement, and so the data dependence is clear from the AST).

-Hal

Or does unpredictable metadata
stick to a value in Clang and then the backend is responsible for
tracking back to that metadata when looking at branches?

http://reviews.llvm.org/D12341

I am suggesting that you only have a single number that represents the percentage of time that the direction is mispredicted.

This is really the original idea if we assume that a change of direction causes a misprediction. The problem with tracking "misprediction" rate is that it does depend on the specific hardware. The implementation of the hardware branch prediction may change between subtargets of the same target, which is why I proposed tracking the behavior of the branch instead. A subtarget could then use this information to estimate the misprediction rate, which may not be as accurate as tracking misprediction itself for that subtarget, but it would be portable.

From: Sean Silva [mailto:chisophugis@gmail.com]
Sent: Thursday, August 27, 2015 5:39 PM
To: reviews+D12341+public+5f663c68720e9677@reviews.llvm.org; Smith, Kevin B
Cc: Sanjay Patel; Chandler Carruth; Krzysztof Parzyszek; Hal Finkel; Bob Wilson; kubastaszak@gmail.com; llvm-commits
Subject: Re: [PATCH] D12341: add llvm.unpredictable intrinsic and lower it to metadata

I am suggesting that you only have a single number that represents the percentage of time that the direction is mispredicted.

This is really the original idea if we assume that a change of direction causes a misprediction. The problem with tracking "misprediction" rate is that it does depend on the specific hardware. The implementation of the hardware branch prediction may change between subtargets of the same target, which is why I proposed tracking the behavior of the branch instead. A subtarget could then use this information to estimate the misprediction rate, which may not be as accurate as tracking misprediction itself for that subtarget, but it would be portable.

The assumption that a change in branch direction causes a misprediction is just not a correct assumption. Further most branch predictors have global state such that they are affected by other nearby branches, and also other branches that precede them in execution order. Compiler's really cannot model this, and thus attempting to discern/compute branch predictability from a pattern based on a specific implementation is unlikely to be successful. And most branch prediction algorithms are patented or closely guarded intellectual property. Having an implementation of various HW algorithms seems unlikely to be coded into LLVM. A boolean serves the immediate need, but I think a 0 to 100 integer, with 0 meaning completely predictable and 100 meaning completely unpredictable can serve that same purpose, while also allowing HW feedback. In actual practice, I'd expect anything with a unpredictability value higher than about 10 would be considered pretty unpredictable. HW branch prediction rates (correct predictions/total branches, over all branches in a program) often run in the 95-98% range, making any branch with more than 10% unpredictability, less predictable than an average branch.

[...]

I don't think we completely understand each other. The profiling data that is attached to MD_prof describes the behavior of the program. You want to have something that describes the behavior of the hardware. Those are two different things. The former is independent from the specific properties of the hardware and can be reused when compiling for different subtargets. The latter is not. I'm not opposed to profiling information tied to specific hardware in principle, but I think those two kinds should be easily separable.

[...]

I don't think we completely understand each other. The profiling data that is attached to MD_prof describes the behavior of the program. You want to have something that describes the behavior of the hardware. Those are two different things. The former is independent from the specific properties of the hardware and can be reused when compiling for different subtargets. The latter is not. I'm not opposed to profiling information tied to specific hardware in principle, but I think those two kinds should be easily separable.

No, I really do get that this is for describing the behavior of the program, and I am not trying to use this to describe behavior of the HW. All I am saying, is that you cannot simply describe the branch pattern, and expect to derive predictability information from that. I just think that the HW feedback data can be used as a good approximation for this program behavior information, and that the branch pattern will not be usable for deriving that information.

spatel updated this revision to Diff 33465.Aug 28 2015, 2:31 PM
spatel retitled this revision from add llvm.unpredictable intrinsic and lower it to metadata to add unpredictable metadata type for control flow.
spatel updated this object.
spatel edited edge metadata.

Patch (completely) updated:
It might have been better to abandon and start new, but I thought keeping the comment context would be nice.

In this version, there is no LLVM unpredictable intrinsic - it's just metadata. As promised, the patch becomes much simpler.

The code should match what Kevin B. Smith suggested: unpredictability is measured as a percentage, so 0 (default) means completely predictable and 100 means completely unpredictable. (I started implementing using a 0.0 - 1.0 FP scale before I saw Kevin's suggestion. I don't know if one way is better than the other.)

However, I have not attempted to define/use any threshold level for unpredictability (see TODO comments in the dependent patches) because I'm assuming we'll need a target hook for that, and of course there is no current producer of that sort of metadata. I should have the clang side of this patch posted soon, so you can see how the builtin_unpredictable handling works from C.

LGTM. I was thinking about having something like MD_prof_hw as the top-level metadata for all HW-specific profiling data, but if any further changes are needed, they can be done later. However, if this metadata is later extended (for example, to have target/subtarget identification attached to it), would the IR compatibility be a concern?

LGTM. I was thinking about having something like MD_prof_hw as the top-level metadata for all HW-specific profiling data, but if any further changes are needed, they can be done later. However, if this metadata is later extended (for example, to have target/subtarget identification attached to it), would the IR compatibility be a concern?

I think we're protected from IR compatibility problems since it's just metadata now. We can put whatever additional data is needed in those strings without breaking what's in there now. Up in Clang, a new builtin would have to be created if for example, someone wanted to specify that the hint mapped to some specific CPU.

But I think what's in the metadata now has not been decided: Hal suggested that we should not add a feature with no users. Should I remove the 0 to 100 specifier (and related docs/code comments) and make it an empty string as it was in the earlier draft?

But I think what's in the metadata now has not been decided: Hal suggested that we should not add a feature with no users. Should I remove the 0 to 100 specifier (and related docs/code comments) and make it an empty string as it was in the earlier draft?

If changing the metadata in the future is not a problem, I'd leave it as it is.

spatel updated this revision to Diff 33823.Sep 2 2015, 10:42 AM

Patch updated:
Although we have a decent idea about how a percentage-based unpredictability ratio might be used, we have no producer or consumer of that data today. Therefore, I've made the metadata string empty again in this version of the patch, so it's just treated as a boolean meaning 'completely unpredictable'. That's the use case we have today, and it makes this and the dependent patches slightly simpler. When we have profile data to feed into this metadata, we can decide the exact format of that string with a concrete use case.

spatel accepted this revision.Sep 2 2015, 12:26 PM
spatel added a reviewer: spatel.
This revision is now accepted and ready to land.Sep 2 2015, 12:26 PM
spatel closed this revision.Sep 2 2015, 12:27 PM

Committed at r246688.

lib/Transforms/IPO/PassManagerBuilder.cpp