This is an archive of the discontinued LLVM Phabricator instance.

[InstructionCost] Add saturation support.
ClosedPublic

Authored by sdesmalen on Jun 29 2021, 6:08 AM.

Details

Summary

This patch makes the operations on InstructionCost saturate, so that when
costs are accumulated they saturate to <max value>.

One of the compelling reasons for wanting to have saturation support
is because in various places, arbitrary values are used to represent
a 'high' cost, but when accumulating the cost of some set of operations
or a loop, overflow is not taken into account, which may lead to unexpected
results. By defining the operations to saturate, we can express the cost
of something 'very expensive' as InstructionCost::getMax().

Diff Detail

Event Timeline

sdesmalen created this revision.Jun 29 2021, 6:08 AM
sdesmalen requested review of this revision.Jun 29 2021, 6:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2021, 6:08 AM

Should the max/min values be treated as infinity? With this implementation, (max+1)-1 = max-1. Is that what we want?

Looking at the documentation of invalid costs, it says

/// These states can currently be used to indicate whether a cost is valid or
/// invalid. Examples of an invalid cost might be where the cost is
/// prohibitively expensive and the user wants to prevent certain
/// optimizations being performed. Or perhaps the cost is simply unknown
/// because the operation makes no sense in certain circumstances. These
/// states can be expanded in future to support other cases if necessary.

Do we have any examples of "Or perhaps the cost is simply unknown because the operation makes no sense in certain circumstances"? Where it wouldn't mean the same as an infinite cost.

Should the max/min values be treated as infinity? With this implementation, (max+1)-1 = max-1. Is that what we want?

Yes. It wasn't really the intention of this patch to implement a proper 'infinity', because that makes things a lot more complicated (e.g. having to define what "0 * infinity" means). Simple saturation arithmetic is sufficient for practical uses of InstructionCost which are often about accumulating (possibly scaled) costs, where currently wrapping may occur if large numbers are used.

Looking at the documentation of invalid costs, it says

/// These states can currently be used to indicate whether a cost is valid or
/// invalid. Examples of an invalid cost might be where the cost is
/// prohibitively expensive and the user wants to prevent certain
/// optimizations being performed. Or perhaps the cost is simply unknown
/// because the operation makes no sense in certain circumstances. These
/// states can be expanded in future to support other cases if necessary.

Do we have any examples of "Or perhaps the cost is simply unknown because the operation makes no sense in certain circumstances"? Where it wouldn't mean the same as an infinite cost.

If the IR has an operation with Invalid cost that would be considered a bug. That's because Invalid represents that an operation has no cost because it cannot be code-generated. If the cost is <very high>, that's fine although it suggests very inefficient. The former (Invalid) is true for scalable-vector operations that are not natively supported by the target and can only be lowered by scalarizing. SelectionDAG currently has no mechanism to do that, so any operation that requires scalarizing is considered to have an Invalid cost. Types like <vscale x 1 x i128> would be such an example for AArch64 SVE.

kparzysz added a comment.EditedJun 29 2021, 11:59 AM

If hitting either one of the saturation boundaries was realistic enough to warrant this change, then why having both, large positive and large negative costs wouldn't be? If cost1 > 0 and cost2 < 0, and they both exceed the bounds, then cost1+cost2 in the saturating arithmetic may be a finite number, reasonably close to 0, and yet completely meaningless.

Edit: clarified the saturated arithmetic

Do we have any examples of "Or perhaps the cost is simply unknown because the operation makes no sense in certain circumstances"? Where it wouldn't mean the same as an infinite cost.

If the IR has an operation with Invalid cost that would be considered a bug. That's because Invalid represents that an operation has no cost because it cannot be code-generated. If the cost is <very high>, that's fine although it suggests very inefficient. The former (Invalid) is true for scalable-vector operations that are not natively supported by the target and can only be lowered by scalarizing. SelectionDAG currently has no mechanism to do that, so any operation that requires scalarizing is considered to have an Invalid cost. Types like <vscale x 1 x i128> would be such an example for AArch64 SVE.

OK that matches my expectation of an invalid cost. Synonymous with an infinite cost. It's not intended for something with a high cost, or something that we just haven't bothered to assign a cost to.
It may be worth updating the documentation to be more explicit about that, if we don't have any other ways we generate "invalid" costs.

As for this patch, I have no objections to adding saturation, but would not usually expect them to get that high (especially if the underlying type is an i64), except for some of the place in the vectorizer where we are multiplying by max tripcounts.

If hitting either one of the saturation boundaries was realistic enough to warrant this change, then why having both, large positive and large negative costs wouldn't be? If cost1 > 0 and cost2 < 0, and they both exceed the bounds, then cost1+cost2 in the saturating arithmetic may be a finite number, reasonably close to 0, and yet completely meaningless.

Edit: clarified the saturated arithmetic

You're right that if values are both positive and negative that the resulting value is meaningless if these values are both exceeding their bounds. It's not the common use-case though, since most places just accumulate costs and expect all instruction costs to be positive values (the same holds for places that subtract (all positive) costs from some budget). I think the only exception is the SLPVectorizer, which does both addition/subtraction.

One the motivations for this patch was the overflow in the LoopVectorizer when applying D105113. This tries to implement the cost-comparison on InstructionCost, rather than doing it on InstructionCost::CostTy and then extending the values to int64_t before multiplication. The starting cost is set to std::numeric_limits<InstructionCost::CostTy>::max() which when multiplied by the vector element-count overflows. This could practically be resolved by setting that value to some lower value, but there may be other places where multiplication with e.g. MaxTripCounts could lead to overflow.

The common case for InstructionCost is asking "what is the cost of <operation> or <set of operations>?", which should always be a positive number. Conceptually it makes little sense to have negative costs to begin with, so at some point I'd like to see InstructionCost become unsigned to avoid such complications. Perhaps a feasible route would be to make InstructionCost unsigned and saturating by default and have a separate SignedWrappingInstructionCost class for any exceptions where signedness and wrapping may be required? If so, that would be a bigger piece of work than what this patch tried to achieve.

kparzysz accepted this revision.Jun 30 2021, 10:09 AM

Sounds good to me.

This revision is now accepted and ready to land.Jun 30 2021, 10:09 AM
sdesmalen updated this revision to Diff 356679.Jul 6 2021, 4:45 AM

Updated comments to clarify meaning of Invalid vs high-but-valid costs.

Matt added a subscriber: Matt.Jul 6 2021, 3:22 PM
dmgreen accepted this revision.Jul 8 2021, 1:42 AM

Thanks for updating the wording. Sounds good to me

This revision was landed with ongoing or failed builds.Jul 10 2021, 3:57 AM
This revision was automatically updated to reflect the committed changes.