Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Avoid migrating existing patches. Phabricator shutdown timeline

[mlir] Add integer range inference analysis
ClosedPublic

Authored by krzysz00 on Apr 19 2022, 11:00 AM.

Details

Summary

This commit defines a dataflow analysis for integer ranges, which
uses a newly-added InferIntRangeInterface to compute the lower and
upper bounds on the results of an operation from the bounds on the
arguments. The range inference is a flow-insensitive dataflow analysis
that can be used to simplify code, such as by statically identifying
bounds checks that cannot fail in order to eliminate them.

The InferIntRangeInterface has one method, inferResultRanges(), which
takes a vector of inferred ranges for each argument to an op
implementing the interface and a callback allowing the implementation
to define the ranges for each result. These ranges are stored as
ConstantIntRanges, which hold the lower and upper bounds for a
value. Bounds are tracked separately for the signed and unsigned
interpretations of a value, which ensures that the impact of
arithmetic overflows is correctly tracked during the analysis.

The commit also adds a -test-int-range-inference pass to test the
analysis until it is integrated into SCCP or otherwise exposed.

Finally, this commit fixes some bugs relating to the handling of
region iteration arguments and terminators in the data flow analysis
framework.

Depends on D124020

Depends on D124021

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Fix braces, other such - add unit test for branches.

( @rriddle To bring the signed/unsigned/ConstantRange discussion out of inline comments)

I've got a few reasons I'd say it's reasonable to track the signed and unsigned interpretations of values separately, in contrast with LLVM's approach.

  1. The LLVM approach appears to rely on having a bunch of flow-sensitive reasoning that generates things like nsw in order to make their analysis more precise. We don't have, and probably don't want to recreate, all that infrastructure, especially with how extensible MLIR is. So, we need another avenue to get analysis precision.
  2. Having signed/unsigned bounds like this makes it easier to reason about what the range inference implementations are doing, especially when dealing with mixtures of signed and unsigned ops (like divui or remsi). Trying to precisely get bounds out of mixed code like that is tricky.
  3. In most cases, we don't have double computation - the three ops that really have to do duplicate reasoning about the signed and unsigned result are addition, subtraction, and multiplication. Just about everything else can call fromSigned or fromUnsigned after computing one set of bounds to copy the bounds over if that's a valid thing to do.
  4. Initial attempts at a one-range system were a confusing mess and I'm not looking to fight through making sure that's implemented correctly.
mlir/lib/Analysis/IntRangeAnalysis.cpp
25–26

Doesn't it need to have a defined namespace so it can be referenced in a header?

mlir/unittests/Interfaces/InferIntRangeInterfaceTest.cpp
2–3

clang-format

rriddle requested changes to this revision.May 12 2022, 10:23 PM

( @rriddle To bring the signed/unsigned/ConstantRange discussion out of inline comments)

I've got a few reasons I'd say it's reasonable to track the signed and unsigned interpretations of values separately, in contrast with LLVM's approach.

  1. The LLVM approach appears to rely on having a bunch of flow-sensitive reasoning that generates things like nsw in order to make their analysis more precise. We don't have, and probably don't want to recreate, all that infrastructure, especially with how extensible MLIR is. So, we need another avenue to get analysis precision.
  2. Having signed/unsigned bounds like this makes it easier to reason about what the range inference implementations are doing, especially when dealing with mixtures of signed and unsigned ops (like divui or remsi). Trying to precisely get bounds out of mixed code like that is tricky.
  3. In most cases, we don't have double computation - the three ops that really have to do duplicate reasoning about the signed and unsigned result are addition, subtraction, and multiplication. Just about everything else can call fromSigned or fromUnsigned after computing one set of bounds to copy the bounds over if that's a valid thing to do.
  4. Initial attempts at a one-range system were a confusing mess and I'm not looking to fight through making sure that's implemented correctly.

I think this is underestimating some of the effects resulting from having both signed and unsigned:

  • It seems like in the implementation of the interfaces for the arithmetic operations you are ignoring known information

E.g. if a signed operation has inputs that have known unsigned ranges, why shouldn't we try to preserve and utilize that information when possible?

  • The duplication is not just about adding the interface to an op, but also when using the analysis

Bouncing off of the previous point, if a user doesn't want to ignore known information they will inevitably have to check both anyways.

  • You aren't going to be able to get away from nsw and related flags

The arithmetic operations don't have those at this point, but that doesn't mean they won't grow them moving forward (the current modeling is not all encompassing). We also will want this analysis to support cases like the LLVM dialect, which do encode that information already.

I have some serious concerns about using multiple ranges, and I'd like to fully understand why a single range ended up being "a confusing mess". Especially given that LLVM has utilized it successfully, and hasn't revolted/reverted into a multi-range system.

This revision now requires changes to proceed.May 12 2022, 10:23 PM

It looks like, at the very least, I misread the comments on ConstantRange. Note that, for example, computeConstantRange at https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Analysis/ValueTracking.h#L548 has a forSigned parameter, and that the scalar evolution code has an enum requiring you to specify if you're computing the range for signed or unsigned values and corresponding utility functions getUnsignedRange() and getSignedRange().

This means my point 4, about things being a huge mess, doesn't apply to LLVM - what I initially tried to do is to use the same pair of bounds for both signed and unsigned values without any context about what I wanted, which I don't think is something that can be done precisely ... which is why LLVM doesn't do it either.

And to your points about ignoring information - that really shouldn't happen. The fromSigned and fromUnsigned constructors (which is what will be used when a particular operation has a signedness in its interpretation) will copy the bounds to the other range if possible. For example, if I say fromSigned(0, 15), I'll get the ranges unsigned = [0, 15], signed = [0, 15], but if I say fromSigned(-1, 1), I'll get unsigned = [-inf, +inf], signed = [-1, 1], because -1 <= x <= 1 doesn't correspond to a contiguous set of bit patterns from the unsigned perspective.

So, if you have a signed operation, it will use information about unsigned ranges implicitly - that's where fromUnsigned comes in. (In fact, if the unsigned view of that input has an upper bound <= int_max(type), the signed operation will know that argument is non-negative).

And so, if we used one range, we'd need to do what LLVM does and thread whether we want to know about each Value from a signed or unsigned point of view through the entire analysis. It seems more reasonable to me to just keep two ranges instead.

@rriddle I hope the above comment makes some sense

rriddle added inline comments.May 17 2022, 2:10 AM
mlir/include/mlir/Analysis/IntRangeAnalysis.h
31–32
34–36

Analyses are expected to be constructed from the operation, please change this to a constructor.

37
mlir/include/mlir/Interfaces/InferIntRangeInterface.h
23

This doesn't look necessary.

29

The name of this is a bit weird, given that there aren't any attribute. Can we use ConstantIntRange instead?

33

Please add newlines between methods, applies throughout the commit.

40

Please drop trivial braces.

46

Where are these tuples being built? Can we drop this constructor?

72

Can we just pass in the values directly? Why go through tuple?

Same for the other method.

99

Why do we need to use optional here? Do we need the default constructor? If this is just for the analysis, we could wrap this class with an internal one that isn't outwardly exposed.

102
mlir/lib/Analysis/IntRangeAnalysis.cpp
54

Can you complete this comment?

mlir/test/lib/Dialect/Test/TestOps.td
2901–2903
2924–2926
2938–2940

Thanks for the comments on the range representation and catching the pointless complexity introduced by Optional<>

mlir/include/mlir/Interfaces/InferIntRangeInterface.h
46

These tuples arise from methods in the implementation logic for arith that return various [min, max] ranges as a tuple. They're useful ergonomic constructors, IMO.

72

Utility methods that use tuples to return multiple values.

99

Having thought about it, we don't _need_ optional - ConstantRange down in LLVM doesn't need it. The case of unbounded ranges can be handled by [0, uint_max(width)] / [int_min(width), int_max(width)], respectively, and we could use width 0 APInts to handle things that aren't integers so we have a meaningful null value.

That would all, now that I think about it, simplify things substantially in many parts of the implementation at the cost of a few minor ergonomic annoyances. I'll make it happen.

krzysz00 updated this revision to Diff 430446.May 18 2022, 10:38 AM
krzysz00 marked 8 inline comments as done.

Make the ints in ranges not optional, since that's not needed.

(submitting comments)

@rriddle Do you have any other comments?

Nice, this is getting really close. Great work.

mlir/include/mlir/Analysis/IntRangeAnalysis.h
2

This looks a bit off.

19–21

Are all of these necessary?

32–36
mlir/include/mlir/Interfaces/InferIntRangeInterface.h
22

attributes? This comment looks out of date.

46

Can we remove these until we need them then, I'm still not convinced yet of the value. The value may be more clear when there are uses, in which case they can be readded.

52
62
64–66

Is this only for the pessimistic value state stuff? Could we just have a wrapper value for the analysis, and keep ConstantIntRanges clean from this? notAnInteger feels quite weird as an interface for a ConstantIntRange.

78–79

With above, can we please drop the tuple API until we need it?

94–96

Related to the comment above, if we used a wrapper class in the analysis we wouldn't need this to be public exposed.

108

nit: Can you hoist this near the constructors?

117

Can you document this?

mlir/include/mlir/Transforms/Passes.h
54 ↗(On Diff #430446)
mlir/lib/Transforms/FoldInferredConstants.cpp
1–2 ↗(On Diff #430446)

This is a bit off.

88–97 ↗(On Diff #430446)

SCCP in LLVM represents a range within the lattice, is that something we could do here (i.e. merge the necessary things into SCCP)? It would remove the need to have a duplicate pass with similar goals to SCCP.

krzysz00 updated this revision to Diff 431743.May 24 2022, 12:02 PM
krzysz00 marked 19 inline comments as done.

Address review comments, move join/pessimistic stuff into wrapper class

Thanks, and I'm glad to hear this is all moving in a good direction

mlir/include/mlir/Interfaces/InferIntRangeInterface.h
64–66

This becomes important for arith.constant and any other op we can think of whose return type might or might not be an integer. I think it'll improve code clarity if we keep this around as a helper method.

mlir/lib/Transforms/FoldInferredConstants.cpp
88–97 ↗(On Diff #430446)

I agree that we might be able to merge this and SCCP, but I'd rather do that in a future commit once we've roped in the SCCP folks on here to make sure we've handled their use cases (ex - do they handle vector constants? We don't)

(and I did move the tuple methods over to the commit that defines InferIntRangeInterface for arithmetic)

rriddle added inline comments.May 27 2022, 12:11 AM
mlir/include/mlir/Interfaces/InferIntRangeInterface.h
64–66

If the constant isn't an integer, can we just bail out? It seems like something to be handled outside of this class. For the interface I assume we would either:

  • Not invoke the callback function
  • Change the callback function to take an Optional<> and invoke it with None

What happens in the situations where an operation has one result that can have ranges inferred and one that doesn't? I assume we just wouldn't provide information for the result we can't infer, or am I wrong here? The case where constant doesn't return an int should fall into that same case (this isn't an int, so I'm not giving you anything).

mlir/lib/Transforms/FoldInferredConstants.cpp
88–97 ↗(On Diff #430446)

Can we move this to being a test pass then? The premise of this pass is identical to SCCP, and I'd rather work to merge the necessary functionality into there than to rely on having a separate pass.

krzysz00 updated this revision to Diff 433115.May 31 2022, 9:41 AM
krzysz00 retitled this revision from [mlir][Arith] Add integer range inference analysis to [mlir] Add integer range inference analysis.
krzysz00 edited the summary of this revision. (Show Details)

Integrate fold-inferred-constants into SCCP

krzysz00 updated this revision to Diff 433146.May 31 2022, 11:13 AM
krzysz00 edited the summary of this revision. (Show Details)

Fix bug in dataflow analysis uncovered while testing arithmetic range implementations.

As to SCCP, I've gone and moved the invocation of the integer range analysis into SCCP. I couldn't merge it with the existing SCCP analysis due to their differences in how they detect loop-variant variables (the range-less SCCP was more conservative about it), but I did at least put both analyses in one pass.

I hope this change satisfies the objections you had.

(I also ran into a bug in how data flow analysis was handling terminators for stuff like scf.condition - this has now been fixed here.)

mlir/include/mlir/Interfaces/InferIntRangeInterface.h
64–66

I've fixed the analysis so that failure to call the callback gives you [min(type), max(type)] and therefore removed notAnInteger(). Good call on how to clean all this up!

I will admit I'm not entirely happy with the SCCP integration - if you see any ways I could tighten it up, please do let me know.

I will admit I'm not entirely happy with the SCCP integration - if you see any ways I could tighten it up, please do let me know.

Ah sorry, didn't mean to imply you needed to do that in this patch. Can you split the SCCP out into a separate patch? Using a test pass for now to test the analysis? I'd like to get the base stuff landed, and I don't think that needs to be anchored on figuring out SCCP evolution.

I do want this functionality downstream for non-test reasons, though.

If it makes more sense, I can make -fold-inferred-constants a test pass in
the MLIR codebase and then temporarily pull it out of test/ in our fork
when I'm backporting this patch series. Would that work?

Or, if you don't think the SCCP evolution will be too contentious, I can
just split that into it's own patch and add it to the series - another few
days to make sure we get this right aren't the end of the world.

I do want this functionality downstream for non-test reasons, though.

If it makes more sense, I can make -fold-inferred-constants a test pass in
the MLIR codebase and then temporarily pull it out of test/ in our fork
when I'm backporting this patch series. Would that work?

Or, if you don't think the SCCP evolution will be too contentious, I can
just split that into it's own patch and add it to the series - another few
days to make sure we get this right aren't the end of the world.

Moving to test upstream, and then having a non-test pass downstream for now
would likely be the easiest. Given that you'll still be able to benefit from the work
you've done regardless of what we do with the SCCP integration.

krzysz00 updated this revision to Diff 433828.Jun 2 2022, 11:54 AM
krzysz00 edited the summary of this revision. (Show Details)

Move inference-based folding to -test-int-range-inference

krzysz00 updated this revision to Diff 433834.Jun 2 2022, 12:05 PM

Rebase to more recent main, clean up remaining SCCP change

rriddle accepted this revision.Jun 2 2022, 12:05 PM

Thanks! I really appreciate the patience during the review. This is really great work.

mlir/test/lib/Transforms/TestIntRangeInference.cpp
9

nit: Please drop usernames from TODO messages.

This revision is now accepted and ready to land.Jun 2 2022, 12:05 PM
krzysz00 updated this revision to Diff 433846.Jun 2 2022, 12:43 PM

Adress final nit

River, Jeff, thank you so much for your thorough feedback and review!
This code wouldn't be as good as it is without y'all.

This revision was automatically updated to reflect the committed changes.

I revert, this broke the bot: https://lab.llvm.org/buildbot/#/builders/61/builds/27314

(you should have got an email)

I did get an email but the email only mentioned the warning, not the error.

I worked out the problem and put together a fix that I'll send in tomorrow.

Mogball added inline comments.Jun 7 2022, 1:20 PM
mlir/lib/Analysis/DataFlowAnalysis.cpp
579

https://github.com/llvm/llvm-project/issues/55873 was introduced by this code block here. I'm investigating it at the moment