
[NFCI][IR] ConstantRangeTest: add basic scaffolding for next-gen precision/correctness testing
Needs Review · Public

Authored by lebedev.ri on Sep 25 2020, 1:29 AM.
This revision needs review, but there are no reviewers specified.



I have long complained that while we have exhaustive tests
for ConstantRange, they are, uh, not good.

The approach of grokking our own expected constant range
via exhaustive enumeration is mysterious.

It neither tells us beyond doubt that the result is
conservatively correct, nor does an exact match with the
ConstantRange result tell us that the result is precise.
But yeah, it's fast, I'll give it that.

In short, there are three things that we need to check:

  1. That the ConstantRange result is conservatively correct
  2. That the ConstantRange result is reasonable
  3. That the ConstantRange result is reasonably precise

So let's not just check the middle one, but all three.

This provides precision test coverage for D88178.

Diff Detail

Event Timeline

lebedev.ri requested review of this revision. Sep 25 2020, 1:29 AM
lebedev.ri created this revision.
nikic added a subscriber: nikic. Sep 26 2020, 11:47 AM

Thanks for putting up the review! I agree that the current testing situation is not ideal, but don't think the approach used here is moving us in the right direction.

> It neither tells us beyond doubt that the result is
> conservatively correct, nor does an exact match with the
> ConstantRange result tell us that the result is precise.

To start with, the current testing code does check for conservative correctness -- the problem is just that it does not check for it separately, so it's not obvious whether a failure is a failure of correctness or a failure of preciseness.

The question of precision is more complicated, because there is no generally applicable notion of a precise range. In many cases the "best" range is the one with the smallest size, but in some cases we also have an explicit sign preference (e.g. for intersection). In D88356 I've put up a more general testing framework that checks correctness and optimality under a preference function. The preference function is something like "smallest" or "smallest non-wrapping", which allows us to precisely specify what it is we're testing for.

This is an alternative to the approach you are proposing here. There are two reasons why I think it is more useful:

  • In this patch, you determine precision by counting how many values there are in the value set and how many there are in the range. For the case of the binaryNot() implementation, both of these will coincide for an optimal implementation. However, this is due to a special property of binaryNot() and will not generalize to other operations. In particular, binaryNot() maps a continuous range of values onto another continuous range, so there will be no gaps. In the general case, even an optimal range will contain elements that are not in the exact set. If the precise result set is just two elements 0 and 0x7f then the smallest covering range will still contain 64x more elements than the exact set. So if at the end of the test you get the result that on average, the computed range has 3.7x more elements than the exact set -- what does that actually tell you? Depending on the operation, this could easily be both a very good and a very bad result. For this reason, I don't think value counting is a useful metric in this context.
  • This testing approach still retains the need to explicitly specify how the "exact range" is constructed. This is easy for signed/unsigned non-wrapping ranges, but gets tricky for "smallest" ranges, and even trickier if the preference criterion is more complicated than that. The implementation of testBinarySetOperandExhaustive is probably the best example of that: enumerating all possible ranges and then picking out the best one by preference allows us to move that code onto the normal testing framework.