This is an archive of the discontinued LLVM Phabricator instance.

IR/ConstantRange: Add support for FP ranges
Needs ReviewPublic

Authored by jvesely on Apr 15 2020, 11:15 AM.

Details

Summary

The FP Range follows the same pattern as integer range, with two major differences.

  1. It's inclusive
  2. NaNs are tracked separately from the range.

This patch implements; construction, printing, fadd, fsub, fmul, fdiv, minimum, maximum, cast, makeAllowedRegion, makeSatisfyingRegion.
Rounding of operations is conservative and returns a potentially larger region.

Floating point vectors are still tracked as integer ranges.

Includes exhaustive testing for all supported operations.

Diff Detail

Event Timeline

jvesely created this revision.Apr 15 2020, 11:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 15 2020, 11:15 AM

I would strongly recommend to implement this as a separate type (FloatRange for example), rather than combining it into ConstantRange. There's a couple reasons for that, the main ones would be a) float ranges have significantly different semantics, in particular the fact that they are inclusive and b) ConstantRange is a size critical structure, because it is used in value lattice elements. As implement, this change is going to skyrocket memory consumption during SCCP.

It would also be nice to have some general context on where you plan to use this.

I would strongly recommend to implement this as a separate type (FloatRange for example), rather than combining it into ConstantRange. There's a couple reasons for that, the main ones would be a) float ranges have significantly different semantics, in particular the fact that they are inclusive and b) ConstantRange is a size critical structure, because it is used in value lattice elements. As implement, this change is going to skyrocket memory consumption during SCCP.

The point of having one combined range is to allow simplification of logic for users. you don't have to check for the type just do the same operations union/intersect/binop/contains/...
I'm not sure how splitting it to a separate class would address the memory concerns. Tracking fp ranges will require two APFloats irrespective of the structure.
If keeping both APFloat and APint bounds is a problem I can put them in a union since they are used exclusively and an instance never switches between float and int.
The difference in semantics shouldn't impact users much. they should use contains/isEmpty/isFull/isSingleElement/getABCrange instead of checking the bounds manually.
The application to valluelattice/lvi/sccp is in D78224, to keep diff simple it hasn't implemented cleanups that are allowed by having one range class, yet.

It would also be nice to have some general context on where you plan to use this.

Other than the usual uses of range propagation, floating point ranges can be used to safely apply fast-math optimizations if operands are known to not require special handling
There's already rudimentary support for NaN tracking, using ranges allows to safely apply nosignedzeros, noinfs, nonans, denorms are zero optimizations.

moreover, gpu backends can generate special instructions if there's advanced knowledge of fp operands.
for example ptx backend can use different fdiv implementation based on the range of inputs [0]. amdgpu can eliminate canonicalize instructions if it knows the range of inputs[1].

[0] https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#floating-point-instructions-div
[1] https://reviews.llvm.org/D35335

fhahn added a subscriber: fhahn.Apr 16 2020, 2:38 AM

I would strongly recommend to implement this as a separate type (FloatRange for example), rather than combining it into ConstantRange. There's a couple reasons for that, the main ones would be a) float ranges have significantly different semantics, in particular the fact that they are inclusive and b) ConstantRange is a size critical structure, because it is used in value lattice elements. As implement, this change is going to skyrocket memory consumption during SCCP.

+1, I think there is a strong case for not increasing the size here. We should rather try to reduce the size of the existing ConstantRange (using the 2 APInts means we store the same bit width twice :()

The point of having one combined range is to allow simplification of logic for users. you don't have to check for the type just do the same operations union/intersect/binop/contains/...
I'm not sure how splitting it to a separate class would address the memory concerns. Tracking fp ranges will require two APFloats irrespective of the structure.
If keeping both APFloat and APint bounds is a problem I can put them in a union since they are used exclusively and an instance never switches between float and int.

Even with a union, you still need extra space for the bits to distinguish between FP/integers (and other FP related flags, like canBeNaN. It should be possible to achieve the same level of convenience to this patch, but separating integer and FP ranges into separate classes. E.g. add ConstantRangeIntStorage & ConstantRangeFloatStorage which use APInt/APFloat respectively. The shared logic can be implemented in a ConstantRangeCommon template, which can be instantiated with either ConstantRangeIntStorage/ConstantRangeFPStorage. ConstantRange would turn into an instantiation of ConstantRangeCommon<ConstantRangeIntStorage>. Of course that would require updating code to make use of the FP version of ConstantRange. That might be a benefit actually, because there probably are various places that assume ConstantRange only contains integers.

When using the FP version of the constant range in ValueLattice, that would allow us to keep the size the same, if no extra bits (like canBeNaN) are added.

It might be that there is no impact on compile-time/memory usage overall with the current version (using a union), but it would be good to collect data.

It would also be nice to have some general context on where you plan to use this.

Other than the usual uses of range propagation, floating point ranges can be used to safely apply fast-math optimizations if operands are known to not require special handling
There's already rudimentary support for NaN tracking, using ranges allows to safely apply nosignedzeros, noinfs, nonans, denorms are zero optimizations.

FWIW I think it might be good to bring up the NaN tracking & co in separate patches.

I would strongly recommend to implement this as a separate type (FloatRange for example), rather than combining it into ConstantRange. There's a couple reasons for that, the main ones would be a) float ranges have significantly different semantics, in particular the fact that they are inclusive and b) ConstantRange is a size critical structure, because it is used in value lattice elements. As implement, this change is going to skyrocket memory consumption during SCCP.

+1, I think there is a strong case for not increasing the size here. We should rather try to reduce the size of the existing ConstantRange (using the 2 APInts means we store the same bit width twice :()

The point of having one combined range is to allow simplification of logic for users. you don't have to check for the type just do the same operations union/intersect/binop/contains/...
I'm not sure how splitting it to a separate class would address the memory concerns. Tracking fp ranges will require two APFloats irrespective of the structure.
If keeping both APFloat and APint bounds is a problem I can put them in a union since they are used exclusively and an instance never switches between float and int.

Even with a union, you still need extra space for the bits to distinguish between FP/integers (and other FP related flags, like canBeNaN. It should be possible to achieve the same level of convenience to this patch, but separating integer and FP ranges into separate classes. E.g. add ConstantRangeIntStorage & ConstantRangeFloatStorage which use APInt/APFloat respectively. The shared logic can be implemented in a ConstantRangeCommon template, which can be instantiated with either ConstantRangeIntStorage/ConstantRangeFPStorage. ConstantRange would turn into an instantiation of ConstantRangeCommon<ConstantRangeIntStorage>. Of course that would require updating code to make use of the FP version of ConstantRange. That might be a benefit actually, because there probably are various places that assume ConstantRange only contains integers.

When using the FP version of the constant range in ValueLattice, that would allow us to keep the size the same, if no extra bits (like canBeNaN) are added.

OK, I'll take a stab at separating the backend, but I don't see a good way to reduce memory usage.

the current situation:
sizeof(APInt) == 16 and there are two of them so sizeof(ConstantRange) == 32, ValueLattice includes ConstantRange as member + extra tags givining it the size of 33 (aligned to 40).

possible implementations of fp tracking:
a) using unions for Lower/LowerFP, Upper/UpperFP:
sizeof(APFloat) == 32 x2 + 2 for tags => sizeof(ConstantRange) == 66 (aligned up to 72). ValueLattice includes ConstantRange as member bringing the size to 73 (aligned to 80).

b) union backend:
sizeof (ConstantRangeBackendInt) == 32, sizeof(ConstantRangeBackendFP) == 65 (2x APFloat + bool NaN, aligned up to 72), sizeof ConstantRange == 72). ValueLattice includes ConstantRange as member bring s the size to 73 (aligned to 80)

c) separate classes for int and float ranges:
sizeof(ConstantRangeInt) == 32, sizeof(ConstantRangeFP) == 65 (aligns up to 72). ValueLattice includes both variants in a union bringing the size to 73 (aligned to 80).

The memory overhead is 2x, which is consistent with tracking more information.
As long as we have one flag, adding another another is free because of the alignment. (I used 8B alignment, but 4B align is not going to change the sizes much).

Unless we use pointers and an extra level of indirection the end sizes for users like ValueLattice end up the same irrespective of where the union is (lower+upper vs. backend vs. user).

If memory usage is a concern, dropping ConstantRange from ValueLattice structure (so it's not taking up space for unknown/overdefined tags) might be a better target.

It might be that there is no impact on compile-time/memory usage overall with the current version (using a union), but it would be good to collect data.

Is there a test suite that I can use to collect this? Are there any integer heavy/only use cases that would only suffer overhead without any benefit to fp range tracking?

It would also be nice to have some general context on where you plan to use this.

Other than the usual uses of range propagation, floating point ranges can be used to safely apply fast-math optimizations if operands are known to not require special handling
There's already rudimentary support for NaN tracking, using ranges allows to safely apply nosignedzeros, noinfs, nonans, denorms are zero optimizations.

FWIW I think it might be good to bring up the NaN tracking & co in separate patches.

NaNs can be created as a result of normal operations (like Inf - Inf, or 0/0) so detecting NaN producing operations is required to track regular ranges. storing it in a flag is a minimal change.

Unless we use pointers and an extra level of indirection the end sizes for users like ValueLattice end up the same irrespective of where the union is (lower+upper vs. backend vs. user).

I'd expect that this is indeed what we want to do. Given that the float range is just so much larger, we'll want to allocate it out of line.

If memory usage is a concern, dropping ConstantRange from ValueLattice structure (so it's not taking up space for unknown/overdefined tags) might be a better target.

That's also a possibility (LVI for example uses a separate set for overdefined values to reduce memory usage), but fairly orthogonal. Even if ConstantRange is allocated out of line as well, we'll still want to be able to allocate the smaller integer only structure.

In any case, it's really not just a matter of memory usage, but also semantics. Integer and float constant ranges cannot interact in any meaningful way, you can't take the union of an integer and float range for example. Your patch already asserts that range types match in many places. It is better if these invariants get enforced more robustly by the type system.

Even with a union, you still need extra space for the bits to distinguish between FP/integers (and other FP related flags, like canBeNaN. It should be possible to achieve the same level of convenience to this patch, but separating integer and FP ranges into separate classes. E.g. add ConstantRangeIntStorage & ConstantRangeFloatStorage which use APInt/APFloat respectively. The shared logic can be implemented in a ConstantRangeCommon template, which can be instantiated with either ConstantRangeIntStorage/ConstantRangeFPStorage. ConstantRange would turn into an instantiation of ConstantRangeCommon<ConstantRangeIntStorage>. Of course that would require updating code to make use of the FP version of ConstantRange. That might be a benefit actually, because there probably are various places that assume ConstantRange only contains integers.

I expect that there is very little logic that can be meaningfully shared between these -- it's mostly a matter of some API names being the same, while the implementation is different. I wouldn't try too many gymnastics to share anything between them.

Unless we use pointers and an extra level of indirection the end sizes for users like ValueLattice end up the same irrespective of where the union is (lower+upper vs. backend vs. user).

I'd expect that this is indeed what we want to do. Given that the float range is just so much larger, we'll want to allocate it out of line.

wouldn't the indirection needlessly penalize fp oriented targets? GPUs often use runtime compilers so performance here is more important than memory usage (gpu kernels tend to be smaller than offline compiled programs).

If memory usage is a concern, dropping ConstantRange from ValueLattice structure (so it's not taking up space for unknown/overdefined tags) might be a better target.

That's also a possibility (LVI for example uses a separate set for overdefined values to reduce memory usage), but fairly orthogonal. Even if ConstantRange is allocated out of line as well, we'll still want to be able to allocate the smaller integer only structure.

In any case, it's really not just a matter of memory usage, but also semantics. Integer and float constant ranges cannot interact in any meaningful way, you can't take the union of an integer and float range for example. Your patch already asserts that range types match in many places. It is better if these invariants get enforced more robustly by the type system.

Even with a union, you still need extra space for the bits to distinguish between FP/integers (and other FP related flags, like canBeNaN. It should be possible to achieve the same level of convenience to this patch, but separating integer and FP ranges into separate classes. E.g. add ConstantRangeIntStorage & ConstantRangeFloatStorage which use APInt/APFloat respectively. The shared logic can be implemented in a ConstantRangeCommon template, which can be instantiated with either ConstantRangeIntStorage/ConstantRangeFPStorage. ConstantRange would turn into an instantiation of ConstantRangeCommon<ConstantRangeIntStorage>. Of course that would require updating code to make use of the FP version of ConstantRange. That might be a benefit actually, because there probably are various places that assume ConstantRange only contains integers.

I expect that there is very little logic that can be meaningfully shared between these -- it's mostly a matter of some API names being the same, while the implementation is different. I wouldn't try too many gymnastics to share anything between them.

The logic is separate as most operations are either int or float with limited number of transitions (fptoint/inttofp). The point is to have a single instance that can handle both. the user should be able to do getFull(Type*)/getEmpty(Type*)/isSingleElement()/getSingleElementConstant(...)/getSatisfyingRange(CmpInst).contains(item1.getRange()) without having to duplicate the logic for integer and floating point types.
Separate classes impose extra complexity on every user that wants to track floating point values. it's not just handling of storage in a union (which at least for ValueLattice is mostly in place), but every single function that currently returns a ConstantRange will need to be duplicated. this leads to error-prone code structure.

The current version of the patches intentionally keeps things separate for reviewing, but the plan is to unify most of the duplicated code paths introduced in D78224. Even in D78224 the codepaths that only deal with ranges are unified, the separation is mainly for range creation or extracting single values.

fhahn added a comment.Apr 19 2020, 1:08 PM

Even with a union, you still need extra space for the bits to distinguish between FP/integers (and other FP related flags, like canBeNaN. It should be possible to achieve the same level of convenience to this patch, but separating integer and FP ranges into separate classes. E.g. add ConstantRangeIntStorage & ConstantRangeFloatStorage which use APInt/APFloat respectively. The shared logic can be implemented in a ConstantRangeCommon template, which can be instantiated with either ConstantRangeIntStorage/ConstantRangeFPStorage. ConstantRange would turn into an instantiation of ConstantRangeCommon<ConstantRangeIntStorage>. Of course that would require updating code to make use of the FP version of ConstantRange. That might be a benefit actually, because there probably are various places that assume ConstantRange only contains integers.

When using the FP version of the constant range in ValueLattice, that would allow us to keep the size the same, if no extra bits (like canBeNaN) are added.

OK, I'll take a stab at separating the backend, but I don't see a good way to reduce memory usage.

the current situation:
sizeof(APInt) == 16 and there are two of them so sizeof(ConstantRange) == 32, ValueLattice includes ConstantRange as member + extra tags givining it the size of 33 (aligned to 40).

possible implementations of fp tracking:
a) using unions for Lower/LowerFP, Upper/UpperFP:
sizeof(APFloat) == 32 x2 + 2 for tags => sizeof(ConstantRange) == 66 (aligned up to 72). ValueLattice includes ConstantRange as member bringing the size to 73 (aligned to 80).

b) union backend:
sizeof (ConstantRangeBackendInt) == 32, sizeof(ConstantRangeBackendFP) == 65 (2x APFloat + bool NaN, aligned up to 72), sizeof ConstantRange == 72). ValueLattice includes ConstantRange as member bring s the size to 73 (aligned to 80)

c) separate classes for int and float ranges:
sizeof(ConstantRangeInt) == 32, sizeof(ConstantRangeFP) == 65 (aligns up to 72). ValueLattice includes both variants in a union bringing the size to 73 (aligned to 80).

The memory overhead is 2x, which is consistent with tracking more information.
As long as we have one flag, adding another another is free because of the alignment. (I used 8B alignment, but 4B align is not going to change the sizes much).

Unless we use pointers and an extra level of indirection the end sizes for users like ValueLattice end up the same irrespective of where the union is (lower+upper vs. backend vs. user).

If memory usage is a concern, dropping ConstantRange from ValueLattice structure (so it's not taking up space for unknown/overdefined tags) might be a better target.

I think there are 2 ways to stage the introduction: 1) extending ConstantRange as in this patch is overall less work in terms of patches, but will probably require more analysis on the impact and/or more convincing as ConstantRange is widely used. 2) Introducing a ConstantRangeFP first allows clients to migrate step-by-step and would probably be easier to get in incrementally and things can unified later, once any outstanding issues are addressed.

It might be that there is no impact on compile-time/memory usage overall with the current version (using a union), but it would be good to collect data.

Is there a test suite that I can use to collect this? Are there any integer heavy/only use cases that would only suffer overhead without any benefit to fp range tracking?

Not sure about specific test cases, but people commonly use CTMark (from test-suite) and/or various versions of SPEC.

lebedev.ri requested changes to this revision.Jul 1 2020, 7:06 AM
This revision now requires changes to proceed.Jul 1 2020, 7:06 AM
lebedev.ri resigned from this revision.Jan 12 2023, 4:51 PM

This review seems to be stuck/dead, consider abandoning if no longer relevant.

This revision now requires review to proceed.Jan 12 2023, 4:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2023, 4:51 PM
Herald added a subscriber: StephenFan. · View Herald Transcript