
[LV] Support converting FP add to integer reductions.
Needs ReviewPublic

Authored by fhahn on Oct 4 2021, 10:06 AM.

Details

Summary

Floating point reductions that start at an integer value and only get
incremented by another positive integer value can be performed on
integers. The result then needs to be clamped to the maximum value with
mantissa 1.0 (with the maximum exponent) times the step.
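As a sketch of the idea (function names and the step value of 1 are illustrative, not from the patch): for `float`, repeatedly adding 1.0f saturates at 0x1.0p24f (2^24), because past that point the increment is rounded away, so the integer version has to clamp at the same bound.

```cpp
#include <algorithm>
#include <cstdint>

// FP version of the reduction: sums 1.0f n times; saturates at 2^24.
float fp_reduction(uint64_t n) {
  float s = 0.0f;
  for (uint64_t i = 0; i < n; ++i)
    s += 1.0f;
  return s;
}

// Integer rewrite: perform the sum on integers, then clamp at 0x1.0p24
// (the bound is max-mantissa-value times the step; the step is 1 here).
float int_reduction(uint64_t n) {
  uint64_t s = std::min<uint64_t>(n, uint64_t(1) << 24);
  return static_cast<float>(s);
}
```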

This approach has been suggested by @scanon.

One thing I am not sure about is whether there is an API to get the
corresponding integer value of an APFloat, if it is an integer. The patch
does some manual matching against explicitly constructed APFloats.

Another thing to note is that the patch uses float->signed int/signed
int->float conversions. Should conversions to unsigned be used?

Diff Detail

Event Timeline

fhahn created this revision. Oct 4 2021, 10:06 AM
fhahn requested review of this revision. Oct 4 2021, 10:06 AM
Herald added a project: Restricted Project. Oct 4 2021, 10:06 AM
Herald added a subscriber: vkmr.
fhahn updated this revision to Diff 376941. Oct 4 2021, 10:17 AM

Use 0x1.0p53 and 0x1.0p24 for maximum constants instead of large integer values.
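For reference (a quick standalone check, not part of the diff): 0x1.0p24 and 0x1.0p53 are 2^24 and 2^53, the points at which adding 1.0 to a float resp. double no longer changes the value.

```cpp
// 2^24 is the smallest positive float f with f + 1.0f == f;
// 2^53 plays the same role for double (53-bit significand).
constexpr float MaxFloat = 0x1.0p24f;   // 16777216.0f
constexpr double MaxDouble = 0x1.0p53;  // 9007199254740992.0
```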

Interesting idea. Are these two bits of code always the same?
https://godbolt.org/z/EfPKPTMdf

Should we be doing this more generally, outside the vectorizing reductions?

fhahn added a comment. Oct 4 2021, 12:50 PM

Interesting idea. Are these two bits of code always the same?
https://godbolt.org/z/EfPKPTMdf

I think both cases above should be the same. But I think we can construct slight variations where they would not be. E.g. consider a loop where the induction variable starts at 0, is incremented, and overflow is allowed. If n were negative, the result of removing the loop and converting n to a float would be a negative number, but the loop version would always return a positive number. I might be missing some subtleties when it comes to sign handling; perhaps @scanon has further thoughts.

Should we be doing this more generally, outside the vectorizing reductions?

I think it might be worthwhile to convert such reductions outside the vectorizer in some cases. My motivation for starting in LV is that it should be clearly profitable if it allows vectorization. For general loops without vectorization, it might not be profitable I think, e.g. for loops that only execute once, due to the conversion overhead.

This is an interesting patch @fhahn! I haven't looked at it yet, but the principle sounds good! I think we can also do this at -O3 for FP induction variables, i.e.

float f = 0;
for (int i = 0; i < n; i++)
{
  dst[i] = f;
  f += 2.0;
}

provided we know the conversion between float and integer is guaranteed to be lossless for every iteration. This can be done by versioning the loop with runtime SCEV checks I think, right? Or if we know the trip count at runtime we don't even need runtime checks.
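A hand-written sketch of what that rewrite could look like for the loop above (my reconstruction; the lossless-range bound stands in for a runtime SCEV check): the float IV `f` is exactly `2.0f * i`, so it can be recomputed from the integer IV as long as `2 * i` stays below 2^24.

```cpp
#include <cstdint>

// Original form: float IV incremented by 2.0f each iteration.
void fill_fp(float *dst, int n) {
  float f = 0.0f;
  for (int i = 0; i < n; i++) {
    dst[i] = f;
    f += 2.0f;
  }
}

// Rewritten form: integer IV converted on use. Lossless as long as
// 2 * i < 2^24; a versioned loop could check that bound at runtime.
void fill_int(float *dst, int n) {
  for (int i = 0; i < n; i++)
    dst[i] = static_cast<float>(2 * i);
}
```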

xbolva00 added a subscriber: xbolva00. Edited Oct 5 2021, 12:13 AM

Something like

const APFloat *NF = …;

bool Ignored;
APSInt NI(64, /*isUnsigned=*/false);
if (NF->convertToInteger(NI, APFloat::rmTowardZero, &Ignored) == APFloat::opOK)
  …

?

Check ConvertToSInt helper in IndVarSimplify.

Interesting idea. Are these two bits of code always the same?
https://godbolt.org/z/EfPKPTMdf

I think both cases above should be the same. But I think we can construct slight variations where they would not be. E.g. consider a loop where the induction variable starts at 0, is incremented, and overflow is allowed. If n were negative, the result of removing the loop and converting n to a float would be a negative number, but the loop version would always return a positive number. I might be missing some subtleties when it comes to sign handling; perhaps @scanon has further thoughts.

Yep, I was ignoring the negative numbers :) I meant more about the general idea of converting the loop to straight line code.

Should we be doing this more generally, outside the vectorizing reductions?

I think it might be worthwhile to convert such reductions outside the vectorizer in some cases. My motivation for starting in LV is that it should be clearly profitable if it allows vectorization. For general loops without vectorization, it might not be profitable I think, e.g. for loops that only execute once, due to the conversion overhead.

As far as I can tell from this code: https://godbolt.org/z/caPszPafr
The trace through when n==1 would be

cmp     w1, #1
b.lt    .LBB0_3
cmp     w1, #1
b.ne    .LBB0_4
mov     w8, wzr
movi    d0, #0000000000000000
b       .LBB0_7
sub     w8, w1, w8
fmov    s1, #1.00000000
subs    w8, w8, #1
fadd    s0, s0, s1
b.ne    .LBB0_8
ret

vs straight line code with no branches:

bic     w8, w1, w1, asr #31
mov     w9, #1266679808
scvtf   s0, w8
fmov    s1, w9
fminnm  s0, s0, s1
ret
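For readers not fluent in AArch64: 1266679808 is 0x4B800000, the bit pattern of 0x1.0p24f, so the straight-line sequence above is roughly equivalent to this C++ (my reconstruction, not from the patch):

```cpp
#include <algorithm>
#include <cmath>

float straight_line(int n) {
  int m = std::max(n, 0);            // bic w8, w1, w1, asr #31 (clamp negatives to 0)
  float f = static_cast<float>(m);   // scvtf s0, w8
  return std::fmin(f, 0x1.0p24f);    // fminnm s0, s0, s1 with s1 = 2^24
}
```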

And that's not including vectorization. It's kind of like a "high cost expansion" from SCEV (though, to be fair, as far as I understand we wouldn't always rewrite high-cost exit values, even if it would mean deleting the loop?). Which is what made me wonder whether we should be doing it generally, not just in the vectorizer. (Not that I have anything against this patch - it looks pretty sensible and doesn't complicate the reduction code any more than it already is. It seems to fit quite well.)

fhahn added a comment. Oct 8 2021, 12:10 PM

Interesting idea. Are these two bits of code always the same?
https://godbolt.org/z/EfPKPTMdf

I think both cases above should be the same. But I think we can construct slight variations where they would not be. E.g. consider a loop where the induction variable starts at 0, is incremented, and overflow is allowed. If n were negative, the result of removing the loop and converting n to a float would be a negative number, but the loop version would always return a positive number. I might be missing some subtleties when it comes to sign handling; perhaps @scanon has further thoughts.

Yep, I was ignoring the negative numbers :) I meant more about the general idea of converting the loop to straight line code.

Should we be doing this more generally, outside the vectorizing reductions?

I think it might be worthwhile to convert such reductions outside the vectorizer in some cases. My motivation for starting in LV is that it should be clearly profitable if it allows vectorization. For general loops without vectorization, it might not be profitable I think, e.g. for loops that only execute once, due to the conversion overhead.

And that's not including vectorization. It's kind of like a "high cost expansion" from SCEV (though, to be fair, as far as I understand we wouldn't always rewrite high-cost exit values, even if it would mean deleting the loop?). Which is what made me wonder whether we should be doing it generally, not just in the vectorizer. (Not that I have anything against this patch - it looks pretty sensible and doesn't complicate the reduction code any more than it already is. It seems to fit quite well.)

I think the case you shared should be beneficial. I'd need to check where it would fit best, and I think we'd need a few extra cost checks, but it shouldn't be too tricky.

I think there will still be cases in which it may not be profitable to perform the transformation on its own, so it might be good to also support this in LV, especially as we should be able to support it quite naturally.

reames added a subscriber: reames. Oct 10 2021, 11:50 AM

As related work in the same area, and possibly an alternative (though I haven't looked at your test cases closely), I want to mention two old reviews of mine.

https://reviews.llvm.org/D68844 teaches SCEV to compute trip counts for simple floating point IVs
https://reviews.llvm.org/D68954 uses the above in IndVars to canonicalize to integers when possible

fhahn added a comment. Oct 11 2021, 2:50 AM

As related work in the same area, and possibly an alternative (though I haven't looked at your test cases closely), I want to mention two old reviews of mine.

https://reviews.llvm.org/D68844 teaches SCEV to compute trip counts for simple floating point IVs
https://reviews.llvm.org/D68954 uses the above in IndVars to canonicalize to integers when possible

Thanks for sharing those patches. After taking a look, it seems a notable difference is that the current patch supports loops with variable trip counts due to clamping to the max value.

I've not thought too much about how this potentially could fit into SCEV, but a similar approach may be feasible there as well.

fhahn updated this revision to Diff 387996. Wed, Nov 17, 10:55 AM

Rebase and add tests where the IV may overflow (signed/unsigned).

Like I said, I do think a lot of the more common cases of this kind of code would be better suited to deleting the loop and transforming to straight-line code. It will be when there is extra stuff going on in the loop that vectorization becomes more useful. It might be worth looking into the loop-deletion case too, at some point.

llvm/lib/Analysis/IVDescriptors.cpp
235

Should half be supported too?

707

Is fsub supported? I don't see any tests or the code to handle them.

llvm/lib/Transforms/Vectorize/VPlan.cpp
1349

Are all start values valid? What if it is not an integer? Or if it is already -0x1.0p24? Or not a multiple of the induction amount? (I'm not sure any of those will be wrong, they might be fine and would start to "saturate" at the same point. I would have to test it to see what would happen).
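Out of curiosity, the non-integer start case can be probed by hand (a standalone experiment, not a claim about the patch): with start 0.5f and step 1.0f, the fractional part survives until the sum crosses 2^23, where a rounding tie absorbs it, so a naive integer rewrite would be off by 0.5 from then on.

```cpp
#include <cmath>
#include <cstdint>

// Sum 1.0f n times starting from a (possibly non-integer) value.
float fp_from(float start, uint64_t n) {
  float s = start;
  for (uint64_t i = 0; i < n; ++i)
    s += 1.0f;
  return s;
}
```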

llvm/test/Transforms/LoopVectorize/AArch64/fadd-reduction-as-int.ll
27

What prevents this add from overflowing?

Ayal added a comment. Sat, Nov 20, 1:59 PM

Floating point reductions that start at an integer value and only get
incremented by another positive integer value can be performed on
integers

When the (integer) increments are invariant, as in several examples, such reductions would better be identified as (integer) inductions, with their final live-out value of saturate(init_value + trip_count[-1] * invariant_increment) pre-computed in the pre-header? Their in-loop values could then also be used more easily (and efficiently) for trip counts known to avoid saturation. The general reduction infrastructure is needed when the increments are variant, as in the select'ing examples.
Why restrict to integer increments that are powers of 2?
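A sketch of that pre-header computation for float with a power-of-two integer step (names and the saturation bound of 2^24 * step are my assumptions based on the summary, not code from the patch):

```cpp
#include <algorithm>
#include <cstdint>

// Precompute the final live-out value of the reduction in the
// pre-header: saturate(init + trip_count * step), where the in-loop
// float sum saturates at 2^24 * step for a power-of-two integer step.
float reduction_final(uint64_t init, uint64_t trip_count, uint64_t step) {
  uint64_t bound = (uint64_t(1) << 24) * step;
  return static_cast<float>(std::min(init + trip_count * step, bound));
}
```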

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
914

The EnableStrictReductions part of the comment above should be moved below?