Reference: https://bugs.llvm.org/show_bug.cgi?id=32758
Details
Diff Detail
- Build Status
Buildable 5826 Build 5826: arc lint + arc unit
Event Timeline
The original PR looks fishy to me, but I agree this is a real issue regardless.
I'm not sure this patch is correct, though. Just to understand what's going on here - we have an FP IV, for which we can compute the (integer, obviously) trip-count. We then cast that integer trip count into an FP value (possibly losing precision) , and them compute start + (step * trip count), in FP, to get the value from the penultimate iteration?
Hi Michael,
After transforming the IV, we end up with something like fp_start + float(n_vec - 1) * fp_iv_step. So we are casting the number of vector loop iterations and multiplying by the floating-point step
Yes, that should have been "trip count - 1", sorry.
My question is whether doing this in FP actually produces the desired result. When I wrote this code for int IVs, the idea was that "start + (step * (count - 1))" is equivalent to "start + step + ... + step" with count-1 additions.
Is this true for the FP case?
I think it should be fine if fast-math is enabled? I was under the impression that we would only recognize floating-point inductions with fast-math enabled, but having just tested it this doesn't seem to be the case after all. Shouldn't it be? What do you think?
I'm not sure. Do we currently (I mean, without this patch) do anything with FP IVs that violates spec?
If we do, then, yes, we not be recognizing them - and this transformation is also safe.
If we don't, then it would be a good idea to ask somebody who understands FP better whether this makes sense or not.
OK, we require fast-math to vectorize floating-point inductions unless vectorization is forced. In my last update, I mentioned that I was seeing the test case be vectorized even without fast-math, but the test was using -force-vector-width. So I've moved the test to the X86 directory and added an additional no fast-math variant. We vectorize the fast-math version and compute the value of the external IV use the same way we do for integer IVs. We don't vectorize the no fast-math version.
Elena, what do you think about this? The alternative I see would be to just disallow external uses of floating-point inductions.
We vectorize loops with FP inductions and FP reductions under the "fast-math". The main point here that the FP induction is a "secondary", the tripcount does not depends on it. I think it's OK to use FP induction outside the loop, why not? FP reduction, that we allow today, actually means using the result value outside.