This is an archive of the discontinued LLVM Phabricator instance.

[LV] Handle external uses of floating-point induction variables
ClosedPublic

Authored by mssimpso on Apr 24 2017, 10:56 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

mssimpso created this revision.Apr 24 2017, 10:56 AM
mkuper edited edge metadata.Apr 24 2017, 11:12 AM

The original PR looks fishy to me, but I agree this is a real issue regardless.

I'm not sure this patch is correct, though. Just to understand what's going on here - we have an FP IV, for which we can compute the (integer, obviously) trip-count. We then cast that integer trip count into an FP value (possibly losing precision) , and them compute start + (step * trip count), in FP, to get the value from the penultimate iteration?

Hi Michael,

After transforming the IV, we end up with something like fp_start + float(n_vec - 1) * fp_iv_step. So we are casting the number of vector loop iterations and multiplying by the floating-point step

Yes, that should have been "trip count - 1", sorry.

My question is whether doing this in FP actually produces the desired result. When I wrote this code for int IVs, the idea was that "start + (step * (count - 1))" is equivalent to "start + step + ... + step" with count-1 additions.
Is this true for the FP case?

I think it should be fine if fast-math is enabled? I was under the impression that we would only recognize floating-point inductions with fast-math enabled, but having just tested it this doesn't seem to be the case after all. Shouldn't it be? What do you think?

I'm not sure. Do we currently (I mean, without this patch) do anything with FP IVs that violates spec?

If we do, then, yes, we not be recognizing them - and this transformation is also safe.
If we don't, then it would be a good idea to ask somebody who understands FP better whether this makes sense or not.

mssimpso updated this revision to Diff 96624.Apr 25 2017, 1:05 PM
mssimpso added a reviewer: delena.

OK, we require fast-math to vectorize floating-point inductions unless vectorization is forced. In my last update, I mentioned that I was seeing the test case be vectorized even without fast-math, but the test was using -force-vector-width. So I've moved the test to the X86 directory and added an additional no fast-math variant. We vectorize the fast-math version and compute the value of the external IV use the same way we do for integer IVs. We don't vectorize the no fast-math version.

Elena, what do you think about this? The alternative I see would be to just disallow external uses of floating-point inductions.

delena edited edge metadata.Apr 25 2017, 1:28 PM

We vectorize loops with FP inductions and FP reductions under the "fast-math". The main point here that the FP induction is a "secondary", the tripcount does not depends on it. I think it's OK to use FP induction outside the loop, why not? FP reduction, that we allow today, actually means using the result value outside.

mkuper accepted this revision.Apr 25 2017, 2:01 PM

Ok, for fast-math, I think this makes sense.
LGTM.

This revision is now accepted and ready to land.Apr 25 2017, 2:01 PM
This revision was automatically updated to reflect the committed changes.