This is an archive of the discontinued LLVM Phabricator instance.

[MVE] Don't try to unroll vectorised MVE loops
ClosedPublic

Authored by dmgreen on Aug 6 2019, 6:07 AM.

Details

Summary

Due to the nature of the beat system in an MVE pipeline, with tail predication and low-overhead loops, unrolling has less benefit compared to normal loops. You can not, for example, hide the latency of a load with other instructions as you can for scalar code. Not unrolling also makes the code easier to read and reason about.

So if a loop has already been vectorised, or otherwise contains vector code, don't enable the runtime unrolling. At least for the time being.

Diff Detail

Repository
rL LLVM

Event Timeline

dmgreen created this revision.Aug 6 2019, 6:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 6 2019, 6:07 AM

Rather than for performance, I can't see how we can legally unroll a vectorized loops in the case of tail predication. I also don't know how we'd detect that a loop had been unrolled when we get to the conversion phase, so it seems like this is the only place where we can maintain correctness. And because we need this for legality, we can't rely on metadata...

Hey. I'm not sure I see what you mean. Surely if unrolling cause changes that were illegal from a tail predication point of view, we would need to not do tail predication! (i.e you can manually unroll a loop and produce similar code).

These changes are aimed at performance though. For any loop that we are not tail-predicating, we should still prefer to not unroll it.

Yes, I guess that would be sensible approach! I am worried that one of the (many) passes will trip somewhere, so this gives me another example test case... For performance, I'm still not convinced this is the best approach because (1) we can't depend on metadata and (2) doesn't this also prevent the scalar remainder from being unrolled too?

(1) we can't depend on metadata

We can depend on metadata for performance though, that's what it's there for! Just not for correctness (at least, that's what I understood/remember from the language ref). The vectoriser adds llvm.loop.unroll.runtime.disable, being fairly confident that it disables the unrolling for the remainder loop, for example.

(2) doesn't this also prevent the scalar remainder from being unrolled too?

It will already have no-unroll metadata on it. It's known to be a short loop, at most 3 iterations in this case (as we vectorise x 4).

But, am I correct in saying that you are not against preventing unrolling, just this way of looking at the metadata? If so would looking through for vector instructions sounds better to you? That sounds simple enough and should produce the same effect in most cases. Let me know!

Yes, I'm definitely up for preventing unrolling and I think checking the instructions would be better - we'll catch vector intrinsics that way too.

dmgreen updated this revision to Diff 214066.Aug 7 2019, 10:46 PM
dmgreen edited the summary of this revision. (Show Details)

This cleaned up nicely. It just checks return type, as I believe that will catch all interesting loops.

samparker accepted this revision.Aug 8 2019, 12:55 AM

Ah, nice. LGTM

This revision is now accepted and ready to land.Aug 8 2019, 12:55 AM
This revision was automatically updated to reflect the committed changes.