Page MenuHomePhabricator

[LV] Generate RT checks up-front and remove them if required. (WIP)
Needs RevisionPublic

Authored by fhahn on Mar 11 2020, 4:03 AM.

Details

Summary

This patch updates LV to generate the runtime checks just after cost
modeling, to allow a more precise estimate of the actual cost of the
checks. This information will be used in future patches to generate
larger runtime checks in cases where the checks only make up a small
fraction of the expected scalar loop execution time.

The runtime checks are created up-front in a temporary block to allow better
estimating the cost and un-linked from the existing IR. After deciding to
vectorize, the checks are moved backed. If deciding not to vectorize, the
temporary block is completely removed.

This patch is similar in spirit to D71053, but explores a different
direction: instead of delaying the decision on whether to vectorize in
the presence of runtime checks it instead optimistically creates the
runtime checks early and discards them later if decided to not
vectorize. This has the advantage that the cost-modeling decisions
can be kept together and can be done up-front and thus preserving the
general code structure. I think delaying (part) of the decision to
vectorize would also make the VPlan migration a bit harder.

One potential drawback of this patch is that we speculatively
generate IR which we might have to clean up later. However it seems like
the code required to do so is quite manageable.

Diff Detail

Event Timeline

fhahn created this revision.Mar 11 2020, 4:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 11 2020, 4:03 AM

Thank you for looking into this!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1559

'moved backed'?

Reverse-ping. Can i somehow help get this going? :)

One potential drawback of this patch is that we speculatively
generate IR which we might have to clean up later. However it seems like
the code required to do so is quite manageable.

There's prior art for that, loop idiom pass has ExpandedValuesCleaner e.g.

Reverse-ping. Can i somehow help get this going? :)

I actually started rebasing/updating over the weekend. There are a few more loose ends to tie up, but I should be able to share an update in the next few days.

One potential drawback of this patch is that we speculatively
generate IR which we might have to clean up later. However it seems like
the code required to do so is quite manageable.

There's prior art for that, loop idiom pass has ExpandedValuesCleaner e.g.

Yes, part of the update is to fix SCEVExpander to actually collect all instructions it generates automatically, so they can be cleaned up later. It already does so for some instructions, but is missing a ton of cases.

Reverse-ping. Can i somehow help get this going? :)

I actually started rebasing/updating over the weekend. There are a few more loose ends to tie up, but I should be able to share an update in the next few days.

One potential drawback of this patch is that we speculatively
generate IR which we might have to clean up later. However it seems like
the code required to do so is quite manageable.

There's prior art for that, loop idiom pass has ExpandedValuesCleaner e.g.

Yes, part of the update is to fix SCEVExpander to actually collect all instructions it generates automatically, so they can be cleaned up later. It already does so for some instructions, but is missing a ton of cases.

Without looking at how it would look, i would imagine that can be cleaned by supplying a callback to the builder, that would automatically put each new instruction into some vector.
Let me know if that is the right direction and if it would be helpful for me to look into that (unless you're halfway there of course)

ebrevnov added a comment.EditedJul 21 2020, 11:32 PM

I'm glad to see this discussion alive again. Thanks Roman!

I think the topic deserves a dedicated discussion on dev list in a wider audience because it is not trivial one.

In fact I have an experience of doing similar thing in SLP and it was mainly negative. Things look simple in the beginning and we went ahead and implemented this approach. The problem was that over the time it became hard to manage pre-generated instructions correctly and we started having nasty bugs. Everything is simple until you don't have sharing. If you have references from multiple places (both from inside IR and internal data structures) things become complex and error prone. I would not like to go this direction again until we 1000% sure things will remain simple.

One more thing is concerning me in this particular implementation. I think the way we are trying to "hide" Checks.TmpBlock from all global analysis and data structures is fragile and can be source of bugs in future. In my opinion it would be much easier and cleaner to make pre-generated code runtime dead (by changing guarding condition to be unconditionally false) and let dedicated cleanup passes make their job.

fhahn added a comment.Jul 23 2020, 7:10 AM

I think the topic deserves a dedicated discussion on dev list in a wider audience because it is not trivial one.

Agreed, it would be good to make sure there are no major objections on the general approach. We already have a few cases of 'prior art' in the code base, but this patch might take things a little further than in some other places.

In fact I have an experience of doing similar thing in SLP and it was mainly negative. Things look simple in the beginning and we went ahead and implemented this approach. The problem was that over the time it became hard to manage pre-generated instructions correctly and we started having nasty bugs. Everything is simple until you don't have sharing. If you have references from multiple places (both from inside IR and internal data structures) things become complex and error prone. I would not like to go this direction again until we 1000% sure things will remain simple.

From what I've seen so far, the major concerns is around SCEV expansion. I started some patches solidifying the way we clean up after SCEV expansion (D84327), which is already done in LoopIdiomRecognize. I think we need to properly address this first, before continuing here.

I am not too concerned about interactions with other LV internal data structures, as the runtime checks are pretty independent of vector code generation/legality, as they are outside of the loop and nothing in the vector loop should be aware of the checks.

One more thing is concerning me in this particular implementation. I think the way we are trying to "hide" Checks.TmpBlock from all global analysis and data structures is fragile and can be source of bugs in future. In my opinion it would be much easier and cleaner to make pre-generated code runtime dead (by changing guarding condition to be unconditionally false) and let dedicated cleanup passes make their job.

That's an interesting suggestion and would certainly simplify things. My main concern with leaving cleanup to another pass is that it would mean that LV would always modify functions with loops that need runtime checks, even if the checks are not used. Not sure how big of an issue that would be in terms of compile-time caused by the additional invalidations.

It would be helpful if you could add prerequisite patches are parents of this patch.
What else needs to be done before this?

I approve of the general direction of this patch series :)
Some high-level notes. I'm not that familiar with the code in question,
so it will be best for someone else to review, too.

llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
2649

Should this go into a separate patch? Is this a preexisting problem?

2654–2659

Are these two assertions not identical?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
799

s/they/vectorization/

1623

I'm guessing it can't be i1 true ?

2883–2886
2936–2938
lebedev.ri requested changes to this revision.Aug 12 2020, 2:33 PM

(marking as reviewed)

This revision now requires changes to proceed.Aug 12 2020, 2:33 PM
fhahn added a comment.Oct 7 2020, 11:00 AM

Just in the process of rebasing this again now that most other parts landed and discovered I accidentally made our job harder by teaching SCEVExpander to preserve LCSSA. Needs a bit of more work now.