Page MenuHomePhabricator

[NFC][LoopVectorize] Remove setBestPlan in favour of getBestPlanFor
ClosedPublic

Authored by david-arm on Oct 5 2021, 1:57 AM.

Details

Summary

I have removed LoopVectorizationPlanner::setBestPlan, since this
function is quite aggressive because it deletes all other plans
except the one containing the <VF,UF> pair required. The code is
currently written to assume that all <VF,UF> pairs will live in the
same vplan. This is overly restrictive, since scalable VFs live in
different plans to fixed-width VFS. When we add support for
vectorising epilogue loops when the main loop uses scalable vectors
then we will the vplan for the main loop will be different to the
epilogue.

Instead I have added a new function called

LoopVectorizationPlanner::getBestPlanFor

that returns the best vplan for the <VF,UF> pair requested and leaves
all the vplans untouched. We then pass this best vplan to

LoopVectorizationPlanner::executePlan

This patch was spun off due to comments in D109432.

Diff Detail

Event Timeline

david-arm created this revision.Oct 5 2021, 1:57 AM
david-arm requested review of this revision.Oct 5 2021, 1:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 5 2021, 1:57 AM

Seems like a reasonable idea to me, but just pointing out that the lifetime of unused plans will be extended with this patch. Could we delete unused plans, eg by calling something like LVP.deleteVPlansExcept(BestPlan2) right before LVP.executePlan(...,) on line 10435?

bmahjour added inline comments.Oct 5 2021, 10:00 AM
llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
293

could you make this function const?

Matt added a subscriber: Matt.Oct 5 2021, 12:40 PM

Seems like a reasonable idea to me, but just pointing out that the lifetime of unused plans will be extended with this patch. Could we delete unused plans, eg by calling something like LVP.deleteVPlansExcept(BestPlan2) right before LVP.executePlan(...,) on line 10435?

Hi @bmahjour, I think that's basically what I was doing in D109432, where I backed up a copy of the plans before deleting them. @sdesmalen suggested I could take a different approach with this patch instead. The problem is that once the plans are deleted I can't get them back again, and the scalable VFs and fixed-width VFs live in different plans. This meant that if I deleted the plans when emitting the main vector loop, I then had to find a way of reinstating the plans so I could then generate the epilogue vector loop.

david-arm updated this revision to Diff 377519.Oct 6 2021, 6:05 AM
  • Added const keywords.
david-arm marked an inline comment as done.Oct 6 2021, 6:05 AM

Seems like a reasonable idea to me, but just pointing out that the lifetime of unused plans will be extended with this patch. Could we delete unused plans, eg by calling something like LVP.deleteVPlansExcept(BestPlan2) right before LVP.executePlan(...,) on line 10435?

Hi @bmahjour, I think that's basically what I was doing in D109432, where I backed up a copy of the plans before deleting them. @sdesmalen suggested I could take a different approach with this patch instead. The problem is that once the plans are deleted I can't get them back again, and the scalable VFs and fixed-width VFs live in different plans. This meant that if I deleted the plans when emitting the main vector loop, I then had to find a way of reinstating the plans so I could then generate the epilogue vector loop.

Yes, but say the plan for the main loop (scalable) is P1 and the plan for the epilogue (fixed-width) is P2. Assuming P1 != P2, once P1 is executed, we don't need it anymore, so we can delete it right before executing P2. Similarly for the path where epilogue is not vectorized, we could delete all other plans before executing the "best plan". Am I misunderstanding something?

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
293

sorry I meant making the member function constant, but the return type would be good to be const as well (ie const VPlanPtr &getBestPlanFor(ElementCount VF, unsigned UF) const;)

david-arm updated this revision to Diff 377769.Oct 7 2021, 1:58 AM
  • Added 'const' keyword to function, which required changing how we set the BestVF and BestUF.

Hi @bmahjour, I'll have a look into ways of removing some of the unneeded vplans too!

Hi @bmahjour, I'll have a look into ways of removing some of the unneeded vplans too!

Thanks @david-arm!

david-arm updated this revision to Diff 378178.Oct 8 2021, 5:18 AM
  • Ensured we remove all unneeded vplans separately.
david-arm marked an inline comment as done.Oct 8 2021, 5:18 AM
bmahjour added inline comments.Oct 8 2021, 8:19 AM
llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
315

Instead of passing the list of VFs, why not pass in a VPlanPtr objects that's returned by getBestPlanFor? That would simplify the interface (since you don't have to pass in a list), simplifies the logic (since it makes it clear that we only delete plans that are not considered "best"), and avoids iterating through the list of VFs and VPlans a second time.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8200

I'd suggest moving this debug line to the beginning of executePlan and keep outputting the interleave factor (BestUF).

david-arm added inline comments.Oct 8 2021, 8:24 AM
llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
315

I could do that, but would have to change this immediately in D109432 to accept a list of vplans, since in that patch different VFs will be in different plans. I would like to keep this as a list if possible. If you still think it's better to pass in a list of VPlanPtrs here instead of a list of VFs I can see if that makes the code simpler?

bmahjour added inline comments.Oct 8 2021, 8:40 AM
llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
315

Ok, fair enough....I guess you could make it a list in this patch and pass a list of size one (just like you're doing now), but I still think that the type of the list element should be VPlanPtr.

david-arm added inline comments.Oct 11 2021, 1:02 AM
llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
315

Hi @bmahjour, there is just one problem with your approach - the VPlanPtr is defined as std::unique_ptr<VPlan>, which means that I can only return a reference from getBestPlanFor. In order to pass a list of VPlanPtrs to removePlansExcept I have to first call getBestPlanFor to get VPlanPtr references. However, those references become invalid after calling removePlansExcept! That means that in effect I have to perform this call sequence:

const VPlanPtr &PB1 = getBestPlanFor();
const VPlanPtr &PB2 = getBestPlanFor();
removePlansExcept({PB1, PB2});
const VPlanPtr &P1 = getBestPlanFor();
const VPlanPtr &P2 = getBestPlanFor();

This was the actual reason I passed in a list of VFs here instead. In order to pass in a list of VPlanPtrs I can think of three choices:

  1. We change VPlanPtr to be simply "typedef VPlan *VPlanPtr;" and stop making it unique. This allows the VPlanPtr to be copied. Then I change getBestPlanFor to return const VPlanPtr instead of a reference, which allows me to pass a list of VPlanPtrs to removePlansExcept.
  2. We don't remove the plans at all.
  3. We stick with passing a list of VFs to removePlansExcept.

I'm also happy to listen to alternative suggestions! However, at the moment I can't really move forward with this patch.

(Sorry for the slow reply, I was traveling last week)

My suggestion on the other patch was to make the code structurally simpler. Removing plans prematurely complicates things, because you need to carefully keep track which plans can be removed and which ones may be needed.
@bmahjour Is there a real problem with memory-usage that requires us to reduce the lifetime of these plans a little bit longer? If not, I'd like to suggest not removing the plans to keep the code simpler.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8215

nit: odd spaces?

david-arm added inline comments.Oct 12 2021, 12:31 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8215

Yeah, clang-format does that, which surprised me too. I can try to remove the spaces, but I have a feeling I will then be told to fix up formatting. :)

(Sorry for the slow reply, I was traveling last week)

My suggestion on the other patch was to make the code structurally simpler. Removing plans prematurely complicates things, because you need to carefully keep track which plans can be removed and which ones may be needed.
@bmahjour Is there a real problem with memory-usage that requires us to reduce the lifetime of these plans a little bit longer? If not, I'd like to suggest not removing the plans to keep the code simpler.

I do not know of a "real problem", but would expect loops with large bodies would benefit from some level of garbage collection. I've proposed to delete plans at specific static locations in the code when we know we don't need them, so should not need extra book keeping. I think we are very close to a solution that deletes unused plans, but if there are further complications or controversies I'm ok with dealing with the garbage collection aspect as a separate patch.

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
315

Hi @david-arm , sorry I was away last week. Yes, the references would indeed be invalidated as the container elements are moved around. However, the problem can be avoided by making getBestPlanFor and executePlan work with raw pointers. These functions simply use the passed vplan and do not alter it nor do they care about its ownership, so no need to pass them a smart pointer. Here's a patch I tried on top of yours that works fine:

Hi @bmahjour, thanks a lot for sharing a patch to change the interfaces!

To address @sdesmalen's point that removePlansExcept introduces complexity to reduce the lifetime of vplans when their short lifetime may not have a noticable impact on memory usage, I did an experiment where I built the LLVM test suite with two versions of this patch: 1) Patch A that keeps all the vplans, 2) Patch B that removes the unneeded vplans. I recorded the memory usage of each compilation using /usr/bin/time -v and extracted the maximum resident memory each time. I then took the average of all these values to determine the average maximum resident memory usage. I found practically no difference in peak memory consumption between Patch A and B.

One benefit from keeping all vplans until after code-generation is that we avoid having to structure the code in a certain way (i.e. having to decide on all relevant vplans first, before removing the unnecessary ones and doing code-gen for the selected plans) and the code will look more natural. I also wondered if there would ever be a situation in future where we might need three or more plans? I realise it's a bit of a stretch, but I could imagine more layers, i.e. a main vector loop, a vector epilogue, and a predicated vector tail.

Anyway, any thoughts you have are appreciated!

Thanks for doing the experiment and collecting data on test-suite.

I don't feel strongly about this, but given that we've already spent a fair bit of effort discussing and implementing a solution, I slightly prefer to have a way to claim unused memory so we stay closer to the current state of the vectorizer, and be a bit more future-proof. I worry that vplans may get heavier or more numerous and that the test-suite may not be representative of the larger applications compiled with LLVM. That's my last two cents. I'll leave it up to you and/or other reviewers to make the final call.

david-arm updated this revision to Diff 381972.Oct 25 2021, 6:34 AM
  • Removed the removePlansExcept function.
  • Changed getBestPlanFor to just return a raw "VPlan *" pointer.
david-arm marked 5 inline comments as done.Oct 25 2021, 6:35 AM

Hi @bmahjour, I hope you don't mind but I've gone with the simpler patch for now, which doesn't involve removing the vplans. However, if this becomes a problem in future or some buildbots flag up issues I'm happy to revisit this!

bmahjour accepted this revision.Oct 25 2021, 10:17 AM

That's fine. LGTM.

This revision is now accepted and ready to land.Oct 25 2021, 10:17 AM
fhahn added inline comments.Oct 26 2021, 12:37 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8192–8196

can we instead return a reference?

8209

This seems a bit inaccurate now. Maybe it would be better to say Executing best plan with VF, UF or something like that?

david-arm updated this revision to Diff 382245.Oct 26 2021, 3:05 AM
  • Changed getBestPlanFor to return a reference.
  • Updated the debug message in executePlan, which also required changing a test.
david-arm marked 2 inline comments as done.Oct 26 2021, 3:05 AM
sdesmalen accepted this revision.Oct 26 2021, 8:37 AM

Cheers @david-arm, LGTM!

fhahn accepted this revision.Oct 26 2021, 2:45 PM

Thanks, LGTM

This revision was landed with ongoing or failed builds.Oct 27 2021, 1:38 AM
This revision was automatically updated to reflect the committed changes.