This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Add VPCanonicalIVRecipe, partly retire createInductionVariable.
ClosedPublic

Authored by fhahn on Nov 4 2021, 3:55 PM.

Details

Summary

At the moment, the primary induction variable for the vector loop is
created as part of the skeleton creation. This is tied to creating the
vector loop latch outside of VPlan. This prevents from modeling the
*whole* vector loop in VPlan, which in turn is required to model
preheader and exit blocks in VPlan as well.

This patch introduces a new recipe VPCanonicalIVRecipe to represent the
primary IV in VPlan and InductionIncrement{NUW} opcodes for
VPInstruction to model the increment.

This allows us to partly retire createInductionVariable. At the moment,
a bit of patching up is done after executing all blocks in the plan.

This is still WIP as there are a couple of test failures left to fix,
but I'd like to share it so the general direction can be reviewed.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Ayal added inline comments.Dec 25 2021, 3:21 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1102–1103

Same createStepForVF() function now implemented also in VPlan.cpp?

2439

Use ScalarIV instead of PrimInd above, similar to resetting ScalarIV below?

9214

addInduction >> addCanonicalIV? (starting from zero by default; and add its increment)

Pass a DebugLoc instead of DLInst, as it is passed to recipe constructor?

9229

In both cases PrimaryInd (CanonicalIV?) should be inserted as the first recipe of Header?

9231

Is the cast needed?

9240

is the cast needed?

llvm/lib/Transforms/Vectorize/VPlan.cpp
671

Which VPInstruction uses this opcode and should be ignored here?

736

early-break if Part != 0

742

Only user is Phi, can hook-up back to it now?

813

Should all invariant values that are generated in the pre-header be modelled as VPValues similar to BackedgeTakenCount, until the preheader is modelled as VPBB with recipes? I.e., treat VectorTripCount, [Canonical]IVStartValue and "CanonicalIVStepValue" (returned by createStepForVF()) similar to VPlan's BackedgeTakenCount, which is hooked-up to its State->TripCount IR Value here?

952

and canonical IV phis

959

Start the loop with the early-continue?

auto *PhiR = dyn_cast<VPWidenPHIRecipe>(&R);
if (!PhiR || !(isa<VPCanonicalIVRecipe>(&R) ||
               isa<VPFirstOrderRecurrencePHIRecipe>(&R) ||
               isa<VPReductionPHIRecipe>(&R)))
  continue;
963

The name was already set to "index" initially when Ind executed and generated P?

965

Fold addIncoming into that of FOR and Reductions below? For Canonical IV's, SinglePartNeeded should be true, but Val needs to get Part (i.e., 0, rather than UF-1).

970

Would it be simpler to have a recipe that takes care of generating the canonical IV increment, the compare, and the branch? Rather than have Increment[NUW] VPInstructions take care of the former only, have the compare generated here at stage 3, and have the branch generated by createLatchTerminator() at the outset when building the skeleton?
I.e., have VPCanonicalIVRecipe be the first recipe in a (non-replicating) region, and say VPBranchOnCountRecipe be the last - analogous to BCT instructions and complementing VPBranchOnMaskRecipe?
VPCanonicalIVRecipe can alternatively be named VPLoopCountPhiRecipe.

1329

Store the Phi value per-part 0, rather than per-lane 0? That's how uniform (UAV) scalars are stored, and also how the induction increment is stored. Admittedly there should be a 'once/singleton' option in addition to per-lane and per-part.

llvm/lib/Transforms/Vectorize/VPlan.h
354

Document, place next to TripCount?

1614

Document

Relationship between VPCanonicalIVRecipe and VPWidenCanonicalIVRecipe? This one is the VP[Scalar]CanonicalIVRecipe?

1629

"a canonical [vector] induction"

with what?

2312

Start by calling getVectorLoopRegion()?

Associate getCanonicalIV() with a (non replicating) Region, instead of 'entire' VPlan?
Cache a pointer to VPCanonicalIVRecipe in Region?
(IV's of replicating regions are eliminated by complete unrolling.)

fhahn updated this revision to Diff 396229.Dec 26 2021, 10:58 AM
fhahn marked 29 inline comments as done.

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
586–587

I added a doc-comment.

I think it is still needed for now, as `widenIntOrFpInduction sets the incoming value for vector phis, and for that the backedge needs to be already set up.

833

I updated the code so it is not stored in ILV directly. the skeleton creation methods return it instead, so it is more local.

1102–1103

Updated to be shared.

1173–1174

I'll do that in a separate patch.

2348–2349

Yes, that should be doable once we have separate increment recipes for vector inductions,

2439

update, thanks

3094–3095

Above comment is obsolete.

removed.

Create the latch terminator later along with its compare? Seems more like "replace" latch terminator than "create" it?

Unfortunately widenIntOrFpInduction still creates and sets the incoming values for vector inductions. To set the incoming value from the backedge for a phi, the latch already needs to be connected to the header.

3720

moved to VPlan.cpp

3721–3722

nope, removed

3722–3723

sunk to createInductionResumeValues

8220

IV = BestPlan->getCanonicalIV();?

Updated to use that.

This seems to be better placed at the end of stage 1, than beginning of stage 2? Or stage -1 of VPlan::execute()

I moved it to the the end of stage 1, with a comment saying why and when we need to replace the start value.

Slightly better to first wrap IVStartValue with VPV and add it as an externalDef, and then look up IV to finally replace its first operand with VPV?

reordered!

9214

addInduction >> addCanonicalIV? (starting from zero by default; and add its increment)

renamed & added a comment

Pass a DebugLoc instead of DLInst, as it is passed to recipe constructor?

done!

9229

Yes, but unfortunately in the native path it looks like there's a pre-header block already. Perhaps we should make sure the native path also wraps the vector loop in a region.

9231

removed

9240

No, also dropped the variable.

llvm/lib/Transforms/Vectorize/VPlan.cpp
671

looks like a leftover from an earlier iteration, removed

742

I'm not sure, modifying an instruction created by a different recipe here seems like a bit of a stretch.

813

I am not sure if it is worth it at the moment, as long as they are only used by a single recipe. Specifically, the start value is mostly a constant (0), the step is currently only used by the induction increment and the vector trip count is currently only used in post-processing. VectorTripCount is moved to a dedicated VPValue in the patch that introduces recipes for the exit condition. The step value could be moved out as follow up. WDYT?

952

updated the comment

963

Yes, it can be removed.

965

Updated the code as suggested, but a few more knobs are needed.

1329

Done!

fhahn added inline comments.Dec 26 2021, 11:39 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9229

I looked into this and wrapping the loops in the native path into regions is a bit more tricky, because of the explicit edge from latch to header; this means the header cannot simply be used as entry block to the loop region. Once we have branch-exit instructions, we should be able to address this.

llvm/lib/Transforms/Vectorize/VPlan.h
1629

reworded the comment and dropped the with part.

Ayal added inline comments.Dec 26 2021, 2:14 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3094–3095

Create the latch terminator later along with its compare? Seems more like "replace" latch terminator than "create" it?

Unfortunately widenIntOrFpInduction still creates and sets the incoming values for vector inductions. To set the incoming value from the backedge for a phi, the latch already needs to be connected to the header.

ok, and D113183 is where widenIntOrFpInduction will stop doing that, and join other header phi's which hook-up to their backedge values at the end of VPlan::execute.

9221

PrimaryInd - let's settle for CanonicalIV throughout?

9229

Sorry, didn't mean to modify the native path, just wondering if something like this would work:

if (IsVPlanNative)
  Header = Header->getSingleSuccessor();
CanonicalIV->insertBefore(&*cast<VPBasicBlock>(Header)->begin());
llvm/lib/Transforms/Vectorize/VPlan.cpp
961

Have VPCanonicalIVRecipe also inherit from VPWidenPHIRecipe, possibly renaming the latter VPHeaderPHIRecipe, so that it too could benefit from getBackedgeValue()?

978

Introducing the ICmpEQ feeding TermBr should ideally be done by an appropriate (BCT) recipe in the normal course of VPlan::execute() stage 2 above, rather than during this backedge-rewiring stage 3.

987

This hopefully could fold into

// Is a single part fed across the backedge, or all UF parts.
IsSinglePart = isa<VPFirstOrderRecurrencePHIRecipe>(&R) ||
               isa<VPCanonicalIVRecipe>(&R) ||
               cast<VPReductionPHIRecipe>(&R)->isOrdered();
SinglePart = isa<VPCanonicalIVRecipe>(&R) ? 0 : State->UF-1;
NumPartsForNewPhi = IsSinglePart ? 1 : State->UF;
BackedgeValue = PhiR->getBackedgeValue();

Note that if a single part is fed across the backedge, it must be the last part UF-1 for ordered reductions and FOR, but only the first part 0 is set for the Canonical IV backedge value (it could also set part UF-1...).

992

Storing per-part 0 rather than per-lane 0 and common base class should lead to a common State->get(PhiR, Part) for all, so the loop becomes

for (unsigned Part = 0; Part < NumPartsForNewPhi; ++Part) {
  Value *Phi = State->get(PhiR, Part);
  Value *Val = State->get(BackedgeValue, IsSinglePart ? SinglePart : Part);
  cast<PHINode>(Phi)->addIncoming(Val, VectorLatchBB);
}
1329

State.set(this, EntryPart, Part);
and elsewhere?

llvm/lib/Transforms/Vectorize/VPlan.h
1633

"vector" still needs to be dropped too.

fhahn updated this revision to Diff 396289.Dec 27 2021, 4:13 AM
fhahn marked 7 inline comments as done.

Address latest comments, thanks!

fhahn marked an inline comment as done.Dec 27 2021, 4:14 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9221

Thanks, I missed this instance. Updated!

9229

Ah I see, thanks! Updated!

llvm/lib/Transforms/Vectorize/VPlan.cpp
961

I think just renaming VPWidenPHIRecipe may not be the best way forward, as the VPCanonicalIVPRecipe will stay scalar. I put up D116304 that moves the common code to a new abstract base class VPHeaderPHIRecipe.

I also renamed the recipe to VPCanonicalIVPHIRecipe, so it fits with the other phi recipes.

978

I added a TODO and this should be addressed soon in a follow-up patch. Or should those changes also be pulled in here?

987

Updated, thanks!

992

Storing a scalar value in the 'vector' part of State seems not ideal, but it does indeed simplify things a lot here. Updated

1329

Updated

Ayal added inline comments.Dec 27 2021, 10:15 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1102–1103

(More formally, "getOrCreateStepForVF()", referring to scalable or non-scalable. Indep. of this patch.)

3088

Initialize B to Latch->getTerminator(), instead of setting it there below after initializing it to first insertion point of Header?

4695

State.get(IVR, 0), per-part 0?

8208

"... changed [from zero] to ..."

9216

"addCanonicalIV" >> "addCanonicalIVRecipe"?

9234

Can do TopRegion->getExitBasicBlock() directly, similar to TopRegion->getEntryBasicBlock() above.

llvm/lib/Transforms/Vectorize/VPlan.cpp
813

I am not sure if it is worth it at the moment, as long as they are only used by a single recipe. Specifically, the start value is mostly a constant (0), the step is currently only used by the induction increment and the vector trip count is currently only used in post-processing. VectorTripCount is moved to a dedicated VPValue in the patch that introduces recipes for the exit condition. The step value could be moved out as follow up. WDYT?

Perhaps instead of passing everything via one VPTransformState into VPlan::execute(), it would be better to separate and first have VPlan initialize all its live-in/pre-header VPValues, and then dedicate VPTransformState to hold only general context required during code-gen?

978

I added a TODO and this should be addressed soon in a follow-up patch. Or should those changes also be pulled in here?

As you prefer. A BranchOnCount recipe could potentially take care of the Increment[NUW]-feeding-the-ICmpEQ-feeding-the-branch better, while defining the single VPValue to feed back the CanonicalIVPHI. It could be introduced here instead of Increment[NUW] et al., or replace them in the TODO follow-up patch.

992

That is how uniform (/UAV) scalar values are stored in general, namely, in the 'vector' part of State.
The scalar canonical IV is conceptually UAV, only its lane 0 is used.

1324

getOperand(0) >> getStartValue()?

llvm/lib/Transforms/Vectorize/VPlan.h
1114

Keep above in Lex order?

2312

Can add to VPlanVerifier a check that first recipe only is VPCanonicalIVPHIRecipe?

fhahn updated this revision to Diff 396332.Dec 27 2021, 1:11 PM
fhahn marked 13 inline comments as done.

Address latest round of comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3088

Updated, thanks!

4695

Adjusted!

8208

Thanks, I added the extra part.

9234

Thanks, adjusted!

llvm/lib/Transforms/Vectorize/VPlan.cpp
813

That's a good point! I put up D116320 and then added a VPValue for the VectorTripCount in this patch

992

One issue with that representation is that it is not possible to get a vector with the scalar splatted I think. Not an issue at the moment here though.

1324

updated, thanks!

llvm/lib/Transforms/Vectorize/VPlan.h
1114

moved WidenPHI to the back.

2312

Sounds good, added!

fhahn updated this revision to Diff 396405.Dec 28 2021, 9:13 AM

Update & rebase after landing D116320 & D116304, move CanonicalIVStartValue handling to prepareToExecute.

Ayal added inline comments.Dec 29 2021, 12:50 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3086

If we are sure no latch exists yet as we're just creating this loop, update comment and assert(!L->getLoopLatch())?

3717–3723

Perhaps place the call to createLatchTerminator(Lp) above the setting of OldInduction, as the latter is not used by the former, whereas createInductionResumeValues() below does use OldInduction. Perhaps OldInduction can be eliminated altogether, following [New]Induction?

4694

Plan is used only here, right? So can fold into
auto *IVR = PhiR->getParent()->getPlan()->getCanonicalIV();

9216

Ahh, on second thought, this method adds two recipes (one being a VPInstruction, but that may change..), one placed first in the header and the other last in the latch. How about "addCanonicalIVRecipes()"?

9730

State.get per-part 0?

llvm/lib/Transforms/Vectorize/VPlan.cpp
745

If Step is fixed to be VF*UF rather than support arbitrary inductions, a more accurate name would be CanonicalIVIncrement[NUW]?

Potentially related to https://reviews.llvm.org/D116123#inline-1111668

978

If the ICmpEQ is to be generated here/now connected to InductionIncrement and branch, suggest to do so separately rather than inside a loop taking care of rewiring header phis. The InductionIncrement can be located directly at the end of the Exit, analogous to getCanonicalIV(), conceptually as part of a terminating inc-cmp-br idiom.

1342

State.get per-part 0?

llvm/lib/Transforms/Vectorize/VPlan.h
2163

Hold VPValue VectorTripCount; as an instance rather than a pointer?

2219

Can return a reference, is never null.

2320

can drop phis().

fhahn updated this revision to Diff 396615.Dec 29 2021, 11:05 PM
fhahn marked 12 inline comments as done.

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3086

I removed the variable, added the assert and also renamed the function

3717–3723

Removed.

4694

updated, thanks

9216

Renamed and extended the comment to explicitly mention the increment VPInstruction.

9730

Updated, thanks!

llvm/lib/Transforms/Vectorize/VPlan.cpp
745

Updated to the more descriptive CanonicalIVIncrement

978

Moved outside the loop.

llvm/lib/Transforms/Vectorize/VPlan.h
2163

That makes things a bit simpler, updated!

2219

Adjusted!

2320

Removed!

Ayal added inline comments.Dec 30 2021, 1:10 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
587

More accurately, something like:
/// Introduce a conditional branch (on true, condition to be set later) at the end of the header=latch connecting it to itself (across the backedge) and to the exit block of \p L.

9216

addCanonicalIVPHIRecipes >> addCanonicalIVRecipes

... to the header block.

The phi is added to the header block, the increment is added to the latch block.

llvm/lib/Transforms/Vectorize/VPlan.cpp
793

How about printing something like " VF*UF + " instead?

961

if (isa_and_nonnull<VPWidenPHIRecipe>(PhiR)) ?

Use PhiR instead of &R below?

982

call getCanonicalIV(), or call getCanonicalIVIncrement() which should take care of looking up the backedge value of the canonical IV - or retrieving the last recipe in the latch directly?

llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
178 ↗(On Diff #396615)

Verify that the last recipe in Exit is a CanonicalIVIncrement VPInstruction?

fhahn updated this revision to Diff 396772.Dec 31 2021, 4:02 AM
fhahn marked 6 inline comments as done.

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
587

Adjusted wording, thanks!

9216

Adjusted wording ,thanks!

llvm/lib/Transforms/Vectorize/VPlan.cpp
793

That is much better, updated!

961

There are some phi-like recipes that do not inherit from VPHeaderPHIRecipe, like VPWidenIntOrFpInductionRecipe. I think it would be better to adjust this separately, once they *only* model the phi.

I updated the code the explicitly skip unwanted recipes and then use cast<VPHeaderPHIRecipe>.

Use PhiR instead of &R below?

Done!

982

updated to use getCanonicalIV.

llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
178 ↗(On Diff #396615)

I am not sure this is something we should enforce in the verifier, as it is easy to retrieve from the canonical IV.

Ayal added inline comments.Jan 1 2022, 1:40 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9217

addCanonicalIVPHIRecipes >> addCanonicalIVRecipes (one is a PHI, the other is not)

llvm/lib/Transforms/Vectorize/VPlan.cpp
736

Ah, now it looks more logical to first handle Part 0; and possibly fuse the last State.set & break.

961
if (isa<VPWidenIntOrFpInductionRecipe>(&R) || isa<VPWidenPHIRecipe>(&R) ||
   isa<VPWidenCanonicalIVRecipe>(&R))
  continue;

Reasoning behind this early-exit:
createVectorIntOrFpInductionPHI()/widenPHIInstruction() take care of generating LastInduction/InductionGEP and rewiring them back to the phi during VPWidenIntOrFpInductionRecipe/VPWidenPHIRecipe::execute, respectively. Once these values are recipe'd, they too c/should be rewired here instead (TODO).
VPWidenCanonicalIVRecipe depends on the phi of VPCanonicalIVRecipe (which is rewired below) rather than having its own phi. It represents an original IR phi that turns into a non-IR instruction when vectorized, similar to VPBlendRecipe, which should appear right after "real" phi recipes. Should they be placed outside VPBB->phis()? Should both be supported, e.g., VPBB->real-phis() and VPBB->aliased-phis()?

965

"// For first-order recurrences [, canonical IV] and in-order reduction phis ..."

"Otherwise" - for non ordered reductions,

992

It should be possible to obtain any vector, stored per-part; any uniform scalar, stored per-part; and any non-uniform scalar, stored per-lane.

llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
178 ↗(On Diff #396615)

That's fine. Conceptually, the phi of the canonical IV can be placed anywhere among the header phis, although it's good to canonicalize its position to be first. OTOH, the inc-cmp-br of the canonical IV must be placed at the end of the latch, so it would be more robust to rely on it and its user(s) instead.

fhahn updated this revision to Diff 396852.Jan 1 2022, 4:18 AM
fhahn marked 3 inline comments as done.

Address latest comments, thanks!

fhahn marked an inline comment as done.Jan 1 2022, 4:20 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9217

updated, thanks

llvm/lib/Transforms/Vectorize/VPlan.cpp
961

Reasoning behind this early-exit:
createVectorIntOrFpInductionPHI()/widenPHIInstruction() take care of generating LastInduction/InductionGEP and rewiring them back to the phi during VPWidenIntOrFpInductionRecipe/VPWidenPHIRecipe::execute, respectively. Once these values are recipe'd, they too c/should be rewired here instead (TODO).

Exactly, I added. a comment.

VPWidenCanonicalIVRecipe depends on the phi of VPCanonicalIVRecipe (which is rewired below) rather than having its own phi. It represents an original IR phi that turns into a non-IR instruction when vectorized, similar to VPBlendRecipe, which should appear right after "real" phi recipes. Should they be placed outside VPBB->phis()? Should both be supported, e.g., VPBB->real-phis() and VPBB->aliased-phis()?

VPWidenCanonicalIVRecipe should probably be placed outside the phis section, as it doesn't expand to phi instructions. I put up D116473. I don't think there's a strong need for more involved accessors like real-phis and aliased-phis.

965

Updated, thanks!

llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
178 ↗(On Diff #396615)

OTOH, the inc-cmp-br of the canonical IV must be placed at the end of the latch, so it would be more robust to rely on it and its user(s) instead.

Sounds good, we can adjust it once this is modeled with recipes.

fhahn marked an inline comment as done.Jan 1 2022, 12:43 PM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/VPlan.cpp
978

I put up D116479 to add a cmp-and-branch VPInstruction opcode as a follow-up.

fhahn updated this revision to Diff 396924.Jan 2 2022, 4:53 AM

Rebase after landing b1a333f0feb8.

Ayal added inline comments.Jan 2 2022, 9:34 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2436

Slightly more consistent to check ScalarIV->getType() != IV->getType(), given the cast from one to the other below.

May be good to assign auto NeededType = IV->getType() ?

9222

"CanonicalIV" >> "CanonicalIVPhi", to complement "CanonicalIVIncrement".

9732

getVPValue(0) >> getVPSingleValue()?

llvm/lib/Transforms/Vectorize/VPlan.cpp
965

VectorTripCount is always used by the compare of branch-on-count, and by it only. So its VPValue is always generated but will have a user only once the compare (possibly w/ its increment and branch) is also represented as a recipe/VPInstruction. For now can assert it has no users; code introduced otherwise is dead.

fhahn updated this revision to Diff 396937.Jan 2 2022, 10:46 AM

Address latest comments, thanks!

fhahn marked 4 inline comments as done.Jan 2 2022, 10:49 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2436

I added the new variable, thanks!

9222

Adjusted!

9732

There's no need to use a getVP*Value accessor, this can passed directly. Updated!

llvm/lib/Transforms/Vectorize/VPlan.cpp
965

Good point, I replaced the printing with assert, will move the printing to the patch that uses it explicitly.

Ayal accepted this revision.Jan 4 2022, 2:23 PM

This looks fine, thanks!!
Clarifying a previous suggestion.

llvm/lib/Transforms/Vectorize/VPlan.cpp
736

Ah, now it looks more logical to first handle Part 0; and possibly fuse the last State.set & break.

Above thought was something like:

case VPInstruction::CanonicalIVIncrement:
case VPInstruction::CanonicalIVIncrementNUW: {
  Value *Next = nullptr;
  if (Part == 0) {
    bool IsNUW = getOpcode() == VPInstruction::CanonicalIVIncrementNUW;
    auto *Phi = State.get(getOperand(0), 0);
    // The loop step is equal to the vectorization factor (num of SIMD elements)
    // times the unroll factor (num of SIMD instructions).
    Value *Step = createStepForVF(Builder, Phi->getType(), State.VF, State.UF);
    Next = Builder.CreateAdd(Phi, Step, "index.next", IsNUW, false);
  } else
    Next = State.get(this, 0);

  State.set(this, Next, Part);
  break;
}
This revision is now accepted and ready to land.Jan 4 2022, 2:23 PM
fhahn updated this revision to Diff 397488.Jan 5 2022, 2:36 AM
fhahn marked 4 inline comments as done.

Thanks for all the comments! Rebased and applied suggestion. I will land this soon.

This revision was landed with ongoing or failed builds.Jan 5 2022, 2:46 AM
This revision was automatically updated to reflect the committed changes.