This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Add VPCanonicalIVRecipe, partly retire createInductionVariable.
ClosedPublic

Authored by fhahn on Nov 4 2021, 3:55 PM.

Details

Summary

At the moment, the primary induction variable for the vector loop is
created as part of the skeleton creation. This is tied to creating the
vector loop latch outside of VPlan. This prevents from modeling the
*whole* vector loop in VPlan, which in turn is required to model
preheader and exit blocks in VPlan as well.

This patch introduces a new recipe VPCanonicalIVRecipe to represent the
primary IV in VPlan and InductionIncrement{NUW} opcodes for
VPInstruction to model the increment.

This allows us to partly retire createInductionVariable. At the moment,
a bit of patching up is done after executing all blocks in the plan.

This is still WIP as there are a couple of test failures left to fix,
but I'd like to share it so the general direction can be reviewed.

Diff Detail

Unit TestsFailed

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Ayal added inline comments.Dec 25 2021, 3:21 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1075–1076

Same createStepForVF() function now implemented also in VPlan.cpp?

2477

Use ScalarIV instead of PrimInd above, similar to resetting ScalarIV below?

8931

addInduction >> addCanonicalIV? (starting from zero by default; and add its increment)

Pass a DebugLoc instead of DLInst, as it is passed to recipe constructor?

8946

In both cases PrimaryInd (CanonicalIV?) should be inserted as the first recipe of Header?

8948

Is the cast needed?

8957

is the cast needed?

llvm/lib/Transforms/Vectorize/VPlan.cpp
655

Which VPInstruction uses this opcode and should be ignored here?

717

early-break if Part != 0

723

Only user is Phi, can hook-up back to it now?

814

Should all invariant values that are generated in the pre-header be modelled as VPValues similar to BackedgeTakenCount, until the preheader is modelled as VPBB with recipes? I.e., treat VectorTripCount, [Canonical]IVStartValue and "CanonicalIVStepValue" (returned by createStepForVF()) similar to VPlan's BackedgeTakenCount, which is hooked-up to its State->TripCount IR Value here?

893

and canonical IV phis

900

Start the loop with the early-continue?

auto *PhiR = dyn_cast<VPWidenPHIRecipe>(&R);
if (!PhiR || !(isa<VPCanonicalIVRecipe>(&R) ||
               isa<VPFirstOrderRecurrencePHIRecipe>(&R) ||
               isa<VPReductionPHIRecipe>(&R)))
  continue;
904

The name was already set to "index" initially when Ind executed and generated P?

906

Fold addIncoming into that of FOR and Reductions below? For Canonical IV's, SinglePartNeeded should be true, but Val needs to get Part (i.e., 0, rather than UF-1).

911

Would it be simpler to have a recipe that takes care of generating the canonical IV increment, the compare, and the branch? Rather than have Increment[NUW] VPInstructions take care of the former only, have the compare generated here at stage 3, and have the branch generated by createLatchTerminator() at the outset when building the skeleton?
I.e., have VPCanonicalIVRecipe be the first recipe in a (non-replicating) region, and say VPBranchOnCountRecipe be the last - analogous to BCT instructions and complementing VPBranchOnMaskRecipe?
VPCanonicalIVRecipe can alternatively be named VPLoopCountPhiRecipe.

1319

Store the Phi value per-part 0, rather than per-lane 0? That's how uniform (UAV) scalars are stored, and also how the induction increment is stored. Admittedly there should be a 'once/singleton' option in addition to per-lane and per-part.

llvm/lib/Transforms/Vectorize/VPlan.h
356

Document, place next to TripCount?

1609

Document

Relationship between VPCanonicalIVRecipe and VPWidenCanonicalIVRecipe? This one is the VP[Scalar]CanonicalIVRecipe?

1624

"a canonical [vector] induction"

with what?

2305

Start by calling getVectorLoopRegion()?

Associate getCanonicalIV() with a (non replicating) Region, instead of 'entire' VPlan?
Cache a pointer to VPCanonicalIVRecipe in Region?
(IV's of replicating regions are eliminated by complete unrolling.)

fhahn updated this revision to Diff 396229.Dec 26 2021, 10:58 AM
fhahn marked 29 inline comments as done.

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
577–578

I added a doc-comment.

I think it is still needed for now, as `widenIntOrFpInduction sets the incoming value for vector phis, and for that the backedge needs to be already set up.

800

I updated the code so it is not stored in ILV directly. the skeleton creation methods return it instead, so it is more local.

1075–1076

Updated to be shared.

1167

I'll do that in a separate patch.

2414–2415

Yes, that should be doable once we have separate increment recipes for vector inductions,

2477

update, thanks

3003–3004

Above comment is obsolete.

removed.

Create the latch terminator later along with its compare? Seems more like "replace" latch terminator than "create" it?

Unfortunately widenIntOrFpInduction still creates and sets the incoming values for vector inductions. To set the incoming value from the backedge for a phi, the latch already needs to be connected to the header.

3620–3621

sunk to createInductionResumeValues

3620–3621

nope, removed

3620–3621

moved to VPlan.cpp

7947

IV = BestPlan->getCanonicalIV();?

Updated to use that.

This seems to be better placed at the end of stage 1, than beginning of stage 2? Or stage -1 of VPlan::execute()

I moved it to the the end of stage 1, with a comment saying why and when we need to replace the start value.

Slightly better to first wrap IVStartValue with VPV and add it as an externalDef, and then look up IV to finally replace its first operand with VPV?

reordered!

8931

addInduction >> addCanonicalIV? (starting from zero by default; and add its increment)

renamed & added a comment

Pass a DebugLoc instead of DLInst, as it is passed to recipe constructor?

done!

8946

Yes, but unfortunately in the native path it looks like there's a pre-header block already. Perhaps we should make sure the native path also wraps the vector loop in a region.

8948

removed

8957

No, also dropped the variable.

llvm/lib/Transforms/Vectorize/VPlan.cpp
655

looks like a leftover from an earlier iteration, removed

723

I'm not sure, modifying an instruction created by a different recipe here seems like a bit of a stretch.

814

I am not sure if it is worth it at the moment, as long as they are only used by a single recipe. Specifically, the start value is mostly a constant (0), the step is currently only used by the induction increment and the vector trip count is currently only used in post-processing. VectorTripCount is moved to a dedicated VPValue in the patch that introduces recipes for the exit condition. The step value could be moved out as follow up. WDYT?

893

updated the comment

904

Yes, it can be removed.

906

Updated the code as suggested, but a few more knobs are needed.

1319

Done!

fhahn added inline comments.Dec 26 2021, 11:39 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8946

I looked into this and wrapping the loops in the native path into regions is a bit more tricky, because of the explicit edge from latch to header; this means the header cannot simply be used as entry block to the loop region. Once we have branch-exit instructions, we should be able to address this.

llvm/lib/Transforms/Vectorize/VPlan.h
1624

reworded the comment and dropped the with part.

Ayal added inline comments.Dec 26 2021, 2:14 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3003–3004

Create the latch terminator later along with its compare? Seems more like "replace" latch terminator than "create" it?

Unfortunately widenIntOrFpInduction still creates and sets the incoming values for vector inductions. To set the incoming value from the backedge for a phi, the latch already needs to be connected to the header.

ok, and D113183 is where widenIntOrFpInduction will stop doing that, and join other header phi's which hook-up to their backedge values at the end of VPlan::execute.

8938

PrimaryInd - let's settle for CanonicalIV throughout?

8946

Sorry, didn't mean to modify the native path, just wondering if something like this would work:

if (IsVPlanNative)
  Header = Header->getSingleSuccessor();
CanonicalIV->insertBefore(&*cast<VPBasicBlock>(Header)->begin());
llvm/lib/Transforms/Vectorize/VPlan.cpp
902

Have VPCanonicalIVRecipe also inherit from VPWidenPHIRecipe, possibly renaming the latter VPHeaderPHIRecipe, so that it too could benefit from getBackedgeValue()?

919

Introducing the ICmpEQ feeding TermBr should ideally be done by an appropriate (BCT) recipe in the normal course of VPlan::execute() stage 2 above, rather than during this backedge-rewiring stage 3.

928

This hopefully could fold into

// Is a single part fed across the backedge, or all UF parts.
IsSinglePart = isa<VPFirstOrderRecurrencePHIRecipe>(&R) ||
               isa<VPCanonicalIVRecipe>(&R) ||
               cast<VPReductionPHIRecipe>(&R)->isOrdered();
SinglePart = isa<VPCanonicalIVRecipe>(&R) ? 0 : State->UF-1;
NumPartsForNewPhi = IsSinglePart ? 1 : State->UF;
BackedgeValue = PhiR->getBackedgeValue();

Note that if a single part is fed across the backedge, it must be the last part UF-1 for ordered reductions and FOR, but only the first part 0 is set for the Canonical IV backedge value (it could also set part UF-1...).

933

Storing per-part 0 rather than per-lane 0 and common base class should lead to a common State->get(PhiR, Part) for all, so the loop becomes

for (unsigned Part = 0; Part < NumPartsForNewPhi; ++Part) {
  Value *Phi = State->get(PhiR, Part);
  Value *Val = State->get(BackedgeValue, IsSinglePart ? SinglePart : Part);
  cast<PHINode>(Phi)->addIncoming(Val, VectorLatchBB);
}
1319

State.set(this, EntryPart, Part);
and elsewhere?

llvm/lib/Transforms/Vectorize/VPlan.h
1628

"vector" still needs to be dropped too.

fhahn updated this revision to Diff 396289.Dec 27 2021, 4:13 AM
fhahn marked 7 inline comments as done.

Address latest comments, thanks!

fhahn marked an inline comment as done.Dec 27 2021, 4:14 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8938

Thanks, I missed this instance. Updated!

8946

Ah I see, thanks! Updated!

llvm/lib/Transforms/Vectorize/VPlan.cpp
902

I think just renaming VPWidenPHIRecipe may not be the best way forward, as the VPCanonicalIVPRecipe will stay scalar. I put up D116304 that moves the common code to a new abstract base class VPHeaderPHIRecipe.

I also renamed the recipe to VPCanonicalIVPHIRecipe, so it fits with the other phi recipes.

919

I added a TODO and this should be addressed soon in a follow-up patch. Or should those changes also be pulled in here?

928

Updated, thanks!

933

Storing a scalar value in the 'vector' part of State seems not ideal, but it does indeed simplify things a lot here. Updated

1319

Updated

Ayal added inline comments.Dec 27 2021, 10:15 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1076–1077

(More formally, "getOrCreateStepForVF()", referring to scalable or non-scalable. Indep. of this patch.)

2998

Initialize B to Latch->getTerminator(), instead of setting it there below after initializing it to first insertion point of Header?

4521

State.get(IVR, 0), per-part 0?

7926

"... changed [from zero] to ..."

8933

"addCanonicalIV" >> "addCanonicalIVRecipe"?

8951

Can do TopRegion->getExitBasicBlock() directly, similar to TopRegion->getEntryBasicBlock() above.

llvm/lib/Transforms/Vectorize/VPlan.cpp
814

I am not sure if it is worth it at the moment, as long as they are only used by a single recipe. Specifically, the start value is mostly a constant (0), the step is currently only used by the induction increment and the vector trip count is currently only used in post-processing. VectorTripCount is moved to a dedicated VPValue in the patch that introduces recipes for the exit condition. The step value could be moved out as follow up. WDYT?

Perhaps instead of passing everything via one VPTransformState into VPlan::execute(), it would be better to separate and first have VPlan initialize all its live-in/pre-header VPValues, and then dedicate VPTransformState to hold only general context required during code-gen?

919

I added a TODO and this should be addressed soon in a follow-up patch. Or should those changes also be pulled in here?

As you prefer. A BranchOnCount recipe could potentially take care of the Increment[NUW]-feeding-the-ICmpEQ-feeding-the-branch better, while defining the single VPValue to feed back the CanonicalIVPHI. It could be introduced here instead of Increment[NUW] et al., or replace them in the TODO follow-up patch.

933

That is how uniform (/UAV) scalar values are stored in general, namely, in the 'vector' part of State.
The scalar canonical IV is conceptually UAV, only its lane 0 is used.

1314

getOperand(0) >> getStartValue()?

llvm/lib/Transforms/Vectorize/VPlan.h
1106

Keep above in Lex order?

2305

Can add to VPlanVerifier a check that first recipe only is VPCanonicalIVPHIRecipe?

fhahn updated this revision to Diff 396332.Dec 27 2021, 1:11 PM
fhahn marked 13 inline comments as done.

Address latest round of comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2998

Updated, thanks!

4521

Adjusted!

7926

Thanks, I added the extra part.

8951

Thanks, adjusted!

llvm/lib/Transforms/Vectorize/VPlan.cpp
814

That's a good point! I put up D116320 and then added a VPValue for the VectorTripCount in this patch

933

One issue with that representation is that it is not possible to get a vector with the scalar splatted I think. Not an issue at the moment here though.

1314

updated, thanks!

llvm/lib/Transforms/Vectorize/VPlan.h
1106

moved WidenPHI to the back.

2305

Sounds good, added!

fhahn updated this revision to Diff 396405.Dec 28 2021, 9:13 AM

Update & rebase after landing D116320 & D116304, move CanonicalIVStartValue handling to prepareToExecute.

Ayal added inline comments.Dec 29 2021, 12:50 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2996

If we are sure no latch exists yet as we're just creating this loop, update comment and assert(!L->getLoopLatch())?

3620–3621

Perhaps place the call to createLatchTerminator(Lp) above the setting of OldInduction, as the latter is not used by the former, whereas createInductionResumeValues() below does use OldInduction. Perhaps OldInduction can be eliminated altogether, following [New]Induction?

4520

Plan is used only here, right? So can fold into
auto *IVR = PhiR->getParent()->getPlan()->getCanonicalIV();

8933

Ahh, on second thought, this method adds two recipes (one being a VPInstruction, but that may change..), one placed first in the header and the other last in the latch. How about "addCanonicalIVRecipes()"?

9686

State.get per-part 0?

llvm/lib/Transforms/Vectorize/VPlan.cpp
726

If Step is fixed to be VF*UF rather than support arbitrary inductions, a more accurate name would be CanonicalIVIncrement[NUW]?

Potentially related to https://reviews.llvm.org/D116123#inline-1111668

919

If the ICmpEQ is to be generated here/now connected to InductionIncrement and branch, suggest to do so separately rather than inside a loop taking care of rewiring header phis. The InductionIncrement can be located directly at the end of the Exit, analogous to getCanonicalIV(), conceptually as part of a terminating inc-cmp-br idiom.

1334

State.get per-part 0?

llvm/lib/Transforms/Vectorize/VPlan.h
2154

Hold VPValue VectorTripCount; as an instance rather than a pointer?

2210

Can return a reference, is never null.

2313

can drop phis().

fhahn updated this revision to Diff 396615.Dec 29 2021, 11:05 PM
fhahn marked 12 inline comments as done.

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2996

I removed the variable, added the assert and also renamed the function

3620–3621

Removed.

4520

updated, thanks

8933

Renamed and extended the comment to explicitly mention the increment VPInstruction.

9686

Updated, thanks!

llvm/lib/Transforms/Vectorize/VPlan.cpp
726

Updated to the more descriptive CanonicalIVIncrement

919

Moved outside the loop.

llvm/lib/Transforms/Vectorize/VPlan.h
2154

That makes things a bit simpler, updated!

2210

Adjusted!

2313

Removed!

Ayal added inline comments.Dec 30 2021, 1:10 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
578–579

More accurately, something like:
/// Introduce a conditional branch (on true, condition to be set later) at the end of the header=latch connecting it to itself (across the backedge) and to the exit block of \p L.

8933

addCanonicalIVPHIRecipes >> addCanonicalIVRecipes

... to the header block.

The phi is added to the header block, the increment is added to the latch block.

llvm/lib/Transforms/Vectorize/VPlan.cpp
777

How about printing something like " VF*UF + " instead?

902

if (isa_and_nonnull<VPWidenPHIRecipe>(PhiR)) ?

Use PhiR instead of &R below?

923

call getCanonicalIV(), or call getCanonicalIVIncrement() which should take care of looking up the backedge value of the canonical IV - or retrieving the last recipe in the latch directly?

llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
178 ↗(On Diff #396615)

Verify that the last recipe in Exit is a CanonicalIVIncrement VPInstruction?

fhahn updated this revision to Diff 396772.Dec 31 2021, 4:02 AM
fhahn marked 6 inline comments as done.

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
578–579

Adjusted wording, thanks!

8933

Adjusted wording ,thanks!

llvm/lib/Transforms/Vectorize/VPlan.cpp
777

That is much better, updated!

902

There are some phi-like recipes that do not inherit from VPHeaderPHIRecipe, like VPWidenIntOrFpInductionRecipe. I think it would be better to adjust this separately, once they *only* model the phi.

I updated the code the explicitly skip unwanted recipes and then use cast<VPHeaderPHIRecipe>.

Use PhiR instead of &R below?

Done!

923

updated to use getCanonicalIV.

llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
178 ↗(On Diff #396615)

I am not sure this is something we should enforce in the verifier, as it is easy to retrieve from the canonical IV.

Ayal added inline comments.Jan 1 2022, 1:40 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8934

addCanonicalIVPHIRecipes >> addCanonicalIVRecipes (one is a PHI, the other is not)

llvm/lib/Transforms/Vectorize/VPlan.cpp
717

Ah, now it looks more logical to first handle Part 0; and possibly fuse the last State.set & break.

902
if (isa<VPWidenIntOrFpInductionRecipe>(&R) || isa<VPWidenPHIRecipe>(&R) ||
   isa<VPWidenCanonicalIVRecipe>(&R))
  continue;

Reasoning behind this early-exit:
createVectorIntOrFpInductionPHI()/widenPHIInstruction() take care of generating LastInduction/InductionGEP and rewiring them back to the phi during VPWidenIntOrFpInductionRecipe/VPWidenPHIRecipe::execute, respectively. Once these values are recipe'd, they too c/should be rewired here instead (TODO).
VPWidenCanonicalIVRecipe depends on the phi of VPCanonicalIVRecipe (which is rewired below) rather than having its own phi. It represents an original IR phi that turns into a non-IR instruction when vectorized, similar to VPBlendRecipe, which should appear right after "real" phi recipes. Should they be placed outside VPBB->phis()? Should both be supported, e.g., VPBB->real-phis() and VPBB->aliased-phis()?

906

"// For first-order recurrences [, canonical IV] and in-order reduction phis ..."

"Otherwise" - for non ordered reductions,

933

It should be possible to obtain any vector, stored per-part; any uniform scalar, stored per-part; and any non-uniform scalar, stored per-lane.

llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
178 ↗(On Diff #396615)

That's fine. Conceptually, the phi of the canonical IV can be placed anywhere among the header phis, although it's good to canonicalize its position to be first. OTOH, the inc-cmp-br of the canonical IV must be placed at the end of the latch, so it would be more robust to rely on it and its user(s) instead.

fhahn updated this revision to Diff 396852.Jan 1 2022, 4:18 AM
fhahn marked 3 inline comments as done.

Address latest comments, thanks!

fhahn marked an inline comment as done.Jan 1 2022, 4:20 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8934

updated, thanks

llvm/lib/Transforms/Vectorize/VPlan.cpp
902

Reasoning behind this early-exit:
createVectorIntOrFpInductionPHI()/widenPHIInstruction() take care of generating LastInduction/InductionGEP and rewiring them back to the phi during VPWidenIntOrFpInductionRecipe/VPWidenPHIRecipe::execute, respectively. Once these values are recipe'd, they too c/should be rewired here instead (TODO).

Exactly, I added. a comment.

VPWidenCanonicalIVRecipe depends on the phi of VPCanonicalIVRecipe (which is rewired below) rather than having its own phi. It represents an original IR phi that turns into a non-IR instruction when vectorized, similar to VPBlendRecipe, which should appear right after "real" phi recipes. Should they be placed outside VPBB->phis()? Should both be supported, e.g., VPBB->real-phis() and VPBB->aliased-phis()?

VPWidenCanonicalIVRecipe should probably be placed outside the phis section, as it doesn't expand to phi instructions. I put up D116473. I don't think there's a strong need for more involved accessors like real-phis and aliased-phis.

906

Updated, thanks!

llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
178 ↗(On Diff #396615)

OTOH, the inc-cmp-br of the canonical IV must be placed at the end of the latch, so it would be more robust to rely on it and its user(s) instead.

Sounds good, we can adjust it once this is modeled with recipes.

fhahn marked an inline comment as done.Jan 1 2022, 12:43 PM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/VPlan.cpp
919

I put up D116479 to add a cmp-and-branch VPInstruction opcode as a follow-up.

fhahn updated this revision to Diff 396924.Jan 2 2022, 4:53 AM

Rebase after landing b1a333f0feb8.

Ayal added inline comments.Jan 2 2022, 9:34 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2471

Slightly more consistent to check ScalarIV->getType() != IV->getType(), given the cast from one to the other below.

May be good to assign auto NeededType = IV->getType() ?

8939

"CanonicalIV" >> "CanonicalIVPhi", to complement "CanonicalIVIncrement".

9688

getVPValue(0) >> getVPSingleValue()?

llvm/lib/Transforms/Vectorize/VPlan.cpp
953

VectorTripCount is always used by the compare of branch-on-count, and by it only. So its VPValue is always generated but will have a user only once the compare (possibly w/ its increment and branch) is also represented as a recipe/VPInstruction. For now can assert it has no users; code introduced otherwise is dead.

fhahn updated this revision to Diff 396937.Jan 2 2022, 10:46 AM

Address latest comments, thanks!

fhahn marked 4 inline comments as done.Jan 2 2022, 10:49 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2471

I added the new variable, thanks!

8939

Adjusted!

9688

There's no need to use a getVP*Value accessor, this can passed directly. Updated!

llvm/lib/Transforms/Vectorize/VPlan.cpp
953

Good point, I replaced the printing with assert, will move the printing to the patch that uses it explicitly.

Ayal accepted this revision.Jan 4 2022, 2:23 PM

This looks fine, thanks!!
Clarifying a previous suggestion.

llvm/lib/Transforms/Vectorize/VPlan.cpp
717

Ah, now it looks more logical to first handle Part 0; and possibly fuse the last State.set & break.

Above thought was something like:

case VPInstruction::CanonicalIVIncrement:
case VPInstruction::CanonicalIVIncrementNUW: {
  Value *Next = nullptr;
  if (Part == 0) {
    bool IsNUW = getOpcode() == VPInstruction::CanonicalIVIncrementNUW;
    auto *Phi = State.get(getOperand(0), 0);
    // The loop step is equal to the vectorization factor (num of SIMD elements)
    // times the unroll factor (num of SIMD instructions).
    Value *Step = createStepForVF(Builder, Phi->getType(), State.VF, State.UF);
    Next = Builder.CreateAdd(Phi, Step, "index.next", IsNUW, false);
  } else
    Next = State.get(this, 0);

  State.set(this, Next, Part);
  break;
}
This revision is now accepted and ready to land.Jan 4 2022, 2:23 PM
fhahn updated this revision to Diff 397488.Jan 5 2022, 2:36 AM
fhahn marked 4 inline comments as done.

Thanks for all the comments! Rebased and applied suggestion. I will land this soon.

This revision was landed with ongoing or failed builds.Jan 5 2022, 2:46 AM
This revision was automatically updated to reflect the committed changes.