This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Enhance vectorization of ld.param & st.param
ClosedPublic

Authored by kovdan01 on Feb 18 2022, 6:29 AM.

Details

Summary

Since function parameters and return values are passed via param space, we
can force a special alignment for values held in it, which enables additional
vectorization opportunities. This change may be done if the function has
private or internal linkage. The special alignment is forced during two phases.

  1. Instruction selection lowering. Here we use the special alignment for function prototypes (changing the alignment of both the function's own return value and its parameters) and for call lowering (changing the alignment of both the callee's return value and its parameters).
  2. IR pass nvptx-lower-args. Here we change the alignment of byval parameters that belong to param space (or are cast to it). We only handle cases where all uses of such parameters are loads from them. For such loads, we can change the alignment according to the special type alignment and the load offset. Then, the load-store-vectorizer IR pass will perform vectorization where alignment allows it.

The special alignment is calculated as the maximum of the default ABI type
alignment and 16. Alignment 16 is chosen because it is the maximum access size
of vectorized ld.param & st.param.

Before specifying such a special alignment, we should check whether it is a
multiple of the alignment that the type already has. For example, if a value
has an enforced alignment of 64, a default ABI alignment of 4, and a special
alignment of 16, we should preserve 64.

This patch will be followed by a refactoring patch that removes duplicated
code in the handling of byval and non-byval arguments.

Diff Detail

Event Timeline

kovdan01 created this revision.Feb 18 2022, 6:29 AM
kovdan01 requested review of this revision.Feb 18 2022, 6:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 18 2022, 6:29 AM
tra added a comment.Feb 22 2022, 11:21 AM

FYI. I've recently proposed to pass the values directly, instead of a byval pointer: https://discourse.llvm.org/t/nvptx-calling-convention-for-aggregate-arguments-passed-by-value/5881
It needs additional changes to get lowering done correctly. I should have them ready in about a week.

This change may be done if the function has private or internal linkage.

I think we should be able to do that for all non-kernel functions if we're compiling without -fgpu-rdc. I think we do reduce visibility of non-kernels in that case, but it would be good to make sure.

If S is a multiple of 4 * A, let special alignment be 4 * A.
Else, if S is a multiple of 2 * A, let special alignment be 2 * A.
Else, let special alignment be A.

I'm not sure that logic makes sense to me. E.g. if we have [5 x i32], the optimal way to load it would be to align it by 16 and use ld.v4 + ld. With your approach, [4 x i32] would be loaded as ld.v4 and [6 x i32] as 3 * ld.v2, but [5 x i32] would use 5 * ld. If we can align 4- and 6-element arrays, I do not see why we would not be allowed to align a 5-element array, too -- it's an equivalent of struct { [4 x i32], i32 } as far as in-memory layout is concerned.

I wonder if we can just always set alignment to 16. Byval pointer for NVPTX is a fiction anyways as we always copy the data when we actually lower those arguments.

FYI. I've recently proposed to pass the values directly, instead of a byval pointer: https://discourse.llvm.org/t/nvptx-calling-convention-for-aggregate-arguments-passed-by-value/5881
It needs additional changes to get lowering done correctly. I should have them ready in about a week.

Okay, please let me know when you have some results. Anyway, IMO, we should be able to handle IR that uses byval pointers - at least because a person might potentially use a frontend different from clang.

This change may be done if the function has private or internal linkage.

I think we should be able to do that for all non-kernel functions if we're compiling without -fgpu-rdc. I think we do reduce visibility of non-kernels in that case, but it would be good to make sure.

Okay, I'll investigate it.

If S is a multiple of 4 * A, let special alignment be 4 * A.
Else, if S is a multiple of 2 * A, let special alignment be 2 * A.
Else, let special alignment be A.

I'm not sure that logic makes sense to me. E.g. if we have [5 x i32], the optimal way to load it would be to align it by 16 and use ld.v4 + ld. With your approach, [4 x i32] would be loaded as ld.v4 and [6 x i32] as 3 * ld.v2, but [5 x i32] would use 5 * ld. If we can align 4- and 6-element arrays, I do not see why we would not be allowed to align a 5-element array, too -- it's an equivalent of struct { [4 x i32], i32 } as far as in-memory layout is concerned.

I wonder if we can just always set alignment to 16. Byval pointer for NVPTX is a fiction anyways as we always copy the data when we actually lower those arguments.

I suppose that you are correct and we can always set alignment to 16. The reason why I implemented such logic is that I try to be as conservative as possible. For example, if we have two values of [5 x i32] aligned as 4, they might be placed together without any gaps. If we align them as 16 - an additional gap of 12 bytes will appear. If keeping such kind of "layout" stable is not important - I'll change the logic so alignment is always 16 in param space.

tra added a comment.Mar 1 2022, 12:54 PM

I suppose that you are correct and we can always set alignment to 16. The reason why I implemented such logic is that I try to be as conservative as possible.

For example, if we have two values of [5 x i32] aligned as 4, they might be placed together without any gaps.

I do not think PTX gives us any guarantees on how parameters are actually arranged. For all we know they may be packed in registers and alignment is completely irrelevant.
I've poked a bit at the SASS generated for various args and alignments, and it appears that the first few arguments are actually passed via registers. Up to 12 ints are passed via registers on sm_86; subsequent arguments are passed via local memory.
Here's an example of alignment effect on the argument layout: https://cuda.godbolt.org/z/87Mj1hz4h

In any case, if the code ends up with local memory loads/stores in the hot path, we can already forget about performance, and being able to use vectorized local loads/stores will not help much.
I think the best we can do here is give ptxas as much flexibility as we can. Setting alignment for return values to 16 should be safe. We rarely return multiple values.

If we align them as 16 - an additional gap of 12 bytes will appear. If keeping such kind of "layout" stable is not important - I'll change the logic so alignment is always 16 in param space.

Larger alignment will potentially waste some of the limited param space, so it's a trade-off between guaranteeing efficient loads/stores of parameters vs not being able to pass very large number of arguments. E.g. if someone were to have a function with hundreds of [5 x i32] arguments, we would probably be able to handle more such args if they were aligned by 4.
Efficient loads/stores are arguably much more important as they are very common and affect about everyone.

Herald added a project: Restricted Project. · View Herald TranscriptMar 1 2022, 12:54 PM
krisb added a subscriber: krisb.Mar 3 2022, 10:11 AM
kovdan01 updated this revision to Diff 416338.Mar 17 2022, 3:17 PM
kovdan01 edited the summary of this revision. (Show Details)

@tra Thanks for your comments! Updated the patch according to the discussion about forcing alignment 16.

I think we should be able to do that for all non-kernel functions if we're compiling without -fgpu-rdc. I think we do reduce visibility of non-kernels in that case, but it would be good to make sure.

Checked if we do reduce visibility in such cases, and it looks like we do not. The following code:

__device__ int foo(int a, int b, int c) {
  return (a + b) / c;
}

Compiles to the following IR:

; ModuleID = 'device.cu'
source_filename = "device.cu"
target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

; Function Attrs: convergent mustprogress noinline nounwind optnone
define dso_local noundef i32 @_Z3fooiii(i32 noundef %a, i32 noundef %b, i32 noundef %c) #0 {
entry:
  %a.addr = alloca i32, align 4
  %b.addr = alloca i32, align 4
  %c.addr = alloca i32, align 4
  store i32 %a, i32* %a.addr, align 4
  store i32 %b, i32* %b.addr, align 4
  store i32 %c, i32* %c.addr, align 4
  %0 = load i32, i32* %a.addr, align 4
  %1 = load i32, i32* %b.addr, align 4
  %add = add nsw i32 %0, %1
  %2 = load i32, i32* %c.addr, align 4
  %div = sdiv i32 %add, %2
  ret i32 %div
}

attributes #0 = { convergent mustprogress noinline nounwind optnone "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="sm_35" "target-features"="+ptx75,+sm_35" }

!llvm.module.flags = !{!0, !1, !2, !3}
!llvm.ident = !{!4, !5}

!0 = !{i32 2, !"SDK Version", [2 x i32] [i32 11, i32 5]}
!1 = !{i32 1, !"wchar_size", i32 4}
!2 = !{i32 4, !"nvvm-reflect-ftz", i32 0}
!3 = !{i32 7, !"frame-pointer", i32 2}
!4 = !{!"clang version 15.0.0 (https://github.com/llvm/llvm-project.git 9879c555f21097aee15e73dd25bd89f652dba8ea)"}
!5 = !{!"clang version 3.8.0 (tags/RELEASE_380/final)"}

The compilation command that I use:

clang++ --cuda-gpu-arch=sm_35 --cuda-path=/opt/cuda/ -S -emit-llvm -fno-gpu-rdc device.cu

Also, searching for GPURelocatableDeviceCode throughout the LLVM codebase does not yield results where this value is checked in the context of reducing function visibility. I could implement that change, but IMHO it should be a separate patch.

Regarding the current patch – how is https://reviews.llvm.org/D118084 going? Can we merge this patch without waiting for your change about passing byval aggregates directly? As I already mentioned in my previous comment, I suppose the change is useful since the compiler should work well with any kind of IR, possibly even generated by a non-clang frontend.

tra added a subscriber: yaxunl.Mar 17 2022, 4:45 PM

@tra Thanks for your comments! Updated the patch according to the discussion about forcing alignment 16.

I think we should be able to do that for all non-kernel functions if we're compiling without -fgpu-rdc. I think we do reduce visibility of non-kernels in that case, but it would be good to make sure.

Checked if we do reduce visibility in such cases, and it looks like we do not.

I was indeed mistaken. AFAICT, we only internalize some device-side variables.
@yaxunl - Sam, do we change visibility for anything else? I know we must keep kernels visible, but the question is whether we ever internalize non-kernel functions and if not, whether we want to. In this case it would allow us to bump argument and return value alignment.

Also, searching for GPURelocatableDeviceCode through LLVM codebase does not get results where this value is checked in context of reducing function visibility. I could implement that change, and IMHO that should be a separate patch.

Yeah, it's a separate issue.

Regarding the current patch – how is https://reviews.llvm.org/D118084 going? Can we merge this patch without waiting for your change about passing byval aggregates directly? As I already mentioned in my previous comment, I suppose the change is useful since the compiler should work well with any kind of IR, possibly even generated by a non-clang frontend.

There's been little progress on that patch lately. I've got way less time to work on all related changes than I hoped for. This patch does not depend on my changes and will not be blocked. I was mostly pointing at it as a heads up that if/when it lands we would no longer see byval pointers in clang-generated IR and the impact of this patch would be reduced.

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
4313–4314

"..may want to increase their alignment in a way that ensures that we can effectively vectorize their loads & stores."

4315

"or has private linkage"

4316

Did you mean "rely on default alignment" instead of "linkage"?

"To allow using 128-bit vectorized loads/stores, this function ensures that alignment is 16 or greater."

llvm/lib/Target/NVPTX/NVPTXLowerArgs.cpp
235

"tryIncreaseByValAlign" does not sound quite right.
"adjustByValArgAlignment()" ?

I'd also mention that we traverse all loads from the byval pointer and adjust their alignment if they use a known offset.

llvm/test/CodeGen/NVPTX/param-vectorize.ll
38 ↗(On Diff #416338)

We should also check that alignment of kernel parameters is not affected.

@tra Thanks for your comments! Updated the patch according to the discussion about forcing alignment 16.

I think we should be able to do that for all non-kernel functions if we're compiling without -fgpu-rdc. I think we do reduce visibility of non-kernels in that case, but it would be good to make sure.

Checked if we do reduce visibility in such cases, and it looks like we do not.

I was indeed mistaken. AFAICT, we only internalize some device-side variables.
@yaxunl - Sam, do we change visibility for anything else? I know we must keep kernels visible, but the question is whether we ever internalize non-kernel functions and if not, whether we want to. In this case it would allow us to bump argument and return value alignment.

For HIP, we mark non-kernel device functions with hidden visibility and internalize them in a LLVM pass for -fno-gpu-rdc.

tra added a comment.Mar 18 2022, 10:35 AM

For HIP, we mark non-kernel device functions with hidden visibility and internalize them in a LLVM pass for -fno-gpu-rdc.

Looks like now we may have a reason to do so for CUDA, too. Could you point me to where we do it for HIP?

For HIP, we mark non-kernel device functions with hidden visibility and internalize them in a LLVM pass for -fno-gpu-rdc.

Looks like now we may have a reason to do so for CUDA, too. Could you point me to where we do it for HIP?

Make default visibility to be hidden: https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChains/HIPAMD.cpp#L203

To avoid making kernels invisible, make them protected visibility: https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/TargetInfo.cpp#L9315

Tell the backend that it needs to internalize non-kernel functions: https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChains/HIPAMD.cpp#L189

Let backend internalize non-kernel functions but not variables: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp#L702

tra added a comment.Mar 21 2022, 11:21 AM

Thank you for the details!

kovdan01 updated this revision to Diff 417265.Mar 22 2022, 6:08 AM
kovdan01 marked 4 inline comments as done.Mar 22 2022, 6:11 AM

@tra Added appropriate tests for kernels. It looks like kernel functions get external linkage regardless of the static specifier, so their parameter alignment is kept untouched.
To ensure that this is true, I altered clang/test/CodeGenCUDA/device-fun-linkage.cu and added an appropriate assertion in getFunctionParamOptimizedAlign - it checks module metadata and identifies whether the function is marked as a kernel or not.

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
4316

Sure, I meant that :)

llvm/lib/Target/NVPTX/NVPTXLowerArgs.cpp
235

I agree, your variant sounds better. Fixed.

tra accepted this revision.Mar 22 2022, 10:58 AM

A few cosmetic & style nits. Otherwise LGTM.

clang/test/CodeGenCUDA/device-fun-linkage.cu
25–26

Negative testing is tricky. The check here only ensures that the function does not appear between the positive matches around them. If we emit this function as the first or last function in the file, these checks will succeed, even though they should not.

One way to deal with this is to run negative tests separately, with their own RUN lines and check tags.

llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
1423–1424

I'd inline it into the printout, too.

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
1306

This could be moved out of the loop.

1317–1318

No need for the variable, just inline the call into the printout statement. That makes it easier to see what we're printing.

1353–1354

ditto.

1359
Align ParamByValAlign = std::max(Outs[OIdx].Flags.getNonZeroByValAlign(),
                                 getFunctionParamOptimizedAlign(F, ETy, DL));

We could try inlining it into the printout, but it may be too unwieldy for that. Up to you.

This revision is now accepted and ready to land.Mar 22 2022, 10:58 AM
kovdan01 updated this revision to Diff 417852.Mar 24 2022, 2:19 AM
kovdan01 marked 2 inline comments as done.
kovdan01 marked 6 inline comments as done.Mar 24 2022, 2:23 AM
kovdan01 added inline comments.
clang/test/CodeGenCUDA/device-fun-linkage.cu
25–26

Used the approach with separate RUN lines, thanks!

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
1359

Looks too unwieldy for inlining IMHO, so I kept it as is.

This revision was landed with ongoing or failed builds.Mar 24 2022, 2:38 AM
This revision was automatically updated to reflect the committed changes.
kovdan01 marked 2 inline comments as done.
Herald added a project: Restricted Project. · View Herald TranscriptMar 24 2022, 2:38 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
jdoerfert reopened this revision.Mar 25 2022, 7:51 AM
jdoerfert added a subscriber: jdoerfert.

Our internal CI flagged an assertion in llvm::NVPTXTargetLowering::getFunctionParamOptimizedAlign(llvm::Function const*, llvm::Type*, llvm::DataLayout const&) last night.

Given the error:
static bool llvm::isa_impl_cl<llvm::ConstantAsMetadata, const llvm::Metadata *>::doit(const From *) [To = llvm::ConstantAsMetadata, From = const llvm::Metadata *]: Assertion `Val && "isa<> used on a null pointer"' failed.
I assume it's this patch and the constant as metadata casts.

I'm working on the reproducer now.

This revision is now accepted and ready to land.Mar 25 2022, 7:51 AM

Godbolt seems to not have included this:

$ cat test.cpp
int main(){return 0;}
$ clang++ -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target -march=sm_80 test.cpp
... static bool llvm::isa_impl_cl<llvm::ConstantAsMetadata, const llvm::Metadata *>::doit(const From *) [To = llvm::ConstantAsMetadata, From = const llvm::Metadata *]: Assertion `Val && "isa<> used on a null pointer"' failed.
llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
4327

FWIW, if (!F->hasLocalLinkage()) { return Align(ABITypeAlign); } <case code>.

Godbolt seems to not have included this:

$ cat test.cpp
int main(){return 0;}
$ clang++ -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target -march=sm_80 test.cpp
... static bool llvm::isa_impl_cl<llvm::ConstantAsMetadata, const llvm::Metadata *>::doit(const From *) [To = llvm::ConstantAsMetadata, From = const llvm::Metadata *]: Assertion `Val && "isa<> used on a null pointer"' failed.

Thanks for the reproducer! Is the problem specific to sm_80 or not? I am unable to reproduce the issue with sm_75 and lower.

Godbolt seems to not have included this:

$ cat test.cpp
int main(){return 0;}
$ clang++ -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target -march=sm_80 test.cpp
... static bool llvm::isa_impl_cl<llvm::ConstantAsMetadata, const llvm::Metadata *>::doit(const From *) [To = llvm::ConstantAsMetadata, From = const llvm::Metadata *]: Assertion `Val && "isa<> used on a null pointer"' failed.

Thanks for the reproducer! Is the problem specific to sm_80 or not? I am unable to reproduce the issue with sm_75 and lower.

You don't need to run it. If you use this command line it doesn't crash?

You don't need to run it. If you use this command line it doesn't crash?

Yes, I ran the same command (with sm_75 instead of sm_80 because I don't have libomptarget-nvptx-sm_80.bc), and it doesn't crash.

You don't need to run it. If you use this command line it doesn't crash?

Yes, I ran the same command (with sm_75 instead of sm_80 because I don't have libomptarget-nvptx-sm_80.bc), and it doesn't crash.

So it's not sm_80 but the CUDA version that matters. 11.4.0 works fine for me; 11.0.2 breaks for sm_70 as well.

Here are the null pointers that cause the assertion:

!nvvm.annotations = !{!8, !9, !8, !10, !10, !10, !10, !11, !11, !10}

!8 = !{null, !"align", i32 8}
!9 = !{null, !"align", i32 8, !"align", i32 65544, !"align", i32 131080}
!10 = !{null, !"align", i32 16}
!11 = !{null, !"align", i32 16, !"align", i32 65552, !"align", i32 131088}
jdoerfert requested changes to this revision.EditedMar 27 2022, 7:46 PM

Please revert this commit. It breaks any code using the cuda.11.0.2 libdevice.bc file, the source of the null nvvm.annotations and the annotations with 5 arguments. See below.
Once a fix is in, and a test is added, we can reapply this.

$ opt -S .../cuda-11.0.2/nvvm/libdevice/libdevice.10.bc | tail
!nvvm.annotations = !{!1, !2, !1, !3, !3, !3, !3, !4, !4, !3}

!0 = !{i32 1, i32 4}
!1 = !{null, !"align", i32 8}
!2 = !{null, !"align", i32 8, !"align", i32 65544, !"align", i32 131080}
!3 = !{null, !"align", i32 16}
!4 = !{null, !"align", i32 16, !"align", i32 65552, !"align", i32 131088}
!5 = distinct !{!5, !6}
This revision now requires changes to proceed.Mar 27 2022, 7:46 PM

Please revert this commit. It breaks any code using the cuda.11.0.2 libdevice.bc file, the source of the null nvvm.annotations and the annotations with 5 arguments. See below.
Once a fix is in, and a test is added, we can reapply this.

$ opt -S .../cuda-11.0.2/nvvm/libdevice/libdevice.10.bc | tail
!nvvm.annotations = !{!1, !2, !1, !3, !3, !3, !3, !4, !4, !3}

!0 = !{i32 1, i32 4}
!1 = !{null, !"align", i32 8}
!2 = !{null, !"align", i32 8, !"align", i32 65544, !"align", i32 131080}
!3 = !{null, !"align", i32 16}
!4 = !{null, !"align", i32 16, !"align", i32 65552, !"align", i32 131088}
!5 = distinct !{!5, !6}

Got it, will revert the patch shortly. Could you please describe what you mean by "break" in a bit more detail? Is this only about poorly designed and failing assertions in getFunctionParamOptimizedAlign, or are there some other (maybe even functional) issues? If it's only about assertions, won't using isKernelFunction (as suggested in D122550) help?

Please revert this commit. It breaks any code using the cuda.11.0.2 libdevice.bc file, the source of the null nvvm.annotations and the annotations with 5 arguments. See below.
Once a fix is in, and a test is added, we can reapply this.

$ opt -S .../cuda-11.0.2/nvvm/libdevice/libdevice.10.bc | tail
!nvvm.annotations = !{!1, !2, !1, !3, !3, !3, !3, !4, !4, !3}

!0 = !{i32 1, i32 4}
!1 = !{null, !"align", i32 8}
!2 = !{null, !"align", i32 8, !"align", i32 65544, !"align", i32 131080}
!3 = !{null, !"align", i32 16}
!4 = !{null, !"align", i32 16, !"align", i32 65552, !"align", i32 131088}
!5 = distinct !{!5, !6}

Got it, will revert the patch shortly. Could you please describe what you mean by "break" in a bit more detail? Is this only about poorly designed and failing assertions in getFunctionParamOptimizedAlign, or are there some other (maybe even functional) issues? If it's only about assertions, won't using isKernelFunction (as suggested in D122550) help?

The two assertions introduced here do not hold for the libdevice.bc above. So whenever we link the above, we will cause the assertions to fail. That's what I mean by "break". isKernelFunction is probably going to fix all that. For a test, copy annotations like the above into one of the .ll files and make sure it contains a private/internal function with arguments as well.

kovdan01 added a comment.EditedMar 27 2022, 8:52 PM

The two assertions introduced here do not hold for the libdevice.bc above. So whenever we link the above, we will cause the assertions to fail. That's what I mean by "break". isKernelFunction is probably going to fix all that. For a test, copy annotations like the above into one of the .ll files and make sure it contains a private/internal function with arguments as well.

OK, thanks for the explanation! Can we just submit a new patch with a fix (like D122550) instead of reverting this one? The problem with reverting is that we would also have to revert D122381, which depends on this patch. Also, adding tests for null nvvm.annotations and annotations with 5 arguments will IMHO look better when submitted as a separate patch. So, is the revert crucial for you, or can we just submit a fix separately?

The two assertions introduced here do not hold for the libdevice.bc above. So whenever we link the above, we will cause the assertions to fail. That's what I mean by "break". isKernelFunction is probably going to fix all that. For a test, copy annotations like the above into one of the .ll files and make sure it contains a private/internal function with arguments as well.

OK, thanks for the explanation! Can we just submit a new patch with a fix (like D122550) instead of reverting this one? The problem with reverting is that we would also have to revert D122381, which depends on this patch. Also, adding tests for null nvvm.annotations and annotations with 5 arguments will IMHO look better when submitted as a separate patch. So, is the revert crucial for you, or can we just submit a fix separately?

Our internal build bots and CI for some projects are broken for 3 days. I wish to unbreak them so we get actual meaningful results, e.g., see if something else is breaking our build. I'm fine with a separate patch on top but I would prefer it now so people can run code on Monday.

Our internal build bots and CI for some projects are broken for 3 days. I wish to unbreak them so we get actual meaningful results, e.g., see if something else is breaking our build. I'm fine with a separate patch on top but I would prefer it now so people can run code on Monday.

OK, I'll submit a separate patch within an hour, and if it looks good to you, I will merge it immediately. The fix looks simple enough, so there is no need to wait for additional approvals.

jdoerfert accepted this revision.Mar 28 2022, 9:04 AM
This revision is now accepted and ready to land.Mar 28 2022, 9:04 AM
jdoerfert closed this revision.Mar 28 2022, 9:04 AM