
[ELF] Parallelize --compress-debug-sections=zlib
ClosedPublic

Authored by MaskRay on Jan 20 2022, 9:50 PM.

Details

Summary

When linking a Debug build of clang (265MiB of SHF_ALLOC sections, 920MiB of
uncompressed debug info), "Compress debug sections" takes about 2/3 of the
time in a --threads=1 link and ~70% of the time in a --threads=8 link.

This patch splits a section into 1MiB shards and calls zlib deflate in parallel:

  • use Z_SYNC_FLUSH for all shards but the last to flush the output to a byte boundary to be concatenated with the next shard
  • use Z_FINISH for the last shard to set the BFINAL flag to indicate the end of the output stream (per RFC1951)
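
As an illustration of the mechanics only (a Python sketch using the standard zlib module, not lld's actual C++ code), raw deflate shards flushed with Z_SYNC_FLUSH concatenate cleanly at byte boundaries, with the last shard's Z_FINISH supplying the BFINAL block:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_shard(shard, last):
    # Raw deflate (wbits=-15): no per-shard zlib header or trailer.
    c = zlib.compressobj(zlib.Z_BEST_SPEED, zlib.DEFLATED, -15)
    out = c.compress(shard)
    # Z_SYNC_FLUSH pads the output to a byte boundary so the next shard can
    # be concatenated; Z_FINISH on the last shard sets BFINAL (RFC 1951).
    return out + c.flush(zlib.Z_FINISH if last else zlib.Z_SYNC_FLUSH)

def parallel_compress(data, shard_size=1 << 20):
    shards = [data[i:i + shard_size] for i in range(0, len(data), shard_size)]
    flags = [i == len(shards) - 1 for i in range(len(shards))]
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(compress_shard, shards, flags))
    # zlib format: 2-byte header, deflate stream, Adler-32 of the input.
    return b"\x78\x01" + b"".join(parts) + zlib.adler32(data).to_bytes(4, "big")

data = bytes(range(256)) * 20000          # ~4.9MiB -> 5 shards of 1MiB
assert zlib.decompress(parallel_compress(data)) == data
```

The real patch differs in details (how the checksum and output buffers are managed, for instance); this only demonstrates the flush modes.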

In a --threads=8 link, "Compress debug sections" is 5.7x as fast and the total
link is 2.54x as fast. Because the hash table for one shard is not shared with
the next shard, the output is slightly larger. A better compression ratio could
be achieved by preloading the window from the previous shard as a dictionary
(deflateSetDictionary), but that would be overkill.
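
For reference, the dictionary idea can be sketched the same way (again a Python illustration, not a proposal for the patch): each shard after the first preloads the tail of the previous shard's uncompressed bytes via zdict, the Python analogue of deflateSetDictionary. A single inflater still decodes the concatenation, because its sliding window already holds those bytes at the right distances:

```python
import zlib

WINDOW = 32 * 1024  # deflate's maximum back-reference distance

def compress_shards_with_dict(shards):
    parts = []
    for i, shard in enumerate(shards):
        if i == 0:
            c = zlib.compressobj(6, zlib.DEFLATED, -15)
        else:
            # Preload the previous shard's tail (cf. deflateSetDictionary),
            # so matches may reach back into the previous shard's data.
            c = zlib.compressobj(6, zlib.DEFLATED, -15,
                                 zdict=shards[i - 1][-WINDOW:])
        mode = zlib.Z_FINISH if i == len(shards) - 1 else zlib.Z_SYNC_FLUSH
        parts.append(c.compress(shard) + c.flush(mode))
    return b"".join(parts)

data = b"some repetitive debug-info-like payload " * 8192
shards = [data[i:i + 65536] for i in range(0, len(data), 65536)]
# One inflater decodes the whole concatenation; the dictionary references
# resolve against the previous shard's output already in its window.
assert zlib.decompressobj(-15).decompress(compress_shards_with_dict(shards)) == data
```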

# 1MiB shards
% bloaty clang.new -- clang.old
    FILE SIZE        VM SIZE
 --------------  --------------
  +0.3%  +129Ki  [ = ]       0    .debug_str
  +0.1%  +105Ki  [ = ]       0    .debug_info
  +0.3%  +101Ki  [ = ]       0    .debug_line
  +0.2% +2.66Ki  [ = ]       0    .debug_abbrev
  +0.0% +1.19Ki  [ = ]       0    .debug_ranges
  +0.1%  +341Ki  [ = ]       0    TOTAL

# 2MiB shards
% bloaty clang.new -- clang.old
    FILE SIZE        VM SIZE
 --------------  --------------
  +0.2% +74.2Ki  [ = ]       0    .debug_line
  +0.1% +72.3Ki  [ = ]       0    .debug_str
  +0.0% +69.9Ki  [ = ]       0    .debug_info
  +0.1%    +976  [ = ]       0    .debug_abbrev
  +0.0%    +882  [ = ]       0    .debug_ranges
  +0.0%  +218Ki  [ = ]       0    TOTAL

Bonuses of not using zlib::compress:

  • we can compress a debug section larger than 4GiB
  • peak memory usage is lower because for most shards the output size is less than 50% of the input size (all shards were below 55% for a large binary I tested, though decreasing the initial output size further does not decrease memory usage)

Diff Detail

Event Timeline

MaskRay created this revision.Jan 20 2022, 9:50 PM
MaskRay requested review of this revision.Jan 20 2022, 9:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 20 2022, 9:50 PM

In a large executable I tested, for every shard the compressed size divided by the uncompressed size is below 0.558342. The median is 0.408379.
I have tried 0.25 as the initial output size but did not see a memory usage difference.

lld/ELF/OutputSections.cpp
292

I'm wondering if you have considered using llvm/Support/Compression.h
(the implementation there appears to contain some bits to make it msan-friendly + error handling, but I'm not closely familiar with that code)

MaskRay added inline comments.Jan 20 2022, 11:42 PM
lld/ELF/OutputSections.cpp
292

The code is largely lld/ELF specific. If I add the code to llvm/Support/Compression.h, LLVMSupport will get bloated. Technically llvm-objcopy --compress-debug-sections can use the code as well but the two projects may have different tweaks and sharing code won't help much in my opinion.

lld/ELF/OutputSections.cpp
292

just in case - after looking at https://zlib.net/manual.html and https://llvm.org/doxygen/Compression_8cpp_source.html -
the return values of deflateInit2, deflate or compress2 are not ignored there.

p.s. Compression.h contains wrappers around compress2, but what's going on here is a bit different,
(compression of chunks + no headers), so, yeah, it answers my question above.

No objections from me. I think the speed-up is worth the small amount of extra size. I've made a few small suggestions, but they are all subjective. I don't have a lot of large programs around to test this on. I guess something like Chromium would give you another data point.

If you've not done it yet, it would be good to try opening the test program in a debugger to check that it can decompress the output. I'd expect there to be no problems, but it could be worth a sanity check.

lld/ELF/OutputSections.cpp
302

Typo // Allocate a buffer

358

Is it worth picking plural names, as there can be more than one shard? Similarly for out and adler: for example ins, outs and adlers. I'm not sure ins and outs sound right though; perhaps shardsIn and shardsOut. Again, not a strong opinion.

359

Might be worth using start and end rather than i and j? I've not got a strong opinion here, happy to keep with i, j if you prefer.

367

The code above uses idx for going through in[] and i for something else; could it be worth using the same name?

Is there any chance to avoid buffering the compressed output? (I guess probably not, because you need to know how large it is before you write it to the output file (if you want to parallelize writing sections, which is important no doubt))

MaskRay added a comment.EditedJan 21 2022, 10:11 AM

Is there any chance to avoid buffering the compressed output? (I guess probably not, because you need to know how large it is before you write it to the output file (if you want to parallelize writing sections, which is important no doubt))

I have asked myself this question... Unfortunately no. To have an accurate estimate of the sizes, we have to buffer all compressed output.
It's needed to compute the sh_offset and sh_size fields of a .debug_* section. To know the size, we need to compress it first (or estimate it, but the compression ratio is not easy to estimate).

I think pigz uses an approach of keeping only as many shards in memory as the concurrency level, but it does not have the requirement of knowing the output size beforehand.
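
A pigz-style bound on buffering could hypothetically look like the following Python sketch (illustrative only; as noted, lld cannot stream this way because sh_offset/sh_size must be known up front). A semaphore caps how many compressed shards exist at once while output is still written in shard order:

```python
import zlib
import threading
from concurrent.futures import ThreadPoolExecutor

def stream_compress(data, write, shard_size=1 << 20, max_inflight=8):
    """Compress shards in parallel, but keep at most max_inflight compressed
    shards buffered at a time; emit them in order (pigz-style)."""
    shards = [data[i:i + shard_size] for i in range(0, len(data), shard_size)]
    slots = threading.Semaphore(max_inflight)

    def compress(i):
        slots.acquire()  # wait for a free buffer slot
        c = zlib.compressobj(zlib.Z_BEST_SPEED, zlib.DEFLATED, -15)
        mode = zlib.Z_FINISH if i == len(shards) - 1 else zlib.Z_SYNC_FLUSH
        return c.compress(shards[i]) + c.flush(mode)

    # Fewer workers than slots, so workers never deadlock on the semaphore.
    with ThreadPoolExecutor(max_workers=4) as pool:
        for part in pool.map(compress, range(len(shards))):
            write(part)      # emitted in shard order
            slots.release()  # the buffer slot is free again

chunks = []
payload = b"debug line table " * 262144   # ~4.25MiB -> 5 shards
stream_compress(payload, chunks.append)
assert zlib.decompressobj(-15).decompress(b"".join(chunks)) == payload
```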

MaskRay updated this revision to Diff 402095.Jan 21 2022, 1:19 PM
MaskRay marked 4 inline comments as done.
MaskRay edited the summary of this revision. (Show Details)

address comments
update description

MaskRay added inline comments.Jan 21 2022, 1:21 PM
lld/ELF/OutputSections.cpp
346

This zero-fills the buffer, but I have tested that removing it and adding gap filling in writeTo does not improve performance.

MaskRay updated this revision to Diff 402178.Jan 21 2022, 10:59 PM
MaskRay edited the summary of this revision. (Show Details)

Simplify
improve description

MaskRay edited the summary of this revision. (Show Details)Jan 21 2022, 11:10 PM

https://maskray.me/blog/2022-01-23-compressed-debug-sections#linkers has a longer discussion of why avoiding the memory allocation is problematic.
Note: this patch decreases memory usage because the previous code's use of deflateBound was wasteful (its result is always larger than the input size).

Can we do better? At one point, the compressed data is stored in two places: once in the allocated memory holding the compressed shards, once in the memory-mapped output file. It would be nice if we could avoid the memory allocation. Unfortunately, we need to compute the section size, otherwise we do not know the offsets of the following sections and the section header table, and there is no good way of estimating the compressed section size without doing the compression. Technically, if the section header table along with .symtab/.shstrtab/.strtab were moved before the debug sections, we could compress the debug sections and append them to the output file. The output file would unfortunately be unconventional, and this would not work when a linker script specifies the exact order of sections. It is just too hacky to do so much just to save a little memory.

ikudrin accepted this revision.Jan 24 2022, 8:07 AM

No objections from me either.

lld/ELF/OutputSections.cpp
349

Maybe mention Z_BEST_SPEED instead of just 1?

This revision is now accepted and ready to land.Jan 24 2022, 8:07 AM

Is there any chance to avoid buffering the compressed output? (I guess probably not, because you need to know how large it is before you write it to the output file (if you want to parallelize writing sections, which is important no doubt))

I have asked myself this question... Unfortunately no. To have an accurate estimate of the sizes, we have to buffer all compressed output.
It's needed to compute the sh_offset and sh_size fields of a .debug_* section. To know the size, we need to compress it first (or estimate it, but the compression ratio is not easy to estimate).

I think pigz uses an approach of keeping only as many shards in memory as the concurrency level, but it does not have the requirement of knowing the output size beforehand.

Yeah, I guess out of scope for this change - but maybe another time. It'd break parallelism, but you could stream out a section at a time (at least for the compressed sections) and then seek back to write the sh* offset fields based on how the compression actually worked out.

I guess for Split DWARF the memory savings wouldn't be that significant, though? Do you have a sense of how much memory it'd take.

Another direction to go could be to do compressed data concatenation - if the compression algorithm supports concatenation, you could lose some size benefits and gain speed (like lld's sliding scale of string deduplication) by just concatenating the compressed sections together - predictable size and you could write the updated compressed section header based on the input sections headers.

Though I guess most of the DWARF sections remaining in the objects/linked binary when using Split DWARF require relocations to be applied, so that requires decompressing/recompressing anyway... :/

Is there any chance to avoid buffering the compressed output? (I guess probably not, because you need to know how large it is before you write it to the output file (if you want to parallelize writing sections, which is important no doubt))

I have asked myself this question... Unfortunately no. To have an accurate estimate of the sizes, we have to buffer all compressed output.
It's needed to compute the sh_offset and sh_size fields of a .debug_* section. To know the size, we need to compress it first (or estimate it, but the compression ratio is not easy to estimate).

I think pigz uses an approach of keeping only as many shards in memory as the concurrency level, but it does not have the requirement of knowing the output size beforehand.

Yeah, I guess out of scope for this change - but maybe another time. It'd break parallelism, but you could stream out a section at a time (at least for the compressed sections) and then seek back to write the sh* offset fields based on how the compression actually worked out.

I guess for Split DWARF the memory savings wouldn't be that significant, though? Do you have a sense of how much memory it'd take.

The saving is still large because of .debug_line.

Here is a -DCMAKE_BUILD_TYPE=Debug -DLLVM_TARGETS_TO_BUILD=X86 -DCMAKE_CXX_FLAGS='-gdwarf-5 -gsplit-dwarf' build of Clang.

% ~/projects/bloaty/Release/bloaty lld
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  38.0%   368Mi   0.0%       0    .debug_gnu_pubnames
  13.3%   129Mi  62.0%   129Mi    .text
  12.7%   123Mi   0.0%       0    .debug_line
  11.5%   111Mi   0.0%       0    .debug_gnu_pubtypes
  10.9%   105Mi   0.0%       0    .strtab
   2.8%  27.3Mi  13.1%  27.3Mi    .eh_frame
   2.4%  22.9Mi  11.0%  22.9Mi    .rodata
   2.2%  21.6Mi   0.0%       0    .debug_addr
   2.2%  21.0Mi   0.0%       0    .symtab
   1.3%  12.3Mi   5.9%  12.3Mi    .dynstr
   1.0%  9.37Mi   0.0%       0    .debug_rnglists
   0.7%  6.83Mi   3.3%  6.83Mi    .eh_frame_hdr
   0.4%  4.15Mi   2.0%  4.15Mi    .data.rel.ro
   0.3%  3.06Mi   1.5%  3.06Mi    .dynsym
   0.1%  1.02Mi   0.5%  1.02Mi    .hash
   0.1%   995Ki   0.0%       0    .debug_info
   0.1%   907Ki   0.4%   907Ki    .gnu.hash
   0.1%   558Ki   0.1%   249Ki    [24 Others]
   0.0%   364Ki   0.0%       0    .debug_str
   0.0%       0   0.2%   363Ki    .bss
   0.0%   261Ki   0.1%   261Ki    .gnu.version
 100.0%   970Mi 100.0%   208Mi    TOTAL

With --compress-debug-sections=zlib but not --gdb-index (so the huge not-so-useful .debug_gnu_pubnames is compressed)

% hyperfine --warmup 2 --min-runs 10 "numactl -C 20-27 "{/tmp/c/0,/tmp/c/1}" -flavor gnu @response.txt --threads=8 -o lld --compress-debug-sections=zlib"
Benchmark 1: numactl -C 20-27 /tmp/c/0 -flavor gnu @response.txt --threads=8 -o lld --compress-debug-sections=zlib
  Time (mean ± σ):     10.756 s ±  0.025 s    [User: 10.797 s, System: 1.852 s]
  Range (min … max):   10.712 s … 10.791 s    10 runs
 
Benchmark 2: numactl -C 20-27 /tmp/c/1 -flavor gnu @response.txt --threads=8 -o lld --compress-debug-sections=zlib
  Time (mean ± σ):      5.487 s ±  0.047 s    [User: 10.964 s, System: 1.830 s]
  Range (min … max):    5.403 s …  5.559 s    10 runs
 
Summary
  'numactl -C 20-27 /tmp/c/1 -flavor gnu @response.txt --threads=8 -o lld --compress-debug-sections=zlib' ran
    1.96 ± 0.02 times faster than 'numactl -C 20-27 /tmp/c/0 -flavor gnu @response.txt --threads=8 -o lld --compress-debug-sections=zlib'

With --gdb-index

% hyperfine --warmup 2 --min-runs 10 "numactl -C 20-27 "{/tmp/c/0,/tmp/c/1}" -flavor gnu @response.txt --threads=8 -o lld --compress-debug-sections=zlib --gdb-index"
Benchmark 1: numactl -C 20-27 /tmp/c/0 -flavor gnu @response.txt --threads=8 -o lld --compress-debug-sections=zlib --gdb-index
  Time (mean ± σ):      6.981 s ±  0.020 s    [User: 9.516 s, System: 1.979 s]
  Range (min … max):    6.945 s …  7.015 s    10 runs
 
Benchmark 2: numactl -C 20-27 /tmp/c/1 -flavor gnu @response.txt --threads=8 -o lld --compress-debug-sections=zlib --gdb-index
  Time (mean ± σ):      5.350 s ±  0.037 s    [User: 9.623 s, System: 1.935 s]
  Range (min … max):    5.293 s …  5.399 s    10 runs
 
Summary
  'numactl -C 20-27 /tmp/c/1 -flavor gnu @response.txt --threads=8 -o lld --compress-debug-sections=zlib --gdb-index' ran
    1.30 ± 0.01 times faster than 'numactl -C 20-27 /tmp/c/0 -flavor gnu @response.txt --threads=8 -o lld --compress-debug-sections=zlib --gdb-index'

Another direction to go could be to do compressed data concatenation - if the compression algorithm supports concatenation, you could lose some size benefits and gain speed (like lld's sliding scale of string deduplication) by just concatenating the compressed sections together - predictable size and you could write the updated compressed section header based on the input sections headers.

The concatenation approach is what is used here :)

Though I guess most of the DWARF sections remaining in the objects/linked binary when using Split DWARF require relocations to be applied, so that requires decompressing/recompressing anyway... :/

The end of https://maskray.me/blog/2022-01-23-compressed-debug-sections#linkers discusses why not allocating a buffer is tricky and not generic enough.
Updating section headers afterwards has the issue that the output file size is unknown, so we cannot mmap the output in a read-write way.

Is there any chance to avoid buffering the compressed output? (I guess probably not, because you need to know how large it is before you write it to the output file (if you want to parallelize writing sections, which is important no doubt))

I have asked myself this question... Unfortunately no. To have an accurate estimate of the sizes, we have to buffer all compressed output.
It's needed to compute the sh_offset and sh_size fields of a .debug_* section. To know the size, we need to compress it first (or estimate it, but the compression ratio is not easy to estimate).

I think pigz uses an approach of keeping only as many shards in memory as the concurrency level, but it does not have the requirement of knowing the output size beforehand.

Yeah, I guess out of scope for this change - but maybe another time. It'd break parallelism, but you could stream out a section at a time (at least for the compressed sections) and then seek back to write the sh* offset fields based on how the compression actually worked out.

I guess for Split DWARF the memory savings wouldn't be that significant, though? Do you have a sense of how much memory it'd take.

The saving is still large because of .debug_line.

I mostly meant the memory savings that might be available if we could avoid caching compressed debug info output sections - I guess looking at the numbers you posted, assuming lld's internal data structures don't use much memory compared to the output size & assuming you're writing to tmpfs so the output counts as memory usage - that's still like half the output file size again as memory usage for compressed output section buffers, so a possible 30% reduction in memory usage or so... which seems pretty valuable, but hard to achieve for sure.

Another direction to go could be to do compressed data concatenation - if the compression algorithm supports concatenation, you could lose some size benefits and gain speed (like lld's sliding scale of string deduplication) by just concatenating the compressed sections together - predictable size and you could write the updated compressed section header based on the input sections headers.

The concatenation approach is what is used here :)

Ah, sorry, I meant concatenation of the input sections - no need to decompress or recompress, but that only applies if there are no relocations or other changes to apply to the data.

Though I guess most of the DWARF sections remaining in the objects/linked binary when using Split DWARF require relocations to be applied, so that requires decompressing/recompressing anyway... :/

The end of https://maskray.me/blog/2022-01-23-compressed-debug-sections#linkers discusses why not allocating a buffer is tricky and not generic enough.
Updating section headers afterwards has the issue that the output file size is unknown, so we cannot mmap the output in a read-write way.

Ah - I think gold's dwp does it by using a pwrite stream instead - streaming out the section contents and then seeking back to modify the header, rather than memory mapped copies. Not sure what the performance tradeoffs are like for that & whether you could then go back after streaming out the compressed data - and then I guess maybe reopening as memory mapped to write out the rest of the contents.

MaskRay added a comment.EditedJan 24 2022, 6:05 PM

Is there any chance to avoid buffering the compressed output? (I guess probably not, because you need to know how large it is before you write it to the output file (if you want to parallelize writing sections, which is important no doubt))

I have asked myself this question... Unfortunately no. To have an accurate estimate of the sizes, we have to buffer all compressed output.
It's needed to compute the sh_offset and sh_size fields of a .debug_* section. To know the size, we need to compress it first (or estimate it, but the compression ratio is not easy to estimate).

I think pigz uses an approach of keeping only as many shards in memory as the concurrency level, but it does not have the requirement of knowing the output size beforehand.

Yeah, I guess out of scope for this change - but maybe another time. It'd break parallelism, but you could stream out a section at a time (at least for the compressed sections) and then seek back to write the sh* offset fields based on how the compression actually worked out.

I guess for Split DWARF the memory savings wouldn't be that significant, though? Do you have a sense of how much memory it'd take.

The saving is still large because of .debug_line.

I mostly meant the memory savings that might be available if we could avoid caching compressed debug info output sections - I guess looking at the numbers you posted, assuming lld's internal data structures don't use much memory compared to the output size & assuming you're writing to tmpfs so the output counts as memory usage - that's still like half the output file size again as memory usage for compressed output section buffers, so a possible 30% reduction in memory usage or so... which seems pretty valuable, but hard to achieve for sure.

There will be some memory savings, but I am speculating that they are small.
My rationale: zlib::compress allocates a compressed buffer whose size is a bit larger than the input size (zlib deflateBound).
(This is actually a saving many projects do not realize (jdk, ffmpeg, etc).)
This patch switches to half the input size by default, but I see only a very small memory usage decrease (I don't remember clearly, but definitely less than 2%).
So I speculate that even if I dropped the output buffer entirely, the saving would not be large.
The likely reason is that the memory just overlaps some data structures allocated by previous passes.
I haven't used a heap profiler to look into it more deeply.

Another direction to go could be to do compressed data concatenation - if the compression algorithm supports concatenation, you could lose some size benefits and gain speed (like lld's sliding scale of string deduplication) by just concatenating the compressed sections together - predictable size and you could write the updated compressed section header based on the input sections headers.

The concatenation approach is what is used here :)

Ah, sorry, I meant concatenation of the input sections - no need to decompress or recompress, but that only applies if there are no relocations or other changes to apply to the data.

Oh, you mean compressing input sections individually and then concatenating them.
I've thought about this.
One big issue is that initializing zlib data structures takes time.
If we create one z_stream for every input section, the overhead may be too high.
See https://zlib.net/zlib_tech.html "Memory Footprint"; the time complexity is comparable with the memory footprint.
Maybe someone interested can do the experiments.
My bet is that even if it has some memory usage benefit, the CPU overhead may be too large (I will not be surprised if it is even slower than the status quo).

Is there any chance to avoid buffering the compressed output? (I guess probably not, because you need to know how large it is before you write it to the output file (if you want to parallelize writing sections, which is important no doubt))

I have asked myself this question... Unfortunately no. To have an accurate estimate of the sizes, we have to buffer all compressed output.
It's needed to compute the sh_offset and sh_size fields of a .debug_* section. To know the size, we need to compress it first (or estimate it, but the compression ratio is not easy to estimate).

I think pigz uses an approach of keeping only as many shards in memory as the concurrency level, but it does not have the requirement of knowing the output size beforehand.

Yeah, I guess out of scope for this change - but maybe another time. It'd break parallelism, but you could stream out a section at a time (at least for the compressed sections) and then seek back to write the sh* offset fields based on how the compression actually worked out.

I guess for Split DWARF the memory savings wouldn't be that significant, though? Do you have a sense of how much memory it'd take.

The saving is still large because of .debug_line.

I mostly meant the memory savings that might be available if we could avoid caching compressed debug info output sections - I guess looking at the numbers you posted, assuming lld's internal data structures don't use much memory compared to the output size & assuming you're writing to tmpfs so the output counts as memory usage - that's still like half the output file size again as memory usage for compressed output section buffers, so a possible 30% reduction in memory usage or so... which seems pretty valuable, but hard to achieve for sure.

There will be some memory savings, but I am speculating that they are small.
My rationale: zlib::compress allocates a compressed buffer whose size is a bit larger than the input size (zlib deflateBound).
(This is actually a saving many projects do not realize (jdk, ffmpeg, etc).)
This patch switches to half the input size by default, but I see only a very small memory usage decrease (I don't remember clearly, but definitely less than 2%).
So I speculate that even if I dropped the output buffer entirely, the saving would not be large.
The likely reason is that the memory just overlaps some data structures allocated by previous passes.
I haven't used a heap profiler to look into it more deeply.

Yeah, might be interesting to know where peak linker memory usage is - if this isn't at the peak point, that's fair - less to worry about.

Another direction to go could be to do compressed data concatenation - if the compression algorithm supports concatenation, you could lose some size benefits and gain speed (like lld's sliding scale of string deduplication) by just concatenating the compressed sections together - predictable size and you could write the updated compressed section header based on the input sections headers.

The concatenation approach is what is used here :)

Ah, sorry, I meant concatenation of the input sections - no need to decompress or recompress, but that only applies if there are no relocations or other changes to apply to the data.

Oh, you mean compressing input sections individually and then concatenating them.
I've thought about this.
One big issue is that initializing zlib data structures takes time.
If we create one z_stream for every input section, the overhead may be too high.

Ah, sorry, no, I meant taking the already-compressed input sections and writing them straight to the output without the linker ever decompressing or compressing this data. Which, yeah, only applies if there are no relocations to apply - which is more relevant with dwp (which is what I mostly have in mind) than with lld (if you're using Split DWARF - if you're not using Split DWARF but you are using DWARFv5, there might be more opportunities for DWARF sections that have no relocations), though some sections even with Split DWARF have no relocations, like .debug_rnglists for instance.

Yeah, might be interesting to know where peak linker memory usage is - if this isn't at the peak point, that's fair - less to worry about.

Another direction to go could be to do compressed data concatenation - if the compression algorithm supports concatenation, you could lose some size benefits and gain speed (like lld's sliding scale of string deduplication) by just concatenating the compressed sections together - predictable size and you could write the updated compressed section header based on the input sections headers.

The concatenation approach is what is used here :)

Ah, sorry, I meant concatenation of the input sections - no need to decompress or recompress, but that only applies if there are no relocations or other changes to apply to the data.

Oh, you mean compressing input sections individually and then concatenating them.
I've thought about this.
One big issue is that initializing zlib data structures takes time.
If we create one z_stream for every input section, the overhead may be too high.

Ah, sorry, no, I meant taking the already-compressed input sections and writing them straight to the output without the linker ever decompressing or compressing this data. Which, yeah, only applies if there are no relocations to apply - which is more relevant with dwp (which is what I mostly have in mind) than with lld (if you're using Split DWARF - if you're not using Split DWARF but you are using DWARFv5, there might be more opportunities for DWARF sections that have no relocations), though some sections even with Split DWARF have no relocations, like .debug_rnglists for instance.

OK, got it :) Strip the zlib header and the trailer of a compressed input section and concatenate the data parts.
This is what https://github.com/madler/zlib/blob/master/examples/gzjoin.c#L34 does. It does not re-compress the output but needs to uncompress the input to find the final block marker (BFINAL).
The implementation is a bit involved; moreover, the compressed data may not be retained (see D52917 for data()), and --gdb-index needs to uncompress .debug_info.
If we wanted to leverage this optimization (the output would be larger because the default 32KiB window size is essentially shrunk to the input section size), the changes would be quite involved...
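
A small Python sketch (illustrating the RFC 1951 behavior, not gzjoin itself) shows why naive concatenation of independently finished streams fails: the inflater stops at the first BFINAL block.

```python
import zlib

def raw_deflate(data):
    c = zlib.compressobj(6, zlib.DEFLATED, -15)
    return c.compress(data) + c.flush()  # Z_FINISH: last block gets BFINAL

first = raw_deflate(b"first section " * 1000)
second = raw_deflate(b"second section " * 1000)

d = zlib.decompressobj(-15)
out = d.decompress(first + second)
# Decoding stops at the first BFINAL block; the second stream is untouched.
assert out == b"first section " * 1000
assert d.eof and d.unused_data == second
```

This is why gzjoin must decompress each non-final member to locate and clear its BFINAL bit before concatenating.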

The feedback is positive. I'll push this tomorrow.

MaskRay edited the summary of this revision. (Show Details)Jan 25 2022, 10:24 AM
This revision was automatically updated to reflect the committed changes.
mgorny reopened this revision.Feb 6 2022, 5:10 AM
mgorny added inline comments.
lld/ELF/OutputSections.cpp
19

This breaks the build against installed LLVM since config.h is a private header. I guess you're looking to add a new constant to llvm-config.h.

This revision is now accepted and ready to land.Feb 6 2022, 5:10 AM
mgorny requested changes to this revision.Feb 6 2022, 5:10 AM
This revision now requires changes to proceed.Feb 6 2022, 5:10 AM

The llvm-config.h thing is discussed in D119058.

I don't think the standalone build is officially supported (it has been removed for some projects), and it does not work due to GetErrcMessages and a libunwind header issue, so if this is going to be problematic we may have to take the compromise.

MaskRay removed a reviewer: mgorny.Feb 7 2022, 2:01 PM
This revision is now accepted and ready to land.Feb 7 2022, 2:01 PM
MaskRay closed this revision.Feb 7 2022, 4:18 PM