See D117853: compressing debug sections is a bottleneck, so parallelizing
this step is highly valuable.
zstd provides a multi-threading API, and its output is deterministic even with
different numbers of threads (see https://github.com/facebook/zstd/issues/2238).
Therefore we can leverage it instead of using the pigz-style sharding approach.
Also, switch to the default compression level 3. The current level 5
is significantly slower without a size benefit large enough to justify it.

  'dash b.sh 1' ran
    1.05 ± 0.01 times faster than 'dash b.sh 3'
    1.18 ± 0.01 times faster than 'dash b.sh 4'
    1.29 ± 0.02 times faster than 'dash b.sh 5'

  level=1 size: 358946945
  level=3 size: 309002145
  level=4 size: 307693204
  level=5 size: 297828315
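
As a rough check on the level 3 versus level 5 trade-off, the figures above
can be combined as in the sketch below. The relative times are derived by
normalizing level 1 to 1.00, and the helper names (slowdown, size_saving) are
illustrative only. Dividing the reported ratios ignores their ± uncertainty,
so treat the result as approximate.

```python
# Figures copied from the benchmark output above.
# Relative run time, normalized so that level 1 = 1.00 (an assumption of
# this sketch; the benchmark only reports ratios against level 1).
rel_time = {1: 1.00, 3: 1.05, 4: 1.18, 5: 1.29}
# Compressed output size in bytes for each level.
size = {1: 358946945, 3: 309002145, 4: 307693204, 5: 297828315}


def slowdown(a, b):
    """Return how many times slower level b is than level a."""
    return rel_time[b] / rel_time[a]


def size_saving(a, b):
    """Return the fraction by which level b output is smaller than level a."""
    return 1 - size[b] / size[a]


print(f"level 5 vs 3: {slowdown(3, 5):.2f}x time, {size_saving(3, 5):.1%} smaller")
```

Under these assumptions, level 5 takes about 1.23x the time of level 3 for
output that is about 3.6% smaller.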