This is the sibling fold for insert-of-insert that was added with D56604.
Now that we have x86 shuffle narrowing (D57156), this change shows improvements for lots of AVX512 reduction code (not sure that we would ever expect extract-of-extract otherwise).
There's a small regression in some of the partial-permute tests (extracting followed by splat). I'll try to reduce that and file a bug.