If we are extracting a subvector that is smaller than one of the operands of a concatenated vector, we can extract directly from that original operand instead of from the concatenation.
This is another suggestion from PR42024:
https://bugs.llvm.org/show_bug.cgi?id=42024#c2
I'm not sure yet whether it will make any difference on those patterns, but it does seem to help a few existing AVX512 tests.
Most of the code diff here is refactoring, so I can make that a preliminary commit if preferred.
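To illustrate the fold described at the top, here is a minimal standalone sketch of the index arithmetic only. It is not the code from this patch: `narrowExtractOfConcat` and `NarrowedExtract` are hypothetical names, and the sketch assumes that every concat operand has the same element count, that the operand width is a multiple of the extracted width, and that the extract index is a multiple of the extracted width.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical result type (not an LLVM type): which concat operand to read
// from, and the element offset of the extract within that operand.
struct NarrowedExtract {
  std::size_t OperandIdx;
  std::size_t OffsetInOperand;
};

// Sketch of: extract_subvector (concat X, Y, ...), Idx
//        --> extract_subvector (X or Y or ...), Idx'
// All sizes are element counts. Assumes all concat operands are the same size.
std::optional<NarrowedExtract>
narrowExtractOfConcat(std::size_t ExtractIdx, std::size_t ExtractNumElts,
                      std::size_t ConcatOpNumElts) {
  if (ExtractNumElts == 0 || ConcatOpNumElts == 0)
    return std::nullopt;

  // Only the "extract is smaller than a concat operand" case is modeled here.
  if (ExtractNumElts >= ConcatOpNumElts)
    return std::nullopt;

  // Assumption: the operand width is a whole multiple of the extract width
  // and the index is aligned to the extract width, so the extracted elements
  // can never straddle two concat operands.
  if (ConcatOpNumElts % ExtractNumElts != 0 ||
      ExtractIdx % ExtractNumElts != 0)
    return std::nullopt;

  return NarrowedExtract{ExtractIdx / ConcatOpNumElts,
                         ExtractIdx % ConcatOpNumElts};
}
```

For example, with a 16-element concat of two 8-element operands, extracting 4 elements at index 12 maps to operand 1 at offset 4, so the concat itself no longer needs to be referenced by the extract.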
If we're peeking into any concat, can we guarantee this, or should we just wrap it in an if() instead?