The CONCAT_VECTORS case was using the original shuffle mask's element count to determine how to adjust the broadcast index. But if we looked through a bitcast, the original mask size doesn't tell us anything about the element count of the concat_vectors operands.
This patch switches to using the element count of the concat_vectors input directly instead.
This caused a crash while I was doing experiments with -mprefer-vector-width=256 with skylake-avx512 on some benchmarks. All the types present in the crash were 256 or 128 bits wide, so I'm not sure why -mprefer-vector-width=256 was relevant.
I don't currently have a reduced test case, but I'll see what I can do.