Much like MULX and its WriteIMulH, AVX2 GATHER instructions have two outputs: the destination register and the mask register.
This was changed back in rL160110, but the matching scheduling-model change was never made.
So right now, for scheduling models that are marked as complete (currently only znver3),
generating code for GATHER results in a crash:
DefIdx 1 exceeds machine model writes for early-clobber renamable $ymm3, dead early-clobber renamable $ymm2 = VPGATHERDDYrm killed renamable $ymm3(tied-def 0), undef renamable $rax, 4, renamable $ymm0, 0, $noreg, killed renamable $ymm2(tied-def 1) :: (load 32, align 1)
https://godbolt.org/z/Ks7zW7WGh
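A rough sketch of why the missing write is fatal only for complete models: the scheduler looks up a write entry per explicit def using DefIdx, so an instruction with two defs but a single write has no entry for DefIdx 1. The types and function below are hypothetical stand-ins, not LLVM's TargetSchedModel API, and the latencies are placeholders rather than real znver3 numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified scheduling-class descriptor: one latency entry
// per explicit def, in operand order (not LLVM's real MCSchedClassDesc).
struct SchedClassDesc {
  std::string Name;
  std::vector<unsigned> WriteLatencies;
};

// Returns the latency of def operand DefIdx. If the model has no write for
// that def, a "complete" model reports an error (std::nullopt here, standing
// in for the "DefIdx N exceeds machine model writes" crash), while an
// incomplete model falls back to a default latency of 1.
std::optional<unsigned> defLatency(const SchedClassDesc &SC, std::size_t DefIdx,
                                   bool CompleteModel) {
  if (DefIdx < SC.WriteLatencies.size())
    return SC.WriteLatencies[DefIdx];
  if (CompleteModel)
    return std::nullopt;
  return 1u;
}
```

With one write, the lookup for the mask def (DefIdx 1) fails on a complete model. Listing a second write, as with WriteIMulH for MULX's extra output, makes it succeed.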
I'm guessing we need to handle this the same way we handle WriteIMulH?
Why does it match a scalar load latency and not a vector load latency?