Perform the 256-bit AVX1 splitting as part of a post legalize op DAG combine.
This seems to give better opportunities for replacing sign_extend/zero_extend with any_extend when the extend precedes a pmuldq/pmuludq.
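As a rough illustration of why any_extend is sufficient there: pmuludq and pmuldq read only the low 32 bits of each 64-bit lane, so whatever the extend placed in the upper half does not affect the product. The scalar model below is only a sketch of one lane, not code from this patch; the helper names (pmuludq_lane, pmuldq_lane, zext32, sext32, anyext32, lanes_agree) are hypothetical and exist purely for illustration.

```cpp
#include <cstdint>

// Hypothetical scalar model of one 64-bit lane of pmuludq:
// only the low 32 bits of each input are read, as unsigned values.
inline uint64_t pmuludq_lane(uint64_t a, uint64_t b) {
    return static_cast<uint64_t>(static_cast<uint32_t>(a)) *
           static_cast<uint64_t>(static_cast<uint32_t>(b));
}

// Hypothetical scalar model of one 64-bit lane of pmuldq:
// only the low 32 bits of each input are read, as signed values.
inline int64_t pmuldq_lane(uint64_t a, uint64_t b) {
    return static_cast<int64_t>(static_cast<int32_t>(static_cast<uint32_t>(a))) *
           static_cast<int64_t>(static_cast<int32_t>(static_cast<uint32_t>(b)));
}

// Three ways to widen a 32-bit value to 64 bits.
inline uint64_t zext32(uint32_t x) { return x; }
inline uint64_t sext32(uint32_t x) {
    return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(x)));
}
// any_extend leaves the upper bits unspecified; model that with arbitrary junk.
inline uint64_t anyext32(uint32_t x, uint32_t junk) {
    return (static_cast<uint64_t>(junk) << 32) | x;
}

// True if the multiply result is the same whether the inputs were
// zero-extended, sign-extended, or any-extended with the given junk.
inline bool lanes_agree(uint32_t a, uint32_t b, uint32_t junk) {
    const uint64_t u = pmuludq_lane(zext32(a), zext32(b));
    const int64_t s = pmuldq_lane(sext32(a), sext32(b));
    return u == pmuludq_lane(anyext32(a, junk), anyext32(b, junk)) &&
           u == pmuludq_lane(sext32(a), sext32(b)) &&
           s == pmuldq_lane(anyext32(a, junk), anyext32(b, junk)) &&
           s == pmuldq_lane(zext32(a), zext32(b));
}
```

In this model the kind of extend feeding the multiply is irrelevant, which is the property that lets the combine relax sign_extend/zero_extend to any_extend.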
There are a few slight instruction count regressions where some loads were previously being split and folded directly into pmovzx instructions. Now we do one load and then shuffle. This seems to be because generic DAG combine can create a concat of zextloads/sextloads when it sees an illegal zero/sign-extended load, but this doesn't extend to any_extend. It's unclear which code is actually better.