As an alternative to D109348, this patch adds fallback broadcast patterns for AVX1 targets instead.
I've added AVX2 test coverage to help expose a missing fold: vbroadcastsd_ymm(vbroadcastss_load_xmm()) should fold to a single vbroadcastss_load_ymm() on both AVX1 and AVX2 targets if we create the broadcast nodes.
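To illustrate why that fold is legal, here is a minimal scalar model of the three operations. It assumes the semantics implied by the names: a 32-bit splat from memory into an xmm or ymm register, and a 64-bit splat of the low lane of an xmm register into a ymm register. The functions below are hypothetical stand-ins named after the review text, not LLVM APIs, TableGen patterns, or real intrinsics; they only show that both sequences produce bit-identical results.

```cpp
#include <array>
#include <cstring>

// Hypothetical scalar models of the vector registers (not LLVM types).
using Xmm = std::array<float, 4>;  // 128 bits = 4 x f32
using Ymm = std::array<float, 8>;  // 256 bits = 8 x f32

// Model of vbroadcastss (load form, xmm): splat one f32 from memory to 4 lanes.
Xmm vbroadcastss_load_xmm(const float* p) {
  Xmm r;
  r.fill(*p);
  return r;
}

// Model of vbroadcastsd (register form, ymm): take the low 64 bits of the
// source (f32 lanes 0 and 1) and repeat that 64-bit element 4 times.
Ymm vbroadcastsd_ymm(const Xmm& x) {
  Ymm r;
  for (int i = 0; i < 4; ++i) {
    r[2 * i] = x[0];
    r[2 * i + 1] = x[1];
  }
  return r;
}

// Model of vbroadcastss (load form, ymm): splat one f32 from memory to 8 lanes.
Ymm vbroadcastss_load_ymm(const float* p) {
  Ymm r;
  r.fill(*p);
  return r;
}

// The low 64 bits of an f32 splat are two copies of that f32, so re-splatting
// them 64 bits at a time yields the same 8 lanes as one 32-bit splat.
// Compare bytes rather than floats so NaN payloads and -0.0f are checked exactly.
bool fold_is_equivalent(const float* p) {
  Ymm folded = vbroadcastss_load_ymm(p);
  Ymm unfolded = vbroadcastsd_ymm(vbroadcastss_load_xmm(p));
  return std::memcmp(folded.data(), unfolded.data(), sizeof(Ymm)) == 0;
}
```

The byte-wise comparison mirrors what the fold has to preserve: the register contents, not just float equality.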
Thanks for creating this bugfix. This case only covers the code:
I think we should add more test cases.