Building on the work in D124284, this patch tags v4i8 and v2i16 vector loads as custom, enabling SLP to try to vectorize these types, ending in a partial store (using the SSE MOVD instruction). We already do something similar for 64-bit vector types.
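For illustration, here is a rough sketch of the kind of scalar source this is aimed at. The function name `add4_u8` is hypothetical and not taken from the patch or its tests. It performs four adjacent i8 operations that end in four adjacent i8 stores, which SLP could in principle merge into one `<4 x i8>` operation and a single 32-bit store. Whether that actually happens depends on the target features and the cost model.

```cpp
#include <cstdint>

// Hypothetical example, not from the patch: four independent byte adds whose
// results are stored to consecutive addresses. This is the shape of code that
// SLP may try to turn into a single v4i8 add followed by one 32-bit partial
// store (MOVD on SSE targets), subject to the cost model.
void add4_u8(const uint8_t *a, const uint8_t *b, uint8_t *out) {
  out[0] = static_cast<uint8_t>(a[0] + b[0]);
  out[1] = static_cast<uint8_t>(a[1] + b[1]);
  out[2] = static_cast<uint8_t>(a[2] + b[2]);
  out[3] = static_cast<uint8_t>(a[3] + b[3]);
}
```

A v2i16 analogue would look the same with two `uint16_t` lanes instead of four `uint8_t` lanes.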
I haven't had time to properly test these (my last testing was with D103925, which attempted something similar), so if anyone has a working test suite instance to hand, that'd be very useful!