This fixes PR23349: the _Int instructions are special, in that they operate on the full VR128 instead of FR32. The load folding then looks at MOVSS, at the user, and bails out when it sees a size mismatch.
What we really know is that the rm_Int instructions don't load the higher lanes, so folding is fine.
This happens for the straightforward intrinsic code, e.g.:
_mm_add_ss(a, _mm_load_ss(p));