This can avoid a loss of decoupling with the scalar unit on cores
with decoupled scalar and vector units.
We should support FP too, but FP extracts use extract_element rather than a
custom ISD node, so the handling is a little different. I also left a FIXME
in the test for i64 extract and store on RV32.
I was wondering what'd happen if you replaced it with an ISD::VP_STORE. Just thinking that a standard node would potentially provide more optimization opportunities compared to a RISCVISD node.