When inserting a scalable subvector into a scalable vector through
the stack, the index to store to needs to be scaled by vscale.
Before this patch the index was not scaled, so the generated offset
was wrong: the subvector was stored to the incorrect address and
overwrote the wrong lanes.
For example, for the insert:
    nxv8f16 insert_subvector(nxv8f16 %vec, nxv2f16 %subvec, i64 2)
the offset was previously not scaled by vscale:
    orr x8, x8, #0x4
    st1h { z0.h }, p0, [sp]
    st1h { z1.d }, p1, [x8]
    ld1h { z0.h }, p0/z, [sp]
With this patch, the output becomes:
    mov x8, sp
    st1h { z0.h }, p0, [sp]
    st1h { z1.d }, p1, [x8, #1, mul vl]
    ld1h { z0.h }, p0/z, [sp]
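The arithmetic behind the two offsets can be sketched in a few lines of C++. This is only an illustration, not code from the patch: scaledByteOffset and unscaledByteOffset are hypothetical helpers, and the sketch assumes the insert index is counted in minimum elements (the element count at vscale == 1) and that elements are stored packed on the stack.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an LLVM API.
// Byte offset of a subvector inserted at element index `idx`, where `idx`
// is counted in minimum elements (the element count when vscale == 1).
// At runtime the subvector actually starts at element idx * vscale.
std::uint64_t scaledByteOffset(std::uint64_t idx, std::uint64_t eltBytes,
                               std::uint64_t vscale) {
  assert(vscale >= 1 && "vscale is a positive integer");
  return idx * eltBytes * vscale;
}

// Hypothetical helper modelling the pre-patch behaviour: the index is
// used as if vscale were always 1.
std::uint64_t unscaledByteOffset(std::uint64_t idx, std::uint64_t eltBytes) {
  return idx * eltBytes;
}
```

For the insert above (index 2, 2-byte f16 elements) the unscaled offset is 4 bytes, which matches the #0x4 in the old output. Under these assumptions the two offsets agree only when vscale is 1; with 256-bit vectors (vscale == 2) the correct offset is 8 bytes.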
Given that you're not making any assumptions about where Idx comes from, I don't think an assert is safe enough. Sure, an asserts build will exit here, but a release build could leak or corrupt data, which is exactly the problem the caller is trying to prevent by calling this function.
Instead, I think the assert should be replaced by always clamping Idx (see the MaxIndex calculation below). If the source of Idx is indeed an EXTRACT_SUBVECTOR/INSERT_SUBVECTOR, then an invalid index should trigger an assert in getNode rather than getting this far.
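To make the suggestion concrete, here is a minimal sketch of the clamping, written as standalone C++ rather than against the SelectionDAG API. clampSubvectorIndex and its parameters are hypothetical names for illustration. It assumes all three values are in the same units (minimum element counts), so the clamp would be applied before any scaling by vscale.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an LLVM API.
// Clamp `idx` so that the range [idx, idx + numSubElts) always lies
// inside a vector of `numElts` elements. Unlike an assert on `idx`,
// this also protects release builds from out-of-bounds accesses.
std::uint64_t clampSubvectorIndex(std::uint64_t idx, std::uint64_t numElts,
                                  std::uint64_t numSubElts) {
  // This check concerns the vector types, which are known statically,
  // not the possibly untrusted runtime index.
  assert(numSubElts <= numElts && "subvector larger than vector");
  const std::uint64_t maxIndex = numElts - numSubElts;
  return std::min(idx, maxIndex);
}
```

With 8 elements and a 2-element subvector, any index above 6 is clamped to 6, so an out-of-range index lands on the last valid position instead of past the end of the stack slot.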