When the offset is 0, vslideup and vslidedown are semantically identical, except that vslideup has a vector register overlap constraint and vslidedown doesn't. As a result, we can prefer vslidedown, the one without the overlap constraint, to improve register allocation flexibility.
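To make the equivalence concrete, here is a minimal scalar sketch of the two slides. It is a simplified model and not the backend's code: the names `modelSlideUp` and `modelSlideDown` are hypothetical, and it assumes unmasked execution, `vstart = 0`, tail elements left undisturbed, and `vd`/`vs2` both sized to VLMAX.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of vslideup (assumptions: unmasked, vstart = 0,
// tail undisturbed, vd.size() == vs2.size() == VLMAX, vl <= VLMAX).
// Elements below `offset` keep their old vd value; the rest of the body
// takes vs2[i - offset].
std::vector<int32_t> modelSlideUp(std::vector<int32_t> vd,
                                  const std::vector<int32_t> &vs2,
                                  std::size_t offset, std::size_t vl) {
  for (std::size_t i = offset; i < vl; ++i)
    vd[i] = vs2[i - offset];
  return vd;
}

// Hypothetical scalar model of vslidedown under the same assumptions.
// Body element i takes vs2[i + offset], or 0 when that index is past VLMAX.
std::vector<int32_t> modelSlideDown(std::vector<int32_t> vd,
                                    const std::vector<int32_t> &vs2,
                                    std::size_t offset, std::size_t vl) {
  const std::size_t vlmax = vs2.size();
  for (std::size_t i = 0; i < vl; ++i)
    // Written as a subtraction so a very large offset cannot overflow.
    vd[i] = (offset < vlmax - i) ? vs2[i + offset] : 0;
  return vd;
}
```

With `offset == 0`, both loops reduce to `vd[i] = vs2[i]` for every `i < vl`, so the two models produce the same result; for any nonzero offset they diverge. The model does not capture the overlap constraint itself, which is a restriction on register assignment rather than on the computed values.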
This patch is specific to subvector insertion, but the same pattern may show up elsewhere. We could rework this as a DAG combine if reviewers think that is worthwhile.
Should we do the same thing here? Would that catch the cases that the patch from @luke handles?