These instructions are semantically identical when the offset is 0,
except that vslideup has a vector register overlap constraint (its
destination may not overlap its source) and vslidedown doesn't. As a
result, we can prefer vslidedown, the one without the overlap
constraint, to improve register allocation flexibility.
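
To illustrate the equivalence, here is a minimal scalar sketch of the two
slides. It is not code from this patch: `Vec`, `slideUp`, and `slideDown`
are hypothetical names, `src.size()` stands in for VLMAX, and the model
assumes unmasked operation with tail elements left undisturbed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a vector register group; size() plays VLMAX.
using Vec = std::vector<std::uint32_t>;

// Simplified model of an unmasked vslideup: elements below `offset` keep
// their old destination value, the rest are taken from src[i - offset].
// Assumes dest.size() == src.size() and vl <= src.size().
Vec slideUp(Vec dest, const Vec &src, std::size_t offset, std::size_t vl) {
  for (std::size_t i = offset; i < vl; ++i)
    dest[i] = src[i - offset];
  return dest;
}

// Simplified model of an unmasked vslidedown: element i is taken from
// src[i + offset], or is zero when that index is past VLMAX.
// Assumes dest.size() == src.size() and vl <= src.size().
Vec slideDown(Vec dest, const Vec &src, std::size_t offset, std::size_t vl) {
  for (std::size_t i = 0; i < vl; ++i) {
    std::size_t j = i + offset;
    dest[i] = j < src.size() ? src[j] : 0;
  }
  return dest;
}
```

With `offset == 0` both loops reduce to `dest[i] = src[i]` for `i < vl`, so
under these assumptions the two results agree; they differ as soon as the
offset is nonzero.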

This patch implements https://reviews.llvm.org/D152298, but as a DAG
combine. It also catches a few more cases, including some with scalable
vectors.

Co-authored-by: preames@rivosinc.com