If we have an element-wise vmov immediate instruction then a subsequent vrev with width greater or equal to the vmov element width, then that vrev won't do anything. Add a DAG combine to convert bitcasts that would become such vrevs into vector_reg_casts instead.
I see. Because we are just swapping around the same values anyway. Makes sense.
Would it make sense to do this as a DAG fold? That way we would get MVE for free too.
Can you please run the update script on this with old codegen, pre-commit the result and then show the diffs against new codegen here.
It might help to make them hard-float too.
Do you mean do something in PerformDAGCombine? I don't think that would work, because the only way to do it that I can see would be to remove the bitcast but then we'd get an error due to a type mismatch.
We added a VECTOR_REG_CAST, which it like a bitcast but doesn't change the bits. Similar to the AArch64 NVCAST.
Not that you have to do this here, but it also might allow more patterns to be converted, removing move vrev's. Essentially if the immediate after it has been bitcast/vrev'd is still a legal immediate, we can just generate the new immediate and VECTOR_REG_CAST it.
I tried using update_llc_test_checks.py, but it doesn't really work well here because the compiler output is different when big-endian and little-endian (for the llc before this patch, and also in the tests that aren't to do with this patch) and the only way the script can cope with that is to have entirely separate check prefixes, which doesn't do a good job of checking that the code generated doesn't change depending on endianness.
The code looks OK. I think update_llc_test_checks should work, I've used it elsewhere in the past.
You may run into problems with zero vectors no longer being recognized? Like in the passthru of masked loads. Can you rebase this onto the MVE tests too?
Rebase and add adjust MVE masked load handling. Also make use of update_llc_checks - I tried it again and this time it did use a single CHECK when both little-endian and big-endian code generation are the same, so it looks like I was using it wrong somehow before (though I don't know in what way).