Page MenuHomePhabricator

[ARM] Avoid pointless vrev of element-wise vmov

Authored by john.brawn on Mar 20 2020, 10:19 AM.



If we have an element-wise vmov immediate instruction then a subsequent vrev with width greater or equal to the vmov element width, then that vrev won't do anything. Add a DAG combine to convert bitcasts that would become such vrevs into vector_reg_casts instead.

Diff Detail

Event Timeline

john.brawn created this revision.Mar 20 2020, 10:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 20 2020, 10:19 AM

I see. Because we are just swapping around the same values anyway. Makes sense.

Would it make sense to do this as a DAG fold? That way we would get MVE for free too.


Can you please run the update script on this with old codegen, pre-commit the result and then show the diffs against new codegen here.

It might help to make them hard-float too.

Would it make sense to do this as a DAG fold? That way we would get MVE for free too.

Do you mean do something in PerformDAGCombine? I don't think that would work, because the only way to do it that I can see would be to remove the bitcast but then we'd get an error due to a type mismatch.

We added a VECTOR_REG_CAST, which it like a bitcast but doesn't change the bits. Similar to the AArch64 NVCAST.

Not that you have to do this here, but it also might allow more patterns to be converted, removing move vrev's. Essentially if the immediate after it has been bitcast/vrev'd is still a legal immediate, we can just generate the new immediate and VECTOR_REG_CAST it.

Do this in ARMTargetLowering::PerformDAGCombine

john.brawn marked an inline comment as done.Mar 26 2020, 11:42 AM
john.brawn added inline comments.

I tried using, but it doesn't really work well here because the compiler output is different when big-endian and little-endian (for the llc before this patch, and also in the tests that aren't to do with this patch) and the only way the script can cope with that is to have entirely separate check prefixes, which doesn't do a good job of checking that the code generated doesn't change depending on endianness.

RKSimon added inline comments.

you can add common prefixes:


john.brawn marked an inline comment as done.Mar 27 2020, 5:12 AM
john.brawn added inline comments.
416–1 ignores everything but the last prefix (there's a FIXME in it about this).

The code looks OK. I think update_llc_test_checks should work, I've used it elsewhere in the past.

You may run into problems with zero vectors no longer being recognized? Like in the passthru of masked loads. Can you rebase this onto the MVE tests too?

john.brawn updated this revision to Diff 254806.Apr 3 2020, 8:05 AM
john.brawn edited the summary of this revision. (Show Details)

Rebase and add adjust MVE masked load handling. Also make use of update_llc_checks - I tried it again and this time it did use a single CHECK when both little-endian and big-endian code generation are the same, so it looks like I was using it wrong somehow before (though I don't know in what way).

dmgreen accepted this revision.Apr 3 2020, 8:40 AM

LGTM, Thanks.

This revision is now accepted and ready to land.Apr 3 2020, 8:40 AM
This revision was automatically updated to reflect the committed changes.