Instead of materializing a zero vector with, for example, dup v0.4s, wzr, which transfers between register files, use the more efficient movi v0.4s, #0.
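For context, here is a minimal sketch of the kind of source that asks for a zeroed 4 x 32-bit vector. The helper name zero4 is hypothetical and does not come from the patch. The sketch assumes a compiler that provides arm_neon.h on AArch64 and falls back to memset elsewhere. Which instruction is actually emitted depends on the compiler version and flags, so inspect the generated assembly rather than assuming either form.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not part of the patch): write four zeroed
 * 32-bit lanes to out. */
void zero4(uint32_t out[4]) {
    assert(out != NULL);
#if defined(__aarch64__)
    /* Request an all-zero vector; the backend chooses how to
     * materialize it. */
    uint32x4_t z = vdupq_n_u32(0);
    vst1q_u32(out, z);
#else
    /* Portable fallback so the sketch also builds on other targets. */
    memset(out, 0, 4 * sizeof out[0]);
#endif
}
```

Compiling this for an AArch64 target with assembly output enabled shows how the zero vector is materialized by a given compiler build.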
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
This looks good to me for Cortex cores (A57, A72), where movi and dup have the same cost, so this should be a (smallish) improvement there.
LGTM. It indeed seems like a sensible change, which the available cost models confirm.
This comment was removed by evandro.