This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Increase the cost of i1 inserts / extracts
ClosedPublic

Authored by dmgreen on May 23 2023, 1:08 AM.

Details

Summary

i1 inserts will need an extra cset, and i1 extracts need a cmp (or tst) in order to be used. This increase the cost of them a little to account for those extra instructions.
https://godbolt.org/z/3c5z4G7Mh

Diff Detail

Event Timeline

dmgreen created this revision.May 23 2023, 1:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2023, 1:08 AM
dmgreen requested review of this revision.May 23 2023, 1:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2023, 1:08 AM
SjoerdMeijer accepted this revision.May 23 2023, 1:25 AM

This is related to D142359, that change exposed poor codegen for these type of codes. I think increasing the cost a little makes sense.
Nice one, thanks for fixing.

This revision is now accepted and ready to land.May 23 2023, 1:25 AM
This revision was landed with ongoing or failed builds.Jun 1 2023, 2:55 AM
This revision was automatically updated to reflect the committed changes.

Hi David,

I'm seeing an interesting corner-case that this patch triggers on aarch64-linux-gnu at -O2 -flto -- it increases code size of 462.libquantum by 24%! Is this something interesting to investigate? I'll be happy to assist.

Hello. From looking at the precommit tests I had, they seem to show the opposite in libquantum. A small decrease in codesize, which in my testing had led to a performance improvement (but that may be a bit noisy). This would be O3 or Ofast though.

I would expect this patch to cause less SLP vectorization, and that should usually cause a codesize to increase a little. But the amounts you mention are much bigger than I would expect from that. Maybe there is different inlining happening now?