i1 inserts will need an extra cset, and i1 extracts need a cmp (or tst) in order to be used. This increases their cost a little to account for those extra instructions.
https://godbolt.org/z/3c5z4G7Mh
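As a rough sketch of the idea (not the actual patch), the adjustment amounts to adding one to the insert/extract cost when the element type is i1. The names `VecOp` and `vectorInstrCost`, and the `baseCost` parameter, are hypothetical and only stand in for the real cost-model code.

```cpp
// Hypothetical, simplified stand-in for a vector insert/extract cost query.
// None of these names are LLVM APIs; baseCost is whatever the cost model
// would have returned before this change.
enum class VecOp { InsertElement, ExtractElement };

constexpr int vectorInstrCost(VecOp op, unsigned elementBits, int baseCost) {
    int cost = baseCost;
    // i1 elements cannot be used directly: an insert needs an extra cset
    // to materialize the bit, and an extract needs a cmp (or tst) before
    // the value can be used. Charge one extra instruction in both cases.
    if (elementBits == 1 &&
        (op == VecOp::InsertElement || op == VecOp::ExtractElement))
        cost += 1;
    return cost;
}
```

With this shape, wider element types keep their existing cost and only i1 inserts and extracts become slightly more expensive.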
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This is related to D142359; that change exposed poor codegen for this type of code. I think increasing the cost a little makes sense.
Nice one, thanks for fixing.
Hi David,
I'm seeing an interesting corner-case that this patch triggers on aarch64-linux-gnu at -O2 -flto -- it increases code size of 462.libquantum by 24%! Is this something interesting to investigate? I'll be happy to assist.
Hello. The precommit tests I had seem to show the opposite for libquantum: a small decrease in codesize, which in my testing led to a performance improvement (though that may be a bit noisy). That was at O3 or Ofast, though.
I would expect this patch to cause less SLP vectorization, and that should usually cause codesize to increase a little. But the amounts you mention are much bigger than I would expect from that. Maybe there is different inlining happening now?