-Lower AVGFloor(A, B) to:
SRL(A) + SRL(B) + (A&B)&1.
-Lower AVGCeil(A, B) to:
SRL(A) + SRL(B) + (A|B)&1.
Differential D143283
[AArch64][SVE]: custom lower AVGFloor/AVGCeil. hassnaa-arm on Feb 3 2023, 9:57 AM. Authored by
Details -Lower AVGFloor(A, B) to: SRL(A) + SRL(B) + (A&B)&1. -Lower AVGCeil(A, B) to: SRL(A) + SRL(B) + (A|B)&1.
Diff Detail
Event TimelineComment Actions This patch doesn't seem like it is in a state to review yet (e.g. it doesn't have any tests), perhaps you can add 'WIP' to the title to make it clear that this is a Work In Progress?
Comment Actions Please pre-commit the new testcases, so the changes are more visible. The <vscale x 2 x i128> variations currently crash the compiler; I'd recommend fixing that before trying to optimize them. The "expected" code for haddu_v2i32 is worse than what we currently generate. Comment Actions Optimize the generated code by checking if the extended node was previously truncated. Comment Actions Optimize the generated code by checking if the extended nodes were previously truncated.
Comment Actions Hello. Sorry for the delay in looking at this but I wasn't sure exactly what you were trying to do, and I've never been a huge fan of DAG combines that create the wrong node just to expand it later. It looks like for legal types this can lead to a nice decrease in instruction count though. For smaller types I'm not sure that checking for individual opcodes for extension will work well. They could already be extending loads for example. I've not thought about it too much yet, but As far as I understand from the original hadd (/avg) work it would probably be better to be checking that the top bits are known 0 for unsigned, and that there are >1 sign bits for the signed cases. In any case, it would be good to see alive proofs for the transforms you are making.
Comment Actions Remove sve-avgfloor testing file.
Comment Actions Hi - I was just looking at the patch whilst you updated it! Please ignore any comments that don't apply any more.
Comment Actions Thanks. here are some alive proofs for the transform in https://alive2.llvm.org/ce/z/N6hwQY and https://alive2.llvm.org/ce/z/u_GjYJ. Can you extend the testing to include both ashr and lshr versions? They should both be useful if we are custom legalizing the nodes. Otherwise I think this looks good.
Comment Actions I think it's worth adding test for both the ashr and lshr versions, but otherwise I think this LGTM. Thanks
Comment Actions Thanks for all the changes @hassnaa-arm, I've just left some final minor comments.
Comment Actions Thanks for the changes @hassnaa-arm, I'm satisfied with the patch now so removing my 'requesting changes'. |
Just as a suggestion for future uses of these pattern matches, is that you could write this in a more compact way like this: