-Lower AVGFloor(A, B) to:
SRL(A) + SRL(B) + (A&B)&1.
-Lower AVGCeil(A, B) to:
SRL(A) + SRL(B) + (A|B)&1.
Paths
| Differential D143283
[AArch64][SVE]: custom lower AVGFloor/AVGCeil. ClosedPublic Authored by hassnaa-arm on Feb 3 2023, 9:57 AM.
Details Summary -Lower AVGFloor(A, B) to: SRL(A) + SRL(B) + (A&B)&1. -Lower AVGCeil(A, B) to: SRL(A) + SRL(B) + (A|B)&1.
Diff Detail
Unit TestsFailed Event TimelineComment Actions This patch doesn't seem like it is in a state to review yet (e.g. it doesn't have any tests), perhaps you can add 'WIP' to the title to make it clear that this is a Work In Progress?
This revision now requires changes to proceed.Feb 6 2023, 12:58 AM hassnaa-arm retitled this revision from [Transform][InstCombine]: transform lshr pattern. to [Transform][InstCombine]: transform lshr pattern. [WIP].Feb 6 2023, 2:01 AM hassnaa-arm retitled this revision from [Transform][InstCombine]: transform lshr pattern. [WIP] to [AArch64][combine]: transform lshr pattern. [WIP].Feb 7 2023, 10:28 AM hassnaa-arm retitled this revision from [AArch64][combine]: transform lshr pattern. [WIP] to [AArch64][combine]: combine lshr pattern. [WIP].Feb 7 2023, 10:28 AM
hassnaa-arm retitled this revision from [AArch64][combine]: combine lshr pattern. [WIP] to [AArch64][SVE]: custom lower AVGFloor/AVGCeil. [WIP].Feb 10 2023, 4:02 AM Comment Actions Please pre-commit the new testcases, so the changes are more visible. The <vscale x 2 x i128> variations currently crash the compiler; I'd recommend fixing that before trying to optimize them. The "expected" code for haddu_v2i32 is worse than what we currently generate. Comment Actions Optimize the generated code by checking if the extended node was previously truncated. Comment Actions Optimize the generated code by checking if the extended nodes were previously truncated. hassnaa-arm retitled this revision from [AArch64][SVE]: custom lower AVGFloor/AVGCeil. [WIP] to [AArch64][SVE]: custom lower AVGFloor/AVGCeil..Feb 20 2023, 2:04 AM
hassnaa-arm marked 4 inline comments as done. Comment ActionsCheck if it's better to emit the original code or custom lower AVGFloor/Ceil
hassnaa-arm added inline comments.
Comment Actions Hello. Sorry for the delay in looking at this but I wasn't sure exactly what you were trying to do, and I've never been a huge fan of DAG combines that create the wrong node just to expand it later. It looks like for legal types this can lead to a nice decrease in instruction count though. For smaller types I'm not sure that checking for individual opcodes for extension will work well. They could already be extending loads for example. I've not thought about it too much yet, but As far as I understand from the original hadd (/avg) work it would probably be better to be checking that the top bits are known 0 for unsigned, and that there are >1 sign bits for the signed cases. In any case, it would be good to see alive proofs for the transforms you are making.
hassnaa-arm marked 6 inline comments as done. Comment ActionsRemove sve-avgfloor testing file.
Comment Actions Hi - I was just looking at the patch whilst you updated it! Please ignore any comments that don't apply any more.
hassnaa-arm added inline comments.
Comment Actions Thanks. here are some alive proofs for the transform in https://alive2.llvm.org/ce/z/N6hwQY and https://alive2.llvm.org/ce/z/u_GjYJ. Can you extend the testing to include both ashr and lshr versions? They should both be useful if we are custom legalizing the nodes. Otherwise I think this looks good.
Comment Actions I think it's worth adding test for both the ashr and lshr versions, but otherwise I think this LGTM. Thanks
Comment Actions Thanks for all the changes @hassnaa-arm, I've just left some final minor comments.
hassnaa-arm added inline comments.
Comment Actions Thanks for the changes @hassnaa-arm, I'm satisfied with the patch now so removing my 'requesting changes'. This revision is now accepted and ready to land.Mar 9 2023, 6:14 AM This revision was landed with ongoing or failed builds.Mar 13 2023, 12:01 PM Closed by commit rG40a51e1afce9: [AArch64][SVE]: custom lower AVGFloor/AVGCeil. (authored by Hassnaa Hamdi <hassnaa.hamdi@arm.com>). · Explain Why This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 503500 llvm/lib/Target/AArch64/AArch64ISelLowering.h
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
llvm/test/CodeGen/AArch64/sve-hadd.ll
llvm/test/CodeGen/AArch64/sve2-hadd.ll
|
This can be moved into the start of LowerAVG to make this function a little more regular. (I know it's not super regular already, but other parts already do the same thing).