This patch is a separate patch for the lowering fix mentioned in https://reviews.llvm.org/D119010.
TODO - Create this patch on top of https://reviews.llvm.org/D113291 (as a child of D113291).
TODO - Add unit test
Differential D120462
[AArch64InstrInfo.td] - Lowering fix for cttz intrinsic
Authored by gsocshubham on Feb 24 2022, 1:36 AM
Comment Actions
Yeah, this needs some tests to explain what you think it is improving. It seems to just add an extra instruction where it isn't needed. Was it intended to improve some larger code pattern?
Comment Actions
It was intended to generate assembly matching GCC:

    f(unsigned int):
        rbit    w0, w0
        clz     w0, w0
        and     w0, w0, 31
        ret

but the patch present here, https://reviews.llvm.org/D113291, is generating the assembly below:

    rbit    w8, w0
    cmp     w0, #0
    clz     w8, w8
    csel    w0, wzr, w8, eq
    ret

Hence, to match the GCC assembly, I modified the patch to remove the cmp and csel instructions and generate:

    rbit    w8, w0
    clz     w8, w8
    and     w0, w8, #0x1f
    ret

Comment Actions
How does the csel get removed? There's no code here that detects that. To me it looks like it will now emit:

    rbit    w8, w0
    cmp     w0, #0
    clz     w8, w8
    and     w0, w8, #0x1f
    csel    w0, wzr, w8, eq
    ret

It also looks like this patch would break the codegen for:

    define i32 @foo(i32 %x) {
      %a = call i32 @llvm.cttz.i32(i32 %x, i1 0)
      ret i32 %a
    }

since you'll now add an AND that shouldn't be there. There are test cases in llvm/test/CodeGen/AArch64/dp1.ll that fail with this patch and should demonstrate that issue.

Comment Actions
Yes. It emits the above assembly with the current patch,
which emits:

    rbit    w8, w0
    clz     w8, w8
    and     w0, w8, #0x1f
    ret

This is a modification of the patch submitted by the original author, which was generating:

    rbit    w8, w0
    cmp     w0, #0
    clz     w8, w8
    csel    w0, wzr, w8, eq
    ret
Comment Actions
Understood. Yes, the change causes the regression. I will work on it, and also on the changes you requested on the other patch.

a. What I would like to know is whether the approach I have followed, i.e. removing the cmp and csel changes from https://reviews.llvm.org/D119010 so that the generated assembly becomes (which I am able to do):

    rbit    w8, w0
    clz     w8, w8
    and     w0, w8, #0x1f
    ret

is acceptable, since it is quite similar to what GCC currently emits:

    f(unsigned int):
        rbit    w0, w0
        clz     w0, w0
        and     w0, w0, 31
        ret

b. My aim in fixing this issue is to generate assembly similar to what the GCC AArch64 backend currently generates. Please tell me if I am wrong here. Once that is clear, i.e. if the aim is to generate the same assembly as GCC, I will work on the regression and the other comments. But first, I want to validate my approach.
Comment Actions
The correct fix is to generate (select (icmp eq X, 0), 0, (cttz X)) from AggressiveInstCombine. Then, in performSelectCombine in AArch64ISelLowering.cpp, detect the select pattern and replace it with an ISD::AND. The cttz intrinsic by itself must be able to return 32, so you can't put an AND after every cttz. You have to look for a select that uses 0 when cttz returns 32.

Comment Actions
This review may be stuck/dead; consider abandoning it if it is no longer relevant.
Comment Actions
This doesn't make sense to me. If you have a plain cttz operation you don't need an AND.

I thought the problem was that you have (select (seteq X, 0), 0, (cttz X)) and you need to turn the select and setcc into an AND. That needs to be done as a DAGCombine.