This is an archive of the discontinued LLVM Phabricator instance.

[AArch64InstrInfo.td] - Lowering fix for cttz intrinsic
Needs Revision · Public

Authored by gsocshubham on Feb 24 2022, 1:36 AM.

Details

Summary

This patch is a separate patch for the lowering fix mentioned in https://reviews.llvm.org/D119010.

TODO - Create this patch on top of https://reviews.llvm.org/D113291 (as a child of D113291).

TODO - Add unit test

Diff Detail

Event Timeline

gsocshubham created this revision.Feb 24 2022, 1:36 AM
gsocshubham requested review of this revision.Feb 24 2022, 1:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 24 2022, 1:36 AM

TODO - Add unit test

Yeah, this needs some tests to explain what you think it is improving. It seems to just add an extra instruction where it isn't needed. Was it intended to improve some larger code pattern?

craig.topper added inline comments.Feb 24 2022, 10:39 AM
llvm/lib/Target/AArch64/AArch64InstrInfo.td
2053

This doesn't make sense to me. If you have a plain cttz operation you don't need an AND.

I thought the problem was that you have (select (seteq X, 0), 0, (cttz X)) and you need to turn the select and setcc into an AND. That needs to be done as a DAGCombine.

craig.topper requested changes to this revision.Feb 24 2022, 10:39 AM
This revision now requires changes to proceed.Feb 24 2022, 10:39 AM

TODO - Add unit test

Yeah, this needs some tests to explain what you think it is improving. It seems to just add an extra instruction where it isn't needed. Was it intended to improve some larger code pattern?

It was intended to generate the same assembly as GCC -

f(unsigned int):
        rbit    w0, w0
        clz     w0, w0
        and     w0, w0, 31
        ret

but the patch here - https://reviews.llvm.org/D113291 - generates the assembly below -

rbit w8, w0
cmp w0, #0
clz w8, w8
csel w0, wzr, w8, eq
ret

Hence, to match the GCC assembly, I modified the patch to remove the cmp and csel instructions and generate the assembly below -

rbit    w8, w0
clz     w8, w8
and     w0, w8, #0x1f
ret
craig.topper added a comment.EditedFeb 27 2022, 10:34 AM

TODO - Add unit test

Yeah, this needs some tests to explain what you think it is improving. It seems to just add an extra instruction where it isn't needed. Was it intended to improve some larger code pattern?

It was intended to generate the same assembly as GCC -

f(unsigned int):
        rbit    w0, w0
        clz     w0, w0
        and     w0, w0, 31
        ret

but the patch here - https://reviews.llvm.org/D113291 - generates the assembly below -

rbit w8, w0
cmp w0, #0
clz w8, w8
csel w0, wzr, w8, eq
ret

Hence, to match the GCC assembly, I modified the patch to remove the cmp and csel instructions and generate the assembly below -

rbit    w8, w0
clz     w8, w8
and     w0, w8, #0x1f
ret

How does the csel get removed? There's no code here that detects that. To me it looks like it will now emit

rbit w8, w0
cmp w0, #0
clz w8, w8
and     w0, w8, #0x1f
csel w0, wzr, w8, eq
ret

It also looks like this patch would break the codegen for

define i32 @foo(i32 %x) {
  %a = call i32 @llvm.cttz.i32(i32 %x, i1 0)
  ret i32 %a
}

As you'll now add an AND that shouldn't be there.

There are test cases in llvm/test/CodeGen/AArch64/dp1.ll that fail with this patch that should demonstrate that issue.

TODO - Add unit test

Yeah, this needs some tests to explain what you think it is improving. It seems to just add an extra instruction where it isn't needed. Was it intended to improve some larger code pattern?

It was intended to generate the same assembly as GCC -

f(unsigned int):
        rbit    w0, w0
        clz     w0, w0
        and     w0, w0, 31
        ret

but the patch here - https://reviews.llvm.org/D113291 - generates the assembly below -

rbit w8, w0
cmp w0, #0
clz w8, w8
csel w0, wzr, w8, eq
ret

Hence, to match the GCC assembly, I modified the patch to remove the cmp and csel instructions and generate the assembly below -

rbit    w8, w0
clz     w8, w8
and     w0, w8, #0x1f
ret

How does the csel get removed? There's no code here that detects that. To me it looks like it will now emit

rbit w8, w0
cmp w0, #0
clz w8, w8
and     w0, w8, #0x1f
csel w0, wzr, w8, eq
ret

Yes, the current patch emits the assembly above.

  1. Earlier, I had submitted the patch https://reviews.llvm.org/D119010, which emits -

rbit    w8, w0
clz     w8, w8
and     w0, w8, #0x1f
ret

which is a modification of the patch submitted by the original author, which was generating this assembly -

rbit w8, w0
cmp w0, #0
clz w8, w8
csel w0, wzr, w8, eq
ret
  2. Hence, to solve the issue, I have proposed generating assembly similar to that of GCC upstream.
  3. The current patch, https://reviews.llvm.org/D120462, contains just the target-dependent changes from https://reviews.llvm.org/D119010, since it was difficult there to review both the intrinsic identification and the AArch64InstrInfo.td changes.

It also looks like this patch would break the codegen for

define i32 @foo(i32 %x) {
  %a = call i32 @llvm.cttz.i32(i32 %x, i1 0)
  ret i32 %a
}

As you'll now add an AND that shouldn't be there.

There are test cases in llvm/test/CodeGen/AArch64/dp1.ll that fail with this patch that should demonstrate that issue.

Understood. Yes, the changes cause the regression. I will work on it, and also on the changes you requested on the other patch.

a. What I would like to know is whether the approach I have followed is acceptable, i.e. removing the cmp and csel instructions from https://reviews.llvm.org/D119010 so as to generate the assembly below, which I am able to do -

rbit    w8, w0
clz     w8, w8
and     w0, w8, #0x1f
ret

which is quite similar to what GCC currently emits -

f(unsigned int):
        rbit    w0, w0
        clz     w0, w0
        and     w0, w0, 31
        ret

b. My aim in fixing the issue is to generate assembly similar to what the GCC AArch64 backend currently generates. Please tell me if I am wrong here. Once that is clear, i.e. if the aim is to generate the same assembly as GCC, I will work on the regression and the other comments. But first, I want to validate my approach.

llvm/lib/Target/AArch64/AArch64InstrInfo.td
2053

Thanks for the comment. I will take a look at it.

Is generating the assembly below not acceptable?

rbit    w8, w0
clz     w8, w8
and     w0, w8, #0x1f
ret

If not, please suggest what fix is needed, apart from identifying the cttz intrinsic, which the patch https://reviews.llvm.org/D113291 is already doing.

The correct fix is to generate (select (icmp eq X, 0), 0, (cttz X)) from AggressiveInstCombine. Then, in performSelectCombine in AArch64ISelLowering.cpp, detect the select pattern and replace it with an ISD::AND.

The cttz intrinsic by itself must be able to return 32. You can't put an AND after any cttz. You have to look for a select that uses 0 when cttz returns 32.

Herald added a project: Restricted Project. · View Herald TranscriptApr 14 2022, 12:48 AM
Herald added a subscriber: StephenFan. · View Herald Transcript
lebedev.ri resigned from this revision.Jan 12 2023, 4:58 PM

This review may be stuck/dead, consider abandoning if no longer relevant.
Removing myself as reviewer in attempt to clean dashboard.