This patch is a separate patch for the lowering fix mentioned in https://reviews.llvm.org/D119010.
TODO - Create this patch on top of https://reviews.llvm.org/D113291 (as a child of D113291).
TODO - Add unit test
Differential D120462
[AArch64InstrInfo.td] - Lowering fix for cttz intrinsic
Authored by gsocshubham on Feb 24 2022, 1:36 AM
Comment Actions
Yeah, this needs some tests to explain what you think it is improving. It seems to just add an extra instruction where it isn't needed. Was it intended to improve some larger code pattern?
Comment Actions
It was intended to generate assembly matching GCC:

    f(unsigned int):
        rbit    w0, w0
        clz     w0, w0
        and     w0, w0, 31
        ret

but the patch present here, https://reviews.llvm.org/D113291, is generating the assembly below:

    rbit    w8, w0
    cmp     w0, #0
    clz     w8, w8
    csel    w0, wzr, w8, eq
    ret

Hence, to match the GCC assembly, I modified the patch to remove the cmp and csel instructions and generate:

    rbit    w8, w0
    clz     w8, w8
    and     w0, w8, #0x1f
    ret

Comment Actions
How does the csel get removed? There's no code here that detects that. To me it looks like it will now emit:

    rbit    w8, w0
    cmp     w0, #0
    clz     w8, w8
    and     w0, w8, #0x1f
    csel    w0, wzr, w8, eq
    ret

It also looks like this patch would break the codegen for:

    define i32 @foo(i32 %x) {
      %a = call i32 @llvm.cttz.i32(i32 %x, i1 0)
      ret i32 %a
    }

since you'll now add an AND that shouldn't be there. There are test cases in llvm/test/CodeGen/AArch64/dp1.ll that fail with this patch and should demonstrate that issue.

Comment Actions
Yes. It emits the above assembly with the current patch,
which emits:

    rbit    w8, w0
    clz     w8, w8
    and     w0, w8, #0x1f
    ret

This is a modification of the patch submitted by the original author, which was generating:

    rbit    w8, w0
    cmp     w0, #0
    clz     w8, w8
    csel    w0, wzr, w8, eq
    ret
Comment Actions
Understood. Yes, the change causes the regression. I will work on it, and also on the changes you requested on the other patch.

a. What I would like to know is whether the approach I have followed, i.e. removing the cmp and csel changes from https://reviews.llvm.org/D119010 so that the generated assembly becomes (which I am able to do):

    rbit    w8, w0
    clz     w8, w8
    and     w0, w8, #0x1f
    ret

is acceptable, since it is quite similar to what GCC currently emits:

    f(unsigned int):
        rbit    w0, w0
        clz     w0, w0
        and     w0, w0, 31
        ret

b. My aim in fixing this issue is to generate assembly similar to what the GCC AArch64 backend currently generates. Please tell me if I am wrong here. Once that is clear, i.e. if the aim is to generate the same assembly as GCC, I will work on the regression and the other comments. But first, I want to validate my approach.
Comment Actions
The correct fix is to generate (select (icmp eq X, 0), 0, (cttz X)) from AggressiveInstCombine. Then, in performSelectCombine in AArch64ISelLowering.cpp, detect the select pattern and replace it with an ISD::AND. The cttz intrinsic by itself must be able to return 32, so you can't put an AND after every cttz. You have to look for a select that uses 0 when cttz returns 32.

Comment Actions
This review may be stuck/dead; consider abandoning it if it is no longer relevant.
Comment Actions
This doesn't make sense to me. If you have a plain cttz operation you don't need an AND.

I thought the problem was that you have (select (seteq X, 0), 0, (cttz X)) and you need to turn the select and setcc into an AND. That needs to be done as a DAGCombine.