This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] isel (add (and X, 0x1FFFFFFFE), Y) as (SH1ADD (SRLI X, 1), Y)
Closed · Public

Authored by craig.topper on May 27 2022, 10:48 PM.


Details

Reviewers

reames
asb
luismarques
jrtc27

Commits

rG6a6cf2e28db5: [RISCV] isel (add (and X, 0x1FFFFFFFE), Y) as (SH1ADD (SRLI X, 1), Y)

Summary

This pattern is what we get after DAG combine for C code like this.

short *ptr1, *ptr2, *ptr3;
unsigned diff = ptr1 - ptr2;
return ptr3[diff];
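The snippet above is abbreviated. As a minimal sketch, it can be wrapped in a complete function, with two helpers showing why the byte offset ends up as a single AND with 0x1FFFFFFFE. The function names here are hypothetical and do not come from the patch; the two offset helpers only model the arithmetic visible in the test IR (lshr by 1, and with 0xFFFFFFFF, i16 indexing), assuming a 64-bit target with a 32-bit unsigned.

```c
#include <stddef.h>
#include <stdint.h>

/* The summary's snippet wrapped in a hypothetical function. ptr1 and ptr2
   must point into the same array for the subtraction to be defined. */
short load_at_unsigned_diff(const short *ptr1, const short *ptr2,
                            const short *ptr3) {
    ptrdiff_t elems = ptr1 - ptr2;   /* signed element count */
    unsigned diff = (unsigned)elems; /* truncated to 32 bits */
    return ptr3[diff];               /* index is zero-extended, then scaled by 2 */
}

/* Byte offset as written in the IR: shift the byte difference right by 1,
   keep the low 32 bits, then scale the i16 index by 2. */
uint64_t offset_shift_mask_scale(uint64_t byte_diff) {
    uint64_t index = (byte_diff >> 1) & 0xFFFFFFFFull;
    return index << 1;
}

/* The same value as one AND, which is the form the new patterns match. */
uint64_t offset_single_mask(uint64_t byte_diff) {
    return byte_diff & 0x1FFFFFFFEull;
}
```

The two offset helpers agree for every input because both keep exactly bits 1 through 32 of the byte difference.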

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

craig.topper created this revision. May 27 2022, 10:48 PM

Herald added a project: Restricted Project. May 27 2022, 10:48 PM

Herald added subscribers: sunshaoce, VincentWu, luke957 and 27 others.

craig.topper requested review of this revision. May 27 2022, 10:48 PM

Herald added a project: Restricted Project. May 27 2022, 10:48 PM

Herald added subscribers: pcwang-thead, eopXD, MaskRay.

Harbormaster completed remote builds in B166750: Diff 432695. May 27 2022, 10:49 PM

craig.topper added a parent revision: D126589: Add test cases showing missed opportunity to use slli.uw or shXadd.uw. NFC. May 27 2022, 10:49 PM

LGTM as is.

And a thought for you; is there a more generic form of this?

Instead of going straight to shNadd, could we use zext.w instead? Something along the lines of:
srli a0, a0, N     # mask low
zext.w a0, a0      # mask high
slli a0, a0, N     # restore position
add a0, a0, a1     # add other value

The first three seem like a generic pattern for (and X, C) where C is any contiguous 32-bit mask.

From this form, we could then recognize the shNadd from the last three, right?

Not sure if this works, and if it does, not a required follow up. Just an idea for you to think about.
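As a rough check of the generic claim, the first three instructions can be modeled in C and compared with a plain AND. This is only a sketch: the function names are hypothetical, and it assumes the mask is 32 contiguous ones starting at bit N, with N between 0 and 32.

```c
#include <stdint.h>

/* Model of srli + zext.w + slli for a shift amount n in 0..32. */
uint64_t and_via_shifts(uint64_t x, unsigned n) {
    uint64_t t = x >> n;       /* srli: drop the low n bits */
    t = (uint64_t)(uint32_t)t; /* zext.w: clear bits 63..32 */
    return t << n;             /* slli: restore position */
}

/* Reference: AND with 32 contiguous ones starting at bit n. */
uint64_t and_with_mask(uint64_t x, unsigned n) {
    return x & (0xFFFFFFFFull << n);
}
```

Both functions keep bits n through n+31 of x and clear the rest, so they agree for every n in that range.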

This revision is now accepted and ready to land. May 28 2022, 11:40 AM

In D126588#3544110, @reames wrote:

LGTM as is.

And a thought for you; is there a more generic form of this?

Instead of going straight to shNadd, could we use zext.w instead? Something along the lines of:
srli a0, a0, N     # mask low
zext.w a0, a0      # mask high
slli a0, a0, N     # restore position
add a0, a0, a1     # add other value

The first three seem like a generic pattern for (and X, C) where C is any contiguous 32-bit mask.

From this form, we could then recognize the shNadd from the last three, right?

Not sure if this works, and if it does, not a required follow up. Just an idea for you to think about.

Where would we recognize the shNadd.uw? Post process peephole? Machine IR?

I had thought about trying to match the AND and 32-bit mask to srli+slli.uw later in MachineIR after LICM in case the mask can be pulled out of a loop. That would leave only an AND in the loop which would be cheaper than 2 instructions. The srli+shNadd.uw I've done here seemed like it was always profitable even in a loop.

This revision was landed with ongoing or failed builds. May 29 2022, 6:40 PM

Closed by commit rG6a6cf2e28db5: [RISCV] isel (add (and X, 0x1FFFFFFFE), Y) as (SH1ADD (SRLI X, 1), Y) (authored by craig.topper).

This revision was automatically updated to reflect the committed changes.

craig.topper added a commit: rG6a6cf2e28db5: [RISCV] isel (add (and X, 0x1FFFFFFFE), Y) as (SH1ADD (SRLI X, 1), Y).

In D126588#3544186, @craig.topper wrote:


Where would we recognize the shNadd.uw? Post process peephole? Machine IR?

Shouldn't DAGCombine be able to do this? (Might be wrong here, still new to this part of things.)

I had thought about trying to match the AND and 32-bit mask to srli+slli.uw later in MachineIR after LICM in case the mask can be pulled out of a loop. That would leave only an AND in the loop which would be cheaper than 2 instructions. The srli+shNadd.uw I've done here seemed like it was always profitable even in a loop.

I see your point here. I think I agree.

Revision Contents

Path                                        Size
llvm/lib/Target/RISCV/RISCVInstrInfoZb.td   8 lines
llvm/test/CodeGen/RISCV/rv64zba.ll          75 lines

Diff 432814

llvm/lib/Target/RISCV/RISCVInstrInfoZb.td

 def : Pat<(i64 (add (shl (and GPR:$rs1, 0xFFFFFFFF), (i64 3)), non_imm12:$rs2)),
           (SH3ADD_UW GPR:$rs1, GPR:$rs2)>;

 def : Pat<(i64 (add (and (shl GPR:$rs1, (i64 1)), 0x1FFFFFFFF), non_imm12:$rs2)),
           (SH1ADD_UW GPR:$rs1, GPR:$rs2)>;
 def : Pat<(i64 (add (and (shl GPR:$rs1, (i64 2)), 0x3FFFFFFFF), non_imm12:$rs2)),
           (SH2ADD_UW GPR:$rs1, GPR:$rs2)>;
 def : Pat<(i64 (add (and (shl GPR:$rs1, (i64 3)), 0x7FFFFFFFF), non_imm12:$rs2)),
           (SH3ADD_UW GPR:$rs1, GPR:$rs2)>;
+
+// Use SRLI to clear the LSBs and SHXADD_UW to mask and shift.
+def : Pat<(i64 (add (and GPR:$rs1, 0x1FFFFFFFE), non_imm12:$rs2)),
+          (SH1ADD_UW (SRLI GPR:$rs1, 1), GPR:$rs2)>;
+def : Pat<(i64 (add (and GPR:$rs1, 0x3FFFFFFFC), non_imm12:$rs2)),
+          (SH2ADD_UW (SRLI GPR:$rs1, 2), GPR:$rs2)>;
+def : Pat<(i64 (add (and GPR:$rs1, 0x7FFFFFFF8), non_imm12:$rs2)),
+          (SH3ADD_UW (SRLI GPR:$rs1, 3), GPR:$rs2)>;
 } // Predicates = [HasStdExtZba, IsRV64]

 let Predicates = [HasStdExtZbcOrZbkc] in {
 def : PatGprGpr<int_riscv_clmul, CLMUL>;
 def : PatGprGpr<int_riscv_clmulh, CLMULH>;
 } // Predicates = [HasStdExtZbcOrZbkc]

 let Predicates = [HasStdExtZbc] in
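The three added patterns rely on the identity (X & (0xFFFFFFFF << N)) + Y == SHNADD_UW(SRLI(X, N), Y) for N = 1, 2, 3. A small C model makes this easy to check. It is a sketch only: the helper names are hypothetical, and shxadd_uw follows the Zba description of sh1add.uw/sh2add.uw/sh3add.uw (zero-extend the low 32 bits of rs1, shift left, add rs2).

```c
#include <stdint.h>

/* SRLI: logical shift right by an immediate. */
uint64_t srli(uint64_t rs1, unsigned shamt) { return rs1 >> shamt; }

/* SHxADD.UW: zero-extend the low 32 bits of rs1, shift left by shamt
   (1, 2 or 3), and add rs2. */
uint64_t shxadd_uw(uint64_t rs1, uint64_t rs2, unsigned shamt) {
    return rs2 + ((uint64_t)(uint32_t)rs1 << shamt);
}

/* Source side of the patterns: (add (and X, C), Y) with C = 0xFFFFFFFF << shamt. */
uint64_t matched_dag(uint64_t x, uint64_t y, unsigned shamt) {
    uint64_t mask = 0xFFFFFFFFull << shamt;
    return (x & mask) + y;
}

/* Result side of the patterns: (SHxADD_UW (SRLI X, shamt), Y). */
uint64_t selected_insns(uint64_t x, uint64_t y, unsigned shamt) {
    return shxadd_uw(srli(x, shamt), y, shamt);
}
```

The SRLI clears the low bits of the mask and the zero-extension inside SHxADD.UW clears the high bits, so no constant has to be materialized in a register.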

llvm/test/CodeGen/RISCV/rv64zba.ll

 ; RV64ZBAZBB-NEXT: ret
   %ext = sext i16 %a to i32
   %1 = ashr i32 %ext, 9
   ret i32 %1
 }

 ; This the IR you get from InstCombine if take the difference of 2 pointers and
 ; cast is to unsigned before using as an index.
 define signext i16 @sh1adduw_ptrdiff(i64 %diff, i16* %baseptr) {
-; CHECK-LABEL: sh1adduw_ptrdiff:
-; CHECK: # %bb.0:
-; CHECK-NEXT: li a2, 1
-; CHECK-NEXT: slli a2, a2, 33
-; CHECK-NEXT: addi a2, a2, -2
-; CHECK-NEXT: and a0, a0, a2
-; CHECK-NEXT: add a0, a1, a0
-; CHECK-NEXT: lh a0, 0(a0)
-; CHECK-NEXT: ret
+; RV64I-LABEL: sh1adduw_ptrdiff:
+; RV64I: # %bb.0:
+; RV64I-NEXT: li a2, 1
+; RV64I-NEXT: slli a2, a2, 33
+; RV64I-NEXT: addi a2, a2, -2
+; RV64I-NEXT: and a0, a0, a2
+; RV64I-NEXT: add a0, a1, a0
+; RV64I-NEXT: lh a0, 0(a0)
+; RV64I-NEXT: ret
+;
+; RV64ZBA-LABEL: sh1adduw_ptrdiff:
+; RV64ZBA: # %bb.0:
+; RV64ZBA-NEXT: srli a0, a0, 1
+; RV64ZBA-NEXT: sh1add.uw a0, a0, a1
+; RV64ZBA-NEXT: lh a0, 0(a0)
+; RV64ZBA-NEXT: ret
   %ptrdiff = lshr exact i64 %diff, 1
   %cast = and i64 %ptrdiff, 4294967295
   %ptr = getelementptr inbounds i16, i16* %baseptr, i64 %cast
   %res = load i16, i16* %ptr
   ret i16 %res
 }

 define signext i32 @sh2adduw_ptrdiff(i64 %diff, i32* %baseptr) {
-; CHECK-LABEL: sh2adduw_ptrdiff:
-; CHECK: # %bb.0:
-; CHECK-NEXT: li a2, 1
-; CHECK-NEXT: slli a2, a2, 34
-; CHECK-NEXT: addi a2, a2, -4
-; CHECK-NEXT: and a0, a0, a2
-; CHECK-NEXT: add a0, a1, a0
-; CHECK-NEXT: lw a0, 0(a0)
-; CHECK-NEXT: ret
+; RV64I-LABEL: sh2adduw_ptrdiff:
+; RV64I: # %bb.0:
+; RV64I-NEXT: li a2, 1
+; RV64I-NEXT: slli a2, a2, 34
+; RV64I-NEXT: addi a2, a2, -4
+; RV64I-NEXT: and a0, a0, a2
+; RV64I-NEXT: add a0, a1, a0
+; RV64I-NEXT: lw a0, 0(a0)
+; RV64I-NEXT: ret
+;
+; RV64ZBA-LABEL: sh2adduw_ptrdiff:
+; RV64ZBA: # %bb.0:
+; RV64ZBA-NEXT: srli a0, a0, 2
+; RV64ZBA-NEXT: sh2add.uw a0, a0, a1
+; RV64ZBA-NEXT: lw a0, 0(a0)
+; RV64ZBA-NEXT: ret
   %ptrdiff = lshr exact i64 %diff, 2
   %cast = and i64 %ptrdiff, 4294967295
   %ptr = getelementptr inbounds i32, i32* %baseptr, i64 %cast
   %res = load i32, i32* %ptr
   ret i32 %res
 }

 define i64 @sh3adduw_ptrdiff(i64 %diff, i64* %baseptr) {
-; CHECK-LABEL: sh3adduw_ptrdiff:
-; CHECK: # %bb.0:
-; CHECK-NEXT: li a2, 1
-; CHECK-NEXT: slli a2, a2, 35
-; CHECK-NEXT: addi a2, a2, -8
-; CHECK-NEXT: and a0, a0, a2
-; CHECK-NEXT: add a0, a1, a0
-; CHECK-NEXT: ld a0, 0(a0)
-; CHECK-NEXT: ret
+; RV64I-LABEL: sh3adduw_ptrdiff:
+; RV64I: # %bb.0:
+; RV64I-NEXT: li a2, 1
+; RV64I-NEXT: slli a2, a2, 35
+; RV64I-NEXT: addi a2, a2, -8
+; RV64I-NEXT: and a0, a0, a2
+; RV64I-NEXT: add a0, a1, a0
+; RV64I-NEXT: ld a0, 0(a0)
+; RV64I-NEXT: ret
+;
+; RV64ZBA-LABEL: sh3adduw_ptrdiff:
+; RV64ZBA: # %bb.0:
+; RV64ZBA-NEXT: srli a0, a0, 3
+; RV64ZBA-NEXT: sh3add.uw a0, a0, a1
+; RV64ZBA-NEXT: ld a0, 0(a0)
+; RV64ZBA-NEXT: ret
   %ptrdiff = lshr exact i64 %diff, 3
   %cast = and i64 %ptrdiff, 4294967295
   %ptr = getelementptr inbounds i64, i64* %baseptr, i64 %cast
   %res = load i64, i64* %ptr
   ret i64 %res
 }