This is an archive of the discontinued LLVM Phabricator instance.

Differential D100527

[AArch64][SVE] More unpredicated ld1/st1 patterns for reg+reg addressing modes
Closed · Public

Authored by efriedma on Apr 14 2021, 10:32 PM.

Details

Reviewers

bsmith
paulwalker-arm
peterwaller-arm
joechrisellis

Commits

rG8a40bf6d210f: [AArch64][SVE] More unpredicated ld1/st1 patterns for reg+reg addressing modes

Summary

In some cases, we can improve the generated code by using a load with the "wrong" element width: in particular, using ld1b/st1b when we see reg+reg without a shift.
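To illustrate the summary, here is a minimal Python model (not LLVM code). The helper names `ld1b` and `ld1h`, the 16-byte vector length, and the byte-array "register" layout are all assumptions made for this sketch. It shows why an all-true byte load at `base + off` yields the same register contents as an add followed by an all-true halfword load on a little-endian target, and why the same substitution would not hold if halfword elements were stored big-endian, which is consistent with the patterns in the diff being guarded by `IsLE`.

```python
import struct


def ld1b(mem, base, off, vl_bytes):
    """Model of an all-true byte load using reg+reg addressing (no shift)."""
    addr = base + off
    return bytes(mem[addr:addr + vl_bytes])


def ld1h(mem, addr, vl_bytes, little_endian=True):
    """Model of an all-true halfword load from an already-computed address.

    The result is the register image as bytes: lane 0 first, least
    significant byte of each lane first.
    """
    fmt = "<H" if little_endian else ">H"
    reg = bytearray()
    for i in range(0, vl_bytes, 2):
        (value,) = struct.unpack_from(fmt, mem, addr + i)
        reg += value.to_bytes(2, "little")
    return bytes(reg)


mem = bytes(range(64))
base, off, vl = 8, 3, 16  # an odd byte offset cannot be written as index << 1

via_add = ld1h(mem, base + off, vl)  # add, then halfword load
folded = ld1b(mem, base, off, vl)    # byte load with the add folded in

assert via_add == folded
# With big-endian halfword elements, the register images differ.
assert ld1h(mem, base + off, vl, little_endian=False) != folded
```

The odd offset is deliberate: a halfword reg+reg form scales its index by the element size, so an unscaled byte offset can only be folded into the byte-sized form.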

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

efriedma created this revision. Apr 14 2021, 10:32 PM

Herald added subscribers: psnobl, hiraditya, kristof.beyls, tschuett. Apr 14 2021, 10:32 PM

efriedma requested review of this revision. Apr 14 2021, 10:32 PM

Herald added a project: Restricted Project. Apr 14 2021, 10:32 PM

Harbormaster completed remote builds in B98817: Diff 337635. Apr 14 2021, 11:42 PM

junparser added a subscriber: junparser. Apr 16 2021, 1:24 AM

Sounds sensible to me, but I wonder if there's a nicer way to create the patterns, as it seems we're likely to end up with a significant number of patterns targeting the same instructions. Do you think it's possible to have something like load_8 match everything where LD1_B (reg+reg) can be used?

I'll multiclass the patterns so they don't repeat; maybe a little more verbose than the ideal, but probably good enough.

Updated to handle other relevant types.

efriedma added a subscriber: huihuiz. Apr 29 2021, 1:15 PM

Harbormaster completed remote builds in B101716: Diff 341620. Apr 29 2021, 2:56 PM

Are we concerned about the reduced code quality shown by the splice tests? Is there a way to know if the basic block is, or is within, a loop body, so that we can restrict the patterns to instances where we're confident the extra instructions are likely to be hoisted? I guess for the PTRUE case we already know there is a need for some kind of machine pass to remove larger-element PTRUEs when a smaller-element one is safe to use.

llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll
1228–1230	This suggests we should limit the usage to cases where there's a single use of the address, or more precisely, not kick in unless we can guarantee the add/sub will be omitted.

The splice tests show two issues:

We don't care about the number of uses of the address; this can lead to an extra loop-invariant instruction if one of the operands of the add is a small constant. This is an issue with all patterns using the am_sve_regreg_lsl0 matcher; see the generated code for splice_nxv16i1. This seems like an edge case: you specifically need a small constant offset, and the current formulation probably has lower latency. If you think it's worth addressing, I can try messing with the matcher.
We generate an extra ptrue because of the element width. The ptrue thing seems like a general issue we'd want to address elsewhere; not sure it should block this.
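The ptrue point can be made concrete with a small Python model (again, not LLVM code). It relies on the SVE rule that a predicate register has one bit per vector byte and that an instruction operating on wider elements only consults the lowest bit of each element-sized group. The function names and the 16-byte vector length are assumptions for this sketch. Under that rule an all-true byte predicate is also all-true for halfword operations, so a single `ptrue p.b` could in principle serve both the `st1h` and the `ld1b` in the splice tests, while the reverse substitution is not safe.

```python
def ptrue(esize, vl_bytes):
    """Predicate bits (one per vector byte) for an all-true predicate of
    element size `esize` bytes: only the lowest bit of each element is set."""
    return [1 if i % esize == 0 else 0 for i in range(vl_bytes)]


def active_lanes(pred, esize):
    """Lanes that an instruction with element size `esize` treats as active.
    Only the lowest predicate bit of each element is consulted."""
    return [pred[i] == 1 for i in range(0, len(pred), esize)]


VL = 16  # assumed vector length in bytes
p_b = ptrue(1, VL)  # models ptrue pN.b
p_h = ptrue(2, VL)  # models ptrue pN.h

# A byte predicate activates every halfword lane, just like a halfword one.
assert active_lanes(p_b, 2) == active_lanes(p_h, 2)
# A halfword predicate leaves every other byte lane inactive.
assert active_lanes(p_h, 1) != active_lanes(p_b, 1)
```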

Is there a way to know if the basic block is, or is within, a loop body?

SelectionDAGISel has LoopInfo, so in theory, we could compute it? But I don't think we pass through the information at the moment.

Matt added a subscriber: Matt. May 1 2021, 8:08 AM

In D100527#2729910, @efriedma wrote:

If you think it's worth addressing, I can try messing with the matcher.

That's OK. As you say, this is not a problem introduced by this patch, and we have a good story for how to resolve the issues at a later date.

This revision is now accepted and ready to land. May 2 2021, 2:34 AM

This revision was landed with ongoing or failed builds. May 3 2021, 3:06 PM

Closed by commit rG8a40bf6d210f: [AArch64][SVE] More unpredicated ld1/st1 patterns for reg+reg addressing modes (authored by efriedma).

This revision was automatically updated to reflect the committed changes.

efriedma added a commit: rG8a40bf6d210f: [AArch64][SVE] More unpredicated ld1/st1 patterns for reg+reg addressing modes.

efriedma mentioned this in D102493: [RISCV] Expand unaligned fixed-length vector memory accesses. May 19 2021, 11:28 AM

Revision Contents

Path                                                            Size
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td                  10 lines
llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll          15 lines
llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-reg.ll    12 lines
llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-reg.ll    12 lines

Diff 337635

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

[1,911 lines not shown]
 defm : unpred_store< store, nxv8f16, ST1H, ST1H_IMM, PTRUE_H, am_sve_regreg_lsl1>;
 defm : unpred_store< store, nxv8bf16, ST1H, ST1H_IMM, PTRUE_H, am_sve_regreg_lsl1>;
 defm : unpred_store< store, nxv4f16, ST1H_S, ST1H_S_IMM, PTRUE_S, am_sve_regreg_lsl1>;
 defm : unpred_store< store, nxv2f16, ST1H_D, ST1H_D_IMM, PTRUE_D, am_sve_regreg_lsl1>;
 defm : unpred_store< store, nxv4f32, ST1W, ST1W_IMM, PTRUE_S, am_sve_regreg_lsl2>;
 defm : unpred_store< store, nxv2f32, ST1W_D, ST1W_D_IMM, PTRUE_D, am_sve_regreg_lsl2>;
 defm : unpred_store< store, nxv2f64, ST1D, ST1D_IMM, PTRUE_D, am_sve_regreg_lsl3>;

+let Predicates = [IsLE] in {
+  def : Pat<(store (nxv8i16 ZPR:$val), (am_sve_regreg_lsl0 GPR64sp:$base, GPR64:$offset)),
+            (ST1B ZPR:$val, (PTRUE_B 31), GPR64sp:$base, GPR64:$offset)>;
+}
+
 multiclass unpred_load<PatFrag Load, ValueType Ty, Instruction RegRegInst,
                        Instruction RegImmInst, Instruction PTrue,
                        ComplexPattern AddrCP> {
   let AddedComplexity = 1 in {
     def _reg: Pat<(Ty (Load (AddrCP GPR64sp:$base, GPR64:$offset))),
                   (RegRegInst (PTrue 31), GPR64sp:$base, GPR64:$offset)>;
   }
   let AddedComplexity = 2 in {
[34 lines not shown]
 defm : unpred_load< load, nxv8f16, LD1H, LD1H_IMM, PTRUE_H, am_sve_regreg_lsl1>;
 defm : unpred_load< load, nxv8bf16, LD1H, LD1H_IMM, PTRUE_H, am_sve_regreg_lsl1>;
 defm : unpred_load< load, nxv4f16, LD1H_S, LD1H_S_IMM, PTRUE_S, am_sve_regreg_lsl1>;
 defm : unpred_load< load, nxv2f16, LD1H_D, LD1H_D_IMM, PTRUE_D, am_sve_regreg_lsl1>;
 defm : unpred_load< load, nxv4f32, LD1W, LD1W_IMM, PTRUE_S, am_sve_regreg_lsl2>;
 defm : unpred_load< load, nxv2f32, LD1W_D, LD1W_D_IMM, PTRUE_D, am_sve_regreg_lsl2>;
 defm : unpred_load< load, nxv2f64, LD1D, LD1D_IMM, PTRUE_D, am_sve_regreg_lsl3>;

+let Predicates = [IsLE] in {
+  def : Pat<(nxv8i16 (load (am_sve_regreg_lsl0 GPR64sp:$base, GPR64:$offset))),
+            (LD1B (PTRUE_B 31), GPR64sp:$base, GPR64:$offset)>;
+}
+
 multiclass unpred_store_predicate<ValueType Ty, Instruction Store> {
   def _fi : Pat<(store (Ty PPR:$val), (am_sve_fi GPR64sp:$base, simm9:$offset)),
                 (Store PPR:$val, GPR64sp:$base, simm9:$offset)>;

   def _default : Pat<(store (Ty PPR:$Val), GPR64:$base),
                      (Store PPR:$Val, GPR64:$base, (i64 0))>;
 }

[810 lines not shown]

llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll

[723 lines not shown]

 define <vscale x 8 x i16> @splice_nxv8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
 ; CHECK-LABEL: splice_nxv8i16:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
 ; CHECK-NEXT: addvl sp, sp, #-2
 ; CHECK-NEXT: ptrue p0.h
 ; CHECK-NEXT: mov x8, sp
+; CHECK-NEXT: ptrue p1.b
 ; CHECK-NEXT: st1h { z0.h }, p0, [sp]
 ; CHECK-NEXT: st1h { z1.h }, p0, [x8, #1, mul vl]
 ; CHECK-NEXT: addvl x8, x8, #1
-; CHECK-NEXT: sub x8, x8, #16 // =16
-; CHECK-NEXT: ld1h { z0.h }, p0/z, [x8]
+; CHECK-NEXT: mov x9, #-16
+; CHECK-NEXT: ld1b { z0.b }, p1/z, [x8, x9]
 ; CHECK-NEXT: addvl sp, sp, #2
 ; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
 ; CHECK-NEXT: ret
   %res = call <vscale x 8 x i16> @llvm.experimental.vector.splice.nxv8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, i32 -8)
   ret <vscale x 8 x i16> %res
 }

 define <vscale x 8 x i16> @splice_nxv8i16_1(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
 ; CHECK-LABEL: splice_nxv8i16_1:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
 ; CHECK-NEXT: addvl sp, sp, #-2
 ; CHECK-NEXT: ptrue p0.h
 ; CHECK-NEXT: mov x8, sp
+; CHECK-NEXT: ptrue p1.b
 ; CHECK-NEXT: st1h { z0.h }, p0, [sp]
 ; CHECK-NEXT: st1h { z1.h }, p0, [x8, #1, mul vl]
 ; CHECK-NEXT: addvl x8, x8, #1
-; CHECK-NEXT: sub x8, x8, #2 // =2
-; CHECK-NEXT: ld1h { z0.h }, p0/z, [x8]
+; CHECK-NEXT: mov x9, #-2
+; CHECK-NEXT: ld1b { z0.b }, p1/z, [x8, x9]
 ; CHECK-NEXT: addvl sp, sp, #2
 ; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
 ; CHECK-NEXT: ret
   %res = call <vscale x 8 x i16> @llvm.experimental.vector.splice.nxv8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, i32 -1)
   ret <vscale x 8 x i16> %res
 }

 ; Ensure number of trailing elements is clamped when we cannot prove it's less than VL.
[383 lines not shown]
 ; CHECK-LABEL: splice_nxv8i1:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
 ; CHECK-NEXT: addvl sp, sp, #-2
 ; CHECK-NEXT: mov z0.h, p0/z, #1 // =0x1
 ; CHECK-NEXT: ptrue p0.h
 ; CHECK-NEXT: mov z1.h, p1/z, #1 // =0x1
 ; CHECK-NEXT: mov x8, sp
+; CHECK-NEXT: ptrue p1.b
 ; CHECK-NEXT: st1h { z0.h }, p0, [sp]
 ; CHECK-NEXT: st1h { z1.h }, p0, [x8, #1, mul vl]
 ; CHECK-NEXT: addvl x8, x8, #1
-; CHECK-NEXT: sub x8, x8, #2 // =2
-; CHECK-NEXT: ld1h { z0.h }, p0/z, [x8]
+; CHECK-NEXT: mov x9, #-2
+; CHECK-NEXT: ld1b { z0.b }, p1/z, [x8, x9]
 ; CHECK-NEXT: and z0.h, z0.h, #0x1
 ; CHECK-NEXT: cmpne p0.h, p0/z, z0.h, #0
 ; CHECK-NEXT: addvl sp, sp, #2
 ; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
 ; CHECK-NEXT: ret
   %res = call <vscale x 8 x i1> @llvm.experimental.vector.splice.nxv8i1(<vscale x 8 x i1> %a, <vscale x 8 x i1> %b, i32 -1)
   ret <vscale x 8 x i1> %res
 }
[49 lines not shown]
 ; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
 ; CHECK-NEXT: addvl sp, sp, #-4
 ; CHECK-NEXT: ptrue p0.s
 ; CHECK-NEXT: mov x8, sp
 ; CHECK-NEXT: st1w { z1.s }, p0, [x8, #1, mul vl]
 ; CHECK-NEXT: st1w { z0.s }, p0, [sp]
 ; CHECK-NEXT: st1w { z3.s }, p0, [x8, #3, mul vl]
 ; CHECK-NEXT: st1w { z2.s }, p0, [x8, #2, mul vl]
 ; CHECK-NEXT: addvl x8, x8, #2
 ; CHECK-NEXT: sub x8, x8, #32 // =32
 ; CHECK-NEXT: ld1w { z0.s }, p0/z, [x8]
[Inline comment from paulwalker-arm: This suggests we should limit the usage to cases where there's a single use of the address, or more precisely, not kick in unless we can guarantee the add/sub will be omitted.]
 ; CHECK-NEXT: ld1w { z1.s }, p0/z, [x8, #1, mul vl]
 ; CHECK-NEXT: addvl sp, sp, #4
 ; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
 ; CHECK-NEXT: ret
   %res = call <vscale x 8 x i32> @llvm.experimental.vector.splice.nxv8i32(<vscale x 8 x i32> %a, <vscale x 8 x i32> %b, i32 -8)
   ret <vscale x 8 x i32> %res
 }

[49 lines not shown]

llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-reg.ll

[9 lines not shown]
 ; CHECK-NEXT: ld1b { z0.b }, p0/z, [x0, x1]
 ; CHECK-NEXT: ret
   %ptr = getelementptr inbounds i8, i8* %addr, i64 %off
   %ptrcast = bitcast i8* %ptr to <vscale x 16 x i8>*
   %val = load volatile <vscale x 16 x i8>, <vscale x 16 x i8>* %ptrcast
   ret <vscale x 16 x i8> %val
 }

+define <vscale x 8 x i16> @ld1_nxv16i8_bitcast_to_i16(i8* %addr, i64 %off) {
+; CHECK-LABEL: ld1_nxv16i8_bitcast_to_i16:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.b
+; CHECK-NEXT: ld1b { z0.b }, p0/z, [x0, x1]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i8, i8* %addr, i64 %off
+  %ptrcast = bitcast i8* %ptr to <vscale x 8 x i16>*
+  %val = load volatile <vscale x 8 x i16>, <vscale x 8 x i16>* %ptrcast
+  ret <vscale x 8 x i16> %val
+}
+
 define <vscale x 8 x i16> @ld1_nxv8i16_zext8(i8* %addr, i64 %off) {
 ; CHECK-LABEL: ld1_nxv8i16_zext8:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: ptrue p0.h
 ; CHECK-NEXT: ld1b { z0.h }, p0/z, [x0, x1]
 ; CHECK-NEXT: ret
   %ptr = getelementptr inbounds i8, i8* %addr, i64 %off
   %ptrcast = bitcast i8* %ptr to <vscale x 8 x i8>*
[273 lines not shown]

llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-reg.ll

[9 lines not shown]
 ; CHECK-NEXT: st1b { z0.b }, p0, [x0, x1]
 ; CHECK-NEXT: ret
   %ptr = getelementptr inbounds i8, i8* %addr, i64 %off
   %ptrcast = bitcast i8* %ptr to <vscale x 16 x i8>*
   store <vscale x 16 x i8> %val, <vscale x 16 x i8>* %ptrcast
   ret void
 }

+define void @st1_nxv16i8_bitcast_from_i16(i8* %addr, i64 %off, <vscale x 8 x i16> %val) {
+; CHECK-LABEL: st1_nxv16i8_bitcast_from_i16:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.b
+; CHECK-NEXT: st1b { z0.b }, p0, [x0, x1]
+; CHECK-NEXT: ret
+  %ptr = getelementptr inbounds i8, i8* %addr, i64 %off
+  %ptrcast = bitcast i8* %ptr to <vscale x 8 x i16>*
+  store <vscale x 8 x i16> %val, <vscale x 8 x i16>* %ptrcast
+  ret void
+}
+
 define void @st1_nxv8i16_trunc8(i8* %addr, i64 %off, <vscale x 8 x i16> %val) {
 ; CHECK-LABEL: st1_nxv8i16_trunc8:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: ptrue p0.h
 ; CHECK-NEXT: st1b { z0.h }, p0, [x0, x1]
 ; CHECK-NEXT: ret
   %ptr = getelementptr inbounds i8, i8* %addr, i64 %off
   %ptrcast = bitcast i8* %ptr to <vscale x 8 x i8>*
[195 lines not shown]