
[WIP][SVE] Prototype for general merging MOVPRFX support.
Needs Review · Public

Authored by cameron.mcinally on May 19 2020, 5:11 PM.

Details

Summary

This is a prototype for general merging MOVPRFX support. It ties the pseudo's destination register to the passthru register, which seems to work for simple test cases.***

The intention is to use this general merging mechanism to also handle the zero merging case, but that would take some more cleanup work to get there.

***From an offline discussion with Sander, this implementation has limitations that I have not hit yet. Posting this patch to start a discussion.

Diff Detail

Event Timeline

Herald added a project: Restricted Project.

Hi @cameron.mcinally, thanks for this patch! The approach makes sense and nicely extends the mechanism we have for the zeroing forms. We can use similar pseudos for the zeroing case as well. Today I played around a bit with your patch and shared some changes for your reference in D80410. I'm not really planning to land it, but it at least highlights the bug for the zeroing-pseudos that currently exists in master. Feel free to use it for reference, or ignore/discard it if you've already worked on something similar.

We should probably think about where we want to write down the design for this mechanism, as we use these pseudos to solve multiple problems.

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
393–394

This comment suggests that this is only possible for _ZERO and _UNDEF variants, but I'm not sure if that comment is still correct.

413

Can you update the description of expand_DestructiveOp to describe the new style of pseudos like FSUB_ZPZZ_MERGE_B?

419

nit: s/PassIdx/PassthruIdx/ ?

llvm/lib/Target/AArch64/SVEInstrFormats.td
482

I'm happy to keep the constraint for now, but I don't think it is strictly necessary, because if $Zd != $Zpt it is still possible to generate a valid instruction sequence.

For:

define <vscale x 4 x float> @foo(<vscale x 4 x i1> %p, <vscale x 4 x float> %z0, <vscale x 4 x float> %z1, <vscale x 4 x float> %passthru) {
  %z0_in = select <vscale x 4 x i1> %p, <vscale x 4 x float> %z0, <vscale x 4 x float> %passthru
  %sub = call <vscale x 4 x float> @llvm.aarch64.sve.fsub.nxv4f32(<vscale x 4 x i1> %p, <vscale x 4 x float> %z0_in, <vscale x 4 x float> %z0)
  ret <vscale x 4 x float> %sub
}
declare <vscale x 4 x float> @llvm.aarch64.sve.fsub.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>, <vscale x 4 x float>)

LLVM will generate:

movprfx z2.s, p0/m, z0.s
fsub    z2.s, p0/m, z2.s, z0.s
mov     z0.d, z2.d
ret

Where the last mov is needed because the register allocator chose to allocate the pseudo as:

z2 = FSUB_PSEUDO p0, z0, z0, z2

with the result operand z2 tied to the merge value. I think we could alternatively fall back on using a select instruction if the register allocator didn't have the tied operand restriction and instead generate:

sel z0.s, p0, z0.s, z2.s
fsub z0.s, p0/m, z0.s, z0.s
ret
llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-merging.ll
62

nit: the %a_z name implies that the false lanes are zeroed, perhaps %a_m is more appropriate.

76

I think we should add tests where the %passthru value is %b, as that should cause it to use the reverse instruction, e.g. sub -> subr (or swap the operands in the case of add).
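As a hedged sketch of that case (hypothetical register assignments, not output from this patch), a sub whose %passthru is %b should ideally expand to the reverse form with no movprfx at all:

// result = p ? (a - b) : b, with a in z0 and the result tied to b in z1;
// subr computes Zdn = Zm - Zdn, so the inactive lanes already hold b.
subr z1.s, p0/m, z1.s, z0.s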

cameron.mcinally marked 2 inline comments as done.

Thanks, Sander.

I have a nagging feeling that I forced this implementation on you. My intention was to use this patch as an intuition pump, not necessarily as the path forward. I don't have a lot of intuition around MOVPRFX today, so I'm hoping for guidance on the best way to proceed.

I definitely see the desired flexibility of having a lowering pass pre-regalloc. If you think that's the better solution, I'll work on it. I just don't have a strong opinion on where in the pipeline that pass should live.

Also, I plan to look at your zeroing Diff (D80410) next week. Thanks for that.

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
393–394

Massaged this a bit. Let me know if anything sounds off.

413

Updated. Please let me know if that's what you had in mind. Advertising copy is not my strong suit...

419

Updated, but I just realized that this isn't needed yet. With DstReg tied to the Passthru reg, I can replace MI.getOperand(PassIdx).getReg() with DestReg below.

I think I'll leave it for now in case the untied/zeroing case needs it. Can tear it out later if it's not needed.

llvm/lib/Target/AArch64/SVEInstrFormats.td
482

Ah, ok, I think I see the problem now. In this case we can clobber z0 since it only has the one use. I'll have to explore this further.

I think I'll look at the ConditionalEarlyClobberPass to see if this case can be guarded against. If you know it can't, or are aware of other cases that may be problematic, please let me know.

llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-merging.ll
62

Good catch. Fixed.

76

Added.

Also added a guard to prevent MOVPRFX Zx, p0/m, Zx from being generated. We might not want that for the p0/z zero merging case, but I haven't hit that yet.

I have a nagging feeling that I forced this implementation on you.

Not at all, it may be more the other way around :-) We initially went the pseudo-route downstream so we could use the reverse instructions, but we later used this for the cheaper zeroing as well.
For the former (using reverse instructions) there probably isn't much else we can do than go the pseudo route, but for the latter (cheaper merging) there may also be other alternatives to consider.

My intention was to use this patch as an intuition pump, not necessarily as the path forward. I don't have a lot of intuition around MOVPRFX today, so I'm hoping for guidance on the best way to proceed.

Are you happy to bring this topic up in tomorrow's meeting? If there's time, we can talk through the approach in this patch and maybe get some more input.

I definitely see the desired flexibility of having a lowering pass pre-regalloc. If you think that's the better solution, I'll work on it. I just don't have a strong opinion on where in the pipeline that pass should live.

I'm a bit confused by what you mean with 'pre-regalloc lowering pass'? (Do you mean something like the ConditionalEarlyClobber pass mentioned in D80410?)

Sorry for the delay, we've been a bit distracted over here...

Here's the patch that we briefly discussed at last week's Sync-up meeting. The patch attempts to rewrite the MachineInstrs in the general zero-merging case. I agree with @paulwalker-arm that the MachineInstr level isn't a great place to rewrite a sequence of instructions, but this case is fairly limited, so it might not be too bad. I'll elaborate on that a little now.

The general passthru merging case is largely uninteresting. Unless I've made a mistake, register allocation seems optimal for it (in the cases I've come across, at least). The zeroing case is a bit trickier. We start with a pattern like this:

class SVE_3_Op_Pat_SelZero_Passthru<ValueType vtd, SDPatternOperator op, ValueType vt1,
                   ValueType vt2, ValueType vt3, Instruction inst>
: Pat<(vtd (op vt1:$Op1, (vselect vt1:$Op1, vt2:$Op2, (vt2 SVEDup0:$Dup)), vt3:$Op3)),
      (inst $Op1, $Op2, $Op3, $Dup)>;

The SVEDup0 is the problem. It's accounted for in the generated Pseudo, but the original DUP also hangs around. In cases where the DUP only has one use, it can be removed. That's the hang-up, i.e. whether we can safely remove that superfluous DUP when expanding the Pseudo.
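To make the hang-up concrete, here is a hedged sketch of the kind of sequence in question (register choices are hypothetical):

mov     z3.s, #0                // the materialised dup(0); dead if this was its only use
movprfx z2.s, p0/z, z0.s        // the zeroing prefix already zeroes the false lanes
fsub    z2.s, p0/m, z2.s, z1.s  // active lanes: z0 - z1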

You'll find the proposed solution under the FalseLanes == AArch64::FalseLanesZero block in this Diff. It's not an ideal solution, but I'm not convinced that it's overly risky either. The sequence we're looking for is fairly constrained, *I think*.

You'll also notice that the test cases I've included are overkill; they're there to save effort on the reviewer's side. I just wanted the register allocation decisions to be easy to see.

I definitely see the desired flexibility of having a lowering pass pre-regalloc. If you think that's the better solution, I'll work on it. I just don't have a strong opinion on where in the pipeline that pass should live.

I'm a bit confused by what you mean with 'pre-regalloc lowering pass'? (Do you mean something like the ConditionalEarlyClobber pass mentioned in D80410?)

I'm probably the one confused. :D

We definitely need an extra register available in some of the MOVPRFX cases. Scavenging that reg probably isn't a good fix. So I was thinking it would be easier to generate the MOVPRFX when we're still at the virtual register phase. It's not clear to me how to make this work though, so I might be way off.

@paulwalker-arm mentioned something at last week's Sync-up meeting about generating the MOVPRFX directly during DAGCombine or similar. That sounds promising, but I'm not sure if I see that path clearly yet.

Hi @cameron.mcinally, sorry for my delayed response, I was OoO part of last week and didn't have much bandwidth to reply.

The general passthru merging case is largely uninteresting. Unless I've made a mistake, register allocation seems optimal for it (in the cases I've come across, at least).

Yes, that seems a fair assessment.

The zeroing case is a bit trickier. We start with a pattern like this:

class SVE_3_Op_Pat_SelZero_Passthru<ValueType vtd, SDPatternOperator op, ValueType vt1,
                   ValueType vt2, ValueType vt3, Instruction inst>
: Pat<(vtd (op vt1:$Op1, (vselect vt1:$Op1, vt2:$Op2, (vt2 SVEDup0:$Dup)), vt3:$Op3)),
      (inst $Op1, $Op2, $Op3, $Dup)>;

The SVEDup0 is the problem. It's accounted for in the generated Pseudo, but the original DUP also hangs around. In cases where the DUP only has one use, it can be removed. That's the hang-up, i.e. whether we can safely remove that superfluous DUP when expanding the Pseudo.

I don't think the compiler will always be able to remove the superfluous DUP (it may have been hoisted out of the loop for example), but I'm not sure how much of a problem that is in practice. Maybe this is fine for a first implementation. Did you get a chance to try this out on real code?

You'll find the proposed solution under the FalseLanes == AArch64::FalseLanesZero block in this Diff. It's not an ideal solution, but I'm not convinced that it's overly risky either. The sequence we're looking for is fairly constrained, *I think*.

While it may seem trivial to remove that instruction this way, I'd rather leave this to a dedicated pass like DeadMachineInstructionElim that could be run directly after the pseudo-expand pass. There are details and special cases to take into account, such as defs/uses on subregisters (instead of the full register) or the use being a DEBUG instruction. The former could lead to the DUP being removed when it shouldn't be, and the latter could prevent it from being removed when it should be. The dedicated pass is probably better suited to handle those.

We definitely need an extra register available in some of the MOVPRFX cases. Scavenging that reg probably isn't a good fix. So I was thinking it would be easier to generate the MOVPRFX when we're still at the virtual register phase. It's not clear to me how to make this work though, so I might be way off.

When passing dup(0) to the pseudo, the compiler can always fall back on the explicit merge (predicated mov) and there is no longer a need to scavenge a register. I agree it would be better to do this earlier so we can avoid this altogether though.

@paulwalker-arm mentioned something at last week's Sync-up meeting about generating the MOVPRFX directly during DAGCombine or similar. That sounds promising, but I'm not sure if I see that path clearly yet.

What Paul meant is that we ideally want to add a new constraint to TableGen that allows us to model the constraints for using MOVPRFX more precisely. For the pseudo:

Zd = FSUB_PSEUDO_ZEROING Pg, Zs1, Zs2

Instead of forcing "Zd = Zs1", we want to allow any register allocation as long as one of the registers is unique, because the ExpandPseudo pass can always handle that case with movprfx directly, or additionally with a reverse/commutative instruction. By adding a new TableGen constraint like "one_of_is_unique(Zd, Zs1, Zs2)" and teaching the register allocator to implement it, we can avoid uglier solutions like the conditional early-clobber pass (which only does half the job, because it forces Zd to be unique, and that may not be the best choice) or passing the dup(0) to the instruction and then trying to remove it afterwards (which may fail). We could instead model the constraint exactly as we want it. Other pseudos (e.g. for ternary instructions) may require slightly different constraints, but one_of_is_unique() probably solves 90-95% of the cases that need solving.
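As a hedged sketch of why one unique register is enough (register choices and the exact expansions are assumptions, not part of this patch):

// Zd unique (Zd != Zs1, Zd != Zs2): prefix the destination directly.
movprfx z2.s, p0/z, z0.s        // false lanes zeroed, Zs1 copied into Zd
fsub    z2.s, p0/m, z2.s, z1.s  // active lanes: Zs1 - Zs2

// Merging form with Zd == Zs2 == Zpt: the reverse instruction avoids
// the prefix entirely.
fsubr   z1.s, p0/m, z1.s, z0.s  // active: Zs1 - Zs2; inactive keep the passthru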