This is an archive of the discontinued LLVM Phabricator instance.

Differential D124325

[AArch64][SVE] Support logical operation BIC with DestructiveBinary patterns
ClosedPublic

Authored by Allen on Apr 23 2022, 2:23 AM.

Details

Reviewers

paulwalker-arm
dmgreen
thakis
efriedma
dancgr
david-arm

Summary

BIC support with DestructiveBinary patterns was temporarily removed because it caused an assert (commit 3c382ed71f15); this patch tries to fix that.
The most significant known bug is that the expansion is broken for pseudo instructions that have no real instructions (including movprfx'd ones) covering all combinations of register allocation. This is the main reason zeroing is an experimental feature.
So we add an extra LSL, when necessary, when expanding BIC_ZPZZ_ZERO A, P, A, A with movprfx:

movprfx	z0.s, p0/z, z0.s
lsl z0.b, p0/m, z0.b, #0
bic	z0.s, p0/m, z0.s, z0.s

Depends on D88595

Diff Detail

Event Timeline

Allen created this revision. Apr 23 2022, 2:23 AM

Herald added a reviewer: efriedma. Apr 23 2022, 2:23 AM

Herald added a project: Restricted Project.

Herald added subscribers: psnobl, hiraditya, kristof.beyls, tschuett.

Allen requested review of this revision. Apr 23 2022, 2:23 AM

Herald added a project: Restricted Project. Apr 23 2022, 2:23 AM

Herald added a subscriber: llvm-commits.

add a new test case

Harbormaster completed remote builds in B161020: Diff 424709. Apr 23 2022, 3:26 AM

Hi @Allen, you might be interested in D88595.

This is the main reason the zeroing is an experimental feature: it has known bugs. The most significant is that for pseudo instructions that do not have real instructions (including movprfx'd ones) covering all combinations of register allocation, the expansion will be broken. I believe BIC is an example of this because there's no way to movprfx expand BIC_ZPZZ_ZERO A, P, A, A. An unrealistic usage, I know, but it's still legitimate.

define <vscale x 4 x i32> @foo(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a) #0 {
entry:
  %t1 = select <vscale x 4 x i1> %pg, <vscale x 4 x i32> %a, <vscale x 4 x i32> zeroinitializer
  %t2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.bic.nxv4i32(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %t1, <vscale x 4 x i32> %a)
  ret <vscale x 4 x i32> %t2
}

This will trigger an assert, even after this patch, because otherwise it would emit

movprfx	z0.s, p0/z, z0.s
bic	z0.s, p0/m, z0.s, z0.s

which is not a valid use of movprfx.

For ACfL (https://developer.arm.com/tools-and-software/server-and-hpc/compile/arm-compiler-for-linux) we solve this using a couple of machine function passes, but I feel they're not the correct solution, hence the above patch, where we'd rather have a mechanism to better express the register requirements as part of the pseudo's definition.
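
The register constraint behind this assert can be illustrated with a small standalone model. This is only a sketch, not LLVM code: `DestructiveInst` and `canPrefixWithMovprfx` are hypothetical names, registers are plain integers, and the rule encoded is the one stated in the expansion code's own comment (the movprfx destination must be the destructive operand and must not be used as any other operand).

```cpp
#include <vector>

// Hypothetical, simplified view of the instruction that follows a movprfx.
struct DestructiveInst {
  unsigned Dst;                     // destination, also the destructive source
  std::vector<unsigned> OtherSrcs;  // remaining vector source registers
};

// Returns true if `movprfx z<PrefixDst>, ...` may prefix Inst under the rule
// above: the prefixed register must be the destructive operand and must not
// be read through any other source operand.
bool canPrefixWithMovprfx(unsigned PrefixDst, const DestructiveInst &Inst) {
  if (PrefixDst != Inst.Dst)
    return false;
  for (unsigned Src : Inst.OtherSrcs)
    if (Src == PrefixDst)
      return false;
  return true;
}
```

Under this model, `bic z0.s, p0/m, z0.s, z0.s` (destination 0, other source 0) is rejected, while `bic z0.s, p0/m, z0.s, z1.s` is accepted.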

Allen added a reviewer: dancgr. Apr 23 2022, 5:59 PM

Replying to @paulwalker-arm's comment above (D124325#3469815):

Thanks @paulwalker-arm for explaining the register allocation issue in detail. After reading D88595, I think there are two candidate ways to address this. Am I missing something?

1. BIC is special: when all the registers are the same, it can be expanded to zero (i.e. BIC(A, A) = A & ~A = 0).

2. Make use of not_all_same: {not_all_same(Dst, Op0, Op1), earlyclobber(Dst)} can be simplified to {earlyclobber(Dst)}, which guards against all the registers being the same.
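
Option 1 rests on the identity BIC(A, A) = 0, which can be checked with a lane-wise model. The following is a minimal sketch, assuming 32-bit lanes and a zeroing predicate; `bicZeroing` is a hypothetical helper, not an LLVM or ACLE function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Lane-wise model of a zeroing BIC: active lanes get A & ~B, inactive lanes
// are zero. Names and types here are illustrative only.
std::vector<uint32_t> bicZeroing(const std::vector<bool> &Pg,
                                 const std::vector<uint32_t> &A,
                                 const std::vector<uint32_t> &B) {
  assert(Pg.size() == A.size() && A.size() == B.size());
  std::vector<uint32_t> Out(A.size(), 0);
  for (std::size_t I = 0; I < A.size(); ++I)
    if (Pg[I])
      Out[I] = A[I] & ~B[I];
  return Out;
}
```

When both inputs are the same vector, every lane of the result is zero regardless of the predicate. This property is specific to BIC and does not carry over to other operations.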

Allen added a reviewer: david-arm. Apr 25 2022, 7:42 PM

I don't think either of these is a good option. I don't like the idea of having instruction-specific handling within the expand code, and I'm not sure where the logic from (2) comes from, but regardless, using earlyclobber in this way will force Dst to be unique and thus force the need for movprfx (i.e. it blocks the BIC_ZPZZ_ZERO A, P, A, B use case).

To my mind the best option is to allow the register requirements to be better expressed as part of the pseudo instruction. Until then I cannot see how we can move this feature out of its experimental phase. Perhaps there's an alternative to D88595 but I've not thought about it for a while. How important is movprfx-based zeroing to you right now? Have you got a problematic use case you can share?

Replying to @paulwalker-arm (D124325#3475793):

Thanks, I'll wait for D88595. I'm not in a hurry for this; I only wanted to confirm the idea.
BTW, the logic from (2) comes from the comment https://reviews.llvm.org/D88595#inline-886782

Allen retitled this revision from [AArch64][SVE] Support logical operation BIC with DestructiveBinary patterns to [WIP][AArch64][SVE] Support logical operation BIC with DestructiveBinary patterns. Apr 26 2022, 9:28 PM

Allen edited the summary of this revision. Apr 26 2022, 9:28 PM

Allen added a parent revision: D88595: [TableGen] Add not_all_same constraint check.

Matt added a subscriber: Matt. Apr 27 2022, 11:47 AM

I don't like the idea of having instruction specific handling within the expand code

Would this really be so terrible? I mean, it's arguably a bit of a hack, but it's not that different from the way we handle other pseudo-instructions.

Replying to @efriedma (D124325#3481275):

I think so. Pseudo-instruction expansion is often instruction specific, but for the movprfx handling we've detached the logic from the instructions (because there are hundreds of them) and instead split them across various categories. So I'd much rather see problems solved for a whole category than partially for a single instruction within a category.

Replying to @paulwalker-arm (D124325#3484543):

I see what you mean. And I guess some of the instructions don't have any sort of "identity" result like this.

I guess you could use an alternative sequence. I can't come up with a two-instruction sequence, but I guess you can movprfx a dummy instruction, like movprfx z0.b, p0/z, z0.b; add z0.b, z0.b, #0; bic z0.b, p0/m, z0.b, z0.b.

Replying to @efriedma (D124325#3485956):

Since D88595 will not be accepted any time soon, I'll try your idea.
Before doing so, I want to check that I understand your suggestion:
do you mean that the additional add z0.b, z0.b, #0 should be the instruction paired with the preceding movprfx (meeting the constraints of movprfx), so that the bic instruction then gets the "identity" result?

The sequence movprfx z0.b, p0/z, z0.b; add z0.b, z0.b, #0; zeros the lanes in z0 that aren't active in p0. (It's a slightly weird way to write it, but as far as I know there isn't any single instruction with equivalent semantics.) Then we can just use the regular instruction that leaves the lanes we just zeroed unmodified. This works regardless of the operation (as long as it doesn't do cross-lane processing).

The "identity" thing is just noting that if the two operands to bic are identical, the result is always zero. But as noted before, that doesn't generalize to other operations.
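
The reasoning in the two paragraphs above can be checked with a lane-wise model. This is a sketch under stated assumptions, not LLVM or Arm reference code: `zeroInactiveLanes`, `mergingOp`, and `expandAllSame` are hypothetical helpers, lanes are modelled as 32-bit integers, and the operation is assumed to be purely lane-wise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Vec = std::vector<uint32_t>;
using Pred = std::vector<bool>;
using LaneOp = std::function<uint32_t(uint32_t, uint32_t)>;

// Models a zeroing movprfx followed by a predicated dummy instruction:
// active lanes keep their value, inactive lanes become zero.
Vec zeroInactiveLanes(const Pred &Pg, const Vec &Z) {
  assert(Pg.size() == Z.size());
  Vec Out(Z.size(), 0);
  for (std::size_t I = 0; I < Z.size(); ++I)
    if (Pg[I])
      Out[I] = Z[I];
  return Out;
}

// Models a merging destructive op `op zdn, pg/m, zdn, zm`: inactive lanes of
// zdn are left unmodified.
Vec mergingOp(const Pred &Pg, const Vec &Zdn, const Vec &Zm, const LaneOp &Op) {
  assert(Pg.size() == Zdn.size() && Zdn.size() == Zm.size());
  Vec Out = Zdn;
  for (std::size_t I = 0; I < Zdn.size(); ++I)
    if (Pg[I])
      Out[I] = Op(Zdn[I], Zm[I]);
  return Out;
}

// The all-same-register case: zero the inactive lanes first, then run the
// regular merging instruction on the same register.
Vec expandAllSame(const Pred &Pg, const Vec &Z0, const LaneOp &Op) {
  Vec Z = zeroInactiveLanes(Pg, Z0);
  return mergingOp(Pg, Z, Z, Op);
}
```

For the all-same-register case, BIC yields zero in every lane, while a lane-wise OR yields the original value in active lanes and zero elsewhere, which illustrates that the sequence does not rely on BIC's "identity" property.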

Add an additional add z0.b, z0.b, #0 as suggested in the comment.

Harbormaster completed remote builds in B192500: Diff 468206. Oct 17 2022, 8:05 AM

I think you're missing the point... we only want to insert the extra add instruction if the movprfx would otherwise be illegal.

Add additional add z0.b, z0.b, #0 when necessary

Harbormaster completed remote builds in B192753: Diff 468545. Oct 18 2022, 8:01 AM

paulwalker-arm added inline comments. Oct 19 2022, 10:31 AM

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
590	I don't believe this is safe because only predicated instructions are allowed to follow a predicated `movprfx` instruction. There's a section within https://developer.arm.com/documentation/ddi0487/latest/ Data processing - SVE -> Move operations --> Move prefix that details which instructions are allowed to follow a `movprfx`.
590–598	Do any of the tests exercise this code?

efriedma added inline comments. Oct 19 2022, 10:39 AM

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
590	Oh, oops, I misread the documentation. Can we use `lsl z0.b, p0/m, z0.b, #0` instead?

paulwalker-arm added inline comments. Oct 19 2022, 10:48 AM

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
590	I think so but will check if there's an architecture preferred answer and will report back.

Allen removed a parent revision: D88595: [TableGen] Add not_all_same constraint check. Oct 20 2022, 1:48 AM

Allen added inline comments. Oct 20 2022, 6:05 PM

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
590	Thanks @paulwalker-arm and @efriedma. If we don't have a better architecture preferred answer, how about using lsl z0.b, p0/m, z0.b, #0 first? In fact, there are very few scenarios where this additional instruction is required, such as case bic_i64_zero_no_comm.

paulwalker-arm added inline comments. Oct 21 2022, 2:49 AM

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
590	lsl is not a great fit for current implementations but as you say this will only be used in rare cases that we ultimately want to solve during register allocation anyway. Plus, looking over the instruction set I'm not sure there's a non-shift alternative so yes let's go with `lsl z0.b, p0/m, z0.b, #0`.
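
The reason the unpredicated `add z0.b, z0.b, #0` was rejected in favour of the predicated `lsl` can be sketched as a simple check. This is an illustrative model only: `SveInst` and `mayFollowPredicatedMovprfx` are hypothetical names, and the requirement that the governing predicate matches is an assumption that goes beyond what the comments in this thread state.

```cpp
// Hypothetical, simplified description of an SVE instruction.
struct SveInst {
  bool IsPredicated;  // true for forms such as `lsl z0.b, p0/m, z0.b, #0`
  unsigned PredReg;   // governing predicate; meaningful only if IsPredicated
};

// Returns true if Next may follow `movprfx zd, p<MovprfxPred>/z, zn` in this
// model: it must be predicated (as noted in the review) and, by assumption,
// governed by the same predicate register.
bool mayFollowPredicatedMovprfx(unsigned MovprfxPred, const SveInst &Next) {
  return Next.IsPredicated && Next.PredReg == MovprfxPred;
}
```

In this model the unpredicated add fails the check, while an lsl governed by p0 passes after a movprfx governed by p0.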

Replace add z0.b, z0.b, #0 with lsl z0.b, p0/m, z0.b, #0 as suggested in the comment.

Harbormaster completed remote builds in B193706: Diff 469851. Oct 21 2022, 8:12 PM

Allen marked 5 inline comments as done. Oct 21 2022, 8:12 PM

Allen added inline comments.

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
590	Done, thanks for your suggestion.

Allen removed a parent revision: D88595: [TableGen] Add not_all_same constraint check. Oct 21 2022, 8:14 PM

Allen edited the summary of this revision. Oct 23 2022, 7:09 AM

Allen added a parent revision: D88595: [TableGen] Add not_all_same constraint check.

ping ?

Still missing a testcase that actually triggers the "lsl" codepath.

Add a MIR test case to trigger the "lsl" codepath.

Harbormaster completed remote builds in B194094: Diff 470369. Oct 24 2022, 10:03 PM

Allen removed a parent revision: D88595: [TableGen] Add not_all_same constraint check. Oct 27 2022, 9:06 AM

Any new suggestions about the last update? Thanks.

ping ?

paulwalker-arm added inline comments. Nov 1 2022, 11:34 AM

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
495–499	Rather than forcing `DOPRegIsUnique` to an incorrect value, perhaps this switch serves a different purpose and at least one[1] of the matching asserts is just not relevant anymore. Your fix is essentially saying "if we cannot prefix the requested instruction we'll instead emit a prefixed_zeroing_mov". Which suggests this fixes the problem for all `DType`s and thus we no longer require `DOPRegIsUnique == true` for correctness (although it is a hint of slightly poor code generation). [1] I say at least one because for this patch you only care about the zeroing case, so really the `Create the additional LSL to zero` code belongs in the `if (FalseZero)` block and thus the other `DOPRegIsUnique` assert remains valid. That said, we can always emit a normal COPY/MOV for the `!FalseZero` case, which is one instruction rather than two. However, that's not a scenario that can occur with the current code, so you wouldn't be able to write tests, and it is thus best avoided.
591–592	If you keep the above code then does `if (!DOPRegIsUnique)` work here? and as mentioned I believe it should sit with the `if (FalseZero)` block.

address comments

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
495–499	Done, deleted the code that forced DOPRegIsUnique to an incorrect value, thanks. This version is expected to fix only DestructiveBinary to begin with.
591–592	No, the above code is only active when #ifndef NDEBUG is true, so it depends on our configuration. I applied your comment and moved it into the if (FalseZero) block, thanks.

Harbormaster completed remote builds in B195644: Diff 472527. Nov 2 2022, 1:14 AM

paulwalker-arm added inline comments. Nov 2 2022, 7:34 AM

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
591–592	Sure, but what I'm suggesting is that the code is no longer `DEBUG` only as we now have a real world use for it.

@Allen It looks like something has gone wrong because the latest version looks like a different piece of work, unrelated to movprfx.

Allen removed a parent revision: D88595: [TableGen] Add not_all_same constraint check. Nov 2 2022, 7:47 AM

Update to replace the unrelated piece of work with the correct patch.

Harbormaster completed remote builds in B195705: Diff 472614. Nov 2 2022, 8:02 AM

Replying to @paulwalker-arm (D124325#3902259):

Oh, sorry for the mistake; it is now the right version.

Allen removed a parent revision: D88595: [TableGen] Add not_all_same constraint check. Nov 2 2022, 8:04 AM

ping ?

@Allen Did you see my comment from "Wed, Nov 2, 2:34 PM"? Not a requirement but I figure moving this code out of DEBUG and ensuring DOPRegIsUnique is correct for AArch64::DestructiveBinary might be better in the long run. I know at this time you only care about DestructiveBinary, but I think your solution can easily be extended to the other destructive types later on, although part of me thinks they'll just work after this patch.

Delete the NDEBUG guard as suggested, thanks.

Harbormaster completed remote builds in B196464: Diff 473641. Nov 7 2022, 5:59 AM

Allen removed a parent revision: D88595: [TableGen] Add not_all_same constraint check. Nov 7 2022, 6:00 AM

Replying to @paulwalker-arm (D124325#3911789):

Sorry @paulwalker-arm for missing that. I have now deleted the DEBUG guard.

paulwalker-arm added inline comments. Nov 9 2022, 10:39 AM

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
454–455	Is it possible to move this logic into the `DOPRegIsUnique` switch statement, like `case AArch64::DestructiveBinary: DOPRegIsUnique = DstReg != MI.getOperand(SrcIdx).getReg();`?
568	I think this reads better as `DOPRegIsUnique || AArch64::DestructiveBinary == DType` because the second part is the exception.
582–583	With my previous suggestion, does `if (DType == AArch64::DestructiveBinary && !DOPRegIsUnique)` work here?
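
The suggestions above fit together roughly as follows. This is a standalone sketch rather than the actual patch: `DestructiveType`, `Operands`, `isDOPRegUnique`, and `needsExtraLSL` are hypothetical names, registers are plain integers, and only the two destructive types discussed here are modelled.

```cpp
// Hypothetical stand-ins for the AArch64 destructive instruction types.
enum class DestructiveType { Binary, BinaryComm };

// Register numbers of the destination, destructive operand and other source.
struct Operands {
  unsigned Dst;
  unsigned DOP;
  unsigned Src;
};

// Mirrors the uniqueness checks quoted in the review: for the non-commutative
// binary case the destination must differ from the second source.
bool isDOPRegUnique(DestructiveType T, const Operands &O) {
  switch (T) {
  case DestructiveType::Binary:
    return O.Dst != O.Src;
  case DestructiveType::BinaryComm:
    return O.Dst != O.DOP || O.DOP != O.Src;
  }
  return false;
}

// The extra LSL is only wanted for zeroing DestructiveBinary pseudos whose
// destructive operand is not unique.
bool needsExtraLSL(bool FalseZero, DestructiveType T, const Operands &O) {
  return FalseZero && T == DestructiveType::Binary && !isDOPRegUnique(T, O);
}
```

With all three registers equal and zeroing requested, `needsExtraLSL` returns true; in every other combination modelled here the ordinary movprfx expansion is used.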

Address comment

Allen removed a parent revision: D88595: [TableGen] Add not_all_same constraint check. Nov 9 2022, 11:34 PM

Harbormaster completed remote builds in B197029: Diff 474456. Nov 9 2022, 11:35 PM

Allen marked an inline comment as done. Nov 9 2022, 11:35 PM

Allen added inline comments.

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
454–455	Applied your comment, thanks.
568	Done
582–583	Yes, it works with your previous suggestion 'DOPRegIsUnique = DstReg != MI.getOperand(SrcIdx).getReg();'

paulwalker-arm accepted this revision. Nov 10 2022, 9:39 AM

This revision is now accepted and ready to land. Nov 10 2022, 9:39 AM

closed with commit ffb109b6852d248c9d2e3202477dccf20aac7151

rupprecht added a subscriber: rupprecht. Nov 10 2022, 9:04 PM

rupprecht added inline comments.

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
496	Is fall through intended here? I assume you want a `break;`? llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp:495:3: warning: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough] case AArch64::DestructiveBinaryComm:

Allen marked an inline comment as done. Nov 10 2022, 10:06 PM

Allen added inline comments.

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
496	Yes, I'll add break, thanks

rupprecht added inline comments. Nov 10 2022, 10:07 PM

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
496	Added a break in 094c0eccdf959c3b9c85219e33c3fcfbab024b61 to avoid fallthrough. Please take a look if that's not what you want.

rupprecht added inline comments. Nov 10 2022, 10:14 PM

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
496	Ah, race condition -- I applied the break at the same time you responded. Glad I assumed correctly :)

Allen marked 3 inline comments as done. Nov 10 2022, 10:19 PM

Allen added inline comments.

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
496	Thanks for fixing.
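
The warning discussed in this exchange can be reproduced in miniature. The following sketch uses hypothetical names (`Kind`, `classifyWithBreak`, `classifyWithFallthrough`) and is unrelated to the real switch body; it only shows how a missing `break` changes the result and how C++17's `[[fallthrough]]` marks an intentional fall-through.

```cpp
// Hypothetical enum standing in for the destructive instruction types.
enum class Kind { Binary, BinaryComm, Other };

// Each case ends with `break`, so the Binary case keeps its own result.
int classifyWithBreak(Kind K) {
  int Result = 0;
  switch (K) {
  case Kind::Binary:
    Result = 1;
    break;
  case Kind::BinaryComm:
    Result = 2;
    break;
  default:
    break;
  }
  return Result;
}

// Without a `break`, control continues into the next label and overwrites
// Result. The attribute documents that this is intentional and silences
// -Wimplicit-fallthrough.
int classifyWithFallthrough(Kind K) {
  int Result = 0;
  switch (K) {
  case Kind::Binary:
    Result = 1;
    [[fallthrough]];
  case Kind::BinaryComm:
    Result = 2;
    break;
  default:
    break;
  }
  return Result;
}
```

Here `classifyWithBreak(Kind::Binary)` yields 1, whereas `classifyWithFallthrough(Kind::Binary)` yields 2 because the second assignment overwrites the first.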

Allen mentioned this in D141471: [AArch64][SVE] Fix crash for DestructiveBinaryComm zero merging. Jan 11 2023, 2:12 AM

Allen mentioned this in rG2deb10c10842: [AArch64][SVE] Fix crash for DestructiveBinaryComm zero merging. Jan 17 2023, 4:46 AM

Allen added a child revision: D141471: [AArch64][SVE] Fix crash for DestructiveBinaryComm zero merging. Jan 17 2023, 5:15 AM

Revision Contents

Path                                                             Size
llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp             27 lines
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td                   2 lines
llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-merging.ll    45 lines
llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-merging.mir   39 lines

Diff 470369

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp

Show First 20 Lines • Show All 445 Lines • ▼ Show 20 Lines	bool AArch64ExpandPseudo::expand_DestructiveOp(
unsigned Opcode = AArch64::getSVEPseudoMap(MI.getOpcode());		unsigned Opcode = AArch64::getSVEPseudoMap(MI.getOpcode());
uint64_t DType = TII->get(Opcode).TSFlags & AArch64::DestructiveInstTypeMask;		uint64_t DType = TII->get(Opcode).TSFlags & AArch64::DestructiveInstTypeMask;
uint64_t FalseLanes = MI.getDesc().TSFlags & AArch64::FalseLanesMask;		uint64_t FalseLanes = MI.getDesc().TSFlags & AArch64::FalseLanesMask;
bool FalseZero = FalseLanes == AArch64::FalseLanesZero;		bool FalseZero = FalseLanes == AArch64::FalseLanesZero;

Register DstReg = MI.getOperand(0).getReg();		Register DstReg = MI.getOperand(0).getReg();
bool DstIsDead = MI.getOperand(0).isDead();		bool DstIsDead = MI.getOperand(0).isDead();

if (DType == AArch64::DestructiveBinary)
assert(DstReg != MI.getOperand(3).getReg());

bool UseRev = false;		bool UseRev = false;
unsigned PredIdx, DOPIdx, SrcIdx, Src2Idx;		unsigned PredIdx, DOPIdx, SrcIdx, Src2Idx;
switch (DType) {		switch (DType) {
case AArch64::DestructiveBinaryComm:		case AArch64::DestructiveBinaryComm:
case AArch64::DestructiveBinaryCommWithRev:		case AArch64::DestructiveBinaryCommWithRev:
if (DstReg == MI.getOperand(3).getReg()) {		if (DstReg == MI.getOperand(3).getReg()) {
// FSUB Zd, Pg, Zs1, Zd ==> FSUBR Zd, Pg/m, Zd, Zs1		// FSUB Zd, Pg, Zs1, Zd ==> FSUBR Zd, Pg/m, Zd, Zs1
std::tie(PredIdx, DOPIdx, SrcIdx) = std::make_tuple(1, 3, 2);		std::tie(PredIdx, DOPIdx, SrcIdx) = std::make_tuple(1, 3, 2);
Show All 25 Lines	bool AArch64ExpandPseudo::expand_DestructiveOp(
}		}

#ifndef NDEBUG		#ifndef NDEBUG
// MOVPRFX can only be used if the destination operand		// MOVPRFX can only be used if the destination operand
// is the destructive operand, not as any other operand,		// is the destructive operand, not as any other operand,
// so the Destructive Operand must be unique.		// so the Destructive Operand must be unique.
bool DOPRegIsUnique = false;		bool DOPRegIsUnique = false;
switch (DType) {		switch (DType) {
		case AArch64::DestructiveBinary:
		// Don't check the SrcIdx for DOPRegIsUnique to avoid the crash, will be
		// addressed by additional LSL when necessary.
		DOPRegIsUnique = DstReg == MI.getOperand(DOPIdx).getReg();
		break;
case AArch64::DestructiveBinaryComm:		case AArch64::DestructiveBinaryComm:
case AArch64::DestructiveBinaryCommWithRev:		case AArch64::DestructiveBinaryCommWithRev:
DOPRegIsUnique =		DOPRegIsUnique =
DstReg != MI.getOperand(DOPIdx).getReg() ||		DstReg != MI.getOperand(DOPIdx).getReg() ||
MI.getOperand(DOPIdx).getReg() != MI.getOperand(SrcIdx).getReg();		MI.getOperand(DOPIdx).getReg() != MI.getOperand(SrcIdx).getReg();
break;		break;
case AArch64::DestructiveUnaryPassthru:		case AArch64::DestructiveUnaryPassthru:
case AArch64::DestructiveBinaryImm:		case AArch64::DestructiveBinaryImm:
Show All 16 Lines	if ((NewOpcode = AArch64::getSVERevInstr(Opcode)) != -1)
Opcode = NewOpcode;		Opcode = NewOpcode;
// e.g. DIVR -> DIV		// e.g. DIVR -> DIV
else if ((NewOpcode = AArch64::getSVENonRevInstr(Opcode)) != -1)		else if ((NewOpcode = AArch64::getSVENonRevInstr(Opcode)) != -1)
Opcode = NewOpcode;		Opcode = NewOpcode;
}		}

// Get the right MOVPRFX		// Get the right MOVPRFX
uint64_t ElementSize = TII->getElementSizeForOpcode(Opcode);		uint64_t ElementSize = TII->getElementSizeForOpcode(Opcode);
unsigned MovPrfx, MovPrfxZero;		unsigned MovPrfx, LSLZero, MovPrfxZero;
switch (ElementSize) {		switch (ElementSize) {
case AArch64::ElementSizeNone:		case AArch64::ElementSizeNone:
case AArch64::ElementSizeB:		case AArch64::ElementSizeB:
MovPrfx = AArch64::MOVPRFX_ZZ;		MovPrfx = AArch64::MOVPRFX_ZZ;
		LSLZero = AArch64::LSL_ZPmI_B;
MovPrfxZero = AArch64::MOVPRFX_ZPzZ_B;		MovPrfxZero = AArch64::MOVPRFX_ZPzZ_B;
break;		break;
case AArch64::ElementSizeH:		case AArch64::ElementSizeH:
MovPrfx = AArch64::MOVPRFX_ZZ;		MovPrfx = AArch64::MOVPRFX_ZZ;
		LSLZero = AArch64::LSL_ZPmI_H;
MovPrfxZero = AArch64::MOVPRFX_ZPzZ_H;		MovPrfxZero = AArch64::MOVPRFX_ZPzZ_H;
break;		break;
case AArch64::ElementSizeS:		case AArch64::ElementSizeS:
MovPrfx = AArch64::MOVPRFX_ZZ;		MovPrfx = AArch64::MOVPRFX_ZZ;
		LSLZero = AArch64::LSL_ZPmI_S;
MovPrfxZero = AArch64::MOVPRFX_ZPzZ_S;		MovPrfxZero = AArch64::MOVPRFX_ZPzZ_S;
break;		break;
case AArch64::ElementSizeD:		case AArch64::ElementSizeD:
MovPrfx = AArch64::MOVPRFX_ZZ;		MovPrfx = AArch64::MOVPRFX_ZZ;
		LSLZero = AArch64::LSL_ZPmI_D;
MovPrfxZero = AArch64::MOVPRFX_ZPzZ_D;		MovPrfxZero = AArch64::MOVPRFX_ZPzZ_D;
break;		break;
default:		default:
llvm_unreachable("Unsupported ElementSize");		llvm_unreachable("Unsupported ElementSize");
}		}

//		//
// Create the destructive operation (if required)		// Create the destructive operation (if required)
//		//
MachineInstrBuilder PRFX, DOP;		MachineInstrBuilder PRFX, DOP;
if (FalseZero) {		if (FalseZero) {
#ifndef NDEBUG		#ifndef NDEBUG
assert(DOPRegIsUnique && "The destructive operand should be unique");		assert(DOPRegIsUnique && "The destructive operand should be unique");
#endif		#endif
assert(ElementSize != AArch64::ElementSizeNone &&		assert(ElementSize != AArch64::ElementSizeNone &&
"This instruction is unpredicated");		"This instruction is unpredicated");

// Merge source operand into destination register		// Merge source operand into destination register
PRFX = BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(MovPrfxZero))		PRFX = BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(MovPrfxZero))
.addReg(DstReg, RegState::Define)		.addReg(DstReg, RegState::Define)
.addReg(MI.getOperand(PredIdx).getReg())		.addReg(MI.getOperand(PredIdx).getReg())
.addReg(MI.getOperand(DOPIdx).getReg());		.addReg(MI.getOperand(DOPIdx).getReg());

// After the movprfx, the destructive operand is same as Dst		// After the movprfx, the destructive operand is same as Dst
DOPIdx = 0;		DOPIdx = 0;
} else if (DstReg != MI.getOperand(DOPIdx).getReg()) {		} else if (DstReg != MI.getOperand(DOPIdx).getReg()) {
#ifndef NDEBUG		#ifndef NDEBUG
assert(DOPRegIsUnique && "The destructive operand should be unique");		assert(DOPRegIsUnique && "The destructive operand should be unique");
#endif		#endif
PRFX = BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(MovPrfx))		PRFX = BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(MovPrfx))
.addReg(DstReg, RegState::Define)		.addReg(DstReg, RegState::Define)
.addReg(MI.getOperand(DOPIdx).getReg());		.addReg(MI.getOperand(DOPIdx).getReg());
DOPIdx = 0;		DOPIdx = 0;
}		}

		// Create the additional LSL to zero the lanes when the DstReg is not unique.
		// Zeros the lanes in z0 that aren't active in p0 with sequence movprfx
		// z0.b, p0/z, z0.b; lsl z0.b, p0/m, z0.b, #0;
		paulwalker-arm (Done): I don't believe this is safe because only predicated instructions are allowed to follow a predicated `movprfx` instruction. The section "Data processing - SVE -> Move operations -> Move prefix" of https://developer.arm.com/documentation/ddi0487/latest/ details which instructions are allowed to follow a `movprfx`.
		efriedma (Done): Oh, oops, I misread the documentation. Can we use `lsl z0.b, p0/m, z0.b, #0` instead?
		paulwalker-arm (Done): I think so, but I will check whether there's an architecture-preferred answer and report back.
		Allen (Author, Done): Thanks @paulwalker-arm and @efriedma. If there is no better architecture-preferred answer, how about using `lsl z0.b, p0/m, z0.b, #0` for now? In fact, very few scenarios require this additional instruction, such as the case bic_i64_zero_no_comm.
		paulwalker-arm (Done): lsl is not a great fit for current implementations, but as you say this will only be used in rare cases that we ultimately want to solve during register allocation anyway. Plus, looking over the instruction set, I'm not sure there's a non-shift alternative, so yes, let's go with `lsl z0.b, p0/m, z0.b, #0`.
		Allen (Author, Done): Done, thanks for your suggestion.
		if (DType == AArch64::DestructiveBinary &&
		DstReg == MI.getOperand(SrcIdx).getReg()) {
		paulwalker-arm (Not Done): If you keep the above code, then does `if (!DOPRegIsUnique)` work here? And as mentioned, I believe it should sit with the `if (FalseZero)` block.
		Allen (Author, Done): No, the above code is only active under `#ifndef NDEBUG`, so it depends on the build configuration. I have applied your comment and moved this into the `if (FalseZero)` block, thanks.
		paulwalker-arm (Not Done): Sure, but what I'm suggesting is that the code is no longer `DEBUG` only, as we now have a real-world use for it.
		BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(LSLZero))
		.addReg(DstReg, RegState::Define)
		.add(MI.getOperand(PredIdx))
		.addReg(DstReg)
		.addImm(0);
		}
		paulwalker-arm (Done): Do any of the tests exercise this code?

//		//
// Create the destructive operation		// Create the destructive operation
//		//
DOP = BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(Opcode))		DOP = BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(Opcode))
.addReg(DstReg, RegState::Define | getDeadRegState(DstIsDead));		.addReg(DstReg, RegState::Define | getDeadRegState(DstIsDead));

switch (DType) {		switch (DType) {
case AArch64::DestructiveUnaryPassthru:		case AArch64::DestructiveUnaryPassthru:
DOP.addReg(MI.getOperand(DOPIdx).getReg(), RegState::Kill)		DOP.addReg(MI.getOperand(DOPIdx).getReg(), RegState::Kill)
.add(MI.getOperand(PredIdx))		.add(MI.getOperand(PredIdx))
.add(MI.getOperand(SrcIdx));		.add(MI.getOperand(SrcIdx));
break;		break;
		case AArch64::DestructiveBinary:
case AArch64::DestructiveBinaryImm:		case AArch64::DestructiveBinaryImm:
case AArch64::DestructiveBinaryComm:		case AArch64::DestructiveBinaryComm:
case AArch64::DestructiveBinaryCommWithRev:		case AArch64::DestructiveBinaryCommWithRev:
DOP.add(MI.getOperand(PredIdx))		DOP.add(MI.getOperand(PredIdx))
.addReg(MI.getOperand(DOPIdx).getReg(), RegState::Kill)		.addReg(MI.getOperand(DOPIdx).getReg(), RegState::Kill)
.add(MI.getOperand(SrcIdx));		.add(MI.getOperand(SrcIdx));
break;		break;
case AArch64::DestructiveTernaryCommWithRev:		case AArch64::DestructiveTernaryCommWithRev:
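The condition discussed in the inline comments above can be summarised as a small predicate. The following is only a minimal sketch, assuming the extra LSL is emitted inside the zeroing (`FalseZero`) path as the reviewer suggested; `DestructiveType`, `Reg` and `needsZeroingLSL` are hypothetical stand-ins and not the real LLVM definitions.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the types used by the patch; these are NOT
// the real AArch64 backend definitions.
enum class DestructiveType { UnaryPassthru, Binary, BinaryImm, BinaryComm };
using Reg = std::uint32_t;

// Returns true when the zeroing expansion would need the extra
// `lsl zd, pg/m, zd, #0` between the movprfx and the destructive op.
// Assumption: the check only applies on the zeroing (FalseZero) path and
// only to DestructiveBinary pseudos whose source register equals the
// destination register, mirroring the condition shown in the diff.
bool needsZeroingLSL(bool FalseZero, DestructiveType DType, Reg DstReg,
                     Reg SrcReg) {
  if (!FalseZero)
    return false;
  return DType == DestructiveType::Binary && DstReg == SrcReg;
}
```

With this sketch, `BIC_ZPZZ_ZERO A, P, A, A` maps to `needsZeroingLSL(true, DestructiveType::Binary, A, A)`, which is true, while the common case with distinct source and destination registers returns false.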

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

	let Predicates = [HasSVEorSME, UseExperimentalZeroingPseudos] in {			let Predicates = [HasSVEorSME, UseExperimentalZeroingPseudos] in {
	defm ADD_ZPZZ : sve_int_bin_pred_zeroing_bhsd<int_aarch64_sve_add>;			defm ADD_ZPZZ : sve_int_bin_pred_zeroing_bhsd<int_aarch64_sve_add>;
	defm SUB_ZPZZ : sve_int_bin_pred_zeroing_bhsd<int_aarch64_sve_sub>;			defm SUB_ZPZZ : sve_int_bin_pred_zeroing_bhsd<int_aarch64_sve_sub>;
	defm SUBR_ZPZZ : sve_int_bin_pred_zeroing_bhsd<int_aarch64_sve_subr>;			defm SUBR_ZPZZ : sve_int_bin_pred_zeroing_bhsd<int_aarch64_sve_subr>;

	defm ORR_ZPZZ : sve_int_bin_pred_zeroing_bhsd<int_aarch64_sve_orr>;			defm ORR_ZPZZ : sve_int_bin_pred_zeroing_bhsd<int_aarch64_sve_orr>;
	defm EOR_ZPZZ : sve_int_bin_pred_zeroing_bhsd<int_aarch64_sve_eor>;			defm EOR_ZPZZ : sve_int_bin_pred_zeroing_bhsd<int_aarch64_sve_eor>;
	defm AND_ZPZZ : sve_int_bin_pred_zeroing_bhsd<int_aarch64_sve_and>;			defm AND_ZPZZ : sve_int_bin_pred_zeroing_bhsd<int_aarch64_sve_and>;
	defm BIC_ZPZZ : sve_int_bin_pred_zeroing_bhsd<null_frag>;			defm BIC_ZPZZ : sve_int_bin_pred_zeroing_bhsd<int_aarch64_sve_bic>;
	} // End HasSVEorSME, UseExperimentalZeroingPseudos			} // End HasSVEorSME, UseExperimentalZeroingPseudos

	let Predicates = [HasSVEorSME] in {			let Predicates = [HasSVEorSME] in {
	defm ADD_ZI : sve_int_arith_imm0<0b000, "add", add>;			defm ADD_ZI : sve_int_arith_imm0<0b000, "add", add>;
	defm SUB_ZI : sve_int_arith_imm0<0b001, "sub", sub>;			defm SUB_ZI : sve_int_arith_imm0<0b001, "sub", sub>;
	defm SUBR_ZI : sve_int_arith_imm0<0b011, "subr", AArch64subr>;			defm SUBR_ZI : sve_int_arith_imm0<0b011, "subr", AArch64subr>;
	defm SQADD_ZI : sve_int_arith_imm0<0b100, "sqadd", saddsat>;			defm SQADD_ZI : sve_int_arith_imm0<0b100, "sqadd", saddsat>;
	defm UQADD_ZI : sve_int_arith_imm0<0b101, "uqadd", uaddsat>;			defm UQADD_ZI : sve_int_arith_imm0<0b101, "uqadd", uaddsat>;

llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-merging.ll


	;			;
	; BIC			; BIC
	;			;

	define <vscale x 16 x i8> @bic_i8_zero(<vscale x 16 x i1> %pg, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b) {			define <vscale x 16 x i8> @bic_i8_zero(<vscale x 16 x i1> %pg, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b) {
	; CHECK-LABEL: bic_i8_zero:			; CHECK-LABEL: bic_i8_zero:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: mov z2.b, #0 // =0x0			; CHECK-NEXT: movprfx z0.b, p0/z, z0.b
	; CHECK-NEXT: sel z0.b, p0, z0.b, z2.b
	; CHECK-NEXT: bic z0.b, p0/m, z0.b, z1.b			; CHECK-NEXT: bic z0.b, p0/m, z0.b, z1.b
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%a_z = select <vscale x 16 x i1> %pg, <vscale x 16 x i8> %a, <vscale x 16 x i8> zeroinitializer			%a_z = select <vscale x 16 x i1> %pg, <vscale x 16 x i8> %a, <vscale x 16 x i8> zeroinitializer
	%out = call <vscale x 16 x i8> @llvm.aarch64.sve.bic.nxv16i8(<vscale x 16 x i1> %pg,			%out = call <vscale x 16 x i8> @llvm.aarch64.sve.bic.nxv16i8(<vscale x 16 x i1> %pg,
	<vscale x 16 x i8> %a_z,			<vscale x 16 x i8> %a_z,
	<vscale x 16 x i8> %b)			<vscale x 16 x i8> %b)
	ret <vscale x 16 x i8> %out			ret <vscale x 16 x i8> %out
	}			}

	define <vscale x 8 x i16> @bic_i16_zero(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b) {			define <vscale x 8 x i16> @bic_i16_zero(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b) {
	; CHECK-LABEL: bic_i16_zero:			; CHECK-LABEL: bic_i16_zero:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: mov z2.h, #0 // =0x0			; CHECK-NEXT: movprfx z0.h, p0/z, z0.h
	; CHECK-NEXT: sel z0.h, p0, z0.h, z2.h
	; CHECK-NEXT: bic z0.h, p0/m, z0.h, z1.h			; CHECK-NEXT: bic z0.h, p0/m, z0.h, z1.h
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%a_z = select <vscale x 8 x i1> %pg, <vscale x 8 x i16> %a, <vscale x 8 x i16> zeroinitializer			%a_z = select <vscale x 8 x i1> %pg, <vscale x 8 x i16> %a, <vscale x 8 x i16> zeroinitializer
	%out = call <vscale x 8 x i16> @llvm.aarch64.sve.bic.nxv8i16(<vscale x 8 x i1> %pg,			%out = call <vscale x 8 x i16> @llvm.aarch64.sve.bic.nxv8i16(<vscale x 8 x i1> %pg,
	<vscale x 8 x i16> %a_z,			<vscale x 8 x i16> %a_z,
	<vscale x 8 x i16> %b)			<vscale x 8 x i16> %b)
	ret <vscale x 8 x i16> %out			ret <vscale x 8 x i16> %out
	}			}

	define <vscale x 4 x i32> @bic_i32_zero(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {			define <vscale x 4 x i32> @bic_i32_zero(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
	; CHECK-LABEL: bic_i32_zero:			; CHECK-LABEL: bic_i32_zero:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: mov z2.s, #0 // =0x0			; CHECK-NEXT: movprfx z0.s, p0/z, z0.s
	; CHECK-NEXT: sel z0.s, p0, z0.s, z2.s
	; CHECK-NEXT: bic z0.s, p0/m, z0.s, z1.s			; CHECK-NEXT: bic z0.s, p0/m, z0.s, z1.s
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%a_z = select <vscale x 4 x i1> %pg, <vscale x 4 x i32> %a, <vscale x 4 x i32> zeroinitializer			%a_z = select <vscale x 4 x i1> %pg, <vscale x 4 x i32> %a, <vscale x 4 x i32> zeroinitializer
	%out = call <vscale x 4 x i32> @llvm.aarch64.sve.bic.nxv4i32(<vscale x 4 x i1> %pg,			%out = call <vscale x 4 x i32> @llvm.aarch64.sve.bic.nxv4i32(<vscale x 4 x i1> %pg,
	<vscale x 4 x i32> %a_z,			<vscale x 4 x i32> %a_z,
	<vscale x 4 x i32> %b)			<vscale x 4 x i32> %b)
	ret <vscale x 4 x i32> %out			ret <vscale x 4 x i32> %out
	}			}

	define <vscale x 2 x i64> @bic_i64_zero(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b) {			define <vscale x 2 x i64> @bic_i64_zero(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b) {
	; CHECK-LABEL: bic_i64_zero:			; CHECK-LABEL: bic_i64_zero:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: mov z2.d, #0 // =0x0			; CHECK-NEXT: movprfx z0.d, p0/z, z0.d
	; CHECK-NEXT: sel z0.d, p0, z0.d, z2.d
	; CHECK-NEXT: bic z0.d, p0/m, z0.d, z1.d			; CHECK-NEXT: bic z0.d, p0/m, z0.d, z1.d
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%a_z = select <vscale x 2 x i1> %pg, <vscale x 2 x i64> %a, <vscale x 2 x i64> zeroinitializer			%a_z = select <vscale x 2 x i1> %pg, <vscale x 2 x i64> %a, <vscale x 2 x i64> zeroinitializer
	%out = call <vscale x 2 x i64> @llvm.aarch64.sve.bic.nxv2i64(<vscale x 2 x i1> %pg,			%out = call <vscale x 2 x i64> @llvm.aarch64.sve.bic.nxv2i64(<vscale x 2 x i1> %pg,
	<vscale x 2 x i64> %a_z,			<vscale x 2 x i64> %a_z,
	<vscale x 2 x i64> %b)			<vscale x 2 x i64> %b)
	ret <vscale x 2 x i64> %out			ret <vscale x 2 x i64> %out
	}			}

				; BIC (i.e. A & ~A) is illegal operation with movprfx, so the codegen depend on IR before expand-pseudo
				define <vscale x 2 x i64> @bic_i64_zero_no_unique_reg(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a) {
				; CHECK-LABEL: bic_i64_zero_no_unique_reg:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov z1.d, #0 // =0x0
				; CHECK-NEXT: mov z1.d, p0/m, z0.d
				; CHECK-NEXT: movprfx z0.d, p0/z, z0.d
				; CHECK-NEXT: bic z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				%a_z = select <vscale x 2 x i1> %pg, <vscale x 2 x i64> %a, <vscale x 2 x i64> zeroinitializer
				%out = call <vscale x 2 x i64> @llvm.aarch64.sve.bic.nxv2i64(<vscale x 2 x i1> %pg,
				<vscale x 2 x i64> %a_z,
				<vscale x 2 x i64> %a_z)
				ret <vscale x 2 x i64> %out
				}

				; BIC (i.e. A & ~B) is not a commutative operation, so disable it when the
				; destination operand is not the destructive operand
				define <vscale x 2 x i64> @bic_i64_zero_no_comm(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b) {
				; CHECK-LABEL: bic_i64_zero_no_comm:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov z2.d, #0 // =0x0
				; CHECK-NEXT: sel z0.d, p0, z0.d, z2.d
				; CHECK-NEXT: bic z1.d, p0/m, z1.d, z0.d
				; CHECK-NEXT: mov z0.d, z1.d
				; CHECK-NEXT: ret
				%a_z = select <vscale x 2 x i1> %pg, <vscale x 2 x i64> %a, <vscale x 2 x i64> zeroinitializer
				%out = call <vscale x 2 x i64> @llvm.aarch64.sve.bic.nxv2i64(<vscale x 2 x i1> %pg,
				<vscale x 2 x i64> %b,
				<vscale x 2 x i64> %a_z)
				ret <vscale x 2 x i64> %out
				}

	declare <vscale x 16 x i8> @llvm.aarch64.sve.add.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)			declare <vscale x 16 x i8> @llvm.aarch64.sve.add.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
	declare <vscale x 8 x i16> @llvm.aarch64.sve.add.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)			declare <vscale x 8 x i16> @llvm.aarch64.sve.add.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
	declare <vscale x 4 x i32> @llvm.aarch64.sve.add.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)			declare <vscale x 4 x i32> @llvm.aarch64.sve.add.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
	declare <vscale x 2 x i64> @llvm.aarch64.sve.add.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)			declare <vscale x 2 x i64> @llvm.aarch64.sve.add.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)

	declare <vscale x 16 x i8> @llvm.aarch64.sve.sub.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)			declare <vscale x 16 x i8> @llvm.aarch64.sve.sub.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
	declare <vscale x 8 x i16> @llvm.aarch64.sve.sub.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)			declare <vscale x 8 x i16> @llvm.aarch64.sve.sub.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
	declare <vscale x 4 x i32> @llvm.aarch64.sve.sub.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)			declare <vscale x 4 x i32> @llvm.aarch64.sve.sub.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
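The `bic_i64_zero_no_comm` test relies on BIC (A & ~B) not being commutative, so the operands cannot simply be swapped to make the destination match the destructive operand. A minimal scalar illustration follows; the helper name `bic` is an assumption made for this sketch and is not an LLVM API.

```cpp
#include <cstdint>

// Scalar model of the bit-clear operation: keep the bits of `a` that are
// not set in `b`.
constexpr std::uint64_t bic(std::uint64_t a, std::uint64_t b) {
  return a & ~b;
}

// Swapping the operands changes the result, so BIC is not commutative.
static_assert(bic(0xC, 0xA) == 0x4, "0b1100 & ~0b1010 == 0b0100");
static_assert(bic(0xA, 0xC) == 0x2, "0b1010 & ~0b1100 == 0b0010");
```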

llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-merging.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -mtriple=aarch64 -mattr=+sve -mattr=+use-experimental-zeroing-pseudos -run-pass=aarch64-expand-pseudo %s -o - | FileCheck %s

				# Should create an additional LSL to zero the lanes as the DstReg is not unique

				--- |
				define <vscale x 8 x i16> @bic_i16_zero(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %a){
				%a_z = select <vscale x 8 x i1> %pg, <vscale x 8 x i16> %a, <vscale x 8 x i16> zeroinitializer
				%out = call <vscale x 8 x i16> @llvm.aarch64.sve.bic.nxv8i16(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %a_z, <vscale x 8 x i16> %a_z)
				ret <vscale x 8 x i16> %out
				}

				declare <vscale x 8 x i16> @llvm.aarch64.sve.bic.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				...
				---
				name: bic_i16_zero
				alignment: 4
				tracksRegLiveness: true
				tracksDebugUserValues: true
				registers: []
				liveins:
				- { reg: '$p0', virtual-reg: '' }
				- { reg: '$z0', virtual-reg: '' }
				body: |
				bb.0 (%ir-block.0):
				liveins: $p0, $z0

				; CHECK-LABEL: name: bic_i16_zero
				; CHECK: liveins: $p0, $z0
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: BUNDLE implicit-def $z0, implicit-def $q0, implicit-def $d0, implicit-def $s0, implicit-def $h0, implicit-def $b0, implicit-def $z0_hi, implicit killed $p0, implicit $z0 {
				; CHECK-NEXT: $z0 = MOVPRFX_ZPzZ_H $p0, $z0
				; CHECK-NEXT: $z0 = LSL_ZPmI_H killed renamable $p0, internal $z0, 0
				; CHECK-NEXT: $z0 = BIC_ZPmZ_H killed renamable $p0, internal killed $z0, internal killed renamable $z0
				; CHECK-NEXT: }
				; CHECK-NEXT: RET undef $lr, implicit $z0
				renamable $z0 = BIC_ZPZZ_ZERO_H killed renamable $p0, killed renamable $z0, killed renamable $z0
				RET_ReallyLR implicit $z0
				...
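To see why the expanded sequence in this test (MOVPRFX, then LSL by 0, then BIC) is expected to give the intended result, the per-lane behaviour can be modelled in plain C++. This is a simplified sketch and not an architectural reference: it treats `movprfx` as a standalone zeroing copy, whereas the architecture defines it as a prefix to the instruction that follows, and all names below (`Vec`, `Pred`, `movprfxZero`, `lslMerging`, `bicMerging`, `expandBicZeroSameReg`) are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 8; // assumed lane count, for illustration only
using Vec = std::array<std::uint16_t, kLanes>;
using Pred = std::array<bool, kLanes>;

// Model of `movprfx zd, pg/z, zn`: active lanes copy zn, inactive lanes
// become zero.
Vec movprfxZero(const Pred &pg, const Vec &zn) {
  Vec zd{};
  for (std::size_t i = 0; i < kLanes; ++i)
    zd[i] = pg[i] ? zn[i] : 0;
  return zd;
}

// Model of `lsl zdn, pg/m, zdn, #imm`: active lanes are shifted, inactive
// lanes keep their previous value (merging predication).
Vec lslMerging(const Pred &pg, Vec zdn, unsigned imm) {
  for (std::size_t i = 0; i < kLanes; ++i)
    if (pg[i])
      zdn[i] = static_cast<std::uint16_t>(zdn[i] << imm);
  return zdn;
}

// Model of `bic zdn, pg/m, zdn, zm`: active lanes compute zdn & ~zm,
// inactive lanes keep their previous value.
Vec bicMerging(const Pred &pg, Vec zdn, const Vec &zm) {
  for (std::size_t i = 0; i < kLanes; ++i)
    if (pg[i])
      zdn[i] = static_cast<std::uint16_t>(zdn[i] & ~zm[i]);
  return zdn;
}

// Model of the expansion of BIC_ZPZZ_ZERO A, P, A, A from the test above.
Vec expandBicZeroSameReg(const Pred &pg, const Vec &a) {
  Vec z0 = movprfxZero(pg, a); // inactive lanes zeroed
  z0 = lslMerging(pg, z0, 0);  // shift by 0 leaves every lane unchanged
  return bicMerging(pg, z0, z0);
}
```

In this model the shift by zero changes no lane, so its only role is to be a predicated instruction that may legally follow the zeroing `movprfx`. The final BIC then clears the active lanes (A & ~A), and the inactive lanes are already zero.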


Diff 470369
