This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Support logical operation BIC with DestructiveBinary patterns
ClosedPublic

Authored by Allen on Apr 23 2022, 2:23 AM.

Details

Summary

Logical operation BIC with DestructiveBinary patterns is temporarily removed as
causes an assert (commit 3c382ed71f15), so try to fix that.
The most significant being that for pseudo instructions that do not have real instructions (including movpfx'd ones) that cover all combinations of register allocation, their expansion will be broken. This is the main reason the zeroing is an experimental feature because it has known bugs.
So we add an extra LSL for movprfx expand BIC_ZPZZ_ZERO A, P, A, A when necessary.

movprfx	z0.s, p0/z, z0.s
lsl z0.b, p0/m, z0.b, #0
bic	z0.s, p0/m, z0.s, z0.s

Depends on D88595

Diff Detail

Event Timeline

Allen created this revision.Apr 23 2022, 2:23 AM
Allen requested review of this revision.Apr 23 2022, 2:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 23 2022, 2:23 AM
Allen updated this revision to Diff 424709.Apr 23 2022, 2:33 AM

add a new test case

Hi @Allen, you might be interested in D88595.

This is the main reason the zeroing is an experimental feature because it has known bugs. The most significant being that for pseudo instructions that do not have real instructions (including movpfx'd ones) that cover all combinations of register allocation, their expansion will be broken. I believe BIC is an example of this because there's no way to movprfx expand BIC_ZPZZ_ZERO A, P, A, A. An unrealistic usage I know but it's still legitamet.

define <vscale x 4 x i32> @foo(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a) #0 {
entry:
  %t1 = select <vscale x 4 x i1> %pg, <vscale x 4 x i32> %a, <vscale x 4 x i32> zeroinitializer
  %t2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.bic.nxv4i32(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %t1, <vscale x 4 x i32> %a)
  ret <vscale x 4 x i32> %t2
}

This will trigger an assert, even after this patch, because otherwise it would emit

movprfx	z0.s, p0/z, z0.s
bic	z0.s, p0/m, z0.s, z0.s

which is not a valid use of movprfx.

For ACfL(https://developer.arm.com/tools-and-software/server-and-hpc/compile/arm-compiler-for-linux) we solve this using a couple of machine function passes but I feel they're not the correct solution hence the above patch where we'd rather have a mechanism to better express the register requirements as part of the pseudo's definition.

Allen added a comment.EditedApr 23 2022, 6:15 PM

Hi @Allen, you might be interested in D88595.

This is the main reason the zeroing is an experimental feature because it has known bugs. The most significant being that for pseudo instructions that do not have real instructions (including movpfx'd ones) that cover all combinations of register allocation, their expansion will be broken. I believe BIC is an example of this because there's no way to movprfx expand BIC_ZPZZ_ZERO A, P, A, A. An unrealistic usage I know but it's still legitamet.

define <vscale x 4 x i32> @foo(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a) #0 {
entry:
  %t1 = select <vscale x 4 x i1> %pg, <vscale x 4 x i32> %a, <vscale x 4 x i32> zeroinitializer
  %t2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.bic.nxv4i32(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %t1, <vscale x 4 x i32> %a)
  ret <vscale x 4 x i32> %t2
}

This will trigger an assert, even after this patch, because otherwise it would emit

movprfx	z0.s, p0/z, z0.s
bic	z0.s, p0/m, z0.s, z0.s

which is not a valid use of movprfx.

For ACfL(https://developer.arm.com/tools-and-software/server-and-hpc/compile/arm-compiler-for-linux) we solve this using a couple of machine function passes but I feel they're not the correct solution hence the above patch where we'd rather have a mechanism to better express the register requirements as part of the pseudo's definition.

Thanks @paulwalker-arm for detail explaning the issue of of register allocation. After read the D88595, I think there is two candidate ways to address this, do I missing something ?

1、the instruction bic is special, so when all the registers are same, it should be expansion with zero  (i.e. BIC (A, A) = A & ~A = 0)

2、make use of  not_all_same, and  {not_all_same(Dst, Op0, Op1), earlyclobber(Dst)} can be simplified as {earlyclobber(Dst)}, which can guard the registers without all same.
paulwalker-arm added a comment.EditedApr 26 2022, 4:09 PM

I don't think either of these are good options. I don't like the idea of having instruction specific handling within the expand code and I'm not sure where the logic from (2) comes from but regardless using earlyclobber in this way will force Dst to be unique and thus forces the need for movpfrx (i.e. it blocks the BIC_ZPZZ_ZERO A, P, A, B use case ).

To my mind the best option is to allow the register requirements to be better expressed as part of the pseudo instruction. Until then I cannot see how we can move this feature out of its experimental phase. Perhaps there's an alternative to D88595 but I've not thought about it for a while. How important is movpfrx based zeroing to you right now? Have you got a problematic use case you can share?

Allen added a comment.Apr 26 2022, 6:37 PM

I don't think either of these are good options. I don't like the idea of having instruction specific handling within the expand code and I'm not sure where the logic from (2) comes from but regardless using earlyclobber in this way will force Dst to be unique and thus forces the need for movpfrx (i.e. it blocks the BIC_ZPZZ_ZERO A, P, A, B use case ).

To my mind the best option is to allow the register requirements to be better expressed as part of the pseudo instruction. Until then I cannot see how we can move this feature out of its experimental phase. Perhaps there's an alternative to D88595 but I've not thought about it for a while. How important is movpfrx based zeroing to you right now? Have you got a problematic use case you can share?

Thanks, I'll wait for D88595 , and I'm not in a hurry for this, but only confirm the idea.
BTW, the logic from (2) comes from the comment https://reviews.llvm.org/D88595#inline-886782

Allen retitled this revision from [AArch64][SVE] Support logical operation BIC with DestructiveBinary patterns to [WIP][AArch64][SVE] Support logical operation BIC with DestructiveBinary patterns.Apr 26 2022, 9:28 PM
Allen edited the summary of this revision. (Show Details)Apr 26 2022, 9:28 PM
Matt added a subscriber: Matt.Apr 27 2022, 11:47 AM

I don't like the idea of having instruction specific handling within the expand code

Would this really be so terrible? I mean, it's arguably a bit of a hack, but it's not that different from the way we handle other pseudo-instructions.

I don't like the idea of having instruction specific handling within the expand code

Would this really be so terrible? I mean, it's arguably a bit of a hack, but it's not that different from the way we handle other pseudo-instructions.

I think so. Pseudo-instruction expansion is often instruction specific but for the movprfx handling we've detached the logic from the instructions (because there's 100s of them) and instead split them across various categories. So I'd much rather see problems solved for a whole category rather than partially for a single instruction within a category.

I don't like the idea of having instruction specific handling within the expand code

Would this really be so terrible? I mean, it's arguably a bit of a hack, but it's not that different from the way we handle other pseudo-instructions.

I think so. Pseudo-instruction expansion is often instruction specific but for the movprfx handling we've detached the logic from the instructions (because there's 100s of them) and instead split them across various categories. So I'd much rather see problems solved for a whole category rather than partially for a single instruction within a category.

I see what you mean. And I guess some of the instructions don't have any sort of "identity" result like this.

I guess you could use an alternative sequence. I can't come up with a two-instruction sequence, but I guess you can movprfx a dummy instruction, like movprfx z0.b, p0/z, z0.b; add z0.b, z0.b, #0; bic z0.b, p0/m, z0.b, z0.b.

Allen added a comment.Oct 15 2022, 8:11 AM

I don't like the idea of having instruction specific handling within the expand code

Would this really be so terrible? I mean, it's arguably a bit of a hack, but it's not that different from the way we handle other pseudo-instructions.

I think so. Pseudo-instruction expansion is often instruction specific but for the movprfx handling we've detached the logic from the instructions (because there's 100s of them) and instead split them across various categories. So I'd much rather see problems solved for a whole category rather than partially for a single instruction within a category.

I see what you mean. And I guess some of the instructions don't have any sort of "identity" result like this.

I guess you could use an alternative sequence. I can't come up with a two-instruction sequence, but I guess you can movprfx a dummy instruction, like movprfx z0.b, p0/z, z0.b; add z0.b, z0.b, #0; bic z0.b, p0/m, z0.b, z0.b.

As D88595 will not be accepted in a short time, so I'll try with your idea.
Before doing so, I want to check your suggestion:
do you mean the additonal add z0.b, z0.b, #0; should match the above movprfx instuction(meet the constraints of movprfx instruction), then the bic instruction will get "identity" result?

The sequence movprfx z0.b, p0/z, z0.b; add z0.b, z0.b, #0; zeros the lanes in z0 that aren't active in p0. (It's a slightly weird way to write it, but as far as I know there isn't any single instruction with equivalent semantics.) Then we can just use the regular instruction that leaves the lanes we just zeroed unmodified. This works regardless of the operation (as long as it doesn't do cross-lane processing).

The "identity" thing is just noting that if the two operands to bic are identical, the result is always zero. But as noted before, that doesn't generalize to other operations.

Allen updated this revision to Diff 468206.Oct 17 2022, 8:04 AM
Allen retitled this revision from [WIP][AArch64][SVE] Support logical operation BIC with DestructiveBinary patterns to [AArch64][SVE] Support logical operation BIC with DestructiveBinary patterns.
Allen edited the summary of this revision. (Show Details)

Add additional add z0.b, z0.b, #0 according comment

I think you're missing the point... we only want to insert the extra add instruction if the movprfx would be otherwise be illegal.

Allen updated this revision to Diff 468545.Oct 18 2022, 8:01 AM

Add additional add z0.b, z0.b, #0 when necessary

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
590

I don't believe this is safe because only predicated instructions are allowed to follow a predicated movprfx instruction. There's a section within

https://developer.arm.com/documentation/ddi0487/latest/

Data processing - SVE
-> Move operations
--> Move prefix

that details which instructions are allowed to follow a movprfx.

590–598

Do any of the tests exercise this code?

efriedma added inline comments.Oct 19 2022, 10:39 AM
llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
590

Oh, oops, I misread the documentation. Can we use lsl z0.b, p0/m, z0.b, #0 instead?

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
590

I think so but will check if there's an architecture preferred answer and will report back.

Allen added inline comments.Oct 20 2022, 6:05 PM
llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
590

Thanks @paulwalker-arm and @efriedma.
If we don't have a better architecture preferred answer, how about using lsl z0.b, p0/m, z0.b, #0 first? In fact, there are very few scenarios where this additional instruction is required, such as case bic_i64_zero_no_comm.

paulwalker-arm added inline comments.Oct 21 2022, 2:49 AM
llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
590

lsl is not a great fit for current implementations but as you say this will only be used in rare cases that we ultimately want to solve during register allocation anyway. Plus, looking over the instruction set I'm not sure there's a non-shift alternative so yes let's go with lsl z0.b, p0/m, z0.b, #0.

Allen updated this revision to Diff 469851.Oct 21 2022, 8:11 PM

Update add z0.b, z0.b, #0 with lsl z0.b, p0/m, z0.b, #0 as comment

Allen marked 5 inline comments as done.Oct 21 2022, 8:12 PM
Allen added inline comments.
llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
590

Done, thanks for your suggestion.

Allen edited the summary of this revision. (Show Details)Oct 23 2022, 7:09 AM
Allen marked an inline comment as done.Oct 23 2022, 7:17 PM

ping ?

Still missing a testcase that actually triggers the "lsl" codepath.

Allen updated this revision to Diff 470369.Oct 24 2022, 10:02 PM

Add a mir case to trigger the "lsl" codepath.

Allen added a comment.Oct 27 2022, 5:11 PM

any new suggestion about the last update? Thanks.

paulwalker-arm added inline comments.Nov 1 2022, 11:34 AM
llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
495–499

Rather than forcing DOPRegIsUnique to an incorrect value perhaps this switch serves a different purpose and at least one[1] of the matching asserts are just not relevant anymore.

Your fix is essentially saying "if we cannot prefix the requested instruction we'll instead emit a prefixed_zeroing_mov". Which suggests this fixes the problem for all DTypes and thus we no longer require DOPRegIsUnique == true for correctness (although it is a hint to slightly poor code generation).

[1] I say at least one because for this patch you only care about the zering case so really the Create the additional LSL to zero code belongs in the if (FalseZero) block and thus the other DOPRegIsUnique assert remains valid. That said we can always emit a normal COPY/MOV for the !FalseZero case which it one instruction rather than two. However, that's not a scenario that can occur with the current code so you wouldn't be able to write tests and thus best avoided.

591–592

If you keep the above code then does if (!DOPRegIsUnique) work here? and as mentioned I believe it should sit with the if (FalseZero) block.

Allen updated this revision to Diff 472527.Nov 2 2022, 1:14 AM
Allen marked an inline comment as done.

address comments

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
495–499

Done, delete the code about forcing DOPRegIsUnique to an incorrect value, thanks.

This is version is expect to fix the DestructiveBinary only to begin with.

591–592
  • No, the above code is only active when the #ifndef NDEBUG is true, so it is depend on our configue.
  • Apply your comment and move into the if (FalseZero) block, thanks
paulwalker-arm added inline comments.Nov 2 2022, 7:34 AM
llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
591–592

Sure, but what I'm suggesting is that the code is no longer DEBUG only as we now have a real world use for it.

paulwalker-arm added a comment.EditedNov 2 2022, 7:35 AM

@Allen It looks like something has gone wrong because the latest version looks like a different piece of work, unrelated to movprfx.

Allen updated this revision to Diff 472614.Nov 2 2022, 8:01 AM

update to fix the different piece of work

Allen added a comment.EditedNov 2 2022, 8:02 AM

@Allen It looks like something has gone wrong because the latest version looks like a different piece of work, unrelated to movprfx.

Oh. sorry for the mistake, and now it is the right version

@Allen Did you see my comment from "Wed, Nov 2, 2:34 PM"? Not a requirement but I figure moving this code out of DEBUG and ensuring DOPRegIsUnique is correct for AArch64::DestructiveBinary might be better in the long run. I know at this time you only care about DestructiveBinary, but I think your solution can easily be extended to the other destructive types later on, although part of me thinks they'll just work after this patch.

Allen updated this revision to Diff 473641.Nov 7 2022, 5:58 AM

Delete the NDEBUG as comment, thanks

Allen added a comment.Nov 7 2022, 7:13 AM

@Allen Did you see my comment from "Wed, Nov 2, 2:34 PM"? Not a requirement but I figure moving this code out of DEBUG and ensuring DOPRegIsUnique is correct for AArch64::DestructiveBinary might be better in the long run. I know at this time you only care about DestructiveBinary, but I think your solution can easily be extended to the other destructive types later on, although part of me thinks they'll just work after this patch.

sorry @paulwalker-arm for missing that. Now I deleted the DEBUG.

paulwalker-arm added inline comments.Nov 9 2022, 10:39 AM
llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
454–455

Is it possible to move this logic into the DOPRegIsUnique switch statement like

case AArch64::DestructiveBinary:
  DOPRegIsUnique = DstReg != MI.getOperand(SrcIdx).getReg();
568

I think this reads better as DOPRegIsUnique || AArch64::DestructiveBinary == DType because the second part os the exception.

582–583

With my previous suggestion does

if (DType == AArch64::DestructiveBinary && !DOPRegIsUnique)

work here?

Allen updated this revision to Diff 474456.Nov 9 2022, 11:34 PM
Allen marked 2 inline comments as done.

Address comment

Allen marked an inline comment as done.Nov 9 2022, 11:35 PM
Allen added inline comments.
llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
454–455

Apply your comment, thanks

568

Done

582–583

Yes, it works with your previous suggestion 'DOPRegIsUnique = DstReg != MI.getOperand(SrcIdx).getReg();'

paulwalker-arm accepted this revision.Nov 10 2022, 9:39 AM
This revision is now accepted and ready to land.Nov 10 2022, 9:39 AM
Allen closed this revision.Nov 10 2022, 5:21 PM
Allen marked an inline comment as done.
rupprecht added inline comments.
llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
496

Is fall through intended here? I assume you want a break;?

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp:495:3: warning: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough]
  case AArch64::DestructiveBinaryComm:
Allen marked an inline comment as done.Nov 10 2022, 10:06 PM
Allen added inline comments.
llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
496

Yes, I'll add break, thanks

rupprecht added inline comments.Nov 10 2022, 10:07 PM
llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
496

Added a break in 094c0eccdf959c3b9c85219e33c3fcfbab024b61 to avoid fallthrough. Please take a look if that's not what you want.

rupprecht added inline comments.Nov 10 2022, 10:14 PM
llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
496

Ah, race condition -- I applied the break at the same time you responded. Glad I assumed correctly :)

Allen marked 3 inline comments as done.Nov 10 2022, 10:19 PM
Allen added inline comments.
llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
496

Thanks for fixing.