This is an archive of the discontinued LLVM Phabricator instance.

[SVE] Lower fixed length VECREDUCE_SEQ_FADD operation
ClosedPublic

Authored by cameron.mcinally on Oct 9 2020, 2:00 PM.

Details

Summary

@paulwalker-arm, I'm guessing this hits the reduction legalisation problems that @kmclaughlin is working on? This patch currently falls apart during splitting of the reduction nodes. If not, I'll continue to build it out.

Some other notes:

  1. It looks like NEON FADDA support is missing upstream too.
  1. We'll likely need to change how the OperationAction types are determined for the reduction-with-accumulator ISD nodes (see LegalizeDAG.cpp change). The types are currently based off the start_value type, not the vector op type. @sdesmalen

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptOct 9 2020, 2:00 PM
cameron.mcinally requested review of this revision.Oct 9 2020, 2:00 PM

Are you planning to add support for the normal VECREDUCE_FADD? I ask because that's likely to see more initial use upstream than the SEQ variant.

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
1154–1158 ↗(On Diff #297329)

You'll need to move these below the main VECREDUCE_ options because VECREDUCE_FADD & VECREDUCE_FMUL only take a single operand.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1132–1135

I doubt it's worth custom lowering v1f64 being the expansion will be a single neon ADD. Or is this because there is current no expand code of the SEQ reductions?

What about operations on vectors of f16.

16270

Given there's likely to be only one use of this I think calling this LowerVECREDUCE_SEQ_FADD is more inline with the naming of the other lowering functions.

16281–16284

Looks like we're missing some obvious patterns. I've created D89235 to add them.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
223–224

This'll need to be more verbose because having SVE is only safe for scalable vectors. You'll need to look at the useSVEForFixedLengthVectorVT question if II is operating on fixed length types.

llvm/test/CodeGen/AArch64/sve-fixed-length-fp-reduce.ll
57

Looks like a stray X here.

You're right on all the comments. I stopped midway when I hit the legalisation issue, so that's why this patch is rough. That would need to be built out first before this work could continue. And I think NEON support should be built out before that. It sounds like I'm not stepping on toes, so I'll go in that order. Thanks.

Oh, and I can do VECREDUCE_FADD first. Just hit VECREDUCE_SEQ_FADD before I realized I need to add 'fast' to the intrinsic calls.

cameron.mcinally retitled this revision from [SVE][WIP] Lower fixed length VECREDUCE_SEQ_FADD operation to [SVE] Lower fixed length VECREDUCE_SEQ_FADD operation.

This is ready for review now...

  1. Notice that I made useSVEForFixedLengthVectors*(...) public members so that the ExpandReductions pass can access them. I don't keep up with C++ best practices, so I'm not sure if there's a better way to grant access to these functions. Any suggestions?
  1. Also, expansion is kind of ugly for this operation. The ExpandReduction pass is an IR pass that runs before legalization, so the decision to expand is done early. It would be possible to not rely on ExpandReductions, and rather extend normal legalization to do an ordered lowering, but that would be duplicating what is already done. Does anyone feel strongly that expansion should be moved to Legalize or otherwise?

Some other notes:

It looks like NEON FADDA support is missing upstream too.

I made a mistake here. I don't see an FADDA instruction in the NEON ISA, only SVE. Not sure why I thought otherwise...

cameron.mcinally marked 5 inline comments as done.Oct 19 2020, 9:13 AM
cameron.mcinally added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
16270

I moved LowerVECREDUCE_SEQ_FADD(...) out from under LowerVECREDUCE(...). Not sure if that helps or hurts @kmclaughlin, but I can change it if there's a problem.

cameron.mcinally marked an inline comment as done.

Fix 80 column issue. No other changes intended...

Try again with 80 column fix...

Sorry @cameron.mcinally I've not had much time for code reviews this week although will take proper look tomorrow. I have a question though. You've added extra legalisation support but I don't see any explicit tests (or at least ones with matching check lines) for it. Is this something you need for this patch? (I'm guessing sve-fixed-length-fp-reduce.ll's stock NEON run line triggers the cases?) If so then there really should be a neon specific test file that verifies the widening and scalarisation changes as the NEON run line for the "fixed-length" tests is more about ensuring no SVE instructions slip through.

Sorry @cameron.mcinally I've not had much time for code reviews this week although will take proper look tomorrow. I have a question though. You've added extra legalisation support but I don't see any explicit tests (or at least ones with matching check lines) for it. Is this something you need for this patch? (I'm guessing sve-fixed-length-fp-reduce.ll's stock NEON run line triggers the cases?) If so then there really should be a neon specific test file that verifies the widening and scalarisation changes as the NEON run line for the "fixed-length" tests is more about ensuring no SVE instructions slip through.

Thanks, Paul. No rush at all. Digressing a bit, we should probably sync-up on what else needs to be done for fixed length lowering. I believe this is the last operation on my list.

The legalisation changes are specific to VECREDUCE_SEQ_FADD on SVE. Prior to this patch, VECREDUCE_SEQ_FADD is expanded in the ExpandReductions pass before legalisation runs, so VECREDUCE_SEQ_FADD would never reach legalisation on NEON. (That's assuming NEON has no FADDA support. I might be wrong about that.)

The new tests would be broken without the legalisation changes, so I'm assuming that those are enough coverage. Maybe I'm missing something though...

The new tests would be broken without the legalisation changes, so I'm assuming that those are enough coverage. Maybe I'm missing something though...

Are you sure? I took your patch for a test drive and removed all but the TLI.getOperationAction related change from Legalize*.{cpp, h} and the tests passed.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
16293

I think this can be ContainerVT?

16297

You use this in a couple of places, so might be worth extracting into Zero = ?

llvm/test/CodeGen/AArch64/sve-fixed-length-fp-reduce.ll
65

What does this mean?

I was going to say you're missing VBITS_EQ_256 lines here but then noticed the output. What's you're feeling on this? I see two routes:

Stick with the expanded output for this patch, in which case just having VBITS_EQ_256-COUNT-33: fadd is good enough.

For better code generation we're just missing the ability to split VECREDUCE_SEQ_FADD operations, which I don't think should be that hard.

VECREDUCE_SEQ_FADD A, X_512 => VECREDUCE_SEQ_FADD (VECREDUCE_SEQ_FADD X, X_LO), X_HI ?

I'm happy with either approach.

cameron.mcinally added a comment.EditedOct 23 2020, 8:03 AM

The new tests would be broken without the legalisation changes, so I'm assuming that those are enough coverage. Maybe I'm missing something though...

Are you sure? I took your patch for a test drive and removed all but the TLI.getOperationAction related change from Legalize*.{cpp, h} and the tests passed.

That doesn't sound right. E.g. we need the scalarization code for <1 x f64>. I'll check again...

llvm/test/CodeGen/AArch64/sve-fixed-length-fp-reduce.ll
65

For better code generation we're just missing the ability to split VECREDUCE_SEQ_FADD operations, which I don't think should be that hard.

Oh, you're right. I initially thought we couldn't preserve the ordering constraint if we split, but we could use the accumulator to chain them. Makes sense. I'll look into that...

nikic requested changes to this revision.Oct 23 2020, 8:16 AM
nikic added a subscriber: nikic.

This should be split into two parts:

  1. A patch that implements all the necessary legalizations for VECREDUCE_SEQ_* and enables use of SDAG legalization on AArch64 (and possibly ARM) unconditionally. This may need some additional test coverage to trigger all possible legalizations. Necessary legalizations also include float legalization, not just vector legalization.
  2. A patch that adds improved lowerings using SVE instructions based on that.
This revision now requires changes to proceed.Oct 23 2020, 8:16 AM

@nikic Why does wanting to implement VECREDUCE_SEQ_FADD for SVE necessitate having to implement full support for NEON (AArch64 and Arm)?

nikic added a comment.Oct 23 2020, 8:31 AM

@paulwalker-arm There is no need to implement any target-specific support. The legalization outcome will be a simple chain of extracts and fadd/fmuls. It does not need to generate good code, just not assert for any VTs.

@paulwalker-arm There is no need to implement any target-specific support. The legalization outcome will be a simple chain of extracts and fadd/fmuls. It does not need to generate good code, just not assert for any VTs.

Sure, I think there's been a misunderstanding. The only target-specific piece we're talking about is the existing shouldExpandReduction hook that controls if the code generator will see the VECREDUCE_SEQ_FADD to legalise. So I'm wondering if an acceptable ordering is to allow the lowering of legal VECREDUCE_SEQ_FADD operations (which is only what shouldExpandReduction will let through) and then we can tackle the type legalisation problem second so that shouldExpandReduction can let everything through when SVE is enable (of which they'll already be a test ready and waiting) and leave NEON as it is today.

cameron.mcinally marked an inline comment as done.

Updating patch, but not ready for a serious review yet as I haven't started the splitting work. I'm still not convinced we can handle splitting appropriately with the current setup, but will comment on that seperately.

I caused a big misunderstanding here, so let me try to unwind it. Paul is correct. The legalizations are NOT necessary for SVE support. I made a mistake when preparing this patch. I.e. I built the legalization changes out before I finalized the shouldExpandReduction(...) changes, so ended up in a weird state with NEON.

The ExpandReductions pass is somewhat unusual and that threw me off. It's an IR pass that runs early in llc, before legalisation. I've never seen one of those, so it tripped me up. I apologize for the confusion. That said, I wonder if the reductions should really be expanded during legalization. That would have been a more natural choice to me.

@cameron.mcinally: I'm sure you know this but just in case it saves some time I can confirm you will not hit the splitting code until after you relax shouldExpandReduction. So I think your current patch is complete so it really comes down to whether adding the support this way round (i.e. only allow legal types for SVE, then allow all types and implement the legalisation) is acceptable to @nikic .

nikic resigned from this revision.Oct 23 2020, 10:10 AM

Thanks for the clarification, I indeed misunderstood this. If you have a way to accurately pick out the legal types, then this is of course fine. My only concern was to not introduce incomplete legalization support for these opcodes.

The ExpandReductions pass is somewhat unusual and that threw me off. It's an IR pass that runs early in llc, before legalisation. I've never seen one of those, so it tripped me up. I apologize for the confusion. That said, I wonder if the reductions should really be expanded during legalization. That would have been a more natural choice to me.

The eventual goal here is to expand reductions during legalization only. The IR expansion exists because the DAG legalization support has been patchy historically, with VECREDUCE_SEQ_* being the last remaining hole.

The eventual goal here is to expand reductions during legalization only. The IR expansion exists because the DAG legalization support has been patchy historically, with VECREDUCE_SEQ_* being the last remaining hole.

Ah, ok. I'm happy to hear we're all in agreement. I might be able to volunteer some time for that work, but it's not really my prime directive.

Again, apologies for the mix up. There were a lot of moving parts for this patch and I dropped the ball. *^_^*

@paulwalker-arm, back to the splitting discussion...

Looks like we need to wait for the full VECREDUCE_SEQ_FADD legalization changes before the splitting work can be done. We could make changes to shouldExpandReduction(), to recognize large vectors that could be split, but I suspect that code will end up too hacky.

Unless I'm missing something. What are your thoughts?

paulwalker-arm accepted this revision.Oct 23 2020, 11:53 AM

With this patch[1] landed I believe operation legalisation is now a solved problem for SVE (well for fixed length vectors). I think it's worth tackling the type legalisation side of things rather than overcomplicating shouldExpandReduction. Of course that's easy for me to say given it's your time :) but you've already done part of the work based on the older patch.

Your WidenVecOp_VECREDUCE changes looked good and we've already discussed how splitting can be done in a generic way. With these two pieces I think shouldExpandReduction just needs to test for SVE.

When that lands we hit another decision point to decide whether it's worth avoiding SVE for very small vectors and thus whether to implement the scalarisation.

[1] I just wanted to highlight my previous VBITS_EQ_256-COUNT-33: fadd comment as this gives us a bit more test coverage and is something that will obviously fail (in a good way) when the splitting work is available.

This revision is now accepted and ready to land.Oct 23 2020, 11:53 AM

[1] I just wanted to highlight my previous VBITS_EQ_256-COUNT-33: fadd comment as this gives us a bit more test coverage and is something that will obviously fail (in a good way) when the splitting work is available.

Got it. Also added a FIXME to remind us to move the useSVEForFixedLengthVectors*() functions back to private scope when the reduction legalization is complete.