This is an archive of the discontinued LLVM Phabricator instance.

[SVE][CodeGen] Lower scalable integer vector reductions
ClosedPublic

Authored by kmclaughlin on Oct 14 2020, 4:19 AM.

Details

Summary

This patch uses the existing LowerFixedLengthReductionToSVE function to also lower
scalable vector reductions. A separate function has been added to lower VECREDUCE_AND
& VECREDUCE_OR operations with predicate types using ptest.

Lowering scalable floating-point reductions will be addressed in a follow up patch,
for now these will hit the assertion added to expandVecReduce() in TargetLowering.

Diff Detail

Event Timeline

kmclaughlin created this revision.Oct 14 2020, 4:19 AM
Herald added a project: Restricted Project. · View Herald Transcript
kmclaughlin requested review of this revision.Oct 14 2020, 4:19 AM
paulwalker-arm added inline comments.Oct 15 2020, 6:10 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1042–1043

FYI: I guess we'll just promote[1] things like VECREDCUDE_ADD but I think [u/s][min/max] are just variants of AND and OR when it comes to i1 types.

[1] Depending on the cost of ptest and the fact VECREDUCE_AND requires the extra xor, it might end up better to just promote all i1 base operations. Eitherway I'm happy with your current approach I'm just making you aware of a potential future change.

16352

Is OverrideNEON needed here? I ask because when SrcVT is a fixed length vector the value of OverrideNEON seems irrelevant as by calling this function the only safe action is to pack the vector into a scalable type and thus you may as well just always pass in true.

16358–16359

My preference would be to not have any interdependence between these two lowering functions. If you agree and considering LowerPredReductionToSVE has it's own switch statement, can this be moved into LowerVECREDUCE within the if (SrcVT.isScalableVector... block and have LowerPredReductionToSVE create its own predicate..

llvm/test/CodeGen/AArch64/sve-int-reduce.ll
51

This is personal preference so feel free to ignore but considering they go down different code paths and have structurally different output I'd prefer to have the predicate tests within their own ll file.

kmclaughlin marked 3 inline comments as done.
  • Lower vecreduce_[s|u]max & vecreduce_[s|u]min to AND and OR for i1 types
  • Add lowering for vecreduce_xor to LowerPredReductionToSVE using the cntp intrinsic
  • Split predicate reduction tests into separate .ll files
paulwalker-arm added inline comments.Oct 20 2020, 3:44 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3936–3937

On reflection and since you're going the extra mile, I'd rather have this transformation (and the matching reduction ones) at the SelectionDAG::getNode() level so that we can just force a canonical representation for all targets as early as possible and then it'll never be a concern from a legalisation point of view.

kmclaughlin marked an inline comment as done.
  • Moved the transformations of i1 [s|u]min & [s|u]max -> and/or to SelectionDAG::getNode()
  • Removed custom lowering of vecreduce_[s|u]min & vecreduce_[s|u]max for predicate types
  • Changed getNode() to check if the operand type of vecreduce min/max is i1 instead of the result type
  • Fixed a mistake with the changes to getNode() in the previous patch, where the transformations of [s|u]min & [s|u]max would also apply to other operations
paulwalker-arm added inline comments.Oct 23 2020, 5:12 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3324

I think you want to just reject extracts from scalable vectors here. There's code at the top of this function which normally does the job, but it relies on the result type and so EXTRACT_VECTOR-ELT slips through. FYI: Eli has the same fix in D87651 so I'd match that and crown whoever lands first the winner.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9807–9809

Given LowerPredReductionToSVE has it's own switch block can this be done universally regardless of the opcode. i.e.

if (SrcVT.isScalableVector() ||
      useSVEForFixedLengthVectorVT(SrcVT, OverrideNEON)) {

  if (SrcVT.getVectorElementType() == MVT::i1)
    return LowerPredReductionToSVE(Op, DAG);

  switch (Op.getOpcode()) {
16357–16358

As you're moving this, can you put it immediately before the SDValue Rdx line so it has some company?

llvm/test/CodeGen/AArch64/vecreduce-umax-legalization.ll
145–152 ↗(On Diff #299730)

Presumably this is down to the canonicalisation? I'm not sure there's any immediate performance concerns here, and I stand behind canonicalisation being the correct solution. I guess we can see if anybody disagrees with this.

The worst that can happen is more custom lowering for NEON is required but because the existing lowering doesn't do anything special for OR, and MIN/MAX operations on i1 is weird anyways, I'd rather not do that in this patch unless we really have to.

kmclaughlin marked 3 inline comments as done.
  • Changed fix for the warning in computeKnownBits for extract_vector_elt to match D87651
  • Moved check for i1 types in LowerVECREDUCE outside of the switch statement
kmclaughlin added inline comments.Oct 26 2020, 11:09 AM
llvm/test/CodeGen/AArch64/vecreduce-umax-legalization.ll
145–152 ↗(On Diff #299730)

It is, I needed to change this test after I moved the transformation of vecreduce_umax to getNode()

paulwalker-arm added inline comments.Oct 28 2020, 5:28 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
16312

Shouldn't this be ||?

16329

This will cause a compiler warning if anybody adds a case statement after ISD::VECREDUCE_XOR. Personally I'd just wrap all the case blocks in {} and move AArch64CC::CondCode Cond into the blocks that need it.

16333–16335

The AND is not required because ISD::VECREDUCE_XOR only says the bottom N bits are defined, where N is the size of the operand's vector element.

The call to getAnyExtOrTrunc should be using ReduceOp.getValueType() and not hardwiring MVT::i32.

Given this change ReduceOp.getValueType() will be used three times so I'd just have EVT VT = ReduceOp.getValueType(); at the top of the function, which might have the affect that AArch64CC::####_ACTIVE can be used in place and still fit on the line and thus remove the need for {} for those entries.

kmclaughlin marked 3 inline comments as done.

Changes to LowerPredReductionToSVE:

  • Fixed if statement which should be using ||
  • Removed unnecessary And when lowering VECREDUCE_XOR
  • Wrapped case blocks in {} where necessary
paulwalker-arm accepted this revision.Oct 29 2020, 7:26 AM
This revision is now accepted and ready to land.Oct 29 2020, 7:26 AM

Thanks for reviewing this, @paulwalker-arm!