This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine][ARM] Custom lower smaller-than-legal MULH/AVG/ABD
Needs ReviewPublic

Authored by dmgreen on Feb 6 2022, 2:06 AM.

Details

Summary

MVE only has 128bit legal vectors, no 64bit vectors. There are a number of combines for nodes (MULH/AVG/ABD) that are beneficial for these smaller-than-legal vectors, and often created by the vectorizers, but are not currently transformed. There is no way to tell the target independent dag-combiner that it should, allowing the ARM backend to legalize them.

This changes the legality check in those nodes from TLI.isOperationLegalOrCustom(Opc, VT) (which inherently checks isTypeLegal(VT)) to TLI.isOperationLegal(Opc, VT) || TLI.isOperationCustom(Opc, VT), which allows the backend to mark the nodes as Custom for illegal types, legalising the nodes as it requires. The actual legalisation on MVE uses any_extends and vector casts to perform the operation on a legal vector, truncating the result back to the original type.

There may be other ways to do this - let me know if there is a better way. We can do the same for AArch64 for small types but I've not done that in this same patch.

Diff Detail

Event Timeline

dmgreen created this revision.Feb 6 2022, 2:06 AM
dmgreen requested review of this revision.Feb 6 2022, 2:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 6 2022, 2:06 AM

Why not handle this generically - accept AVG patterns of any type pre-legalization and expand back during legalization if necessary?

Why not handle this generically - accept AVG patterns of any type pre-legalization and expand back during legalization if necessary?

Hmm. Won't that be quite inefficient? Creating nodes for architectures that don't even expect to use them? A mulh for example - I wouldn't expect the generic DAG combiner to create i8 mulh on Arm targets - it's not something it will ever need, and I feel it would just get in the way of the optimisations it is trying to make.

Fair enough - I was mainly thinking in terms of the AVG opcodes, but I can understand that it might cause other issues.

efriedma added inline comments.Feb 7 2022, 10:26 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9454–9457

I think the predicate you actually need here is something like !TLI.isOperationLegalOrCustom(MulhOpcode, NarrowVT) && (LegalTypes || !TLI.isOperationCustom(MulhOpcode, NarrowVT)). i.e. before type legalization, allow custom lowering. After type legalization, only allow custom lowering if the type is legal.

Constructing an operator with an illegal type after type legalization is likely to crash the compiler.

dmgreen updated this revision to Diff 407913.Feb 11 2022, 9:26 AM

Fair enough - I was mainly thinking in terms of the AVG opcodes, but I can understand that it might cause other issues.

I often find myself fighting against DAG combines more than I would like. I am a fan of more control from the target.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9454–9457

Thanks for the suggestion.

I'd like to avoid target-specific custom lowering here, i.e. copy-pasting LowerBinopWithBitcast into every target. A few possible alternatives.

  1. Make the combine detect whatever pattern shows up after type legalization, and use the legal nodes. Not sure how hard this is; I guess type legalization splits the nodes in ways that make it hard to detect the relevant pattern?
  2. Shove something equivalent to LowerBinopWithBitcast directly into the relevant DAGCombines. I guess the difficulty here is mostly that there isn't any target-independent equivalent to VECTOR_REG_CAST?
  3. Teach type legalization to legalize these nodes, then make DAGCombine produce illegal nodes if it predicts the legalized result is cheap. (Not sure this approach is best, though; "predicting" is sort of hard, and we don't really want to produce these nodes if they aren't going to lower to a single instruction.)
dmgreen updated this revision to Diff 483097.Dec 15 2022, 2:41 AM

Rebase the current patch. I would like to try and get something like this in if we can.

Unfortunately you seem to have given some pretty good reasons why the suggestions you made will not easily work. The lowering is target specific to uses the NVCast. I was attempting to make use of larger (potentially legal) hadds in combineShiftToAVG, but x86 wants to widen, not promote, and really wants to use the existing custom lowering it has. It would be better for Arm to use extend(hadd) than hadd(extend, extend), but as far as I understand that needs NVCast to be correct for BE.

Any suggestions on how to proceed?

Herald added a project: Restricted Project. · View Herald TranscriptDec 15 2022, 2:41 AM
Herald added a subscriber: StephenFan. · View Herald Transcript