This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG] Add expansion and promotion of [US]MUL_LOHI
ClosedPublic

Authored by nhaehnle on Sep 27 2016, 3:36 AM.

Details

Summary

Most targets set the action for these nodes to Expand even though there
isn't actually any code for them in ExpandNode. Instead, targets simply
relied on the fact that no code generates these nodes as long as the
nodes aren't legal or custom.

However, generating these nodes can be useful e.g. for divide-by-constant
in wider integer types. Targets will want to deal with [US]MUL_LOHI
differently depending on the available instructions, e.g. targets with a
native 64-bit multiply will want to Promote the 32-bit [US]MUL_LOHI
instructions, and targets with native MULH[US] instructions will want to
Split.

This patch intends to not change the generated code, but indirect effects
are possible since expansions/promotions that were previously done in
DAGCombine may now be done in LegalizeDAG.

See D24822 for a change that actually uses the new expansion.

Diff Detail

Repository
rL LLVM

Event Timeline

nhaehnle updated this revision to Diff 72620.Sep 27 2016, 3:36 AM
nhaehnle retitled this revision from to [SelectionDAG] Add expansion and promotion of [US]MUL_LOHI.
nhaehnle updated this object.
nhaehnle added a subscriber: llvm-commits.
efriedma added inline comments.Oct 4 2016, 9:51 AM
include/llvm/Target/TargetLowering.h
147 ↗(On Diff #72620)

It doesn't look like None is actually used anywhere... did you mean to use it somehow?

lib/CodeGen/SelectionDAG/TargetLowering.cpp
3156 ↗(On Diff #72620)

Why are you using ADDC+ADDE rather than just plain ADD? It seems strange to split the operation, given that it might be legal. (I might be missing something here, though.)

lib/Target/Sparc/SparcISelLowering.cpp
1696 ↗(On Diff #72620)

This FIXME doesn't make sense here; Promote is just the right thing to do here.

lib/Target/X86/X86ISelLowering.cpp
32532 ↗(On Diff #72620)

I'm confused; you're returning "Scalarize" for scalar types?

RKSimon added a subscriber: RKSimon.Oct 5 2016, 7:04 AM
nhaehnle updated this revision to Diff 73677.Oct 5 2016, 11:34 AM
nhaehnle marked 4 inline comments as done.

Thank you for taking a look. This patch should address all your comments,
plus I'm adding a guard against incorrectly using the known bits
optimizations with vector types.

nhaehnle added inline comments.Oct 5 2016, 11:35 AM
include/llvm/Target/TargetLowering.h
147 ↗(On Diff #72620)

I used it at some point to preserve the original behaviour. I'll remove it.

lib/CodeGen/SelectionDAG/TargetLowering.cpp
3156 ↗(On Diff #72620)

That's just how the code happened to evolve because AMDGPU (which is where this all started) doesn't have 64-bit additions, but it's a good point. I'm changing it to use additions in the larger type.

lib/Target/Sparc/SparcISelLowering.cpp
1696 ↗(On Diff #72620)

FIXME removed.

lib/Target/X86/X86ISelLowering.cpp
32532 ↗(On Diff #72620)

On X86 this only gets called for vector types. I'm adding an assertion to make that explicit.

I don't really understand why this needs to be a new target hook. Can't it have similar code as the MUL expansion code at LegalizeDAG.cpp:3316 (in the base version): if MULH and MUL are LegalOrCustom, then you'd split the MUL_LOHI into those two operations. Otherwise, you'd do half-width multiplies?

Also, I've fixed the sparc backend to properly implement [US]MUL_LOHI in r283381, removing the FIXME here.

I don't really understand why this needs to be a new target hook. Can't it have similar code as the MUL expansion code at LegalizeDAG.cpp:3316 (in the base version): if MULH and MUL are LegalOrCustom, then you'd split the MUL_LOHI into those two operations. Otherwise, you'd do half-width multiplies?

IMHO, explicit is better than implicit. Also, how would you choose between scalarizing and half-width multiplies for vector types? You might be able to come up with a magic heuristic that works now, only to be broken when a new backend comes along.

Also, I've fixed the sparc backend to properly implement [US]MUL_LOHI in r283381, removing the FIXME here.

Thanks!

IMHO, explicit is better than implicit. Also, how would you choose between scalarizing and half-width multiplies for vector types? You might be able to come up with a magic heuristic that works now, only to be broken when a new backend comes along.

I don't think it makes much sense to require backends to decide between Split and Half; whenever you can split into MULH and MUL, you should, and do 4 half-width multiplies otherwise.

I don't know much about the vector question -- how is that done for other operations? Surely the question about which axis to "expand" when expanding a vector operation isn't only an question for multiplication, so a MUL-specific solution to this issue feels strange -- I'd expect that either there's already a solution in use elsewhere, or a wider problem.

RKSimon added inline comments.Oct 12 2016, 8:59 AM
lib/CodeGen/SelectionDAG/TargetLowering.cpp
3035 ↗(On Diff #73677)

If this is going to start being used by vectors, shouldn't these getSizeInBits() calls be replaced with getScalarSizeInBits()?

efriedma edited edge metadata.Oct 12 2016, 10:14 AM

Legalization for certain vector operations uses "PROMOTE" to indicate that the operation should be performed in a different vector type. That really stretches the meaning if we're using it to mean "split into multiple vector multiplies", but I guess it could work. See VectorLegalizer::Promote in LegalizeVectorOps.cpp.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
3331 ↗(On Diff #73677)

Sorry, I really haven't worked with this code in a while; I should have spotted the issue here earlier. :(

We can't ever scalarize an operation in LegalizeDAG: scalarizing can produce illegal types, and LegalizeDAG can't handle them. We have to handle this a bit earlier, in LegalizeVectorOps.

nhaehnle updated this revision to Diff 75296.Oct 20 2016, 7:09 AM
nhaehnle marked 2 inline comments as done.
nhaehnle edited edge metadata.

Thank you all for looking into this and sorry for the delay. I think I've
addressed all your comments:

  • getMulExpansion is gone
  • vector scalarization is done in LegalizeVectorOps
  • use getScalarSizeInBits() throughout to prepare for vector use

There's also the question of "promoting" / using half-width multiplies for
vectors. That could become useful to have more aggressive replacement of
divide-by-constant with the equivalent multiply+shift sequence also for
vector types, i.e. pushing D24822 even further.

However, there are more problems there (X86 gets unnecessarily worse looking
instruction sequences in some tests), so that should be a separate change.

nhaehnle updated this revision to Diff 75297.Oct 20 2016, 7:42 AM

Simplify the Expand part in LegalizeDAG, since vectors are now expanded
earlier.

Ping.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
3331 ↗(On Diff #73677)

Okay, fixing this.

lib/CodeGen/SelectionDAG/TargetLowering.cpp
3035 ↗(On Diff #73677)

Fixed, thanks.

Sorry about the delay...

Have you seen https://reviews.llvm.org/D26628 ? Does that interact with the patch in any way?

TargetLowering::expandMUL_LOHI is kind of difficult to review with all the different unrelated changes involved; would it be possible to separate out the non-functional changes?

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
3340 ↗(On Diff #75297)

Maybe add an assertion here that HalfType is a legal type? Some architectures could trip over this...

lib/CodeGen/SelectionDAG/TargetLowering.cpp
3257 ↗(On Diff #75297)

Should there be an assertion here? If expandMUL_LOHI isn't returning exactly two results, something went wrong.

nhaehnle updated this revision to Diff 79142.Nov 23 2016, 1:11 PM

Thank you for taking a look.

D26628 does not directly interact, since it affected a custom sequence used
by DAGTypeLegalizer if the call to TargetLowering has failed. I suppose now
that TargetLowering no longer insists on only using legal instructions one
could think about using that mode instead of the custom sequence in
DAGTypeLegalizer, but that sequence is also qualitatively different (using a
regular NxN-bit multiply on values where everything but the low N/2 bits
have been masked out).

I've addressed your minor comments and split this up into three patches,
the first two being D27063 and D27064.

I'm going to add them as dependencies as well. Unfortunately, handling
patch series with Phabricator seems really awkward :(

nhaehnle added inline comments.Nov 23 2016, 1:15 PM
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
3340 ↗(On Diff #75297)

I missed this one, sorry.

nhaehnle updated this revision to Diff 79143.Nov 23 2016, 1:26 PM

Add assertion that HalfType is legal.

efriedma added inline comments.Dec 2 2016, 10:58 AM
include/llvm/Target/TargetLowering.h
3053 ↗(On Diff #79143)

Please make OnlyLegalOrCustom an enum; otherwise it's completely impossible to tell what the parameter does without looking at the header.

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2210 ↗(On Diff #79143)

Can you replace this code with a call to expandMUL?

nhaehnle updated this revision to Diff 80587.Dec 7 2016, 6:50 AM
nhaehnle marked an inline comment as done.
  • Use an enum instead of OnlyLegalOrCustom
  • Handle a corner case where shift amounts are too large for the native shift type
nhaehnle added inline comments.Dec 7 2016, 6:52 AM
lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2210 ↗(On Diff #79143)

I played around with that, but I'd rather not do it in this patch. The generated code sequences are different, and in particular we'd generate non-legal instructions that at least the X86 target doesn't expect and doesn't handle properly. This is visible in test/CodeGen/X86/imul-256/512/1024.ll.

efriedma accepted this revision.Dec 7 2016, 10:51 AM
efriedma edited edge metadata.

LGTM with minor comments addressed.

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2210 ↗(On Diff #79143)

Okay, that's fine.

lib/CodeGen/SelectionDAG/TargetLowering.cpp
3155 ↗(On Diff #80587)

MaskedValueIsZero should do the right thing for vectors, I think, with recent changes to computeKnownBits.

3181 ↗(On Diff #80587)

Hmm...

I was going to say that ShiftAmount always fits into ShiftAmountTy because it's a defined shift (ShiftAmount < OuterBitSize < maximum unsigned value), but getShiftAmountTy doesn't always return a sensible result for illegal types. Please just note this problem with a FIXME.

This revision is now accepted and ready to land.Dec 7 2016, 10:51 AM
This revision was automatically updated to reflect the committed changes.
chapuni added a subscriber: chapuni.Dec 8 2016, 7:13 AM

I pruned three of \param(s) in r289057. Could you recheck them?

llvm/trunk/include/llvm/Target/TargetLowering.h
3051

HalfVT and OnlyLegalOrCustom cannot be seen here.

3067

ditto.