This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Enable SMLAL[B|T] instruction selection
ClosedPublic

Authored by samparker on Feb 16 2017, 6:22 AM.

Details

Summary

Enable the selection of the 64-bit signed multiply accumulate instructions which operate on 16-bit operands. These are enabled for ARMv5TE onwards for ARM and for V6T2 and other DSP enabled Thumb architectures. Added a new file to help with pattern matching across ISelLowering and ISelDAG.

Diff Detail

Repository
rL LLVM

Event Timeline

samparker created this revision.Feb 16 2017, 6:22 AM
t.p.northover added inline comments.
lib/Target/ARM/ARMISelDAGToDAG.cpp
3132–3146 ↗(On Diff #88727)

The only functional change here appears to be limiting the first block to Thumb2, but the second block is no more correct on Thumb1 machines so I don't think it actually improves anything.

3150 ↗(On Diff #88727)

You shouldn't need C++ for this. Patterns embedded in Instructions can't produce more than one value, but by a quirk of notation instantiations of Pat can. So you should be able to write something like

def : Pat<(smlalbb GPR:$Rn, GPR:$Rm, GPR:$RLo, GPR:$RHi),
   (SMLALBB $Rn, $Rm, $RLo, $RHi)>;

(after mapping ARMISD::SMLALBB to smlalbb of course).

lib/Target/ARM/ARMISelLowering.cpp
9472 ↗(On Diff #88727)

What if it's not a constant?

9476 ↗(On Diff #88727)

What if it's not the same mul as before?

9481 ↗(On Diff #88727)

Don't you also need to know that the base type was i16? I don't see any checks here.

lib/Target/ARM/ARMPatternHelpers.h
1 ↗(On Diff #88727)

Names beginning with a underscore and an upper-case letter are reserved in all contexts so you should drop the leading underscore.

You'll also need to add a copyright header. Though this whole file becomes irrelevant if you implement the actual selection in .td files and these functions can move to ARMISelLowering.h.

efriedma requested changes to this revision.Feb 16 2017, 11:59 AM
efriedma added a subscriber: efriedma.

Please put an explanation of the .td file changes into the commit message. (I think I follow why it's related, but it wasn't immediately obvious.)

Would it make sense to use ComputeNumSignBits to figure out whether a value is sign-extended? That would handle, for example, cases where the 16-bit inputs are loads, or one of the 16-bit inputs is a constant.

lib/Target/ARM/ARMISelLowering.cpp
9448 ↗(On Diff #88727)

The .td file says v5TE is required; this says DSP is required. Which is correct?

lib/Target/ARM/ARMPatternHelpers.cpp
7 ↗(On Diff #88727)

I know you're just moving the code, but it doesn't make sense to check for SIGN_EXTEND here. ARM doesn't have 16-bit integer registers.

test/CodeGen/ARM/longMAC.ll
243 ↗(On Diff #88727)

The "CHECK-V6-THUMB2-NOT" aren't that useful; can you instead use CHECK-NEXT to check that there aren't any extra instructions in the function?

You might as well just use explicit register names here; given the calling convention is known, there's only one possible register assignment.

This revision now requires changes to proceed.Feb 16 2017, 11:59 AM

Would it make sense to use ComputeNumSignBits to figure out whether a value is sign-extended? That would handle, for example, cases where the 16-bit inputs are loads, or one of the 16-bit inputs is a constant.

This operations only act upon registers, but I will definitely look into this and the load issue, thanks.

lib/Target/ARM/ARMISelDAGToDAG.cpp
3150 ↗(On Diff #88727)

Ah excellent, I will clean all this up.

lib/Target/ARM/ARMISelLowering.cpp
9448 ↗(On Diff #88727)

Yes good point, hasDSP is only the Thumb check.

9476 ↗(On Diff #88727)

Sorry I don't get your point, what I am missing?

lib/Target/ARM/ARMPatternHelpers.h
1 ↗(On Diff #88727)

Ok. I'd still want to keep the helpers because they're used for pattern matching of different instructions in DAGToDAG as well.

test/CodeGen/ARM/longMAC.ll
243 ↗(On Diff #88727)

I will change to explicit registers names, but I would like to keep the extra NOT checks because it caught me when I had originally lowered the operands incorrectly, which left the sxth and shifts in.

samparker added inline comments.Feb 17 2017, 5:19 AM
test/CodeGen/ARM/longMAC.ll
243 ↗(On Diff #88727)

Sorry, I now understand your comment! I will make the changes

samparker updated this revision to Diff 88881.Feb 17 2017, 6:59 AM
samparker edited edge metadata.
  • Enabled for V5TE and added it as a test target.
  • Simplifed the tests, because there were three targets testing for the same output.
  • Numerous fixes that Tim pointed out.
  • Replaced the final isel with tablegen matching.
  • Tried using ComputeNumSignBits for identifying sign extending, but this didn't work because for the top values we need to perform SRA, but this then appears like a sext and results in the wrong opcodes being selected.
  • Added support for sign extending loads as well as a test for this

Tried using ComputeNumSignBits for identifying sign extending, but this didn't work because for the top values we need to perform SRA, but this then appears like a sext and results in the wrong opcodes being selected.

That should just be a matter of performing the checks in the right order, I think; you want to generate some sort of SMLAL if the inputs are sign-extended, but you want to special-case operands which are exactly "sra x, 16" or "SIGN_EXTEND_INREG(x)".

t.p.northover added inline comments.Feb 17 2017, 1:55 PM
lib/Target/ARM/ARMISelLowering.cpp
9476 ↗(On Diff #88727)

Just knowing that the input to the SRA is some multiply operation isn't sufficient. You need to make sure it's the same one that produced the low bits.

samparker updated this revision to Diff 89096.Feb 20 2017, 2:51 AM

Now use ComputeNumSignBits for bottom 16-bit checking, after checking for the special case of sra, and renamed the function appropriately.

samparker added inline comments.Feb 20 2017, 2:55 AM
lib/Target/ARM/ARMISelLowering.cpp
9476 ↗(On Diff #88727)

Is this not what I have done? I'm checking that the MUL operand of the ADDC is the same operand to the SRA.

t.p.northover added inline comments.Feb 21 2017, 9:57 AM
lib/Target/ARM/ARMISelLowering.cpp
9476 ↗(On Diff #88727)

Bother, sorry I completely misread it.

lib/Target/ARM/ARMInstrInfo.td
4202 ↗(On Diff #89096)

I think these should be ARMPat or they'll still be valid in Thumb mode. You'll also need Thumb patterns to select that variant.

lib/Target/ARM/ARMPatternHelpers.cpp
45–60 ↗(On Diff #89096)

This combination looks extremely dodgy to me. ComputeNumSignBits makes no promises about how it's going to find those bits, and in fact traverses nodes down to a depth of 6.

You've already found two cases where the final SDNode's operand(0) isn't actually the unextended node, I see no reason to think that's all there is.

samparker added inline comments.Feb 22 2017, 7:31 AM
lib/Target/ARM/ARMInstrInfo.td
4202 ↗(On Diff #89096)

cheers!

lib/Target/ARM/ARMPatternHelpers.cpp
45–60 ↗(On Diff #89096)

Ah yes... Do you think it's worth adding a function to SelectionDAG that returns the SDValue that ComputeNumSignBits finds and uses for its calculation?

efriedma added inline comments.Feb 22 2017, 8:43 AM
lib/Target/ARM/ARMPatternHelpers.cpp
45–60 ↗(On Diff #89096)

The function you want already exists; it's called SimplifyDemandedBits.

samparker added inline comments.Feb 22 2017, 8:53 AM
lib/Target/ARM/ARMPatternHelpers.cpp
45–60 ↗(On Diff #89096)

Great, I'll look into it. Thanks

samparker updated this revision to Diff 90180.Mar 1 2017, 8:13 AM

A large change because I wanted to remove the new pattern helper file, so I've moved the smulwb, smulwt isel code from iseldag to lowering. The accumulator versions of those instructions are now handled in tablegen.

SimplifyDemandedBits is now used to get a signed 16-bit for the various instructions. I manually check LOAD and AssertSext nodes as these weren't handled by SimplifyDemandedBits.

(See also https://reviews.llvm.org/D30401 ; not sure if it conflicts.)

Could you split the change to match ARMISD::SMULWB earlier into a separate patch? (You can make this patch depend on that, if you want; it just makes it easier to review, or track down regressions, if changes are smaller.)

lib/Target/ARM/ARMISelLowering.cpp
1503 ↗(On Diff #90180)

This is a weird way to use SimplifyDemandedBits; normally you would first generate the ARMISD::SMLALBB node, then separately run SimplifyDemandedBits on all ARMISD::SMLALBB nodes. It composes better with other optimizations (which might expose additional simplifications), and it's more obviously correct (it isn't clear whether N has other users).

Yeah sure, i'll split it up and I'll have a look at more examples of SimplifyDemandedBits. cheers

samparker updated this revision to Diff 91698.Mar 14 2017, 4:07 AM

Split out the SMULWTB stuff and committed it. This has now been rebased and we also now use the helper functions around SimplifyDemandeBits.

efriedma accepted this revision.Mar 14 2017, 2:06 PM

LGTM.

This revision is now accepted and ready to land.Mar 14 2017, 2:06 PM
Closed by commit rL297809: [ARM] Enable SMLAL[B|T] isel (authored by sam_parker). · Explain WhyMar 15 2017, 1:39 AM
This revision was automatically updated to reflect the committed changes.

Just committed https://reviews.llvm.org/rL298752 . Please test more carefully in the future; reviewers don't always spot every issue.