Enable the selection of the 64-bit signed multiply accumulate instructions which operate on 16-bit operands. These are enabled for ARMv5TE onwards for ARM and for V6T2 and other DSP enabled Thumb architectures. Added a new file to help with pattern matching across ISelLowering and ISelDAG.
Details
Diff Detail
Event Timeline
lib/Target/ARM/ARMISelDAGToDAG.cpp | ||
---|---|---|
3132–3146 | The only functional change here appears to be limiting the first block to Thumb2, but the second block is no more correct on Thumb1 machines so I don't think it actually improves anything. | |
3150 | You shouldn't need C++ for this. Patterns embedded in Instructions can't produce more than one value, but by a quirk of notation instantiations of Pat can. So you should be able to write something like def : Pat<(smlalbb GPR:$Rn, GPR:$Rm, GPR:$RLo, GPR:$RHi), (SMLALBB $Rn, $Rm, $RLo, $RHi)>; (after mapping ARMISD::SMLALBB to smlalbb of course). | |
lib/Target/ARM/ARMISelLowering.cpp | ||
9472 | What if it's not a constant? | |
9476 | What if it's not the same mul as before? | |
9481 | Don't you also need to know that the base type was i16? I don't see any checks here. | |
lib/Target/ARM/ARMPatternHelpers.h | ||
2 | Names beginning with a underscore and an upper-case letter are reserved in all contexts so you should drop the leading underscore. You'll also need to add a copyright header. Though this whole file becomes irrelevant if you implement the actual selection in .td files and these functions can move to ARMISelLowering.h. |
Please put an explanation of the .td file changes into the commit message. (I think I follow why it's related, but it wasn't immediately obvious.)
Would it make sense to use ComputeNumSignBits to figure out whether a value is sign-extended? That would handle, for example, cases where the 16-bit inputs are loads, or one of the 16-bit inputs is a constant.
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
9448 | The .td file says v5TE is required; this says DSP is required. Which is correct? | |
lib/Target/ARM/ARMPatternHelpers.cpp | ||
8 | I know you're just moving the code, but it doesn't make sense to check for SIGN_EXTEND here. ARM doesn't have 16-bit integer registers. | |
test/CodeGen/ARM/longMAC.ll | ||
243 | The "CHECK-V6-THUMB2-NOT" aren't that useful; can you instead use CHECK-NEXT to check that there aren't any extra instructions in the function? You might as well just use explicit register names here; given the calling convention is known, there's only one possible register assignment. |
This operations only act upon registers, but I will definitely look into this and the load issue, thanks.
lib/Target/ARM/ARMISelDAGToDAG.cpp | ||
---|---|---|
3150 | Ah excellent, I will clean all this up. | |
lib/Target/ARM/ARMISelLowering.cpp | ||
9448 | Yes good point, hasDSP is only the Thumb check. | |
9476 | Sorry I don't get your point, what I am missing? | |
lib/Target/ARM/ARMPatternHelpers.h | ||
2 | Ok. I'd still want to keep the helpers because they're used for pattern matching of different instructions in DAGToDAG as well. | |
test/CodeGen/ARM/longMAC.ll | ||
243 | I will change to explicit registers names, but I would like to keep the extra NOT checks because it caught me when I had originally lowered the operands incorrectly, which left the sxth and shifts in. |
test/CodeGen/ARM/longMAC.ll | ||
---|---|---|
243 | Sorry, I now understand your comment! I will make the changes |
- Enabled for V5TE and added it as a test target.
- Simplifed the tests, because there were three targets testing for the same output.
- Numerous fixes that Tim pointed out.
- Replaced the final isel with tablegen matching.
- Tried using ComputeNumSignBits for identifying sign extending, but this didn't work because for the top values we need to perform SRA, but this then appears like a sext and results in the wrong opcodes being selected.
- Added support for sign extending loads as well as a test for this
Tried using ComputeNumSignBits for identifying sign extending, but this didn't work because for the top values we need to perform SRA, but this then appears like a sext and results in the wrong opcodes being selected.
That should just be a matter of performing the checks in the right order, I think; you want to generate some sort of SMLAL if the inputs are sign-extended, but you want to special-case operands which are exactly "sra x, 16" or "SIGN_EXTEND_INREG(x)".
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
9476 | Just knowing that the input to the SRA is some multiply operation isn't sufficient. You need to make sure it's the same one that produced the low bits. |
Now use ComputeNumSignBits for bottom 16-bit checking, after checking for the special case of sra, and renamed the function appropriately.
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
9476 | Is this not what I have done? I'm checking that the MUL operand of the ADDC is the same operand to the SRA. |
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
9476 | Bother, sorry I completely misread it. | |
lib/Target/ARM/ARMInstrInfo.td | ||
4202 | I think these should be ARMPat or they'll still be valid in Thumb mode. You'll also need Thumb patterns to select that variant. | |
lib/Target/ARM/ARMPatternHelpers.cpp | ||
45–60 | This combination looks extremely dodgy to me. ComputeNumSignBits makes no promises about how it's going to find those bits, and in fact traverses nodes down to a depth of 6. You've already found two cases where the final SDNode's operand(0) isn't actually the unextended node, I see no reason to think that's all there is. |
lib/Target/ARM/ARMPatternHelpers.cpp | ||
---|---|---|
45–60 | The function you want already exists; it's called SimplifyDemandedBits. |
lib/Target/ARM/ARMPatternHelpers.cpp | ||
---|---|---|
45–60 | Great, I'll look into it. Thanks |
A large change because I wanted to remove the new pattern helper file, so I've moved the smulwb, smulwt isel code from iseldag to lowering. The accumulator versions of those instructions are now handled in tablegen.
SimplifyDemandedBits is now used to get a signed 16-bit for the various instructions. I manually check LOAD and AssertSext nodes as these weren't handled by SimplifyDemandedBits.
(See also https://reviews.llvm.org/D30401 ; not sure if it conflicts.)
Could you split the change to match ARMISD::SMULWB earlier into a separate patch? (You can make this patch depend on that, if you want; it just makes it easier to review, or track down regressions, if changes are smaller.)
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
1503 | This is a weird way to use SimplifyDemandedBits; normally you would first generate the ARMISD::SMLALBB node, then separately run SimplifyDemandedBits on all ARMISD::SMLALBB nodes. It composes better with other optimizations (which might expose additional simplifications), and it's more obviously correct (it isn't clear whether N has other users). |
Yeah sure, i'll split it up and I'll have a look at more examples of SimplifyDemandedBits. cheers
Split out the SMULWTB stuff and committed it. This has now been rebased and we also now use the helper functions around SimplifyDemandeBits.
Just committed https://reviews.llvm.org/rL298752 . Please test more carefully in the future; reviewers don't always spot every issue.
The only functional change here appears to be limiting the first block to Thumb2, but the second block is no more correct on Thumb1 machines so I don't think it actually improves anything.