This is an archive of the discontinued LLVM Phabricator instance.

[PPC] exploit rotate-left-then-mask-insert instructions for bitfield insert
AbandonedPublic

Authored by inouehrs on May 31 2017, 2:07 AM.

Details

Summary

We can use rotate-left-then-mask-insert instructions (rlwimi and rldimi) for efficient implementation of bitfield insert (and similar code sequences generated by SROA etc). However, the current LLVM generates inefficient code sequence for this purpose. For example of bitfieldinsert64 in the added unit test, it generates four instructions instead of just one rldimi instruction.
We already have a method to generate rotate-left-then-mask-insert for 32-bit integer (tryBitfieldInsert) in PPCDAGToDAGISel, but it is not executed for most of the simple bitfield insert since tryBitPermutation is executed before tryBitfieldInsert and it generates suboptimal code.

This patch makes tryBitfieldInsert executed before tryBitPermutation with the limited targets of the simplest cases. This patch also adds a 64-bit version of tryBitfieldInsert to generated rldimi instruction.

Diff Detail

Event Timeline

inouehrs created this revision.May 31 2017, 2:07 AM
kbarton requested changes to this revision.Jun 5 2017, 11:12 AM

If I understand this correctly, you're trying to use the tryBitfieldInsert before the tryBitPermutation function because you can get cleaner code out of some cases of tryBitfieldInsert. However, you only want to use the tryBitfieldInsert on some cases, not on all. Is this correct?
If so, I think it makes sense to refactor the tryBitfieldInsert to focus on the specific cases you want to try and identify and then call the existing tryBitPermutation and the remaining tryBitfieldInsert afterwards. There may be some code duplication as a result (hopefully that can be minimal) but the logic will be easier to understand.

lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.h
68

I don't understand why this has been transformed into a templated function. Is this necessary for this patch, or just some kind of cleanup?

80

I don't think I've ever seen a ternary used in an if statement like this.
If we don't have precedent for this, could you please put spaces around the ? and :
I find this difficult to read as is.

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
614–621

Is it possible to refactor this to separate the SimpleCase from the non-simple case without requiring too much code duplication?
This will get rid of the boolean parameter, which makes it harder to follow.

This revision now requires changes to proceed.Jun 5 2017, 11:12 AM
nemanjai added inline comments.Jun 5 2017, 3:06 PM
lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp
1194

Do we not want to use uint32_t instead of unsigned to emphasize that this is a 32-bit value?

lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.h
80

Nit: spaces between binary operators and operands.

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
751

Overwrapping? It seems you're checking to ensure they don't overlap (i.e. no bits in common).

755

Hard to count all the F's, please use a simpler expression. Maybe ~0ULL or even ~(TargetMask | InsertMask) == 0ULL) if it's all of them (or something along those lines.

Also, a comment about why this condition is being checked.

3954

Between this and the early exits in the functions that find Rotate-and-insert opportunities, it seems that the simple case is (or (or %a, %b) (or %c, %d)) where the two operands of the outer or have known-zero bits in complementary locations.
If that's the case, please add a comment for this and an explanation why this case is special. Furthermore (as Kit has already alluded to) it is likely that this simple case has simple handling and should get a corresponding simple function rather than passing a bool parameter to these functions to tell them where they're called from.

inouehrs updated this revision to Diff 101727.Jun 7 2017, 6:45 AM
inouehrs edited edge metadata.

addressed comments from Kit and Nemanja

inouehrs marked 7 inline comments as done.Jun 7 2017, 6:56 AM
inouehrs added inline comments.
lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp
1194

I changed the type of BM instead of specifying type in isRunOfOnes since BM is used only in this function. I feel this is cleaner than explicitly saying uint32_t for isRunOfOnes.

lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.h
68

I need 64-bit version of isRunOfOnes. Since 32-bit and 64-bit versions are almost same, I use template to avoid writing almost same function twice.

80

I hope this version is easier to read.

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
614–621

I added a new tryBitfieldInsert for checking simple bitfield insert cases.
The original tryBitfieldInsert actually covers wider range than a simple bitfield insert; so I renamed it to tryRotateThenMaskInsert

751

Fixed typo. Thanks!

755

Done.

3954

As Kit and you suggested, I refactored the code to avoid a boolean parameter.

inouehrs updated this revision to Diff 102658.Jun 15 2017, 5:35 AM
inouehrs marked 6 inline comments as done.
  • rebased to the latest source tree.
  • minor touch up (initialize local variables, removing trailing spaces)
inouehrs updated this revision to Diff 105587.Jul 6 2017, 10:38 PM
  • rebased to the latest source tree.
  • minor touch up in comments
nemanjai requested changes to this revision.Nov 29 2017, 12:04 AM

The semantics of the patch seem perfectly fine. Please refactor the functions with duplicated code.

lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.h
68

I find it rather surprising that an adequate function doesn't exist in either APInt or somewhere in MathExtras.h. I wonder if adding it there might be a better place. Perhaps check with frequent contributors to those files?

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
730

It really seems like this is a bunch of unnecessary code duplication. The only difference in the first 70 lines of this function vs. tryRotateThenMaskInsert32() is replacing 32 with 64. Since you have the bit width available from the parameter, I think you should unify these two functions into one (and if it makes more sense, split out the last few lines into a pair of small functions/lambda).

This revision now requires changes to proceed.Nov 29 2017, 12:04 AM
inouehrs updated this revision to Diff 124722.Nov 29 2017, 4:44 AM
inouehrs edited edge metadata.
  • refactored using template to avoid code duplication.
  • rebased to the latest code
inouehrs marked an inline comment as done.Nov 29 2017, 4:50 AM
inouehrs added inline comments.
lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.h
68

isRunOfOnes is a wrapper around isShiftedMask defined in MathExtras.h and just returns the results of countLeadingZeros.
I found similar function in ARM, AArch64, NVPTX and AMDGPU, but they all use countTrailingZeros after isShiftedMask. It seems that what we need around isShiftedMask depends on ISA.

jtony added inline comments.Dec 4 2017, 1:27 PM
lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.h
90–92

Missing space before :. Actually, we may want to put this short ternary expression in one line.

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
4233–4235

Like Nemanja have mentioned above in his comment at line 1190 (lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp). It may be more clear if we use uint32_t instead of unsigned here since we are using uint64_t for the 64 bit. Or we could also use unsigned long long instead of uint64_t to match the 32 bit type unsigned. Personally, I would like we use this type pairs consistently (either uint64_t, uint32_t or unsigned, unsigned long long is fine), but it is up to you.

inouehrs updated this revision to Diff 125472.Dec 4 2017, 10:34 PM
inouehrs marked 2 inline comments as done.Dec 4 2017, 10:35 PM
inouehrs updated this revision to Diff 133207.Feb 7 2018, 6:01 AM
  • rebased to the latest code
inouehrs updated this revision to Diff 139773.Mar 26 2018, 4:57 AM

− rebased onto the latest code base

  • minor touch up
inouehrs updated this revision to Diff 142113.Apr 11 2018, 11:11 PM
  • fix code format (e.g. too long lines)
  • rebased to the latest
inouehrs abandoned this revision.Dec 29 2018, 8:17 PM

BitPermutationSelector has been enhanced to generate better code for bitfield insert. So we do not need to specially handle bitfield insert anymore.
https://reviews.llvm.org/D49076