Page MenuHomePhabricator

[PowerPC] Custom lowering for funnel shifts

Authored by foad on Thu, Jul 16, 6:36 AM.


Group Reviewers
Restricted Project
rG28e322ea9393: [PowerPC] Custom lowering for funnel shifts

The custom lowering saves an instruction over the generic expansion, by
taking advantage of the fact that PowerPC shift instructions are well
defined in the shift-by-bitwidth case.

Diff Detail

Event Timeline

foad created this revision.Thu, Jul 16, 6:36 AM
foad marked 3 inline comments as done.Thu, Jul 16, 6:39 AM
foad added inline comments.

This is just guesswork. I'm really not sure which types we should do this for, under what conditions.


Can I pre-commit this new test case, and fshr_i64?


This regressions seems to be caused by not constant folding based on the fact that r3 is known to be 4. Can anyone suggest how to fix it? Do I have to spot known constant shift amounts in PPCTargetLowering::LowerFunnelShift?

lkail added a reviewer: Restricted Project.Thu, Jul 16, 7:12 AM
spatel added a subscriber: spatel.Thu, Jul 16, 7:16 AM
spatel added inline comments.

Yes, please pre-commit those, so we see the diffs.

foad updated this revision to Diff 278470.Thu, Jul 16, 7:22 AM

Rebase after precommitting new test cases.

spatel added inline comments.Thu, Jul 16, 7:36 AM

This is an unusual test because it contains "i216" types, but that's apparently the way the code was written in:

Maybe because of that, the constants are marked opaque from DAG creation time:
Creating constant: t6: i216 = OpaqueConstant<4> default constant folding is bypassed. Someone with more PPC knowledge probably needs to decide what -- if anything -- needs to be done here.

Have you triaged whats going on in the mulfixsat test regressions?


Maybe also show the pseudocode as used by PPC for comparison?

foad marked 3 inline comments as done.Mon, Jul 20, 3:40 AM
foad added inline comments.

This is pseudocode for the PPC expansion. Did you mean, show the more complicated expansion that TargetLowering would have used?

12 ↗(On Diff #278470)

I think the extra instruction here is just due to bad luck in the register allocation or scheduling.

29 ↗(On Diff #278470)

It's hard to see because of the register allocation and scheduling differences but there is a slight regression here, from:

rotlwi  3, 3, 31
rlwimi 3, 5, 31, 0, 0


srwi 5, 5, 1
slwi 4, 3, 31
or 4, 4, 5

I suppose I have regressed funnel shifts by a constant amount. I'll see if I can fix it.

RKSimon added inline comments.Mon, Jul 20, 4:51 AM

Doh! Sorry about that, I misread it this morning - no need to add anything here.

foad updated this revision to Diff 279190.Mon, Jul 20, 5:23 AM

Enable combining of PPC-specific shift opcodes.

foad added a subscriber: inouehrs.Mon, Jul 20, 5:28 AM

Enable combining of PPC-specific shift opcodes.

My rationale is that in PPCTargetLowering::LowerFunnelShift it's correct to use PPCISD::SHL and PPCISD::SRL (instead of the ISD:: versions) precisely because they are defined in the shift-by-bitwidth case. But I think it's safe for BitPermutationSelector to treat them the same as the ISD:: versions. @hfinkel @inouehrs does this seem reasonable?

This seems reasonable to me @hfinkel @nemanjai any more comments?

nemanjai accepted this revision.Mon, Jul 27, 7:16 AM



I agree with Sanjay, this is due to the weird size. However, it is not a concern. I'll post a very small tweak to our Reg+Reg -> Reg+Imm transformation that recovers this.

This revision is now accepted and ready to land.Mon, Jul 27, 7:16 AM

The fix to recover the regression:

This revision was automatically updated to reflect the committed changes.