This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Replace MFVSRLD with MFVSRD when the vector is symmetrical
Needs ReviewPublic

Authored by stefanp on Nov 30 2021, 6:38 AM.

Details

Reviewers
lei
nemanjai
Group Reviewers
Restricted Project
Summary

The MFVSRD is faster than the MFVSRLD instruction and if the input vector is
symmetrical then both instructions produce the same result and we should prefer
the faster one.

This patch mainly looks at symmetrical situations that are known to arise after
a vector doubleword swap.

Diff Detail

Event Timeline

stefanp created this revision.Nov 30 2021, 6:38 AM
stefanp requested review of this revision.Nov 30 2021, 6:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 30 2021, 6:38 AM
stefanp added a reviewer: Restricted Project.Nov 30 2021, 6:41 AM

Can we fix this at the place where MFVSRLD is generated? (in DAG-ISEL?)

nemanjai added inline comments.Dec 1 2021, 4:24 PM
llvm/lib/Target/PowerPC/PPCMIPeephole.cpp
411

Is this just any instruction that sets isCommutable = 1?

amyk added a subscriber: amyk.Dec 3 2021, 1:01 PM
amyk added inline comments.
llvm/lib/Target/PowerPC/PPCMIPeephole.cpp
411

Could we maybe add a comment as to why we are looking at these instructions specifically?

I noticed that Nemanja is mostly correct in that a majority of these set isCommutable = 1, but with the exception of VMULLD, VMULHSD, VMULHUD.

stefanp updated this revision to Diff 394870.Dec 16 2021, 7:16 AM

Updated the patch to use isCommutable instead of listing the instructions
individually.

Can we fix this at the place where MFVSRLD is generated? (in DAG-ISEL?)

I see what you mean. This can be done in DAG-ISEL.
However, there are a couple of reasons why I chose to do this here.

  1. I think it is easier to do here and the code is simpler to understand. We are looking to replace MFVSRLD so we just look for that instruction and then replace it. In DAG ISEL we would have to first figure out which extract element node eventually gets turned into this instruction and then see if it can be replaced.
  2. There are situations where the choice of preceding instructions matters to whether or not we can use the other doubleword. Again, it would be a situation where we would have to figure out what a node is turned into before deciding if the doubleword is symmetrical.
amyk added inline comments.Jan 7 2022, 7:03 AM
llvm/lib/Target/PowerPC/PPCMIPeephole.cpp
408

Maybe we can try an early exit if this works:

if (!DefVecReg->getDesc().isCommutable())
  return false;
stefanp updated this revision to Diff 400978.Jan 18 2022, 2:04 PM

Added an early exit.

qiucf added a subscriber: qiucf.Jan 18 2022, 6:36 PM

Can we fix this at the place where MFVSRLD is generated? (in DAG-ISEL?)

I see what you mean. This can be done in DAG-ISEL.
However, there are a couple of reasons why I chose to do this here.

  1. I think it is easier to do here and the code is simpler to understand. We are looking to replace MFVSRLD so we just look for that instruction and then replace it. In DAG ISEL we would have to first figure out which extract element node eventually gets turned into this instruction and then see if it can be replaced.
  2. There are situations where the choice of preceding instructions matters to whether or not we can use the other doubleword. Again, it would be a situation where we would have to figure out what a node is turned into before deciding if the doubleword is symmetrical.

Will putting code in post-isel peephole help? See D97658.

Can we fix this at the place where MFVSRLD is generated? (in DAG-ISEL?)

I see what you mean. This can be done in DAG-ISEL.
However, there are a couple of reasons why I chose to do this here.

  1. I think it is easier to do here and the code is simpler to understand. We are looking to replace MFVSRLD so we just look for that instruction and then replace it. In DAG ISEL we would have to first figure out which extract element node eventually gets turned into this instruction and then see if it can be replaced.
  2. There are situations where the choice of preceding instructions matters to whether or not we can use the other doubleword. Again, it would be a situation where we would have to figure out what a node is turned into before deciding if the doubleword is symmetrical.

Will putting code in post-isel peephole help? See D97658.

I think I see what you mean based on that patch. I could try to move it there and re-use isVSXSwap. I will try to move the code there and see what I get.

Something to consider when it comes to deciding where to do something like this is that any work we do on the SDAG:

  1. Will be basic-block local obviously
  2. Will not be useful in the future when we switch to GISel
stefanp updated this revision to Diff 401977.Jan 21 2022, 7:08 AM

Added a test case to show the issue with basic block local.

Can we fix this at the place where MFVSRLD is generated? (in DAG-ISEL?)

I see what you mean. This can be done in DAG-ISEL.
However, there are a couple of reasons why I chose to do this here.

  1. I think it is easier to do here and the code is simpler to understand. We are looking to replace MFVSRLD so we just look for that instruction and then replace it. In DAG ISEL we would have to first figure out which extract element node eventually gets turned into this instruction and then see if it can be replaced.
  2. There are situations where the choice of preceding instructions matters to whether or not we can use the other doubleword. Again, it would be a situation where we would have to figure out what a node is turned into before deciding if the doubleword is symmetrical.

Will putting code in post-isel peephole help? See D97658.

I think I see what you mean based on that patch. I could try to move it there and re-use isVSXSwap. I will try to move the code there and see what I get.

So, I ended up also fully implementing this in post-isel peephole and then I compared the implementations. The conclusion I came to is that I would prefer to leave the implementation here in the MIPeephole.

Now, here is the long answer.
I took Nemanja's comment into consideration:

Something to consider when it comes to deciding where to do something like this is that any work we do on the SDAG:

  1. Will be basic-block local obviously
  2. Will not be useful in the future when we switch to GISel

For point number 1. the basic-block local aspect of the ISel actually causes the ISel implementation to miss opportunities. To demonstrate this I added this test: ppc64-mfvsrld-removal.ll.

If you look at that test in ISel the Basic Block that has the MFVSRLD instruction looks like this:

SelectionDAG has 8 nodes:
  t0: ch = EntryToken
        t2: v2i64,ch = CopyFromReg t0, Register:v2i64 %0
      t4: i64 = MFVSRLD t2
    t6: ch = CopyToReg t0, Register:i64 %1, t4
  t8: ch = B BasicBlock:ch<cleanup 0x10024b22b10>, t6

There isn't enough history in that BB to allow us to transform the instruction.
On the other hand, in MIPeephole we can see where the values are coming from:

# Machine code for function getVecSplit: IsSSA, TracksLiveness
Function Live Ins: $x3 in %4, $v2 in %5, $v3 in %6

bb.0.entry:
  successors: %bb.2(0x30000000), %bb.1(0x50000000); %bb.2(37.50%), %bb.1(62.50%)
  liveins: $x3, $v2, $v3
  %6:vrrc = COPY $v3
  %5:vrrc = COPY $v2
  %4:g8rc = COPY $x3
  %7:gprc = COPY %4.sub_32:g8rc
  %8:vrrc = XXPERMDI %5:vrrc, %5:vrrc, 2
  %0:vrrc = VADDUDM killed %8:vrrc, %5:vrrc     <---- Comes from here
  %9:crrc = CMPLWI killed %7:gprc, 0
  BCC 76, killed %9:crrc, %bb.2
  B %bb.1

bb.1.if.then:
; predecessors: %bb.0
  successors: %bb.3(0x80000000); %bb.3(100.00%)

  %1:g8rc = MFVSRLD %0:vrrc
  B %bb.3

bb.2.if.else:
; predecessors: %bb.0
  successors: %bb.3(0x80000000); %bb.3(100.00%)

  %10:vrrc = VADDUDM %0:vrrc, %6:vrrc
  ADJCALLSTACKDOWN 32, 0, implicit-def dead $r1, implicit $r1
  $v2 = COPY %10:vrrc
  BL8_NOP @callee, <regmask $cr2 $cr3 $cr4 $f14 $f15 $f16 $f17 $f18 $f19 $f20 $f21 $f22 $f23 $f24 $f25 $f26 $f27 $f28 $f29 $f30 $f31 $r14 $r15 $r16 $r17 $r18 $r19 $r20 $r21 $r22 $r23 $r24 $r25 and 60 more...>, implicit-def dead $lr8, implicit $rm, implicit $v2, implicit $x2, implicit-def $r1, implicit-def $x3
  ADJCALLSTACKUP 32, 0, implicit-def dead $r1, implicit $r1
  %11:g8rc_and_g8rc_nox0 = COPY $x3
  %2:g8rc = nsw ADDI8 %11:g8rc_and_g8rc_nox0, 42

bb.3.cleanup:
; predecessors: %bb.1, %bb.2

  %3:g8rc = PHI %1:g8rc, %bb.1, %2:g8rc, %bb.2
  $x3 = COPY %3:g8rc
  BLR8 implicit $lr8, implicit $rm, implicit $x3

# End machine code for function getVecSplit.

Therefore for this test we only catch the opportunity if we check for it in MIPeephole.

Also there are other reasons why I wanted to keep the patch as-is:

  1. As Nemanja mentioned we may be moving to GISel and so this work would have to be re-done in that situation if we did it in ISel.
  2. In ISel there are a number of nodes that we have to "look though" like COPY_TO_REGCLASS that makes the code more complicated. In MIPeephole a lot of those nodes have now been simplified away.
  3. I didn't see any advantages to doing this earlier as I don't think that this transformation will make much of a difference to passes that run between ISel and MIPeephole.

I hope I explained this ok. Let me know if there are more questions about this.