Page MenuHomePhabricator

[PPC] Better codegen for AND, ANY_EXT, SRL sequence

Authored by amehsan on Sep 26 2016, 9:35 AM.



This fixes the first issue exposed by PR30483. (See comment 1 in the IR).

First I tried to fix a more general problem in dag combine. I wanted to address all code sequences of shifts and ands, with an extension in the middle. My solution was to bring the extension insn to the beginning of the sequence so the sequence of shifts and ands can be merged together during isel. That approach did not work because this particular sequence is created in target independent dag combine in DAGCombiner::visitSRL.

So alternatively I decided to address this particular sequence in isel. If we encounter similar issues, we can think of a more general solution.

Diff Detail

Event Timeline

amehsan updated this revision to Diff 72504.Sep 26 2016, 9:35 AM
amehsan retitled this revision from to [PPC] Better codegen for AND, ANY_EXT, SRL sequence.
amehsan updated this object.
amehsan added reviewers: hfinkel, kbarton, nemanjai.
amehsan added subscribers: llvm-commits, Carrot, echristo.
amehsan added inline comments.Sep 26 2016, 9:36 AM

I should probably remove this line :)

amehsan added inline comments.Sep 26 2016, 10:15 AM

I am not sure if this is always legal. Will check that.

hfinkel edited edge metadata.Sep 26 2016, 10:41 AM

Alternatively, we might teach the BitPermutationSelector to look through extends, which would be more general. Had you looked at that?


We shouldn't speculatively create new nodes if we can avoid it.


You need an architecture or a triple here too.


Yes, although if you keep the cpu attribute here, you don't need it on the command line.

amehsan added inline comments.Sep 26 2016, 10:46 AM

Sorry, I am not sure I understand this comment. What is speculative here? I have a i32 and want to convert it to i64. I tried a couple of different options and this sequence was the only one that worked. This appeared in a small kernel that I wrote and included a similar conversion.

hfinkel added inline comments.Sep 26 2016, 10:54 AM

No, I mean that you're calling getMachineNode here to generate new SDAG nodes; are you sure that when you do this one of the conditions below will match and these will never just end up being garbage collected?

amehsan added inline comments.Sep 26 2016, 11:00 AM

The conditions below and the ones that reaches this line of code are mutually exclusive. Note that this basic block is ended in line 2650. The next condition will not be satisfied (because we have proved that Val.getOpcode() == ISD::ANY_EXTEND and in the next condition they want Val.getOpcode() == ISD::SRL) and we go straight to the line 2665 and generate code.

Alternatively, we might teach the BitPermutationSelector to look through extends, which would be more general. Had you looked at that?

There is potentially a more general solution as well. I can call it "bubbling up" any_ext and zero_ext. So for example in the following selection DAG

Optimized type-legalized selection DAG: BB#0 '_Z3fooRK3PB2S1_:entry'
SelectionDAG has 17 nodes:
  t0: ch = EntryToken
              t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
            t37: i32,ch = load<LD1[%arrayidx.i6](align=8), anyext from i8> t0, t2, undef:i64
              t4: i64,ch = CopyFromReg t0, Register:i64 %vreg1
            t42: i32,ch = load<LD1[%arrayidx.i37](align=8), anyext from i8> t0, t4, undef:i64
          t54: i32 = xor t37, t42
        t55: i32 = srl t54, Constant:i64<3>
      t56: i64 = any_extend t55
    t51: i64 = and t56, Constant:i64<1>
  t20: ch,glue = CopyToReg t0, Register:i64 %X3, t51
  t21: ch = PPCISD::RET_FLAG t20, Register:i64 %X3, t20:1

We can first push any_ext before srl. (This is what I did in my first attempt). But we don't need to stop here. We can then push it before xor and apply (and merge it) to load instructions that feed xor. Given that zero extension is free for PPC loads this is easier than applying the similar idea to sign_extend.

I also played a little bit with BitPermutationSelector. I added the following code to BitPermutationSelector::getValueBits

+    case ISD::ANY_EXTEND: {
+        auto Size = V.getOperand(0).getNode()->getValueType(0).getSizeInBits();
+        DEBUG(dbgs() << "LOC B1\n" << NumBits << "\n" << Size <<"\n" ; );
+        const SmallVector<ValueBit, 64> *InnerBits;
+        std::tie(Interesting, InnerBits) = getValueBits(V.getOperand(0), Size);
+        for (unsigned i = 0; i < Size; ++i)
+          Bits[i] = (*InnerBits)[i];
+        for (unsigned i = Size; i < NumBits; ++i) {
+          Bits[i] = ValueBit(ValueBit::ConstZero);
+        }
+        return std::make_pair(Interesting, &Bits);
+      }
+      break;

This works, but the codegen currently does not generate correct code for i32 to i64 conversions. So more work is needed. As we discussed, this pattern might be special given it is generated in the target independent codegen. So a more general solution may not be needed. So I will add a note to our readme files, with a pointer to this comment, so in future if we need better handling of this opcodes, we follow one of these ideas.

amehsan updated this revision to Diff 72859.Sep 28 2016, 10:36 AM
amehsan edited edge metadata.
kbarton accepted this revision.Oct 20 2016, 1:09 PM
kbarton edited edge metadata.


This revision is now accepted and ready to land.Oct 20 2016, 1:09 PM
amehsan closed this revision.Oct 26 2016, 9:20 AM

Commited 284983