This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Convert reg/reg instructions fed by constants to reg/imm instructions (pre and post RA)
ClosedPublic

Authored by nemanjai on Apr 8 2017, 10:23 AM.

Details

Summary

This patch does the following:

  • Adds the infrastructure to convert instructions that have a reg+imm form into that form
  • Does that conversion in the MI Peephole on SSA form (off by default to allow some testing in the wild). This is done in a fixed-point transformation (controlled by an option)
  • Does the same conversion again in a pre-emit peephole since a number of opportunities are created late - in block placement, etc. (off by default to allow some testing in the wild).
  • Adds mir test cases for both peepholes that cover all the converted opcodes

Please note that this patch also moves the initialization of PPCMIPeepholePass to allow mir testing.

In current LLVM bootstrap, this converts 19,600 instructions to r+i form (SSA) and 2,600 (post RA). It also removes 96 cmp+isel sequences as they were comparing a constant to a constant. This allows us to eliminate 16,400 instructions (SSA) and 1,200 (post RA).

Diff Detail

Repository
rL LLVM

Event Timeline

nemanjai updated this revision to Diff 95287.Apr 14 2017, 4:44 AM
nemanjai retitled this revision from [PowerPC] Eliminate compares in instruction selection - Vol. 6 to [PowerPC] Eliminate compares - convert reg/reg instructions fed by constants to reg/imm instructions.
nemanjai edited the summary of this revision. (Show Details)
nemanjai updated this revision to Diff 118462.Oct 10 2017, 1:41 PM

This isn't really dependent on the compare elimination patches (although it will find more opportunities once those land).

  • Rebased to top of trunk
  • Added handling for (isel %truereg, %falsereg, (cmpi (li $imm1), $imm2)) and another test case
  • Added an option to turn off fixed-point transformation
  • Added a few more useful stats
nemanjai updated this revision to Diff 120836.Oct 30 2017, 9:46 AM
nemanjai retitled this revision from [PowerPC] Eliminate compares - convert reg/reg instructions fed by constants to reg/imm instructions to [PowerPC] Convert reg/reg instructions fed by constants to reg/imm instructions (pre and post RA).
nemanjai edited the summary of this revision. (Show Details)
hfinkel edited edge metadata.Oct 30 2017, 6:32 PM

This is really interesting. So we're seeing a number of cases where, after instruction selection, we're doing something that makes operations have constant operands. Do you know what that is? Maybe one thing we could do with this is have it assert, instead of transforming the code, and then reduce the test cases in order to understand why this happens.

This is really interesting. So we're seeing a number of cases where, after instruction selection, we're doing something that makes operations have constant operands. Do you know what that is? Maybe one thing we could do with this is have it assert, instead of transforming the code, and then reduce the test cases in order to understand why this happens.

I don't think that we can catch all of these early enough not to need something like this. I think that fundamentally, we're limited in our ability to select the right instruction when values come from other blocks. Most of the early cases are likely due to SDAG being bb-local. The late opportunities arise from late transformations that may eliminate or duplicate things. The motivating example for the late pass was a case where an input to a compare comes from a PHI. One of the PHI inputs is a load and the other is a constant. Block placement then duplicates the block and we end up with one block having

li 3, 15
cmpld  24, 3

which this patch transforms in the obvious way to cmpldi 24, 15.

Also, what initially prompted the early pass is that we are looking to commit the code that will produce GPR-only sequences for compares (https://reviews.llvm.org/D38575), which we have to do in C++ code. Rather than adding special cases for constant comparisons, we clean it up with something like this and get the benefits of cleaning up any other such code we emit now and in the future.

Finally, I am afraid that leaving such an assert in the code might make it likely to trigger due to other changes that may be made in target-independent code. This would be good to prompt us to fix yet another issue, but might possibly lead to:

  1. Frustrating the community as their patches trigger this assert in the PPC back end
  2. Adding special cases where we accept that we emit code like this under specific conditions
  3. An outright removal of the assert by someone

I don't think that we can catch all of these early enough not to need something like this. I think that fundamentally, we're limited in our ability to select the right instruction when values come from other blocks. Most of the early cases are likely due to SDAG being bb-local.

One definite source of these opportunities is something like this:

int a;
void fn1() { __atomic_fetch_add(&a, 0, 4); }

The atomic pseudo instructions are as general as possible so don't have immediate forms. Then of course, when we expand the pseudo, we end up with something like li 5, 4 -> add 3, 4, 5. This pass will clean it up. Of course, we can add all the immediate form atomic pseudo's, but I'm just afraid we'll keep finding more such cases.

This is really interesting. So we're seeing a number of cases where, after instruction selection, we're doing something that makes operations have constant operands. Do you know what that is? Maybe one thing we could do with this is have it assert, instead of transforming the code, and then reduce the test cases in order to understand why this happens.

I don't think that we can catch all of these early enough not to need something like this. I think that fundamentally, we're limited in our ability to select the right instruction when values come from other blocks. Most of the early cases are likely due to SDAG being bb-local.

But that shouldn't affect constants. We should do constant propagation at the IR level, so all constants should be manifest by the time you get to the SDAG.

The late opportunities arise from late transformations that may eliminate or duplicate things. The motivating example for the late pass was a case where an input to a compare comes from a PHI. One of the PHI inputs is a load and the other is a constant. Block placement then duplicates the block and we end up with one block having

li 3, 15
cmpld  24, 3

which this patch transforms in the obvious way to cmpldi 24, 15.

Seems like a good example. Can you please document this in a comment in the pass as an example of why such a thing might be needed. I find this convincing.

Also, what initially prompted the early pass is that we are looking to commit the code that will produce GPR-only sequences for compares (https://reviews.llvm.org/D38575), which we have to do in C++ code. Rather than adding special cases for constant comparisons, we clean it up with something like this and get the benefits of cleaning up any other such code we emit now and in the future.

I'm not sure this is a good example, however. This leads to exactly the kind of phase-ordering effects that we might end up wanting to avoid. Doing this in the SDAG means that you'll get the SDAG's CSE. Doing later means that you don't, and maybe MachienCSE will clean this up later, and nothing will be lost in the mean time, but maybe not.

Finally, I am afraid that leaving such an assert in the code might make it likely to trigger due to other changes that may be made in target-independent code. This would be good to prompt us to fix yet another issue, but might possibly lead to:

  1. Frustrating the community as their patches trigger this assert in the PPC back end
  2. Adding special cases where we accept that we emit code like this under specific conditions
  3. An outright removal of the assert by someone

Oh, that's not what I meant. We should never add something like this with an assert by default. I meant that you add it with a flag that causes it to assert.

I don't think that we can catch all of these early enough not to need something like this. I think that fundamentally, we're limited in our ability to select the right instruction when values come from other blocks. Most of the early cases are likely due to SDAG being bb-local.

One definite source of these opportunities is something like this:

int a;
void fn1() { __atomic_fetch_add(&a, 0, 4); }

The atomic pseudo instructions are as general as possible so don't have immediate forms. Then of course, when we expand the pseudo, we end up with something like li 5, 4 -> add 3, 4, 5. This pass will clean it up. Of course, we can add all the immediate form atomic pseudo's, but I'm just afraid we'll keep finding more such cases.

It sounds like we might indeed want to add pseudos with immediate forms. Also an interesting example.

Seems like a good example. Can you please document this in a comment in the pass as an example of why such a thing might be needed. I find this convincing.

I'll certainly add this to the comments.

Also, what initially prompted the early pass is that we are looking to commit the code that will produce GPR-only sequences for compares (https://reviews.llvm.org/D38575), which we have to do in C++ code. Rather than adding special cases for constant comparisons, we clean it up with something like this and get the benefits of cleaning up any other such code we emit now and in the future.

I'm not sure this is a good example, however. This leads to exactly the kind of phase-ordering effects that we might end up wanting to avoid. Doing this in the SDAG means that you'll get the SDAG's CSE. Doing later means that you don't, and maybe MachienCSE will clean this up later, and nothing will be lost in the mean time, but maybe not.

Sorry, I wasn't completely clear. This is done on the SDAG (part of the Select() function in PPCISelDAGToDAG.cpp). We just have to do it in C++ code because a lot of the patterns involve MachineSDNode's that have two results (GPR and CARRY) and TblGen can't handle that.

Oh, that's not what I meant. We should never add something like this with an assert by default. I meant that you add it with a flag that causes it to assert.

Fair enough.

It sounds like we might indeed want to add pseudos with immediate forms. Also an interesting example.

I certainly don't mind adding these forms in a separate patch.

Another example that comes up a lot is DS-Form loads and stores. We will not emit them on unaligned data for reasons specified in the comment in PPCInstrInfo.td and will fall back on the X-Forms. However, once we know the constants, we should be able to safely replace the X-Form instructions with DS-Form ones as this patch does.

So at this time, we have the following reasons this comes up:

  • X-Form vs. DS-Form issues
  • Atomics with constant operands
  • Tail-duplication induced redundancy
  • Compare elimination patch not explicitly handling constant operands
  • Anything else that I haven't explicitly identified yet

Although most of these root causes can be fixed, I imagine there are others that exist now and I haven't identified. And there will likely be others in the future. My opinion is that even if we were to fix all of these (and it isn't clear to me how we'd fix at least the tail-duplication one) and provide an assert to aid in investigating related problems in the future, we'd likely still be missing opportunities.
The reason I say this is that it is unlikely that we'd be diligent and thorough enough to regularly build representative code with this assert turned on in the future.

But of course, that is just my opinion and I'm happy to do what the majority feels is appropriate.

nemanjai updated this revision to Diff 124209.Nov 24 2017, 7:27 AM

Fix handling for R0/X0 where it's special.
Add test cases for the special handling there.
Turn the peepholes on by default.
Fix test cases that change due to having this on by default.

inouehrs edited edge metadata.Nov 28 2017, 3:04 AM

I think that this pass can be used to optimize ISEL if the false reg is constant zero by inverting the predicate (but this can be another patch).

I think that this pass can be used to optimize ISEL if the false reg is constant zero by inverting the predicate (but this can be another patch).

That's a good point @inouehrs. I can add this in a follow-up patch.

@hfinkel What do you think about this? Do we want to proceed with this? We actually convert a lot of instructions with this and it has a proven effect on performance. The updated patch actually converts and deletes an extra ~1,000 instructions in bootstrap.

hfinkel accepted this revision.Dec 12 2017, 8:07 AM

@hfinkel What do you think about this? Do we want to proceed with this? We actually convert a lot of instructions with this and it has a proven effect on performance. The updated patch actually converts and deletes an extra ~1,000 instructions in bootstrap.

Yes, we should proceed.

lib/Target/PowerPC/PPCInstrInfo.cpp
2253

I see why you did this, but I don't like the interface to this function: it is unintuitive that it might return non-null, and yet, you also need to check that the ConstantOp is not -1 to know if the return value is valid. Can you please change this so that it uniformly returns non-null only if it succeeds (and, thus, the constant is valid).

This revision is now accepted and ready to land.Dec 12 2017, 8:07 AM
nemanjai added inline comments.Dec 13 2017, 6:50 AM
lib/Target/PowerPC/PPCInstrInfo.cpp
2253

That is a really good point. I'll change it to return non-null only if the constant operand is a valid operand of the instruction.

lib/Target/PowerPC/PPCTargetMachine.cpp