This is an archive of the discontinued LLVM Phabricator instance.

Ensure all uses of permute instruction feed vector stores
ClosedPublic

Authored by kbarton on Jun 28 2016, 10:43 AM.

Details

Reviewers
wschmidt
hfinkel
Summary

There is a problem in VSXSwapRemoval where it is incorrectly removing permute instructions.
In this case, the permute is feeding both a vector store and also a non-store instruction. In this case, the permute cannot be removed.

The fix is to simply look at all the uses of the vector register defined by the permute and ensure that all the uses are vector store instructions.

This problem was reported in PR 27735 (https://llvm.org/bugs/show_bug.cgi?id=27735).

Test case based on the original problem reported.

Diff Detail

Event Timeline

kbarton updated this revision to Diff 62110.Jun 28 2016, 10:43 AM
kbarton retitled this revision from to Ensure all uses of permute instruction feed vector stores.
kbarton updated this object.
kbarton added reviewers: wschmidt, hfinkel.
kbarton added subscribers: llvm-commits, nemanjai, amehsan.

I'm just curious how this happens since the swap instruction that defines the register for the store is emitted together with the actual store. How is there another user for a node that the back end added as part of DAG combine? I assume this is some sort of CSE that is running between the combine step and the swap removal step? If it is, might it also be advantageous to teach it not to eliminate expressions when one of them is a swap feeding a store?

I'm not suggesting this is necessarily the right thing to do, but one thing that will prevent MachineCSE from eliminating swaps is by adding let hasSideEffects = 1 in the definition of XXPERMDI. That fixes this issue, but has the disadvantage of disabling CSE for other forms of the instruction.

In any case, disabling MachineCSE for swaps might be something we want to consider.

wschmidt accepted this revision.Jun 29 2016, 4:50 AM
wschmidt edited edge metadata.

The patch LGTM. One question: Did you check for the reverse situation? Can we have a swap that is fed by more than one instruction, where one is not a load, or is that already handled in the code? I realize this may not come up in practice if we don't have a machine sinking pass.

@nemanjai, yes, you could disable other optimizations, but I wouldn't recommend it. If one were going to spend more time on swap optimization, I'd instead recommend optimizing the web but inserting swaps in specific cases like this where something other than an STXVD2X occurs, which would always be a win in practice (though you could construct pathological cases where the code could get larger, though still no slower). But since P8 optimization is off the table right now, I'd just leave things lie.

This revision is now accepted and ready to land.Jun 29 2016, 4:50 AM

The patch LGTM. One question: Did you check for the reverse situation? Can we have a swap that is fed by more than one instruction, where one is not a load, or is that already handled in the code? I realize this may not come up in practice if we don't have a machine sinking pass.

@nemanjai, yes, you could disable other optimizations, but I wouldn't recommend it. If one were going to spend more time on swap optimization, I'd instead recommend optimizing the web but inserting swaps in specific cases like this where something other than an STXVD2X occurs, which would always be a win in practice (though you could construct pathological cases where the code could get larger, though still no slower). But since P8 optimization is off the table right now, I'd just leave things lie.

Excellent, thanks for answering the question Bill. I really only asked it because I don't know enough about this to know if allowing MachineCSE to eliminate swaps might lead to other situations where we do the wrong thing in our pass.

I'm just curious how this happens since the swap instruction that defines the register for the store is emitted together with the actual store. How is there another user for a node that the back end added as part of DAG combine? I assume this is some sort of CSE that is running between the combine step and the swap removal step? If it is, might it also be advantageous to teach it not to eliminate expressions when one of them is a swap feeding a store?

I'm not suggesting this is necessarily the right thing to do, but one thing that will prevent MachineCSE from eliminating swaps is by adding let hasSideEffects = 1 in the definition of XXPERMDI. That fixes this issue, but has the disadvantage of disabling CSE for other forms of the instruction.

In any case, disabling MachineCSE for swaps might be something we want to consider.

I thought about disabling MachineCSE in this case, but after talking to Bill I decided it wasn't the right approach. If we do disable it, we would need to do it through the cost model. There isn't a flag that we can use to make MachineCSE illegal on swap instructions that wouldn't have broader impacts throughout the optimizer. Marking XXPERMDI as having side effects, for example, can limit many other things that move code around, which we don't want.

There is also the problem that XXPERMDI doesn't really have side effects, so marking it to disable MachineCSE would likely draw some "attention" during reviews ;)

The patch LGTM. One question: Did you check for the reverse situation? Can we have a swap that is fed by more than one instruction, where one is not a load, or is that already handled in the code? I realize this may not come up in practice if we don't have a machine sinking pass.

This is already handled in the code. If the register defined by a swapping load is used in something other than a swapping load/swapping store, then the web is rejected. Unless there is another case outside of this that I'm not seeing....

hfinkel edited edge metadata.Jun 29 2016, 10:45 AM

I'm just curious how this happens since the swap instruction that defines the register for the store is emitted together with the actual store. How is there another user for a node that the back end added as part of DAG combine? I assume this is some sort of CSE that is running between the combine step and the swap removal step? If it is, might it also be advantageous to teach it not to eliminate expressions when one of them is a swap feeding a store?

I'm not suggesting this is necessarily the right thing to do, but one thing that will prevent MachineCSE from eliminating swaps is by adding let hasSideEffects = 1 in the definition of XXPERMDI. That fixes this issue, but has the disadvantage of disabling CSE for other forms of the instruction.

In any case, disabling MachineCSE for swaps might be something we want to consider.

I thought about disabling MachineCSE in this case, but after talking to Bill I decided it wasn't the right approach. If we do disable it, we would need to do it through the cost model. There isn't a flag that we can use to make MachineCSE illegal on swap instructions that wouldn't have broader impacts throughout the optimizer. Marking XXPERMDI as having side effects, for example, can limit many other things that move code around, which we don't want.

There is also the problem that XXPERMDI doesn't really have side effects, so marking it to disable MachineCSE would likely draw some "attention" during reviews ;)

Indeed it would. ;) -- There are certainly cases where we don't want to CSE certain kinds of instructions (generally because of macro fusion and other microarchitectural pattern matching, to manage register pressure, etc.), but in those cases, you'd want to remateralize later instead of preventing CSE earlier (because CSEing the swap might enable CSEing other things).

The patch LGTM. One question: Did you check for the reverse situation? Can we have a swap that is fed by more than one instruction, where one is not a load, or is that already handled in the code? I realize this may not come up in practice if we don't have a machine sinking pass.

This is already handled in the code. If the register defined by a swapping load is used in something other than a swapping load/swapping store, then the web is rejected. Unless there is another case outside of this that I'm not seeing....

Great, thanks. That was my only concern.

kbarton closed this revision.Jul 6 2016, 11:12 AM

Committed with r274645 = b4b959a20eac7f5cab1a86c17a85063627a66615 (refs/remotes/origin/master)