This is an archive of the discontinued LLVM Phabricator instance.

[X86] Replace (32/64 - n) shift amounts with (neg n) since the shift amount is masked in hardware
ClosedPublic

Authored by craig.topper on Jun 29 2018, 2:33 PM.

Details

Summary

Inspired by what AArch64 does for shifts, this patch attempts to replace shift amounts with neg if we can.

This is done directly as part of isel so its as late as possible to avoid breaking some BZHI patterns since those patterns need an unmasked (32-n) to be correct.

To avoid manual load folding and custom instruction selection for the negate. I've inserted new nodes in the DAG above the shift node in topological order. Though as I'm writing this, I'm wondering if it might make more sense to put them above the original shift amount node instead. By inserting them in the topogical order we should be able to run isel on them independently.

This patch does risk leaving behind a subtract and creating a negate if the subtract was used by something other than a shift.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision.Jun 29 2018, 2:33 PM
lebedev.ri added inline comments.Jun 29 2018, 2:59 PM
lib/Target/X86/X86ISelDAGToDAG.cpp
2698–2699 ↗(On Diff #153571)

Why? I would think we can simply add one more pattern to the BZHI?
I think this is bad because it not only not "messing up a BZHI pattern",
but anything else too, since it does not check that the user is and %val, %mask.

test/CodeGen/X86/extract-lowbits.ll
1–11 ↗(On Diff #153571)

I'll commit the change to this file from D48490 tomorrow.
That will expose more fold possibilities here.

craig.topper added inline comments.Jun 29 2018, 3:11 PM
lib/Target/X86/X86ISelDAGToDAG.cpp
2698–2699 ↗(On Diff #153571)

This code runs when the root node considered for isel is a srl/sra/shl. And it runs before any tablegen patterns on that node. Any patterns that used a shift, but where the shift was not the root node have already been matched.

So I'm specifically blocking out this one bzhi pattern that has a shift as a root node.

// x << (bitwidth - y) >> (bitwidth - y)
defm : _bmi_bzhi_pattern<(srl (shl RC:$src, (sub bitwidth, GR8:$lz)),
                              (sub bitwidth, GR8:$lz)),
                         (srl (shl (x86memop addr:$src),
                                    (sub bitwidth, GR8:$lz)),
                              (sub bitwidth, GR8:$lz)),
                         RC, VT, DstInst, DstMemInst>;
craig.topper added a reviewer: niravd.

Place the new nodes before the original shift amount in the topological order of the DAG. This ensures that all shifts that had the same shift amount will be selected and run this transform before any of the new nodes are selected. The selection DAG should CSE all the new nodes together for multiple shifts.

niravd added inline comments.Jul 2 2018, 10:42 AM
lib/Target/X86/X86ISelDAGToDAG.cpp
2737 ↗(On Diff #153592)

There's a definitely an issue with insertDAGNode where collisions due to CSE could require a nontrivial reordering in AllNodes to maintain the 'users before operands' property and we do at most repositioning one node. At some point we should probably replace insertDAGNode with a backend-generic way to corectly reposition that node and it's operands recursively to guarantee correct behavior.

That said, hitting the issue in a way that will mess with correctness is pretty hard.We should probably change the position given to insertDAGNode to be the immediate operand to minimize the probability of reordering size.

2758 ↗(On Diff #153592)

You should overwrite N with the return value of the update in case the update is CSEd with another node, e.g., we are selecting in a block that has both the pre-optimized and post-optimized versions. Actually, we should probably add a test case for this.

N = CurDAG->UpdateNodeOperands(N, N->getOperand(0), NewShiftAmt);

@niravd would it be better to just do the full manual isel instead of trying to do it this way? Selecting the negate is easy. It just the variations of the shift that kind of ugly due to legacy vs BMI2 and BMI2 allowing load folding.

@niravd would it be better to just do the full manual isel instead of trying to do it this way? Selecting the negate is easy. It just the variations of the shift that kind of ugly due to legacy vs BMI2 and BMI2 allowing load folding.

Probably not; the CSE collision I can envision seems like it wouldn't happen in the wild. Also, it should manifest as an ICE/assertion. If we wanted to go the manual select route we should really do it for all uses of InsertDAGNode (both in X86 and SystemZ) which seems like a pain. I think we're happier keeping it as, and adding a note for insertDAGNode. I'll put making a fixup to insertDAGNode on my to-do list.

craig.topper added inline comments.Jul 2 2018, 4:30 PM
lib/Target/X86/X86ISelDAGToDAG.cpp
2758 ↗(On Diff #153592)

I'm having a hard time getting the topological sort to give me an order that can cause this to happen. We're always doing isel on the post-optimized version i explicitly put in the IR first. There are more nodes in that path and that seems to be making it so that the pre-optimized path gets fully sorted into the topological order before the post-optimized side does.

niravd added inline comments.Jul 6 2018, 11:11 AM
lib/Target/X86/X86ISelDAGToDAG.cpp
2758 ↗(On Diff #153592)

I don't have the details of the topological sort worked out, but it definitely seems reasonable that it precludes any problematic orderings. Given that, let's skip on adding a test case for this.

I don't suppose we have any chance of SimplifyDemandedBits being able to do some/all of this for us?

Without an And the target independent node wouldn't allow this. I thought about trying to insert an And into the DAG during DAG combine which would make simplify demanded bits trigger. But it breaks recognizition of (x << (32 - n)) >> (32 - n)) for bzhi. It would become (x << (-n & 0x1f)) >> (-n & 0x1f) and now we don't know if 'n' is safe to feed directly to bzhi.

Can you rebase this since D48767 has landed?
I think those might be affected by the patch.

Rebase and capture the value returned form UpdateNodeOperands.

@niravd, let me make sure I understand what will happen in the CSE case. We'll walk the isel matching table using the CSE node, but at the end of the match we'll call MorphNodeTo using the original node that is still held in NodeToMatch in the target independent isel code? Then later we'll revisit the CSE node when its turn in the topological order comes up?

Judging by the code in SystemZISelDAGToDAG.cpp that calls UpdateNodeOperands during select, I don't think it works the way I thought. The SystemZ code calls ReplaceNode if UpdateNodeOperands does CSE. I tried to change that to just assign over Node. Which would make it similar to what I'm trying to do here. But it ended up leaving a node unselected.

Remove bad comment

Remove a change that snuck into my last update.

Other than nits i have no further comments, thanks.

lib/Target/X86/X86ISelDAGToDAG.cpp
2715 ↗(On Diff #154533)

Are you sure this shouldn't be N mod Size == 0?
At least i think that is what the code does.

2720 ↗(On Diff #154533)

Same

2698–2699 ↗(On Diff #153571)

I see, thank you for the explanation.

Did this just slip off the radar, or are there some (unlisted) prerequisite fixes needed?
The remaining comments here seem very minor.

I’m pretty sure the case where UpdateNodeOperands returns an existing node instead of updating N doesn’t work. But I haven’t been able to trigger it with a test.

I’m pretty sure the case where UpdateNodeOperands returns an existing node instead of updating N doesn’t work. But I haven’t been able to trigger it with a test.

Ah, I think you're right; if we find an existing node, other references to the old node will not be updated. But calling ReplaceNode in that case should be sufficient.
If it's too hard to get a test case, I'm in favor of commit this without a test case.

So if we call ReplaceNode, do I still continue on to selecting that node? Which I think means we select that node earlier than its other users and prevents the users from seeing it as an 'and' that they can match in their patterns?

I think you should just return immediately and let it get replaced the next time.

Alternatively since we can't find an example,should we just do the obviously safe fix and just fail out if the UpdateNodeOperands returns an existing node? UpdateNodeOperands doesn't modify anything when it finds an existing node so this is easy.

So if we call ReplaceNode, do I still continue on to selecting that node? Which I think means we select that node earlier than its other users and prevents the users from seeing it as an 'and' that they can match in their patterns?

Rebase and fix the CSE handling.

niravd accepted this revision.Aug 22 2018, 12:27 PM

LGTM.

This revision is now accepted and ready to land.Aug 22 2018, 12:27 PM
This revision was automatically updated to reflect the committed changes.