This is an archive of the discontinued LLVM Phabricator instance.

[x86] shrink 'and' immediate values by setting the high bits (PR35907)
ClosedPublic

Authored by spatel on Jan 15 2018, 2:25 PM.

Details

Summary

Try to reverse the constant-shrinking that happens in SimplifyDemandedBits() for 'and' masks when it results in a smaller immediate.

Other targets might want to share some of this logic by enabling this under a target hook, but I didn't see diffs for simple cases with PowerPC or AArch64, so I think they already have some specialized logic for this kind of thing.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision.Jan 15 2018, 2:25 PM
craig.topper added inline comments.Jan 16 2018, 10:16 AM
lib/Target/X86/X86ISelDAGToDAG.cpp
2461 ↗(On Diff #129907)

You could create a mask from the APInt::getHighBitSet call you need to do anyway and use the DAG.MaskedValueIsZero which wraps computeKnownBits?

2466 ↗(On Diff #129907)

Should this check be before the more costly computeKnownBits?

2471 ↗(On Diff #129907)

One of these days we should really decide between 'DL' and 'dl'...

2473 ↗(On Diff #129907)

Just to make sure I understand. If the new mask is all ones, getNode constant folds the AND away leaving the constant disconnected. But isel will notice the constant is unused and not select it? Or maybe isel doesn't see the constant at all because it wasn't there when the selection process started?

Though if we can prove the upper bits are 0 and the result is all ones why didn't simplify demanded bits during DAG combine already remove the AND. I'm just trying to understand what happened on the tests in divide-by-constant.ll

I wonder if we should do this in PreprocessISelDAG so the killing off of the AND when the mask is all 1s doesn't seem quite so weird. Right now when it happens the Select call ends up doing selection on the input to the AND.

spatel marked 2 inline comments as done.Jan 16 2018, 12:14 PM

I wonder if we should do this in PreprocessISelDAG so the killing off of the AND when the mask is all 1s doesn't seem quite so weird. Right now when it happens the Select call ends up doing selection on the input to the AND.

Let me take a closer look at how this escapes SimplifyDemandedBits. There might be a hole in our AssertZext logic. We get it right if we start with this IR:

define i32 @test3mul(i8 zeroext %x) {
  %z = zext i8 %x to i32
  %m = mul nuw nsw i32 %z, 171
  %s = lshr i32 %m, 9
  %a = and i32 %s, 127
  ret i32 %a
}
lib/Target/X86/X86ISelDAGToDAG.cpp
2471 ↗(On Diff #129907)

I thought 'dl' was a holdover from the days when variables were lower-case? I'd be happy to reformat the whole file. :)

2473 ↗(On Diff #129907)

Yes, getNode has a fold for 'and X, -1' --> X.

If we look at 'test3', we have this going in:

      t4: i32 = AssertZext t2, ValueType:ch:i8
    t41: i32 = mul t4, Constant:i32<171>
  t38: i32 = srl t41, Constant:i8<9>
t39: i32 = and t38, Constant:i32<127>

So getNode() returns the srl (t38), that replaces the and (t39), and we select that srl now:

Selecting: t39: i32 = and t38, Constant:i32<127>
Creating constant: t43: i32 = Constant<-1>
ISEL: Starting pattern match on root node: t38: i32 = srl t41, Constant:i8<9>
craig.topper added inline comments.Jan 16 2018, 2:19 PM
lib/Target/X86/X86ISelDAGToDAG.cpp
2471 ↗(On Diff #129907)

It feels like everytime I touch DAGCombiner or X86ISelLowering I have to figure out which name is already used in the function I'm editing.

And to really make it confusing, there are a couple places in X86ISelLowering that have DataLayout in DL and an SDLoc in dl.

2473 ↗(On Diff #129907)

I guess I'm slightly worried about someone coming along and adding code in front of the call to Select that checks the opcode of the root node and now we've switched to a different root. We do have such code now, but it doesn't apply here and is admittedly a hack.

if (Node->isStrictFPOpcode())
  Node = CurDAG->mutateStrictFPToFP(Node);

Select(Node);

It means we can only match strict FP nodes when they are the root node and their opcode gets converted. But it might have otherwise been save to fold a vector select to use a masked op.

But I guess in general its probably not safe to look at the root node before the select. So this may be an unfounded fear.

spatel added a comment.EditedJan 17 2018, 9:48 AM

Here's an example of the SimplifyDemandedBits failure that avoids the preliminary div transform in the existing test:

define i32 @test3mul(i8 zeroext %x) {
  %z = zext i8 %x to i16
  %m = mul nuw nsw i16 %z, 171
  %s = lshr i16 %m, 9
  %z2 = zext i16 %s to i32
  %a = and i32 %z2, 127
  ret i32 %a
}

So what happens is that we do remove the 'and' the first time we combine, and we have this after legalization:

       t4: i32 = AssertZext t2, ValueType:ch:i8
      t19: i16 = truncate t4
    t8: i16 = mul nuw nsw t19, Constant:i16<171>
  t11: i16 = srl t8, Constant:i8<9>
t12: i32 = zero_extend t11

Now when we promote the 'srl' to 32-bit, we get:

Promoting t11: i16 = srl t8, Constant:i8<9>
Creating new node: t20: i32 = any_extend t8
Creating constant: t21: i32 = Constant<65535>
Creating new node: t22: i32 = and t20, Constant:i32<65535>
Creating new node: t23: i32 = srl t22, Constant:i8<9>
Creating new node: t24: i16 = truncate t23

When we combine the wider srl:

Combining: t23: i32 = srl t26, Constant:i8<9>
Creating constant: t27: i32 = Constant<127>
Creating new node: t28: i32 = srl t20, Constant:i8<9>
Creating new node: t29: i32 = and t28, Constant:i32<127>

Our 'and' is reincarnated! But notice that we used an 'any_extend' while promoting this to 32-bit. That means when we combine and check demanded bits on the 'and', we have:

     t8: i16 = mul nuw nsw t19, Constant:i16<171>
   t20: i32 = any_extend t8
 t28: i32 = srl t20, Constant:i8<9>
t29: i32 = and t28, Constant: i32<127>

...so demanded bits can't kill the 'and'; the high bits are undef instead of zero. After the mul is promoted to 32-bits, we have:

      t4: i32 = AssertZext t2, ValueType:ch:i8
    t31: i32 = mul t4, Constant:i32<171>
  t28: i32 = srl t31, Constant:i8<9>
t29: i32 = and t28, Constant:i32<127>

...but we don't revisit the later nodes for combining, so that's why we see the unnecessary when we get to Select(). Ideas about how to solve that?

I removed the srl/and reversing transform from X86 to see if we did any better. But end up with an and with 65024 in from of the shift.

Part of the problem seems to be that we promote the srl using an any_extend and a zero_extend_inreg(an and) instead of using a zero_extend directly. Then simplifydemandedbits is able to mess with the and before we get a chance to run the DAG combine that combines and+any_extend into zero_extend. That combine requires the and to still be with 65535, but it had already been turned into 65024.

spatel updated this revision to Diff 130411.Jan 18 2018, 7:50 AM

Patch updated:

  1. Use DAG.MaskedValueIsZero() to make the code cleaner.
  2. Move the check for shrinkability ahead of the more expensive call to compute known bits.
  3. Check for a -1 'and' mask. In this case, replace and return. Don't create an unnecessary constant and don't select out-of-turn. Allow the normal selection sequence to continue.

Thinking about this some more -- and as we mentioned in D42090 -- I don't think it's possible to guarantee that we never see a dead 'and' op here in selection. We don't run DAGCombine to fix-point, so nothing is certain.

So even though we could try to fix the cases seen in the regression tests upstream from here, it won't prevent needing to handle the case of a -1 mask. Thus, I've added code for that situation to avoid unexpected selection behavior.

We could still move this over to PreprocessISelDAG() if that seems better.

spatel added inline comments.Jan 18 2018, 8:02 AM
lib/Target/X86/X86ISelDAGToDAG.cpp
2471 ↗(On Diff #129907)

I didn't make the connection to DataLayout. What's also weird is that we name these 'DL' at all...because an SDLoc isn't a DebugLoc...it contains a DL. No respect for the other member of the class!

To skirt controversy, I just removed the local variable here. :)

craig.topper added inline comments.Jan 18 2018, 12:08 PM
lib/Target/X86/X86ISelDAGToDAG.cpp
2471 ↗(On Diff #129907)

You said you were happy to reformat the whole file to 'DL' but there are places in X86ISelLowering where the same function simultaneously uses the variable name dl for an SDLoc and DL for a DataLayout reference. So a blind rename would fail to build. Anyway this isn't relevant to this review.

This revision is now accepted and ready to land.Jan 18 2018, 7:39 PM
This revision was automatically updated to reflect the committed changes.