This is an archive of the discontinued LLVM Phabricator instance.

[X86] Move X86DAGToDAGISel::matchBEXTRFromAnd() into X86ISelLowering
ClosedPublic

Authored by lebedev.ri on Sep 24 2018, 8:45 AM.

Details

Summary

As discussed in PR38938,
we fail to emit BEXTR if the mask is shifted.
We can't deal with that in X86DAGToDAGISel before the address mode for the inc is selected,
and we can't really do it in the normal DAGCombine, because we don't have generic ISD::BitFieldExtract node,
and if we simply turn the shifted mask into a normal mask + shift-left, it will be folded back.
So it would seem X86ISelLowering is the place to handle this.

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision.Sep 24 2018, 8:45 AM
lebedev.ri added inline comments.
test/CodeGen/X86/tbm_patterns.ll
52–64 ↗(On Diff #166697)

This seems to be a miscompile, BEXTR does not touch EFLAGS?

lebedev.ri retitled this revision from [X86] Move X86DAGToDAGISel::matchBEXTRFromAnd() into X86ISelLowering [WIP] to [X86] Move X86DAGToDAGISel::matchBEXTRFromAnd() into X86ISelLowering.Sep 24 2018, 9:38 AM
lebedev.ri edited the summary of this revision. (Show Details)
lebedev.ri marked an inline comment as done.
lebedev.ri added inline comments.
test/CodeGen/X86/tbm_patterns.ll
52–64 ↗(On Diff #166697)
craig.topper added inline comments.Sep 24 2018, 4:37 PM
lib/Target/X86/X86ISelLowering.cpp
35252 ↗(On Diff #166697)

Remove the call to dump.

35308 ↗(On Diff #166697)

Remove call to dump.

test/CodeGen/X86/extract-bits.ll
5797 ↗(On Diff #166697)

This is not an improvement. We traded a shift right plus an and for a move immediate, a 2 uop bextr, and a shift left. So we went from 2 uops to 4. At least on Haswell.

lebedev.ri marked an inline comment as done.Sep 24 2018, 11:50 PM

Thank you for taking a look!

test/CodeGen/X86/extract-bits.ll
5797 ↗(On Diff #166697)

Aha. So the D52293 has the same problem, obviously.
Is adding FeatureSlowBEXTR the way forward?

craig.topper added inline comments.Sep 25 2018, 10:54 PM
test/CodeGen/X86/extract-bits.ll
5797 ↗(On Diff #166697)

It's still an increase in instruction even on AMD in the BMI1 case. We still went from 2 uops to 3 uops. We'd only be ok with BEXTRI TBM instruction.

In the case from PR38938 we were able to fold the shl into an addressing calculation which made it beneficial.

I'm not sure matchBEXTRFromAnd() for BMI was ever a good idea for Intel CPUs. I took it out and got performance improvements on several benchmarks in our internal list. Only a couple regressions and one of those was on a test that's really sensitive to code layout.

We'd only be ok with BEXTRI TBM instruction.
I'm not sure matchBEXTRFromAnd() for BMI was ever a good idea for Intel CPUs. I took it out and got performance improvements on several benchmarks in our internal list. Only a couple regressions and one of those was on a test that's really sensitive to code layout.
In the case from PR38938 we were able to fold the shl into an addressing calculation which made it beneficial.

Okay, so these simple cases are ok if there is TBM.
But the PR38938 - AMD Jaguar - does not have TBM.
There really isn't any fundamental difference in the IR between this, and the D52293,
so i would have guessed they should be using the *same* profitability check, correct?
So what would it be? "have TBM, or shifting a newly-loaded value?"
@RKSimon

@lebedev.ri Are you able to refactor this now that D52570 has landed?

lebedev.ri marked 2 inline comments as done.

@lebedev.ri Are you able to refactor this now that D52570 has landed?

Rebased now that D52570 has landed.

Can you remove the D52293 dependency and just move the current code to DAG so we can fix PR38938?

lebedev.ri updated this revision to Diff 167721.Oct 1 2018, 6:35 AM

Drop parent review.

RKSimon added inline comments.Oct 1 2018, 11:23 AM
lib/Target/X86/X86ISelLowering.cpp
35245 ↗(On Diff #167721)

Most of this could be replaced with:

return (VT == MVT::i32 || (VT == MVT::i64 && Subtarget.is64Bit()));
35261 ↗(On Diff #167721)

You could just use the EVT value all the way through if you changed hasBEXTR to take an EVT instead of MVT

lebedev.ri marked 2 inline comments as done.

Consistently use EVT.

craig.topper added inline comments.Oct 1 2018, 11:40 AM
lib/Target/X86/X86ISelLowering.cpp
35407 ↗(On Diff #167788)

This should probably be below the LegalizeOps check. We should give ample opportunity for AND based DAG combines to optimize this.

lebedev.ri marked an inline comment as done.

Move to after isBeforeLegalizeOps().

lib/Target/X86/X86ISelLowering.cpp
35407 ↗(On Diff #167788)

I'm not sure about the test for this.
Also, should it be *right* after the isBeforeLegalizeOps(), or somewhere at the end of the block?

RKSimon added inline comments.Oct 10 2018, 4:33 AM
lib/Target/X86/X86ISelLowering.cpp
35249 ↗(On Diff #167790)

asserts should have a message

test/CodeGen/X86/bmi-x86_64.ll
105 ↗(On Diff #167790)

Why is this call non_bextr64?

lebedev.ri added inline comments.Oct 10 2018, 4:37 AM
test/CodeGen/X86/bmi-x86_64.ll
105 ↗(On Diff #167790)

That is how you named it when adding in rL232580 / https://github.com/llvm-mirror/llvm/commit/cbaefea0c0c1792390375b20c31b7c1fe8d0d2c7
Should i rename it?

105 ↗(On Diff #167790)

Whoops, typo, s/you//.
Still, should i rename it?

I'm a bit worried about these test changes - I thought this patch was about moving the existing code, not altering the pattern matching features.

I'm a bit worried about these test changes - I thought this patch was about moving the existing code, not altering the pattern matching features.

I guess i mislabelled it then. Please see the description of this differential for the context.
Would it be better to split this into a NFC code move + non-NFC "support shifted mask"?

Split into two reviews.
This only moves the code, but one test still changes..

RKSimon accepted this revision.Oct 10 2018, 8:12 AM

LGTM - thanks

This revision is now accepted and ready to land.Oct 10 2018, 8:12 AM

LGTM - thanks

Thank you for the review!

This revision was automatically updated to reflect the committed changes.

For the PR we're trying to fix, perhaps we should be looking at these 3 similar functions in X86ISelDAGToDAG.cpp that are called when an address computation comes from a SHL+AND

if (!foldMaskAndShiftToExtract(*CurDAG, N, Mask, Shift, X, AM))
  return false;

// Try to fold the mask and shift directly into the scale.
if (!foldMaskAndShiftToScale(*CurDAG, N, Mask, Shift, X, AM))
  return false;

// Try to swap the mask and shift to place shifts which can be done as
// a scale on the outside of the mask.
if (!foldMaskedShiftToScaledMask(*CurDAG, N, Mask, Shift, X, AM))
  return false;