This is an archive of the discontinued LLVM Phabricator instance.

[X86] Select BEXTR when there is only BMI1.
AbandonedPublic

Authored by lebedev.ri on Jun 22 2018, 10:03 AM.

Details

Summary

This does catch all the cases the BZHI case does.
And does not duplicate all the patterns.

I'm not sure if we want to squeeze the LSHR into the BEXTR itself.

Now that i have actually done this, i'm having second thoughts.
This clearly results in less instructions, which is great, since this
is quite common code pattern in some hottest bit manipulation loops.
But i'm having a bit of hard time coming up with the right sequence
of instructions to model the old output via llvm-mca..

Also, FIXME, can i somehow pass just the GR32:$lz,
is there some NOP dag node that would just return it's only argument?

Diff Detail

Repository
rL LLVM

Event Timeline

Matching multiple nodes in isel patterns without one use checks and then emitting multiple instructions for the pattern seems like dangerous territory. Is there more canonicalizing we should be doing in DAG combines to get these sequences into a form we can handle in isel?

Matching multiple nodes in isel patterns without one use checks and then emitting multiple instructions for the pattern seems like dangerous territory. Is there more canonicalizing we should be doing in DAG combines to get these sequences into a form we can handle in isel?

; Patterns:
;   a) x &  (1 << nbits) - 1
;   b) x & ~(-1 << nbits)
;   c) x &  (-1 >> (32 - y))
;   d) x << (32 - y) >> (32 - y)

We do canonicalize a -> b and d -> c in instcombine.
We could do the same in dagcombine.

Matching multiple nodes in isel patterns without one use checks and then emitting multiple instructions for the pattern seems like dangerous territory.
Is there more canonicalizing we should be doing in DAG combines to get these sequences into a form we can handle in isel?

The [[ https://www.felixcloutier.com/x86/BEXTR.html | BEXTR ]] is older, 'more powerful' version of [[ https://www.felixcloutier.com/x86/BZHI.html | BZHI ]].
It takes destination reg, source reg/mem, but unlike [[ https://www.felixcloutier.com/x86/BZHI.html | BZHI ]],
it does not take just the starting bit from which to zero out.
It actually *extracts* the bits. Meaning, it also requires
the knowledge about the bit starting from which to extract.
But unfortunately, it still takes only one parameter - control.
And the control is basically a packed structure with 2 8-bit fields:

  • 0..7 (starting with low bits) - starting position for extraction
  • 8..15 - specifies the number of bits to extract.

So we need to pass the len in 8..15 bits. It has to get there somehow.
So no, i'm not seeing how we can avoid that via canonicalization :/

Just to reiterate, i don't see what else we can do here.
We do need to emit two instructions.
And i do think we should indeed be emitting bextr in these situations.

Can we DAG combine to X86ISD::BEXTR for these cases? Then we can properly check one use of the inner parts of the pattern. The other option might be to use PatFrags that check for one use like the existing srl_su fragment. My concern is that if the inner nodes of the patterns here have more uses, the emitted pattern won't completely remove the need for the inner node. So you'll end up emitting multiple instructions for this pattern and additional instructions for the remaining use of the inner node.

Can we DAG combine to X86ISD::BEXTR for these cases? Then we can properly check one use of the inner parts of the pattern.

Do we care whether the *entire* old patter goes away, or just one more instruction?
But yes, should work, let's see.

The other option might be to use PatFrags that check for one use like the existing srl_su fragment. My concern is that if the inner nodes of the patterns here have more uses, the emitted pattern won't completely remove the need for the inner node. So you'll end up emitting multiple instructions for this pattern and additional instructions for the remaining use of the inner node.

Can we DAG combine to X86ISD::BEXTR for these cases? Then we can properly check one use of the inner parts of the pattern.

Do we care whether the *entire* old patter goes away, or just one more instruction?
But yes, should work, let's see.

Hopefully combining in DAG would help https://bugs.llvm.org/show_bug.cgi?id=38938.

Not sure if we should allow TBM BEXTR (imm) more leniently than BMI1 BEXTR (reg)?

What's the status of this patch right now?

What's the status of this patch right now?

I'll abandon it once D52348 and the similar follow-ups finish migrating this pattern-matching into X86DAGToDAGISel::matchBEXTR().

lebedev.ri abandoned this revision.Oct 30 2018, 5:43 AM

Okay, great.
There are some follow-ups to be done:

  • It is likely that we can extract the lshr and store it into BEXTR control.
  • Some fixes for handling of differently-sized inputs and outputs are likely needed.