This catches all the cases the BZHI case does,
without duplicating all the patterns.
I'm not sure if we want to squeeze the LSHR into the BEXTR itself.
Now that I have actually done this, I'm having second thoughts.
This clearly results in fewer instructions, which is great, since this
is quite a common code pattern in some of the hottest bit-manipulation loops.
But I'm having a bit of a hard time coming up with the right sequence
of instructions to model the old output via llvm-mca.
Also, FIXME: can I somehow pass just the GR32:$lz?
Is there some NOP DAG node that would just return its only argument?
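
For reference, a rough sketch of what folding the LSHR into the BEXTR would mean. It assumes the source idiom is a right shift followed by a low-bit mask, which is not spelled out above. The names extract_bits, bextr_model and make_control are hypothetical helpers for illustration only, not anything in LLVM. The only hard fact relied on is the BEXTR control layout: bits 7:0 hold the start position, bits 15:8 hold the length.

```cpp
#include <cstdint>

// Assumed source-level idiom: take `len` bits of `x` starting at bit `start`.
// Only valid for start < 32 and len < 32 (larger shifts are undefined in C++).
uint32_t extract_bits(uint32_t x, unsigned start, unsigned len) {
    return (x >> start) & ((1u << len) - 1u);
}

// Pack start and length into a BEXTR control word:
// bits 7:0 = start, bits 15:8 = length.
uint32_t make_control(unsigned start, unsigned len) {
    return (start & 0xFFu) | ((len & 0xFFu) << 8);
}

// Software model of 32-bit BEXTR, written to avoid undefined shifts.
uint32_t bextr_model(uint32_t src, uint32_t control) {
    unsigned start = control & 0xFFu;
    unsigned len = (control >> 8) & 0xFFu;
    if (start >= 32) return 0;          // nothing left after the shift
    uint32_t shifted = src >> start;    // the LSHR, folded into the instruction
    if (len >= 32) return shifted;      // length covers every remaining bit
    return shifted & ((1u << len) - 1u);
}
```

With the shift folded in, bextr_model(x, make_control(start, len)) stands in for the separate shift plus a BEXTR whose start field is zero.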