As discussed in D48491, we can't really do this in the TableGen, since we need to produce *two* instructions.
This only implements one single pattern. The other 3 patterns will be in follow-ups.
I'm not sure yet if we want to also fuse shift into here (i.e (x >> start) & ...)
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/Target/X86/X86ISelDAGToDAG.cpp | ||
---|---|---|
2664–2671 | Do we get rid of at least one extra instruction, or of both of them? |
Use ISD::ANY_EXTEND. This gets rid of the extra movzbl of the NBits.
At least that is what D48768 already produced.
If NBits is larger than 64 (in 64-bit case), the shift was already undefined, i think?
I'm gonna stop here, i think this is the state for the review.
Fix stupid typo in comments - '8-bit low subreg' is AL, not AX.
lib/Target/X86/X86ISelDAGToDAG.cpp | ||
---|---|---|
2696 | This can just be NBits.getSimpleValueType(). Using the -> is taking you to SDNode, but you already had SDValue. But can NBits ever not be i8? It came from a shift didn't it? | |
2704 | I think I'd prefer to avoid using AL/AH here. It makes it seem like the instruction uses those registers and would be restricted to A/B/C/D registers. |
Don't truncate NBits (it came from shift, therefore it's i8 already),
adjust comments.
lib/Target/X86/X86ISelDAGToDAG.cpp | ||
---|---|---|
2696 | In *these* two patterns - yes, it came from shift. |
(I feel like pointing out that this patch is largely free of all that BEXTRI drama, since the mask is not a constant.)
lib/Target/X86/X86ISelDAGToDAG.cpp | ||
---|---|---|
2697 | Shouldn't we be using TargetOpcode::IMPLICIT_DEF not Undef? |
lib/Target/X86/X86ISelDAGToDAG.cpp | ||
---|---|---|
2697 | Hm, i thought i tried that initially. |
I don't see why?
This only operates on non-constants, that only operates on constants.
There might be some rebasing pain, but it is my problem.
lib/Target/X86/X86ISelDAGToDAG.cpp | ||
---|---|---|
2671 | ISD::isAllOnesConstant |
Thank you for the review!
We will now have a duplicating matchers in the tablegen, and here, for these patterns.
What do you think about also matching the BZHI here, and dropping the tablegen side?
Else, there will be no test coverage for the x & (-1 >> (bitwidth - c)) pattern, because it will be unfolded already.
Do we get rid of at least one extra instruction, or of both of them?