We favor 'and' and later 'test' in earlier phases, and that's usually the better option, but we can save a few instruction bytes by converting a mask constant to a shift here.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I like this, but I don't know if any intel archs have slow shift-by-imm ops (slow eflags updates) like they do for non-immediate shift amounts?
I think it should be ok. The decoders can see the shift amount and do some tricks to make the flags efficient.
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp | ||
---|---|---|
5673 | Most likely yes. Let me try to put my transformation into this part of the code... |
I was thinking about the same thing, though looking around https://www.uops.info/table.html it seems that many CPUs can only schedule shifts on 1 or 2 of their ports while and typically can be scheduled on all of them. I'm not really sure whether that means we should only perform this transformation when going for code-size or whether it's unlikely anyway to hit port constraints so we should always do it...
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp | ||
---|---|---|
5616–5617 | Guess this is no longer matching just IMM64 | |
5624 | I think the hasOneUse() check makes sense for the cases where you turning a testl xx, IMM32 into a shift. However for the IMM64 case we save a whole movabsq instruction so even if we end up with an extra COPY its a good deal. |
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp | ||
---|---|---|
5624 | OK, we should add a test for that if it's worth doing. I think we should finalize D121320 first since that's the larger patch. Then we can add this as a small enhancement with the appropriate restrictions in place to avoid regressions. This patch is only saving 3 bytes. :) |
Patch updated:
Rebased now that D121320 has landed. The checks are adjusted to fit the new code structure. We have negative tests in place to verify that we don't enable any size-increasing transforms.
I have no hard numbers or experience how to value smaller code size against port constraints; Given that there's no other opinions and my intuition is that code size is likely more valuable here.
LGTM
Guess this is no longer matching just IMM64