This is an archive of the discontinued LLVM Phabricator instance.

[x86] try harder to use shift instead of test if it can save some immediate bytes
ClosedPublic

Authored by spatel on Mar 7 2022, 12:23 PM.

Details

Summary

We favor 'and' and later 'test' in earlier phases, and that's usually the better option, but we can save a few instruction bytes by converting a mask constant to a shift here.

Diff Detail

Event Timeline

spatel created this revision.Mar 7 2022, 12:23 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2022, 12:23 PM
spatel requested review of this revision.Mar 7 2022, 12:23 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2022, 12:23 PM

I like this, but I don't know if any intel archs have slow shift-by-imm ops (slow eflags updates) like they do for non-immediate shift amounts?

I like this, but I don't know if any intel archs have slow shift-by-imm ops (slow eflags updates) like they do for non-immediate shift amounts?

I think it should be ok. The decoders can see the shift amount and do some tricks to make the flags efficient.

RKSimon added inline comments.
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
5673

If we added shifted mask detection would we hit any of the folds that D121320 is targetting?

MatzeB added inline comments.Mar 9 2022, 3:19 PM
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
5673

Most likely yes. Let me try to put my transformation into this part of the code...

MatzeB added a comment.Mar 9 2022, 3:35 PM

I was thinking about the same thing, though looking around https://www.uops.info/table.html it seems that many CPUs can only schedule shifts on 1 or 2 of their ports while and typically can be scheduled on all of them. I'm not really sure whether that means we should only perform this transformation when going for code-size or whether it's unlikely anyway to hit port constraints so we should always do it...

MatzeB added inline comments.Mar 9 2022, 4:06 PM
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
5673

I changed D121320 now to update this part of the code. However I think the cases are different enough that we can discuss this diff and D121320 separately.

MatzeB added inline comments.Mar 10 2022, 10:40 AM
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
5616–5617

Guess this is no longer matching just IMM64

5624

I think the hasOneUse() check makes sense for the cases where you turning a testl xx, IMM32 into a shift. However for the IMM64 case we save a whole movabsq instruction so even if we end up with an extra COPY its a good deal.

spatel added inline comments.Mar 10 2022, 1:53 PM
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
5624

OK, we should add a test for that if it's worth doing.

I think we should finalize D121320 first since that's the larger patch. Then we can add this as a small enhancement with the appropriate restrictions in place to avoid regressions. This patch is only saving 3 bytes. :)

spatel updated this revision to Diff 415899.Mar 16 2022, 10:36 AM

Patch updated:
Rebased now that D121320 has landed. The checks are adjusted to fit the new code structure. We have negative tests in place to verify that we don't enable any size-increasing transforms.

spatel marked 2 inline comments as done.Mar 16 2022, 10:38 AM
MatzeB accepted this revision.Mar 16 2022, 1:46 PM

I have no hard numbers or experience how to value smaller code size against port constraints; Given that there's no other opinions and my intuition is that code size is likely more valuable here.

LGTM

This revision is now accepted and ready to land.Mar 16 2022, 1:46 PM
This revision was landed with ongoing or failed builds.Mar 17 2022, 6:14 AM
This revision was automatically updated to reflect the committed changes.