This is an archive of the discontinued LLVM Phabricator instance.

[x86] Bring back the MOVZX64rr* pseudo instructions so that they can be coalesced using X86InstrInfo::isCoalescableExtInstr
Needs RevisionPublic

Authored by craig.topper on Sep 14 2017, 7:19 PM.

Details

Summary

The InstrEmitter and the peephole pass both know how to use TTI->isCoalescableExtInstr to coalesce subregister accesses around MOVZX/MOVSX instructions since the lower bits are not altered. Unfortunately, they don't understand SUBREG_TO_REG MOV32rr/MOVZX32rr16/MOVZX32rr8 we currently emit for zero extending to 64-bits.

This patch reintroduces pseudo instructions that we can recognize and handle in isCoalescableExtInst. These pseudo instructions used to exist, but were removed in r182921 in favor of SUBREG_TO_REG. I've modified the pseudos relative to the original versions by making them true pseudos which get converted after register allocation instead of in X86MCInstLower. I didn't add back the memory versions because we only care about register->register for this. We may need to teach load foldMemoryOperand about these things too, but I don't have a test case for that right now.

I think this broke what the tail-dup-merge-loop-headers.ll was trying to test. But the test case was a complicated case that the comments say bugpoint wasn't able to reduce very well so I have no idea how I could reconstruct it.

Diff Detail

Event Timeline

craig.topper created this revision.Sep 14 2017, 7:19 PM

Is it possible to change InstrEmitter/peephole pass to coalesce even in cases of SUBREG_TO_REG?

I imagine it would be possible, but we'd also need a new target hook to indicate that MOV32rr is coalescable. It just seemed easier to use the existing stuff rather than implement a second way to do it.

I imagine it would be possible, but we'd also need a new target hook to indicate that MOV32rr is coalescable. It just seemed easier to use the existing stuff rather than implement a second way to do it.

The benefit of coalescing SUBREG_TO_REG is that there might be cases of zero/sign extension cases (other than 64 bit zero extension) that will be also optimized by InstrEmitter/peephole passes.
Maybe it is possible to add the support of SUBREG_TO_REG into isCoalescableExtInstr (maybe in a recursive way).
Anyway I didn't try that, but it might be worth considering the possibility of a more generic coalescing.

RKSimon accepted this revision.Nov 4 2017, 10:37 AM

LGTM

This revision is now accepted and ready to land.Nov 4 2017, 10:37 AM
craig.topper planned changes to this revision.Nov 13 2017, 4:48 PM

I noticed this is missing some load folding of stack reloads in the multiply tests. I'm also not seeing much benefit and a couple small regressions in our benchmark runs. So I'm going to hold off on this.

D52757 just reminded me about this one - is this still being looked at?

Rebase. I'm going to rerun our benchmark list against it.

This revision is now accepted and ready to land.Oct 2 2018, 1:14 PM
RKSimon requested changes to this revision.Oct 8 2018, 12:05 PM

Waiting on benchmarks

This revision now requires changes to proceed.Oct 8 2018, 12:05 PM