The InstrEmitter and the peephole pass both know how to use TII->isCoalescableExtInstr to coalesce subregister accesses around MOVZX/MOVSX instructions, since the extension does not alter the lower bits. Unfortunately, they don't understand the SUBREG_TO_REG of MOV32rr/MOVZX32rr16/MOVZX32rr8 that we currently emit for zero extending to 64 bits.
This patch reintroduces pseudo instructions that we can recognize and handle in isCoalescableExtInstr. These pseudo instructions used to exist, but were removed in r182921 in favor of SUBREG_TO_REG. I've modified the pseudos relative to the original versions by making them true pseudos that get expanded after register allocation instead of in X86MCInstLower. I didn't add back the memory versions because we only care about the register-to-register forms here. We may need to teach foldMemoryOperand about these pseudos too so that loads can still be folded, but I don't have a test case for that right now.
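To illustrate the idea, here is a toy model of what a coalescable-extension hook reports. It is only a sketch: the types, opcode names (ZExt64From8 and friends) and the function coalescableExt are hypothetical stand-ins, not the pseudos added by this patch and not LLVM's MachineInstr or TargetInstrInfo API. The point is that once the zero extend is a single recognizable opcode, the hook can report which subregister of the destination still holds the unmodified source value.

```cpp
// Toy model only. Nothing here is LLVM's real API; all names are hypothetical.
#include <cstdint>
#include <optional>

// Hypothetical opcodes: three zero-extend-to-64 pseudos plus one unrelated op.
enum class Opcode { ZExt64From8, ZExt64From16, ZExt64From32, Add64 };

// Stand-ins for subregister indices (8-, 16- and 32-bit low parts).
enum class SubIdx { Sub8, Sub16, Sub32 };

struct Instr {
  Opcode Op;
  unsigned Dst; // virtual register number written
  unsigned Src; // virtual register number read
};

struct ExtInfo {
  unsigned Src;
  unsigned Dst;
  SubIdx Idx; // which low part of Dst equals Src
};

// Returns the source, destination and subregister index when the instruction
// is an extension whose low bits are a plain copy of the source.
std::optional<ExtInfo> coalescableExt(const Instr &I) {
  switch (I.Op) {
  case Opcode::ZExt64From8:
    return ExtInfo{I.Src, I.Dst, SubIdx::Sub8};
  case Opcode::ZExt64From16:
    return ExtInfo{I.Src, I.Dst, SubIdx::Sub16};
  case Opcode::ZExt64From32:
    return ExtInfo{I.Src, I.Dst, SubIdx::Sub32};
  default:
    return std::nullopt; // not an extension we can look through
  }
}

// Value-level fact the coalescing relies on: zero extension leaves the low
// bits alone, so truncating the result gives back the original value.
constexpr std::uint64_t zext32to64(std::uint32_t V) { return V; }
static_assert(static_cast<std::uint32_t>(zext32to64(0xDEADBEEFu)) == 0xDEADBEEFu,
              "low 32 bits survive zero extension");
```

In this model, a later read of the low 32 bits of the extended register can be rewritten to read the original source directly, which is the kind of rewrite a SUBREG_TO_REG wrapper hides from the existing code.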
I think this broke what tail-dup-merge-loop-headers.ll was trying to test. However, that test is a complicated case that, according to its comments, bugpoint wasn't able to reduce very well, so I have no idea how to reconstruct it.