This is probably the least important of our movmsk problems, but I'm starting at the bottom to reduce distractions.
We were creating a select_cc node, which bypasses the select and bitmask codegen optimizations that we now have. If we produce a compare+negate instead, we allow things like the neg/sbb carry-bit hacks, and in all cases we avoid a cmov. There's no partial-register-update danger in these sequences because we always produce the zero-register xor ahead of the 'set' if needed.
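As a rough illustration (my own sketch, not code from this patch), these are the kinds of source patterns where compare+negate wins: the 0/1 setcc result can be negated into an all-ones/all-zero mask, and an unsigned-compare select can fold into sbb:

  /* -(x != 0): test + setne + neg; no cmov needed. */
  int bool_to_mask(int x) {
    return x != 0 ? -1 : 0;
  }

  /* (a < b) ? -1 : 0: the carry flag from the compare feeds sbb directly. */
  unsigned lt_mask(unsigned a, unsigned b) {
    return a < b ? ~0u : 0u;
  }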
There seems to be a missing fold for sext of a bool bit here:
  negl %ecx
  movslq %ecx, %rax
...but that's an independent transform.
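To make that concrete (a hypothetical reproducer under my assumptions, not a test from this patch): when the negate of a bool is itself sign-extended, the negate could be done at the wider width instead, making the movslq redundant:

  /* Hypothetical reproducer: sext of a negated bool. Since the bool is
     already zero-extended in the register, a 64-bit neg (negq) would
     produce the 0/-1 result directly, folding away the movslq. */
  long long sext_neg_bool(int b) {
    return (long long)-(b != 0);
  }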