Just a hack to experiment with D21774.
I first tried the naive INSERT_SUBREG, but that ended up generating non-ideal code, like:
  vcomisd %xmm1, %xmm0
  setae %cl
  xorl %eax, %eax
  movb %cl, %al
  retl
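For reference, a reproducer along these lines should trigger the pattern. This is only a sketch, not the actual test case from the patch: the function name `cmp_ge` is made up, and it assumes the compare comes from the `_mm_comige_sd` intrinsic (which would match the Int_VCOMISDrr + SETAE pair), built with AVX enabled for a 32-bit x86 target.

```c
#include <emmintrin.h>

/* Hypothetical reproducer: the intrinsic returns an int, so the 8-bit
   SETcc result has to be zero-extended into the 32-bit return register. */
int cmp_ge(__m128d a, __m128d b) {
  /* 1 if the low double of a is >= the low double of b, else 0. */
  return _mm_comige_sd(a, b);
}
```

Ideally the zeroing xor would land before the compare, so that setae can write %al directly and the extra movb goes away.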
I looked into this with Matthias and Quentin, and I think I got it roughly working.
Long story short: it's more complicated than it could be because we need to make it friendly to the register coalescer.
If we do the naive INSERT_SUBREG (without any of the glue or the custom setcc selection), we end up with:
  16B %vreg1<def> = COPY %XMM1; VR128:%vreg1
  32B %vreg0<def> = COPY %XMM0; VR128:%vreg0
  48B Int_VCOMISDrr %vreg0, %vreg1, %EFLAGS<imp-def>; VR128:%vreg0,%vreg1
  64B %vreg2<def> = SETAEr %EFLAGS<imp-use,kill>; GR8:%vreg2
  80B %vreg3<def> = MOV32ri 0; GR32:%vreg3
  96B %vreg4<def> = COPY %vreg3; GR32_ABCD:%vreg4 GR32:%vreg3
  112B %vreg4:sub_8bit<def> = COPY %vreg2; GR32_ABCD:%vreg4 GR8:%vreg2
  128B %EAX<def> = COPY %vreg4; GR32_ABCD:%vreg4
  144B RET 0, %EAX<kill>
We really want the coalescer to produce:
  64B %vreg4<def> = SETAEr %EFLAGS<imp-use,kill>; GR8:%vreg2
  80B %vreg3<def> = MOV32ri 0; GR32:%vreg3
  112B %vreg4:sub_8bit<def> = COPY %vreg3; GR32_ABCD:%vreg4 GR8:%vreg2
but that's impossible without moving the MOV above the SETcc (which the coalescer doesn't do).
So, instead, we glue the INSERT_SUBREG to the SETcc (itself glued to the EFLAGS copy, because EFLAGS).
A nice consequence of this approach is that the DAG scheduler is the one that ends up deciding where to put the XOR, so we don't have to worry ourselves about how it interacts with EFLAGS.
WDYT?