As noted in D52747, if we prefer IR to use trunc for bool vectors rather than and+icmp, we expose codegen shortcomings, as seen here with masked store.
We can replace the hard-coded PCMPGT simplification with the more general demanded bits call here to improve the masked store codegen. The AVX1 pattern still isn't handled, so that's another potential dependency for the instcombine patch (although I'm not sure how much masked op usage we expect with only AVX1).
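
For reference, here is a minimal scalar sketch of why a demanded bits query can subsume the PCMPGT special case. It is not the backend code: the helper names are hypothetical, and it models a single 32-bit lane, assuming (as with the AVX masked moves) that the store reads only the sign bit of each mask element.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of one PCMPGTD lane:
// all-ones if a > b (signed compare), otherwise zero.
static int32_t pcmpgt_lane(int32_t a, int32_t b) { return a > b ? -1 : 0; }

// Hypothetical model of how a masked store reads one mask lane:
// only the sign bit decides whether the lane is written.
static bool mask_lane_enabled(int32_t m) {
  return (static_cast<uint32_t>(m) >> 31) != 0;
}

// True if the store makes the same decision with or without the compare,
// i.e. PCMPGT(0, x) and x agree on the only bit the store demands.
static bool compare_is_redundant(int32_t x) {
  return mask_lane_enabled(pcmpgt_lane(0, x)) == mask_lane_enabled(x);
}

// Checks a few boundary values; returns true if the compare is redundant for all.
static bool all_samples_agree() {
  const int32_t samples[] = {0, 1, -1, INT32_MAX, INT32_MIN,
                             0x40000000, INT32_MIN + 1};
  for (int32_t x : samples)
    if (!compare_is_redundant(x))
      return false;
  return true;
}

// Convenience wrapper that aborts if any sample disagrees.
static void self_check() { assert(all_samples_agree()); }
```

Since the sign bit of PCMPGT(0, x) always matches the sign bit of x, a query that knows only the sign bit is demanded can drop the compare, and the same reasoning extends to other mask-producing nodes without adding a pattern match for each one.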