The motivating bugs are:
https://bugs.llvm.org/show_bug.cgi?id=41340
https://bugs.llvm.org/show_bug.cgi?id=42697
As discussed there, we could view this as a failure of IR canonicalization, but then we would need to implement a backend fixup with target overrides to get this right in all cases. Instead, we can just view this as a target-specific opportunity. It's not even clear for x86 exactly when we should favor test+set; some CPUs have better theoretical throughput for the ALU ops than bt/test.
This patch is made more complicated than I expected because there's an early DAGCombine for 'and' that can change types of the intermediate ops via trunc+anyext.
Is it possible for the shift to be on a type larger than 64 bits and have an absurdly out of bounds shift amount that we haven't collapsed to undef yet such that this asserts? Or for that matter a 64 bit shift amount that's larger than 0xffffffff and not been folded yet. Since we truncate a uint64_t to unsigned here.