It is possible to create an IR pattern that results in exponential execution
time in CollectBitParts, by making a tree where every OR refers to one other OR via *both*
operands:
for (i = 0; i < N; ++i) b |= b >> 1; // Causes 2^N executions of CollectBitParts!
To defend against this (and because ORs are the only node that cause us to
fork control flow) we keep a counter alive between all invocations of
CollectBitParts, expected to be initialized to bitwidth(b).
This seems a bit much if you have i128. What is NumOrsRemaining typically when this optimization kicks in?