Truncate currently uses a udivrem call which is going to be slow particularly for larger than 64-bit widths.
As far as I can tell all we were trying to do was modulo LowerDiv by (MaxValue+1) and make sure whatever value was effectively subtracted from LowerDiv was also subtracted from UpperDiv.
This patch recognizes that MaxValue+1 is a power of 2 so we can just use a bitwise AND to accomplish a modulo operation or isolate the upper bits.
Note this countTrailingOnes check isn't strictly necessary. If Upper is equal to MaxValue(DstTy), Union will be FullSet. We could continue on and trust unionWith to return FullSet.