__countl_zero should count zero bits starting from the left (the most significant end).
The bug fixed by this only affects big integers that cannot be handled in one go.
The idea is to successively take blocks from the left and apply the operation recursively.
To do that we rotate to the left by one block, so that the left-most block wraps around to the right-most position,
where we can apply the operation to it alone (by casting to a smaller integer type).
The current code rotates to the right (by 64 bits), which happens to work for 128-bit integers because it just swaps the two halves.
For wider integers it brings the wrong block into the right-most position.
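
A scaled-down sketch of the idea, not the actual library code: it assumes a 32-bit "big" integer split into four 8-bit blocks, so the difference between the two rotation directions shows up without needing a type wider than 128 bits. All names here (countl_zero_block, rotl32, rotr32, countl_zero_blockwise, check_examples) are hypothetical and chosen for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Leading zeros of a single 8-bit block (the "small" type handled in one go).
int countl_zero_block(std::uint8_t b) {
    int n = 0;
    for (std::uint8_t mask = 0x80; mask != 0 && (b & mask) == 0; mask >>= 1)
        ++n;
    return n;
}

// Rotate a 32-bit value left / right by s bits.
std::uint32_t rotl32(std::uint32_t x, unsigned s) {
    s %= 32;
    return s == 0 ? x : (x << s) | (x >> (32 - s));
}

std::uint32_t rotr32(std::uint32_t x, unsigned s) {
    s %= 32;
    return s == 0 ? x : (x >> s) | (x << (32 - s));
}

// Count leading zeros block by block. Each step rotates by one block and
// inspects the right-most block by casting to the smaller type. The rotation
// is a parameter so both directions can be compared.
int countl_zero_blockwise(std::uint32_t x,
                          std::uint32_t (*rotate)(std::uint32_t, unsigned)) {
    int total = 0;
    for (int i = 0; i < 4; ++i) {
        x = rotate(x, 8);
        int n = countl_zero_block(static_cast<std::uint8_t>(x));
        total += n;
        if (n != 8)  // found a set bit in this block, so stop here
            break;
    }
    return total;
}

void check_examples() {
    // 0x00000100 has 23 leading zeros.
    assert(countl_zero_blockwise(0x00000100u, rotl32) == 23);  // left: correct
    assert(countl_zero_blockwise(0x00000100u, rotr32) == 7);   // right: wrong
    // With only two blocks, rotating by half the width swaps the halves,
    // so both directions give the same result.
    assert(rotl32(0x12345678u, 16) == rotr32(0x12345678u, 16));
}
```

With four blocks, rotating right inspects the original second-lowest block first instead of the left-most one, which is why the result is wrong. With two blocks the two rotations coincide, which matches the 128-bit case described above.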