The motivation here is the following inner loop in fp16/fp24 -> fp32 expander,

that runs as part of the floating-point DNG decompression in RawSpeed library:

https://github.com/darktable-org/rawspeed/blob/cd380bb9a209bd2e7a0e7022b0cab04528d151e7/src/librawspeed/decompressors/DeflateDecompressor.cpp#L112-L115

while (!(fp32_fraction & (1 << 23))) { fp32_exponent -= 1; fp32_fraction <<= 1; }

(https://godbolt.org/z/r13YMh)

As one might notice, that loop is currently uncountable, and that whole code stays scalar.

Yet, it is rather trivial to make that loop countable:

https://godbolt.org/z/do8WMz

and we can prove that via alive2:

https://alive2.llvm.org/ce/z/7vQnji (ha nice, isn't it?)

... and that allow for the whole fp16->fp32 code to vectorize:

https://godbolt.org/z/7hYr13

Now, while i'd love to get there, i feel like i should take it in steps.

For now, this introduces support for the most basic case,

where the bit position is known as a variable,

and the loop *will* go away (has no live-outs other than the recurrence,

no extra instructions in the loop).

I have added sufficient (i believe) test coverage,

and alive2 is happy with those transforms.

Thoughts?