The main change here is to add a widenScalarToNextPow2 before the
clampScalar so that non-power-of-two sizes between 32 and 64 are
widened into an s64 count-trailing-zeroes operation.
However, if the legalisation rules are keyed on TypeIdx 0 (the
output), the s65 testcase still crashes. I solved this by flipping the
rules around to act on TypeIdx 1 (the input), with a scalarSameSizeAs
at the end to tie index 0 to index 1. This, incidentally, is how the
rules for G_CTLZ are written.
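
As a rough illustration of the widths involved, here is a small
standalone model of the widen-then-clamp ordering applied to the input
type. It is not LLVM code: the helper names are hypothetical, the
minimum of 32 passed to the widening step is an assumption, and the
model only reports the bit width the rules would aim for, not how the
legaliser actually splits or extends the operation. The rule chain in
the comment is a sketch of the shape described above, not the exact
diff.

```cpp
#include <algorithm>

// Hypothetical model of the rule ordering described above, roughly:
//   .widenScalarToNextPow2(1, /*Min=*/32)  // assumption: minimum of 32
//   .clampScalar(1, s32, s64)
//   .scalarSameSizeAs(0, 1)
// Only scalar bit widths are modelled here.

// Round Bits up to the next power of two, but never below MinBits.
constexpr unsigned widenToNextPow2(unsigned Bits, unsigned MinBits) {
  unsigned Pow2 = 1;
  while (Pow2 < Bits)
    Pow2 <<= 1;
  return std::max(Pow2, MinBits);
}

// Clamp a width into the inclusive range [MinBits, MaxBits].
constexpr unsigned clampBits(unsigned Bits, unsigned MinBits,
                             unsigned MaxBits) {
  return std::min(std::max(Bits, MinBits), MaxBits);
}

// Width targeted for the input (TypeIdx 1): widen first, then clamp.
constexpr unsigned targetInputBits(unsigned Bits) {
  return clampBits(widenToNextPow2(Bits, 32), 32, 64);
}

// Width targeted for the output (TypeIdx 0): tied to the input width.
constexpr unsigned targetOutputBits(unsigned InputBits) {
  return targetInputBits(InputBits);
}
```

In this model an s48 input targets 64 bits, whereas a clamp to
[32, 64] on its own would leave 48 unchanged because it is already in
range. An s65 input is first widened to 128 and then clamped back down
to 64.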