We can implement find_first_unset_in() in the same function if we first flip every BitWord we read.
Performance-wise, it should be the same: Set stays constant for a whole run of the function, so the
branch predictor will do a very good job on it. Plus, on x86 this likely won't even be a real
branch; it should compile to just a cmov.
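A minimal sketch of what the combined function could look like. It assumes a hypothetical `find_first_in(Bits, Begin, End, Set)` helper over a plain `std::vector<uint64_t>` instead of the real BitVector internals. The names, the signature, and the -1 "not found" convention are illustrative assumptions, not the existing API. Each word is flipped when searching for unset bits, then masked to the half-open range [Begin, End).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for BitVector storage: bit i lives in word i / 64
// at position i % 64.
using BitWord = std::uint64_t;
constexpr unsigned BitsPerWord = 64;

// Portable count-trailing-zeros. Word must be non-zero.
unsigned countTrailingZeros(BitWord Word) {
  unsigned N = 0;
  while ((Word & 1) == 0) {
    Word >>= 1;
    ++N;
  }
  return N;
}

// Returns the index of the first bit in [Begin, End) whose value equals Set,
// or -1 if there is none. Searching for unset bits is done by flipping each
// word first, so a single loop serves both cases.
int find_first_in(const std::vector<BitWord> &Bits, unsigned Begin,
                  unsigned End, bool Set) {
  if (Begin >= End)
    return -1;

  unsigned FirstWord = Begin / BitsPerWord;
  unsigned LastWord = (End - 1) / BitsPerWord;

  for (unsigned i = FirstWord; i <= LastWord; ++i) {
    BitWord Copy = Bits[i];
    // Set is loop-invariant, so this is either perfectly predicted or
    // lowered to branch-free code.
    if (!Set)
      Copy = ~Copy;

    // Clear bits below Begin in the first word.
    if (i == FirstWord)
      Copy &= ~BitWord(0) << (Begin % BitsPerWord);
    // Clear bits at or above End in the last word (this also hides the
    // padding bits that the flip turned into 1s).
    if (i == LastWord)
      Copy &= ~BitWord(0) >> (BitsPerWord - 1 - (End - 1) % BitsPerWord);

    if (Copy != 0)
      return static_cast<int>(i * BitsPerWord + countTrailingZeros(Copy));
  }
  return -1;
}

// Thin wrappers mirroring the two public entry points.
int find_first_set_in(const std::vector<BitWord> &Bits, unsigned Begin,
                      unsigned End) {
  return find_first_in(Bits, Begin, End, /*Set=*/true);
}

int find_first_unset_in(const std::vector<BitWord> &Bits, unsigned Begin,
                        unsigned End) {
  return find_first_in(Bits, Begin, End, /*Set=*/false);
}
```

Because `Set` is loop-invariant, the `if (!Set)` could also be hoisted into a single XOR mask (`Set ? 0 : ~BitWord(0)`) computed before the loop, which removes the branch regardless of how the compiler lowers it.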