The loop in `findNextSetBit()` runs one pass more than it should.

On 64-bit architectures this doesn't cause a problem, but 32-bit architectures mask the shift count to 5 bits, which limits it to the range 0 to 31. Shifting by 32 then has the same effect as shifting by 0, so if the first bit in the set is 1, the function returns with `Index` different from `EndIndexVal`. Because of that, the range-based for loops iterating through architectures continue until they hit a 0 in the set, resulting in *n* additional iterations, where *n* is the number of consecutive 1 bits at the start of the set.
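
The effect can be reproduced with a small standalone sketch. This is not the actual `findNextSetBit()` code: `shlMasked`, `nextBuggy`, `nextFixed`, `countVisited`, and the sentinel value are hypothetical names chosen for illustration, and the buggy loop shape is an assumed reconstruction that matches the behavior described above. The 5-bit masking is emulated explicitly, because shifting a 32-bit value by 32 directly is undefined behavior in C++.

```cpp
#include <cstdint>

// Hypothetical sentinel and bit width, chosen for this sketch only.
constexpr std::uint32_t EndIndexVal = ~std::uint32_t{0};
constexpr std::uint32_t NumBits = 32;

// Emulates a 32-bit shift that masks the count to 5 bits.
constexpr std::uint32_t shlMasked(std::uint32_t Value, std::uint32_t Count) {
  return Value << (Count & 31u);
}

// Assumed buggy shape: the bit test runs before the bound check,
// so the body executes once with Index == 32 (and beyond on later calls).
std::uint32_t nextBuggy(std::uint32_t Set, std::uint32_t Index) {
  if (Index == EndIndexVal)
    return EndIndexVal;
  do {
    if (Set & shlMasked(1u, ++Index))
      return Index;
  } while (Index < NumBits);
  return EndIndexVal;
}

// Bound check first: Index never reaches 32 inside the body.
std::uint32_t nextFixed(std::uint32_t Set, std::uint32_t Index) {
  if (Index == EndIndexVal)
    return EndIndexVal;
  while (++Index < NumBits)
    if (Set & shlMasked(1u, Index))
      return Index;
  return EndIndexVal;
}

// Counts how many positions a full iteration over Set visits.
unsigned countVisited(std::uint32_t Set,
                      std::uint32_t (*Next)(std::uint32_t, std::uint32_t)) {
  // Start at bit 0 if it is set, otherwise advance to the first set bit.
  std::uint32_t Index = (Set & 1u) ? 0 : Next(Set, 0);
  unsigned Count = 0;
  for (; Index != EndIndexVal; Index = Next(Set, Index))
    ++Count;
  return Count;
}
```

For a set such as `0x7` (bits 0 to 2), `countVisited` with `nextFixed` visits three positions, while `nextBuggy` visits six: the three real bits plus three phantom ones at indices 32 to 34. A set whose bit 0 is clear, such as `0x6`, is visited the same number of times by both versions.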

Ultimately, TBDv1.WriteFile and TBDv2.WriteFile output additional architectures, causing the unit tests to fail.