The loop in findNextSetBit() runs one pass more than it should.
On 64-bit architectures this doesn't cause a problem, but 32-bit architectures mask the shift count to 5 bits, which limits it to the range 0 to 31. Shifting by 32 therefore has the same effect as shifting by 0, so if the first bit in the set is 1, the function returns with Index different from EndIndexVal. Because of that, the range-based for loops iterating through architectures continue until they hit a 0 in the set, resulting in n additional iterations, where n is the number of consecutive 1 bits at the start of the set.
Ultimately, TBDv1.WriteFile and TBDv2.WriteFile output additional architectures, causing the unit tests to fail.
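
The effect described above can be reproduced with a small standalone sketch. This is not the code from the patch: maskedShl, nextBuggy, nextFixed, countVisited, Bits and EndIndex are hypothetical names, and the loop shapes are an assumption about what an off-by-one version and a corrected version might look like. Because shifting a 32-bit value by 32 is undefined behavior in C++, the 5-bit masking of the shift count is emulated explicitly rather than relied upon.

```cpp
#include <cstdint>

constexpr unsigned Bits = 32;       // width of the (hypothetical) bit set
constexpr unsigned EndIndex = ~0u;  // stand-in for EndIndexVal

// Emulates a 32-bit shift that masks the count to 5 bits, so a count of 32
// behaves like a count of 0. This is an explicit model, not real UB.
constexpr uint32_t maskedShl(uint32_t Value, unsigned Count) {
  return Value << (Count & 31u);
}

// Assumed off-by-one shape: the bit is tested before the bound is checked,
// so Index == 32 (and beyond) can be tested and returned.
unsigned nextBuggy(uint32_t Set, unsigned Index) {
  do {
    if (Set & maskedShl(1u, ++Index))
      return Index;
  } while (Index < Bits);
  return EndIndex;
}

// Assumed corrected shape: the bound is checked before the bit is tested.
unsigned nextFixed(uint32_t Set, unsigned Index) {
  while (++Index < Bits)
    if (Set & maskedShl(1u, Index))
      return Index;
  return EndIndex;
}

// Counts how many elements an iteration over the set would visit when it
// advances with the given "next set bit" function.
template <typename NextFn>
unsigned countVisited(uint32_t Set, NextFn Next) {
  unsigned Index = (Set & 1u) ? 0u : Next(Set, 0u);
  unsigned Count = 0;
  for (; Index != EndIndex; Index = Next(Set, Index))
    ++Count;
  return Count;
}
```

With the set 0x7 (bits 0, 1 and 2), countVisited(0x7u, nextFixed) visits 3 elements, while countVisited(0x7u, nextBuggy) visits 6: the three real ones plus n = 3 extra ones at indices 32, 33 and 34. With 0x6, where bit 0 is clear, both variants visit 2 elements.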