In 16-bit mode, some of the nop patterns used in 32-bit mode can end up
mangling other instructions. For instance, an aligned "movz" instruction
may be emitted without its 0x66 and 0x67 prefixes because the nop used
for the alignment padding interferes with it:

    xorl %ebx, %ebx
    .p2align 4, 0x90
    movzbl (%esi,%ebx), %ecx

Instead, use nop patterns that we know 16-bit mode can handle.
I think we could just return 1 in 16-bit mode. There's no real reason to do nop-optimization in that mode, is there? Then the rest of the patch isn't necessary.
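Roughly what I have in mind, as a sketch only. The names here (`CodeMode`, `maxNopLength`, `planNops`) are hypothetical and are not the actual interfaces in the tree, and the limit of 10 for the other modes is a placeholder I picked for illustration:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the target's code mode; not a real LLVM type.
enum class CodeMode { Bits16, Bits32, Bits64 };

// Longest single nop the padding writer is allowed to emit.
// In 16-bit mode this returns 1, so padding can only be built from the
// one-byte nop (0x90). The value 10 for other modes is an assumed placeholder.
std::size_t maxNopLength(CodeMode mode) {
  return mode == CodeMode::Bits16 ? 1 : 10;
}

// Split `count` bytes of padding into individual nop lengths, each no
// longer than maxNopLength(mode). Returns the list of lengths in order.
std::vector<std::size_t> planNops(std::size_t count, CodeMode mode) {
  const std::size_t limit = maxNopLength(mode);
  std::vector<std::size_t> lengths;
  while (count > 0) {
    const std::size_t len = std::min(count, limit);
    lengths.push_back(len);
    count -= len;
  }
  return lengths;
}
```

With a limit of 1, every padding byte in 16-bit mode becomes its own single-byte nop, so no multi-byte pattern is ever chosen there, at the cost of giving up nop optimization in that mode.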