The LD -> LE optimization for Thread-Local Storage without PLT requires
an additional "66" prefix, otherwise the next instruction will be
corrupted, causing runtime misbehavior (crashes) of the linked object.
The other (GD -> IE/LD) optimizations are the same with or without PLT,
but at tests for completeness. Those instructions are copied from
https://raw.githubusercontent.com/wiki/hjl-tools/x86-psABI/x86-64-psABI-1.0.pdf#subsection.11.1.2
This does not try to address ILP32 (x32) support.