Microcode update for Jump Conditional Code Erratum may cause performance
loss for some workloads:
This patch mitigates the performance impact by aligning branches
within a 32-byte boundary. The impacted instructions are:
a. Conditional jump.
b. Fused conditional jump.
c. Unconditional jump.
d. Indirect jump.
e. Ret.
f. Call.
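To make "within a 32-byte boundary" concrete, here is a minimal sketch of the address check involved. It assumes the condition is that a branch neither crosses nor ends at a boundary, which is how the erratum guidance is usually summarized; the function names are hypothetical and are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not the patch's own code.
// `boundary` must be a power of two and `size` must be non-zero.

// True if the instruction occupying [start, start + size) spans two
// different boundary-sized windows.
bool crossesBoundary(uint64_t start, uint64_t size, uint64_t boundary) {
  assert(size > 0 && boundary != 0 && (boundary & (boundary - 1)) == 0);
  uint64_t last = start + size - 1; // address of the last byte
  return (start / boundary) != (last / boundary);
}

// True if the byte right after the instruction sits on a boundary.
bool endsAtBoundary(uint64_t start, uint64_t size, uint64_t boundary) {
  return ((start + size) % boundary) == 0;
}

// Assumed policy: pad when the branch crosses or ends at a boundary.
bool needsPadding(uint64_t start, uint64_t size, uint64_t boundary) {
  return crossesBoundary(start, size, boundary) ||
         endsAtBoundary(start, size, boundary);
}

// Number of bytes that would move `start` up to the next boundary
// (0 if it is already on one).
uint64_t paddingToNextBoundary(uint64_t start, uint64_t boundary) {
  return (boundary - start % boundary) % boundary;
}
```

For example, under these assumptions a 4-byte jump at offset 30 needs padding with a 32-byte boundary, and 2 bytes of padding would move it to offset 32.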
Add an option, -mbranches-within-32B-boundaries, to align branches within a
32-byte boundary to reduce the potential performance loss of the microcode
update. The option is equivalent to the combination of three options:
and add -x86-branches-within-32B-boundaries for llvm-mc to enable
Finer-grained options are added for clang:
- -malign-branch-boundary=NUM aligns branches within a NUM-byte boundary.
- -malign-branch=TYPE[+TYPE...] specifies the types of branches to align.
- -malign-branch-prefix-size=NUM limits the prefix size to NUM.
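As an illustration of the TYPE[+TYPE...] syntax, the sketch below splits a value on '+' and builds a bitmask. The type spellings (fused, jcc, jmp, call, ret, indirect) are assumptions derived from the list of impacted instructions, and the enum and function are hypothetical; this is not the option parser used by clang or llvm-mc.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical bitmask of branch kinds; the names are assumed spellings.
enum BranchKind : unsigned {
  Fused = 1u << 0,
  Jcc = 1u << 1,
  Jmp = 1u << 2,
  Call = 1u << 3,
  Ret = 1u << 4,
  Indirect = 1u << 5
};

// Parses "TYPE[+TYPE...]" into `mask`. Returns false (leaving `mask`
// untouched) on an empty value, an empty component, or an unknown type.
bool parseAlignBranch(const std::string &value, unsigned &mask) {
  if (value.empty() || value.back() == '+')
    return false;
  unsigned result = 0;
  std::istringstream in(value);
  std::string type;
  while (std::getline(in, type, '+')) {
    if (type == "fused") result |= Fused;
    else if (type == "jcc") result |= Jcc;
    else if (type == "jmp") result |= Jmp;
    else if (type == "call") result |= Call;
    else if (type == "ret") result |= Ret;
    else if (type == "indirect") result |= Indirect;
    else return false; // unknown or empty component
  }
  mask = result;
  return true;
}
```

With this sketch, the value "fused+jcc+jmp" would yield a mask with the Fused, Jcc and Jmp bits set.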
The corresponding options for llvm-mc are -x86-align-branch-boundary=NUM,
A new MCFragment type, MCMachineDependentFragment, is added, which has
- BranchPadding: The variable-size fragment used to insert NOPs before a branch.
- BranchPrefix: The variable-size fragment used to insert segment prefixes into an instruction. The prefix is chosen as follows: a. Use the existing segment prefix if there is one. b. Use the CS segment prefix in 64-bit mode. c. In 32-bit mode, use the SS segment prefix if the base register is ESP/EBP, and the DS segment prefix otherwise.
- FusedJccPadding: The variable-size fragment used to insert NOPs before a fused conditional jump.
- BranchSplit: The zero-size fragment that separates the instruction fused with the following conditional jump from the fused jcc.
- HardCodeBegin: The zero-size fragment that marks the beginning of a hard-coded sequence.
- HardCodeEnd: The zero-size fragment that marks the end of a hard-coded sequence.
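The prefix choice described for BranchPrefix can be sketched as a small decision function. The struct and function names are hypothetical; the prefix byte values (CS = 0x2E, SS = 0x36, DS = 0x3E) are the standard x86 segment-override encodings.

```cpp
#include <cassert>
#include <cstdint>

// Standard x86 segment-override prefix bytes.
constexpr uint8_t PrefixCS = 0x2E;
constexpr uint8_t PrefixSS = 0x36;
constexpr uint8_t PrefixDS = 0x3E;

// Hypothetical summary of the instruction that will receive the prefixes.
struct InstInfo {
  uint8_t existingSegmentPrefix; // 0 if the instruction has none
  bool baseIsEspOrEbp;           // memory operand uses ESP/EBP as base
};

// Picks the prefix byte to repeat as padding, following rules a, b and c.
uint8_t choosePaddingPrefix(const InstInfo &inst, bool is64Bit) {
  if (inst.existingSegmentPrefix != 0)
    return inst.existingSegmentPrefix; // a. reuse the existing prefix
  if (is64Bit)
    return PrefixCS;                   // b. CS in 64-bit mode
  // c. 32-bit mode: match the default segment of the base register
  return inst.baseIsEspOrEbp ? PrefixSS : PrefixDS;
}
```

The likely rationale is that each choice is redundant for the instruction: a CS override is ignored in 64-bit mode, and in 32-bit mode SS and DS are already the default segments for ESP/EBP-based and other addressing respectively, so the added bytes change the length without changing behavior.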
alignBranchesBegin and alignBranchesEnd are used to
insert MCMachineDependentFragment before instructions. relaxMachineDependent
grows or shrinks the sizes of prefixes and NOPs to align the next branch fragment:
- First, we try to add segment prefixes to the instructions before a branch.
- If there is not enough room to add segment prefixes, NOPs are
inserted before the branch.
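The prefix-first, NOP-fallback order can be sketched for a single branch as follows. This is a simplified model under stated assumptions: `prefixRoom` is a hypothetical input giving how many prefix bytes the preceding instructions can still absorb, the padding is treated as all-prefix or all-NOP, and the iterative relaxation across fragments that the real layout needs is not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: how many bytes to add as prefixes vs. NOPs.
struct PaddingPlan {
  uint64_t prefixBytes;
  uint64_t nopBytes;
};

// Plans padding for one branch at `branchStart` of `branchSize` bytes.
// Assumes 0 < branchSize < boundary and that boundary is a power of two.
PaddingPlan planPadding(uint64_t branchStart, uint64_t branchSize,
                        uint64_t boundary, uint64_t prefixRoom) {
  assert(branchSize > 0 && branchSize < boundary);
  uint64_t end = branchStart + branchSize;
  bool crosses = branchStart / boundary != (end - 1) / boundary;
  bool endsAt = end % boundary == 0;
  if (!crosses && !endsAt)
    return {0, 0}; // already within the boundary, nothing to do

  // Bytes needed to push the branch start to the next boundary.
  uint64_t needed = boundary - branchStart % boundary;

  // First choice: grow earlier instructions with segment prefixes.
  if (needed <= prefixRoom)
    return {needed, 0};
  // Fallback: not enough room for prefixes, so pad with NOPs instead.
  return {0, needed};
}
```

For instance, a 4-byte branch at offset 30 with a 32-byte boundary needs 2 bytes; with 4 bytes of prefix room the sketch uses prefixes, and with only 1 byte of room it falls back to NOPs.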
Prefix and NOP padding are disabled in two cases:
- If the previous item is hard code (raw bytes that may be used to hardcode an
instruction), since there is no clear instruction boundary.
- If the instruction may be rewritten by the linker, such as a TLS call.