Microcode update for Jump Conditional Code Erratum may cause performance
loss for some workloads:
https://www.intel.com/content/www/us/en/support/articles/000055650.html
This patch mitigates the performance impact by aligning branches
within a 32-byte boundary. The impacted instructions are:
a. Conditional jump.
b. Fused conditional jump.
c. Unconditional jump.
d. Indirect jump.
e. Ret.
f. Call.
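The check behind "aligning branches within a 32-byte boundary" can be sketched as follows. Per Intel's erratum description linked above, a branch is affected when it crosses or ends on a 32-byte boundary. The helper names (`crossesBoundary`, `endsOnBoundary`, `paddingNeeded`) are hypothetical and are not LLVM's API. The sketch assumes the boundary is a power of two and the instruction is shorter than it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if [Start, Start + Size) touches two
// BoundarySize-aligned windows, i.e. the instruction crosses a boundary.
bool crossesBoundary(uint64_t Start, uint64_t Size, uint64_t BoundarySize) {
  // Assumptions: power-of-two boundary, instruction shorter than it.
  assert(Size > 0 && Size < BoundarySize);
  assert((BoundarySize & (BoundarySize - 1)) == 0);
  uint64_t End = Start + Size;  // one past the last byte
  return Start / BoundarySize != (End - 1) / BoundarySize;
}

// Hypothetical helper: true if the instruction's last byte is the last byte
// of a window, i.e. the instruction ends exactly on a boundary.
bool endsOnBoundary(uint64_t Start, uint64_t Size, uint64_t BoundarySize) {
  return (Start + Size) % BoundarySize == 0;
}

// Hypothetical helper: NOP bytes to insert before the branch so that it
// starts at the next boundary; 0 if the branch is already unaffected.
uint64_t paddingNeeded(uint64_t Start, uint64_t Size, uint64_t BoundarySize) {
  if (!crossesBoundary(Start, Size, BoundarySize) &&
      !endsOnBoundary(Start, Size, BoundarySize))
    return 0;
  return (BoundarySize - Start % BoundarySize) % BoundarySize;
}
```

For example, a 5-byte jump at offset 30 crosses the boundary at 32 and needs 2 NOP bytes, while a 5-byte jump at offset 27 ends exactly on it and needs 5.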
Add two options for llvm-mc:
1. `-x86-align-branch-boundary=NUM` aligns branches within a NUM-byte boundary.
2. `-x86-align-branch=TYPE[+TYPE...]` specifies the types of branches to align.
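A minimal sketch of how a `TYPE[+TYPE...]` value could be turned into a bitmask. The type spellings (`jcc`, `fused`, `jmp`, `indirect`, `ret`, `call`) are an assumption that mirrors the list of impacted instructions above, and the power-of-two check on NUM is likewise an assumed validation rule; `BranchKind`, `parseAlignBranch` and `isValidBoundary` are hypothetical names, not LLVM's option-handling code.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical bit flags, one per impacted instruction kind.
enum BranchKind : uint8_t {
  BK_Jcc = 1 << 0,       // conditional jump
  BK_Fused = 1 << 1,     // fused conditional jump
  BK_Jmp = 1 << 2,       // unconditional jump
  BK_Indirect = 1 << 3,  // indirect jump
  BK_Ret = 1 << 4,       // ret
  BK_Call = 1 << 5,      // call
};

// Map a single TYPE token to its flag; std::nullopt for unknown names.
std::optional<uint8_t> kindFromName(const std::string &Name) {
  if (Name == "jcc") return BK_Jcc;
  if (Name == "fused") return BK_Fused;
  if (Name == "jmp") return BK_Jmp;
  if (Name == "indirect") return BK_Indirect;
  if (Name == "ret") return BK_Ret;
  if (Name == "call") return BK_Call;
  return std::nullopt;
}

// Parse "TYPE[+TYPE...]" into a mask; any unknown or empty token fails.
std::optional<uint8_t> parseAlignBranch(const std::string &Value) {
  uint8_t Mask = 0;
  size_t Pos = 0;
  while (true) {
    size_t Plus = Value.find('+', Pos);
    size_t Len = (Plus == std::string::npos) ? std::string::npos : Plus - Pos;
    std::optional<uint8_t> Kind = kindFromName(Value.substr(Pos, Len));
    if (!Kind) return std::nullopt;
    Mask |= *Kind;
    if (Plus == std::string::npos) break;
    Pos = Plus + 1;  // skip the '+'
  }
  return Mask;
}

// Assumed rule: the boundary NUM must be a non-zero power of two.
bool isValidBoundary(uint64_t Num) { return Num != 0 && (Num & (Num - 1)) == 0; }
```

With this sketch, `fused+jcc+jmp` yields the mask `BK_Fused | BK_Jcc | BK_Jmp`, while `jcc+bogus` or a trailing `+` is rejected.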
A new MCFragment type, `MCBoundaryAlignFragment`, is added, which has
4 subtypes:
1. `BranchPadding`: The variable-size frag that inserts NOPs before a branch.
2. `FusedJccPadding`: The variable-size frag that inserts NOPs before a fused
conditional jump.
3. `FusedJccSplit`: The zero-size frag that separates the instruction fused
with the following conditional jump from the fused jcc.
4. `FusiblePlaceHolder`: The frag that marks an instruction that is valid as
the first instruction in macro fusion. It turns into `FusedJccPadding` if macro
fusion actually happens.
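The relationship between the subtypes `BranchPadding`, `FusedJccPadding`, `FusedJccSplit` and `FusiblePlaceHolder` can be modelled roughly as below. This is a hypothetical sketch, not the LLVM class: whether macro fusion happens is passed in as a flag (the real opcode-pairing rules are not modelled), and `protectedBytes` assumes a fused pair is kept inside one window as a single unit, so the padding in front of it must account for both instructions.

```cpp
#include <cassert>

// Hypothetical mirror of the subtypes described above.
enum class BoundaryAlignKind {
  BranchPadding,      // variable-size NOPs before a branch
  FusedJccPadding,    // variable-size NOPs before a fused pair
  FusedJccSplit,      // zero-size separator, never holds bytes
  FusiblePlaceHolder, // marks a possible first instruction of a fused pair
};

// Only the two padding kinds may hold NOP bytes.
bool mayHoldNops(BoundaryAlignKind K) {
  return K == BoundaryAlignKind::BranchPadding ||
         K == BoundaryAlignKind::FusedJccPadding;
}

// A placeholder is promoted to FusedJccPadding once macro fusion is known to
// happen; every other kind, or an unfused placeholder, is left unchanged.
BoundaryAlignKind resolve(BoundaryAlignKind K, bool MacroFused) {
  if (K == BoundaryAlignKind::FusiblePlaceHolder && MacroFused)
    return BoundaryAlignKind::FusedJccPadding;
  return K;
}

// Bytes the padding must keep inside one window (assumption: a fused pair
// counts as one unit, so the first instruction's size is included).
unsigned protectedBytes(BoundaryAlignKind K, unsigned FirstSize,
                        unsigned BranchSize) {
  return K == BoundaryAlignKind::FusedJccPadding ? FirstSize + BranchSize
                                                 : BranchSize;
}
```

Keeping the placeholder separate from the padding kinds lets the decision about fusion be deferred until the following instruction is known.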
`alignBranchesBegin` inserts `MCBoundaryAlignFragment` before instructions,
`alignBranchesEnd` sets the target branch for the `MCBoundaryAlignFragment`, and
`relaxBoundaryAlign` grows or shrinks the size of the NOP padding to align the
target branch.
NOP padding is disabled when the instruction may be rewritten by the linker,
such as a TLS call.
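A minimal sketch of the grow/shrink relaxation, assuming fixed instruction sizes and a flat list of instructions. The `Item` struct and `relax` function are hypothetical and are not LLVM's `relaxBoundaryAlign`. Each padding fragment is recomputed from the offset it currently sits at and set to the number of NOP bytes that keeps its target branch from crossing or ending on the boundary; targets flagged as possibly rewritten by the linker get no padding.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical layout item: an instruction, optionally preceded by a
// padding fragment that tries to keep it inside one boundary window.
struct Item {
  uint64_t Size;                // encoded instruction size in bytes
  bool IsAlignedBranch;         // has a padding fragment in front of it
  bool MayBeRewrittenByLinker;  // e.g. TLS call: padding disabled
  uint64_t Padding = 0;         // current NOP bytes in front of it
};

// NOPs needed so [Start, Start + Size) neither crosses nor ends on a
// boundary of B bytes (assumes B is a power of two and Size < B).
static uint64_t nopsFor(uint64_t Start, uint64_t Size, uint64_t B) {
  uint64_t End = Start + Size;
  bool Crosses = Start / B != (End - 1) / B;
  bool EndsOn = End % B == 0;
  return (Crosses || EndsOn) ? (B - Start % B) % B : 0;
}

// Grow or shrink every padding fragment until a full pass changes nothing;
// returns the number of passes performed.
unsigned relax(std::vector<Item> &Items, uint64_t Boundary) {
  unsigned Passes = 0;
  bool Changed = true;
  while (Changed) {
    Changed = false;
    ++Passes;
    uint64_t Offset = 0;  // start of the current padding fragment
    for (Item &I : Items) {
      uint64_t Want = 0;
      if (I.IsAlignedBranch && !I.MayBeRewrittenByLinker)
        Want = nopsFor(Offset, I.Size, Boundary);
      if (Want != I.Padding) {
        I.Padding = Want;  // grow or shrink
        Changed = true;
      }
      Offset += I.Padding + I.Size;
    }
  }
  return Passes;
}
```

Because each fragment's size depends only on the bytes before it, one pass already reaches the fixed point here and the second pass only confirms it; in a real assembler, branch relaxation can also change instruction sizes, which is why the layout is iterated.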