Introduce an option, x86-align-for-macrofusion, that prevents a pair of
macro-fusion-eligible instructions from being split across a given alignment
boundary by automatically padding the first instruction of the pair with
a minimal-size nop.
In effect, it ensures that a pair of macro-fusible instructions is not split by
a cache line boundary, which is a precondition for macro-op fusion on
modern Intel Core processors (see Intel Architecture Optimization Reference Manual,
2.3.2.1 Legacy Decode Pipeline: Macro-Fusion).
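
To make the padding rule concrete, here is a minimal sketch of how the nop size could be computed. It is not the code in this patch: the function name paddingForFusiblePair and its parameters are hypothetical, and it assumes that a pair counts as "split" whenever its byte range crosses a multiple of the boundary, and that the pair is no larger than the boundary.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration only; not LLVM's implementation.
// Returns the number of nop bytes to emit before the first instruction so
// that the byte range [offset, offset + firstSize + secondSize) does not
// cross a multiple of `boundary`.
uint64_t paddingForFusiblePair(uint64_t offset, uint64_t firstSize,
                               uint64_t secondSize, uint64_t boundary) {
  // Assumption: the boundary is a power of two (e.g. a 64-byte cache line).
  assert(boundary != 0 && (boundary & (boundary - 1)) == 0 &&
         "boundary must be a power of two");
  uint64_t pairSize = firstSize + secondSize;
  // Assumption: the pair fits inside one aligned window.
  assert(pairSize != 0 && pairSize <= boundary &&
         "pair must fit within one boundary-sized window");

  uint64_t firstWindow = offset / boundary;
  uint64_t lastWindow = (offset + pairSize - 1) / boundary;
  if (firstWindow == lastWindow)
    return 0; // The pair already sits in a single window; no nop needed.

  // Otherwise, the smallest padding that fixes the split moves the first
  // instruction to the start of the next window.
  return boundary - (offset % boundary);
}
```

With a 64-byte boundary, a 3-byte cmp at offset 60 followed by a 2-byte jcc would need 4 bytes of padding under these assumptions, while the same pair at offset 59 would need none.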
The comment here reads oddly now that you have added the new usage to the fragment; please refine it.