Introduce an option, x86-align-for-macrofusion, that prevents a pair of
macro-fusion-eligible instructions from being split across a given alignment
boundary by automatically padding the first instruction of the pair with
a minimal-size nop.
In effect, it ensures that a pair of macro-fusible instructions is not split by
a cache line boundary, which is a precondition for macro-op fusion on
modern Intel Core processors (see Intel Architecture Optimization Reference
Manual, 2.3.2.1 Legacy Decode Pipeline: Macro-Fusion).
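
The padding rule described above can be illustrated with a minimal sketch. This is not the code from the patch: the helper name paddingForFusedPair and its parameters are hypothetical, and it assumes the combined size of the pair does not exceed the boundary.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the patch): returns the number of nop
// bytes to emit before the first instruction of a macro-fusible pair so
// that the pair does not straddle an alignment boundary.
//   offset   - byte offset of the first instruction in the section
//   pairSize - combined encoded size of both instructions
//   boundary - alignment boundary in bytes, e.g. a 64-byte cache line
uint64_t paddingForFusedPair(uint64_t offset, uint64_t pairSize,
                             uint64_t boundary) {
  // Assumption of this sketch: the pair can fit inside one window.
  assert(boundary != 0 && pairSize <= boundary);

  uint64_t start = offset % boundary;
  // The pair already fits between two consecutive boundaries.
  if (start + pairSize <= boundary)
    return 0;
  // Otherwise the smallest padding that avoids the split moves the
  // pair to the start of the next boundary-aligned window.
  return boundary - start;
}
```

For example, with a 64-byte boundary, a 6-byte pair starting at offset 60 would need 4 bytes of padding, while the same pair starting at offset 58 ends exactly on the boundary and needs none.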
The comment here reads oddly now that you have added the new usage to the fragment; please refine it.