Noop certainly does not use resources.
Repository: rL LLVM
Event Timeline
The Intel optimization manual talks about nops executing and says that the multi byte nop has an execution dependency on whatever register is encoded in the modrm byte. So that sorta sounds like it uses resources.
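To make the "register encoded in the modrm byte" point concrete, here is a minimal sketch of how the 3-byte form of the multi-byte NOP (`0F 1F /0`) carries a base register in its ModRM byte. The helper name `encode_nop3` and the `REG_NUMBERS` table are hypothetical, made up for illustration; they are not part of LLVM or taken from the manual. ESP and EBP are left out because, with mod=00, rm=100 selects a SIB byte and rm=101 selects a disp32 operand, so they do not fit a 3-byte encoding.

```python
# Hypothetical illustration only: these names are not from LLVM or the Intel manual.
# Standard x86 register numbers usable as a plain [reg] base with mod=00.
REG_NUMBERS = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esi": 6, "edi": 7}


def encode_nop3(base_reg):
    """Encode the 3-byte `NOP DWORD PTR [base_reg]` (opcode 0F 1F /0)."""
    rm = REG_NUMBERS[base_reg]  # register the NOP nominally reads
    mod = 0b00                  # memory operand, no displacement
    reg = 0b000                 # /0 opcode extension
    modrm = (mod << 6) | (reg << 3) | rm
    return bytes([0x0F, 0x1F, modrm])


if __name__ == "__main__":
    for name in REG_NUMBERS:
        print(name, encode_nop3(name).hex(" "))
```

Under the manual's advice quoted below, a code generator would pick `base_reg` to be whichever register holds the oldest value, so that any dependency the NOP has resolves as early as possible.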
I meant issue ports (LLVM ProcResources). I went and looked at the optimization manual, which states:
The one byte NOP:[XCHG EAX,EAX] has special hardware support. Although it still consumes a µop and its accompanying resources, the dependence upon the old value of EAX is removed. This µop can be executed at the earliest possible opportunity, reducing the number of outstanding instructions and is the lowest cost NOP. The other NOPs have no special hardware support. Their input and output registers are interpreted by the hardware. Therefore, a code generator should arrange to use the register containing the oldest value as input, so that the NOP will dispatch and release RS resources at the earliest possible opportunity.
On the other hand, elsewhere, it says:
Some micro-ops can execute to completion during rename and are removed from the pipeline at that point, effectively costing no execution bandwidth. These include: • Zero idioms (dependency breaking idioms). • NOP. • VZEROUPPER. • FXCHG
I guess what it means is that multi-byte NOPs still consume a ROB entry and wait for data dependencies, but when we measure multi-byte NOPs we see no issue port usage.
The only way they would wait for a data dependency is if they actually went into the RS. And I would expect the only way out of the RS is to go out an execution port. I wonder if Intel failed to update the optimization manual after some uarch change. If I remember correctly from my days in hardware design long ago, during Nehalem, the only uop that was removed during rename was FXCH. So maybe some things changed during Sandy Bridge when the physical register file was added, or when move elimination was added.
I guess the manual was never updated, then. I've done some more experimenting; see the code here: https://github.com/google/EXEgesis/tree/master/exegesis/mysteries/nop
Unless I'm mistaken, it really seems that the NOP uop makes it out of the RS without passing through an execution port (or never gets there in the first place).