This is an archive of the discontinued LLVM Phabricator instance.

[X86] Fix NOOP sched overrides on BDW/HSW/SKL.
ClosedPublic

Authored by courbet on Jun 11 2018, 8:02 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

courbet created this revision. Jun 11 2018, 8:02 AM
courbet updated this revision to Diff 150764. Jun 11 2018, 8:05 AM

Update missing test.

The Intel optimization manual talks about nops executing and says that the multi byte nop has an execution dependency on whatever register is encoded in the modrm byte. So that sorta sounds like it uses resources.


I meant issue ports (llvm ProcResources). I went and looked at the optimization manual, that states:

The one byte NOP:[XCHG EAX,EAX] has special hardware support. Although it still consumes a µop and
its accompanying resources, the dependence upon the old value of EAX is removed. This µop can be
executed at the earliest possible opportunity, reducing the number of outstanding instructions and is the
lowest cost NOP.
The other NOPs have no special hardware support. Their input and output registers are interpreted by the
hardware. Therefore, a code generator should arrange to use the register containing the oldest value as
input, so that the NOP will dispatch and release RS resources at the earliest possible opportunity.
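To make "input and output registers are interpreted by the hardware" concrete, here is a minimal sketch that pulls the register(s) named by the memory operand out of a 0F 1F /0 multi-byte NOP encoding. The function name nop_registers and the REGS table are made up for illustration; it assumes 64-bit mode with no REX prefix and only handles the forms relevant here, so it is not a general decoder.

```python
# Names for the 3-bit rm/base/index fields (assumes 64-bit mode, no REX prefix).
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def nop_registers(encoding):
    """Return the registers named by the operand of a 0F 1F /0 NOP.

    Hypothetical helper for illustration only, not a full x86 decoder.
    """
    i = 0
    # Skip optional operand-size prefixes (0x66).
    while i < len(encoding) and encoding[i] == 0x66:
        i += 1
    if encoding[i:i + 2] != b"\x0f\x1f":
        return []  # e.g. 0x90, the one-byte NOP, has no ModRM byte
    modrm = encoding[i + 2]
    mod, rm = modrm >> 6, modrm & 0b111
    if mod == 0b11:
        return [REGS[rm]]  # register-direct operand
    if rm == 0b100:
        # A SIB byte follows: it names a base and (optionally) an index.
        sib = encoding[i + 3]
        index, base = (sib >> 3) & 0b111, sib & 0b111
        regs = []
        if not (mod == 0 and base == 0b101):  # that combination means disp32, no base
            regs.append(REGS[base])
        if index != 0b100:  # index 100 means "no index"
            regs.append(REGS[index])
        return regs
    if mod == 0 and rm == 0b101:
        return []  # RIP-relative disp32, no general-purpose register
    return [REGS[rm]]
```

For example, nop_registers(bytes.fromhex("0f1f00")) yields ["rax"] for the 3-byte form, while the one-byte 0x90 yields an empty list, which matches the manual's distinction between the two kinds of NOP.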

On the other hand, elsewhere, it says:

Some micro-ops can execute to completion during rename and are removed from the pipeline at that
point, effectively costing no execution bandwidth. These include:
• Zero idioms (dependency breaking idioms).
• NOP.
• VZEROUPPER.
• FXCHG

I guess what it means is that multi-byte NOPs still consume a ROB entry and wait for data dependencies, but when we measure multi-byte NOPs we see no issue port usage.


The only way they would wait for a data dependency is if they actually went into the RS. And I would expect the only way out of the RS is to go out an execution port. I wonder if Intel failed to update the optimization manual after some uarch change. If I remember from my days in hardware design long ago during Nehalem, the only uop that was removed during rename was FXCH. So maybe some things changed during Sandy Bridge when the physical register file was added. Or when move elimination was added.

I guess the manual was never updated then. I've done some more experimenting; see the code here: https://github.com/google/EXEgesis/tree/master/exegesis/mysteries/nop

Unless I'm mistaken, it really seems that the NOP uop makes it out of the RS without passing through an execution port (or never gets there in the first place).
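As a rough illustration of why this matters for a scheduling model, here is a toy throughput estimate in which a NOP is one uop that occupies an issue slot but no port. Everything in it is an assumption for the sake of the example: the function name cycles_per_iteration, the issue width of 4, the port name "p0", and the even spreading of uops across their allowed ports. It is not how LLVM's scheduler computes anything.

```python
from fractions import Fraction


def cycles_per_iteration(instrs, issue_width=4):
    """Toy bound: max of front-end cost and the busiest port.

    instrs is a list of (num_uops, ports) pairs; each uop is assumed to be
    spread evenly over the ports it may use. An empty ports tuple models a
    uop that uses an issue slot but no execution port.
    """
    total_uops = 0
    pressure = {}
    for uops, ports in instrs:
        total_uops += uops
        for port in ports:
            pressure[port] = pressure.get(port, Fraction(0)) + Fraction(uops, len(ports))
    frontend = Fraction(total_uops, issue_width)
    backend = max(pressure.values(), default=Fraction(0))
    return max(frontend, backend)


# Two port-bound uops on a hypothetical port "p0", padded with two NOPs.
adds = [(1, ("p0",))] * 2
nop_without_port = (1, ())
nop_on_p0 = (1, ("p0",))

print(cycles_per_iteration(adds + [nop_without_port] * 2))
print(cycles_per_iteration(adds + [nop_on_p0] * 2))
```

In this toy model, charging the NOPs to a port doubles the estimated cost of the loop, whereas modelling them with no port leaves only the issue-slot cost, which is the behaviour the measurements above suggest.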

This revision is now accepted and ready to land. Jun 17 2018, 10:03 PM
This revision was automatically updated to reflect the committed changes.