This is an archive of the discontinued LLVM Phabricator instance.

[X86] Apply SlowIncDec feature to Sandybridge/Ivybridge CPUs as well
ClosedPublic

Authored by craig.topper on Aug 28 2017, 10:37 PM.

Details

Summary

Currently we start applying this on Haswell and newer. I don't believe anything changed in the Haswell architecture to make this the right cutoff point. The partial flag handling around this has been roughly the same since Sandybridge.

gcc seems to disable inc/dec all the way back to at least core2. We could do the same if we want.

Event Timeline

craig.topper created this revision. Aug 28 2017, 10:37 PM
chandlerc accepted this revision. Aug 29 2017, 8:50 AM

I'm fine with this, but it would be good to get confirmation that there is a genuine partial flag update stall issue we're avoiding with this... As long as you all can confirm that this is a real issue even on modern processors, then all this makes sense. =D

This revision is now accepted and ready to land. Aug 29 2017, 8:50 AM

I think Pentium 4 and Silvermont read the old CF value and pass it through the INC/DEC instruction.

I think on the "core" line prior to Sandy Bridge, any instruction that reads the CF flag after INC/DEC experienced a stall until the INC/DEC instruction retired.

I think from Sandy Bridge onward, it is no longer a stall until retirement, but an instruction that reads the CF flag after INC/DEC becomes dependent on the last writer of CF before the INC/DEC.
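
To make the CF point concrete, here is a minimal C++ sketch of the architectural flag semantics involved. It is only an illustration, not LLVM code: the names Flags, add32 and inc32 are hypothetical, and only CF and ZF are modeled (the other flags are omitted for brevity). ADD writes CF, while INC leaves CF as it was, which is why a later CF reader still depends on whichever instruction wrote CF before the INC.

```cpp
#include <cstdint>

// Hypothetical two-flag model; real EFLAGS has more bits (OF, SF, PF, AF).
struct Flags {
  bool cf = false; // carry flag
  bool zf = false; // zero flag
};

// ADD: writes every arithmetic flag, including CF.
inline uint32_t add32(uint32_t a, uint32_t b, Flags &f) {
  uint32_t r = a + b; // unsigned wraparound is well defined in C++
  f.cf = r < a;       // a carry out of bit 31 occurred
  f.zf = (r == 0);
  return r;
}

// INC: writes ZF (and the other non-carry flags), but leaves CF untouched.
inline uint32_t inc32(uint32_t a, Flags &f) {
  uint32_t r = a + 1;
  f.zf = (r == 0);
  // f.cf is intentionally not written: the old CF value survives.
  return r;
}
```

In this model, inc32(0xFFFFFFFF, f) wraps to 0 and sets ZF but leaves f.cf at its previous value, whereas add32(0xFFFFFFFF, 1, f) sets both.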

Is it only an instruction that reads the *CF* flag?

Because if so, then we might want to drop this entirely... If an instruction reads the CF flag after INC/DEC, then either some other instruction def'ed the flag for that read (killing the dependency), or it is *trying* to read CF from a previous instruction in a fit of cleverness, or it is just buggy code... In any case, the dependency doesn't seem like a problem?

(Or maybe we get "false" dependencies because of how this is implemented inside the chip?)

Put differently, maybe we only want this on processors *older* than Sandybridge... but today, we don't, and we don't have a lot of tuning for older processors... so maybe we should just nuke it going forward.

There is definitely a false dependency issue that can be triggered by "shift CL", because variable shifts are defined to not update any flags if CL is 0, but the shift count isn't known until the instruction executes, so the shift is dependent on the old flags, including CF.
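
The "shift CL" behavior can be sketched the same way. This is a hedged illustration of the architectural rule only, not of how any particular core implements it: ShiftFlags and shl32_cl are hypothetical names, and only CF and ZF are modeled. For a 32-bit operand the count is masked to 5 bits, and a masked count of 0 leaves every flag unchanged, so the flag result depends on the incoming flags whenever the count is not known ahead of time.

```cpp
#include <cstdint>

// Hypothetical two-flag model; OF, SF, PF and AF are omitted.
struct ShiftFlags {
  bool cf = false; // carry flag
  bool zf = false; // zero flag
};

// Models a 32-bit SHL with the count taken from CL.
inline uint32_t shl32_cl(uint32_t value, uint8_t cl, ShiftFlags &f) {
  unsigned count = cl & 31u; // count is masked to 5 bits for 32-bit operands
  if (count == 0)
    return value; // no flag is written: the old flags pass through

  // count is in [1, 31], so both shifts below are well defined.
  f.cf = ((value >> (32u - count)) & 1u) != 0; // last bit shifted out
  uint32_t r = value << count;
  f.zf = (r == 0);
  return r;
}
```

Calling shl32_cl(x, 0, f) or shl32_cl(x, 32, f) returns x with f untouched, which is the case that forces the old flags to be an input of the instruction.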

That sounds bad regardless of inc/dec though -- should it just be handled by a flag-clearing instruction to break the dependency? (Either something like xorl %eax, %eax or orl %eax, %eax depending on whether there is a dead register...)

From HSW on we should be using SHLX/SHRX/SARX instead of any "shift CL" to avoid this silliness. Those instructions don't update flags.

'xor' would definitely break the dependency. 'or' would turn the flag dependency into a register dependency so that's not really better.

This revision was automatically updated to reflect the committed changes.