Page MenuHomePhabricator

[X86] Break false dependencies for POPCNT, LZCNT, TZCNT

Authored by myatsina on Nov 21 2017, 10:21 PM.



Add POPCNT, LZCNT, TZCNT to the list of instructions that have false dependency.
Add a test to make sure BreakFalseDeps breaks the dependencies for these instructions.
Update affected tests.

This fixes bugzilla

This is the last of 5 patches that fix bugzilla
Previous patches:

Diff Detail


Event Timeline

myatsina created this revision.Nov 21 2017, 10:21 PM
myatsina edited the summary of this revision. (Show Details)Nov 22 2017, 2:30 AM
myatsina set the repository for this revision to rL LLVM.
myatsina added a subscriber: llvm-commits.
zvi edited edge metadata.Nov 22 2017, 9:24 AM

I think we will need a subtarget feature indicating whether the false-dependency on these instructions happens. For starters, does this happen in AMD processors?

craig.topper edited edge metadata.Nov 22 2017, 9:41 AM

Yes we need a feature flag. As far as I know it doesn't happen on AMD. On Intel LZCNT/TZCNT have a problem on Sandybridge, ivybridge, haswell, broadwell. It's fixed on skylake. For popcnt, it has a problem on sandybridge, ivybridge, haswell, broadwell, skylake. I believe it is fixed in cannonlake.

craig.topper added inline comments.Nov 22 2017, 1:17 PM
8011 ↗(On Diff #123867)

Why are the register forms missing? They have the problem too.

8234 ↗(On Diff #123867)

Use XOR32rr on the 32-bit sub register. Shorter encoding and 0s the upper bits.

8244 ↗(On Diff #123867)

XOR16rr is not a dependency breaking instruction. It still reads bits 31:16 of the register and passes them through.

craig.topper added inline comments.Nov 22 2017, 2:21 PM
8244 ↗(On Diff #123867)

It doesn't look like gcc breaks the dependency on 16-bit versions. There is a real dependency there due to the need to preserve bits 31:16. But I would think the zeroing would still break the dependency on the previous iteration of the loop since the zeroing xor doesn't require execution resources.

myatsina updated this revision to Diff 124847.Nov 29 2017, 4:26 PM

Added subtarget features.
Adding the rr versions of the instructions to hasPartialRegUpdate affected folding of memory operands for these instructions (if an instruction has partial reg update, then it folds only if optimizing for size) - Made a small adjustment the test to recreate the problematic scenario after this change.

What are you planning to do with 16-bit instructions? Like I said XOR16 is insufficient to break a dependency.

7972 ↗(On Diff #124847)

Pass subtarget by const reference.

1 ↗(On Diff #124847)

The file name for this test case is missing a digit and we usually use "pr" instead of "bug". Though in this case I think you should name it something like more meaningful like bitcnt-false-dep.ll. We usually only name test cases with bug numbers when its a reduced test case for a crash or something where its harder to find a meaningful name.

4 ↗(On Diff #124847)

Test 64-bit as well?

What are you planning to do with 16-bit instructions? Like I said XOR16 is insufficient to break a dependency.

I'm leading towards adding a 32bit XOR. Do you agree with that solution?

87 ↗(On Diff #124847)

Will fix this in the commit

I don't know if you'd be allowed to widen the register class to 32-bit without doing some live register analysis to know the upper bits aren't used later.

But the dependency on 16-bit isn't completely "false", its needed to preserve the upper bits due to how partial registers are implemented for all processors after Sandybridge. Zeroing the register would break the dependency, but you would still need to do that on for 16-bit instructions even on Skylake and Cannonlake. The fixes made to the 32/64 instructions on those CPUs didn't change anything about the 16-bit instruction.

So I think we should just remove the 16-bit instructions from this patch.

myatsina updated this revision to Diff 125286.Dec 3 2017, 2:03 AM
myatsina marked 5 inline comments as done.

Addressed comments.

craig.topper added inline comments.Dec 4 2017, 3:54 PM
8245 ↗(On Diff #125286)

I think you need a .addreg(Reg, RegState::ImplicitDefine) here too. See the VXORPSrr code above.

myatsina updated this revision to Diff 125546.Dec 5 2017, 9:04 AM
myatsina marked an inline comment as done.

Added .addreg(Reg, RegState::ImplicitDefine)

This revision is now accepted and ready to land.Dec 6 2017, 12:13 PM
atdt added a subscriber: atdt.Dec 15 2017, 1:08 PM
This revision was automatically updated to reflect the committed changes.