This is an archive of the discontinued LLVM Phabricator instance.

[X86] Break false dependencies before partial register updates when the source operand is in memory
ClosedPublic

Authored by mkuper on Dec 11 2014, 7:51 AM.

Details

Summary

The list of opcodes false-dependency-breaking looks at consists only of register-source variants, but not memory-source.
It doesn't look like there's a good reason for that, since memory-source variants suffer from the same problem - e.g. the loop in the attached testcase gets a roughly 3x performance improvement on sandybridge.

Diff Detail

Repository
rL LLVM

Event Timeline

mkuper updated this revision to Diff 17174.Dec 11 2014, 7:51 AM
mkuper retitled this revision from to [X86] Break false dependencies before partial register updates when the source operand is in memory.
mkuper updated this object.
mkuper edited the test plan for this revision. (Show Details)
mkuper added reviewers: atrick, nadav.
mkuper added a subscriber: Unknown Object (MLST).

Thanks for looking at this Michael.

Why didn't you add the ROUNDS*, RSQRTSS and SQRTS* m_Int variants?

Also I believe the SQRTSD ops can be added as well?

mkuper added a comment.EditedDec 14 2014, 1:14 AM

Thanks a lot for the review, Simon.

ROUNDS* don't currently have m_Int variants defined.
I thought RSQRTSS and SQRTSS don't have them defined either, but apparently they do and I just missed them, I'll add them, thanks!

Regarding SQRTSD, I don't see why the r variants weren't on the original list. I'll add all variants.

Heh.

Apparently even though SQRTSD was never on the list, there was a test that checked that memory operands into SQRTSD were not load-folded. The test apparently passed not because of this code, but because they weren't folded regardless. At some point, Manman improved the folding logic, and changed the test to verify that SQRTSD memory operands ARE in fact folded.

Rolling the test back as well.

mkuper updated this revision to Diff 17261.Dec 14 2014, 1:40 AM

Please can you double check that cvtsd2ss test and if possible add tests for what other partial ops you can?

Also - should your test case be under sse-domains.ll or one of the break-*-dep.ll test files?

Unless you are wanting your reviewers to confirm I'm happy to approve this after that.

Cheers, Simon.

test/CodeGen/X86/break-sse-dep.ll
18 ↗(On Diff #17261)

Should cvtsd2ss be folded?

Regarding break-sse-dep vs. sse-domains, there are actually 3 different things going on:

  1. Avoiding load-folding that creates false dependencies.
  2. Avoiding domain crossing.
  3. Breaking false dependencies that are already present in the code.

Before this patch, break-sse-dep.ll checked (1), and sse-domains.ll checked (2) and (3), and I've added another test checking (3) into sse-domains.
Do you think it's worth moving (3) together with (1) instead?

test/CodeGen/X86/break-sse-dep.ll
18 ↗(On Diff #17261)

Yes, the optsize versions should be folded.
(The test is equivalent to t1 except for the optsize attribute).

Regarding break-sse-dep vs. sse-domains, there are actually 3 different things going on:

  1. Avoiding load-folding that creates false dependencies.
  2. Avoiding domain crossing.
  3. Breaking false dependencies that are already present in the code.

Before this patch, break-sse-dep.ll checked (1), and sse-domains.ll checked (2) and (3), and I've added another test checking (3) into sse-domains.
Do you think it's worth moving (3) together with (1) instead?

It appears tests for (1) and (3) are the main contributions from your testcase so I think it'd be more relevant to being included in break-sse-dep.ll

test/CodeGen/X86/break-sse-dep.ll
18 ↗(On Diff #17261)

Ah yes - missed that bit (just thought it was the inverse of t1).

atrick accepted this revision.Dec 14 2014, 7:40 PM
atrick edited edge metadata.

LGTM

This revision is now accepted and ready to land.Dec 14 2014, 7:40 PM
This revision was automatically updated to reflect the committed changes.