The list of opcodes false-dependency-breaking looks at consists only of register-source variants, but not memory-source.
It doesn't look like there's a good reason for that, since memory-source variants suffer from the same problem - e.g. the loop in the attached testcase gets a roughly 3x performance improvement on sandybridge.
Details
Diff Detail
Event Timeline
Thanks for looking at this Michael.
Why didn't you add the ROUNDS*, RSQRTSS and SQRTS* m_Int variants?
Also I believe the SQRTSD ops can be added as well?
Thanks a lot for the review, Simon.
ROUNDS* don't currently have m_Int variants defined.
I thought RSQRTSS and SQRTSS don't have them defined either, but apparently they do and I just missed them, I'll add them, thanks!
Regarding SQRTSD, I don't see why the r variants weren't on the original list. I'll add all variants.
Heh.
Apparently even though SQRTSD was never on the list, there was a test that checked that memory operands into SQRTSD were not load-folded. The test apparently passed not because of this code, but because they weren't folded regardless. At some point, Manman improved the folding logic, and changed the test to verify that SQRTSD memory operands ARE in fact folded.
Rolling the test back as well.
Please can you double check that cvtsd2ss test and if possible add tests for what other partial ops you can?
Also - should your test case be under sse-domains.ll or one of the break-*-dep.ll test files?
Unless you are wanting your reviewers to confirm I'm happy to approve this after that.
Cheers, Simon.
test/CodeGen/X86/break-sse-dep.ll | ||
---|---|---|
18 ↗ | (On Diff #17261) | Should cvtsd2ss be folded? |
Regarding break-sse-dep vs. sse-domains, there are actually 3 different things going on:
- Avoiding load-folding that creates false dependencies.
- Avoiding domain crossing.
- Breaking false dependencies that are already present in the code.
Before this patch, break-sse-dep.ll checked (1), and sse-domains.ll checked (2) and (3), and I've added another test checking (3) into sse-domains.
Do you think it's worth moving (3) together with (1) instead?
test/CodeGen/X86/break-sse-dep.ll | ||
---|---|---|
18 ↗ | (On Diff #17261) | Yes, the optsize versions should be folded. |
It appears tests for (1) and (3) are the main contributions from your testcase so I think it'd be more relevant to being included in break-sse-dep.ll
test/CodeGen/X86/break-sse-dep.ll | ||
---|---|---|
18 ↗ | (On Diff #17261) | Ah yes - missed that bit (just thought it was the inverse of t1). |