This is an archive of the discontinued LLVM Phabricator instance.

[X86] Improved sched models for X86 CMPXCHG* instructions
ClosedPublic

Authored by avt77 on Jul 31 2018, 7:58 AM.

Details

Summary

This patch improves sched models for CMPXCHG* X86 instrs removing unnecessary redefinitions of instr infos. This patch is based on results of D48222.

Diff Detail

Event Timeline

avt77 created this revision.Jul 31 2018, 7:58 AM
craig.topper edited the summary of this revision. (Show Details)Jul 31 2018, 9:54 AM
RKSimon added inline comments.Jul 31 2018, 11:02 AM
lib/Target/X86/X86SchedBroadwell.td
122

What about the WriteCMPXCHGLd cases?

avt77 added inline comments.Aug 1 2018, 2:05 AM
lib/Target/X86/X86SchedBroadwell.td
122

I decided (as with BT*rr instrs) it'd be better to have separate patches for 'rr' and 'Ld' versions.The problem is:

automaticaly generated WriteCMPXCHGLd does not fit to real numbers that's why it's better to use X86WriteRes for both 'rr' and 'Ld' versions instead of X86SchedWritePair. Should I do it in this patch?
avt77 updated this revision to Diff 158552.Aug 1 2018, 9:20 AM

The WriteCMPXCHGLd was implemented.

craig.topper added inline comments.Aug 1 2018, 9:39 AM
lib/Target/X86/X86InstrInfo.td
2083

If I remember right from arithmetic instructions, this doesn't make the latency additive the way it should be. This should probably be just WriteCMPXCHGRMW implemented as a WriteSequence like we do for WriteALURMW

avt77 added inline comments.Aug 2 2018, 12:54 AM
lib/Target/X86/X86InstrInfo.td
2083

The given implementation is very similar to suggested WriteSequence (in fact it's the same) and keeps all current sched values w/o changes. Could we stay with the current implementation or it's better to re-implement it?

avt77 updated this revision to Diff 160325.Aug 13 2018, 4:29 AM

The code was re-based. We have 2 changed tests for SandyBridge (Generic). I did not find the proper numbers in Intel SDMs or on Agner site that's why I don't know what's beter the new results or the old ones. The old sched model does not use SBPort4 for memory ops and that looks strange for me. Could anyone help me with these tests?

craig.topper added inline comments.Aug 13 2018, 12:08 PM
lib/Target/X86/X86SchedSandyBridge.td
963

My best guess is 6 uops and 8 latency like the other Intel CPUs.

Resource cycles
Port015 - 1 cycle (equivalent to 0156 on other Intel CPUs)
Port05 - 2 cycles (equivalent to the Port06 on other Intel CPUs)
Port4 - 1 cycle
Port23 - 2 cycle

avt77 updated this revision to Diff 160554.Aug 14 2018, 4:51 AM

I fixed sched parameters accordingly to craig.topper suggestions.

craig.topper added inline comments.Aug 14 2018, 12:14 PM
lib/Target/X86/X86SchedSandyBridge.td
963

I didn't notice the regular expression here had changed. The information I gave was for WriteCMPXCHGLd

avt77 updated this revision to Diff 161201.Aug 17 2018, 3:46 AM

I fixed the issue with WriteCMPXCHGLd - tnx to craig.topper.

RKSimon added inline comments.Aug 21 2018, 2:52 AM
lib/Target/X86/X86InstrInfo.td
2083

Wasn't the plan to add a WriteCMPXCHGRMW WriteSequence class?

lib/Target/X86/X86SchedSandyBridge.td
963

Why has the regex changed?

avt77 added inline comments.Aug 21 2018, 7:09 AM
lib/Target/X86/X86InstrInfo.td
2083

It was not the plan because it's the same change. And even more: I tried to switch to WriteCMPXCHGRMW and it did not work here. Should I try it again?

lib/Target/X86/X86SchedSandyBridge.td
963

Because we don't need this SBWriteResGroup81 any more. Maybe I would change the name but I kept it.

craig.topper added inline comments.Aug 21 2018, 11:33 AM
lib/Target/X86/X86SchedSandyBridge.td
118

The port4 and one of the port23 shouldn't be here. They come from WriteRMW that WriteCMPXCHGLd is paired with.

avt77 added inline comments.Aug 22 2018, 2:28 AM
lib/Target/X86/X86SchedSandyBridge.td
118

Oh, yes, you're right. Tnx.

avt77 added inline comments.Aug 22 2018, 5:12 AM
lib/Target/X86/X86SchedSandyBridge.td
118

The problem here is the following: we can't use the paired version because it uses the default values [1,1] but we need 2 cycles for port23. It seems we have to redefine these intrs for SB or to remove the pairing with WriteRMW. I'd prefer to remove the pairing if you don't mind.

craig.topper added inline comments.Aug 22 2018, 11:27 AM
lib/Target/X86/X86SchedSandyBridge.td
118

I'm fine with removing the pairing. But change the name to WriteCMPXCHGRMW

avt77 updated this revision to Diff 162740.Aug 27 2018, 1:38 PM

Now we have WriteCMPXCHGRMW accordingly to craig.topper requirement.

craig.topper added inline comments.Aug 27 2018, 7:14 PM
lib/Target/X86/X86SchedSandyBridge.td
118

Shouldn't this be 6 uops not 7?

avt77 added inline comments.Aug 28 2018, 10:18 AM
lib/Target/X86/X86SchedSandyBridge.td
118

Yes, you suggested 6 in your first comments I'll fix it asap.

avt77 updated this revision to Diff 163009.Aug 28 2018, 10:33 PM

The issue with SB uops was fixed.

This revision is now accepted and ready to land.Aug 29 2018, 12:18 PM
This revision was automatically updated to reflect the committed changes.