This patch improves sched models for CMPXCHG* X86 instrs removing unnecessary redefinitions of instr infos. This patch is based on results of D48222.
Details
Diff Detail
Event Timeline
lib/Target/X86/X86SchedBroadwell.td | ||
---|---|---|
122–126 | What about the WriteCMPXCHGLd cases? |
lib/Target/X86/X86SchedBroadwell.td | ||
---|---|---|
122–126 | I decided (as with BT*rr instrs) it'd be better to have separate patches for 'rr' and 'Ld' versions.The problem is: automaticaly generated WriteCMPXCHGLd does not fit to real numbers that's why it's better to use X86WriteRes for both 'rr' and 'Ld' versions instead of X86SchedWritePair. Should I do it in this patch? |
lib/Target/X86/X86InstrInfo.td | ||
---|---|---|
2083 | If I remember right from arithmetic instructions, this doesn't make the latency additive the way it should be. This should probably be just WriteCMPXCHGRMW implemented as a WriteSequence like we do for WriteALURMW |
lib/Target/X86/X86InstrInfo.td | ||
---|---|---|
2083 | The given implementation is very similar to suggested WriteSequence (in fact it's the same) and keeps all current sched values w/o changes. Could we stay with the current implementation or it's better to re-implement it? |
The code was re-based. We have 2 changed tests for SandyBridge (Generic). I did not find the proper numbers in Intel SDMs or on Agner site that's why I don't know what's beter the new results or the old ones. The old sched model does not use SBPort4 for memory ops and that looks strange for me. Could anyone help me with these tests?
lib/Target/X86/X86SchedSandyBridge.td | ||
---|---|---|
965 | My best guess is 6 uops and 8 latency like the other Intel CPUs. Resource cycles |
lib/Target/X86/X86SchedSandyBridge.td | ||
---|---|---|
965 | I didn't notice the regular expression here had changed. The information I gave was for WriteCMPXCHGLd |
lib/Target/X86/X86InstrInfo.td | ||
---|---|---|
2083 | It was not the plan because it's the same change. And even more: I tried to switch to WriteCMPXCHGRMW and it did not work here. Should I try it again? | |
lib/Target/X86/X86SchedSandyBridge.td | ||
965 | Because we don't need this SBWriteResGroup81 any more. Maybe I would change the name but I kept it. |
lib/Target/X86/X86SchedSandyBridge.td | ||
---|---|---|
120 | The port4 and one of the port23 shouldn't be here. They come from WriteRMW that WriteCMPXCHGLd is paired with. |
lib/Target/X86/X86SchedSandyBridge.td | ||
---|---|---|
120 | Oh, yes, you're right. Tnx. |
lib/Target/X86/X86SchedSandyBridge.td | ||
---|---|---|
120 | The problem here is the following: we can't use the paired version because it uses the default values [1,1] but we need 2 cycles for port23. It seems we have to redefine these intrs for SB or to remove the pairing with WriteRMW. I'd prefer to remove the pairing if you don't mind. |
lib/Target/X86/X86SchedSandyBridge.td | ||
---|---|---|
120 | I'm fine with removing the pairing. But change the name to WriteCMPXCHGRMW |
lib/Target/X86/X86SchedSandyBridge.td | ||
---|---|---|
120 | Shouldn't this be 6 uops not 7? |
lib/Target/X86/X86SchedSandyBridge.td | ||
---|---|---|
120 | Yes, you suggested 6 in your first comments I'll fix it asap. |
If I remember right from arithmetic instructions, this doesn't make the latency additive the way it should be. This should probably be just WriteCMPXCHGRMW implemented as a WriteSequence like we do for WriteALURMW