This is an archive of the discontinued LLVM Phabricator instance.

[X86] Improved sched models for X86 XCHG*rr and XADD*rr instructions
ClosedPublic

Authored by avt77 on Jul 26 2018, 8:41 AM.

Details

Summary

This patch improves sched models for XCHG*rr X86 instrs removing unnecessary redefinitions of instr infos. This patch is based on results of D48222.

Diff Detail

Event Timeline

avt77 created this revision.Jul 26 2018, 8:41 AM

I think this needs mca test coverage first.

llvm/test/tools/llvm-mca/X86$ grep -ri XCHG | wc -l
0

But we have a lot of XCHG tests in other places. For example:

.../llvm/test/CodeGen/X86$ grep -ri XCHG | wc -l
730

Or:

schedule-x86_32.ll:; ATOM-NEXT: xchgl %eax, %eax # sched: [2:1.00]
schedule-x86_32.ll:; ATOM-NEXT: xchgl %ecx, %eax # sched: [2:1.00]
schedule-x86_32.ll:; ATOM-NEXT: xchgl %eax, (%edx) # sched: [3:1.50]
schedule-x86_32.ll:; SLM-LABEL: test_xchg_32:
schedule-x86_32.ll:; SLM-NEXT: xchgl %eax, %eax # sched: [1:0.50]
schedule-x86_32.ll:; SLM-NEXT: xchgl %ecx, %eax # sched: [1:0.50]
schedule-x86_32.ll:; SLM-NEXT: xchgl %eax, (%edx) # sched: [4:2.00]
schedule-x86_32.ll:; SANDY-LABEL: test_xchg_32:
schedule-x86_32.ll:; SANDY-NEXT: xchgl %eax, %eax # sched: [2:1.00]
schedule-x86_32.ll:; SANDY-NEXT: xchgl %ecx, %eax # sched: [2:1.00]
schedule-x86_32.ll:; SANDY-NEXT: xchgl %eax, (%edx) # sched: [6:1.00]
schedule-x86_32.ll:; HASWELL-LABEL: test_xchg_32:
schedule-x86_32.ll:; HASWELL-NEXT: xchgl %eax, %eax # sched: [2:0.75]
schedule-x86_32.ll:; HASWELL-NEXT: xchgl %ecx, %eax # sched: [2:0.75]
schedule-x86_32.ll:; HASWELL-NEXT: xchgl %eax, (%edx) # sched: [9:1.00]
schedule-x86_32.ll:; BROADWELL-LABEL: test_xchg_32:
schedule-x86_32.ll:; BROADWELL-NEXT: xchgl %eax, %eax # sched: [2:0.75]
schedule-x86_32.ll:; BROADWELL-NEXT: xchgl %ecx, %eax # sched: [2:0.75]
schedule-x86_32.ll:; BROADWELL-NEXT: xchgl %eax, (%edx) # sched: [8:1.00]
schedule-x86_32.ll:; SKYLAKE-LABEL: test_xchg_32:
schedule-x86_32.ll:; SKYLAKE-NEXT: xchgl %eax, %eax # sched: [2:0.75]
schedule-x86_32.ll:; SKYLAKE-NEXT: xchgl %ecx, %eax # sched: [2:0.75]
schedule-x86_32.ll:; SKYLAKE-NEXT: xchgl %eax, (%edx) # sched: [10:1.25]

And this patch does not change any of those tests - it's very importent.

avt77 added a comment.Jul 27 2018, 2:39 AM

But of course you're right: we need such mca tests. I'll try to prepare such tests for XCHG* and CMPXCHG* instrs (I've just started to work with CMPXCHG*).

In general - looks good, assuming that after rebasing ontop of D49912 there are no regressions.
A few nits.
The SLM & btver2 - are those placeholders, or those are real values from agner?

lib/Target/X86/X86SchedHaswell.td
128

Nit: align?

lib/Target/X86/X86SchedSkylakeClient.td
114

Nit: align?

lib/Target/X86/X86Schedule.td
122

// Exchange the contents of two operands
?

lib/Target/X86/X86ScheduleBtVer2.td
172

These are from agner?

avt77 added a comment.Jul 28 2018, 4:47 AM

The SLM & btver2 - are those placeholders, or those are real values from agner?

They did not have any notes about XCHG*rr instrs and that means that default values were OK for them. I kept the default values and tests were OK but it's better to check with agner of course. I'll do it.

avt77 added inline comments.Jul 28 2018, 5:08 AM
lib/Target/X86/X86ScheduleBtVer2.td
172

Agner shows different values for r8,r8 and r,r while AMD64_16h_InstrLatency_1.1 shows 'ucode' for all variants. I used the default values because it keeps the current tests untouched. That's our old problem: what should be selected as "right" value?

lebedev.ri added inline comments.Jul 28 2018, 5:13 AM
lib/Target/X86/X86ScheduleBtVer2.td
172

Ideally - what llvm-exegesis says for these.
Realistically, i'd guess the current change - just cleanup, don't change test output - is ok.

Tag XADD instructions with WriteXCHG as well?

avt77 added a comment.Aug 1 2018, 5:02 AM

Tag XADD instructions with WriteXCHG as well?

Should we use WriteXCHGADD in this case?

Tag XADD instructions with WriteXCHG as well?

Should we use WriteXCHGADD in this case?

If you want to rename them that's OK - but XADD is just a form of XCHG (it just happens to perform the sum as well).

avt77 added a comment.Aug 2 2018, 4:22 AM

Tag XADD instructions with WriteXCHG as well?

Here we have the same situation like in CMPXCHG. We can implement it in 2 ways:

  • like it's done in D50070
  • or with help of WriteSequence like craig.topper suggests here.

I think the both implementations are identical but let's select the one.

avt77 updated this revision to Diff 158763.Aug 2 2018, 8:13 AM
avt77 retitled this revision from [X86] Improved sched models for X86 XCHG*rr instructions to [X86] Improved sched models for X86 XCHG*rr and XADD*rr instructions.

XADD*rr instrs were added: I did not add rm version because we should decide how to do it beter:

like in D50070
or with help of WriteSequence suggested by craig.topper

RKSimon added inline comments.Aug 2 2018, 9:34 AM
lib/Target/X86/X86Schedule.td
122

Add

// Compare+Exchange - TODO RMW support.
avt77 updated this revision to Diff 158976.Aug 3 2018, 4:37 AM

The comment was added:

def WriteXCHG : SchedWrite; // Compare+Exchange - TODO RMW support.

I think you're almost there but D49912 must be completed and committed first

avt77 added a comment.Aug 3 2018, 7:37 AM

I think you're almost there but D49912 must be completed and committed first

I'm working on it but every change requires additional testing and time.

avt77 updated this revision to Diff 159665.Aug 8 2018, 2:52 AM

The patch was rebased after D49912.

RKSimon accepted this revision.Aug 8 2018, 8:41 AM

LGTM

This revision is now accepted and ready to land.Aug 8 2018, 8:41 AM
This revision was automatically updated to reflect the committed changes.