Download Raw Diff

Details

Reviewers

RKSimon
craig.topper
courbet
GGanesh
andreadb

Commits

rGd23b6831deb7: [X86][Znver1] Specify Register Files, RCU; FP scheduler capacity.
rL335099: [X86][Znver1] Specify Register Files, RCU; FP scheduler capacity.

Summary

First off: i do not have any access to that processor,
so this is purely theoretical, no benchmarks.

I have been looking into bdver2 scheduling profile, and while cross-referencing
the existing btver2, znver1 profiles, and the reference docs
(Software Optimization Guide for AMD Family {15,16,17}h Processors),
i have noticed that only btver2 scheduling profile specifies these.

Also, there is no mca test coverage.

So i'm wondering if this info should be specified?

Diff Detail

Repository: rL LLVM

Event Timeline

lebedev.ri created this revision.Jun 2 2018, 12:22 PM

Herald added a reviewer: andreadb. · View Herald TranscriptJun 2 2018, 12:22 PM

Herald added a subscriber: gbedwell. · View Herald Transcript

GGanesh added inline comments.Jun 3 2018, 10:30 PM

lib/Target/X86/X86ScheduleZnver1.td
96 ↗	(On Diff #149616)	I am not sure why we include CCR register.
100 ↗	(On Diff #149616)	This number is fine.
111 ↗	(On Diff #149616)	The retire unit is shared between integer and FP ops. In SMT mode it is 96 entry per thread. So, I think we shall consider only 96 entry as a conservative value.
113 ↗	(On Diff #149616)	I assume these are the load/store queue entries. The FPU has 44 entry Load Queue 72 Out of Order Loads 44 entry Store Queue So, we are concerned only about the queues, we can have only 44 marked for LD and ST.

Addressed @GGanesh's review notes.

lebedev.ri added a parent revision: D47697: [llvm-mca] Make sure not to end the test files with an empty line..Jun 4 2018, 12:23 AM

lebedev.ri added inline comments.

lib/Target/X86/X86ScheduleZnver1.td
111 ↗	(On Diff #149616)	Aha, i was wondering how SMT was considered here. But then what about `MicroOpBufferSize` in `SchedMachineModel`? Is that supposed to be keep at `192`?
113 ↗	(On Diff #149616)	I think i would prefer to keep this just as a fixme note for now, since while llvm-mca does have an options to specify the sizes of these queues, it does not seem to read those from the sched model otherwise, i guess because it can't currently be expressed here.

Any further comments?

Split off the baseline into separate differential, rebased.

lebedev.ri added a parent revision: D48190: [MCA][x86][NFC] Add tests for -register-file-stats, -scheduler-stats.Jun 14 2018, 1:04 PM

andreadb added inline comments.Jun 15 2018, 3:34 AM

lib/Target/X86/X86ScheduleZnver1.td
96 ↗	(On Diff #149616)	As a rule of thumb: a register class should only be added if you want to allow those registers to be renamed. In this case, CCR should be added if you want to count EFLAGS renames agains the number of available physical registers in the PRF. If you don't think EFLAGS renames should be counted against the total, then it is correct to remove it. Otherwise, it is incorrect.
111 ↗	(On Diff #149616)	llvm-mca doesn't make assumptions on whether the CPU is in SMT mode or not. Same for the scheduling model, which assumes an optimistic micro-op buffer. For now, it is better to specify the resources seen available by a single thread running on the CPU (i.e. with no other concurrent threads). Basically, I think you should not go for a conservative value here.
113 ↗	(On Diff #149616)	In theory, the processor model could specify load/store queues as a buffered processor resources that are consumed by instructions. In practice, llvm-mca has the concept of `LSUnit`. There is an open bugzilla about how to better describe load/store queues in llvm-mca and the scheduling model.

Diffusion mentioned this in rL334832: [MCA][x86][NFC] Add tests for -register-file-stats, -scheduler-stats.Jun 15 2018, 7:06 AM

Rebased, added CCR back.

lib/Target/X86/X86ScheduleZnver1.td
96 ↗	(On Diff #149616)	I'm not seeing anything in docs, so i'm going to be conservative, and add it back.
111 ↗	(On Diff #149616)	Hmm. Is that documented somewhere? I would have naively expected that what @GGanesh wrote is the approach..
113 ↗	(On Diff #149616)	(That matches what i said)

andreadb added inline comments.Jun 15 2018, 7:53 AM

lib/Target/X86/X86ScheduleZnver1.td
111 ↗	(On Diff #149616)	This limitation was mentioned in the RFC. You cannot possibly make any reasonable assumptions with SMT. The problem is not just the reorder buffer but any other resources which may or may not be competitively shared (or statically/dynamically partitioned). Even if we know that the cpu is multi threaded, there is no way to predict how the other thread will use/consume hardware resources.

lebedev.ri added inline comments.Jun 15 2018, 8:04 AM

lib/Target/X86/X86ScheduleZnver1.td
111 ↗	(On Diff #149616)	Hmm. https://support.amd.com/TechDocs/55723_SOG_Fam_17h_Processors_3.00.pdf Page 35: `The retire queue can hold up to 192 micro ops or 96 per thread in SMT mode.`

@GGanesh any further thoughts?

andreadb added inline comments.Jun 19 2018, 3:56 AM

lib/Target/X86/X86ScheduleZnver1.td
111 ↗	(On Diff #149616)	You shouldn't be making assumptions on SMT and use a conservative value for the retire queue. See my previous comment. Without a proper framework to emulate SMT in llvm-mca, It is better to keep this value to the theoretical maximum. At least, you will get more accurate numbers when there is only a single thread active on the cpu.

lebedev.ri added inline comments.Jun 19 2018, 4:05 AM

lib/Target/X86/X86ScheduleZnver1.td
111 ↗	(On Diff #149616)	You shouldn't be making assumptions on SMT Ah, on the SMT itself too, not only on what happens in SMT mode. I missed that remark initially. Does the scheduling use `RetireControlUnit<>` values, or is it only used in mca right now?

andreadb added inline comments.Jun 19 2018, 4:32 AM

lib/Target/X86/X86ScheduleZnver1.td
111 ↗	(On Diff #149616)	My point is that in an SMT processor, resources (not just the ROB) are shared by multiple hardware threads. Some resources may be competitively shared among the threads. Not knowing what the other thread is doing, makes it impossible to correctly predict how buffers are used. By setting the RCU size to half of its capacity, you are assuming that the processor is conservatively running in SMT mode. However, you are mixing apples with oranges here. You are not saying how other resources should behave in SMT mode. What about the schedulers, register file, dispatch logic in SMT mode? My point is: there is no framework at the moment to deal with SMT in general. So, there is no advantage in using a conservative value for the RCU here (unless you think it better matches your perf results). To answer to your last question: for now, the RetireControlUnit information is only used by mca.

Don't be conservative re RetireControlUnit size.

lib/Target/X86/X86ScheduleZnver1.td
111 ↗	(On Diff #149616)	Thank you for this detailed explanation! To answer to your last question: for now, the RetireControlUnit information is only used by mca. Ok then!

LGTM!

In D47676#1137291, @GGanesh wrote:

LGTM!

Thank you for the review!
Could you please also click the accept button? :)

GGanesh accepted this revision.Jun 19 2018, 11:16 PM

This revision is now accepted and ready to land.Jun 19 2018, 11:16 PM

Closed by commit rL335099: [X86][Znver1] Specify Register Files, RCU; FP scheduler capacity. (authored by lebedevri). · Explain WhyJun 20 2018, 12:05 AM

This revision was automatically updated to reflect the committed changes.

This is an archive of the discontinued LLVM Phabricator instance.

[X86][Znver1] Specify Register Files, RCU; FP scheduler capacity.
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 152027

llvm/trunk/lib/Target/X86/X86ScheduleZnver1.td

llvm/trunk/test/tools/llvm-mca/X86/register-file-statistics.s

llvm/trunk/test/tools/llvm-mca/X86/scheduler-queue-usage.s

This is an archive of the discontinued LLVM Phabricator instance.

[X86][Znver1] Specify Register Files, RCU; FP scheduler capacity.ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 152027

llvm/trunk/lib/Target/X86/X86ScheduleZnver1.td

llvm/trunk/test/tools/llvm-mca/X86/register-file-statistics.s

llvm/trunk/test/tools/llvm-mca/X86/scheduler-queue-usage.s

[X86][Znver1] Specify Register Files, RCU; FP scheduler capacity.
ClosedPublic