This is an archive of the discontinued LLVM Phabricator instance.

[System Model] Introduce a target system model
Needs Review · Public

Authored by greened on Feb 27 2019, 1:37 PM.

Details

Summary

Add the concept of a per-subtarget system model, incorporating information
about caches, execution resources, write-combining buffers and software
prefetching configuration.

This is TableGen-driven so that targets may conveniently define new system
models and associate system models with subtargets.

By default, processor classes use a system model that captures the legacy
values exposed by TargetTransformInfo and/or MCSubtarget and friends.
Targets may opt-in to custom system models by defining them and associating
them in instantiations of the Processor template, similarly to how schedulers
are associated.

This patch is for overall higher-level design discussion, to provide a view of where this is going. Smaller patches to review for actual merging will follow.

Diff Detail

Event Timeline

greened created this revision. Feb 27 2019, 1:37 PM
Herald added a project: Restricted Project. · View Herald Transcript · Feb 27 2019, 1:37 PM

A larger design question I have about this is the proper place to put the software prefetching configuration. Right now it lives at the memory model level, in that a memory model specifies a cache hierarchy along with a software prefetch configuration. I wonder if we should allow a software prefetching configuration for each cache level, as targets might want different policies depending on which cache level they are prefetching into. I don't think we have any examples of that in the codebase today, but I can imagine cases where targets might want it.

Thank you for pushing this forward and sorry for the delay.

Could you add some central high-level documentation about what the memory system model is? E.g. describe that an MCSystemModel has a list of execution resources, memory hierarchies, prefetch configs and write-combining buffers; a cache hierarchy has a total size, line size, associativity, etc. To get the interpretation right, please add more details about every parameter, particularly the prefetch configs. Some examples other than ARM big.LITTLE would be nice as well.

What exactly is a "prefetch config"? Is there a prefetch config for each cache level? Different hardware mechanisms for prefetch (e.g. stride detection or software-established)? Different strategies for inserting prefetch instructions, selectable at compile time?

AFAICS, this patch uses the default model for all targets?

llvm/include/llvm/Analysis/TargetTransformInfo.h
777–784

[bikeshedding] The name should indicate that these methods just return some information. How about isPrefetchingReads/etc.?

1185–1210

It is interesting that getCacheSize/getCacheAssociativity return llvm::Optionals and are cache-level specific, but getCacheLineSize is not.

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
379–383

Could you add some documentation about what these special values mean?

llvm/include/llvm/MC/MCSubtargetInfo.h
98–105

I don't understand the idea of the resolve... API. What is a target overriding it supposed to do? There are no overrides in this patch. Why not provide a default implementation of getCacheSize that targets can override?

llvm/include/llvm/MC/MCSystemModel.h
407–408

[style] These trailing empty comment lines don't seem to be useful. The current LLVM code base does not have them.

444

Could we avoid the static initializer?

465

The type has 'Set' in its name, but is an alias for a vector?

469

Why does it need to be mutable?

llvm/include/llvm/Target/TargetSoftwarePrefetchConfig.td
45–48

ISA instructions or µOps? Why not cycles?

49–53

Isn't this algorithm-dependent, i.e. the size of the loop?

llvm/lib/MC/MCSubtargetInfo.cpp
307

I find this handling of --help strange, but the current MCSubtargetInfo::getSchedModelForCPU does the same thing.

llvm/lib/MC/MCSystemModel.cpp
28–49

Where are these used?

67

[typo] lttle

This takes a while to digest. Some quick remarks for now (also inline):

  • Is there a way to query the number of (automatic) HW prefetchers?
  • Does the interface provide the latency of each cache level (hit)/memory (miss)?
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
383

What is this method supposed to return?
The right value seems to be a property of a specific pair of a loop and a target architecture rather than just the target alone.

Meinersbur added inline comments. Mar 19 2019, 9:58 AM
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
383

It is used by LoopDataPrefetch. TargetTransformInfo.h contains a description:

/// \return The maximum number of iterations to prefetch ahead.  If the
/// required number of iterations is more than this number, no prefetching is
/// performed.
unsigned getMaxPrefetchIterationsAhead() const;

I assume it was added just because the current code base already defines it.

Thank you for pushing this forward and sorry for the delay.

Could you add some central high-level documentation about what the memory system model is? E.g. describe that an MCSystemModel has a list of execution resources, memory hierarchies, prefetch configs and write-combining buffers; a cache hierarchy has a total size, line size, associativity, etc. To get the interpretation right, please add more details about every parameter, particularly the prefetch configs. Some examples other than ARM big.LITTLE would be nice as well.

Will do.

What exactly is a "prefetch config"? Is there a prefetch config for each cache level? Different hardware mechanisms for prefetch (e.g. stride detection or software-established)? Different strategies for inserting prefetch instructions, selectable at compile time?

This is certainly an area for exploration. Currently the model doesn't have a prefetch config for each cache level, but we could make it so. I'm not sure whether that flexibility will be needed. We haven't needed it in the past, but processors are getting a lot more complicated in this area.

The intent is to describe parameters for software prefetching. It doesn't attempt to describe hardware prefetchers but that may be useful depending on the problem at hand. I think that's something we could consider for later.

AFAICS, this patch uses the default model for all targets?

Yes, currently. My intent was not to disrupt how anything currently works as far as what TTI returns for its interfaces. Individual subtargets can then opt-in by defining a non-default model.

This takes a while to digest. Some quick remarks for now (also inline):

Yes, I know it's a big patch. I wanted to provide the whole context but can certainly break it up into smaller pieces for review if that's easier. Would it be useful to post smaller pieces for review/commit but maintain this patch for reference? I don't intend to actually commit all this as one big change.

  • Is there a way to query the number of (automatic) HW prefetchers?

Not currently. It's something we could add later.

  • Does the interface provide the latency of each cache level (hit)/memory (miss)?

The latency of each level is the latency for a hit. I hadn't considered a separate miss latency, as I thought the last "level" would be DRAM and the latency for that would approximate the latency of a full miss. Of course the various cache levels will have a longer miss latency than a direct DRAM access, but I hadn't considered that overhead to be large enough to model. If we think it is, then we can do that.

greened marked 10 inline comments as done. Apr 3 2019, 2:49 PM
greened added inline comments.
llvm/include/llvm/Analysis/TargetTransformInfo.h
777–784

Good point. I think shouldPrefetchReads would better convey what this is supposed to be telling us.

1185–1210

Yeah. I assume that's because on almost every processor, the cache line size is the same throughout the cache hierarchy, so the interface was designed assuming that no cache level parameter was necessary. If there's no cache level parameter, there's no possibility of the user providing a non-existent cache level and thus no need for an llvm::Optional.

However, there have been processors in the past where cache line size varied by level and certainly when you have a heterogeneous system the cache line size will not be uniform across the system.

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
383

Yes, that's exactly right. When I originally posted the RFC I suggested a single prefetch distance in bytes. Others chimed in and said they preferred to think in terms of loop iterations, and that is indeed what LoopDataPrefetch does. LoopDataPrefetch also drove the inclusion of getMinPrefetchStride.

Since I didn't want to affect how current passes work, I added the necessary infrastructure to have the system model specify it. I expect that as we gain experience we may eliminate some of these interfaces.

llvm/include/llvm/MC/MCSubtargetInfo.h
98–105

This is something I don't have in our local implementation, but after posting the RFC several people said they'd like to model GPUs and big.LITTLE-style systems. In such hybrid systems, there is no single "L2 cache," for example. If a client asks the system model, "Give me the size of the L2 cache," what should it do? Return the L2 of the CPU, the L2 of the GPU, or something else? That's the idea of the resolve... stuff. Somebody has to decide what to do, which is what these virtual APIs are suggesting.

SubtargetInfo might not be the right place for this, but lacking a better place for it, I just put it here. I could remove all this for the initial changeset and just return some default with an override as you suggest. I'm definitely open to ideas/opinions here.

llvm/include/llvm/MC/MCSystemModel.h
444

Maybe? I took cues for how the scheduler stuff is implemented and it uses similar static initializers. I'll look into this and see if there is a better way.

469

It doesn't. This is leftover from a time when I thought initCacheInfoCache and friends could be called lazily. I'll rework this.

llvm/include/llvm/Target/TargetSoftwarePrefetchConfig.td
45–48

LoopDataPrefetch thinks in terms of IR Instructions. I'll clarify the comment and name. Maybe we should reconsider how LoopDataPrefetch thinks about things but I'd prefer to leave that for later work. I want to be confident we can model the way things work today before we go changing a bunch of things.

49–53

Yep. Again, this is driven by LoopDataPrefetch.

llvm/lib/MC/MCSubtargetInfo.cpp
307

Right. That's what I used as guidance.

llvm/lib/MC/MCSystemModel.cpp
28–49

They aren't anymore. Will remove.

greened updated this revision to Diff 204841. Jun 14 2019, 1:33 PM

Updated to address comments.

Next week I plan to start submitting smaller changes to build up to what's here. So comments on the overall design of this patch would be great, but it's huge and not the best vehicle for discussing details.

I want to start getting the foundational pieces in, likely starting with abstracting the prefetching stuff via TTI which will hopefully be non-controversial.

greened marked 14 inline comments as done. Jun 14 2019, 1:35 PM
greened edited the summary of this revision. (Show Details)
arsenm added inline comments. Jun 14 2019, 1:40 PM
llvm/include/llvm/CodeGen/BasicTTIImpl.h
523

All of these should probably have address space arguments

greened marked an inline comment as done. Jun 20 2019, 10:04 AM
greened added inline comments.
llvm/include/llvm/CodeGen/BasicTTIImpl.h
523

What would the address space argument specify? What are the use-cases for modeling caches with address spaces? Perhaps this could replace the resolve... APIs. That would be nice.

These are existing interfaces so what would be the best way to transition them?

greened marked an inline comment as done. Jun 20 2019, 10:18 AM

I just posted D63614, the subset of this patch covering only the changes to TTI and related classes.

llvm/include/llvm/CodeGen/BasicTTIImpl.h
523

It's probably more productive to continue this discussion on D63614, which is the TTI part of this patch.

Matt added a subscriber: Matt. Apr 20 2021, 8:27 AM