This is an archive of the discontinued LLVM Phabricator instance.

[System Model] [TTI] Add TTI interfaces for write-combining buffers
Needs Review, Public

Authored by greened on Oct 10 2019, 8:22 AM.

Details

Summary

Add interfaces for subtargets to return the number of write-combining buffers available. Also provide TTI interfaces that delegate to the subtarget interface.
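The delegation described above might look roughly like the following. This is only a sketch, not the code from the patch: the class names `SubtargetSketch`, `ExampleCPU`, and `TTISketch`, the method name `getNumWriteCombiningBuffers`, the "0 means unknown" default, and the value 10 are all assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for a subtarget description; not an LLVM class.
class SubtargetSketch {
public:
  virtual ~SubtargetSketch() = default;

  // Number of write-combining buffers. The default of 0 is an assumed
  // convention meaning "unknown"; targets override it.
  virtual unsigned getNumWriteCombiningBuffers() const { return 0; }
};

// Hypothetical target that reports a made-up buffer count.
class ExampleCPU final : public SubtargetSketch {
public:
  unsigned getNumWriteCombiningBuffers() const override { return 10; }
};

// Hypothetical TTI-like wrapper that simply forwards to the subtarget.
class TTISketch {
  const SubtargetSketch &Subtarget;

public:
  explicit TTISketch(const SubtargetSketch &ST) : Subtarget(ST) {}

  unsigned getNumWriteCombiningBuffers() const {
    return Subtarget.getNumWriteCombiningBuffers();
  }
};
```

A pass would then query the TTI-like object rather than the subtarget directly, so it does not need to know which target it is compiling for.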

Event Timeline

greened created this revision. Oct 10 2019, 8:22 AM

How do you imagine that we'd use this? Do we need some kind of size to go along with this?

See the Intel optimization guide, section 3.6.9.

https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf

Basically, this information can be used to inform loop transformations as well as use of non-temporal instructions. A write-combining buffer is not the same as a store buffer. A write-combining buffer is always one cache line in size, so I don't think we need size information.

Alright, thanks. First, we should document this in the interface. Instead of just saying:

\return the number of write-combining buffers.

we might say something like:

\return the number of write-combining buffers. A write-combining buffer is a per-core resource used for collecting writes to a particular cache line before further processing those writes using other parts of the memory subsystem.

We already have getCacheLineSize(), so we know how big each buffer is, but we don't currently have a way to account for the number of hardware threads per core, right? Don't we need that to estimate how many write-combining buffers we get for the current hardware thread? (Presumably, we'd want the same thing to use the total-cache-size functions too, because we need to generate code assuming a working-set size per thread?)

The Intel optimization guide talks about using this number to drive loop distribution, where we don't update more arrays (cache lines) at a time than can fit into the thread's WC buffers. Is this what you had in mind?
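To make the loop-distribution idea concrete, here is a rough sketch of the grouping step only: given the arrays a loop writes, split them so that no distributed loop writes more streams than the thread has WC buffers. The function name `partitionStoreStreams` and the use of plain strings to name arrays are hypothetical, and a real transformation would also have to respect data dependences, which this ignores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: split the arrays written by one loop into groups of at
// most BuffersPerThread entries. Each group would become one distributed loop.
// Dependences between statements are NOT considered here.
std::vector<std::vector<std::string>>
partitionStoreStreams(const std::vector<std::string> &Streams,
                      unsigned BuffersPerThread) {
  assert(BuffersPerThread > 0 && "need at least one buffer per thread");

  std::vector<std::vector<std::string>> Groups;
  for (std::size_t Begin = 0; Begin < Streams.size();
       Begin += BuffersPerThread) {
    const std::size_t End =
        std::min<std::size_t>(Streams.size(), Begin + BuffersPerThread);

    std::vector<std::string> Group;
    for (std::size_t I = Begin; I < End; ++I)
      Group.push_back(Streams[I]);
    Groups.push_back(std::move(Group));
  }
  return Groups;
}
```

With five written arrays and a budget of two buffers, this yields three loops writing two, two, and one array respectively.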

we might say something like:

\return the number of write-combining buffers. A write-combining buffer is a per-core resource used for collecting writes to a particular cache line before further processing those writes using other parts of the memory subsystem.

Will do.

We already have getCacheLineSize(), so we know how big each buffer is, but we don't currently have a way to account for the number of hardware threads per core, right? Don't we need that to estimate how many write-combining buffers we get for the current hardware thread? (Presumably, we'd want the same thing to use the total-cache-size functions too, because we need to generate code assuming a working-set size per thread?)

This is something that will become available once more bits of the system model are implemented. The model can specify things like number of cores and threads per core. The subtarget will be able to examine its execution resource configuration and return an appropriate number. After this patch makes it through I will be in a place where I can start posting the TableGen changes to generate models and then post the TableGen model classes after that. At that point targets can define their own models and away we go.
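As an illustration of the kind of calculation a subtarget could do once the model exposes threads per core, here is a minimal sketch. The struct `CoreModelSketch`, its fields, and the function `buffersPerHardwareThread` are hypothetical, and the even split across threads is an assumption: real hardware may share buffers dynamically rather than partition them.

```cpp
#include <cassert>

// Hypothetical per-core description; not one of the planned TableGen classes.
struct CoreModelSketch {
  unsigned NumWriteCombiningBuffers; // buffers available per core
  unsigned ThreadsPerCore;           // hardware threads sharing the core
};

// Assumed policy: buffers are divided evenly among the core's hardware
// threads, rounding down.
unsigned buffersPerHardwareThread(const CoreModelSketch &Core) {
  assert(Core.ThreadsPerCore > 0 && "a core has at least one thread");
  return Core.NumWriteCombiningBuffers / Core.ThreadsPerCore;
}
```

A subtarget could return this per-thread figure from its write-combining-buffer query, so that passes budget for one thread rather than for the whole core.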

The Intel optimization guide talks about using this number to drive loop distribution, where we don't update more arrays (cache lines) at a time than can fit into the thread's WC buffers. Is this what you had in mind?

Yes. It's useful for anything that cares about performance of writes to memory.

greened updated this revision to Diff 231945. Dec 3 2019, 10:27 AM

Added comment explaining what a write-combining buffer is and one possibility of how to use this information.

Ok, I think I've addressed all the comments so this is ready for another review. Thanks!

Ping. This is ready for another review.

Seems fine to me, @hfinkel?

greened updated this revision to Diff 234081. Dec 16 2019, 8:40 AM

Updated to latest master.

arsenm resigned from this revision. Feb 13 2020, 4:53 PM
greened added a comment. Edited Feb 18 2020, 11:34 AM

@jdoerfert @hfinkel should I consider Johannes' comment an LGTM? This has been waiting for two months now.

Matt added a subscriber: Matt. Apr 20 2021, 8:22 AM