In r247104 I added the builtins for generating non-temporal memory operations,
but I have since realized that they lack documentation. This patch adds some.
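For readers unfamiliar with the builtins being documented, here is a minimal sketch of how they might be used. It assumes the Clang forms `__builtin_nontemporal_store(value, ptr)` and `__builtin_nontemporal_load(ptr)`; the names `NT_STORE`, `NT_LOAD` and `stream_copy` are hypothetical and exist only for this illustration, and the plain-access fallback is an assumption added so that the sketch also compiles with compilers that lack the builtins.

```c
#include <assert.h>
#include <stddef.h>

/* Compilers without __has_builtin get a stub that always reports "no". */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* NT_STORE / NT_LOAD are hypothetical wrapper names for this sketch only.
   They use the non-temporal builtins when available and fall back to
   ordinary memory accesses otherwise. */
#if __has_builtin(__builtin_nontemporal_store) && \
    __has_builtin(__builtin_nontemporal_load)
#define NT_STORE(val, ptr) __builtin_nontemporal_store((val), (ptr))
#define NT_LOAD(ptr) __builtin_nontemporal_load((ptr))
#else
#define NT_STORE(val, ptr) (*(ptr) = (val))
#define NT_LOAD(ptr) (*(ptr))
#endif

/* Copy n ints from src to dst, hinting that the data is not expected to be
   reused soon and so need not be kept in the cache. */
void stream_copy(int *dst, const int *src, size_t n) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; ++i) {
        int v = NT_LOAD(&src[i]);
        NT_STORE(v, &dst[i]);
    }
}
```

The non-temporal hint only affects how the target may treat the cache; the values loaded and stored are the same as with ordinary accesses, and whether a target actually emits non-temporal instructions is up to its backend.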
docs/LanguageExtensions.rst | ||
---|---|---|
1802–1807 | This seems to make the feature essentially useless, since you cannot guarantee that the address register is set up sufficiently far before the non-temporal load. Should the compiler not be required to insert the necessary barrier itself in this case? |
docs/LanguageExtensions.rst | ||
---|---|---|
1802–1807 | Yes, we can require targets to use the corresponding NT instructions only when it is safe, and then remove this remark from the documentation. For ARM64 that would mean either not emitting LDNP at all or conservatively emitting a barrier before each LDNP (which probably removes all performance benefit of using it). So yes, non-temporal loads would be useless on this target. But I think we want to keep the builtin for NT loads, as it is a generic feature, not an ARM64-specific one. It can be used on other targets: for example, we can use it in the x86 stream builtins and hopefully simplify their current implementation. I don't know about non-temporal operations on other targets, but if there are any, they can use it right out of the box. |
- Remove paragraph about changing program behavior (since we shouldn't change it anyway).
- "to generate" -> "generation of"