
[RFC] Introduce `__attribute__((nontemporal))`.
Abandoned, Public

Authored by mzolotukhin on Aug 20 2015, 6:39 PM.


Summary

Currently there is no way to generate nontemporal memory accesses on some
architectures, e.g. AArch64. Unlike x86, AArch64 doesn't have special
intrinsics for this, and the suggested solution is to use such an attribute
(see ARM ACLE 2.0, section 13.1.6). The attribute would result in
'!nontemporal' metadata in the IR, which would then (hopefully) survive the
optimizations down to the backend, where it would be lowered to a nontemporal
instruction (for AArch64, STNP). I have committed a couple of patches to the
vectorizers so that they preserve this metadata, and it seems that no other
transformation removes it.
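For comparison with the attribute proposed above, the same bulk-store pattern can be sketched in C++ with `__builtin_nontemporal_store`. This is the builtin that eventually replaced this RFC (D12313). This is a minimal sketch, not code from either patch. `stream_copy` is a hypothetical name. The `__has_builtin` guard is an assumption added so the example builds anywhere: it falls back to an ordinary store on compilers without the builtin, for example GCC. Whether a given store is actually lowered to a nontemporal instruction such as STNP is up to the backend.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#ifndef __has_builtin
#define __has_builtin(x) 0  // older compilers: treat every builtin as missing
#endif

// Hypothetical helper: copy `src` into `dst`, using nontemporal stores where
// the compiler provides __builtin_nontemporal_store. The copied data is
// assumed not to be re-read soon, so caching it would only evict more
// useful lines.
void stream_copy(std::vector<float> &dst, const std::vector<float> &src) {
  assert(dst.size() >= src.size());  // caller must size the destination
  float *out = dst.data();
  for (std::size_t i = 0; i < src.size(); ++i) {
#if __has_builtin(__builtin_nontemporal_store)
    // Clang attaches !nontemporal metadata to this store.
    __builtin_nontemporal_store(src[i], out + i);
#else
    out[i] = src[i];  // fallback: an ordinary store with the same result
#endif
  }
}
```

Either path leaves `dst` with the same contents. The hint only changes how the stores interact with the cache.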

So, is introducing a new type attribute the right approach to this problem?

Also, since I don't have much experience with the front-end, I'd appreciate any
help with the patch itself to get it ready to be committed. Specifically, I
currently have the following questions:

  1. What tests should I add (examples would be appreciated)?
  2. How does one implement constraints on how the attribute can be used? What should the constraints be in this case, and how should they be implemented properly?
  3. How can I check whether I covered all places in codegen where this attribute might be used? I seem to cover array-subscript and pointer-dereference expressions, which are probably the only cases I care about, but I could easily miss something.

Any other feedback is also welcome!

Thanks,
Michael


Event Timeline

mzolotukhin retitled this revision to [RFC] Introduce `__attribute__((nontemporal))`.
mzolotukhin updated this object.
mzolotukhin added a subscriber: cfe-commits.

What does it mean to have the attribute applied to non-pointer types like int __attribute__((nontemporal)) i; ? The ACLE doesn't say but making it erroneous might make sense. Perhaps it would be good to have a semantic test which uses __attribute__((nontemporal)).

aaron.ballman added a subscriber: aaron.ballman.

This doesn't seem like a fundamental property of a type, to me. If I understand properly, this has more to do with specific instances of memory access. By making it part of the type, you run into sticky situations that become hard to resolve, such as with templates in C++.

~Aaron

Hi all,

Thanks for the feedback, please find my answers below:

> What does it mean to have the attribute applied to non-pointer types like int __attribute__((nontemporal)) i; ? The ACLE doesn't say but making it erroneous might make sense. Perhaps it would be good to have a semantic test which uses __attribute__((nontemporal)).

David,
That's a good idea. I'm not sure how we should behave in such cases, but simply emitting an error should probably be fine. Should we handle references the same way (int __attribute__((nontemporal)) &i)? I'll update the patch accordingly if we decide to go with type attributes.

This seems like a property of an operation, rather than a property of a type. Have you considered adding a __builtin_nontemporal_store builtin as an alternative?

Richard,
Yes, I've considered a builtin as an alternative. In fact, I started with one because it was easier to implement, but then decided to switch to a type attribute for the following reasons:

  1. ARM ACLE 2.0 mentions the attribute. Although, AFAIU, it's not the final version of the document, I still took it as an argument for a type attribute.
  2. Once we introduce a builtin, we'll have to support it forever (otherwise we could break someone's code). With the attribute the burden is much smaller, because we can simply start ignoring it at any point if we need to: all the code will remain correct and compilable.
  3. We'd need an intrinsic for every type, plus separate intrinsics for loads and stores. With the type attribute, one fits all.
  4. While it's true that this is more a property of an operation than of a type, I think that in real use cases a user would rarely need it on a single operation. Nontemporal operations are usually used for processing bulk volumes of data, and this data is almost always processed as a whole. That's why I think it's fine to mark the entire 'data' as nontemporal. If a user then wants to work with a small subset of it, she can use a regular (not nontemporal) pointer to it.
  5. Personally, I find code using attributes more elegant than code using builtins. Compare:
void foo(float *__attribute__((nontemporal)) dst,
         float *__attribute__((nontemporal)) src1,
         float *__attribute__((nontemporal)) src2) {
  *dst = *src1 + *src2;
}

and

void foo(float *dst, float *src1, float *src2) {
  float s1 = __builtin_nontemporal_load(src1);
  float s2 = __builtin_nontemporal_load(src2);
  __builtin_nontemporal_store(s1 + s2, dst);
}

But that said, in the end I'm open to other alternatives (including builtins), and this thread is just an attempt to find the best option.
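On point 3: as eventually committed in D12313, the builtins are overloaded on the element type. Clang's documentation lists them as `T __builtin_nontemporal_load(T *addr)` and `void __builtin_nontemporal_store(T value, T *addr)` for integer, floating-point and vector `T`, so a single pair covers every element type. The sketch below makes the builtin-based `foo` above compilable. `load_nt`, `store_nt` and `add_ints` are hypothetical helper names. Their ordinary-access fallback is an assumption for compilers that lack the builtins.

```cpp
#include <cassert>

#ifndef __has_builtin
#define __has_builtin(x) 0  // older compilers: treat every builtin as missing
#endif

// Hypothetical wrappers: nontemporal access where available, otherwise an
// ordinary load/store. For the builtin path, T must be an integer,
// floating-point, or vector type.
template <typename T>
T load_nt(T *addr) {
#if __has_builtin(__builtin_nontemporal_load)
  return __builtin_nontemporal_load(addr);
#else
  return *addr;
#endif
}

template <typename T>
void store_nt(T value, T *addr) {
#if __has_builtin(__builtin_nontemporal_store)
  __builtin_nontemporal_store(value, addr);
#else
  *addr = value;
#endif
}

// The builtin-based foo() from the comparison above.
void foo(float *dst, float *src1, float *src2) {
  assert(dst && src1 && src2);
  float s1 = load_nt(src1);
  float s2 = load_nt(src2);
  store_nt(s1 + s2, dst);
}

// The same wrappers work unchanged for another element type.
void add_ints(int *dst, int *a, int *b) {
  store_nt(load_nt(a) + load_nt(b), dst);
}
```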

> This doesn't seem like a fundamental property of a type, to me. If I understand properly, this has more to do with specific instances of memory access. By making it part of the type, you run into sticky situations that become hard to resolve, such as with templates in C++.

Aaron,
As far as I understand, type attributes don't result in such complications (as opposed to type qualifiers, e.g. __restrict__). That is, they don't change the canonical type; they only add some 'sugar' to it. I.e. float *__attribute__((nontemporal)) and float * would behave as the same type in templates and name mangling. Please correct me if I'm wrong here.

Thanks,
Michael

> Aaron,
> As far as I understand, type attributes don't result in such complications (as opposed to type qualifiers, e.g. __restrict__). That is, they don't change the canonical type; they only add some 'sugar' to it. I.e. float *__attribute__((nontemporal)) and float * would behave as the same type in templates and name mangling. Please correct me if I'm wrong here.

You are correct that type attributes do not change the canonical type, but perhaps I didn't explain the complications properly. For instance, if I wanted to store a std::vector of these nontemporal objects, I could not do so without losing the type attribute information. With a builtin, I could instead push the temporality decision to the operations on the vector's elements.

~Aaron

Oh, I see. So, you meant something like this?

void foo(std::vector<float * __attribute__((nontemporal))> av, float * b, int N) {
  for (auto a: av)      // << `a` doesn't have nontemporal attribute here
    for (int i = 0; i < N; i++)
      a[i] = b[i]+1;
}

One can easily work around it by using an explicit type here (float * __attribute__((nontemporal)) instead of auto), but I agree that the disappearing attribute might surprise the user. Do you think this would be a frequent case?
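Aaron's suggestion can be sketched by moving the hint from the pointer type to the store itself. A plain `auto` loop variable then loses nothing. This is a hypothetical rewrite of the loop above, not code from either patch. `fill_rows` is a made-up name, and the vector is passed by const reference instead of by value. The `__has_builtin` fallback to an ordinary store is an assumption for compilers without `__builtin_nontemporal_store`.

```cpp
#include <cassert>
#include <vector>

#ifndef __has_builtin
#define __has_builtin(x) 0  // older compilers: treat every builtin as missing
#endif

// Hypothetical rewrite of foo(): the vector holds plain float pointers and
// the nontemporal hint is attached to each store, so `auto` drops nothing.
void fill_rows(const std::vector<float *> &av, const float *b, int N) {
  assert(N >= 0);
  for (auto a : av)  // `a` is a plain float*; there is no type attribute to lose
    for (int i = 0; i < N; i++) {
#if __has_builtin(__builtin_nontemporal_store)
      __builtin_nontemporal_store(b[i] + 1, &a[i]);
#else
      a[i] = b[i] + 1;  // fallback: ordinary store
#endif
    }
}
```

The trade-off is the one discussed in this thread: every access site has to opt in explicitly, but the choice survives templates, containers and type deduction.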

BTW, there are other type attributes that suffer from the same issue, e.g. vector_size. What was the rationale for making them type attributes?

> Oh, I see. So, you meant something like this?
>
> void foo(std::vector<float * __attribute__((nontemporal))> av, float * b, int N) {
>   for (auto a: av)      // << `a` doesn't have nontemporal attribute here
>     for (int i = 0; i < N; i++)
>       a[i] = b[i]+1;
> }
>
> One can easily work around it by using an explicit type here (float * __attribute__((nontemporal)) instead of auto), but I agree that the disappearing attribute might surprise the user. Do you think this would be a frequent case?

Yes, that's along the lines of what I was thinking. There are also other questions, such as whether a user would expect this code to work:

template <typename Ty>
void f(Ty *ptr);

template <typename Ty>
void f(Ty * __attribute__((nontemporal)) ptr);

I honestly don't know enough about nontemporal object usage patterns to really have a gut feeling for what patterns are likely to appear in the wild.
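Suppose the attribute is only sugar on `Ty *`, as discussed above. Then the two declarations of `f` would name the same parameter type and would declare one template rather than two overloads. The sketch below illustrates this by analogy, using a plain alias template (the hypothetical `nt_ptr`) as a stand-in for the attributed pointer type. It shows how type sugar behaves in general, not how the proposed attribute was implemented.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for `Ty * __attribute__((nontemporal))`: an alias
// adds a new spelling but no new canonical type.
template <typename Ty>
using nt_ptr = Ty *;

static_assert(std::is_same<nt_ptr<float>, float *>::value,
              "sugar does not create a distinct type");

template <typename Ty>
int f(Ty *ptr);         // first declaration

template <typename Ty>
int f(nt_ptr<Ty> ptr);  // redeclares the same template; not an overload

template <typename Ty>
int f(Ty *ptr) {        // the single definition both spellings refer to
  assert(ptr != nullptr);
  return 1;
}
```

Under this reading, a user who expects the second `f` to be selected for nontemporal pointers would be surprised: both spellings reach the same function.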

> BTW, there are other type attributes that suffer from the same issue, e.g. vector_size. What was the rationale for making them type attributes?

The usual rationale is that these attributes are targeting C code more than C++, or that the C++ use cases that would be strange to a user are unlikely to happen with realistic code. The discussion that's come up in the past when we touch on these is that Clang could perhaps use a pluggable type system that allows for more fine-grained control on whether an attribute participates as part of a type or not. A production-quality pluggable type system is a pretty large undertaking, and it's a bit research-y at this point, so I'm not proposing anything like that.

Other questions that help decide this are whether you should be able to overload on the type attribute, specialize templates on it, whether it affects type identity, etc.

~Aaron

Thanks for the feedback everyone!
I think at this point I'll go back to builtins, then. My original patch didn't have type overloading, so I'll need some time to add it. We can return to type attributes later if we'd like to.

And do I understand correctly that we are talking about target-independent builtins?

Hi,

I implemented a builtin-based version in D12313. Could you please take a look?

Thanks,
Michael

mzolotukhin abandoned this revision. Sep 10 2015, 9:29 PM

We decided to go with an alternative approach: builtins instead of type attributes. The corresponding patch is D12313, and it has already been reviewed and committed.