Page MenuHomePhabricator

[Attributor] Deduce "nosync" function attribute.
ClosedPublic

Authored by sstefan1 on May 31 2019, 7:50 PM.

Details

Summary

Introduce and deduce "nosync" function attribute to indicate that a function does not synchronize with another thread in a way that other thread might free memory.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
sstefan1 updated this revision to Diff 204510.Jun 13 2019, 6:20 AM
  • Making nosync an enum attribute.
sstefan1 updated this revision to Diff 204513.Jun 13 2019, 6:29 AM

fixing diff

Sorry for my delayed review, I want to move this ahead now and commit it.

Can you make sure this works on top-of-trunk (origin/master), hence, rebase it please. Also make sure make check-all works without problems.

llvm/lib/Transforms/IPO/Attributor.cpp
93 ↗(On Diff #204513)

This should now be checked in the switch below, it is an enum attribute now.

281 ↗(On Diff #204513)

Delete this please.

llvm/test/Transforms/FunctionAttrs/nosync.ll
2 ↗(On Diff #204513)

You need to enable the attributor explicitly, for now.

Just realized that the basic attribute test in test/Bitcode/attributes.ll is missing, see https://reviews.llvm.org/D49165#change-K8gwLFRSXwEe for an example.

jfb added a comment.Mon, Jun 24, 9:59 AM

Please add tests for the things I mention in comments, as well as:

  • relaxed volatile atomic load / store
  • inline assembly
llvm/include/llvm/Transforms/IPO/Attributor.h
381 ↗(On Diff #204513)

"Underlying"

llvm/lib/Transforms/IPO/Attributor.cpp
314 ↗(On Diff #204513)

A fence with singlethread sync scope doesn't sync with other threads, even if seq_cst.

331 ↗(On Diff #204513)

We probably want llvm_unreachable here, so the code gets updated if we add new atomic operations.

341 ↗(On Diff #204513)

You're missing memcpy and similar intrinsics, I think you want to handle them here and not in the generic intrinsic handling.

llvm/test/Transforms/FunctionAttrs/nosync.ll
5 ↗(On Diff #204513)

"designet"?

sstefan1 updated this revision to Diff 206312.Mon, Jun 24, 3:30 PM
  • addressing comments.
arsenm added inline comments.Mon, Jun 24, 3:42 PM
llvm/docs/LangRef.rst
1476–1478 ↗(On Diff #203655)

Is it then disallowed to merge any calls that aren't nosync? e.g.

if (foo)
  bar(x) readnone
else
  bar(y) readnone

is no longer legal to combine these as

bar(foo ? x : y) readnone
sstefan1 updated this revision to Diff 206325.Mon, Jun 24, 4:35 PM
  • small fix
  • added include
  • Fix cast
In D62766#1555926, @jfb wrote:

Please add tests for the things I mention in comments, as well as:

  • relaxed volatile atomic load / store
  • inline assembly

Only added one test for now. I will add more tomorrow.

Sorry for my delayed review, I want to move this ahead now and commit it.

Can you make sure this works on top-of-trunk (origin/master), hence, rebase it please. Also make sure make check-all works without problems.

@jdoerfert As for the intrinsics. I added checks for memcpy, memmove & memset as @jfb suggested. What do you think? I also kept the FIXME comment which can indicate that we might take a different approach. I did make check-all there were few problems with some FunctionAttr tests not checking for nosync attributes. I will fix that tomorrow. Also I seem to have a problem with a test/Bitcode/attributes.ll with nobuiltin attribute.

sstefan1 updated this revision to Diff 206764.Wed, Jun 26, 4:43 PM
  • updated tests
nhaehnle requested changes to this revision.Thu, Jun 27, 1:52 AM
nhaehnle added a subscriber: nhaehnle.

This does seem useful, although the description is overly narrow (what does nosync on its own have to do with freeing memory?).

I also think that the definition of nosync needs some work, as just "synchronization" is a rather vague term. Can you define it in terms of fences and atomic instructions instead, e.g. by saying that a nosync function does not perform such operations (or some subset of such operations)?

llvm/docs/LangRef.rst
1476–1478 ↗(On Diff #203655)

No, I think that would still be allowed. The sync (aka not-nosync) functions have a potential side effect in terms of the memory model, but it's the same side effect in either case since the memory model at this point doesn't care about subgroups.

I guess you're thinking of subgroup operations, but the issue with those is that the set of threads with which communication occurs is a function of where the operation occurs in control flow. It makes sense to keep that issue separate from this attribute.

llvm/test/Transforms/FunctionAttrs/nosync.ll
293–295 ↗(On Diff #206764)

The negative check line here is missing.

This revision now requires changes to proceed.Thu, Jun 27, 1:52 AM

This does seem useful, although the description is overly narrow (what does nosync on its own have to do with freeing memory?).

The idea was to use this with nofree for dereferencable, like @hfinkel proposed in this email.

I also think that the definition of nosync needs some work, as just "synchronization" is a rather vague term. Can you define it in terms of fences and atomic instructions instead, e.g. by saying that a nosync function does not perform such operations (or some subset of such operations)?

I will update the definition with more details. Maybe I should put that in patch description instead of the current (narrow) one?

jdoerfert added inline comments.Thu, Jun 27, 12:44 PM
llvm/docs/LangRef.rst
1476–1478 ↗(On Diff #203655)
  1. readnone implies nosync in my opinion.
  2. if we forget about readnone in the example, I think the above merge is still legal.
arsenm added inline comments.Thu, Jun 27, 1:01 PM
llvm/docs/LangRef.rst
1482–1484 ↗(On Diff #206764)

This should mention that synchronization means through some kind of memory side-effect. This needs to be distinguished from a cross-lane operations, which could be interpreted as a kind of of synchronization where treating it as a memory dependence is not sufficient

sstefan1 updated this revision to Diff 206925.Thu, Jun 27, 1:46 PM
  • changed nosync LangRef definition

@arsenm I used most of your comment/suggestion.

jfb added a comment.Sat, Jun 29, 3:03 PM

You should also add a test function with inline assembly.

llvm/lib/Transforms/IPO/Attributor.cpp
315 ↗(On Diff #206925)

"then"

378 ↗(On Diff #206925)

"improve"

llvm/test/Transforms/FunctionAttrs/nosync.ll
317 ↗(On Diff #206925)

I don't think you can generally treat intrinsics as nosync. Unless you know they're actually nosync you should assume that intrinsics might synchronize.

For example:

int a;
void i_totally_sync() {
 __builtin_ia32_clflush(&a);
}

Corresponds to:

tail call void @llvm.x86.sse2.clflush(i8* bitcast (i32* @a to i8*))

You should have a test for this, and it should definitely *not* be nosync.

The other option here is to go and add a field to all intrinsics, so when creating a new one we have to figure out whether it'll definitely sync, maybe sync, or never sync. I don't think that's in scope for this patch.

nhaehnle added inline comments.Mon, Jul 1, 3:10 AM
llvm/docs/LangRef.rst
1481–1488 ↗(On Diff #206925)

Thanks, I think this is better, but there are still some problems:

  • There are no relaxed atomics in LLVM, only unordered, monotonic, and stronger orderings.
  • What about fences?

I would put the part about cross-lane operations at the end and rephrase it slightly for clarity. Suggestion:

This attribute is only concerned with synchronization through memory operations and is therefore orthogonal to cross-lane and convergent operations. In particular, an operation such as a barrier can be convergent but also nosync.

Assuming we can agree about the actual statement of that last sentence...

jdoerfert added inline comments.Mon, Jul 1, 1:31 PM
llvm/docs/LangRef.rst
1481–1488 ↗(On Diff #206925)
>   This attribute is only concerned with synchronization through memory operations and is therefore orthogonal to cross-lane and convergent operations. In particular, an operation such as a barrier can be convergent but also nosync.

Assuming we can agree about the actual statement of that last sentence...

This proposed change, and the one requested earlier and integrated (sync goes through memory), are problematic.
I first though they are fine but they will probably make the attribute unusable.

An alternative proposal would be:

This function attribute indicates that the function does not communicate
(synchronize) with another thread through memory or other well-defined means.
Synchronization is considered possible in the presence of `atomic` accesses that enforce an order, thus not "unordered" and "monotonic", `volatile` accesses, as well as `convergent` function calls. Note that through the latter non-memory communication, e.g., cross-lane operations, is also considered synchronization. If an annotated function does ever synchronize with another thread, the behavior is undefined.

If this is where we are heading, we need to make sure we test:

  1. non-convergent does not allow nosync, e.g., readnone does not imply nosync
  2. readnone and non-convergent does imply nosync

@arsenm, @nhaehnle. @jfb, what do you think?

llvm/test/Transforms/FunctionAttrs/nosync.ll
317 ↗(On Diff #206925)

I don't think you can generally treat intrinsics as nosync. Unless you know they're actually nosync you should assume that intrinsics might synchronize.

Good point. Maybe the best way (for now and in general) is to "not look for" intrinsics. Use the same logic for all instructions. That is, if it is a call and not annotated as no-sync it may-sync. The test with llvm.cos can be adjusted by adding readnone to the decleration of`llvm.cos`.

The other option here is to go and add a field to all intrinsics, so when creating a new one we have to figure out whether it'll definitely sync, maybe sync, or never sync. I don't think that's in scope for this patch.

Agreed, we will have to do that for various attributes at some point (soon) but not in this patch.

jfb added inline comments.Mon, Jul 1, 1:36 PM
llvm/test/Transforms/FunctionAttrs/nosync.ll
317 ↗(On Diff #206925)

I'd still accept the volatile mem* intrinsics as is already done, but otherwise yeah intrinsics should be assumed to synchronize.

arsenm added inline comments.Mon, Jul 1, 2:05 PM
llvm/docs/LangRef.rst
1481–1488 ↗(On Diff #206925)

What makes it unusable exactly?

This wording confuses me:

Note that through the latter non-memory communication, e.g., cross-lane operations, is also considered synchronization.

I'm not 100% comfortable specifically referring to convergent, since I'm still worried about the yet-to-be-defined anticonvergent attribute. Though it is hard to define something around an unsolved problem.

This phrasing also implies to me that call site merging is not legal, which is what I thought you were trying to avoid.

Conclusion 2 sounds OK to me. Conclusion 1 sounds like the opposite of what the goal is?

jdoerfert added inline comments.Mon, Jul 1, 4:02 PM
llvm/docs/LangRef.rst
1481–1488 ↗(On Diff #206925)

What makes it unusable exactly?

nosync would then still allow non-memory synchronization which it shouldn't.

I think the IRC conversation helped. What I want us to have is:

nosync means no synchronization/communication between "threads". Any potential synchronization, e.g., through memory or registers, precludes nosync.

llvm/test/Transforms/FunctionAttrs/nosync.ll
317 ↗(On Diff #206925)

Agreed.

nhaehnle added inline comments.Tue, Jul 2, 12:18 AM
llvm/docs/LangRef.rst
1481–1488 ↗(On Diff #206925)

I think the IRC conversation helped.

Is that recorded somewhere?

nosync would then still allow non-memory synchronization which it shouldn't.

This is questionable. There are convergent operations that do not imply synchronization. For example, some of the llvm.amdgcn.image.sample.* intrinsics are convergent, but they do not imply any kind of synchronization in the memory model.

In Vulkan/SPIR-V parlance, the intrinsic may have an implied control barrier, but it definitely has no memory barrier (the control barrier part isn't fully spec'd out in SPIR-V either at the moment).

For the initial intended usage of this attribute: if there is a pointer that you know to be dereferencable before the image sample, then you still know it to be dereferencable afterwards. So it seems reasonable to want the intrinsic to be marked both convergent and nosync.

That said, I'm okay with this part of it:

If this is where we are heading, we need to make sure we test:

  1. non-convergent does not allow nosync, e.g., readnone does not imply nosync
  2. readnone and non-convergent does imply nosync

... so long as it's understood that those are "merely" the rules for the attributor.

jdoerfert added inline comments.Tue, Jul 2, 3:25 PM
llvm/docs/LangRef.rst
1481–1488 ↗(On Diff #206925)

This is questionable. There are convergent operations that do not imply synchronization. For example, some of the llvm.amdgcn.image.sample.* intrinsics are convergent, but they do not imply any kind of synchronization in the memory model.

For me, nosync has to mean absence of any kind of synchronization, including control barriers.

For the initial intended usage of this attribute: if there is a pointer that you know to be dereferencable before the image sample, then you still know it to be dereferencable afterwards. So it seems reasonable to want the intrinsic to be marked both convergent and nosync.

I see why you want this but I don't think that is what it should mean. nosync should not allow control synchronization as it will inevitably cause problems down the road. So, let me rephrase my earlier comment:

By default, we have to assume a convergent & readnone function might cause control synchronization between threads and is therefore not nosync. However, a function can be convergent and nosync. Finally, a function that is not-convergent and readonly is nosync.

hfinkel added inline comments.Tue, Jul 2, 4:39 PM
llvm/docs/LangRef.rst
1481–1488 ↗(On Diff #206925)

However, a function can be convergent and nosync.

I think that this is important. We can mark convergent intrinsics that don't provide synchronizing semantics as nosync.

In general, we need a nosync attribute to mean that, in the function marked as nosync, the current thread cannot complete communication with any other threads (e.g., it can't send a value to another thread).

The interesting thing, to me, that has been highlighted in this discussion is: convergent functions, by default, can have things like inter-thread register shuffles, but are otherwise readnone, and so must be excluded from automated nosync deduction (because, without accessing memory at all, communicate values to other threads).

sstefan1 updated this revision to Diff 207893.Wed, Jul 3, 2:38 PM
  • Add inline assembly test.
sstefan1 marked 2 inline comments as done.Wed, Jul 3, 2:48 PM
sstefan1 added inline comments.
llvm/test/Transforms/FunctionAttrs/nosync.ll
317 ↗(On Diff #206925)

Replaced llvm.cos test with inline assembly. llvm.cos test was my mistake, since with the current implementation it would be considered sync.

293–295 ↗(On Diff #206764)

I skipped the negative check line, because for now only nosync is deduced and if function is not nosync there will be no Function Attrs at all. Once we have some more attributes, I'll add the negative check. Is that ok with you?

jdoerfert added inline comments.Wed, Jul 3, 3:02 PM
llvm/test/Transforms/FunctionAttrs/nosync.ll
317 ↗(On Diff #206925)

The test comment is off, and again add nounwind to allow for the check lines

293–295 ↗(On Diff #206764)

Just give the function the nounwind attribute and then add the negative check line.

sstefan1 updated this revision to Diff 207903.Wed, Jul 3, 3:34 PM
  • fixed tests & improved definition of nosync in langRef
sstefan1 updated this revision to Diff 208469.EditedMon, Jul 8, 11:35 AM
  • non-convergent and readnone check.
  • Changed handling of intrinsics.
  • Added more tests.

@jdoerfert, @jfb, @arsenm, @nhaehnle does this look alright now?

I added last minor comments from my side. Other than that I think this looks fine. We will have to wait for the others though.

(You will need to rebase and make sure ninja check-all passes because there are other new attributes.)

llvm/lib/Transforms/IPO/Attributor.cpp
357 ↗(On Diff #208469)

I think the "unknown" case is handled the wrong way here. Shouldn't it be:

if (Arg->getType()->isIntegerTy(1) &&
    cast<ConstantInt>(Arg)->getValue() == 0)
  return true;
return false;

such that "unknown" values, e.g., %cmp = icmp ... used as the 4th argument will conservatively make it sync?

(+ Test case for this)

412 ↗(On Diff #208469)

I was puzzled by this check for a second, add a comment indicating that the above loop handles calls with read/write effects already. Mention that the fact there is a read/write effect caused us already to make sure it is nosync and there is consequently no need to check for convergent.

llvm/test/Transforms/FunctionAttrs/nosync.ll
311 ↗(On Diff #208469)

Copy&paste

350 ↗(On Diff #208469)

Shouldn't this be nosync? Is it?

sstefan1 marked 2 inline comments as done.Tue, Jul 9, 1:44 AM
sstefan1 added inline comments.
llvm/lib/Transforms/IPO/Attributor.cpp
357 ↗(On Diff #208469)

4th argument, isvolatile, is immarg, so I guess this not necessary?

llvm/test/Transforms/FunctionAttrs/nosync.ll
350 ↗(On Diff #208469)

Yes, this falls under copy & paste as well.

jdoerfert added inline comments.Tue, Jul 9, 9:04 PM
llvm/lib/Bitcode/Reader/BitcodeReader.cpp
1199 ↗(On Diff #208469)

I think, you can remove this change. All should be fine without.

llvm/lib/Transforms/IPO/Attributor.cpp
357 ↗(On Diff #208469)

Agreed, not necessary. However, if you keep it this way, add the above reasoning to the comment, it confused me now and it can easily confuse the next person. My advice, just swap the order to make it easier for people now and in the future ;)

arsenm added inline comments.Wed, Jul 10, 7:38 AM
llvm/lib/Transforms/IPO/Attributor.cpp
352–355 ↗(On Diff #208469)

It would be clearer to cast to MemIntrinsic and check isVolatile

366–379 ↗(On Diff #208469)

I'm pretty sure this is repeated in several passes, and incomplete. Target intrinsics can also be considered volatile, as there is a hook to get the memory properties for them

jdoerfert added inline comments.Wed, Jul 10, 7:57 AM
llvm/lib/Transforms/IPO/Attributor.cpp
366–379 ↗(On Diff #208469)

I guess we should not reach this function with calls. If that seams reasonable, we need an assert here and change the source below to skip these checks if a call is assumed/known nosync.

sstefan1 updated this revision to Diff 209013.Wed, Jul 10, 10:36 AM
  • rebase
  • addressing comments
  • ninja check-all passed
jdoerfert accepted this revision.Wed, Jul 10, 6:16 PM

LGTM, assunming check-all passes.

arsenm accepted this revision.Thu, Jul 11, 7:11 AM

LGTM with nits

llvm/include/llvm/Transforms/IPO/Attributor.h
689 ↗(On Diff #209013)

Don't need virtual, only override

llvm/lib/Transforms/IPO/Attributor.cpp
745 ↗(On Diff #209013)

No virtual necessary (and for the rest of the overrides)

llvm/test/Bitcode/attributes.ll
367 ↗(On Diff #209013)

Brace placement

This revision was not accepted when it landed; it landed in state Needs Review.Thu, Jul 11, 2:38 PM
Closed by commit rL365830: [Attributor] Deduce "nosync" function attribute. (authored by sstefan, committed by ). · Explain Why
This revision was automatically updated to reflect the committed changes.