This is an archive of the discontinued LLVM Phabricator instance.

[Attributor] Deduce "nosync" function attribute.
ClosedPublic

Authored by sstefan1 on May 31 2019, 7:50 PM.

Details

Summary

Introduce and deduce "nosync" function attribute to indicate that a function does not synchronize with another thread in a way that other thread might free memory.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Herald added a project: Restricted Project. · View Herald Transcript
uenoku added a subscriber: uenoku.May 31 2019, 9:31 PM
jdoerfert added inline comments.Jun 1 2019, 1:00 AM
llvm/docs/LangRef.rst
1485

The part after "causing" is too specific. We want nosync to be generic.

llvm/lib/Transforms/IPO/Attributor.cpp
5

Leftover comments?

15

(only a leftover of an old attributor patch version)

90

You have to check against "nosync".

290

Copy & paste

297

Make it static or member function. Also, describe what it does in the comment.

310

Description missing.

330

What about calls? Maybe you need to look at all side-effect instructions?

354

This is not a fixpoint. UpdateImpl is called multiple times (potentially). Remove the fixpoint call.

sstefan1 updated this revision to Diff 202599.Jun 2 2019, 6:27 AM
  • small fix
  • comments and LangRef
sstefan1 updated this revision to Diff 202600.Jun 2 2019, 6:29 AM
  • removing fixpoint call

The diff seems to include changes in the attributor. May download the latest version of the Attributor patch, rebase this one, and remove everything that is not part of the nosync. Also, please include the changes to the test cases you have in this patch.

llvm/lib/Transforms/IPO/Attributor.cpp
286

Typo: "if"

335

You have to add fences here as well and look at calls explicitly (as they are not in the above opcode list). Alternatively you could do:

Use the InformationCache::getReadOrWriteInstsForFunction method to get all potential read & write instructions. That will include calls you need to look at and everything above. You will need to look at calls first, if they are fine, you can check for volatile and atomic. The code to determine if a call is OK is already present down there.

343

Please use the type here and start variables with an upper case letter.

sstefan1 updated this revision to Diff 202629.Jun 2 2019, 6:28 PM
  • Checking calls first. Adding checks for fences. Now using InfoCache.
sstefan1 added inline comments.Jun 2 2019, 6:32 PM
llvm/lib/Transforms/IPO/Attributor.cpp
559

Just to make sure, when using InformationCache::getReadOrWriteInstsForFunction I don't need this, right?

Some more comments but it looks much better already. The test changes are missing though.

llvm/lib/Transforms/IPO/Attributor.cpp
289

"if"

300

Put the documentation on the declaration.

318

Can you describe the logic here?

347

Only do the stuff below if I is actually a call, so if ICS is not null. If you run this, it should crash on you right now because you access ICS unconditionally.

350

I think getReadOrWriteInstsForFunction will never pick up a readnone call as it will neither read nor write memory.

362

Can we have a single call to isVolatile, maybe always call that one and getOrdering and decide on the result what to do. That would mean move I->isAtomic() into getOrdering() and ensure we catch all opcodes in the switch (so the default prints an error)..

559

Correct.

sstefan1 marked 3 inline comments as done.Jun 3 2019, 2:32 PM

Tests almost done. I'll update in couple hours.

llvm/lib/Transforms/IPO/Attributor.cpp
318

I had to return one. So if Success isn't intresting it returns Failure ordering. Ohterwise it doesn't matter since Success already syncs. I didn't give this much tought, if you have any suggestions, I'll apply them.

350

So is it safe to remove it then, as it does not have side-effects?

362

If I do it this way, I think it would be better to change AtomicOrdering getOrdering() to bool isSyncOrdering() or whatever is appropriate for the name. It can than return true if ordering is not Unordered or Monotonic. That way everything can be checked in one if.

and ensure we catch all opcodes in the switch (so the default prints an error)..

I only miss GetElementPtr and alloca which are not of great interest here, if I'm not wrong. But I can add them as well.

sstefan1 updated this revision to Diff 202831.Jun 3 2019, 5:42 PM

addressed most of the comments.

jdoerfert added inline comments.Jun 4 2019, 1:28 PM
llvm/lib/Transforms/IPO/Attributor.cpp
246–402

Shouldn't we here directly assume sync as it is atomic but we don't know what kind?

248

You need to check volatile and atomic for all instructions I guess and for calls nosync as well

sstefan1 marked an inline comment as done and an inline comment as not done.Jun 4 2019, 2:03 PM

Does everything else look ok?

llvm/lib/Transforms/IPO/Attributor.cpp
246–402

Yes, my bad. I'll return true.

sstefan1 updated this revision to Diff 203021.Jun 4 2019, 2:10 PM

small fixes

More comments including various small style suggestions.

You also need to rewrite the commit message and the test case impact is missing.
For the commit message it is probably enough to drop the last part, thus:

Introduce and deduce the "nosync" function attribute which indicates that a function does not synchronize with another thread in any way.

Remind me, is there a language ref patch for nosync somewhere? If not, we need to add a description in the LangRef.doc as well.

llvm/lib/Transforms/IPO/Attributor.cpp
246–402

2 new lines.

246–402

I forgot that before but I think it is better if we split it into a struct AANoSync in the header and a struct AANoSyncFunction in the cpp file. Some functions would go in the generic header struct, but not getState, getManifestPosition, updateImpl, and ID. This makes it easier to use the result in other attributes.

246–402

make it a static member function if possible. that way we can reuse it easier. Same for isVolatile.

246–402

Comment needs to be updated.

246–402

No worries, all good. Please also add a comment to explain what this means and why we return true.

246–402

Why is it sufficient that one ordering is "weak enough"? Don't we have to test both? Either way, we need a comment to explain what is happening here.

246–402

Given that Success and Failure are only needed in this case you can declare them here.
To do so you need to add brackets around the case:

case Instruction::AtomicCmpXchg: {
...
}
246–402

I think the`!` in front of ICS is a problem. Did you run this?

247

Indention. And maybe add a few more words here ;)

sstefan1 updated this revision to Diff 203183.EditedJun 5 2019, 9:52 AM

Addresing comments.

LangRef was here from the beginning, I just messed up the diffs. Now its here.

Inline comments are now not in original order, so I'll reply here.

I think the`!` in front of ICS is a problem. Did you run this?

Yes.

Why is it sufficient that one ordering is "weak enough"? Don't we have to test both? Either way, we need a comment to explain what is happening here.

My thinking is that if either one of them 'weak enough', than "no-sync" is no longer possible since at any point it can be one of the orderings. If you disagree, I can change and require both.

Indention. And maybe add a few more words here ;)

I updated the function comment, hope thats enough.

Inline comments are now not in original order, so I'll reply here.

I think the`!` in front of ICS is a problem. Did you run this?

Yes.

So remove it ;)

Why is it sufficient that one ordering is "weak enough"? Don't we have to test both? Either way, we need a comment to explain what is happening here.

My thinking is that if either one of them 'weak enough', than "no-sync" is no longer possible since at any point it can be one of the orderings. If you disagree, I can change and require both.

I mixed up the meaning of the return value. It looks fine once I read the comment.

Indention. And maybe add a few more words here ;)

I updated the function comment, hope thats enough.

Looks good.

I added more comments but I think this is almost done. Go through the code and tests yourself and make sure there is no spurious newlines or other changes you did not intend.

llvm/docs/LangRef.rst
1483

Maybe add something like:

If the function does ever synchronize with another thread, the behavior is undefined.

llvm/include/llvm/Transforms/IPO/Attributor.h
645

You don't need to inherit from BooleanState here. That is an implementation detail we probably want to hide. Let AANoSyncFunction inherit from BooleanState but keep the functions isAssumedNoSync and isKnownNoSync here. They will not have an implementation and are overwritten in AANoSyncFunction.

665

Copy and paste, this is not a namespace ;)

llvm/test/Transforms/FunctionAttrs/nosync.ll
116

Remove the commented instruction here and in the next test. Also, fix the indention.

140

Isn't the "nosync" attribute missing for this function?

198

The function names don't match the IR names.

sstefan1 updated this revision to Diff 203466.Jun 6 2019, 4:28 PM
  • nosync small fixes.
  • fixing tests.
jdoerfert added inline comments.Jun 7 2019, 7:52 AM
llvm/lib/Transforms/IPO/Attributor.cpp
281

The above functions should have a comment referring to the base class.

284

This has to go in the base class I think.

320

I'm still confused. The pessimistic return value is true, correct? If so, Why can we return false after we've seen only the success ordering? Don't we have to look at both success and failure ordering and only if both are "fine" we can return false?

sstefan1 marked an inline comment as done.Jun 7 2019, 8:05 AM
sstefan1 added inline comments.
llvm/lib/Transforms/IPO/Attributor.cpp
320

I agree. I messed this up. Before I update, does this look ok?

if (Success != AtomicOrdering::Unordered ||
        Success != AtomicOrdering::Monotonic)
      return true;

if (Failure != AtomicOrdering::Unordered ||
        Failure != AtomicOrdering::Monotonic)
      return true;

return false;
sstefan1 updated this revision to Diff 203655.Jun 7 2019, 6:04 PM
  • fixed isNonRelaxedAtomic
This revision is now accepted and ready to land.Jun 8 2019, 9:41 AM
arsenm added a subscriber: arsenm.Jun 12 2019, 4:56 PM
arsenm added inline comments.
llvm/docs/LangRef.rst
1482–1484

I think this is a bit vague. In particular I don't think the LangRef defines what a "thread" means anywhere. I also think this needs to be more clear on what kinds of synchronization is allowed. Is this only communication through some addressable memory? What about GPU cross lane communication operations?

I'm wondering if this is sufficient to solve this problem: http://lists.llvm.org/pipermail/llvm-dev/2013-November/067359.html
TLDR, memory instructions can currently be hoisted over an arbitrary call if they are accessing a noalias argument

arsenm added inline comments.Jun 12 2019, 4:57 PM
llvm/docs/LangRef.rst
1482–1484

This is also mentioned as a proper attribute here (which I would greatly prefer to adding another string attribute), but only handled as a string attribute

jdoerfert requested changes to this revision.Jun 12 2019, 5:15 PM
jdoerfert added inline comments.
llvm/docs/LangRef.rst
1482–1484

That is a good point. I was initially thinking string attributes are fine but D62784 seems to be stuck which makes the testing of them hard.

Long story short, lets make them enum attributes.

@sstefan1 could you please make this a proper enum attribute? This will require some additional "mechanics" in:
llvm/lib/AsmParser/LLParser.cpp
llvm/lib/Bitcode/Reader/BitcodeReader.cpp
llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
llvm/lib/IR/Attributes.cpp
llvm/lib/IR/Verifier.cpp

Could be more though. Look for an existing attribute, e.g. Cold, and how that is handled.

@uenoku Could you please also make nofree an enum attribute?

This revision now requires changes to proceed.Jun 12 2019, 5:15 PM
jdoerfert added inline comments.Jun 12 2019, 11:17 PM
llvm/docs/LangRef.rst
1482–1484

I think this is a bit vague. In particular I don't think the LangRef defines what a "thread" means anywhere.

I did think/hope we do not have to. There is the implicit execution thread and nosync says there is "nothing else" while the function is executed. Basically, there are no side-effects that did not originate from the code we see. Please object if you think this is not sufficient.

I also think this needs to be more clear on what kinds of synchronization is allowed.

None, if nosync is present.

Is this only communication through some addressable memory? What about GPU cross lane communication operations?

I'd say, not allowed if nosync is present.

TLDR, memory instructions can currently be hoisted over an arbitrary call if they are accessing a noalias argument

I tried to expose that lately [1] but failed, do you have an example?

[1] https://bugs.llvm.org/show_bug.cgi?id=41781

sstefan1 updated this revision to Diff 204510.Jun 13 2019, 6:20 AM
  • Making nosync an enum attribute.
sstefan1 updated this revision to Diff 204513.Jun 13 2019, 6:29 AM

fixing diff

Sorry for my delayed review, I want to move this ahead now and commit it.

Can you make sure this works on top-of-trunk (origin/master), hence, rebase it please. Also make sure make check-all works without problems.

llvm/lib/Transforms/IPO/Attributor.cpp
93

This should now be checked in the switch below, it is an enum attribute now.

277

Delete this please.

llvm/test/Transforms/FunctionAttrs/nosync.ll
3

You need to enable the attributor explicitly, for now.

Just realized that the basic attribute test in test/Bitcode/attributes.ll is missing, see https://reviews.llvm.org/D49165#change-K8gwLFRSXwEe for an example.

jfb added a comment.Jun 24 2019, 9:59 AM

Please add tests for the things I mention in comments, as well as:

  • relaxed volatile atomic load / store
  • inline assembly
llvm/include/llvm/Transforms/IPO/Attributor.h
380

"Underlying"

llvm/lib/Transforms/IPO/Attributor.cpp
310

A fence with singlethread sync scope doesn't sync with other threads, even if seq_cst.

327

We probably want llvm_unreachable here, so the code gets updated if we add new atomic operations.

337

You're missing memcpy and similar intrinsics, I think you want to handle them here and not in the generic intrinsic handling.

llvm/test/Transforms/FunctionAttrs/nosync.ll
6

"designet"?

sstefan1 updated this revision to Diff 206312.Jun 24 2019, 3:30 PM
  • addressing comments.
arsenm added inline comments.Jun 24 2019, 3:42 PM
llvm/docs/LangRef.rst
1482–1484

Is it then disallowed to merge any calls that aren't nosync? e.g.

if (foo)
  bar(x) readnone
else
  bar(y) readnone

is no longer legal to combine these as

bar(foo ? x : y) readnone
sstefan1 updated this revision to Diff 206325.Jun 24 2019, 4:35 PM
  • small fix
  • added include
  • Fix cast
In D62766#1555926, @jfb wrote:

Please add tests for the things I mention in comments, as well as:

  • relaxed volatile atomic load / store
  • inline assembly

Only added one test for now. I will add more tomorrow.

Sorry for my delayed review, I want to move this ahead now and commit it.

Can you make sure this works on top-of-trunk (origin/master), hence, rebase it please. Also make sure make check-all works without problems.

@jdoerfert As for the intrinsics. I added checks for memcpy, memmove & memset as @jfb suggested. What do you think? I also kept the FIXME comment which can indicate that we might take a different approach. I did make check-all there were few problems with some FunctionAttr tests not checking for nosync attributes. I will fix that tomorrow. Also I seem to have a problem with a test/Bitcode/attributes.ll with nobuiltin attribute.

sstefan1 updated this revision to Diff 206764.Jun 26 2019, 4:43 PM
  • updated tests
nhaehnle requested changes to this revision.Jun 27 2019, 1:52 AM
nhaehnle added a subscriber: nhaehnle.

This does seem useful, although the description is overly narrow (what does nosync on its own have to do with freeing memory?).

I also think that the definition of nosync needs some work, as just "synchronization" is a rather vague term. Can you define it in terms of fences and atomic instructions instead, e.g. by saying that a nosync function does not perform such operations (or some subset of such operations)?

llvm/docs/LangRef.rst
1482–1484

No, I think that would still be allowed. The sync (aka not-nosync) functions have a potential side effect in terms of the memory model, but it's the same side effect in either case since the memory model at this point doesn't care about subgroups.

I guess you're thinking of subgroup operations, but the issue with those is that the set of threads with which communication occurs is a function of where the operation occurs in control flow. It makes sense to keep that issue separate from this attribute.

llvm/test/Transforms/FunctionAttrs/nosync.ll
294–296

The negative check line here is missing.

This revision now requires changes to proceed.Jun 27 2019, 1:52 AM

This does seem useful, although the description is overly narrow (what does nosync on its own have to do with freeing memory?).

The idea was to use this with nofree for dereferencable, like @hfinkel proposed in this email.

I also think that the definition of nosync needs some work, as just "synchronization" is a rather vague term. Can you define it in terms of fences and atomic instructions instead, e.g. by saying that a nosync function does not perform such operations (or some subset of such operations)?

I will update the definition with more details. Maybe I should put that in patch description instead of the current (narrow) one?

jdoerfert added inline comments.Jun 27 2019, 12:44 PM
llvm/docs/LangRef.rst
1482–1484
  1. readnone implies nosync in my opinion.
  2. if we forget about readnone in the example, I think the above merge is still legal.
arsenm added inline comments.Jun 27 2019, 1:01 PM
llvm/docs/LangRef.rst
1482–1484

This should mention that synchronization means through some kind of memory side-effect. This needs to be distinguished from a cross-lane operations, which could be interpreted as a kind of of synchronization where treating it as a memory dependence is not sufficient

sstefan1 updated this revision to Diff 206925.Jun 27 2019, 1:46 PM
  • changed nosync LangRef definition

@arsenm I used most of your comment/suggestion.

jfb added a comment.Jun 29 2019, 3:03 PM

You should also add a test function with inline assembly.

llvm/lib/Transforms/IPO/Attributor.cpp
315

"then"

378

"improve"

llvm/test/Transforms/FunctionAttrs/nosync.ll
317

I don't think you can generally treat intrinsics as nosync. Unless you know they're actually nosync you should assume that intrinsics might synchronize.

For example:

int a;
void i_totally_sync() {
 __builtin_ia32_clflush(&a);
}

Corresponds to:

tail call void @llvm.x86.sse2.clflush(i8* bitcast (i32* @a to i8*))

You should have a test for this, and it should definitely *not* be nosync.

The other option here is to go and add a field to all intrinsics, so when creating a new one we have to figure out whether it'll definitely sync, maybe sync, or never sync. I don't think that's in scope for this patch.

nhaehnle added inline comments.Jul 1 2019, 3:10 AM
llvm/docs/LangRef.rst
1481–1488

Thanks, I think this is better, but there are still some problems:

  • There are no relaxed atomics in LLVM, only unordered, monotonic, and stronger orderings.
  • What about fences?

I would put the part about cross-lane operations at the end and rephrase it slightly for clarity. Suggestion:

This attribute is only concerned with synchronization through memory operations and is therefore orthogonal to cross-lane and convergent operations. In particular, an operation such as a barrier can be convergent but also nosync.

Assuming we can agree about the actual statement of that last sentence...

jdoerfert added inline comments.Jul 1 2019, 1:31 PM
llvm/docs/LangRef.rst
1481–1488

> This attribute is only concerned with synchronization through memory operations and is therefore orthogonal to cross-lane and convergent operations. In particular, an operation such as a barrier can be convergent but also nosync.

Assuming we can agree about the actual statement of that last sentence...

This proposed change, and the one requested earlier and integrated (sync goes through memory), are problematic.
I first though they are fine but they will probably make the attribute unusable.

An alternative proposal would be:

This function attribute indicates that the function does not communicate
(synchronize) with another thread through memory or other well-defined means.
Synchronization is considered possible in the presence of `atomic` accesses that enforce an order, thus not "unordered" and "monotonic", `volatile` accesses, as well as `convergent` function calls. Note that through the latter non-memory communication, e.g., cross-lane operations, is also considered synchronization. If an annotated function does ever synchronize with another thread, the behavior is undefined.

If this is where we are heading, we need to make sure we test:

  1. non-convergent does not allow nosync, e.g., readnone does not imply nosync
  2. readnone and non-convergent does imply nosync

@arsenm, @nhaehnle. @jfb, what do you think?

llvm/test/Transforms/FunctionAttrs/nosync.ll
317

I don't think you can generally treat intrinsics as nosync. Unless you know they're actually nosync you should assume that intrinsics might synchronize.

Good point. Maybe the best way (for now and in general) is to "not look for" intrinsics. Use the same logic for all instructions. That is, if it is a call and not annotated as no-sync it may-sync. The test with llvm.cos can be adjusted by adding readnone to the decleration of`llvm.cos`.

The other option here is to go and add a field to all intrinsics, so when creating a new one we have to figure out whether it'll definitely sync, maybe sync, or never sync. I don't think that's in scope for this patch.

Agreed, we will have to do that for various attributes at some point (soon) but not in this patch.

jfb added inline comments.Jul 1 2019, 1:36 PM
llvm/test/Transforms/FunctionAttrs/nosync.ll
317

I'd still accept the volatile mem* intrinsics as is already done, but otherwise yeah intrinsics should be assumed to synchronize.

arsenm added inline comments.Jul 1 2019, 2:05 PM
llvm/docs/LangRef.rst
1481–1488

What makes it unusable exactly?

This wording confuses me:

Note that through the latter non-memory communication, e.g., cross-lane operations, is also considered synchronization.

I'm not 100% comfortable specifically referring to convergent, since I'm still worried about the yet-to-be-defined anticonvergent attribute. Though it is hard to define something around an unsolved problem.

This phrasing also implies to me that call site merging is not legal, which is what I thought you were trying to avoid.

Conclusion 2 sounds OK to me. Conclusion 1 sounds like the opposite of what the goal is?

jdoerfert added inline comments.Jul 1 2019, 4:02 PM
llvm/docs/LangRef.rst
1481–1488

What makes it unusable exactly?

nosync would then still allow non-memory synchronization which it shouldn't.

I think the IRC conversation helped. What I want us to have is:

nosync means no synchronization/communication between "threads". Any potential synchronization, e.g., through memory or registers, precludes nosync.

llvm/test/Transforms/FunctionAttrs/nosync.ll
317

Agreed.

nhaehnle added inline comments.Jul 2 2019, 12:18 AM
llvm/docs/LangRef.rst
1481–1488

I think the IRC conversation helped.

Is that recorded somewhere?

nosync would then still allow non-memory synchronization which it shouldn't.

This is questionable. There are convergent operations that do not imply synchronization. For example, some of the llvm.amdgcn.image.sample.* intrinsics are convergent, but they do not imply any kind of synchronization in the memory model.

In Vulkan/SPIR-V parlance, the intrinsic may have an implied control barrier, but it definitely has no memory barrier (the control barrier part isn't fully spec'd out in SPIR-V either at the moment).

For the initial intended usage of this attribute: if there is a pointer that you know to be dereferencable before the image sample, then you still know it to be dereferencable afterwards. So it seems reasonable to want the intrinsic to be marked both convergent and nosync.

That said, I'm okay with this part of it:

If this is where we are heading, we need to make sure we test:

  1. non-convergent does not allow nosync, e.g., readnone does not imply nosync
  2. readnone and non-convergent does imply nosync

... so long as it's understood that those are "merely" the rules for the attributor.

jdoerfert added inline comments.Jul 2 2019, 3:25 PM
llvm/docs/LangRef.rst
1481–1488

This is questionable. There are convergent operations that do not imply synchronization. For example, some of the llvm.amdgcn.image.sample.* intrinsics are convergent, but they do not imply any kind of synchronization in the memory model.

For me, nosync has to mean absence of any kind of synchronization, including control barriers.

For the initial intended usage of this attribute: if there is a pointer that you know to be dereferencable before the image sample, then you still know it to be dereferencable afterwards. So it seems reasonable to want the intrinsic to be marked both convergent and nosync.

I see why you want this but I don't think that is what it should mean. nosync should not allow control synchronization as it will inevitably cause problems down the road. So, let me rephrase my earlier comment:

By default, we have to assume a convergent & readnone function might cause control synchronization between threads and is therefore not nosync. However, a function can be convergent and nosync. Finally, a function that is not-convergent and readonly is nosync.

hfinkel added inline comments.Jul 2 2019, 4:39 PM
llvm/docs/LangRef.rst
1481–1488

However, a function can be convergent and nosync.

I think that this is important. We can mark convergent intrinsics that don't provide synchronizing semantics as nosync.

In general, we need a nosync attribute to mean that, in the function marked as nosync, the current thread cannot complete communication with any other threads (e.g., it can't send a value to another thread).

The interesting thing, to me, that has been highlighted in this discussion is: convergent functions, by default, can have things like inter-thread register shuffles, but are otherwise readnone, and so must be excluded from automated nosync deduction (because, without accessing memory at all, communicate values to other threads).

sstefan1 updated this revision to Diff 207893.Jul 3 2019, 2:38 PM
  • Add inline assembly test.
sstefan1 marked 2 inline comments as done.Jul 3 2019, 2:48 PM
sstefan1 added inline comments.
llvm/test/Transforms/FunctionAttrs/nosync.ll
294–296

I skipped the negative check line, because for now only nosync is deduced and if function is not nosync there will be no Function Attrs at all. Once we have some more attributes, I'll add the negative check. Is that ok with you?

317

Replaced llvm.cos test with inline assembly. llvm.cos test was my mistake, since with the current implementation it would be considered sync.

jdoerfert added inline comments.Jul 3 2019, 3:02 PM
llvm/test/Transforms/FunctionAttrs/nosync.ll
294–296

Just give the function the nounwind attribute and then add the negative check line.

317

The test comment is off, and again add nounwind to allow for the check lines

sstefan1 updated this revision to Diff 207903.Jul 3 2019, 3:34 PM
  • fixed tests & improved definition of nosync in langRef
sstefan1 updated this revision to Diff 208469.EditedJul 8 2019, 11:35 AM
  • non-convergent and readnone check.
  • Changed handling of intrinsics.
  • Added more tests.

@jdoerfert, @jfb, @arsenm, @nhaehnle does this look alright now?

I added last minor comments from my side. Other than that I think this looks fine. We will have to wait for the others though.

(You will need to rebase and make sure ninja check-all passes because there are other new attributes.)

llvm/lib/Transforms/IPO/Attributor.cpp
356

I think the "unknown" case is handled the wrong way here. Shouldn't it be:

if (Arg->getType()->isIntegerTy(1) &&
    cast<ConstantInt>(Arg)->getValue() == 0)
  return true;
return false;

such that "unknown" values, e.g., %cmp = icmp ... used as the 4th argument will conservatively make it sync?

(+ Test case for this)

411

I was puzzled by this check for a second, add a comment indicating that the above loop handles calls with read/write effects already. Mention that the fact there is a read/write effect caused us already to make sure it is nosync and there is consequently no need to check for convergent.

llvm/test/Transforms/FunctionAttrs/nosync.ll
312

Copy&paste

351

Shouldn't this be nosync? Is it?

sstefan1 marked 2 inline comments as done.Jul 9 2019, 1:44 AM
sstefan1 added inline comments.
llvm/lib/Transforms/IPO/Attributor.cpp
356

4th argument, isvolatile, is immarg, so I guess this not necessary?

llvm/test/Transforms/FunctionAttrs/nosync.ll
351

Yes, this falls under copy & paste as well.

jdoerfert added inline comments.Jul 9 2019, 9:04 PM
llvm/lib/Bitcode/Reader/BitcodeReader.cpp
1199

I think, you can remove this change. All should be fine without.

llvm/lib/Transforms/IPO/Attributor.cpp
356

Agreed, not necessary. However, if you keep it this way, add the above reasoning to the comment, it confused me now and it can easily confuse the next person. My advice, just swap the order to make it easier for people now and in the future ;)

arsenm added inline comments.Jul 10 2019, 7:38 AM
llvm/lib/Transforms/IPO/Attributor.cpp
351–354

It would be clearer to cast to MemIntrinsic and check isVolatile

365–378

I'm pretty sure this is repeated in several passes, and incomplete. Target intrinsics can also be considered volatile, as there is a hook to get the memory properties for them

jdoerfert added inline comments.Jul 10 2019, 7:57 AM
llvm/lib/Transforms/IPO/Attributor.cpp
365–378

I guess we should not reach this function with calls. If that seams reasonable, we need an assert here and change the source below to skip these checks if a call is assumed/known nosync.

sstefan1 updated this revision to Diff 209013.Jul 10 2019, 10:36 AM
  • rebase
  • addressing comments
  • ninja check-all passed
jdoerfert accepted this revision.Jul 10 2019, 6:16 PM

LGTM, assunming check-all passes.

arsenm accepted this revision.Jul 11 2019, 7:11 AM

LGTM with nits

llvm/include/llvm/Transforms/IPO/Attributor.h
662

Don't need virtual, only override

llvm/lib/Transforms/IPO/Attributor.cpp
265

No virtual necessary (and for the rest of the overrides)

llvm/test/Bitcode/attributes.ll
356

Brace placement

This revision was not accepted when it landed; it landed in state Needs Review.Jul 11 2019, 2:38 PM
This revision was automatically updated to reflect the committed changes.