Introduce and deduce "nosync" function attribute to indicate that a function does not synchronize with another thread in a way that other thread might free memory.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
- Build Status
Buildable 32799 Build 32798: arc lint + arc unit
Event Timeline
llvm/docs/LangRef.rst | ||
---|---|---|
1479 | The part after "causing" is too specific. We want nosync to be generic. | |
llvm/lib/Transforms/IPO/Attributor.cpp | ||
5 | Leftover comments? | |
12 | (only a leftover of an old attributor patch version) | |
170 | You have to check against "nosync". | |
378 | Copy & paste | |
385 | Make it static or member function. Also, describe what it does in the comment. | |
398 | Description missing. | |
418 | What about calls? Maybe you need to look at all side-effect instructions? | |
442 | This is not a fixpoint. UpdateImpl is called multiple times (potentially). Remove the fixpoint call. |
The diff seems to include changes in the attributor. May download the latest version of the Attributor patch, rebase this one, and remove everything that is not part of the nosync. Also, please include the changes to the test cases you have in this patch.
llvm/lib/Transforms/IPO/Attributor.cpp | ||
---|---|---|
374 | Typo: "if" | |
423 | You have to add fences here as well and look at calls explicitly (as they are not in the above opcode list). Alternatively you could do: Use the InformationCache::getReadOrWriteInstsForFunction method to get all potential read & write instructions. That will include calls you need to look at and everything above. You will need to look at calls first, if they are fine, you can check for volatile and atomic. The code to determine if a call is OK is already present down there. | |
431 | Please use the type here and start variables with an upper case letter. |
llvm/lib/Transforms/IPO/Attributor.cpp | ||
---|---|---|
600 | Just to make sure, when using InformationCache::getReadOrWriteInstsForFunction I don't need this, right? |
Some more comments but it looks much better already. The test changes are missing though.
llvm/lib/Transforms/IPO/Attributor.cpp | ||
---|---|---|
377 | "if" | |
388 | Put the documentation on the declaration. | |
406 | Can you describe the logic here? | |
435 | Only do the stuff below if I is actually a call, so if ICS is not null. If you run this, it should crash on you right now because you access ICS unconditionally. | |
438 | I think getReadOrWriteInstsForFunction will never pick up a readnone call as it will neither read nor write memory. | |
450 | Can we have a single call to isVolatile, maybe always call that one and getOrdering and decide on the result what to do. That would mean move I->isAtomic() into getOrdering() and ensure we catch all opcodes in the switch (so the default prints an error).. | |
600 | Correct. |
Tests almost done. I'll update in couple hours.
llvm/lib/Transforms/IPO/Attributor.cpp | ||
---|---|---|
406 | I had to return one. So if Success isn't intresting it returns Failure ordering. Ohterwise it doesn't matter since Success already syncs. I didn't give this much tought, if you have any suggestions, I'll apply them. | |
438 | So is it safe to remove it then, as it does not have side-effects? | |
450 | If I do it this way, I think it would be better to change AtomicOrdering getOrdering() to bool isSyncOrdering() or whatever is appropriate for the name. It can than return true if ordering is not Unordered or Monotonic. That way everything can be checked in one if.
I only miss GetElementPtr and alloca which are not of great interest here, if I'm not wrong. But I can add them as well. |
Does everything else look ok?
llvm/lib/Transforms/IPO/Attributor.cpp | ||
---|---|---|
334–459 | Yes, my bad. I'll return true. |
More comments including various small style suggestions.
You also need to rewrite the commit message and the test case impact is missing.
For the commit message it is probably enough to drop the last part, thus:
Introduce and deduce the "nosync" function attribute which indicates that a function does not synchronize with another thread in any way.
Remind me, is there a language ref patch for nosync somewhere? If not, we need to add a description in the LangRef.doc as well.
llvm/lib/Transforms/IPO/Attributor.cpp | ||
---|---|---|
334–459 | 2 new lines. | |
334–459 | I forgot that before but I think it is better if we split it into a struct AANoSync in the header and a struct AANoSyncFunction in the cpp file. Some functions would go in the generic header struct, but not getState, getManifestPosition, updateImpl, and ID. This makes it easier to use the result in other attributes. | |
334–459 | make it a static member function if possible. that way we can reuse it easier. Same for isVolatile. | |
334–459 | Comment needs to be updated. | |
334–459 | No worries, all good. Please also add a comment to explain what this means and why we return true. | |
334–459 | Why is it sufficient that one ordering is "weak enough"? Don't we have to test both? Either way, we need a comment to explain what is happening here. | |
334–459 | Given that Success and Failure are only needed in this case you can declare them here. case Instruction::AtomicCmpXchg: { ... } | |
334–459 | I think the`!` in front of ICS is a problem. Did you run this? | |
335 | Indention. And maybe add a few more words here ;) |
Addresing comments.
LangRef was here from the beginning, I just messed up the diffs. Now its here.
Inline comments are now not in original order, so I'll reply here.
I think the`!` in front of ICS is a problem. Did you run this?
Yes.
Why is it sufficient that one ordering is "weak enough"? Don't we have to test both? Either way, we need a comment to explain what is happening here.
My thinking is that if either one of them 'weak enough', than "no-sync" is no longer possible since at any point it can be one of the orderings. If you disagree, I can change and require both.
Indention. And maybe add a few more words here ;)
I updated the function comment, hope thats enough.
So remove it ;)
Why is it sufficient that one ordering is "weak enough"? Don't we have to test both? Either way, we need a comment to explain what is happening here.
My thinking is that if either one of them 'weak enough', than "no-sync" is no longer possible since at any point it can be one of the orderings. If you disagree, I can change and require both.
I mixed up the meaning of the return value. It looks fine once I read the comment.
Indention. And maybe add a few more words here ;)
I updated the function comment, hope thats enough.
Looks good.
I added more comments but I think this is almost done. Go through the code and tests yourself and make sure there is no spurious newlines or other changes you did not intend.
llvm/docs/LangRef.rst | ||
---|---|---|
1477 | Maybe add something like:
| |
llvm/include/llvm/Transforms/IPO/Attributor.h | ||
647 ↗ | (On Diff #203183) | You don't need to inherit from BooleanState here. That is an implementation detail we probably want to hide. Let AANoSyncFunction inherit from BooleanState but keep the functions isAssumedNoSync and isKnownNoSync here. They will not have an implementation and are overwritten in AANoSyncFunction. |
667 ↗ | (On Diff #203183) | Copy and paste, this is not a namespace ;) |
llvm/test/Transforms/FunctionAttrs/nosync.ll | ||
115 ↗ | (On Diff #203183) | Remove the commented instruction here and in the next test. Also, fix the indention. |
139 ↗ | (On Diff #203183) | Isn't the "nosync" attribute missing for this function? |
197 ↗ | (On Diff #203183) | The function names don't match the IR names. |
llvm/lib/Transforms/IPO/Attributor.cpp | ||
---|---|---|
369 | The above functions should have a comment referring to the base class. | |
372 | This has to go in the base class I think. | |
408 | I'm still confused. The pessimistic return value is true, correct? If so, Why can we return false after we've seen only the success ordering? Don't we have to look at both success and failure ordering and only if both are "fine" we can return false? |
llvm/lib/Transforms/IPO/Attributor.cpp | ||
---|---|---|
408 | I agree. I messed this up. Before I update, does this look ok? if (Success != AtomicOrdering::Unordered || Success != AtomicOrdering::Monotonic) return true; if (Failure != AtomicOrdering::Unordered || Failure != AtomicOrdering::Monotonic) return true; return false; |
llvm/docs/LangRef.rst | ||
---|---|---|
1476–1478 | I think this is a bit vague. In particular I don't think the LangRef defines what a "thread" means anywhere. I also think this needs to be more clear on what kinds of synchronization is allowed. Is this only communication through some addressable memory? What about GPU cross lane communication operations? I'm wondering if this is sufficient to solve this problem: http://lists.llvm.org/pipermail/llvm-dev/2013-November/067359.html |
llvm/docs/LangRef.rst | ||
---|---|---|
1476–1478 | This is also mentioned as a proper attribute here (which I would greatly prefer to adding another string attribute), but only handled as a string attribute |
llvm/docs/LangRef.rst | ||
---|---|---|
1476–1478 | That is a good point. I was initially thinking string attributes are fine but D62784 seems to be stuck which makes the testing of them hard. Long story short, lets make them enum attributes. @sstefan1 could you please make this a proper enum attribute? This will require some additional "mechanics" in: Could be more though. Look for an existing attribute, e.g. Cold, and how that is handled. @uenoku Could you please also make nofree an enum attribute? |
llvm/docs/LangRef.rst | ||
---|---|---|
1476–1478 |
I did think/hope we do not have to. There is the implicit execution thread and nosync says there is "nothing else" while the function is executed. Basically, there are no side-effects that did not originate from the code we see. Please object if you think this is not sufficient.
None, if nosync is present.
I'd say, not allowed if nosync is present.
I tried to expose that lately [1] but failed, do you have an example? |
Sorry for my delayed review, I want to move this ahead now and commit it.
Can you make sure this works on top-of-trunk (origin/master), hence, rebase it please. Also make sure make check-all works without problems.
llvm/lib/Transforms/IPO/Attributor.cpp | ||
---|---|---|
173 | This should now be checked in the switch below, it is an enum attribute now. | |
365 | Delete this please. | |
llvm/test/Transforms/FunctionAttrs/nosync.ll | ||
2 ↗ | (On Diff #204513) | You need to enable the attributor explicitly, for now. |
Just realized that the basic attribute test in test/Bitcode/attributes.ll is missing, see https://reviews.llvm.org/D49165#change-K8gwLFRSXwEe for an example.
Please add tests for the things I mention in comments, as well as:
- relaxed volatile atomic load / store
- inline assembly
llvm/include/llvm/Transforms/IPO/Attributor.h | ||
---|---|---|
381 ↗ | (On Diff #204513) | "Underlying" |
llvm/lib/Transforms/IPO/Attributor.cpp | ||
398 | A fence with singlethread sync scope doesn't sync with other threads, even if seq_cst. | |
415 | We probably want llvm_unreachable here, so the code gets updated if we add new atomic operations. | |
425 | You're missing memcpy and similar intrinsics, I think you want to handle them here and not in the generic intrinsic handling. | |
llvm/test/Transforms/FunctionAttrs/nosync.ll | ||
5 ↗ | (On Diff #204513) | "designet"? |
llvm/docs/LangRef.rst | ||
---|---|---|
1476–1478 | Is it then disallowed to merge any calls that aren't nosync? e.g. if (foo) bar(x) readnone else bar(y) readnone is no longer legal to combine these as bar(foo ? x : y) readnone |
Only added one test for now. I will add more tomorrow.
@jdoerfert As for the intrinsics. I added checks for memcpy, memmove & memset as @jfb suggested. What do you think? I also kept the FIXME comment which can indicate that we might take a different approach. I did make check-all there were few problems with some FunctionAttr tests not checking for nosync attributes. I will fix that tomorrow. Also I seem to have a problem with a test/Bitcode/attributes.ll with nobuiltin attribute.
This does seem useful, although the description is overly narrow (what does nosync on its own have to do with freeing memory?).
I also think that the definition of nosync needs some work, as just "synchronization" is a rather vague term. Can you define it in terms of fences and atomic instructions instead, e.g. by saying that a nosync function does not perform such operations (or some subset of such operations)?
llvm/docs/LangRef.rst | ||
---|---|---|
1476–1478 | No, I think that would still be allowed. The sync (aka not-nosync) functions have a potential side effect in terms of the memory model, but it's the same side effect in either case since the memory model at this point doesn't care about subgroups. I guess you're thinking of subgroup operations, but the issue with those is that the set of threads with which communication occurs is a function of where the operation occurs in control flow. It makes sense to keep that issue separate from this attribute. | |
llvm/test/Transforms/FunctionAttrs/nosync.ll | ||
293–295 ↗ | (On Diff #206764) | The negative check line here is missing. |
The idea was to use this with nofree for dereferencable, like @hfinkel proposed in this email.
I also think that the definition of nosync needs some work, as just "synchronization" is a rather vague term. Can you define it in terms of fences and atomic instructions instead, e.g. by saying that a nosync function does not perform such operations (or some subset of such operations)?
I will update the definition with more details. Maybe I should put that in patch description instead of the current (narrow) one?
llvm/docs/LangRef.rst | ||
---|---|---|
1476–1478 |
|
llvm/docs/LangRef.rst | ||
---|---|---|
1476–1478 | This should mention that synchronization means through some kind of memory side-effect. This needs to be distinguished from a cross-lane operations, which could be interpreted as a kind of of synchronization where treating it as a memory dependence is not sufficient |
You should also add a test function with inline assembly.
llvm/lib/Transforms/IPO/Attributor.cpp | ||
---|---|---|
403 | "then" | |
466 | "improve" | |
llvm/test/Transforms/FunctionAttrs/nosync.ll | ||
317 ↗ | (On Diff #206925) | I don't think you can generally treat intrinsics as nosync. Unless you know they're actually nosync you should assume that intrinsics might synchronize. For example: int a; void i_totally_sync() { __builtin_ia32_clflush(&a); } Corresponds to: tail call void @llvm.x86.sse2.clflush(i8* bitcast (i32* @a to i8*)) You should have a test for this, and it should definitely *not* be nosync. The other option here is to go and add a field to all intrinsics, so when creating a new one we have to figure out whether it'll definitely sync, maybe sync, or never sync. I don't think that's in scope for this patch. |
llvm/docs/LangRef.rst | ||
---|---|---|
1475–1482 | Thanks, I think this is better, but there are still some problems:
I would put the part about cross-lane operations at the end and rephrase it slightly for clarity. Suggestion:
Assuming we can agree about the actual statement of that last sentence... |
llvm/docs/LangRef.rst | ||
---|---|---|
1475–1482 |
This proposed change, and the one requested earlier and integrated (sync goes through memory), are problematic. An alternative proposal would be: This function attribute indicates that the function does not communicate (synchronize) with another thread through memory or other well-defined means. Synchronization is considered possible in the presence of `atomic` accesses that enforce an order, thus not "unordered" and "monotonic", `volatile` accesses, as well as `convergent` function calls. Note that through the latter non-memory communication, e.g., cross-lane operations, is also considered synchronization. If an annotated function does ever synchronize with another thread, the behavior is undefined. If this is where we are heading, we need to make sure we test:
| |
llvm/test/Transforms/FunctionAttrs/nosync.ll | ||
317 ↗ | (On Diff #206925) |
Good point. Maybe the best way (for now and in general) is to "not look for" intrinsics. Use the same logic for all instructions. That is, if it is a call and not annotated as no-sync it may-sync. The test with llvm.cos can be adjusted by adding readnone to the decleration of`llvm.cos`.
Agreed, we will have to do that for various attributes at some point (soon) but not in this patch. |
llvm/test/Transforms/FunctionAttrs/nosync.ll | ||
---|---|---|
317 ↗ | (On Diff #206925) | I'd still accept the volatile mem* intrinsics as is already done, but otherwise yeah intrinsics should be assumed to synchronize. |
llvm/docs/LangRef.rst | ||
---|---|---|
1475–1482 | What makes it unusable exactly? This wording confuses me:
I'm not 100% comfortable specifically referring to convergent, since I'm still worried about the yet-to-be-defined anticonvergent attribute. Though it is hard to define something around an unsolved problem. This phrasing also implies to me that call site merging is not legal, which is what I thought you were trying to avoid. Conclusion 2 sounds OK to me. Conclusion 1 sounds like the opposite of what the goal is? |
llvm/docs/LangRef.rst | ||
---|---|---|
1475–1482 |
nosync would then still allow non-memory synchronization which it shouldn't. I think the IRC conversation helped. What I want us to have is: nosync means no synchronization/communication between "threads". Any potential synchronization, e.g., through memory or registers, precludes nosync. | |
llvm/test/Transforms/FunctionAttrs/nosync.ll | ||
317 ↗ | (On Diff #206925) | Agreed. |
llvm/docs/LangRef.rst | ||
---|---|---|
1475–1482 |
Is that recorded somewhere?
This is questionable. There are convergent operations that do not imply synchronization. For example, some of the llvm.amdgcn.image.sample.* intrinsics are convergent, but they do not imply any kind of synchronization in the memory model. In Vulkan/SPIR-V parlance, the intrinsic may have an implied control barrier, but it definitely has no memory barrier (the control barrier part isn't fully spec'd out in SPIR-V either at the moment). For the initial intended usage of this attribute: if there is a pointer that you know to be dereferencable before the image sample, then you still know it to be dereferencable afterwards. So it seems reasonable to want the intrinsic to be marked both convergent and nosync. That said, I'm okay with this part of it:
... so long as it's understood that those are "merely" the rules for the attributor. |
llvm/docs/LangRef.rst | ||
---|---|---|
1475–1482 |
For me, nosync has to mean absence of any kind of synchronization, including control barriers.
I see why you want this but I don't think that is what it should mean. nosync should not allow control synchronization as it will inevitably cause problems down the road. So, let me rephrase my earlier comment: By default, we have to assume a convergent & readnone function might cause control synchronization between threads and is therefore not nosync. However, a function can be convergent and nosync. Finally, a function that is not-convergent and readonly is nosync. |
llvm/docs/LangRef.rst | ||
---|---|---|
1475–1482 |
I think that this is important. We can mark convergent intrinsics that don't provide synchronizing semantics as nosync. In general, we need a nosync attribute to mean that, in the function marked as nosync, the current thread cannot complete communication with any other threads (e.g., it can't send a value to another thread). The interesting thing, to me, that has been highlighted in this discussion is: convergent functions, by default, can have things like inter-thread register shuffles, but are otherwise readnone, and so must be excluded from automated nosync deduction (because, without accessing memory at all, communicate values to other threads). |
llvm/test/Transforms/FunctionAttrs/nosync.ll | ||
---|---|---|
317 ↗ | (On Diff #206925) | Replaced llvm.cos test with inline assembly. llvm.cos test was my mistake, since with the current implementation it would be considered sync. |
293–295 ↗ | (On Diff #206764) | I skipped the negative check line, because for now only nosync is deduced and if function is not nosync there will be no Function Attrs at all. Once we have some more attributes, I'll add the negative check. Is that ok with you? |
- non-convergent and readnone check.
- Changed handling of intrinsics.
- Added more tests.
@jdoerfert, @jfb, @arsenm, @nhaehnle does this look alright now?
I added last minor comments from my side. Other than that I think this looks fine. We will have to wait for the others though.
(You will need to rebase and make sure ninja check-all passes because there are other new attributes.)
llvm/lib/Transforms/IPO/Attributor.cpp | ||
---|---|---|
444 | I think the "unknown" case is handled the wrong way here. Shouldn't it be: if (Arg->getType()->isIntegerTy(1) && cast<ConstantInt>(Arg)->getValue() == 0) return true; return false; such that "unknown" values, e.g., %cmp = icmp ... used as the 4th argument will conservatively make it sync? (+ Test case for this) | |
499 | I was puzzled by this check for a second, add a comment indicating that the above loop handles calls with read/write effects already. Mention that the fact there is a read/write effect caused us already to make sure it is nosync and there is consequently no need to check for convergent. | |
llvm/test/Transforms/FunctionAttrs/nosync.ll | ||
311 ↗ | (On Diff #208469) | Copy&paste |
350 ↗ | (On Diff #208469) | Shouldn't this be nosync? Is it? |
llvm/lib/Bitcode/Reader/BitcodeReader.cpp | ||
---|---|---|
1199 ↗ | (On Diff #208469) | I think, you can remove this change. All should be fine without. |
llvm/lib/Transforms/IPO/Attributor.cpp | ||
444 | Agreed, not necessary. However, if you keep it this way, add the above reasoning to the comment, it confused me now and it can easily confuse the next person. My advice, just swap the order to make it easier for people now and in the future ;) |
llvm/lib/Transforms/IPO/Attributor.cpp | ||
---|---|---|
453–466 | I guess we should not reach this function with calls. If that seams reasonable, we need an assert here and change the source below to skip these checks if a call is assumed/known nosync. |
Maybe add something like: