Introduce and deduce "nosync" function attribute to indicate that a function does not synchronize with another thread in a way that other thread might free memory.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Sorry for my delayed review, I want to move this ahead now and commit it.
Can you make sure this works on top-of-trunk (origin/master), hence, rebase it please. Also make sure make check-all works without problems.
llvm/lib/Transforms/IPO/Attributor.cpp | ||
---|---|---|
93 ↗ | (On Diff #204513) | This should now be checked in the switch below, it is an enum attribute now. |
281 ↗ | (On Diff #204513) | Delete this please. |
llvm/test/Transforms/FunctionAttrs/nosync.ll | ||
2 ↗ | (On Diff #204513) | You need to enable the attributor explicitly, for now. |
Just realized that the basic attribute test in test/Bitcode/attributes.ll is missing, see https://reviews.llvm.org/D49165#change-K8gwLFRSXwEe for an example.
Please add tests for the things I mention in comments, as well as:
- relaxed volatile atomic load / store
- inline assembly
llvm/include/llvm/Transforms/IPO/Attributor.h | ||
---|---|---|
381 ↗ | (On Diff #204513) | "Underlying" |
llvm/lib/Transforms/IPO/Attributor.cpp | ||
314 ↗ | (On Diff #204513) | A fence with singlethread sync scope doesn't sync with other threads, even if seq_cst. |
331 ↗ | (On Diff #204513) | We probably want llvm_unreachable here, so the code gets updated if we add new atomic operations. |
341 ↗ | (On Diff #204513) | You're missing memcpy and similar intrinsics, I think you want to handle them here and not in the generic intrinsic handling. |
llvm/test/Transforms/FunctionAttrs/nosync.ll | ||
5 ↗ | (On Diff #204513) | "designet"? |
llvm/docs/LangRef.rst | ||
---|---|---|
1476–1478 ↗ | (On Diff #203655) | Is it then disallowed to merge any calls that aren't nosync? e.g. if (foo) bar(x) readnone else bar(y) readnone is no longer legal to combine these as bar(foo ? x : y) readnone |
Only added one test for now. I will add more tomorrow.
@jdoerfert As for the intrinsics. I added checks for memcpy, memmove & memset as @jfb suggested. What do you think? I also kept the FIXME comment which can indicate that we might take a different approach. I did make check-all there were few problems with some FunctionAttr tests not checking for nosync attributes. I will fix that tomorrow. Also I seem to have a problem with a test/Bitcode/attributes.ll with nobuiltin attribute.
This does seem useful, although the description is overly narrow (what does nosync on its own have to do with freeing memory?).
I also think that the definition of nosync needs some work, as just "synchronization" is a rather vague term. Can you define it in terms of fences and atomic instructions instead, e.g. by saying that a nosync function does not perform such operations (or some subset of such operations)?
llvm/docs/LangRef.rst | ||
---|---|---|
1476–1478 ↗ | (On Diff #203655) | No, I think that would still be allowed. The sync (aka not-nosync) functions have a potential side effect in terms of the memory model, but it's the same side effect in either case since the memory model at this point doesn't care about subgroups. I guess you're thinking of subgroup operations, but the issue with those is that the set of threads with which communication occurs is a function of where the operation occurs in control flow. It makes sense to keep that issue separate from this attribute. |
llvm/test/Transforms/FunctionAttrs/nosync.ll | ||
293–295 ↗ | (On Diff #206764) | The negative check line here is missing. |
The idea was to use this with nofree for dereferencable, like @hfinkel proposed in this email.
I also think that the definition of nosync needs some work, as just "synchronization" is a rather vague term. Can you define it in terms of fences and atomic instructions instead, e.g. by saying that a nosync function does not perform such operations (or some subset of such operations)?
I will update the definition with more details. Maybe I should put that in patch description instead of the current (narrow) one?
llvm/docs/LangRef.rst | ||
---|---|---|
1476–1478 ↗ | (On Diff #203655) |
|
llvm/docs/LangRef.rst | ||
---|---|---|
1482–1484 ↗ | (On Diff #206764) | This should mention that synchronization means through some kind of memory side-effect. This needs to be distinguished from a cross-lane operations, which could be interpreted as a kind of of synchronization where treating it as a memory dependence is not sufficient |
You should also add a test function with inline assembly.
llvm/lib/Transforms/IPO/Attributor.cpp | ||
---|---|---|
315 ↗ | (On Diff #206925) | "then" |
378 ↗ | (On Diff #206925) | "improve" |
llvm/test/Transforms/FunctionAttrs/nosync.ll | ||
317 ↗ | (On Diff #206925) | I don't think you can generally treat intrinsics as nosync. Unless you know they're actually nosync you should assume that intrinsics might synchronize. For example: int a; void i_totally_sync() { __builtin_ia32_clflush(&a); } Corresponds to: tail call void @llvm.x86.sse2.clflush(i8* bitcast (i32* @a to i8*)) You should have a test for this, and it should definitely *not* be nosync. The other option here is to go and add a field to all intrinsics, so when creating a new one we have to figure out whether it'll definitely sync, maybe sync, or never sync. I don't think that's in scope for this patch. |
llvm/docs/LangRef.rst | ||
---|---|---|
1481–1488 ↗ | (On Diff #206925) | Thanks, I think this is better, but there are still some problems:
I would put the part about cross-lane operations at the end and rephrase it slightly for clarity. Suggestion:
Assuming we can agree about the actual statement of that last sentence... |
llvm/docs/LangRef.rst | ||
---|---|---|
1481–1488 ↗ | (On Diff #206925) |
This proposed change, and the one requested earlier and integrated (sync goes through memory), are problematic. An alternative proposal would be: This function attribute indicates that the function does not communicate (synchronize) with another thread through memory or other well-defined means. Synchronization is considered possible in the presence of `atomic` accesses that enforce an order, thus not "unordered" and "monotonic", `volatile` accesses, as well as `convergent` function calls. Note that through the latter non-memory communication, e.g., cross-lane operations, is also considered synchronization. If an annotated function does ever synchronize with another thread, the behavior is undefined. If this is where we are heading, we need to make sure we test:
|
llvm/test/Transforms/FunctionAttrs/nosync.ll | ||
317 ↗ | (On Diff #206925) |
Good point. Maybe the best way (for now and in general) is to "not look for" intrinsics. Use the same logic for all instructions. That is, if it is a call and not annotated as no-sync it may-sync. The test with llvm.cos can be adjusted by adding readnone to the decleration of`llvm.cos`.
Agreed, we will have to do that for various attributes at some point (soon) but not in this patch. |
llvm/test/Transforms/FunctionAttrs/nosync.ll | ||
---|---|---|
317 ↗ | (On Diff #206925) | I'd still accept the volatile mem* intrinsics as is already done, but otherwise yeah intrinsics should be assumed to synchronize. |
llvm/docs/LangRef.rst | ||
---|---|---|
1481–1488 ↗ | (On Diff #206925) | What makes it unusable exactly? This wording confuses me:
I'm not 100% comfortable specifically referring to convergent, since I'm still worried about the yet-to-be-defined anticonvergent attribute. Though it is hard to define something around an unsolved problem. This phrasing also implies to me that call site merging is not legal, which is what I thought you were trying to avoid. Conclusion 2 sounds OK to me. Conclusion 1 sounds like the opposite of what the goal is? |
llvm/docs/LangRef.rst | ||
---|---|---|
1481–1488 ↗ | (On Diff #206925) |
nosync would then still allow non-memory synchronization which it shouldn't. I think the IRC conversation helped. What I want us to have is: nosync means no synchronization/communication between "threads". Any potential synchronization, e.g., through memory or registers, precludes nosync. |
llvm/test/Transforms/FunctionAttrs/nosync.ll | ||
317 ↗ | (On Diff #206925) | Agreed. |
llvm/docs/LangRef.rst | ||
---|---|---|
1481–1488 ↗ | (On Diff #206925) |
Is that recorded somewhere?
This is questionable. There are convergent operations that do not imply synchronization. For example, some of the llvm.amdgcn.image.sample.* intrinsics are convergent, but they do not imply any kind of synchronization in the memory model. In Vulkan/SPIR-V parlance, the intrinsic may have an implied control barrier, but it definitely has no memory barrier (the control barrier part isn't fully spec'd out in SPIR-V either at the moment). For the initial intended usage of this attribute: if there is a pointer that you know to be dereferencable before the image sample, then you still know it to be dereferencable afterwards. So it seems reasonable to want the intrinsic to be marked both convergent and nosync. That said, I'm okay with this part of it:
... so long as it's understood that those are "merely" the rules for the attributor. |
llvm/docs/LangRef.rst | ||
---|---|---|
1481–1488 ↗ | (On Diff #206925) |
For me, nosync has to mean absence of any kind of synchronization, including control barriers.
I see why you want this but I don't think that is what it should mean. nosync should not allow control synchronization as it will inevitably cause problems down the road. So, let me rephrase my earlier comment: By default, we have to assume a convergent & readnone function might cause control synchronization between threads and is therefore not nosync. However, a function can be convergent and nosync. Finally, a function that is not-convergent and readonly is nosync. |
llvm/docs/LangRef.rst | ||
---|---|---|
1481–1488 ↗ | (On Diff #206925) |
I think that this is important. We can mark convergent intrinsics that don't provide synchronizing semantics as nosync. In general, we need a nosync attribute to mean that, in the function marked as nosync, the current thread cannot complete communication with any other threads (e.g., it can't send a value to another thread). The interesting thing, to me, that has been highlighted in this discussion is: convergent functions, by default, can have things like inter-thread register shuffles, but are otherwise readnone, and so must be excluded from automated nosync deduction (because, without accessing memory at all, communicate values to other threads). |
llvm/test/Transforms/FunctionAttrs/nosync.ll | ||
---|---|---|
293–295 ↗ | (On Diff #206764) | I skipped the negative check line, because for now only nosync is deduced and if function is not nosync there will be no Function Attrs at all. Once we have some more attributes, I'll add the negative check. Is that ok with you? |
317 ↗ | (On Diff #206925) | Replaced llvm.cos test with inline assembly. llvm.cos test was my mistake, since with the current implementation it would be considered sync. |
- non-convergent and readnone check.
- Changed handling of intrinsics.
- Added more tests.
@jdoerfert, @jfb, @arsenm, @nhaehnle does this look alright now?
I added last minor comments from my side. Other than that I think this looks fine. We will have to wait for the others though.
(You will need to rebase and make sure ninja check-all passes because there are other new attributes.)
llvm/lib/Transforms/IPO/Attributor.cpp | ||
---|---|---|
357 ↗ | (On Diff #208469) | I think the "unknown" case is handled the wrong way here. Shouldn't it be: if (Arg->getType()->isIntegerTy(1) && cast<ConstantInt>(Arg)->getValue() == 0) return true; return false; such that "unknown" values, e.g., %cmp = icmp ... used as the 4th argument will conservatively make it sync? (+ Test case for this) |
412 ↗ | (On Diff #208469) | I was puzzled by this check for a second, add a comment indicating that the above loop handles calls with read/write effects already. Mention that the fact there is a read/write effect caused us already to make sure it is nosync and there is consequently no need to check for convergent. |
llvm/test/Transforms/FunctionAttrs/nosync.ll | ||
311 ↗ | (On Diff #208469) | Copy&paste |
350 ↗ | (On Diff #208469) | Shouldn't this be nosync? Is it? |
llvm/lib/Bitcode/Reader/BitcodeReader.cpp | ||
---|---|---|
1199 ↗ | (On Diff #208469) | I think, you can remove this change. All should be fine without. |
llvm/lib/Transforms/IPO/Attributor.cpp | ||
357 ↗ | (On Diff #208469) | Agreed, not necessary. However, if you keep it this way, add the above reasoning to the comment, it confused me now and it can easily confuse the next person. My advice, just swap the order to make it easier for people now and in the future ;) |
llvm/lib/Transforms/IPO/Attributor.cpp | ||
---|---|---|
352–355 ↗ | (On Diff #208469) | It would be clearer to cast to MemIntrinsic and check isVolatile |
366–379 ↗ | (On Diff #208469) | I'm pretty sure this is repeated in several passes, and incomplete. Target intrinsics can also be considered volatile, as there is a hook to get the memory properties for them |
llvm/lib/Transforms/IPO/Attributor.cpp | ||
---|---|---|
366–379 ↗ | (On Diff #208469) | I guess we should not reach this function with calls. If that seams reasonable, we need an assert here and change the source below to skip these checks if a call is assumed/known nosync. |
LGTM with nits
llvm/include/llvm/Transforms/IPO/Attributor.h | ||
---|---|---|
689 ↗ | (On Diff #209013) | Don't need virtual, only override |
llvm/lib/Transforms/IPO/Attributor.cpp | ||
745 ↗ | (On Diff #209013) | No virtual necessary (and for the rest of the overrides) |
llvm/test/Bitcode/attributes.ll | ||
367 ↗ | (On Diff #209013) | Brace placement |