This was done for CUDA functions in r261779, and for the same
reason this also needs to be done for OpenCL. An arbitrary
function could have a barrier() call in it, which in turn
requires the calling function to be convergent.
Details
Diff Detail
Event Timeline
The problem of adding this attribute conservatively for all functions is that it prevents some optimizations to happen. I agree to commit this as a temporary fix to guarantee correctness of generated code. But if we ask to add the convergent attribute into the spec we can avoid doing this in the compiler?
The problem of adding this attribute conservatively for all functions is that it prevents some optimizations to happen.
function-attrs removes the convergent attribute from anything it can prove does not call a convergent function.
I agree this is a nonoptimal solution. A better way would be to assume that any cuda/opencl function is convergent and then figure out what isn't. This would let you generate correct cuda/opencl code in a front-end without worrying about this attribute.
One problem with this approach is, suppose you call an external function, whose body llvm cannot see. We need some way to mark this function as not-convergent, so that its callers can also be inferred to be not convergent. LLVM currently only has a "convergent" attribute. In the absence of a new "not-convergent" attribute, the only way we can tell LLVM that this external function is not convergent is to leave off the attribute. But then this means we assume all functions without the convergent attribute are not convergent, and thus we have to add the attribute everywhere, as this patch does.
OTOH if we added a not-convergent attribute, we'd have to have rules about what happens if both attributes are on a function, and everywhere that checked whether a function was convergent would become significantly more complicated. I'm not sure that's worthwhile.
Yes, that's why if it would be responsibility of the kernel developer to specify this explicitly we could avoid this complications in the compiler. But if we add it into the language now we still need to support the correctness for the code written with the earlier standards. And also it adds the complexity to the programmer to make sure it's specified correctly. But I think it is still worth discussing with the spec committee.
The deduction of convergent is indeed tricky. So if there is any function in the CFG path which is marked as convergent ( or "non-convergent") this will have to be back propagated to the callers unless we force to explicitly specify it but it would be too error prone for the kernel writers I guess. Btw, what is the advantage of having "non-convergent" instead and why is the deduction of convergent property more complicated with it?
test/CodeGenOpenCL/convergent.cl | ||
---|---|---|
130 | We won't have noduplicate any more? |
Yes, that's why if it would be responsibility of the kernel developer to specify this explicitly we could avoid this complications in the compiler. But if we add it into the language now we still need to support the correctness for the code written with the earlier standards. And also it adds the complexity to the programmer to make sure it's specified correctly. But I think it is still worth discussing with the spec committee.
To me this seems like a small complication in the compiler to avoid an extremely easy bug for users to write. But, not my language. :)
The deduction of convergent is indeed tricky. So if there is any function in the CFG path which is marked as convergent ( or "non-convergent") this will have to be back propagated to the callers unless we force to explicitly specify it but it would be too error prone for the kernel writers I guess.
This probably isn't the right forum to discuss proposals to change the LLVM IR spec. But if you want to propose something like this, please cc me on the thread, I probably have opinions. :)
Btw, what is the advantage of having "non-convergent" instead and why is the deduction of convergent property more complicated with it?
The advantage of switching LLVM IR to non-convergent would be that front-ends wouldn't have the bug that arsenm is fixing here. "Unadorned" IR would be correct. And, in the absence of external or unanalyzable indirect calls, you'd get the same performance as we get today even if you had no annotations.
The complexity I was referring to occurs if you add the non-convergent attribute and keep the convergent attr. I don't think we want that.
But I'm not really proposing a change to the convergent attribute in LLVM IR -- it's probably better to leave it as-is, since we all understand how it works, it ain't broke.
This is one of those unfortunate things we had to do for correctness in CUDA, and the situation seems the same here. When we're not doing separate compilation (which I imagine we're also generally not doing for OpenCL complication), I'm under the impression that the attribute removal is fairly effective.
But if we ask to add the convergent attribute into the spec we can avoid doing this in the compiler?
But even if you do that, would that not be in a future version of OpenCL? If so, for code complying to current standards, you'd need this behavior.
Do we need an option to disable this? In case it causes regression in some applications and users want to disable it. At least for debugging.
test/CodeGenOpenCL/convergent.cl | ||
---|---|---|
73 | check the attribute has convergent | |
95 | need to check the attribute is convergent | |
118 | need to check the attribute has noduplicate | |
127 | check the attribute has noduplicate |
I agree both communities would benefit so it feels like it might be worth the effort.
But if we ask to add the convergent attribute into the spec we can avoid doing this in the compiler?
But even if you do that, would that not be in a future version of OpenCL? If so, for code complying to current standards, you'd need this behavior.
Yes, the fix is needed anyway to provide backwards compatibility.
Yes, I think I would perhaps argue for inclusion of non-convergent instead since this option seems to make more sense.
The deduction of convergent is indeed tricky. So if there is any function in the CFG path which is marked as convergent ( or "non-convergent") this will have to be back propagated to the callers unless we force to explicitly specify it but it would be too error prone for the kernel writers I guess.
This probably isn't the right forum to discuss proposals to change the LLVM IR spec. But if you want to propose something like this, please cc me on the thread, I probably have opinions. :)
Will do! If we have bigger use case it would be easier to get accepted. I will check with the OpenCL community first and see if there is an agreement internally.
Btw, what is the advantage of having "non-convergent" instead and why is the deduction of convergent property more complicated with it?
The advantage of switching LLVM IR to non-convergent would be that front-ends wouldn't have the bug that arsenm is fixing here. "Unadorned" IR would be correct. And, in the absence of external or unanalyzable indirect calls, you'd get the same performance as we get today even if you had no annotations.
Yes, I see this sounds more reasonable indeed. Btw, currently LLVM can remove convergent in some cases to recover the performance loss?
The complexity I was referring to occurs if you add the non-convergent attribute and keep the convergent attr. I don't think we want that.
I think keeping both would add more confusions and hence result in even more errors/complications.
But I'm not really proposing a change to the convergent attribute in LLVM IR -- it's probably better to leave it as-is, since we all understand how it works, it ain't broke.
But at the same time since we already know what we needed redesign should be easier. Alternative option would be to add convergent during IR generation as default option and no attribute where non-convergent is set. This way at least we give programmer a way to achieve higher performance. But of course it wouldn't be ideal to be inconsistent with the IR. Currently as I can see there is a little use of convergent in the frontend since we set it for all functions anyways. The problem is that it wouldn't be possible to remove it immediately. But we can at least deprecate it for a start.
...
Yes, I see this sounds more reasonable indeed. Btw, currently LLVM can remove convergent in some cases to recover the performance loss?
Yes. See removeConvergentAttrs in lib/Transforms/IPO/FunctionAttrs.cpp.
...
r315094
test/CodeGenOpenCL/convergent.cl | ||
---|---|---|
130 | noduplicate is problematic for the same reason that an unknown call could have noduplicate. We should probably just remove noduplicate entirely. |
check the attribute has convergent