This is an archive of the discontinued LLVM Phabricator instance.

[OpenCL][ARM] Fix ICE when compiling a kernel
Needs ReviewPublic

Authored by Anastasia on Jun 24 2021, 7:19 AM.

Details

Reviewers
pekka
svenvh
Summary

Compiling any OpenCL source for Arm CPU is pointless because when the kernel is used the compilation will fail due to unsupported calling convention SPIR_KERNEL.

The backend can't handle this and it probably doesn't make much sense to use SPIR-specific logic for Arm considering that it has been discontinued. At the same time, there is no special calling convention added for C++ threads. For now, I added regular C calling conversions by default for all OpenCL kernels in Arm CPU. This prevents the crash and allows to start using the functionality in some projects i.e. https://github.com/compiler-explorer/compiler-explorer/issues/2741.

In a long term, we might need to do something more than this to provide fully functional support.

Fixes PR50841!

Diff Detail

Event Timeline

Anastasia created this revision.Jun 24 2021, 7:19 AM
Anastasia created this object with edit policy "Anastasia (Anastasia Stulova)".
Anastasia requested review of this revision.Jun 24 2021, 7:19 AM
Anastasia retitled this revision from Possible fix for PR50841 to [OpenCL][ARM] Fix ICE when compiling a source with the kernel.Jun 28 2021, 9:24 AM
Anastasia edited the summary of this revision. (Show Details)
Anastasia updated this revision to Diff 355166.Jun 29 2021, 3:19 AM
Anastasia retitled this revision from [OpenCL][ARM] Fix ICE when compiling a source with the kernel to [OpenCL][ARM] Fix ICE when compiling a kernel.
Anastasia edited the summary of this revision. (Show Details)
Anastasia added reviewers: pekka, svenvh.
Anastasia added a subscriber: cfe-commits.
Anastasia edited the summary of this revision. (Show Details)Jun 29 2021, 9:07 AM

Does this break clSetKernelArg() for ARM CPUs in PoCL? I believe so far it has worked for ARM - just wondering why you decide to drop it now?

https://github.com/pocl/pocl/issues/1

Does this break clSetKernelArg() for ARM CPUs in PoCL? I believe so far it has worked for ARM - just wondering why you decide to drop it now?

https://github.com/pocl/pocl/issues/1

The reason why we would like to fix it is that upstream clang has a crash currently when OpenCL sources are compiled for any Arm CPU: https://bugs.llvm.org/show_bug.cgi?id=50841. Do you have any other suggestions to avoid this problem?

FYI clang also emits kernel metadata that can be used to detect kernels...

The reason why we would like to fix it is that upstream clang has a crash currently when OpenCL sources are compiled for any Arm CPU: https://bugs.llvm.org/show_bug.cgi?id=50841. Do you have any other suggestions to avoid this problem?

FYI clang also emits kernel metadata that can be used to detect kernels...

Unfortunately I cannot look at this in detail right now, but I'll reply quickly for a heads up: The problem was not how to detect kernels, but the ABI/CC mismatch to the
clSetKerneArg() with (user facing) arguments that get split to multiple args or vice versa. E.g. x86 ABI had a CC where 2xfloat gets to 1xdouble in the generated function
finger print, thus there was no 1:1 match of the kernel arguments to the OpenCL-facing ones (making clSetKernelArg calls difficult to implement robustly).

IIRC, SPIR_KERNEL CC was used to force the 1:1 mapping and produce a portable way to handle this OpenCL API e.g. with struct args etc.

The reason why we would like to fix it is that upstream clang has a crash currently when OpenCL sources are compiled for any Arm CPU: https://bugs.llvm.org/show_bug.cgi?id=50841. Do you have any other suggestions to avoid this problem?

FYI clang also emits kernel metadata that can be used to detect kernels...

Unfortunately I cannot look at this in detail right now, but I'll reply quickly for a heads up: The problem was not how to detect kernels, but the ABI/CC mismatch to the
clSetKerneArg() with (user facing) arguments that get split to multiple args or vice versa. E.g. x86 ABI had a CC where 2xfloat gets to 1xdouble in the generated function
finger print, thus there was no 1:1 match of the kernel arguments to the OpenCL-facing ones (making clSetKernelArg calls difficult to implement robustly).

IIRC, SPIR_KERNEL CC was used to force the 1:1 mapping and produce a portable way to handle this OpenCL API e.g. with struct args etc.

Ok, thanks for clarification. Does it mean there is something we need to add to LLVM somewhere to make it work correctly? Would it be specific to Arm or generally for all CPU targets?

Ok, thanks for clarification. Does it mean there is something we need to add to LLVM somewhere to make it work correctly? Would it be specific to Arm or generally for all CPU targets?

To my understanding, what is required is to keep SPIR_KERNEL CC working for Arm since that is the generic way for PoCL supports clSetKernelArg for CPU targets by default. I have been under impression that it has worked so far for Arm, but I haven't tested it personally, and PoCL still lacks an official Arm maintainer, so cannot say for sure.

Ok, thanks for clarification. Does it mean there is something we need to add to LLVM somewhere to make it work correctly? Would it be specific to Arm or generally for all CPU targets?

To my understanding, what is required is to keep SPIR_KERNEL CC working for Arm since that is the generic way for PoCL supports clSetKernelArg for CPU targets by default. I have been under impression that it has worked so far for Arm, but I haven't tested it personally, and PoCL still lacks an official Arm maintainer, so cannot say for sure.

Ok, but as is clang can't compile OpenCL sources

Ok, thanks for clarification. Does it mean there is something we need to add to LLVM somewhere to make it work correctly? Would it be specific to Arm or generally for all CPU targets?

To my understanding, what is required is to keep SPIR_KERNEL CC working for Arm since that is the generic way for PoCL supports clSetKernelArg for CPU targets by default. I have been under impression that it has worked so far for Arm, but I haven't tested it personally, and PoCL still lacks an official Arm maintainer, so cannot say for sure.

Well right now clang doesn't seem to cope with SPIR_KERNEL for Arm CPU as it crashes its instruction selection https://godbolt.org/z/rraG9roee. So the flow is definitely broken. And if we fix the failure we can use this properly in godbolt instead of working around by removing the kernel in the preprocessor.

stuart added a comment.EditedAug 26 2021, 1:33 PM

Unfortunately I cannot look at this in detail right now, but I'll reply quickly for a heads up: The problem was not how to detect kernels, but the ABI/CC mismatch to the clSetKerneArg() with (user facing) arguments that get split to multiple args or vice versa. E.g. x86 ABI had a CC where 2xfloat gets to 1xdouble in the generated function finger print, thus there was no 1:1 match of the kernel arguments to the OpenCL-facing ones (making clSetKernelArg calls difficult to implement robustly).

Could you not just use libffi?