This is an archive of the discontinued LLVM Phabricator instance.

[libclc] Add clspv target for libclc
ClosedPublic

Authored by alan-baker on Jan 4 2021, 8:32 AM.

Details

Summary

Add clspv as a new target for libclc. clspv is an open-source compiler that compiles OpenCL C to Vulkan SPIR-V. Compiles for the spir target.

The clspv target differs from the the spirv target in the following ways

  • fma is modified to use uint2 instead of ulong for mantissas. This results in lower performance fma, but provides a implementation that can be used on more Vulkan devices where 64-bit integer support is less common.
  • Use of a software implementation of nextafter because the generic implementation depends on nextafter being a defined builtin function for which clspv has no definition.
  • Full optimization of the library (-O3) and no conversion to SPIR-V

This library is close to what would be produced by running opt -O3 < builtins.opt.spirv-mesa3d-.bc > builtins.opt.clspv--.bc and continuing the build from that point.

Diff Detail

Event Timeline

alan-baker created this revision.Jan 4 2021, 8:32 AM
alan-baker requested review of this revision.Jan 4 2021, 8:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 4 2021, 8:32 AM

Anyone able to review this patch?

Is the fma.cl file mostly copied from another file that is already in the tree?

Is the fma.cl file mostly copied from another file that is already in the tree?

Yes, it is mostly based on the generic software implementation of fma, but instead of using 64-bit integers for the mantissa, it instead uses uint2 and handles all the carry/borrow logic explicitly. This is because clspv translates to Vulkan SPIR-V and the ability to use 64-bit integers is not widely available on many mobile devices. So the uint2 avoids that requirement.

tstellar added inline comments.Feb 10 2021, 8:24 AM
libclc/CMakeLists.txt
55

This looks like an unnecessary whitespace change.

Remove unnecessary whitespace change.

tstellar added inline comments.Feb 10 2021, 10:58 AM
libclc/CMakeLists.txt
239

This looks like more whitespace changes.

libclc/generic/lib/gen_convert.py
358 ↗(On Diff #322753)

Why did this change?

alan-baker added inline comments.Feb 10 2021, 11:12 AM
libclc/generic/lib/gen_convert.py
358 ↗(On Diff #322753)

I can revert this since I don't use the conversion builtins, but the code this generates is obviously wrong. y is being used in its own declaration. This leads to llvm generating undefs in the conversion functions.

Revert conversions change. Fix whitespace changes. Remove dead code.

tstellar added inline comments.Feb 10 2021, 11:25 AM
libclc/generic/lib/gen_convert.py
358 ↗(On Diff #322753)

Ok, this would make sense as a separate patch.

It'd be nice to add an explanation why this can't reuse the existing SPIR-V targets and needs to add a new one.
Does the new FMA implementation generate significantly worse code (EG/NI + mesa-spirv should be the only users) than the original? Should it just be replaced? I assume it passes the conformance suite.

libclc/CMakeLists.txt
289

this looks exactly like the default branch. why is a special case needed?

I apologize. The last couple patches were based off an older WIP branch. The target is meant to be named differently, hopefully that will clarify,

Fix diff to be based on the correct change.

alan-baker added a comment.EditedFeb 11 2021, 6:02 AM

It'd be nice to add an explanation why this can't reuse the existing SPIR-V targets and needs to add a new one.
Does the new FMA implementation generate significantly worse code (EG/NI + mesa-spirv should be the only users) than the original? Should it just be replaced? I assume it passes the conformance suite.

The FMA does pass conformance. I don't have performance numbers for the 64-bit version vs the uint2 version. It would be worse, but the whole software FMA is already awful performance. The 64-bit version is simpler to understand, so unless you have the same restrictions as I need for clspv, then I wouldn't adopt it.

Hopefully, the different target name will clarify the intent better. This is targeted for use with a cross compiler to Vulkan SPIR-V. The needs are different than the spir targets.

jvesely added a comment.EditedFeb 14 2021, 10:19 PM

The FMA does pass conformance. I don't have performance numbers for the 64-bit version vs the uint2 version. It would be worse, but the whole software FMA is already awful performance. The 64-bit version is simpler to understand, so unless you have the same restrictions as I need for clspv, then I wouldn't adopt it.

Hopefully, the different target name will clarify the intent better. This is targeted for use with a cross compiler to Vulkan SPIR-V. The needs are different than the spir targets.

There are existing spirv and spirv64 targets. The new clspv target uses the same spir-- triple as the spirv target, but doesn't use the final spv conversion step and uses O3 optimization. The selection of files is mostly a subset of the existing spirv target.

The FMA does pass conformance. I don't have performance numbers for the 64-bit version vs the uint2 version. It would be worse, but the whole software FMA is already awful performance. The 64-bit version is simpler to understand, so unless you have the same restrictions as I need for clspv, then I wouldn't adopt it.

Hopefully, the different target name will clarify the intent better. This is targeted for use with a cross compiler to Vulkan SPIR-V. The needs are different than the spir targets.

There are existing spirv and spirv64 targets. The new clspv target uses the same spir-- triple as the spirv target, but doesn't use the final spv conversion step and uses O3 optimization. The selection of files is mostly a subset of the existing spirv target.

Is there somewhere in the repo you'd suggest I document the differences? Nothing jumped out at me that explains the various backends. In addition to fma, nextafter is also a different implementation than the spirv targets.

The FMA does pass conformance. I don't have performance numbers for the 64-bit version vs the uint2 version. It would be worse, but the whole software FMA is already awful performance. The 64-bit version is simpler to understand, so unless you have the same restrictions as I need for clspv, then I wouldn't adopt it.

Hopefully, the different target name will clarify the intent better. This is targeted for use with a cross compiler to Vulkan SPIR-V. The needs are different than the spir targets.

There are existing spirv and spirv64 targets. The new clspv target uses the same spir-- triple as the spirv target, but doesn't use the final spv conversion step and uses O3 optimization. The selection of files is mostly a subset of the existing spirv target.

Is there somewhere in the repo you'd suggest I document the differences? Nothing jumped out at me that explains the various backends. In addition to fma, nextafter is also a different implementation than the spirv targets.

Adding it to the commit message is enough. On the surface it looks like the only thing preventing you from using builtins.opt.spirv-mesa3d-.bc is the custom implementation of fma and nextafter plus using -O3 optimization.

The commit message should explain what the differences are and why merging fma and nextafter and just reusing existing files is not good enough, and how it is different from adding spirv-- tripple. @daniels should be able to answer questions about the mesa3d part.

alan-baker edited the summary of this revision. (Show Details)Feb 17 2021, 5:46 AM

On the surface it looks like the only thing preventing you from using builtins.opt.spirv-mesa3d-.bc is the custom implementation of fma and nextafter plus using -O3 optimization.

Well, that and the fact that they emit SPIR (i.e. LLVM IR) rather than SPIR-V (to which its only relation is nomenclature). I haven't actually looked at clspv at all recently, but presumably they combine the LLVM IR (aka SPIR) from here together with the LLVM IR gained from running Clang and link them there. Mesa is much much less LLVM-centric, so we want to avoid LLVM IR where possible, hence our choice of SPIR-V output instead.

If the fma implementation still passes the full OpenCL regular-profile CTS then it might be interesting to reuse that implementation in Mesa as well, since as you say, it's going to be a great deal faster (cc @jenatali). Everything else looks reasonable to me & I think it makes sense to merge. Welcome to the layered-CL family!

libclc/generic/lib/gen_convert.py
358 ↗(On Diff #322753)

D81999 has a fix for this already, but I wasn't able to test it, because the SPIR-V output completely bypasses the provided functions in favour of fixed opcodes/decorations/internal-calls anyway

thanks. can you fix the "to avoid to use" part of the description? the "to avoid" part sounds redundant.
It explains why fma needs to be different, but it could use an explanation why it needs sw implementation of nextafter.

It still sounds to me like you can do opt -O3 < builtins.opt.spirv-mesa3d-.bc > builtins.opt.clspv--.bc and continue from there, with little to no difference.
The commit message should have enough information for anyone who would like to try that.

alan-baker edited the summary of this revision. (Show Details)Feb 17 2021, 8:29 PM

Is there anything else necessary to complete this review?

daniels added a comment.EditedFeb 23 2021, 7:02 AM

Is there anything else necessary to complete this review?

Can you please ack D81999 so it can land separately?

Apart from that, I think this MR can & should be landed as is, especially given that both CLon12 and clspv are under reasonably heavy development; with the external libclc dependency making version tracking difficult, it makes sense to keep them separate until they've both settled down more I think. (Note that I'm not an LLVM committer, so this needs either @jvesely or @tsteller to actually land it.)

Is there anything else necessary to complete this review?

sorry for the delay. If you could, please add an explanation about why custom nexafter is necessary, Other than that LGTM

alan-baker edited the summary of this revision. (Show Details)Feb 23 2021, 11:44 AM
jvesely accepted this revision.Feb 24 2021, 11:45 AM

thanks

This revision is now accepted and ready to land.Feb 24 2021, 11:45 AM

Thanks for the review! Would it be possible for someone with commit privileges to land this patch?

jvesely added a comment.EditedFeb 28 2021, 1:01 AM

Thanks for the review! Would it be possible for someone with commit privileges to land this patch?

I can do it over the next week. I you had a link to your local git commit or patch it'd make things easier.

I've pushed the commit to my fork at 30d41a. I just rebased it on the latest main.

jvesely added a comment.EditedMar 3 2021, 8:13 AM

I've pushed the commit to my fork at 30d41a. I just rebased it on the latest main.

thanks. I've prepared it locally. I removed the extra newlines at the end of files,
but it looks like the linked git version introduced quite a few cases of mismatched indentation (mixed tabs and spaces) that are not in the phabricator version (e.g. line 33,34,44, ... in fma.cl).
Did you use a different editor to make adjustments?

My editor clobbers tabs wherever I edit, so fma.cl probably probably has some left over from the original source. I can remove make everything spaces for that file and push an updated commit if you prefer. I think phabricator isn't showing it because it's a new file.

This revision was automatically updated to reflect the committed changes.

I pushed https://github.com/alan-baker/llvm-project/commit/248f6480fecb04c343da06aee437d27d75b3a9f6 which converts tabs to spaces in fma.cl if that is helpful.

it was. thank you.
pushed