Page MenuHomePhabricator

libclc: spirv: Add various functions
ClosedPublic

Authored by daniels on Aug 13 2020, 8:29 AM.

Details

Summary

This commit adds more source files to libclc's SPIR-V backend, giving us more
functionality.

Diff Detail

Event Timeline

daniels created this revision.Aug 13 2020, 8:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 13 2020, 8:29 AM
daniels requested review of this revision.Aug 13 2020, 8:29 AM
jenatali accepted this revision.Aug 24 2020, 1:19 PM
jenatali added a subscriber: jenatali.

I think this is good to go without D85910. In fact, I think that as part of D85910, we should add implementations of __builtin_has_hw_fma32 to these new fma.cl files. That seems to be the most appropriate place to define the SPIR-V-specific control function for fma.

This revision is now accepted and ready to land.Aug 24 2020, 1:19 PM
daniels edited the summary of this revision. (Show Details)

Agree.

@tstellar Could you please land this one for me so we can fill out the functionality in the SPIR-V backend? With this and D85911 (which it would be good to also land please), we're passing CTS on the D3D backend, with Intel + Nouveau making really really good progress within Mesa as well. Thanks!

jvesely added a comment.EditedAug 31 2020, 8:09 AM

Why does this need a special implementation of fma.cl and fma.inc? The fp64 paths are missing, is there an expectation that something else will translate fma(double) function calls to llvm.fma opcodes?

EDIT: Looks to me that this should depend on D85910 and just reuse the generic implementation.

daniels added a comment.EditedAug 31 2020, 8:14 AM

Why does this need a special implementation of fma.cl and fma.inc? The fp64 paths are missing, is there an expectation that something else will translate fma(double) function calls to llvm.fma opcodes?

D85910 should explain it: if we have precise hardware fma (e.g. on NV hardware) we want to be able to pass through to the hardware opcode, but if not we need to fall back to the slow & huge (but precise) software implementation. We want to expose which is in use back into libclc, so e.g. sin/cos can generate better code when using the software variant.

Why does this need a special implementation of fma.cl and fma.inc? The fp64 paths are missing, is there an expectation that something else will translate fma(double) function calls to llvm.fma opcodes?

D85910 should explain it: if we have precise hardware fma (e.g. on NV hardware) we want to be able to pass through to the hardware opcode, but if not we need to fall back to the slow & huge (but precise) software implementation. We want to expose which is in use back into libclc, so e.g. sin/cos can generate better code when using the software variant.

To clarify further, we specifically need a way to access the 32-bit software fma. We could've just used __clc_sw_fma() directly, but that seemed like an inappropriate symbol to take a hard external dependency on. Instead, we expose only the overloads of the fma() function which have implementations, rather than mapping to intrinsics/opcodes. Since the SPIR-V conversion converts all calls to fma() into opcodes already, we only want to back-translate real code. Calls to LLVM intrinsics will fail to translate to SPIR-V.

I understand the mechanism you're aiming for. It' not clear whether the absence of fp64 path is intentional or not. afaik, all hw exposing fp64 also implements fma.fp64 in hw so generic clc implementation works.

resuing the generic clc fma.cl still works to expose the knob for fma.fp32 added in D85910

I understand the mechanism you're aiming for. It' not clear whether the absence of fp64 path is intentional or not. afaik, all hw exposing fp64 also implements fma.fp64 in hw so generic clc implementation works.

Agreed, I expect all fp64 hardware has a native fp64 fma. However, for hardware-native implementations of CL functionality, we don't need a libclc function which maps to an LLVM intrinsic, because the SPIRV-LLVM-Translator maps calls to fma() to a SPIR-V opcode -- regardless of whether that fma() has an implementation present in the module. In fact, we *can't* have functions which map to LLVM intrinsics, because the SPIRV-LLVM-Translator will *fail* to map that to SPIR-V at all -- it doesn't map to the SPIR-V opcode.

resuing the generic clc fma.cl still works to expose the knob for fma.fp32 added in D85910

It does not, because that results in code which calls the LLVM intrinsic, which fails to translate to SPIR-V.

I understand the mechanism you're aiming for. It' not clear whether the absence of fp64 path is intentional or not. afaik, all hw exposing fp64 also implements fma.fp64 in hw so generic clc implementation works.

Agreed, I expect all fp64 hardware has a native fp64 fma. However, for hardware-native implementations of CL functionality, we don't need a libclc function which maps to an LLVM intrinsic, because the SPIRV-LLVM-Translator maps calls to fma() to a SPIR-V opcode -- regardless of whether that fma() has an implementation present in the module. In fact, we *can't* have functions which map to LLVM intrinsics, because the SPIRV-LLVM-Translator will *fail* to map that to SPIR-V at all -- it doesn't map to the SPIR-V opcode.

resuing the generic clc fma.cl still works to expose the knob for fma.fp32 added in D85910

It does not, because that results in code which calls the LLVM intrinsic, which fails to translate to SPIR-V.

so the translator can translate calls to fma, but not call to llvm.fma? that's weird. Please add that in a comment. Otherwise LGTM.

so the translator can translate calls to fma, but not call to llvm.fma? that's weird. Please add that in a comment. Otherwise LGTM.

Right, it doesn't translate the LLVM intrinsics, only what's covered by the spec (see e.g. the OpenCL extension opcode definitions) as builtins, so it can be precise about what they mean and require.

Would you like the comment in fma.inc or in a commit message?

so the translator can translate calls to fma, but not call to llvm.fma? that's weird. Please add that in a comment. Otherwise LGTM.

Right, it doesn't translate the LLVM intrinsics, only what's covered by the spec (see e.g. the OpenCL extension opcode definitions) as builtins, so it can be precise about what they mean and require.

Would you like the comment in fma.inc or in a commit message?

I've thought about this. I think having a section on expected uses of all targets in README would be the best alternative. Feel free to go ahead with this patch as is. README.TXT is hopelessly out of date anyway so updating it should be done in a separate patch anyway.

I've thought about this. I think having a section on expected uses of all targets in README would be the best alternative. Feel free to go ahead with this patch as is. README.TXT is hopelessly out of date anyway so updating it should be done in a separate patch anyway.

I believe neither @daniels nor I have permission to go ahead on our own. If you're good with this, I think we'll need you to merge it for us.

I've thought about this. I think having a section on expected uses of all targets in README would be the best alternative. Feel free to go ahead with this patch as is. README.TXT is hopelessly out of date anyway so updating it should be done in a separate patch anyway.

I believe neither @daniels nor I have permission to go ahead on our own. If you're good with this, I think we'll need you to merge it for us.

Right @jvesely, I don't have any push access either. If either you or @tstellar could land this that would be great please. Thanks for the review!

@jvesely, would you be able to land this for us if you're happy with it?

@jvesely, would you be able to land this for us if you're happy with it?

yes, sorry for the delay. I thought this week would be less hectic than the last, but it's really not. I'll try to find some time tonight to push this.

This revision was landed with ongoing or failed builds.Sep 9 2020, 11:27 PM
Closed by commit rG060c8e083dd6: libclc/spirv: Add various functions (authored by daniels, committed by jvesely). · Explain Why
This revision was automatically updated to reflect the committed changes.