Page MenuHomePhabricator

jvesely (Jan Vesely)
User

Projects

User does not belong to any projects.

User Details

User Since
Sep 24 2014, 5:35 PM (313 w, 3 d)

Recent Activity

Tue, Sep 15

jvesely committed rG291bfff5dbb7: libclc: Add a __builtin to let SPIRV targets select between SW and HW FMA (authored by daniels).
libclc: Add a __builtin to let SPIRV targets select between SW and HW FMA
Tue, Sep 15, 10:39 PM
jvesely closed D85910: libclc: Add a __builtin to let SPIRV targets select between SW and HW FMA.
Tue, Sep 15, 10:39 PM · Restricted Project
jvesely accepted D85910: libclc: Add a __builtin to let SPIRV targets select between SW and HW FMA.
Tue, Sep 15, 9:55 AM · Restricted Project
jvesely added a comment to D81999: libclc: Fix rounding during type conversion.

I think these routines need to be rewritten and tested with CTS on platforms that use them.

Tue, Sep 15, 9:54 AM · Restricted Project

Thu, Sep 10

jvesely committed rG16ba78ee627c: libclc/spirv: Add missing files from D85911 (authored by jvesely).
libclc/spirv: Add missing files from D85911
Thu, Sep 10, 9:18 PM

Wed, Sep 9

jvesely committed rG060c8e083dd6: libclc/spirv: Add various functions (authored by daniels).
libclc/spirv: Add various functions
Wed, Sep 9, 11:27 PM
jvesely closed D85911: libclc: spirv: Add various functions.
Wed, Sep 9, 11:27 PM · Restricted Project
jvesely added a comment to D85911: libclc: spirv: Add various functions.

@jvesely, would you be able to land this for us if you're happy with it?

Wed, Sep 9, 2:39 PM · Restricted Project

Wed, Sep 2

jvesely added a comment to D85911: libclc: spirv: Add various functions.

so the translator can translate calls to fma, but not call to llvm.fma? that's weird. Please add that in a comment. Otherwise LGTM.

Right, it doesn't translate the LLVM intrinsics, only what's covered by the spec (see e.g. the OpenCL extension opcode definitions) as builtins, so it can be precise about what they mean and require.

Would you like the comment in fma.inc or in a commit message?

Wed, Sep 2, 6:02 PM · Restricted Project

Tue, Sep 1

jvesely added a comment to D85911: libclc: spirv: Add various functions.

I understand the mechanism you're aiming for. It' not clear whether the absence of fp64 path is intentional or not. afaik, all hw exposing fp64 also implements fma.fp64 in hw so generic clc implementation works.

Agreed, I expect all fp64 hardware has a native fp64 fma. However, for hardware-native implementations of CL functionality, we don't need a libclc function which maps to an LLVM intrinsic, because the SPIRV-LLVM-Translator maps calls to fma() to a SPIR-V opcode -- regardless of whether that fma() has an implementation present in the module. In fact, we *can't* have functions which map to LLVM intrinsics, because the SPIRV-LLVM-Translator will *fail* to map that to SPIR-V at all -- it doesn't map to the SPIR-V opcode.

resuing the generic clc fma.cl still works to expose the knob for fma.fp32 added in D85910

It does not, because that results in code which calls the LLVM intrinsic, which fails to translate to SPIR-V.

Tue, Sep 1, 10:55 AM · Restricted Project

Mon, Aug 31

jvesely added a comment to D85911: libclc: spirv: Add various functions.

I understand the mechanism you're aiming for. It' not clear whether the absence of fp64 path is intentional or not. afaik, all hw exposing fp64 also implements fma.fp64 in hw so generic clc implementation works.

Mon, Aug 31, 9:17 AM · Restricted Project
jvesely requested changes to D85910: libclc: Add a __builtin to let SPIRV targets select between SW and HW FMA.
Mon, Aug 31, 9:05 AM · Restricted Project
jvesely added inline comments to D85910: libclc: Add a __builtin to let SPIRV targets select between SW and HW FMA.
Mon, Aug 31, 8:13 AM · Restricted Project
jvesely added a comment to D85911: libclc: spirv: Add various functions.

Why does this need a special implementation of fma.cl and fma.inc? The fp64 paths are missing, is there an expectation that something else will translate fma(double) function calls to llvm.fma opcodes?

Mon, Aug 31, 8:09 AM · Restricted Project

Aug 16 2020

jvesely accepted D84392: libclc: add printf support on amd target.

Thanks. This LGTM, but you might want to get @tstellar's opinion as well.

Aug 16 2020, 9:35 AM

Aug 9 2020

jvesely added a comment to D84392: libclc: add printf support on amd target.

Have you checked that returning NULL is handled properly?
Should the buffer wrap around instead?

Aug 9 2020, 10:29 AM

Jul 24 2020

jvesely added a comment to D84392: libclc: add printf support on amd target.

Does the printf expansion work for all amdgpu triples, including amdgcn--mesa3d?
will it work for non-ROC capable GPUs?

Nothing in the compiler actually depends on roc capable GPUs. The limitations are all runtime/driver related

The exception to this would be places in the libraries that assume flat instructions are available. Theoretically we could handle these in the backend for SI with tagged pointers or something

so there are no hidden assumptions; like the printf buffer being cache coherent with host CPU, or the implementation using s_sendmsg to trigger CPU draining the buffer before the kernel finishes execution?

The implementation use a regular device buffer that is read at the end of the execution. Mostly like a clEnqueueReadBuffer

Jul 24 2020, 7:04 PM

Jul 23 2020

jvesely added a comment to D84392: libclc: add printf support on amd target.

Does the printf expansion work for all amdgpu triples, including amdgcn--mesa3d?
will it work for non-ROC capable GPUs?

Nothing in the compiler actually depends on roc capable GPUs. The limitations are all runtime/driver related

The exception to this would be places in the libraries that assume flat instructions are available. Theoretically we could handle these in the backend for SI with tagged pointers or something

Jul 23 2020, 3:55 PM
jvesely added a comment to D84392: libclc: add printf support on amd target.

Does the printf expansion work for all amdgpu triples, including amdgcn--mesa3d?
will it work for non-ROC capable GPUs?

Jul 23 2020, 2:16 PM
jvesely added a comment to D84392: libclc: add printf support on amd target.

I don't think there are many benefits in merging this ahead of time without the rest of printf implementation so that it can be tested.

Jul 23 2020, 12:49 PM

Jul 15 2020

jvesely accepted D83473: libclc: Fix FP_ILOGBNAN definition.
Jul 15 2020, 5:15 AM · Restricted Project

Jul 14 2020

jvesely added a comment to D83473: libclc: Fix FP_ILOGBNAN definition.

Since OpenCL allows apps to provide precompiled intermediate representations to the API, in the form of SPIR or SPIR-V, that means that the app could have embedded references to FP_ILOGBNAN, which are just a constant value since it's a #define, in their own code. They could write a kernel which calls ilogb and compares the result to FP_ILOGBNAN, and compile that with either clang trunk (which uses opencl-c.h automatically), or with Khronos's offline compiler (https://github.com/KhronosGroup/SPIR) using Khronos's standard library headers (https://github.com/KhronosGroup/libclcxx). Both of these standard library implementation headers define FP_ILOGBNAN to be INT_MAX:

I've no problem with the compatibility argument. but using INT_MAX for both NaN and Inf is in my understanding wrong.

Again, I didn't see this clearly stated in the spec. Can you quote the spec or point us to the relevant section?

Jul 14 2020, 11:24 AM · Restricted Project

Jul 10 2020

jvesely added a comment to D83473: libclc: Fix FP_ILOGBNAN definition.

tldr; I don't mind changing this to be in sync with clang, just get the patch right.
However, it's not the right way to guarantee IR level compatibility between clang-cl-headers and libclc.
details below

Jul 10 2020, 8:30 PM · Restricted Project
jvesely added a comment to D83473: libclc: Fix FP_ILOGBNAN definition.

What is the problem this patch is trying to address?

Well, the primary goal was to have consistent values in clang/lib/Headers/opencl-c-base.h and libclc/generic/include/clc/float/definitions.h to avoid the mess when one links against libclc but includes opencl-c-base.h.
Not to mention that having 2 conflicting definitions in headers that both lives in the same code base and are supposed to represent the same thing is confusing, to say the least.

Jul 10 2020, 11:37 AM · Restricted Project

Jul 9 2020

jvesely added a comment to D83473: libclc: Fix FP_ILOGBNAN definition.

What is the problem this patch is trying to address?
The specs do not mandate these two values to be different.

Jul 9 2020, 8:27 PM · Restricted Project

Jun 16 2020

jvesely accepted D77589: libclc: Add Mesa/SPIR-V target.

thanks!

Jun 16 2020, 2:18 PM · Restricted Project

Jun 4 2020

jvesely added inline comments to D77589: libclc: Add Mesa/SPIR-V target.
Jun 4 2020, 8:14 AM · Restricted Project

Apr 29 2020

jvesely added a comment to D78979: OpenCL: Include builtin header by default.

I'm also wondering if using -nogpulib for the corresponding linker purpose was correct, since in the OpenCL case it's not really an offload target. Maybe this should switch to -nostdlib?

-nogpulib is fine since opencl compiler is in parallel with the device compiler of CUDA/HIP. The library it uses is the device library.

Apr 29 2020, 4:46 PM
jvesely added a comment to D78535: libclc: Pass system libraries to the linker after llvm libraries.

This is the failing cmdline (from here https://github.com/tstellar/llvm-project/runs/587088911):

/usr/bin/c++ CMakeFiles/prepare_builtins.dir/utils/prepare-builtins.cpp.o -o prepare_builtins -L/home/runner/work/llvm-project/llvm-project/build/lib -Wl,-rpath,/home/runner/work/llvm-project/llvm-project/build/lib -lz -lrt -ldl -ltinfo -lpthread -lm -lxml2 -lLLVMBitWriter -lLLVMAnalysis -lLLVMProfileData -lLLVMObject -lLLVMTextAPI -lLLVMMCParser -lLLVMMC -lLLVMDebugInfoCodeView -lLLVMDebugInfoMSF -lLLVMBitReader -lLLVMCore -lLLVMRemarks -lLLVMBitstreamReader -lLLVMBinaryFormat -lLLVMSupport -lLLVMDemangle

Apr 29 2020, 3:06 PM · Restricted Project

Apr 23 2020

jvesely added a comment to D78535: libclc: Pass system libraries to the linker after llvm libraries.

was this a problem with using -Wl,--as-needed?

LGTM.

No, I was hitting this when using the default build options for both LLVM and libclc.

Apr 23 2020, 5:25 PM · Restricted Project

Apr 22 2020

jvesely accepted D78535: libclc: Pass system libraries to the linker after llvm libraries.

was this a problem with using -Wl,--as-needed?

Apr 22 2020, 4:52 PM · Restricted Project

Apr 16 2020

jvesely added a comment to D78220: IR/ConstantRange: Add support for FP ranges.

Unless we use pointers and an extra level of indirection the end sizes for users like ValueLattice end up the same irrespective of where the union is (lower+upper vs. backend vs. user).

I'd expect that this is indeed what we want to do. Given that the float range is just so much larger, we'll want to allocate it out of line.

Apr 16 2020, 10:42 PM · Restricted Project
jvesely added a comment to D77923: OpenCL: Fix some missing predefined macros.

LGTM. The objective of the change is to simplify offline compilation. We want to avoid asking users to add -D if a proper macro can be inferred from triple and cpu. Ideally, users only need to specify triple and cpu with clang driver. In case users want to override the default macros for the triple, they can always override them by -D or -U in command line.

Apr 16 2020, 3:37 PM · Restricted Project
jvesely added a comment to D78220: IR/ConstantRange: Add support for FP ranges.

I would strongly recommend to implement this as a separate type (FloatRange for example), rather than combining it into ConstantRange. There's a couple reasons for that, the main ones would be a) float ranges have significantly different semantics, in particular the fact that they are inclusive and b) ConstantRange is a size critical structure, because it is used in value lattice elements. As implement, this change is going to skyrocket memory consumption during SCCP.

+1, I think there is a strong case for not increasing the size here. We should rather try to reduce the size of the existing ConstantRange (using the 2 APInts means we store the same bit width twice :()

The point of having one combined range is to allow simplification of logic for users. you don't have to check for the type just do the same operations union/intersect/binop/contains/...
I'm not sure how splitting it to a separate class would address the memory concerns. Tracking fp ranges will require two APFloats irrespective of the structure.
If keeping both APFloat and APint bounds is a problem I can put them in a union since they are used exclusively and an instance never switches between float and int.

Even with a union, you still need extra space for the bits to distinguish between FP/integers (and other FP related flags, like canBeNaN. It should be possible to achieve the same level of convenience to this patch, but separating integer and FP ranges into separate classes. E.g. add ConstantRangeIntStorage & ConstantRangeFloatStorage which use APInt/APFloat respectively. The shared logic can be implemented in a ConstantRangeCommon template, which can be instantiated with either ConstantRangeIntStorage/ConstantRangeFPStorage. ConstantRange would turn into an instantiation of ConstantRangeCommon<ConstantRangeIntStorage>. Of course that would require updating code to make use of the FP version of ConstantRange. That might be a benefit actually, because there probably are various places that assume ConstantRange only contains integers.

When using the FP version of the constant range in ValueLattice, that would allow us to keep the size the same, if no extra bits (like canBeNaN) are added.

Apr 16 2020, 12:14 PM · Restricted Project

Apr 15 2020

jvesely added a comment to D78224: ValueLattice, LVI, SCCP: Use floating point constant ranges.

Very interesting! On a high-level, it would probably be good to split those changes into 3 (ValueLattice, LVI & SCCP); not sure how tricky that's going to be given adding automatic support in markConstant & co. If it's too tricky splitting LVI & SCCP would be great.

Apr 15 2020, 2:55 PM · Restricted Project
jvesely added a comment to D78220: IR/ConstantRange: Add support for FP ranges.

I would strongly recommend to implement this as a separate type (FloatRange for example), rather than combining it into ConstantRange. There's a couple reasons for that, the main ones would be a) float ranges have significantly different semantics, in particular the fact that they are inclusive and b) ConstantRange is a size critical structure, because it is used in value lattice elements. As implement, this change is going to skyrocket memory consumption during SCCP.

The point of having one combined range is to allow simplification of logic for users. you don't have to check for the type just do the same operations union/intersect/binop/contains/...
I'm not sure how splitting it to a separate class would address the memory concerns. Tracking fp ranges will require two APFloats irrespective of the structure.
If keeping both APFloat and APint bounds is a problem I can put them in a union since they are used exclusively and an instance never switches between float and int.
The difference in semantics shouldn't impact users much. they should use contains/isEmpty/isFull/isSingleElement/getABCrange instead of checking the bounds manually.
The application to valluelattice/lvi/sccp is in D78224, to keep diff simple it hasn't implemented cleanups that are allowed by having one range class, yet.

Apr 15 2020, 2:55 PM · Restricted Project
jvesely added a comment to D78224: ValueLattice, LVI, SCCP: Use floating point constant ranges.

There were few issues in SCCP tests.

  • apint-load -- the result of the division operation of the original operands depends on rounding mode. it's represented as a range rather than a constant.
  • apint-ipsccp4 -- we can now deduce the starting range of v1 in ssa_copy intrinsic as [-Inf, 2.999] which then gets merged with more accurate information from the function call. The original behaviour started with unknown which then gets combined with more accurate information. Thus having less information at the beginning gives a better result. changing the comparison condition check to only allow Icmp works around the issue and keeps the old behaviour.
  • float-nan-simplification -- NaNs with different payloads/sign are not tracked separately, so one case returns +NaN while the other one returns -NaN. I've adjusted the comment in the test to explain the situation.
Apr 15 2020, 12:03 PM · Restricted Project
jvesely created D78224: ValueLattice, LVI, SCCP: Use floating point constant ranges.
Apr 15 2020, 11:30 AM · Restricted Project
jvesely created D78222: Transform/Float2Int: Don't rely on hack support for floating point type in ConstantRange.
Apr 15 2020, 11:30 AM · Restricted Project
jvesely created D78220: IR/ConstantRange: Add support for FP ranges.
Apr 15 2020, 11:30 AM · Restricted Project

Apr 14 2020

jvesely added a comment to D77923: OpenCL: Fix some missing predefined macros.

OPENCL_VERSION is something that should be really set by the opencl driver rather than the compiler.
coarse-grained SVM can be done without FEATURE_FLAT_ADDRESS_SPACE, so those gpus can get over 1.2, while the compiler can be used in an implementation that doesn't go above 1.2 even for gfx700+

I don't know why this would imply we can't just have a static __OPENCL_VERSION__ for each device in Clang? I don't see anything in the standard that says you are not allowed to have (going with your example) __OPENCL_VERSION__=200 and __OPENCL_C_VERSION=120 at the same time.

It seems like we should be able to just have __OPENCL_VERSION__ defined in Clang, and let the runtime control setting __OPENCL_C_VERSION__ via -cl-std=. If the current patch sets __OPENCL_VERSION__ differently than the runtime does then it seems like we should make it match, but I'm not sure how best to confirm that.

The __OPENCL_VERSION__ macro in the kernel needs to match the version returned by clGetDeviceInfo (both refer to OpenCL version supported by the device),
which in turn must be <= version returned by clGetPlatformInfo.
This part makes it unsuitable for hardcoding in the compiler.

__OPENCL_VERSION__ should indicate device capabilities rather than language features. specs talk about available resources.
For example, CL_DEVICE_IMAGE2D_MAX_HEIGHT is required to be >= 8192 for OpenCL 1.2 while it's required to be >=16384 for OpenCL 2.1.

But why would the driver not report the device values which we do know in the compiler? A discrepancy between these sounds like a driver bug.

Apr 14 2020, 1:30 PM · Restricted Project
jvesely committed rG24fad7278a39: libclc: Use temporary files rather than a pipe (authored by daniels).
libclc: Use temporary files rather than a pipe
Apr 14 2020, 7:28 AM
jvesely committed rGcccdd0579b50: libclc: Don't pass linker flags to CLC/LLAsm (authored by daniels).
libclc: Don't pass linker flags to CLC/LLAsm
Apr 14 2020, 7:28 AM
jvesely closed D77165: libclc: Use temporary files rather than a pipe.
Apr 14 2020, 7:28 AM · Restricted Project
jvesely committed rGacf079006e6d: libclc: Use echo rather than true for try_compile (authored by daniels).
libclc: Use echo rather than true for try_compile
Apr 14 2020, 7:28 AM
jvesely closed D77164: libclc: Don't pass linker flags to CLC/LLAsm.
Apr 14 2020, 7:28 AM · Restricted Project
jvesely committed rGe6bb1d69eccc: libclc: Fix LLVM library linking on Windows (authored by daniels).
libclc: Fix LLVM library linking on Windows
Apr 14 2020, 7:28 AM
jvesely closed D77163: libclc: Use echo rather than true for try_compile.
Apr 14 2020, 7:28 AM · Restricted Project
jvesely closed D77162: libclc: Fix LLVM library linking on Windows.
Apr 14 2020, 7:28 AM · Restricted Project

Apr 13 2020

jvesely added a comment to D77923: OpenCL: Fix some missing predefined macros.

OPENCL_VERSION is something that should be really set by the opencl driver rather than the compiler.
coarse-grained SVM can be done without FEATURE_FLAT_ADDRESS_SPACE, so those gpus can get over 1.2, while the compiler can be used in an implementation that doesn't go above 1.2 even for gfx700+

I don't know why this would imply we can't just have a static __OPENCL_VERSION__ for each device in Clang? I don't see anything in the standard that says you are not allowed to have (going with your example) __OPENCL_VERSION__=200 and __OPENCL_C_VERSION=120 at the same time.

It seems like we should be able to just have __OPENCL_VERSION__ defined in Clang, and let the runtime control setting __OPENCL_C_VERSION__ via -cl-std=. If the current patch sets __OPENCL_VERSION__ differently than the runtime does then it seems like we should make it match, but I'm not sure how best to confirm that.

Apr 13 2020, 4:52 PM · Restricted Project

Apr 11 2020

jvesely added a comment to D77923: OpenCL: Fix some missing predefined macros.

OPENCL_VERSION is something that should be really set by the opencl driver rather than the compiler.
coarse-grained SVM can be done without FEATURE_FLAT_ADDRESS_SPACE, so those gpus can get over 1.2, while the compiler can be used in an implementation that doesn't go above 1.2 even for gfx700+

Apr 11 2020, 9:52 PM · Restricted Project

Apr 10 2020

jvesely accepted D77165: libclc: Use temporary files rather than a pipe.

I've checked that custom targets are not cleaned up either, so I think it's good to go.
lgtm

Apr 10 2020, 12:30 AM · Restricted Project

Apr 6 2020

jvesely accepted D77164: libclc: Don't pass linker flags to CLC/LLAsm.

Have you tried clearing the target link flags using set_target_properties( ... LINK_FLAGS ... )?
I think having the ability to pass LINK_FLAGS here might be useful in the future.

I tried, but the manual says that a) you must use STATIC_LIBRARY_FLAGS for static libraries, and b) you must use STATIC_LIBRARY_OPTIONS in preference to the former, as it sets rather than appends.

Nevertheless, neither of them actually worked for me, and both continued to pass the machine linker flags.

Apr 6 2020, 7:38 PM · Restricted Project
jvesely accepted D77162: libclc: Fix LLVM library linking on Windows.

LGTM. thanks

Apr 6 2020, 7:38 PM · Restricted Project

Apr 3 2020

jvesely added a comment to D77165: libclc: Use temporary files rather than a pipe.

What was the generator target you used?

I'm using Ninja [...]

As noted in D17762, we have to use Ninja rather than the VS backend, as the 'standard' way to add new languages for compile in CMake doesn't work with the VS backend. CMake's VS backend has a hardcoded-in-C++-code list of acceptable source languages; when you try to produce a static library with sources in an unknown-to-CMake-VS-backend language, the target ends up completely empty with no actions to produce the output, because the CMake backend doesn't know how.

Apr 3 2020, 12:25 PM · Restricted Project
jvesely added a comment to D77165: libclc: Use temporary files rather than a pipe.

Windows does have pipes. they work both in cmd and powershell, but perhaps cmake is not using shell to invoke these commands?
What was the generator target you used?

Apr 3 2020, 6:57 AM · Restricted Project

Apr 2 2020

jvesely added a comment to D77162: libclc: Fix LLVM library linking on Windows.

I don't see what the second part is trying to address. Does visual studio not detect the static libraries on the command line? I presume the dynamic versions are not present.
If anything it should check llvm-config --shared-mode if available instead of checking for win32. Do all visual studio/clang/mingw use the same flag?

Yes, clang runs as clang-cl which accepts MSVC-type arguments, though you can also invoke clang to pass UNIX-style arguments. For the linker, however, there is no option; lld-link only accepts MSVC-type arguments on Windows and nothing else. I don't know what MinGW does. Is there an easy way to test that? Do you currently build on MinGW?

I could test llvm-config --shared-mode == static && WIN32 to apply /MT, as I tried various other methods of convincing Windows to link statically and not dynamically with no luck. On UNIX this isn't an issue as you can happily mix the two modes.

Apr 2 2020, 4:17 PM · Restricted Project
jvesely accepted D77163: libclc: Use echo rather than true for try_compile.

This will add noise to CMakeOutput.log and CMakeError.log, but I guess it's just one line.

Apr 2 2020, 3:44 PM · Restricted Project
jvesely added a comment to D77164: libclc: Don't pass linker flags to CLC/LLAsm.

Have you tried clearing the target link flags using set_target_properties( ... LINK_FLAGS ... )?
I think having the ability to pass LINK_FLAGS here might be useful in the future.

Apr 2 2020, 3:44 PM · Restricted Project

Apr 1 2020

jvesely added a comment to D77162: libclc: Fix LLVM library linking on Windows.

I think the first part should use separate_arguments the same way as LLVM_CXXFLAGS does. (you can also add the following change while at it:

-set( LLVM_CXX_FLAGS ${LLVM_CXX_FLAGS} -fno-rtti -fno-exceptions )
+list( APPEND LLVM_CXX_FLAGS -fno-rtti -fno-exceptions )
Apr 1 2020, 7:41 AM · Restricted Project

Feb 25 2020

jvesely committed rG814fb658ca26: libclc: cmake configure should depend on file list (authored by jvesely).
libclc: cmake configure should depend on file list
Feb 25 2020, 1:48 AM
jvesely closed D74662: libclc: cmake configure should depend on file lists.
Feb 25 2020, 1:48 AM · Restricted Project

Feb 21 2020

jvesely added a comment to D74662: libclc: cmake configure should depend on file lists.

@tstellar , is there a way to make libclc ML an automatic subscriber to phabricator posted libclc patches?

I think you need to ask the phabricator mantainers: https://llvm.org/docs/Phabricator.html#status

Feb 21 2020, 10:14 AM · Restricted Project

Feb 20 2020

jvesely committed rGefeafa1bdaa7: libclc: Use acos implementation from amd_builtins (authored by jvesely).
libclc: Use acos implementation from amd_builtins
Feb 20 2020, 9:06 PM
jvesely closed D74011: libclc: Use acos implementation from amd_builtins.
Feb 20 2020, 9:06 PM · Restricted Project

Feb 18 2020

jvesely added a comment to D74011: libclc: Use acos implementation from amd_builtins.

ping

Feb 18 2020, 11:01 PM · Restricted Project

Feb 14 2020

jvesely added a comment to D74662: libclc: cmake configure should depend on file lists.

@tstellar , is there a way to make libclc ML an automatic subscriber to phabricator posted libclc patches?

Feb 14 2020, 8:07 PM · Restricted Project
jvesely created D74662: libclc: cmake configure should depend on file lists.
Feb 14 2020, 7:39 PM · Restricted Project

Feb 10 2020

jvesely added a comment to D74323: AMDGPU: Move R600 test compatability hack.

Is there any test left that tests the r600 intrinsics? If yes, LGTM.

Feb 10 2020, 7:57 AM · Restricted Project

Feb 9 2020

jvesely committed rG85e2fa44c64e: libclc/r600: Use target specific builtins to implement rsqrt and native_rsqrt (authored by jvesely).
libclc/r600: Use target specific builtins to implement rsqrt and native_rsqrt
Feb 9 2020, 11:47 AM
jvesely closed D74016: libclc/r600: Use target specific builtins to implement rsqrt and native_rsqrt.
Feb 9 2020, 11:47 AM · Restricted Project
jvesely committed rG4b23a2e8e971: libclc: Move rsqrt implementation to a .cl file (authored by jvesely).
libclc: Move rsqrt implementation to a .cl file
Feb 9 2020, 11:47 AM
jvesely closed D74013: libclc: Move rsqrt implementation to a .cl file.
Feb 9 2020, 11:47 AM · Restricted Project

Feb 8 2020

jvesely added a comment to D74016: libclc/r600: Use target specific builtins to implement rsqrt and native_rsqrt.

Looks good to me from a code standpoint.

The test fails CTS before and after on my BARTS (6850), but that's when running libclc master with LLVM 9.0, so there might be a precision issue that's fixed in newer code. It doesn't make the issue worse, at least.

Feb 8 2020, 2:38 PM · Restricted Project

Feb 4 2020

jvesely committed rGe6686adf8a74: AMDGPU/EG,CM: Implement fsqrt using recip(rsqrt(x)) instead of x * rsqrt(x) (authored by jvesely).
AMDGPU/EG,CM: Implement fsqrt using recip(rsqrt(x)) instead of x * rsqrt(x)
Feb 4 2020, 9:32 PM
jvesely closed D74017: AMDGPU/EG,CM: Implement fsqrt using recip(rsqrt(x)) instead of x * rsqrt(x).
Feb 4 2020, 9:32 PM · Restricted Project
jvesely closed D73765: libclc: Use amd builtin for asin implementation.
Feb 4 2020, 6:21 PM · Restricted Project
jvesely accepted D73765: libclc: Use amd builtin for asin implementation.

reviewd on libclc ML

Feb 4 2020, 6:21 PM · Restricted Project
jvesely created D74017: AMDGPU/EG,CM: Implement fsqrt using recip(rsqrt(x)) instead of x * rsqrt(x).
Feb 4 2020, 6:12 PM · Restricted Project
jvesely created D74016: libclc/r600: Use target specific builtins to implement rsqrt and native_rsqrt.
Feb 4 2020, 6:12 PM · Restricted Project
jvesely added a comment to D74013: libclc: Move rsqrt implementation to a .cl file.

This will be used in a follow-up patch to specialize r600.

Feb 4 2020, 6:03 PM · Restricted Project
jvesely created D74013: libclc: Move rsqrt implementation to a .cl file.
Feb 4 2020, 6:03 PM · Restricted Project
jvesely created D74011: libclc: Use acos implementation from amd_builtins.
Feb 4 2020, 6:03 PM · Restricted Project
jvesely added a comment to D74011: libclc: Use acos implementation from amd_builtins.

similar to @awatry 's asin patch. Improves accuracy and fixes issues with r600.

Feb 4 2020, 6:03 PM · Restricted Project
jvesely committed rG64a8e1b83e14: libclc/asin: Switch to amd builtins version of asin (authored by awatry).
libclc/asin: Switch to amd builtins version of asin
Feb 4 2020, 11:36 AM

Jan 23 2020

jvesely added a comment to D72991: AMDGPU/R600: Emit rodata in text segment.
Jan 23 2020, 8:19 AM · Restricted Project

Jan 22 2020

jvesely committed rG1b8eab179db4: AMDGPU/R600: Emit rodata in text segment (authored by jvesely).
AMDGPU/R600: Emit rodata in text segment
Jan 22 2020, 11:38 AM
jvesely added a reverting change for D61491: AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0: rG1b8eab179db4: AMDGPU/R600: Emit rodata in text segment.
Jan 22 2020, 11:37 AM · Restricted Project
jvesely closed D72991: AMDGPU/R600: Emit rodata in text segment.
Jan 22 2020, 11:37 AM · Restricted Project
jvesely updated the diff for D72991: AMDGPU/R600: Emit rodata in text segment.

Add test

Jan 22 2020, 10:33 AM · Restricted Project

Jan 18 2020

jvesely created D72991: AMDGPU/R600: Emit rodata in text segment.
Jan 18 2020, 9:39 PM · Restricted Project
jvesely added a reverting change for D61491: AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0: D72991: AMDGPU/R600: Emit rodata in text segment.
Jan 18 2020, 9:39 PM · Restricted Project

Nov 8 2019

jvesely committed rG00eca0bf0bd9: libclc: Drop travis (authored by jvesely).
libclc: Drop travis
Nov 8 2019, 7:13 AM
jvesely committed rGd6e0e6d255a7: libclc: Drop the old python based build system (authored by jvesely).
libclc: Drop the old python based build system
Nov 8 2019, 7:13 AM
jvesely closed D69966: libclc: Drop the old python based build system.
Nov 8 2019, 7:13 AM · Restricted Project
jvesely closed D69965: libclc: Drop travis.
Nov 8 2019, 7:13 AM · Restricted Project

Nov 7 2019

jvesely added a comment to D69965: libclc: Drop travis.

I've set up an appveyor build to replace this[0]. The added benefit is that the appveyor stores built libraries as artifacts that can be downloaded.
The downside is that I wasn't able to restrict builds to libclc changes so it's only enabled for my fork of llvm-project.

Nov 7 2019, 1:47 PM · Restricted Project
jvesely created D69966: libclc: Drop the old python based build system.
Nov 7 2019, 1:37 PM · Restricted Project
jvesely created D69965: libclc: Drop travis.
Nov 7 2019, 1:37 PM · Restricted Project

Oct 16 2019

jvesely committed rL374978: Request commit acess for jvesely.
Request commit acess for jvesely
Oct 16 2019, 12:35 AM