User Details
- User Since
- Sep 24 2014, 5:35 PM (443 w, 1 d)
Sep 2 2022
This patch breaks formatting. If there's a spaces after opening "(" there should be one before closing ")"
Feb 27 2022
Jan 12 2022
LGTM.
Jan 8 2022
Was this tested on an existing device, or is it just for completion (since there already is clspv target)?
Apr 27 2021
ping, @tstellar any concerns about this one?
Apr 21 2021
ping
Apr 8 2021
Apr 6 2021
Apr 2 2021
Mar 3 2021
Mar 2 2021
it took a while to test both versions. this patch fixes the following conversion tests:
*** 622) convert_floatn_rtp( uintn ) FAILED ** *** 627) convert_floatn_rtp( intn ) FAILED ** *** 639) convert_floatn_rtz( doublen ) FAILED ** *** 649) convert_floatn_rtz( longn ) FAILED ** *** 689) convert_doublen_rtz( doublen ) FAILED **
The number of failed cases goes down from 42 to 37
Feb 28 2021
Feb 24 2021
thanks
Feb 23 2021
while the fix is clearly correct, it'd be nice to do a test run and report changes.
I can run it on my carrizo machine, but the conversion tests take more than a day.
Feb 17 2021
thanks. can you fix the "to avoid to use" part of the description? the "to avoid" part sounds redundant.
It explains why fma needs to be different, but it could use an explanation why it needs sw implementation of nextafter.
Feb 16 2021
Feb 14 2021
The FMA does pass conformance. I don't have performance numbers for the 64-bit version vs the uint2 version. It would be worse, but the whole software FMA is already awful performance. The 64-bit version is simpler to understand, so unless you have the same restrictions as I need for clspv, then I wouldn't adopt it.
Hopefully, the different target name will clarify the intent better. This is targeted for use with a cross compiler to Vulkan SPIR-V. The needs are different than the spir targets.
Feb 10 2021
It'd be nice to add an explanation why this can't reuse the existing SPIR-V targets and needs to add a new one.
Does the new FMA implementation generate significantly worse code (EG/NI + mesa-spirv should be the only users) than the original? Should it just be replaced? I assume it passes the conformance suite.
Oct 16 2020
Oct 15 2020
cl_khr_byte_addressable_stores changes language semantics. without it, pointer dereferences of types smaller than 32 bits are illegal.
Even if all clang targets support this the macro should still be defined for backward compatibility.
Oct 1 2020
LGTM
Sep 30 2020
I think this should use FindPython3 and drop the shebang from gen_convert.py to fix the 2 vs. 3 issues as well.
Sep 15 2020
I think these routines need to be rewritten and tested with CTS on platforms that use them.
Sep 10 2020
Sep 9 2020
Sep 2 2020
Sep 1 2020
Aug 31 2020
I understand the mechanism you're aiming for. It' not clear whether the absence of fp64 path is intentional or not. afaik, all hw exposing fp64 also implements fma.fp64 in hw so generic clc implementation works.
Why does this need a special implementation of fma.cl and fma.inc? The fp64 paths are missing, is there an expectation that something else will translate fma(double) function calls to llvm.fma opcodes?
Aug 16 2020
Thanks. This LGTM, but you might want to get @tstellar's opinion as well.
Aug 9 2020
Have you checked that returning NULL is handled properly?
Should the buffer wrap around instead?
Jul 24 2020
The implementation use a regular device buffer that is read at the end of the execution. Mostly like a clEnqueueReadBuffer
Jul 23 2020
Does the printf expansion work for all amdgpu triples, including amdgcn--mesa3d?
will it work for non-ROC capable GPUs?
I don't think there are many benefits in merging this ahead of time without the rest of printf implementation so that it can be tested.
Jul 15 2020
Jul 14 2020
Jul 10 2020
tldr; I don't mind changing this to be in sync with clang, just get the patch right.
However, it's not the right way to guarantee IR level compatibility between clang-cl-headers and libclc.
details below
Jul 9 2020
What is the problem this patch is trying to address?
The specs do not mandate these two values to be different.
Jun 16 2020
thanks!
Jun 4 2020
Apr 29 2020
Apr 23 2020
Apr 22 2020
was this a problem with using -Wl,--as-needed?
Apr 16 2020
Apr 15 2020
The point of having one combined range is to allow simplification of logic for users. you don't have to check for the type just do the same operations union/intersect/binop/contains/...
I'm not sure how splitting it to a separate class would address the memory concerns. Tracking fp ranges will require two APFloats irrespective of the structure.
If keeping both APFloat and APint bounds is a problem I can put them in a union since they are used exclusively and an instance never switches between float and int.
The difference in semantics shouldn't impact users much. they should use contains/isEmpty/isFull/isSingleElement/getABCrange instead of checking the bounds manually.
The application to valluelattice/lvi/sccp is in D78224, to keep diff simple it hasn't implemented cleanups that are allowed by having one range class, yet.
There were few issues in SCCP tests.
- apint-load -- the result of the division operation of the original operands depends on rounding mode. it's represented as a range rather than a constant.
- apint-ipsccp4 -- we can now deduce the starting range of v1 in ssa_copy intrinsic as [-Inf, 2.999] which then gets merged with more accurate information from the function call. The original behaviour started with unknown which then gets combined with more accurate information. Thus having less information at the beginning gives a better result. changing the comparison condition check to only allow Icmp works around the issue and keeps the old behaviour.
- float-nan-simplification -- NaNs with different payloads/sign are not tracked separately, so one case returns +NaN while the other one returns -NaN. I've adjusted the comment in the test to explain the situation.
Apr 14 2020
Apr 13 2020
I don't know why this would imply we can't just have a static __OPENCL_VERSION__ for each device in Clang? I don't see anything in the standard that says you are not allowed to have (going with your example) __OPENCL_VERSION__=200 and __OPENCL_C_VERSION=120 at the same time.
It seems like we should be able to just have __OPENCL_VERSION__ defined in Clang, and let the runtime control setting __OPENCL_C_VERSION__ via -cl-std=. If the current patch sets __OPENCL_VERSION__ differently than the runtime does then it seems like we should make it match, but I'm not sure how best to confirm that.
Apr 11 2020
OPENCL_VERSION is something that should be really set by the opencl driver rather than the compiler.
coarse-grained SVM can be done without FEATURE_FLAT_ADDRESS_SPACE, so those gpus can get over 1.2, while the compiler can be used in an implementation that doesn't go above 1.2 even for gfx700+
Apr 10 2020
I've checked that custom targets are not cleaned up either, so I think it's good to go.
lgtm
Apr 6 2020
LGTM. thanks
Apr 3 2020
Windows does have pipes. they work both in cmd and powershell, but perhaps cmake is not using shell to invoke these commands?
What was the generator target you used?
Apr 2 2020
This will add noise to CMakeOutput.log and CMakeError.log, but I guess it's just one line.
Have you tried clearing the target link flags using set_target_properties( ... LINK_FLAGS ... )?
I think having the ability to pass LINK_FLAGS here might be useful in the future.
Apr 1 2020
I think the first part should use separate_arguments the same way as LLVM_CXXFLAGS does. (you can also add the following change while at it:
-set( LLVM_CXX_FLAGS ${LLVM_CXX_FLAGS} -fno-rtti -fno-exceptions ) +list( APPEND LLVM_CXX_FLAGS -fno-rtti -fno-exceptions )
Feb 25 2020
Feb 21 2020
Feb 20 2020
Feb 18 2020
ping
Feb 14 2020
@tstellar , is there a way to make libclc ML an automatic subscriber to phabricator posted libclc patches?
Feb 10 2020
Is there any test left that tests the r600 intrinsics? If yes, LGTM.