- User Since
- Oct 12 2019, 11:44 AM (78 w, 2 d)
Fri, Apr 9
Tue, Mar 23
Sun, Mar 21
Sat, Mar 20
Thu, Mar 18
I appreciate your report. Seriously. However, no one has been willing to tell us how to reproduce the bug. Even now that this patch has already been merged, I still haven't received a reproducer (in any form) from those who reported the issue at the very beginning. I understand that we're approaching the release and want a stable product. However, if nobody provides steps to reproduce bugs and just asks to revert patches, we will probably NEVER have new features.
I did some experiments with different versions of lbm:
- SPEC2021: didn't observe any performance regression (three variants: with hht, with hht but disabled via env, and without hht)
- Ron's reproducer: observed a performance regression when running with numactl --localalloc --physcpubind=0-xxx. In this case, disabling it via env helps. When running without numactl, there is almost no performance difference. (It is unclear whether the tiny difference is noise.)
All configurations were run 10 times.
I don't recall that we reached that conclusion. My recollection is that the patch would be reverted if we couldn't fix the issues before the release. No?
Wed, Mar 17
avoid potential integer overflow
Try to fix the crash in D98838
I found a stable way to reproduce the assertion. Let's say the default __kmp_threads_capacity is N. If the hidden helper thread is enabled, __kmp_threads_capacity is offset to N+8 by default. If the number of threads we need exceeds N+8, e.g. via the num_threads clause, we need to expand __kmp_threads. In __kmp_expand_threads, the expansion starts from __kmp_threads_capacity and repeatedly doubles it until the new capacity meets the requirement. Let's assume the requirement is Y. If Y happens to satisfy the constraint (N+8)*2^X = Y, where X is the number of iterations, then the new capacity is not enough, because 8 of its slots are reserved for hidden helper threads.
Tue, Mar 16
Also, the reproducer doesn't need to be a small piece of code. It can be the steps to reproduce the issue, as long as I can access the source code.
Again, it doesn't help if we don't have a way to reproduce it. We can disable it, we can revert it, sure, but it will NEVER be enabled again, because we don't have a reproducer to tell us what is wrong, and nobody will use it if it is disabled. We can't guarantee that rewriting the whole thing in a "simpler" way will work if we don't have a way to test it.
It seems the two assertions mentioned above are caused by the same problem: __kmp_threads is somehow touched and all of its elements are non-NULL. I'd appreciate it if someone could provide a reproducer.
Mon, Mar 15
Without a reproducer, I cannot tell what is going wrong. Also, your code is out of date. What is the assertion at line 3691 in kmp_runtime.cpp?
Mar 13 2021
remove the requirement of size_t size
Mar 12 2021
We probably don't need the index for the size_t size. If the ArgNo is out of range, we simply return nullptr. Besides, the CUDA function cudaLaunchKernel doesn't have a per-argument size parameter either.
add the support for stacked mode in AbstractCallSite::getCallArgOperand
Sure. Will do.
Mar 11 2021
Mar 10 2021
Mar 9 2021
put the encoding mode into ParameterEncoding
Mar 8 2021
update llvm doc
update doc in clang
Mar 7 2021
rebase and ping
Mar 6 2021
Mar 3 2021
Feb 24 2021
I'll take a look and fix it in another patch.
Feb 23 2021
update test case
Added the test, although it is expected to fail on x86_64.
This patch is abandoned; I will propose a new patch to unify the interfaces of target and target teams.
I didn't include the reproducer because it cannot pass due to a computation error. The same code passes on the NVPTX target.
Feb 22 2021
use ptx61 instead
We also got a report of this issue on the openmp-dev mailing list. I'll investigate it.
The error is because PTX71 support is not in the release. I'll hot-fix it today.
I think we might not need this patch. We're not going to support old versions of CUDA anyway.
If there is no objection, I’ll merge it.
Feb 20 2021
optimize error handling process
update test case
fixed the test
Feb 19 2021
Use CUDA 9.1 for failure test
Update to CUDA 9.2