- User Since
- Aug 7 2014, 12:01 PM (324 w, 5 d)
PING for review
Besides the unpromotable alloca issue due to indirect accesses, coercing directly to a GLOBAL pointer is not safe because, in HIP/CUDA, both CONSTANT and GLOBAL pointers can be passed as kernel arguments. Without introducing a new address space combining GLOBAL and CONSTANT, such coercion would be unsafe.
Fix coding style following clang-tidy.
Add amdgpu-kernel-arg-pointer-type.cu back and revise its checks.
Revise the comment and point out the safety issue of coercing the kernel argument
from a generic pointer to a global one.
Test case is enhanced to check that no kernel argument type is coerced.
Fix more regression tests due to the enhanced AMDGPU AA.
Remove unordered check.
Sat, Oct 24
Fri, Oct 23
Thu, Oct 22
PING for review. As a similar patch was approved, shall I just commit it again now that the compilation time issue is addressed?
Just a kind PING for review.
Wed, Oct 21
Add an option to turn on/off the use of AA during codegen.
This patch is still required as an MMO's alignment is calculated based on the offset from the base alignment. As the base alignment is the alignment of the pointer in the IR, it cannot be modified. We need extra logic to re-align the MMO if we widen the original one. For instance, if a 16-bit load from ptr has an alignment of 2, ptr is equivalent to base - 2, and base's alignment is 4, we could widen that 16-bit load to a 32-bit load from ptr - 2 with an alignment of 4. But, as we cannot change the IR value in the MMO, we need extra support so that the new MMO can assume that new alignment.
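To make the arithmetic in that example concrete, here is a minimal standalone sketch. It does not use LLVM's own types; `alignmentAtOffset` and its helpers are hypothetical names for illustration, and the model assumed is simply that the alignment known at base + offset is the smaller of the base alignment and the lowest set bit of the offset.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not LLVM API.

// Absolute value of a signed byte offset, computed in unsigned arithmetic.
constexpr std::uint64_t magnitude(std::int64_t v) {
  return v < 0 ? 0 - static_cast<std::uint64_t>(v)
               : static_cast<std::uint64_t>(v);
}

// Largest power of two dividing v (v must be non-zero).
constexpr std::uint64_t lowestSetBit(std::uint64_t v) {
  return v & (~v + 1);
}

// Alignment that can be assumed at (base + offset) when base is known to be
// aligned to baseAlign (a power of two).
constexpr std::uint64_t alignmentAtOffset(std::uint64_t baseAlign,
                                          std::int64_t offset) {
  return offset == 0 ? baseAlign
         : lowestSetBit(magnitude(offset)) < baseAlign
             ? lowestSetBit(magnitude(offset))
             : baseAlign;
}

// ptr == base - 2 with a base alignment of 4: the 16-bit load is 2-aligned.
static_assert(alignmentAtOffset(4, -2) == 2, "original 16-bit load");
// ptr - 2 == base - 4: the widened 32-bit load could be 4-aligned.
static_assert(alignmentAtOffset(4, -4) == 4, "widened 32-bit load");
```

The second assertion is the alignment the widened access deserves; the difficulty described above is that an MMO still referring to ptr cannot express it without the extra re-alignment logic.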
Rebase and revise.
Tue, Oct 20
PING for review.
Sat, Oct 17
Fri, Oct 16
Remove the MMO non-store check.
Fix typos and revise the coding style following clang-tidy.
Thu, Oct 15
Add limit on memory operand AA check.
Wed, Oct 14
That change triggers a few regression test failures. All of them are due to different instruction scheduling caused by memory instructions with multiple memory operands. Please help me double-check that's the case and whether the result is a better code sequence. Thanks.
Tue, Oct 13
Register pressure tests have to disable AMDGPU AA to pass the test; otherwise, the register pressure is reduced after using AA.
Mon, Oct 12
Kindly PING for review.
Fri, Oct 9
Thu, Oct 1
Wed, Sep 30
This change is reverted because, on hosts without LBR support but with LIBPFM installed and used, it makes llvm/test/tools/llvm-exegesis/X86/lbr/mov-add.s fail. On such a host, perf_event_open fails with EOPNOTSUPP on the LBR config. That change's basic assumption
Mon, Sep 28
Split the original into 2. This is the first part, which adds
correctly-rounded-device-sqrt-fp-math for OpenCL only. The second part will
remove that attribute annotation completely.
Sep 25 2020
Remove the irrelevant change on .clang-format.
Sep 22 2020
Sep 21 2020
Sep 18 2020
This patch enhances peephole-opt to fix the redundant copy issues that were originally to be fixed in D87556. With the enhancement, we can remove that redundant COPY locally. Test cases are revised due to the code quality improvement or change. Fortunately, only AMDGPU and ARM tests need to address that difference.
Sep 17 2020
Revise formatting following the clang-format suggestion.