User Details
- User Since
- Nov 30 2015, 11:41 AM (381 w, 4 d)
Feb 10 2023
Feb 9 2023
Define an enum type for the code object version number.
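A minimal sketch of what such an enum could look like. The names CodeObjectVersionKind, COV_None, COV_2 through COV_5, and isKnownCodeObjectVersion are hypothetical and are not taken from the patch.

```cpp
#include <cstdint>

// Hypothetical enum: the names and the set of values are assumptions
// made for illustration, not the ones introduced by the patch.
enum CodeObjectVersionKind : uint32_t {
  COV_None = 0,
  COV_2 = 2,
  COV_3 = 3,
  COV_4 = 4,
  COV_5 = 5
};

// Returns true if V is one of the versions listed in the enum above.
inline bool isKnownCodeObjectVersion(uint32_t V) {
  return V >= COV_2 && V <= COV_5;
}
```

Named values keep comparisons such as "version >= COV_5" readable, and a single range check gives one place to reject out-of-range versions.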
Feb 7 2023
Feb 6 2023
Update based on reviewers' comments. Thanks.
Feb 3 2023
Feb 2 2023
Update based on Matt's comments.
Update based on Matt's comments:
- Add a test case for an out-of-range code object version
- Move the code object version into AMDGPUInformationCache
- Pass the code object version as an argument in MetadataStreamerMsgPackV3::getHSAKernelProps
- Remove unnecessary parens
Feb 1 2023
Jan 20 2023
Update test.
Jan 19 2023
Add a test.
Jan 12 2023
Sep 26 2022
Sep 25 2022
Updated based on arsenm's comments.
Sep 24 2022
Sep 21 2022
Should the module flag name be amdgpu_code_object_version or amdhsa_code_object_version?
Sep 20 2022
Updated based on arsenm's comment to merge two cases.
Update based on arsenm's comments.
Sep 19 2022
Merge two ProcessUse functions, and do selection based on code object version.
Aug 8 2022
Still use the existing AMDGPU::getAmdhsaCodeObjectVersion() to check code object version.
This is for consistency in the backend. Plan to use module flag in a later patch for all cases.
Aug 5 2022
Apply the pattern-matching optimizations according to whether they exist for the given code object version.
Get the code object version from the amdhsa-code-object-version module flag. Note that for now
I only declare a static function to do this as a proof of concept, because we don't know what the default
code object version is if the module flag does not exist.
Also, getting the code object version from the module flag everywhere in the compiler is beyond the scope of this work.
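The lookup described above can be sketched in isolation as follows. This is only an illustration: ModuleFlags is a stand-in map rather than the LLVM Module API, and getCodeObjectVersionOr and its Default parameter are hypothetical. Because the default version is an open question, the sketch makes the caller supply it.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Stand-in for a module's flag table (assumption: real code would query
// the LLVM Module for its module flags instead of a std::map).
using ModuleFlags = std::map<std::string, uint32_t>;

// Hypothetical helper: returns the value of the
// "amdhsa-code-object-version" flag, or Default when the flag is absent.
inline uint32_t getCodeObjectVersionOr(const ModuleFlags &Flags,
                                       uint32_t Default) {
  auto It = Flags.find("amdhsa-code-object-version");
  if (It == Flags.end())
    return Default; // Flag missing: fall back to the caller's choice.
  return It->second;
}
```

Keeping the fallback explicit at each call site avoids baking in a default before one has been agreed on.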
Jul 28 2022
Jul 27 2022
Merge the new test into frame-index-elimination.ll
Add a mir test, and remove the use of undef in the test.
Jul 26 2022
Rename the LIT test to frame-index-elimination-tied-operand.ll as suggested.
Jul 25 2022
Jul 15 2022
Update https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Driver/Options.td#L3626
as well as the corresponding LIT tests.
Jul 14 2022
Jul 12 2022
This patch triggered a correctness issue when running mixbench-ocl-alt.
I am not familiar with the CFG in the SCCP pass at all, but the comments
in the code seem to suggest that we should not change the CFG:
Apr 13 2022
Apr 12 2022
Apr 11 2022
- Correct the function funcRetrievesMultigridSyncArg
- Update after rebase
Ping
We have now agreed to use alignTo to align the Offset to what the implicitarg_ptr requires,
and the LIT tests have been updated to show the alignment-related layout. Thanks.
Apr 8 2022
Ran the correct "git diff" so that the diff includes deleted and added files.
Use alignTo to force the alignment for the implicit kernarg segment:
Offset = alignTo(Offset, ST.getAlignmentForImplicitArgPtr());
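For readers unfamiliar with the rounding involved, here is a standalone sketch of aligning an offset up to a power-of-two boundary. The function alignUp is a hypothetical stand-in written for this illustration; it is not LLVM's alignTo implementation, and it takes a plain integer alignment rather than an Align value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the rounding step: rounds Offset up to the
// next multiple of Alignment. Alignment must be a nonzero power of two.
inline uint64_t alignUp(uint64_t Offset, uint64_t Alignment) {
  assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0 &&
         "Alignment must be a power of two");
  // Adding Alignment - 1 and clearing the low bits rounds up.
  return (Offset + Alignment - 1) & ~(Alignment - 1);
}
```

With an 8-byte requirement, an explicit-argument segment ending at offset 36 would place the implicit arguments at offset 40, while an offset that is already a multiple of 8 is left unchanged.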
Apr 7 2022
Apr 6 2022
For the LIT test, add back the GCN check prefix. For the GCN checks that are common to code object version 2 and MESA but different from code object version 5, we split the checks into HSA (version 2) and MESA.
Update LIT test.
Mar 31 2022
Mar 30 2022
Mar 28 2022
Mar 17 2022
A minor change: add suffix to the enum itself instead of the individual field.
Also remove the "Fixes" field in the summary (commit message).
Mar 16 2022
Update based on Matt's comments:
- Use buildPtrAdd
- Remove a space
- Add suffix for the enum definition and also wrap with a namespace
- Remove the redundant def of ST (SubTarget)
- Updated according to clang-format
Ping!
Mar 14 2022
- Introduce a common function, SITargetLowering::loadImplicitKernelArgument, which is used
in both getSegmentAperture and lowerTrapHsaQueuePtr.
Mar 11 2022
Rebase and update LIT tests.