- Relocate tests and add extra tests for parsing errors.
- Enhance the parser to report any missing definitions of forward-referenced metadata.
Fri, Jun 18
PING for further comments on the round-trip test, which requires the MIR parser support under review in D103282.
Thu, Jun 17
Only apply that abs/flip on fptosi for f32.
- Add global-isel support.
- Revise the method name following the suggestion.
Wed, Jun 9
Kindly PING again
Tue, Jun 8
Add the 'amdgpu-' prefix to that option.
This is a short-term solution until all the issues exposed by the frontend change are fixed.
Thu, May 27
Fix warnings from clang-tidy.
The patch is revised following the comments.
Separate that test into a MIR test.
Clean up after splitting MIR printer and parser changes.
Revise that machine tracker a bit to ensure machine metadata is always per-function state.
Tue, May 25
Rebase on the pre-committed test D103106.
Move the AAInfo preparation into the code that lowers memcpy/memmove/memset.
PING for review
May 20 2021
Just read the relevant threads and the bugs reported on the change allowing exact overlap in llvm.memcpy; see the reference list at the end. Personally, I think it's OK to assume the NoAlias added here. With exact overlap allowed in llvm.memcpy, the most significant change is in basic-aa, which must consider the case where the source and destination of llvm.memcpy are the same. That makes sense at the IR level, where llvm.memcpy is treated as a single op: exact overlap means the copy is a no-op, so it won't always overwrite the destination memory.

But once we lower that copy into loads/stores, we can say those loads/stores won't alias. That's fine, as the order between loads and stores on the same offset (or on the same location, in the exact-overlap case) is established through the data dependency. In addition, the no-alias established here is a scoped one, which applies only to the loads/stores lowered from this llvm.memcpy; it won't affect the AA result between them and loads/stores outside that scope. (This patch depends on https://reviews.llvm.org/D102215, which propagates scoped AA on mem ops into loads/stores after lowering.)
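To make that concrete, here's a minimal IR sketch of the intended result (value names and metadata numbering are illustrative, not actual patch output):

```
; Hypothetical 16-byte llvm.memcpy lowered into two load/store pairs.
%v0 = load i64, i64* %src0, align 1, !alias.scope !3, !noalias !4
store i64 %v0, i64* %dst0, align 1, !alias.scope !4, !noalias !3
%v1 = load i64, i64* %src1, align 1, !alias.scope !3, !noalias !4
store i64 %v1, i64* %dst1, align 1, !alias.scope !4, !noalias !3

!0 = distinct !{!0, !"memcpy lowering domain"}  ; scope domain for this one memcpy
!1 = distinct !{!1, !0, !"memcpy source"}       ; scope of the loads
!2 = distinct !{!2, !0, !"memcpy destination"}  ; scope of the stores
!3 = !{!1}  ; scope list attached to the loads
!4 = !{!2}  ; scope list attached to the stores
```

Even if %src0 and %dst0 turn out to be equal (the exact-overlap case), the data dependency orders each load before its store, and the scoped !noalias never applies to memory operations outside this one expansion.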
May 19 2021
Add MIR printer and parser support for scoped AA metadata generated in the backend.
Revise the AA metadata matching.
Include tests examining the MIR to ensure scoped AA metadata is attached to loads/stores lowered from mem ops.
May 11 2021
Good catch! LGTM!
Add more tests and revise following the comment.
Mar 30 2021
The case is no longer valid considering concurrent kernel execution.
It seems to me that we may need to revise CFG lowering to avoid updating EXEC directly, and to revise it later based on whether the restored mask needs reloading. Here's a brief sketch of my thinking (see the code sketch after this list):
- Instead of lowering the CFG early, before RA, lower it after RA. As a byproduct, it also removes the need for the "terminator" versions of the exec mask manipulation instructions.
- When the CFG is being lowered, we could update EXEC eagerly if the merge point doesn't need to reload the mask; otherwise, we just translate it as we currently do.
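For illustration, a rough sketch of the two merge-point cases (instruction choices and registers here are illustrative, not actual codegen output):

```
; Merge point where the saved mask is still live in SGPRs: with the CFG
; lowered after RA, a plain, eager restore is enough, and no "terminator"
; pseudo (e.g. S_OR_B64_term) is needed.
s_or_b64 exec, exec, s[0:1]

; Merge point that must reload a spilled mask: keep the deferred form,
; since the reload path ends in v_readfirstlane, which must not execute
; with EXEC == 0; EXEC can only be updated after the reload is done.
buffer_load_dwordx2 v[0:1], off, s[96:99], 0  ; reload the spilled 64-bit mask
v_readfirstlane_b32 s4, v0                    ; undefined if EXEC == 0
v_readfirstlane_b32 s5, v1
s_or_b64 exec, exec, s[4:5]                   ; restore only after the reload
```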
Mar 29 2021
This is the companion fix for D96980. It explains the mystery of why the original agnostic SGPR spill/reload was proposed: to solve the issue that an SGPR spill/reload may be executed when the exec mask goes to zero.
We do try to skip executing code when the exec mask goes to zero, by branching on EXECZ (to the target block) or EXECNZ (to the fall-through block). We may still run instructions with a zero exec mask, but that's usually not an issue, as we immediately restore the exec mask in the target block, and code following that restoration won't be executed under a zero mask. However, if that restoration needs to reload a spilled exec mask, we will run the SGPR reload under a zero mask, and v_readfirstlane has undefined behavior when the exec mask is zero.
This patch tries to mitigate that case by not evaluating or clearing the exec mask that early when the branch target restores the mask through an SGPR reload. Instead of checking EXECZ or EXECNZ, the exec mask computation is duplicated with a temporary SGPR as the destination (without updating EXEC directly); checking SCC0 is then equivalent to checking EXECZ, and EXEC is only updated when the result is known to be non-zero. For instance (a sketch, with illustrative block and register names):
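```
; Current lowering: EXEC is clobbered first, then tested.
s_andn2_b64      exec, exec, s[0:1]   ; new mask goes straight into EXEC
s_cbranch_execz  .LBB0_2              ; reload at .LBB0_2 may now run with EXEC == 0

; Mitigated lowering: compute the mask into a scratch SGPR pair first.
; SALU logical ops set SCC = (result != 0), so SCC0 is equivalent to EXECZ.
s_andn2_b64      s[2:3], exec, s[0:1] ; candidate mask into a temporary
s_cbranch_scc0   .LBB0_2              ; branch away while EXEC is still non-zero
s_mov_b64        exec, s[2:3]         ; commit only a known non-zero mask
```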
Feb 24 2021
Update tests with more atomic ops.
In addition, we already heavily use v_readfirstlane in our codegen, due to patterns that benefit from using vector instructions where no corresponding scalar instructions are available. I believe it's quite safe, as it's guaranteed that v_readfirstlane won't be executed when the exec mask goes to 0.
Remove that unnecessary change and add a rationale for why that's safe given the original concerns.