Details
- Reviewers: Joe_Nash, rampitec, piotr, jpages
- Group Reviewers: Restricted Project
- Commits: rGcfb7ffdec0eb: [AMDGPU] New AMDGPUInsertDelayAlu pass
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
For tests with generated checks: if the only difference was adding a bunch of s_delay_alu instructions then I regenerated the checks. If it was more disruptive (e.g. if it meant that shared prefixes could no longer be used) then I added -amdgpu-enable-delay-alu=0 to the GFX11 RUN lines instead.
| llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp | ||
|---|---|---|
| 162 | Should check LLVM_ENABLE_DUMP instead | |
| 217 | Bad autos, I'm not even sure what the type is supposed to be here | |
| 301–302 | Can merge these into one LLVM_DEBUG. Also should use printMBBReference | |
| 313–320 | This is D128313 (except for SI_RETURN_TO_EPILOG which I'm not sure about since it's a bit weird) | |
| 323–328 | Should extract this into a separate predicate function | |
| 330–332 | Ditto | |
| 345–348 | Why does the opcode need special casing here? Why not every tied operand? | |
| 385–386 | One LLVM_DEBUG | |
| llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp | ||
|---|---|---|
| 301–302 | Done, but the single LLVM_DEBUG looks uglier to me. | |
| 313–320 | Can update this when that patch lands. | |
| 345–348 | Because the hardware does not see a RAW dependency here: `v_mov_b32 v0, 0` followed by `v_writelane_b32 v0, s0, 0`. At the thread level there is no dependency. The MIR instruction uses a tied read-write operand for v0, which I suppose is a wave-level representation where the read represents the lanes that are not modified. Other instructions with tied operands (like V_MAC) represent a real read of a VGPR in all active lanes, so we do want to model a delay for them. | |
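
To make the distinction in the 345–348 reply concrete, here is a minimal standalone sketch of how the special case could be expressed as a predicate. It is not the code from AMDGPUInsertDelayAlu.cpp: the `Opcode` enum and the name `tiedReadNeedsDelay` are hypothetical, and a real pass would inspect MachineInstr operands rather than a toy enum.

```cpp
// Hypothetical stand-ins for the real AMDGPU opcodes; illustration only.
enum class Opcode { V_MAC_F32, V_WRITELANE_B32 };

// Returns true if the tied read-write VGPR operand of Opc should be modelled
// as a real read, i.e. as a RAW dependency that may need a delay.
// Assumes the caller only asks about operands that really are tied.
constexpr bool tiedReadNeedsDelay(Opcode Opc) {
  // V_WRITELANE overwrites a single lane. Its tied "read" only stands for
  // the lanes that are left unmodified, so at the thread level there is no
  // dependency on the previous writer of the register.
  return Opc != Opcode::V_WRITELANE_B32;
}

// Compile-time checks mirroring the two cases discussed in the review.
static_assert(!tiedReadNeedsDelay(Opcode::V_WRITELANE_B32),
              "writelane: tied read is not a real per-lane read");
static_assert(tiedReadNeedsDelay(Opcode::V_MAC_F32),
              "V_MAC: tied operand is read in all active lanes");
```

Keeping the exception in one named predicate also matches the earlier request to extract such conditions into separate predicate functions, and any other opcode that turns out to behave like V_WRITELANE would only need to be added in one place.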